Extensions

Overview

Extensions are modular components that add functionality to Jan. Each extension is designed to handle specific features.

Extensions can be managed through Settings > Extensions.


Core Extensions

Cortex.cpp

The primary extension, which manages both local and remote inference engines:

Local Engines

  • llama.cpp: Fast, efficient local inference engine that runs GGUF models directly on your device. Powers Jan's default local AI capabilities with support for multiple hardware configurations (see the request sketch after this list).
  • TensorRT-LLM: GPU-accelerated inference engine optimized for LLMs on NVIDIA GPUs, offering high performance and low latency for transformer-based models.
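
Once a GGUF model has been downloaded and the local API server is enabled in Settings, the llama.cpp engine can be exercised through Jan's OpenAI-compatible API. The following is a minimal sketch, not Jan's internal code; the base URL, port, and model ID are assumptions to adjust for your own setup.

```typescript
// Minimal sketch: a chat completion against Jan's local, llama.cpp-backed
// server. The base URL, port, and model ID are assumptions -- adjust them
// to match your own Jan configuration.
const BASE_URL = "http://localhost:1337/v1"; // assumed default local API server
const MODEL_ID = "llama3.2-3b-instruct";     // placeholder: any downloaded GGUF model

async function chat(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL_ID,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  // OpenAI-compatible responses put the reply in choices[0].message.content.
  return data.choices[0].message.content;
}

chat("Explain what a GGUF model is in one sentence.").then(console.log);
```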

Remote Engines

  • Anthropic: Access Claude models
  • Cohere: Access Cohere's models
  • Groq: High-performance inference
  • Martian: Specialized model access
  • MistralAI: Access Mistral models
  • NVIDIA NIM (NVIDIA Inference Microservices): Platform for deploying and serving GPU-accelerated AI models, providing enterprise-grade reliability and scalability.
  • OpenAI: Access GPT models
  • OpenRouter: Multi-provider model access
  • Triton-TRT-LLM: High-performance inference backend using NVIDIA Triton Inference Server with TensorRT-LLM optimization, designed for large-scale model deployment.
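
Each remote engine needs an endpoint and a credential before its models show up in Jan. The snippet below is an illustrative sketch only: the interface and field names are hypothetical and do not mirror Jan's internal schema, although the example base URLs are the providers' public API endpoints.

```typescript
// Illustrative sketch only: the interface and field names are hypothetical
// and do not mirror Jan's internal schema. It shows the kind of information
// a remote engine needs -- an endpoint, a credential, and the models it exposes.
interface RemoteEngineConfig {
  provider: string;   // e.g. "openai", "anthropic", "groq"
  baseUrl: string;    // the provider's API endpoint
  apiKey: string;     // credential entered by the user in Settings
  models: string[];   // remote model IDs made available in Jan
}

const engines: RemoteEngineConfig[] = [
  {
    provider: "openai",
    baseUrl: "https://api.openai.com/v1",
    apiKey: process.env.OPENAI_API_KEY ?? "",
    models: ["gpt-4o", "gpt-4o-mini"], // example IDs; check the provider's catalog
  },
  {
    provider: "groq",
    baseUrl: "https://api.groq.com/openai/v1",
    apiKey: process.env.GROQ_API_KEY ?? "",
    models: ["llama-3.1-70b-versatile"], // example ID
  },
];

console.log(`Configured remote engines: ${engines.map((e) => e.provider).join(", ")}`);
```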

Jan Assistant

Enables assistant functionality, including Jan, the default assistant, which can use any downloaded model. This extension manages:

  • Default assistant configurations
  • Model selection
  • Conversation management
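
As a rough illustration of what a default assistant definition involves, the sketch below uses hypothetical field names; Jan's actual assistant schema may differ.

```typescript
// A minimal sketch of what a default assistant definition involves. The
// property names are illustrative assumptions, not Jan's exact schema.
interface AssistantConfig {
  id: string;           // unique identifier, e.g. "jan"
  name: string;         // display name shown in the UI
  instructions: string; // system prompt applied to every new thread
  model?: string;       // optional preferred model; otherwise the active model is used
}

const defaultAssistant: AssistantConfig = {
  id: "jan",
  name: "Jan",
  instructions: "You are a helpful assistant.",
};

console.log(`Loaded assistant: ${defaultAssistant.name}`);
```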

Conversational

Manages all chat-related functionality and data persistence, so conversation threads and their messages are retained between sessions.
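
The shapes below are a hypothetical sketch of the kind of records such an extension persists; Jan's actual storage format may differ.

```typescript
// Hypothetical record shapes for persisted chat data; Jan's actual on-disk
// format may differ. The point is that threads and messages are stored so
// conversations survive an application restart.
interface Thread {
  id: string;
  title: string;
  assistantId: string; // which assistant the thread belongs to
  createdAt: string;   // ISO 8601 timestamp
}

interface Message {
  id: string;
  threadId: string;
  role: "user" | "assistant";
  content: string;
  createdAt: string;
}

const thread: Thread = {
  id: "thread-001",
  title: "GGUF questions",
  assistantId: "jan",
  createdAt: new Date().toISOString(),
};

console.log(`Created thread "${thread.title}"`);
```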

Model Management

Provides model exploration and seamless downloads:

  • Model discovery and browsing
  • Version control & configuration handling
  • Download management
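
Downloaded models can also be enumerated programmatically through the OpenAI-compatible local API server, when it is enabled. The sketch below reuses the same assumed base URL as the earlier example.

```typescript
// Minimal sketch: enumerate the models Jan exposes through its
// OpenAI-compatible local API server. The base URL is an assumption and
// must match your local configuration.
const API = "http://localhost:1337/v1";

async function listModels(): Promise<void> {
  const res = await fetch(`${API}/models`);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const { data } = await res.json();
  for (const model of data) {
    console.log(model.id); // each entry carries at least an "id" field
  }
}

listModels().catch(console.error);
```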

System Monitoring

Provides system health and OS-level data:

  • Hardware utilization tracking
  • Performance monitoring
  • Error logging
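
As a rough sketch of the kind of polling this involves, the example below samples memory use and load average with Node's built-in os module; the actual extension likely gathers richer data (GPU load, VRAM, temperatures) through platform-specific tooling.

```typescript
// Illustrative sketch of the kind of polling a monitoring extension performs.
// It samples memory use and load average with Node's built-in "os" module;
// the actual extension likely gathers richer data through platform-specific tooling.
import os from "node:os";

function sampleSystem(): { usedMemPct: number; loadAvg1m: number } {
  const total = os.totalmem();
  const free = os.freemem();
  return {
    usedMemPct: Math.round(((total - free) / total) * 100),
    loadAvg1m: os.loadavg()[0], // 1-minute load average (reported as 0 on Windows)
  };
}

// Poll every 5 seconds and log a snapshot that could feed error reports.
setInterval(() => {
  const s = sampleSystem();
  console.log(`memory used: ${s.usedMemPct}% | load(1m): ${s.loadAvg1m.toFixed(2)}`);
}, 5_000);
```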