Meta
8GB+
Llama 3.1 (8B)
General Chat · 128k Ctx
A strong general-purpose small model. Fast on consumer hardware, with a 128k-token context window.
Ollama Pull
ollama run llama3.1
Alibaba
16GB+
Qwen 2.5 (14B)
Coding & Math · 32k Ctx
Strong coding and math performance for a 14B model, competitive with much larger models on many benchmarks.
Ollama Pull
ollama run qwen2.5:14b
Mistral AI
16GB+
Mistral Nemo (12B)
Long Context · 128k Ctx
A 12B model built in collaboration with NVIDIA. Extremely capable with an enormous context window.
Ollama Pull
ollama run mistral-nemo
Microsoft
8GB+
Phi-3.5 Mini (3.8B)
Low RAM · 128k Ctx
Designed to run on almost anything, including older laptops. Offers a 128k context length despite its small size.
Ollama Pull
ollama run phi3.5
Google
16GB+
Gemma 2 (9B)
Creative Writing · 8k Ctx
Google's open-weights model. Incredibly punchy, safe, and creative conversational responses.
Ollama Pull
ollama run gemma2
Google
32GB+
Gemma 2 (27B)
Advanced Reasoning · 8k Ctx
The larger sibling to Gemma 2 9B. Google reports it is competitive with models more than twice its size, such as Llama 3 70B, on several academic benchmarks.
Ollama Pull
ollama run gemma2:27b
DeepSeek
16GB+
DeepSeek Coder V2
Programming · 128k Ctx
A Mixture of Experts coding model. The default tag pulls the 16B Lite variant; DeepSeek reports that the full 236B model is comparable to GPT-4 Turbo on code benchmarks.
Ollama Pull
ollama run deepseek-coder-v2
Haotian Liu
8GB+
LLaVA (7B)
Computer Vision · 4k Ctx
A multimodal model. You can feed it images and ask it to describe or analyze them.
Ollama Pull
ollama run llava
NousResearch
8GB+
OpenHermes 2.5
Uncensored Chat · 8k Ctx
A fine-tuned Mistral 7B model known for its excellent conversational style and lack of guardrails.
Ollama Pull
ollama run openhermes
Mistral AI
32GB+
Mixtral (8x7B)
Complex Tasks · 32k Ctx
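A Mixture of Experts (MoE) model with about 47B total parameters, of which roughly 13B are active per token. It generates at about the speed of a 13B model, and Mistral AI reports it matches or outperforms Llama 2 70B on most benchmarks.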
Ollama Pull
ollama run mixtral
Cohere
32GB+
Command R
RAG & Tools · 128k Ctx
Specifically trained for Retrieval Augmented Generation (RAG) and API tool use.
Ollama Pull
ollama run command-r
Meta
64GB+
Llama 3.1 (70B)
Frontier AI · 128k Ctx
Approaches GPT-4 class quality on many benchmarks while running completely offline. Requires serious hardware like a Mac Studio.
Ollama Pull
ollama run llama3.1:70b
Alibaba
8GB+
Qwen 2.5 Coder (7B)
Fast Coding · 32k Ctx
A smaller, fast coding assistant that fits comfortably on base-model laptops.
Ollama Pull
ollama run qwen2.5-coder
Cognitive Comp
8GB+
Dolphin Llama 3 (8B)
Uncensored Code · 8k Ctx
An uncensored fine-tune of Llama 3 trained to follow instructions without refusing, with strong coding skills.
Ollama Pull
ollama run dolphin-llama3
Google
8GB+
Gemma 2 (2B)
Edge Devices · 8k Ctx
A tiny 2B parameter model that runs blazing fast even on old hardware or Raspberry Pis.
Ollama Pull
ollama run gemma2:2b
Microsoft
8GB+
WizardLM-2 (7B)
Reasoning · 32k Ctx
Microsoft's highly optimized reasoning and instruction-following model.
Ollama Pull
ollama run wizardlm2
Stability AI
8GB+
Stable Code (3B)
Autocomplete · 16k Ctx
Optimized for code completion, including fill-in-the-middle, in editors such as VS Code. Very low latency.
Ollama Pull
ollama run stable-code
StatNLP
4GB+
TinyLlama (1.1B)
Prototyping · 2k Ctx
Needs very little RAM. Great for testing API endpoints and learning local AI pipelines.
Ollama Pull
ollama run tinyllama
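
Testing an API endpoint with one of these models can be sketched roughly as follows. This assumes the Ollama server is running locally on its default port (11434) and that the model has already been downloaded with one of the commands above. The helper names `build_generate_request` and `generate` are made up for this example and are not part of Ollama; only the `/api/generate` endpoint and its `model`, `prompt`, `stream`, and `options.num_ctx` fields come from Ollama's REST API.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, num_ctx=None):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint.

    Hypothetical helper for illustration. `num_ctx` optionally asks Ollama for a
    specific context window, since the server default may be smaller than the
    maximum context a model supports.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx is not None:
        payload["options"] = {"num_ctx": num_ctx}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, num_ctx=None, timeout=120):
    """Send the request and return the generated text.

    With "stream" set to False, Ollama replies with a single JSON object whose
    "response" field holds the full completion.
    """
    request = build_generate_request(model, prompt, num_ctx)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.load(reply)
    return body["response"]
```

With the server running, a call such as `generate("tinyllama", "Say hello in five words.")` should return the model's reply as a string. Swapping in a larger model name from the list, for example `llama3.1`, only changes the first argument.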