The Best Local AI Models.
Stop downloading garbage. We test every major open-weights model and curate only the ones actually worth your VRAM.
Llama 3.1 (8B)
General Chat · 128k Ctx
Meta's 8B model and a strong default among small models. Fast on consumer hardware, capable for its size, and it supports a 128k context window.
ollama run llama3.1
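The 128k figure is the model's maximum. Ollama usually loads a model with a much smaller default window, so long prompts can be cut off unless you ask for more. A minimal sketch of requesting a larger window through the local REST API, assuming a server on the default port 11434; the helper names are illustrative, not part of Ollama:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def build_generate_request(model, prompt, num_ctx=8192):
    """Build the JSON body for a non-streaming /api/generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_ctx asks Ollama for a larger context window; bigger values use more memory.
        "options": {"num_ctx": num_ctx},
    }


def generate(model, prompt, num_ctx=8192):
    """Send the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_generate_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("llama3.1", long_prompt, num_ctx=32768)` would request a 32k window. Expect memory use to grow with the window size.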
Qwen 2.5 (14B)
Coding & Math · 32k Ctx
Strong coding and math performance for a 14B model, competitive with much larger models on many benchmarks.
ollama run qwen2.5:14b
Mistral Nemo (12B)
Long Context · 128k Ctx
A 12B model built in collaboration with NVIDIA. Extremely capable with an enormous context window.
ollama run mistral-nemo
Phi-3.5 Mini (3.8B)
Low RAM · 128k Ctx
Designed to run on almost anything, including older laptops. Insane 128k context length for its tiny size.
ollama run phi3.5
Gemma 2 (9B)
Creative Writing · 8k Ctx
Google's open-weights model. Gives concise, lively, creative conversational responses with conservative safety tuning.
ollama run gemma2
Gemma 2 (27B)
Advanced Reasoning · 8k Ctx
The larger sibling to Gemma 2 9B. Rivals Llama 3 70B on many academic benchmarks.
ollama run gemma2:27b
DeepSeek Coder V2
Programming · 128k Ctx
A very strong coding model. Its authors report performance comparable to GPT-4 Turbo on code-specific benchmarks.
ollama run deepseek-coder-v2
LLaVA (7B)
Computer Vision · 4k Ctx
A multimodal model. You can feed it images and ask it to describe or analyze them.
ollama run llava
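Outside the interactive prompt, images go to the model as base64 strings in the `images` field of an `/api/generate` request. A minimal sketch, assuming a local Ollama server on the default port; `build_vision_request` and `describe_image` are illustrative names:

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes, question, model="llava"):
    """Build an /api/generate body carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": question,
        "stream": False,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }


def describe_image(path, question="Describe this image."):
    """Read an image file and ask the model about it (needs a running server)."""
    with open(path, "rb") as f:
        payload = build_vision_request(f.read(), question)
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # default local endpoint (assumed)
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```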
OpenHermes 2.5
Uncensored Chat · 8k Ctx
A fine-tuned Mistral 7B model known for its excellent conversational style and lack of guardrails.
ollama run openhermes
Mixtral (8x7B)
Complex Tasks · 32k Ctx
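A Mixture of Experts (MoE) model. Only about 13B of its roughly 47B parameters are active per token, so it generates at about the speed of a 13B model, but all of the weights still have to fit in memory. Mistral AI reports that it matches or beats Llama 2 70B on most benchmarks.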
ollama run mixtral
Command R
RAG & Tools · 128k Ctx
Specifically trained for Retrieval Augmented Generation (RAG) and API tool use.
ollama run command-r
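In a RAG setup you retrieve a few relevant passages and place them in front of the question, so the model answers from your documents. The sketch below is a toy version under loud assumptions: the retriever ranks by plain word overlap (real systems normally use embeddings), and the prompt wording is made up for illustration rather than being Command R's own grounded-generation format.

```python
def score(query, doc):
    """Count how many distinct query words also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d)


def retrieve(query, docs, k=2):
    """Return the k documents with the highest word overlap (toy retriever)."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_rag_prompt(query, docs, k=2):
    """Put the retrieved passages in front of the question."""
    passages = retrieve(query, docs, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered passages and cite them.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

The resulting string can be passed to the model as an ordinary prompt. Swapping `retrieve` for an embedding search leaves the rest unchanged.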
Llama 3.1 (70B)
Frontier AI · 128k Ctx
Approaches GPT-4-class quality on many benchmarks while running completely offline. Requires serious hardware: the 4-bit weights alone take roughly 40 GB, so think Mac Studio or multiple GPUs.
ollama run llama3.1:70b
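A rough way to judge whether a model fits your hardware is parameters times bits per weight. The sketch below computes that lower bound for the weights only. It ignores the KV cache and runtime overhead, and real quantized files use slightly more than their nominal bit width, so treat the numbers as a floor.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for name, params in [("Llama 3.1 8B", 8), ("Llama 3.1 70B", 70)]:
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

By this estimate, a 70B model needs about 140 GB at 16-bit and about 35 GB at 4-bit before any overhead, which is why it is out of reach for a single consumer GPU.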
Qwen 2.5 Coder (7B)
Fast Coding · 32k Ctx
A smaller, very fast coding assistant that fits comfortably on entry-level laptops.
ollama run qwen2.5-coder
Dolphin Llama 3 (8B)
Uncensored Code · 8k Ctx
An uncensored, highly compliant fine-tune of Llama 3 that rarely refuses a request and has solid coding skills.
ollama run dolphin-llama3
Gemma 2 (2B)
Edge Devices · 8k Ctx
A tiny 2B parameter model that runs blazing fast even on old hardware or Raspberry Pis.
ollama run gemma2:2b
WizardLM-2 (7B)
Reasoning · 32k Ctx
Microsoft's highly optimized reasoning and instruction-following model.
ollama run wizardlm2
Stable Code (3B)
Autocomplete · 16k Ctx
Stability AI's 3B code-completion model with fill-in-the-middle support. Its low latency suits editor autocomplete, for example through a VS Code extension.
ollama run stable-code
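Editor autocomplete usually means fill-in-the-middle: the model sees the code before and after the cursor and generates what goes between. Ollama's `/api/generate` accepts a `suffix` field for this. The sketch below only builds such a request, and it assumes the model tag you pulled has a template that supports `suffix`, which not every tag does. The helper name and the option values are illustrative.

```python
def build_fim_request(prefix, suffix, model="stable-code"):
    """Build an /api/generate body asking for the code between prefix and suffix."""
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "stream": False,
        # Short, deterministic completions suit autocomplete (values are assumptions).
        "options": {"num_predict": 64, "temperature": 0},
    }


request_body = build_fim_request(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

POST `request_body` as JSON to `http://localhost:11434/api/generate` on a running server to get the completion.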
TinyLlama (1.1B)
Prototyping · 2k Ctx
Needs very little RAM. Great for testing API endpoints and learning local AI pipelines.
ollama run tinyllama
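A small model is handy for exercising the plumbing around the API, such as streaming. By default `/api/generate` streams newline-delimited JSON objects, each carrying a `response` fragment, with `done` set to true on the last one. A minimal sketch of reassembling that stream; `collect_stream` is an illustrative helper name:

```python
import json


def collect_stream(lines):
    """Join the "response" fragments from a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the reply
    return "".join(parts)


sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": false}',
    '{"response": "", "done": true}',
]
print(collect_stream(sample))
```

Because the function accepts any iterable of lines (text or bytes), the response object returned by `urllib.request.urlopen` can be passed to it directly.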