CURATED LEADERBOARD

The Best Local AI Models.

Stop downloading garbage. We test every major open-weights model and curate only the ones actually worth your VRAM.

Meta 8GB+

Llama 3.1 (8B)

General Chat · 128k Ctx

A strong default pick among small models. Fast on modest hardware, capable across general chat tasks, and it ships with a massive 128k context window.

Ollama Pull
ollama run llama3.1
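
A note on that 128k figure: Ollama typically loads models with a much smaller default context window, so long-context use usually means raising `num_ctx` yourself. A minimal sketch, assuming a 32k window is enough for your use and using the hypothetical variant name `llama3.1-32k` (larger windows cost more memory):

```shell
# Write a Modelfile that raises the context window.
# 32768 is an illustrative value, not a recommendation.
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 32768
EOF

# Build a local variant under a hypothetical name (skipped if Ollama is not installed).
if command -v ollama >/dev/null 2>&1; then
  ollama create llama3.1-32k -f Modelfile
fi
```

After that, `ollama run llama3.1-32k` starts the variant with the larger window.
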
Alibaba 16GB+

Qwen 2.5 (14B)

Coding & Math · 32k Ctx

Unbelievable coding and math capabilities that genuinely rival frontier models in a 14B package.

Ollama Pull
ollama run qwen2.5:14b
Mistral AI 16GB+

Mistral Nemo (12B)

Long Context · 128k Ctx

A 12B model built in collaboration with NVIDIA. Extremely capable with an enormous context window.

Ollama Pull
ollama run mistral-nemo
Microsoft 8GB+

Phi-3.5 Mini (3.8B)

Low RAM · 128k Ctx

Designed to run on almost anything, including older laptops. Insane 128k context length for its tiny size.

Ollama Pull
ollama run phi3.5
Google 16GB+

Gemma 2 (9B)

Creative Writing · 8k Ctx

Google's open-weights model. Incredibly punchy, safe, and creative conversational responses.

Ollama Pull
ollama run gemma2
Google 32GB+

Gemma 2 (27B)

Advanced Reasoning · 8k Ctx

The larger sibling to Gemma 2 9B. Rivals Llama 3 70B on many academic benchmarks.

Ollama Pull
ollama run gemma2:27b
DeepSeek 16GB+

DeepSeek Coder V2

Programming · 128k Ctx

An absolute monster at coding tasks. DeepSeek reports performance comparable to GPT-4 Turbo on code benchmarks for the full 236B model; the 16B Lite variant is the one that fits this memory tier.

Ollama Pull
ollama run deepseek-coder-v2
Haotian Liu 8GB+

LLaVA (7B)

Computer Vision · 4k Ctx

A multimodal model. You can feed it images and ask it to describe or analyze them.

Ollama Pull
ollama run llava
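
Feeding it an image from the terminal can look like the sketch below. It assumes the Ollama CLI's behavior of picking up an image file path written inside the prompt; `image_prompt`, `describe_image`, and the example path are hypothetical names for illustration.

```shell
# Build a prompt that embeds the image path in the text.
image_prompt() {
  printf 'Describe this image: %s' "$1"
}

# Ask LLaVA about one image file (requires Ollama and the llava model).
describe_image() {
  ollama run llava "$(image_prompt "$1")"
}
```

Usage: `describe_image ./photo.jpg`, where `./photo.jpg` stands in for any local image.
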
Teknium 8GB+

OpenHermes 2.5

Uncensored Chat · 8k Ctx

A fine-tuned Mistral 7B model known for its excellent conversational style and lack of guardrails.

Ollama Pull
ollama run openhermes
Mistral AI 32GB+

Mixtral (8x7B)

Complex Tasks · 32k Ctx

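A Mixture of Experts (MoE) model. Only about 13B of its roughly 47B parameters are active per token, so it generates at about the speed of a 13B model while matching Llama 2 70B on many benchmarks. All the weights still have to fit in memory, hence the 32GB+ tier.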

Ollama Pull
ollama run mixtral
Cohere 32GB+

Command R

RAG & Tools · 128k Ctx

Specifically trained for Retrieval Augmented Generation (RAG) and API tool use.

Ollama Pull
ollama run command-r
Meta 64GB+

Llama 3.1 (70B)

Frontier AI · 128k Ctx

GPT-4 class intelligence running completely offline. Requires serious hardware like a Mac Studio.

Ollama Pull
ollama run llama3.1:70b
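
Why 64GB+? A back-of-the-envelope sketch, assuming roughly 4 bits per weight (a common quantization for default Ollama tags, but check the tag you pull). It counts weights only, so treat the result as a floor; the KV cache and runtime overhead come on top. `weights_gb` is a hypothetical helper.

```shell
# Rough weight size in GB: parameters (billions) x bits per weight / 8.
# Integer math, so fractional results round down.
weights_gb() {
  params_b=$1   # parameter count in billions
  bits=$2       # bits per weight after quantization (4 is an assumption)
  echo $(( params_b * bits / 8 ))
}

weights_gb 70 4    # prints 35
weights_gb 8 4     # prints 4
weights_gb 70 16   # prints 140
```

So a 70B model needs about 35 GB for weights alone at 4-bit, which is why it lands in the 64GB+ tier while the 8B version sits in the 8GB+ one.
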
Alibaba 8GB+

Qwen 2.5 Coder (7B)

Fast Coding · 32k Ctx

A smaller, incredibly fast coding assistant that fits comfortably on entry-level laptops.

Ollama Pull
ollama run qwen2.5-coder
Cognitive Computations 8GB+

Dolphin Llama 3 (8B)

Uncensored Code · 8k Ctx

An uncensored, highly compliant fine-tune of Llama 3 with strong coding skills and very few refusals.

Ollama Pull
ollama run dolphin-llama3
Google 8GB+

Gemma 2 (2B)

Edge Devices · 8k Ctx

A tiny 2B-parameter model that runs blazing fast even on old hardware or a Raspberry Pi.

Ollama Pull
ollama run gemma2:2b
Microsoft 8GB+

WizardLM-2 (7B)

Reasoning · 32k Ctx

Microsoft's highly optimized reasoning and instruction-following model.

Ollama Pull
ollama run wizardlm2
Stability AI 8GB+

Stable Code (3B)

Autocomplete · 16k Ctx

Optimized specifically for code completion, which makes it a good fit for editor autocomplete such as in VS Code. Extremely low latency.

Ollama Pull
ollama run stable-code
StatNLP 4GB+

TinyLlama (1.1B)

Prototyping · 2k Ctx

Takes almost zero RAM. Great for testing API endpoints and learning local AI pipelines.

Ollama Pull
ollama run tinyllama
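
For endpoint testing, a minimal sketch against Ollama's local REST API. It assumes the server is running on its default port 11434; `generate_payload` and `ask` are hypothetical helper names.

```shell
# JSON body for the /api/generate endpoint.
# "stream": false asks for a single JSON reply instead of a token stream.
# Note: this naive builder does not escape quotes inside the prompt.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one prompt to the local Ollama server and print the raw JSON reply.
ask() {
  curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

Usage: `ask tinyllama "Say hello in five words"`. Swapping the model name is all it takes to point the same call at a bigger model later.
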