Best Local AI Models | Curated Leaderboard

Meta 8GB+

Llama 3.1 (8B)

General Chat · 128k Ctx

The current undisputed king of small models. Blazing fast, incredibly smart, massive 128k context window.

Ollama Pull

ollama run llama3.1

Alibaba 16GB+

Qwen 2.5 (14B)

Coding & Math · 32k Ctx

Unbelievable coding and math capabilities that genuinely rival frontier models in a 14B package.

Ollama Pull

ollama run qwen2.5:14b

Mistral AI 16GB+

Mistral Nemo (12B)

Long Context · 128k Ctx

A 12B model built in collaboration with NVIDIA. Extremely capable with an enormous context window.

Ollama Pull

ollama run mistral-nemo

Microsoft 8GB+

Phi-3.5 Mini (3.8B)

Low RAM · 128k Ctx

Designed to run on almost anything, including older laptops. Insane 128k context length for its tiny size.

Ollama Pull

ollama run phi3.5

Google 16GB+

Gemma 2 (9B)

Creative Writing · 8k Ctx

Google's open-weights model. Incredibly punchy, safe, and creative conversational responses.

Ollama Pull

ollama run gemma2

Google 32GB+

Gemma 2 (27B)

Advanced Reasoning · 8k Ctx

The larger sibling to Gemma 2 9B. Rivals Llama 3 70B on many academic benchmarks.

Ollama Pull

ollama run gemma2:27b

DeepSeek 16GB+

DeepSeek Coder V2

Programming · 128k Ctx

An absolute monster at coding tasks. Consistently beats GPT-4 in several programming benchmarks.

Ollama Pull

ollama run deepseek-coder-v2

Haotian Liu 8GB+

LLaVA (7B)

Computer Vision · 4k Ctx

A Multimodal model. You can feed it images and ask it to describe or analyze them natively.

Ollama Pull

ollama run llava

NousResearch 8GB+

OpenHermes 2.5

Uncensored Chat · 8k Ctx

A fine-tuned Mistral 7B model known for its excellent conversational style and lack of guardrails.

Ollama Pull

ollama run openhermes

Mistral AI 32GB+

Mixtral (8x7B)

Complex Tasks · 32k Ctx

A Mixture of Experts (MoE) model. Runs as fast as a 12B model but thinks like a 70B model.

Ollama Pull

ollama run mixtral

Cohere 32GB+

Command R

RAG & Tools · 128k Ctx

Specifically trained for Retrieval Augmented Generation (RAG) and API tool use.

Ollama Pull

ollama run command-r

Meta 64GB+

Llama 3.1 (70B)

Frontier AI · 128k Ctx

GPT-4 class intelligence running completely offline. Requires serious hardware like a Mac Studio.

Ollama Pull

ollama run llama3.1:70b

Alibaba 8GB+

Qwen 2.5 Coder (7B)

Fast Coding · 32k Ctx

A smaller, incredibly fast coding assistant that fits comfortably on base model laptops.

Ollama Pull

ollama run qwen2.5-coder

Cognitive Comp 8GB+

Dolphin Llama 3 (8B)

Uncensored Code · 8k Ctx

An uncensored, highly compliant fine-tune of Llama 3 that refuses nothing and acts as a pure coding assistant.

Ollama Pull

ollama run dolphin-llama3

Google 8GB+

Gemma 2 (2B)

Edge Devices · 8k Ctx

A tiny 2B parameter model that runs blazing fast even on old hardware or Raspberry Pis.

Ollama Pull

ollama run gemma2:2b

Microsoft 8GB+

WizardLM-2 (7B)

Reasoning · 32k Ctx

Microsoft's highly optimized reasoning and instruction-following model.

Ollama Pull

ollama run wizardlm2

Stability AI 8GB+

Stable Code (3B)

Autocomplete · 16k Ctx

Optimized specifically for code completion in VS Code. Extremely low latency.

Ollama Pull

ollama run stable-code

StatNLP 4GB+

TinyLlama (1.1B)

Prototyping · 2k Ctx

Takes almost zero RAM. Great for testing API endpoints and learning local AI pipelines.

Ollama Pull

ollama run tinyllama

The Best Local AI Models.

Llama 3.1 (8B)

Qwen 2.5 (14B)

Mistral Nemo (12B)

Phi-3.5 Mini (3.8B)

Gemma 2 (9B)

Gemma 2 (27B)

DeepSeek Coder V2

LLaVA (7B)

OpenHermes 2.5

Mixtral (8x7B)

Command R

Llama 3.1 (70B)

Qwen 2.5 Coder (7B)

Dolphin Llama 3 (8B)

Gemma 2 (2B)

WizardLM-2 (7B)

Stable Code (3B)

TinyLlama (1.1B)