Run Ollama on Linux: The Definitive Guide

Intermediate • 8 min read
By Alex Rivera • May 14, 2024

Linux is the native home of machine learning. Running Ollama on Ubuntu or Debian gives you minimal OS overhead and the most mature driver integration for NVIDIA and AMD GPUs.

Introduction

Ollama provides a one-line install script for Linux that not only downloads the binary but also configures a systemd background service. This means your local AI API will start automatically when you boot your server or desktop.

Prerequisites

Before installing Ollama, ensure your GPU drivers are correctly installed.

For NVIDIA GPUs:

Terminal
# Install proprietary NVIDIA drivers and CUDA toolkit
sudo apt update
sudo apt install nvidia-driver-535 nvidia-cuda-toolkit
nvidia-smi  # Verify drivers are working

For AMD GPUs: Ollama supports AMD graphics cards via the ROCm platform. Ensure you have the latest amdgpu drivers installed for your specific distribution.
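If you want to confirm the driver state before installing, a quick check like the following can help (this assumes `lspci` from the pciutils package is available; look for "amdgpu" in the output):

```shell
# Show which kernel driver is bound to the GPU (look for "Kernel driver in use: amdgpu")
lspci -k | grep -EA3 'VGA|3D|Display'

# ROCm exposes these device nodes when the stack is working
ls -l /dev/kfd /dev/dri/renderD*
```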

Step 1 Installation

The official installation script handles everything for you. Run this in your terminal:

Terminal
curl -fsSL https://ollama.com/install.sh | sh

During installation, the script will automatically detect your NVIDIA or AMD GPU and download the appropriate acceleration libraries.
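After the script finishes, a quick sanity check might look like this: confirm the binary version, check the service state, and grep the startup logs for the compute libraries the server detected (exact log wording varies between Ollama versions):

```shell
# Confirm the binary is on PATH and which version was installed
ollama --version

# The installer registers and starts a systemd unit
systemctl is-active ollama

# The server logs which acceleration libraries it loaded at startup
journalctl -u ollama --no-pager | grep -iE 'cuda|rocm|gpu' | tail -n 5
```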

Step 2 Managing the Service

Ollama runs as a daemon. You can manage it using standard systemd commands:

Terminal
# Check if Ollama is running
sudo systemctl status ollama

# Restart the service (useful after pulling large models or updating drivers)
sudo systemctl restart ollama

# View live server logs
journalctl -u ollama -f
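The installer enables the service at boot; if you would rather start Ollama only when you need it (for example on a battery-powered laptop), you can toggle autostart with standard systemd flags:

```shell
# Stop the server now and prevent it from starting at boot
sudo systemctl disable --now ollama

# Start it again and re-enable autostart later
sudo systemctl enable --now ollama
```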

Step 3 Pulling and Running Models

Once the service is active, you can pull your first model and drop into the chat interface. Let's use Meta's Llama 3:

Terminal
ollama run llama3

To exit the interactive prompt, type /bye or press Ctrl + D.
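Models you run interactively are also served over Ollama's local REST API on port 11434, so the same model can be queried from scripts. A minimal non-streaming request might look like this (the model name assumes you pulled llama3 above):

```shell
# Ask the local API for a single, non-streaming completion
curl -s http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue? Answer in one sentence.",
  "stream": false
}'
```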

Hardware Limits

Because a Linux desktop (or a headless server) reserves far less VRAM for the OS than Windows does, you can often fit slightly larger models onto the same GPU.

Your VRAM   Max Model Size     Recommended Models
8 GB        ~8B parameters     Llama 3 (8B), Mistral (7B)
16 GB       ~14B parameters    Qwen 2.5 (14B), Command R
24 GB       ~30B parameters    Mixtral (8x7B)

If you exceed your VRAM, Ollama will gracefully offload the remaining layers to your system RAM, though generation speed will drop significantly.
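You can see how a loaded model was split between GPU and CPU with `ollama ps`, and watch VRAM usage live on NVIDIA hardware (column wording may vary slightly between Ollama versions):

```shell
# The PROCESSOR column shows the split, e.g. "100% GPU" or "41%/59% CPU/GPU"
ollama ps

# On NVIDIA, refresh the VRAM usage readout every second
watch -n 1 nvidia-smi
```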

Step 4 Network Access

By default, Ollama only listens on 127.0.0.1 (localhost). If you are running Linux on a headless server and want to access the API from your MacBook or Windows PC, you need to bind it to your local network IP.

Edit the systemd service:

Terminal
sudo systemctl edit ollama

Add the following lines to the override file that opens, then save and exit:

Terminal
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Restart the service:

Terminal
sudo systemctl restart ollama

Your Linux AI server is now accessible from anywhere on your local network!
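From another machine on the LAN you can confirm the server is reachable; replace 192.168.1.50 below with your server's actual IP (it is a placeholder here). If the server runs a firewall such as ufw, you will also need to open the port:

```shell
# On the server: allow Ollama's default port through ufw (only if ufw is in use)
sudo ufw allow 11434/tcp

# From a client: list the models the server has pulled
curl http://192.168.1.50:11434/api/tags
```

Keep in mind that binding to 0.0.0.0 exposes the API to every device on the network, and Ollama's API has no built-in authentication, so only do this on a network you trust.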