Intermediate
8 min read
by Alex Rivera • May 14, 2024
Linux is the native home of machine learning. Running Ollama on Ubuntu or Debian gives you low latency and first-class driver integration for NVIDIA and AMD GPUs.
Introduction
Ollama provides a one-line install script for Linux that not only downloads the binary but also configures a systemd background service. This means your local AI API will start automatically when you boot your server or desktop.
Prerequisites
Before installing Ollama, ensure your GPU drivers are correctly installed.
For NVIDIA GPUs:
Terminal
# Install proprietary NVIDIA drivers and CUDA toolkit
sudo apt update
sudo apt install nvidia-driver-535 nvidia-cuda-toolkit
nvidia-smi # Verify drivers are working
For AMD GPUs:
Ollama supports AMD graphics cards via the ROCm platform. Ensure you have the latest amdgpu drivers installed for your specific distribution.
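If your Radeon card is not on ROCm's officially supported list, a commonly used workaround is to override the GPU architecture that ROCm detects via an environment variable on the Ollama service (set through a systemd drop-in). A sketch, assuming an RDNA 2 card; the 10.3.0 value varies by GPU generation, so verify the gfx target for your specific card before applying it:

```ini
[Service]
# HSA_OVERRIDE_GFX_VERSION tells ROCm to treat the card as a supported
# gfx target; 10.3.0 corresponds to RDNA 2 (assumption: adjust per GPU)
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
```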
Step 1 Installation
The official installation script handles everything for you. Run this in your terminal:
Terminal
curl -fsSL https://ollama.com/install.sh | sh
During installation, the script will automatically detect your NVIDIA or AMD GPU and download the appropriate acceleration libraries.
Step 2 Managing the Service
Ollama runs as a daemon. You can manage it using standard systemd commands:
Terminal
# Check if Ollama is running
sudo systemctl status ollama
# Restart the service (useful after pulling large models or updating drivers)
sudo systemctl restart ollama
# View live server logs
journalctl -u ollama -f
Step 3 Pulling and Running Models
Once the service is active, you can pull your first model and drop into the chat interface. Let's use Meta's Llama 3:
Terminal
ollama run llama3
To exit the interactive prompt, type /bye or press Ctrl + D.
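Beyond the interactive CLI, the systemd service exposes a REST API on port 11434. A minimal Python sketch using only the standard library, assuming Llama 3 is already pulled and the service is running locally:

```python
import json
import urllib.request

def build_payload(prompt, model="llama3"):
    # Ollama's /api/generate expects a JSON body; stream=False returns
    # the whole completion in a single response object.
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

def generate(prompt, model="llama3", host="http://127.0.0.1:11434"):
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt, model).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama service):
# print(generate("Why is the sky blue?"))
```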
Hardware Limits
Because Linux has very low OS overhead, you can squeeze larger models into your VRAM compared to Windows.
| Your VRAM | Max Model Size | Recommended Models |
| --- | --- | --- |
| 8GB | ~8B parameters | Llama 3 (8B), Mistral (7B) |
| 16GB | ~14B parameters | Qwen 2.5 (14B), Command R |
| 24GB | ~30B parameters | Mixtral (8x7B) |
If you exceed your VRAM, Ollama will gracefully offload the remaining layers to your system RAM, though generation speed will drop significantly.
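As a rough rule of thumb behind the table above: a 4-bit quantized model needs about half a gigabyte of VRAM per billion parameters, plus headroom for the KV cache and runtime buffers. A back-of-the-envelope sketch; the overhead figure is an assumption, not an Ollama constant, and grows with context length:

```python
def estimate_vram_gb(params_billion, bits_per_weight=4, overhead_gb=1.5):
    # Weights: params * bits / 8 bytes. The overhead term is a rough
    # stand-in for the KV cache and runtime buffers (assumption).
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

# Llama 3 8B at 4-bit: ~5.5 GB, comfortably inside an 8GB card
print(estimate_vram_gb(8))
```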
Step 4 Network Access
By default, Ollama only listens on 127.0.0.1 (localhost). If you are running Linux on a headless server and want to reach the API from your MacBook or Windows PC, you need to bind it to all network interfaces.
Edit the systemd service:
Terminal
sudo systemctl edit ollama
Add the following lines:
Terminal
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Restart the service:
Terminal
sudo systemctl restart ollama
Your Linux AI server is now accessible from anywhere on your local network!
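To sanity-check the new binding from another machine, you can query the /api/tags endpoint, which lists the models installed on the server. A small Python sketch; the IP address is a placeholder for your server's actual LAN address:

```python
import json
import urllib.request

def api_url(base, path):
    # Join the server's base URL and an API path without doubling slashes
    return base.rstrip("/") + path

def list_models(base="http://192.168.1.50:11434"):  # placeholder LAN IP
    with urllib.request.urlopen(api_url(base, "/api/tags")) as resp:
        return [m["name"] for m in json.loads(resp.read())["models"]]

# Example (requires network access to the server):
# print(list_models())
```

If the request times out, check that your server's firewall (for example ufw) allows inbound traffic on port 11434.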