Running Ollama on Linux: The Complete Guide

Linux • Intermediate • 8 min read

by Alex Rivera • May 14, 2024

Linux is the natural home of machine learning. Running Ollama on Ubuntu or Debian typically gives you the lowest latency and the best driver integration for NVIDIA and AMD GPUs.

Step 1 Introduction

Ollama provides a one-line install script for Linux that not only downloads the binary but also automatically configures a systemd background service. This means your local AI API starts on its own whenever your server or desktop boots.

Step 2 Prerequisites

Before installing Ollama, make sure your GPU drivers are installed correctly.

For NVIDIA GPUs:

Terminal

# Install proprietary NVIDIA drivers and CUDA toolkit
sudo apt update
sudo apt install nvidia-driver-535 nvidia-cuda-toolkit
nvidia-smi  # Verify the driver is working (reboot first if it was just installed)
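To see how much VRAM you actually have to work with, nvidia-smi can print a compact per-GPU summary. The snippet below is a minimal sketch: the helper name gpu_summary is purely illustrative, and it assumes the driver from the previous command is installed.

```shell
# Print name, total VRAM and driver version for each NVIDIA GPU.
# gpu_summary is a hypothetical helper name used only in this example.
gpu_summary() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader \
      || echo "nvidia-smi is installed but could not talk to the driver"
  else
    echo "nvidia-smi not found: the NVIDIA driver is not installed or not on PATH"
  fi
}

gpu_summary
```

The memory.total value is the number to compare against the table in Step 6.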

For AMD GPUs: Ollama supports AMD graphics cards through the ROCm platform. Make sure the latest amdgpu drivers for your specific distribution are installed.
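A quick way to sanity-check an AMD setup is to look for the ROCm compute device node and, if the ROCm tools happen to be installed, list the GPU architectures they report. This is a rough sketch, not an official diagnostic; the helper name check_rocm is an assumption made for this example, and rocminfo is only queried when it is present.

```shell
# Report whether the ROCm compute interface and tools appear to be present.
# check_rocm is a hypothetical helper name used only in this example.
check_rocm() {
  if [ -e /dev/kfd ]; then
    echo "/dev/kfd present: the amdgpu compute interface is available"
  else
    echo "/dev/kfd missing: amdgpu driver not loaded or no supported GPU found"
  fi
  if command -v rocminfo >/dev/null 2>&1; then
    # List the distinct gfx architecture names reported by ROCm, if any.
    rocminfo 2>/dev/null | grep -i 'gfx' | sort -u || true
  fi
  return 0
}

check_rocm
```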

Step 3 Installation

The official installation script handles everything for you. Run it in your terminal:

Terminal

curl -fsSL https://ollama.com/install.sh | sh

During installation, the script automatically detects your NVIDIA or AMD GPU and downloads the appropriate acceleration libraries.
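Once the script finishes, it is worth confirming that the binary is on your PATH and that the systemd unit was registered. The following is a small sketch of such a check; verify_ollama_install is a made-up helper name, and the unit name ollama is the one used throughout this guide.

```shell
# Report the installed CLI version and the state of the systemd unit.
# verify_ollama_install is a hypothetical helper name used only in this example.
verify_ollama_install() {
  if command -v ollama >/dev/null 2>&1; then
    ollama --version || true
  else
    echo "ollama binary not found on PATH"
  fi
  if command -v systemctl >/dev/null 2>&1; then
    echo "enabled at boot: $(systemctl is-enabled ollama 2>/dev/null || true)"
    echo "current state:   $(systemctl is-active ollama 2>/dev/null || true)"
  fi
  return 0
}

verify_ollama_install
```

On a healthy install you would expect the last two lines to read "enabled" and "active".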

Step 4 Managing the Service

Ollama runs as a daemon. You can manage it with standard systemd commands:

Terminal

# Check if Ollama is running
sudo systemctl status ollama

# Restart the service (useful after updating drivers or changing its configuration)
sudo systemctl restart ollama

# View live server logs
journalctl -u ollama -f
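Besides asking systemd, you can ask the API itself whether it is alive. The sketch below assumes the default address 127.0.0.1:11434 and uses the /api/version endpoint; the OLLAMA_URL variable and the ollama_is_up helper are names invented for this example.

```shell
# Base URL of the local API; override it by exporting OLLAMA_URL (hypothetical variable).
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# Return 0 if the API answers within 3 seconds, non-zero otherwise.
ollama_is_up() {
  curl -fsS --max-time 3 "$OLLAMA_URL/api/version" >/dev/null 2>&1
}

if ollama_is_up; then
  echo "Ollama API is reachable at $OLLAMA_URL"
else
  echo "Ollama API is not reachable; most recent log lines:"
  journalctl -u ollama -n 20 --no-pager 2>/dev/null || true
fi
```

A check like this is handy in scripts that should wait for the service before sending requests.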

Step 5 Pulling and Running Models

Once the service is active, you can pull your first model and enter the chat interface. Let's use Meta's Llama 3:

Terminal

ollama run llama3

To exit the interactive prompt, type /bye or press Ctrl+D.
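The same model can also be used without the interactive prompt, which is how scripts and other applications usually talk to Ollama. The sketch below pulls the model, lists what is stored locally, and sends one prompt to the REST API on the default port 11434. The MODEL variable, the generate_once helper and the sample prompt are illustrative choices, not part of Ollama.

```shell
# Model to use; llama3 matches the example above.
MODEL="llama3"

# Send a single prompt to the local REST API and print the raw JSON reply.
# generate_once is a hypothetical helper name used only in this example.
generate_once() {
  curl -fsS http://127.0.0.1:11434/api/generate \
    -d "{\"model\": \"$MODEL\", \"prompt\": \"Why is the sky blue?\", \"stream\": false}"
}

if command -v ollama >/dev/null 2>&1; then
  # Download without opening the chat prompt, show local models, then query the API.
  ollama pull "$MODEL" && ollama list && generate_once \
    || echo "request failed: is the ollama service running?"
fi
```

Setting "stream" to false returns one JSON object instead of a stream of partial responses, which is easier to handle in a simple script.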

Step 6 Hardware Limits

Because Linux has very little OS overhead, you can often fit larger models into your VRAM than you could on Windows.

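Your VRAM	Maximum Model Size (4-bit quantized)	Recommended Models
8 GB	~8B parameters	Llama 3 (8B), Mistral (7B)
16 GB	~14B parameters	Qwen 2.5 (14B); Command R (about 35B) only with partial offload to RAM
24 GB	~30B parameters	Mixtral (8x7B, about 47B total parameters) with partial offload to RAM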

If a model exceeds your VRAM, Ollama gracefully offloads the remaining layers to system RAM, although generation speed drops noticeably.
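The sizes in the table can be approximated with simple arithmetic: the weights take roughly (parameters in billions) x (bits per weight) / 8 gigabytes. The sketch below is a back-of-the-envelope estimate under that assumption only. It ignores the KV cache and runtime overhead, and real quantized files use slightly more than their nominal bit width, so treat the results as a lower bound. The helper name estimate_weights_gb is made up for this example.

```shell
# Rough size of the model weights in GB: parameters (billions) x bits per weight / 8.
# estimate_weights_gb is a hypothetical helper; it ignores KV cache and runtime overhead.
estimate_weights_gb() {
  LC_ALL=C awk -v params_b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

for size in 8 14 30; do
  echo "${size}B at 4-bit: about $(estimate_weights_gb "$size" 4) GB of weights"
done
# 8B -> 4.0 GB, 14B -> 7.0 GB, 30B -> 15.0 GB

# With a model loaded, the PROCESSOR column shows the actual CPU/GPU split.
if command -v ollama >/dev/null 2>&1; then
  ollama ps || true
fi
```

If ollama ps reports anything other than 100% GPU, part of the model is running from system RAM and you can expect slower generation.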

Step 7 Network Access

By default, Ollama listens only on 127.0.0.1 (localhost). If you are running Linux on a headless server and want to reach the API from your MacBook or Windows PC, you need to bind it to an address that is reachable from your local network.

Edit the systemd service:

Terminal

sudo systemctl edit ollama

In the editor that opens, add the following lines (0.0.0.0 makes Ollama listen on all network interfaces):

Terminal

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Restart the service:

Terminal

sudo systemctl restart ollama

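Your Linux AI server is now accessible from anywhere on your local network on port 11434. Ollama's API has no built-in authentication, so expose it only on networks you trust.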
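To confirm the change took effect, you can check on the server that the port is bound, then list the available models from another machine through the /api/tags endpoint. This is a sketch under assumptions: the address 192.168.1.50 is a placeholder for your server's real IP, and the names check_listen, list_remote_models and SERVER_IP are invented for this example.

```shell
# On the server: show the environment systemd passes to Ollama and what is bound to port 11434.
# check_listen is a hypothetical helper name used only in this example.
check_listen() {
  if command -v systemctl >/dev/null 2>&1; then
    systemctl show ollama --property=Environment 2>/dev/null || true
  fi
  if command -v ss >/dev/null 2>&1; then
    ss -tln | grep ':11434' || echo "nothing is listening on port 11434"
  fi
  return 0
}

# On the client: list the models the server offers.
# 192.168.1.50 is a placeholder; replace it with your server's address.
SERVER_IP="${SERVER_IP:-192.168.1.50}"
list_remote_models() {
  curl -fsS --max-time 5 "http://$SERVER_IP:11434/api/tags"
}

check_listen
```

Run check_listen on the server and list_remote_models on the client. A listening address of 0.0.0.0:11434 or *:11434 (rather than 127.0.0.1:11434) indicates the override is active.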
