Windows
Intermediate
8 min read
by Alex Rivera • May 14, 2024
If you hate dealing with the terminal, Python environments, and broken dependencies, LM Studio is your sanctuary. It wraps llama.cpp inside a gorgeous, native Windows app that lets you download and chat with LLMs in one click.
Introduction
LM Studio is a free desktop application for Windows (it also runs on macOS and Linux). It provides a clean, ChatGPT-like interface but runs 100% locally on your hardware. It handles downloading models, configuring settings, and even spinning up a local API server, all without touching a single line of code.
Step 1 Why LM Studio?
- Visual Model Browser: Search and download HuggingFace models directly inside the app.
- Hardware Auto-Detect: It automatically configures CUDA (NVIDIA) or ROCm (AMD) GPU acceleration.
- RAM Estimator: It estimates how much RAM/VRAM a model will need before you download it.
Step 2 Installation
- Go to lmstudio.ai.
- Click Download for Windows.
- Run the downloaded .exe file to install it.
Step 3 Enabling GPU Acceleration
To get maximum speed, we need to ensure it uses your NVIDIA or AMD graphics card instead of the slower CPU.
- Open LM Studio.
- Go to the Settings tab (gear icon).
- Scroll down to Hardware Settings.
- Check the box for GPU Offload and push the slider to the maximum (99 layers offloads the entire model).
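Under the hood, that slider corresponds to llama.cpp's n_gpu_layers setting: the model's transformer layers are split between your GPU's VRAM and system RAM, and every layer that fits on the GPU runs much faster. As a rough back-of-envelope sketch of the trade-off (the numbers below are illustrative assumptions, not LM Studio's actual estimator):

```python
# Rough sketch: how many transformer layers fit in a given VRAM budget.
# All figures are illustrative assumptions, not values read from LM Studio.

def layers_that_fit(vram_gb: float, n_layers: int, model_size_gb: float,
                    overhead_gb: float = 1.0) -> int:
    """Estimate offloadable layers, assuming the weight file is split
    evenly across layers and `overhead_gb` is reserved for the KV cache
    and scratch buffers."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# Example: a ~4.1 GB Q4_K_M Mistral 7B file with 32 layers.
print(layers_that_fit(8.0, 32, 4.1))  # 8 GB GPU -> 32 (the whole model fits)
print(layers_that_fit(4.0, 32, 4.1))  # 4 GB GPU -> 23 (partial offload)
```

If only part of the model fits, LM Studio still works; the remaining layers simply run on the CPU, which is why maxing the slider is safe to try first.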
Step 4 Downloading Models
- Click the Magnifying Glass (Search) icon in the left sidebar.
- Type a model name like Mistral 7B Instruct or Llama 3 8B.
- Look at the results. LM Studio highlights models that fit in your PC's memory in green.
- Choose a Q4_K_M or Q5_K_M quantization (the best balance of speed and intelligence).
- Click Download.
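Why Q4_K_M? Each quantization level stores the weights at a different number of bits, which directly determines the download size and memory footprint. Here is a hedged rule-of-thumb calculation, using approximate bits-per-weight figures rather than exact GGUF internals:

```python
# Back-of-envelope file-size estimate for common GGUF quantizations.
# Bits-per-weight values are approximations; real sizes vary per model.

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimated weight-file size in GiB for a given parameter count."""
    bits = n_params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return bits / 8 / 1024**3

for q in ("Q4_K_M", "Q5_K_M"):
    # Roughly 4.0 GB at Q4_K_M and 4.6 GB at Q5_K_M for a 7B model.
    print(f"Mistral 7B @ {q}: ~{approx_size_gb(7.0, q):.1f} GB")
```

This is why a 7B model at Q4_K_M comfortably fits on an 8 GB GPU, while the same model at F16 (about 14 GB) would not.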
Step 5 Local API Server
LM Studio can act as a drop-in replacement for the OpenAI API.
- Click the Local Server icon (<->) in the left sidebar.
- Select your downloaded model from the top dropdown.
- Click Start Server.
Your local AI is now listening on http://localhost:1234/v1. You can plug this URL into VS Code extensions, Python scripts, or any app that expects an OpenAI endpoint!
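For example, here is a minimal, dependency-free Python sketch that talks to that endpoint using the OpenAI chat-completions schema. The model name and prompt are placeholders; LM Studio responds with whichever model you loaded, and no real API key is required:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for one chat turn (OpenAI-style schema)."""
    payload = {
        "model": "local-model",  # placeholder: LM Studio uses the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return f"{BASE_URL}/chat/completions", payload

def chat(prompt: str) -> str:
    """Send the request; needs the server from Step 5 to be running."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With the server running:
# print(chat("Explain GGUF quantization in one sentence."))
```

Because the request shape matches OpenAI's API, you can also point the official openai client library at the same URL by setting its base_url.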