Local LLM on Proxmox (AMD ROCm)

Part 1 — Prepare the Proxmox Host

The amdgpu kernel driver lives on the Proxmox host. The LXC container gets raw access to the device files via bind mounts. Before creating the container, verify the GPU is properly recognized and note the device major numbers you'll need for the cgroup rules.

Verify GPU and KFD Device

SSH into your Proxmox host:

# Confirm amdgpu driver is bound
lspci -k | grep -A 2 "VGA"
# Expected: Kernel driver in use: amdgpu

# Confirm KFD node exists — required for ROCm compute
ls -la /dev/kfd
# Expected: crw-rw---- 1 root render 510, 0 ...

# If /dev/kfd is missing:
modprobe amdgpu
dmesg | grep -i kfd

Check Kernel Version Compatibility

ROCm 6.x requires kernel 5.15 or newer:

uname -r
# Proxmox 8.x ships 6.x kernels, so you're fine
# Proxmox 7.2 and later default to 5.15, which also meets the minimum
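To turn this into a pass/fail check instead of eyeballing the version, here is a small sketch that compares version strings with GNU sort -V (available on Proxmox's Debian base). version_ge is a hypothetical helper, and how sort -V orders suffixes like -pve is an assumption worth sanity-checking against your own uname -r output:

```shell
# version_ge A B: succeed if version A >= version B (hypothetical helper).
# Relies on GNU sort -V, which orders dotted version strings numerically.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on the Proxmox host:
#   version_ge "$(uname -r)" 5.15 && echo "kernel OK" || echo "kernel too old for ROCm 6.x"
```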

Identify Device Major Numbers

You need the major numbers for your DRI and KFD devices for the cgroup rules:

ls -la /dev/dri/
# crw-rw---- 1 root video 226, 0 ... card0
# DRI major = 226

ls -la /dev/kfd
# crw-rw---- 1 root render 510, 0 ...
# KFD major — commonly 510, but verify yours

⚠️

Write both major numbers down; you'll need them in the LXC config. The KFD major is assigned dynamically when the driver loads, so 510 is common but not guaranteed. A wrong number doesn't produce an error; ROCm just can't open the device.
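Rather than reading the major number off the ls output by eye, you can ask stat for it. A minimal sketch, assuming GNU coreutils stat (its %t format prints a device's major number in hex); dev_major is a hypothetical helper name:

```shell
# dev_major PATH: print the decimal major number of a character/block device.
dev_major() {
  if [ ! -c "$1" ] && [ ! -b "$1" ]; then
    echo "dev_major: $1 is not a device node" >&2
    return 1
  fi
  # stat %t gives the major number in hex; printf converts 0x-prefixed hex to decimal.
  printf '%d\n' "0x$(stat -c '%t' "$1")"
}
```

On the host, dev_major /dev/kfd prints the number to use in place of 510, and dev_major /dev/dri/renderD128 should print 226.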

Part 2 — Create the LXC Container

This guide uses a privileged container, the simplest way to pass the GPU device files through; unprivileged containers would need extra UID/GID mapping that isn't covered here. You'll also need to edit the LXC config file before first boot to add the cgroup device rules and bind mounts.

Download the Ubuntu 22.04 Template

pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst

Or via the Proxmox web UI: your storage → CT Templates → Templates → download ubuntu-22.04-standard.

Create the LXC

In the Proxmox web UI, click Create CT and use these settings:

Field	Value	Notes
Hostname	`llm-server`	Or whatever you prefer
Template	ubuntu-22.04-standard	—
Disk size	60 GB minimum	Models are large — a 14B Q4 is ~9GB
CPU cores	8+	—
RAM	16384 MB (16 GB)	ROCm userspace is memory-hungry
Swap	4096 MB	—
Network	DHCP or static IP	Set a static IP if you plan to bookmark the WebUI URL
Unprivileged container	UNCHECK THIS	Checked by default; unchecking it makes the container privileged, which this guide's GPU passthrough relies on

Edit the LXC Config Before First Boot

On the Proxmox host, edit the config (replace <CTID> with your container ID, e.g. 100):

nano /etc/pve/lxc/<CTID>.conf

Add these lines at the bottom:

# GPU passthrough: DRI devices (card* = display, renderD* = render)
# Adjust the minors to match the nodes in /dev/dri on your host
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm

# KFD device (compute — required for ROCm/OpenCL)
lxc.cgroup2.devices.allow: c 510:0 rwm

# Bind mount the devices into the container
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file

⚠️

Replace 510 with the actual KFD major number you found in Part 1. A wrong number doesn't cause an error at boot: /dev/kfd still appears inside the container, but opening it is denied, so ROCm won't see the GPU.
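To avoid typos in the cgroup rules, you can generate them from the device nodes that actually exist on the host. This is a sketch assuming GNU coreutils stat (%t and %T print the major and minor numbers in hex); cgroup_allow_lines is a hypothetical helper, and you should still review its output before pasting it into the config:

```shell
# cgroup_allow_lines DEV...: print one lxc.cgroup2.devices.allow rule per
# character device, using its real major:minor numbers.
cgroup_allow_lines() {
  local dev
  for dev in "$@"; do
    if [ ! -c "$dev" ]; then
      echo "skipping $dev: not a character device" >&2
      continue
    fi
    # %t / %T are the major / minor numbers in hex; printf converts them to decimal.
    printf 'lxc.cgroup2.devices.allow: c %d:%d rwm\n' \
      "0x$(stat -c '%t' "$dev")" "0x$(stat -c '%T' "$dev")"
  done
}
```

On the Proxmox host, cgroup_allow_lines /dev/dri/* /dev/kfd prints the rules for every DRI node plus KFD; the by-path directory inside /dev/dri is skipped.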

Start the Container and Verify GPU Visibility

pct start <CTID>
pct enter <CTID>

# Inside the container:
ls -la /dev/dri/
# Should show card0, card1, renderD128, renderD129

ls -la /dev/kfd
# Must exist. If it's missing, the bind mount failed: confirm /dev/kfd exists on the host

If /dev/kfd is missing, or present but can't be opened (a wrong cgroup major number), stop here and fix the LXC config before continuing.
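A missing node and a node that exists but can't be opened point to different mistakes (the bind mount versus the cgroup rule), and ls alone won't reveal the second one. A sketch of a check that tries to open each device read-write, roughly how ROCm uses /dev/kfd; check_dev_node is a hypothetical helper:

```shell
# check_dev_node DEV: report whether DEV is a character device this process can open read-write.
check_dev_node() {
  if [ ! -e "$1" ]; then
    echo "$1: missing (check the lxc.mount.entry line)" >&2
    return 1
  fi
  if [ ! -c "$1" ]; then
    echo "$1: not a character device" >&2
    return 1
  fi
  # Open on fd 3 in a subshell so the descriptor closes again immediately.
  # (<> would create a regular file for a missing path; the checks above rule that out.)
  if ! ( exec 3<>"$1" ) 2>/dev/null; then
    echo "$1: present but cannot be opened (check the cgroup rule, or group membership if not root)" >&2
    return 1
  fi
  echo "$1: OK"
}
```

Inside the container: check_dev_node /dev/kfd && check_dev_node /dev/dri/renderD128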

Part 3 — Install ROCm Inside the Container

All commands from here run inside the LXC container. We're installing only the ROCm userspace libraries — the kernel driver already lives on the Proxmox host.

Install Prerequisites

apt update && apt upgrade -y
apt install -y wget gnupg2 curl git build-essential \
  python3-pip libnuma-dev libpci-dev \
  ca-certificates software-properties-common

Add the ROCm Repository

# This guide pins ROCm 6.1.3; check repo.radeon.com for newer releases
wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
dpkg -i amdgpu-install_6.1.60103-1_all.deb
apt update
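If you move to a different ROCm release, the installer URL changes in two places: the 6.1.3 URL above encodes the version as the directory name and again as 6.1.60103 in the package name (major, then two-digit minor and patch). A sketch that rebuilds the URL from a version string, assuming that naming pattern and the -1 package revision hold for the release you pick; amdgpu_install_url is hypothetical, so confirm the file exists on repo.radeon.com before installing:

```shell
# amdgpu_install_url VERSION [CODENAME]: build the amdgpu-install .deb URL,
# assuming the naming pattern of the 6.1.3 URL (6.1.3 -> 6.1.60103-1).
amdgpu_install_url() {
  local ver="$1" codename="${2:-jammy}"
  local major="${ver%%.*}"
  local rest="${ver#*.}"
  local minor="${rest%%.*}"
  local patch="${rest#*.}"
  # A version like "6.2" has no patch component; treat it as 6.2.0.
  [ "$patch" = "$rest" ] && patch=0
  printf 'https://repo.radeon.com/amdgpu-install/%s/ubuntu/%s/amdgpu-install_%s.%s.%d%02d%02d-1_all.deb\n' \
    "$ver" "$codename" "$major" "$minor" "$major" "$minor" "$patch"
}
```

For example: wget "$(amdgpu_install_url 6.1.3 jammy)"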

Install ROCm (Userspace Only, No Kernel Modules)

The --no-dkms flag is critical: the kernel driver already lives on the Proxmox host, and a container can't load kernel modules anyway:

amdgpu-install --usecase=rocm --no-dkms -y

This installs the ROCm runtime, HIP compiler toolchain, HSA runtime, and rocminfo/rocm-smi tools. Expect ~2–3GB download and several minutes.

Add Users to the Correct Groups

usermod -aG render,video root

Set the GFX Version Override

This guide targets the RX 6700 XT (Navi 22, gfx1031). It is not on ROCm's official supported GPU list, but it works when ROCm is told to treat it as the supported gfx1030 target (version 10.3.0). Without this override, ROCm won't use the GPU:

# /etc/environment takes plain KEY=value lines; profile.d scripts need export
echo 'HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/environment
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/profile.d/rocm.sh

# Apply immediately for this session:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
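Where does 10.3.0 come from? HSA_OVERRIDE_GFX_VERSION takes the x.y.z form of a gfx target. As these IDs are commonly decoded, the last two characters are the minor version and stepping (one hex digit each) and everything before them is the major version, so gfx1031 is 10.3.1 and the supported gfx1030 is 10.3.0. A small sketch of that conversion; gfx_to_hsa_version is a hypothetical helper, and the decoding rule is an assumption rather than an official API:

```shell
# gfx_to_hsa_version GFXID: convert a gfx target ID (e.g. gfx1031) to the
# x.y.z form used by HSA_OVERRIDE_GFX_VERSION (e.g. 10.3.1).
# Assumed layout: major = all but the last two chars, then one hex digit each
# for minor and stepping (so gfx90a -> 9.0.10).
gfx_to_hsa_version() {
  local id="${1#gfx}"
  local step="${id#"${id%?}"}"       # last character
  local rest="${id%?}"
  local minor="${rest#"${rest%?}"}"  # second-to-last character
  local major="${rest%?}"
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$step"
}
```

For the RX 6700 XT, the override swaps its own 10.3.1 for the supported 10.3.0.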

Verify ROCm Sees the GPU

source /etc/environment
rocminfo

Look for a GPU agent with Name: gfx1031 (it may be listed as gfx1030 while the override is active) and Compute Unit: 40. Also try:

rocm-smi
# Should show GPU temp, VRAM usage, utilization

If rocminfo hangs or shows no agents: check that /dev/kfd is present and the cgroup major number is correct.
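rocminfo prints a lot of output. A sketch that extracts just the gfx target names, so a script can tell whether any GPU agent was found; rocm_gpu_targets is a hypothetical helper, and the pattern assumes target names appear as gfx followed by three or four hex digits, as in gfx1031:

```shell
# rocm_gpu_targets: read rocminfo output on stdin and print each distinct gfx target once.
rocm_gpu_targets() {
  grep -o 'gfx[0-9a-f]\{3,4\}' | sort -u
}
```

rocminfo | rocm_gpu_targets should print a single line such as gfx1031 (possibly gfx1030 while the override is active); empty output means ROCm found no GPU agent.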

Part 4 — Install Ollama

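Ollama runs as a systemd service, and systemd services don't read /etc/environment or /etc/profile.d. The key configuration is therefore injecting HSA_OVERRIDE_GFX_VERSION and OLLAMA_HOST=0.0.0.0 into the service itself, so the GPU override applies on every start (including after reboots) and the API is reachable from Open WebUI in Docker.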

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

This installs the ollama binary to /usr/local/bin and creates a systemd service.

Configure the Systemd Service Override

mkdir -p /etc/systemd/system/ollama.service.d

cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0"
EOF

HSA_OVERRIDE_GFX_VERSION — without this, ROCm ignores the GPU entirely
OLLAMA_HOST=0.0.0.0 — makes Ollama listen on all interfaces so Open WebUI (in Docker) can reach it

systemctl daemon-reload
systemctl restart ollama
systemctl enable ollama
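To confirm the override actually reached the service without digging through logs, you can inspect the unit's effective environment: systemctl show -p Environment prints a single Environment= line with space-separated KEY=value pairs. The sketch below checks for one exact pair. unit_env_has is a hypothetical helper, and it assumes no values contain spaces (systemctl quotes those, which this simple split doesn't handle):

```shell
# unit_env_has KEY=VALUE: read `systemctl show -p Environment <unit>` output on stdin
# and succeed if it contains exactly that KEY=VALUE pair.
unit_env_has() {
  sed 's/^Environment=//' | tr ' ' '\n' | grep -Fqx -- "$1"
}
```

For example: systemctl show -p Environment ollama | unit_env_has HSA_OVERRIDE_GFX_VERSION=10.3.0 && echo "override applied"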

Verify Ollama is Running with GPU

systemctl status ollama
journalctl -u ollama -n 50
# Look for lines mentioning rocm and gfx1031 (exact wording varies by Ollama version)

If the logs show Ollama falling back to CPU, the HSA override may not have applied; double-check the systemd override file.

Pull Your Model

ollama pull qwen2.5:14b

Downloads ~9GB. Other models worth having:

ollama pull qwen2.5-coder:14b   # Coding-focused variant
ollama pull phi4:14b             # Microsoft's reasoning model
ollama pull gemma3:12b           # Google's 12B, fits comfortably

Test a Quick Inference

ollama run qwen2.5:14b "Explain transformer attention in 3 sentences."

While it runs, check GPU utilization in a second terminal:

rocm-smi
# VRAM should jump to ~9–10GB during generation
# GPU utilization should be 80–100%

If VRAM stays at 0, Ollama is running on CPU. Go back to Part 3 and fix the ROCm setup before continuing.

Part 5 — Install Docker and Open WebUI

Open WebUI gives you a ChatGPT-like interface for your local models. It runs in Docker inside the same LXC container as Ollama.

Install Docker

Don't use Ubuntu's docker.io package; it lags behind upstream releases. Use Docker's official repo:

install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null

apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Verify:
docker run hello-world

Deploy Open WebUI

docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://172.17.0.1:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

💡

Why 172.17.0.1? That's Docker's default bridge gateway, the address a container uses to reach the host (where Ollama is listening). On Linux, host.docker.internal only resolves if you add --add-host=host.docker.internal:host-gateway, so the gateway IP is simpler here.

Verify:

docker ps
# Should show open-webui running, port 3000->8080

Access Open WebUI

Find the LXC's IP:

ip addr show eth0 | grep "inet "

Open in your browser: http://<LXC-IP>:3000

Create an admin account (local only, no cloud sign-in)
Go to Settings → Connections and confirm the Ollama URL matches OLLAMA_BASE_URL (http://172.17.0.1:11434)
Your pulled models should auto-appear in the model dropdown

Part 6 — Tuning and Quality of Life

Model Parameters for Best Results

In Open WebUI → Admin → Models, select your model and set a system prompt:

You are a helpful, harmless, and honest AI assistant. Be thorough but concise.
Ask clarifying questions when the request is ambiguous.
Acknowledge uncertainty rather than guessing. Use markdown formatting where helpful.

Recommended generation parameters:

Parameter	Value	Why
Temperature	0.7	Balanced — not too creative, not too rigid
Top-P	0.9	Prevents low-probability rambling
Top-K	40	Good default for 14B models
Context Length	8192	Qwen2.5 supports up to 128K but 8K is VRAM-safe
Repeat Penalty	1.1	Prevents looping

Increase Context Window Safely

12GB of VRAM can handle more context than the default 2048 tokens:

ollama show qwen2.5:14b --modelfile > qwen_custom.modelfile
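# Edit the file and add or change these lines:
# PARAMETER num_ctx 8192   (context window, in tokens)
# PARAMETER num_gpu 99     (layers to offload; 99 covers every layer, i.e. full GPU offload)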

ollama create qwen2.5-14b-custom -f qwen_custom.modelfile
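Instead of editing the exported Modelfile, you can write a minimal one that builds on the pulled model by name: a Modelfile whose FROM line names an existing model inherits its weights, template, and parameters, and PARAMETER lines override individual settings. A sketch; make_modelfile is a hypothetical helper whose defaults mirror the values above:

```shell
# make_modelfile BASE [NUM_CTX] [NUM_GPU]: print a minimal Modelfile that layers
# context and GPU-offload settings on top of an existing Ollama model.
make_modelfile() {
  printf 'FROM %s\nPARAMETER num_ctx %s\nPARAMETER num_gpu %s\n' \
    "$1" "${2:-8192}" "${3:-99}"
}
```

Usage: make_modelfile qwen2.5:14b 8192 > qwen_custom.modelfile, then run the same ollama create command.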

⚠️

VRAM limits at Q4_K_M quantization: 4096 ctx ~9.5GB ✅ · 8192 ctx ~11GB ✅ (tight) · 16384 ctx → OOM ❌
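Most of the growth between those figures is the KV cache, which scales linearly with context: 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per element. A sketch of that arithmetic, assuming Qwen2.5-14B's published configuration (48 layers, 8 KV heads, head dimension 128) and an f16 cache (2 bytes per element); kv_cache_mib is a hypothetical helper, and the result excludes model weights and compute buffers:

```shell
# kv_cache_mib LAYERS KV_HEADS HEAD_DIM CTX [BYTES_PER_ELEM]: estimated KV cache size in MiB.
# Factor 2 = one key tensor plus one value tensor per layer; default element size is f16 (2 bytes).
kv_cache_mib() {
  echo $(( 2 * $1 * $2 * $3 * $4 * ${5:-2} / 1048576 ))
}
```

With those assumed values, kv_cache_mib 48 8 128 4096 gives 768 MiB, 8192 gives 1536 MiB, and 16384 gives 3072 MiB. Added to roughly 9 GB of Q4_K_M weights, that is roughly consistent with the limits above: 8K fits on a 12 GB card with little headroom, while 16K does not.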

Enable Ollama API Access from Your LAN

The service is already listening on 0.0.0.0:11434 from the override in Part 4. Test from another machine on your network:

curl http://<LXC-IP>:11434/api/generate \
  -d '{"model":"qwen2.5:14b","prompt":"Hello!","stream":false}'

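If it fails, check whether the Proxmox firewall is enabled for the container (its Firewall panel in the web UI) and, on the Proxmox host, look for rules matching the port with iptables -L -n | grep 11434.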

Part 7 — Troubleshooting

ROCm Doesn't See the GPU

# Check HSA is actually set:
env | grep HSA

# Re-run with explicit override:
HSA_OVERRIDE_GFX_VERSION=10.3.0 rocminfo

# Check device permissions:
ls -la /dev/kfd /dev/dri/*
# Your user must be in render and video groups

# After adding to groups, re-login or use newgrp:
newgrp render

Ollama Runs on CPU Only

journalctl -u ollama -n 100 --no-pager | grep -i "error\|warn\|cpu\|gpu\|rocm"

# Verify the override conf is being applied:
systemctl show ollama | grep HSA
# Should show a line like: Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0 OLLAMA_HOST=0.0.0.0

Open WebUI Can't Connect to Ollama

# Test connectivity from the LXC shell (where Ollama runs):
curl http://localhost:11434/api/tags

# Test from inside the Open WebUI Docker container:
docker exec -it open-webui curl http://172.17.0.1:11434/api/tags

# If that fails, check Ollama is binding to 0.0.0.0:
ss -tlnp | grep 11434
# Should show 0.0.0.0:11434 or *:11434 (all interfaces), not 127.0.0.1:11434
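Depending on how Ollama opens its socket, ss may print the wildcard address as 0.0.0.0:11434, *:11434, or [::]:11434; all of these mean every interface, while 127.0.0.1:11434 means localhost only. A sketch that checks ss output for any wildcard form; listens_on_all is a hypothetical helper and assumes ss's default column layout (local address in the fourth column):

```shell
# listens_on_all PORT: read `ss -tln` output on stdin and succeed if PORT is bound
# to a wildcard address (all interfaces) rather than a single one like 127.0.0.1.
listens_on_all() {
  awk -v p="$1" '
    $4 == ("0.0.0.0:" p) || $4 == ("*:" p) || $4 == ("[::]:" p) { found = 1 }
    END { exit !found }
  '
}
```

For example: ss -tln | listens_on_all 11434 && echo "reachable from Docker and the LAN"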

Expected Performance & Quick Reference

On RX 6700 XT with Qwen2.5 14B (Q4_K_M quantization):

Metric	Expected
Prompt processing (prefill)	~1500–2500 tokens/sec
Generation speed	~15–25 tokens/sec
VRAM usage (idle)	~8.5 GB
VRAM usage (during gen)	~10–11 GB
Time to first token	~1–3 seconds
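To compare your own generation speed with this table: a non-streamed /api/generate response (like the curl test in Part 6) includes eval_count (tokens generated) and eval_duration (in nanoseconds). A sketch of the conversion; tokens_per_sec is a hypothetical helper:

```shell
# tokens_per_sec EVAL_COUNT EVAL_DURATION_NS: generation speed in tokens/second.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}
```

Alternatively, ollama run --verbose prints timing statistics, including an eval rate, after each response.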

All Service Commands

# Ollama
systemctl status ollama
systemctl restart ollama
journalctl -u ollama -f          # Live logs

# Models
ollama list                       # Installed models
ollama pull <model>               # Download a model
ollama rm <model>                 # Remove a model
ollama run <model>                # Interactive chat in terminal

# Docker / Open WebUI
docker ps                         # Check container status
docker restart open-webui         # Restart the UI
docker logs -f open-webui         # Live logs

# GPU monitoring
rocm-smi                          # GPU stats (temp, VRAM, utilization)
watch -n 1 rocm-smi               # Auto-refresh every second