AMD ROCm · Ollama · Open WebUI in a Privileged LXC
The amdgpu kernel driver lives on the Proxmox host. The LXC container gets raw access to the device files via bind mounts. Before creating the container, verify the GPU is properly recognized and note the device major numbers you'll need for the cgroup rules.
# Confirm amdgpu driver is bound
lspci -k | grep -A 2 "VGA"
# Expected: Kernel driver in use: amdgpu
# Confirm KFD node exists — required for ROCm compute
ls -la /dev/kfd
# Expected: crw-rw---- 1 root render 510, 0 ...
# If /dev/kfd is missing:
modprobe amdgpu
dmesg | grep -i kfd
uname -r
# Proxmox 8.x ships with 6.x kernels — you're fine
# Proxmox 7.x may be on 5.15 — also fine
ls -la /dev/dri/
# crw-rw---- 1 root video 226, 0 ... card0
# DRI major = 226
ls -la /dev/kfd
# crw-rw---- 1 root render 510, 0 ...
# KFD major — commonly 510, but verify yours
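Reading major numbers off `ls -la` output by eye is easy to get wrong. The sketch below prints them directly and formats the matching cgroup line. It assumes GNU `stat` is available on the Proxmox host; the function names `dev_numbers` and `cgroup_rule` are made up for this guide and are not part of any tool.

```shell
# dev_numbers DEVICE: print "major:minor" in decimal for a character device.
# stat reports both numbers in hex (%t and %T), so printf converts them.
dev_numbers() {
    dev=$1
    [ -c "$dev" ] || { echo "not a character device: $dev" >&2; return 1; }
    printf '%d:%d\n' "0x$(stat -c '%t' "$dev")" "0x$(stat -c '%T' "$dev")"
}

# cgroup_rule DEVICE: print the LXC allow line that matches that device.
cgroup_rule() {
    nums=$(dev_numbers "$1") || return 1
    printf 'lxc.cgroup2.devices.allow: c %s rwm\n' "$nums"
}
```

On the host you could then run `for d in /dev/dri/card* /dev/dri/renderD* /dev/kfd; do cgroup_rule "$d"; done` and compare the output with the lines you add to the container config later.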
The container must be privileged — unprivileged containers cannot pass through GPU devices. You'll also need to edit the LXC config file before first boot to add the cgroup device rules and bind mounts.
pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
Or via the Proxmox web UI: your storage → CT Templates → Templates → download ubuntu-22.04-standard.
| Field | Value | Notes |
|---|---|---|
| Hostname | llm-server | Or whatever you prefer |
| Template | ubuntu-22.04-standard | — |
| Disk size | 60 GB minimum | Models are large — a 14B Q4 is ~9GB |
| CPU cores | 8+ | — |
| RAM | 16384 MB (16 GB) | ROCm userspace is memory-hungry |
| Swap | 4096 MB | — |
| Network | DHCP or static IP | Set a static IP if you plan to bookmark the WebUI URL |
| Privileged | ✅ CHECK THIS | Required — unprivileged containers cannot pass through GPU devices |
Edit the container's config file on the Proxmox host (replace <CTID> with your container ID, e.g. 100):
nano /etc/pve/lxc/<CTID>.conf
Add these lines at the bottom:
# GPU passthrough — DRI devices (display/render)
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm
# KFD device (compute — required for ROCm/OpenCL)
lxc.cgroup2.devices.allow: c 510:0 rwm
# Bind mount the devices into the container
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
Replace 510 with the actual KFD major number you found in Part 1. If you use the wrong number, this line silently fails and ROCm won't see the GPU.
pct start <CTID>
pct enter <CTID>
# Inside the container:
ls -la /dev/dri/
# Should show card0, card1, renderD128, renderD129
ls -la /dev/kfd
# Must exist — if missing, recheck cgroup major number
If /dev/kfd is missing, stop here and recheck the cgroup rule before continuing.
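Rechecking the cgroup rule means comparing the major and minor numbers in the container config against what the host reports. A minimal sketch of that comparison, run on the Proxmox host; `conf_allows_device` is a hypothetical helper name, and the pattern assumes the rule is written in the same form as the config lines in this guide.

```shell
# conf_allows_device CONF MAJOR MINOR: succeed if the LXC config file CONF
# contains a cgroup2 allow rule for character device MAJOR:MINOR.
conf_allows_device() {
    grep -Eq "^lxc\.cgroup2\.devices\.allow:[[:space:]]*c[[:space:]]+$2:$3[[:space:]]+rwm" "$1"
}
```

For example, `conf_allows_device /etc/pve/lxc/100.conf 510 0 || echo "no matching KFD rule"` flags a config whose KFD rule does not match major 510, minor 0. Substitute your own container ID and the numbers shown by `ls -la /dev/kfd`.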
All commands from here run inside the LXC container. We're installing only the ROCm userspace libraries — the kernel driver already lives on the Proxmox host.
apt update && apt upgrade -y
apt install -y wget gnupg2 curl git build-essential \
python3-pip libnuma-dev libpci-dev \
ca-certificates software-properties-common
# As of 2025, ROCm 6.1.3 is stable — check repo.radeon.com for latest
wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
dpkg -i amdgpu-install_6.1.60103-1_all.deb
apt update
The --no-dkms flag is critical because the kernel driver is already on the Proxmox host:
amdgpu-install --usecase=rocm --no-dkms -y
This installs the ROCm runtime, HIP compiler toolchain, HSA runtime, and rocminfo/rocm-smi tools. Expect ~2–3GB download and several minutes.
usermod -aG render,video root
# /etc/environment takes plain KEY=VALUE lines; the profile.d script needs export
echo 'HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/environment
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/profile.d/rocm.sh
# Apply immediately for this session:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
source /etc/environment
rocminfo
Look for: Name: gfx1031 and Compute Unit: 40. Also try:
rocm-smi
# Should show GPU temp, VRAM usage, utilization
If rocminfo hangs or shows no agents: check that /dev/kfd is present and the cgroup major number is correct.
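rocminfo prints a lot of text, so it can help to reduce it to just the GPU targets it found. The filter below is a sketch that assumes target names appear as `gfx` followed by hex digits somewhere in the output; `gfx_targets` is a hypothetical helper, not a ROCm command.

```shell
# gfx_targets: read rocminfo output on stdin and print each distinct
# gfx target name it mentions, one per line. Prints nothing if none appear.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}
```

Run it as `rocminfo | gfx_targets`. Empty output means no GPU agent was listed, which points back at /dev/kfd or the cgroup rule.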
Ollama runs as a systemd service. The key configuration is injecting HSA_OVERRIDE_GFX_VERSION and OLLAMA_HOST=0.0.0.0 into the service so the GPU override survives reboots and the API is accessible to Open WebUI in Docker.
curl -fsSL https://ollama.com/install.sh | sh
This installs the ollama binary to /usr/local/bin and creates a systemd service.
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0"
EOF
HSA_OVERRIDE_GFX_VERSION: without this, ROCm ignores the GPU entirely.
OLLAMA_HOST=0.0.0.0: makes Ollama listen on all interfaces so Open WebUI (in Docker) can reach it.
systemctl daemon-reload
systemctl restart ollama
systemctl enable ollama
systemctl status ollama
journalctl -u ollama -n 50
# Look for: "GPU discovered: gfx1031" or ROCm being mentioned
If you see "CPU only" warnings, the HSA override may not have applied — double check the systemd override file.
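To double check the override, you can ask systemd what environment it actually resolved for the unit instead of rereading the file. The helper below is a sketch that parses the single `Environment=` line printed by `systemctl show ollama -p Environment`; `unit_env_value` is a hypothetical name, and it assumes none of the values contain spaces.

```shell
# unit_env_value KEY: read `systemctl show <unit> -p Environment` output on
# stdin and print the value assigned to KEY, or nothing if KEY is not set.
unit_env_value() {
    sed -n 's/^Environment=//p' | tr ' ' '\n' | sed -n "s/^$1=//p"
}
```

For example, `systemctl show ollama -p Environment | unit_env_value HSA_OVERRIDE_GFX_VERSION` should print 10.3.0 once the override is loaded. If it prints nothing, rerun `systemctl daemon-reload` and restart the service.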
ollama pull qwen2.5:14b
Downloads ~9GB. Other models worth having:
ollama pull qwen2.5-coder:14b # Coding-focused variant
ollama pull phi4:14b # Microsoft's reasoning model
ollama pull gemma3:12b # Google's 12B, fits comfortably
ollama run qwen2.5:14b "Explain transformer attention in 3 sentences."
While it runs, check GPU utilization in a second terminal:
rocm-smi
# VRAM should jump to ~9–10GB during generation
# GPU utilization should be 80–100%
If VRAM stays at 0, Ollama is running on CPU. Go back to Part 3 and fix the ROCm setup before continuing.
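If you want a number you can script against rather than the rocm-smi table, the amdgpu driver also exposes VRAM counters as byte counts in sysfs. The sketch below only does the unit conversion; `bytes_to_mib` is a hypothetical helper, and the `card0` path in the usage note is an assumption that may differ on systems with more than one GPU.

```shell
# bytes_to_mib FILE: read a byte count from FILE and print it in whole MiB.
bytes_to_mib() {
    awk '{ printf "%d\n", $1 / 1048576 }' "$1"
}
```

While a model is generating, `bytes_to_mib /sys/class/drm/card0/device/mem_info_vram_used` should report a value in the thousands. A value near zero is the same signal as VRAM staying at 0 in rocm-smi.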
Open WebUI gives you a ChatGPT-like interface for your local models. It runs in Docker inside the same LXC container as Ollama.
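# Make sure the keyring directory exists before writing the key
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
gpg --dearmor -o /etc/apt/keyrings/docker.gpg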
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Verify:
docker run hello-world
docker run -d \
--name open-webui \
--restart always \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://172.17.0.1:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
Why 172.17.0.1? That's Docker's default bridge gateway: the IP the container uses to reach the host (where Ollama is listening). More reliable than host.docker.internal on Linux.
docker ps
# Should show open-webui running, port 3000->8080
ip addr show eth0 | grep "inet "
Open in your browser: http://<LXC-IP>:3000
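Both addresses used in this part (the Docker bridge gateway and the LXC's own IP) can be read from `ip` output rather than assumed. A small sketch, with `first_ipv4` as a hypothetical helper name; it expects the usual `inet ADDRESS/PREFIX ...` line that `ip -4 addr show` prints.

```shell
# first_ipv4: read `ip -4 addr show <iface>` output on stdin and print the
# first IPv4 address without its prefix length.
first_ipv4() {
    awk '$1 == "inet" { split($2, a, "/"); print a[1]; exit }'
}
```

`ip -4 addr show docker0 | first_ipv4` confirms whether the bridge gateway really is 172.17.0.1 on your install, and `ip -4 addr show eth0 | first_ipv4` gives the address to put in the browser URL.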
You are a helpful, harmless, and honest AI assistant. Be thorough but concise.
Ask clarifying questions when the request is ambiguous.
Acknowledge uncertainty rather than guessing. Use markdown formatting where helpful.
Recommended generation parameters:
| Parameter | Value | Why |
|---|---|---|
| Temperature | 0.7 | Balanced — not too creative, not too rigid |
| Top-P | 0.9 | Prevents low-probability rambling |
| Top-K | 40 | Good default for 14B models |
| Context Length | 8192 | Qwen2.5 supports up to 128K but 8K is VRAM-safe |
| Repeat Penalty | 1.1 | Prevents looping |
ollama show qwen2.5:14b --modelfile > qwen_custom.modelfile
# Edit the file and add/change:
# PARAMETER num_ctx 8192
# PARAMETER num_gpu 99
ollama create qwen2.5-14b-custom -f qwen_custom.modelfile
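If you would rather not edit the modelfile by hand, the PARAMETER lines can be set from the shell. This is a sketch only: `set_modelfile_param` is a hypothetical helper, and it assumes each parameter should appear exactly once (which is not true for parameters that may legitimately repeat, such as `stop`).

```shell
# set_modelfile_param FILE NAME VALUE: drop any existing "PARAMETER NAME ..."
# lines from FILE, then append a single "PARAMETER NAME VALUE" line.
set_modelfile_param() {
    tmp=$(mktemp) || return 1
    # grep exits non-zero when every line is filtered out; that is not an error here
    grep -v "^PARAMETER[[:space:]][[:space:]]*$2[[:space:]]" "$1" > "$tmp" || true
    printf 'PARAMETER %s %s\n' "$2" "$3" >> "$tmp"
    # write back in place so the original file keeps its permissions
    cat "$tmp" > "$1" && rm -f "$tmp"
}
```

Run `set_modelfile_param qwen_custom.modelfile num_ctx 8192` and `set_modelfile_param qwen_custom.modelfile num_gpu 99` between the `ollama show` and `ollama create` commands.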
Ollama is already listening on 0.0.0.0:11434 from the override in Part 4. Test from another machine on your network:
curl http://<LXC-IP>:11434/api/generate \
-d '{"model":"qwen2.5:14b","prompt":"Hello!","stream":false}'
If it fails, check: iptables -L | grep 11434 on the Proxmox host.
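The recommended generation parameters can also be sent per request through the `options` object of `/api/generate`, without creating a custom model. The sketch below builds such a request body; `generate_payload` is a hypothetical helper, and it does no JSON escaping, so it assumes the model name and prompt contain no double quotes, backslashes, or newlines.

```shell
# generate_payload MODEL PROMPT: print a JSON body for POST /api/generate
# that carries the sampling parameters recommended earlier in this guide.
# Assumption: MODEL and PROMPT need no JSON escaping.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":0.7,"top_p":0.9,"top_k":40,"num_ctx":8192,"repeat_penalty":1.1}}\n' "$1" "$2"
}
```

Use it as `curl http://<LXC-IP>:11434/api/generate -d "$(generate_payload qwen2.5:14b 'Hello!')"`.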
# Check HSA is actually set:
env | grep HSA
# Re-run with explicit override:
HSA_OVERRIDE_GFX_VERSION=10.3.0 rocminfo
# Check device permissions:
ls -la /dev/kfd /dev/dri/*
# Your user must be in render and video groups
# After adding to groups, re-login or use newgrp:
newgrp render
journalctl -u ollama -n 100 --no-pager | grep -i "error\|warn\|cpu\|gpu\|rocm"
# Verify the override conf is being applied:
systemctl show ollama | grep HSA
# Should show: Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0
# Test connectivity from inside the container:
curl http://localhost:11434/api/tags
# Test from inside the Docker container:
docker exec -it open-webui curl http://172.17.0.1:11434/api/tags
# If that fails, check Ollama is binding to 0.0.0.0:
ss -tlnp | grep 11434
# Should show 0.0.0.0:11434 or *:11434, not 127.0.0.1:11434
On RX 6700 XT with Qwen2.5 14B (Q4_K_M quantization):
| Metric | Expected |
|---|---|
| Prompt processing (prefill) | ~1500–2500 tokens/sec |
| Generation speed | ~15–25 tokens/sec |
| VRAM usage (model loaded, idle) | ~8.5 GB |
| VRAM usage (during gen) | ~10–11 GB |
| Time to first token | ~1–3 seconds |
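To compare your own numbers against this table, you can compute tokens per second from the `eval_count` and `eval_duration` fields that a non-streaming `/api/generate` response includes (the duration is in nanoseconds). A minimal sketch of the arithmetic; `tokens_per_sec` is a hypothetical helper and you copy the two values in yourself.

```shell
# tokens_per_sec COUNT DURATION_NS: tokens divided by elapsed seconds,
# where the duration is given in nanoseconds.
tokens_per_sec() {
    awk -v n="$1" -v ns="$2" 'BEGIN {
        if (ns <= 0) { print "duration must be positive" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", n / (ns / 1e9)
    }'
}
```

For example, 200 generated tokens over 10000000000 ns gives `tokens_per_sec 200 10000000000`, which prints 20.0. The `prompt_eval_count` and `prompt_eval_duration` fields give the prefill rate the same way.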
# Ollama
systemctl status ollama
systemctl restart ollama
journalctl -u ollama -f # Live logs
# Models
ollama list # Installed models
ollama pull <model> # Download a model
ollama rm <model> # Remove a model
ollama run <model> # Interactive chat in terminal
# Docker / Open WebUI
docker ps # Check container status
docker restart open-webui # Restart the UI
docker logs -f open-webui # Live logs
# GPU monitoring
rocm-smi # GPU stats (temp, VRAM, utilization)
watch -n 1 rocm-smi # Auto-refresh every second