[ Your Browser ]
      ↓ :3000
[ Open WebUI container ]   ← Docker inside LXC
      ↓ :11434
[ Ollama service ]         ← Systemd service inside LXC
      ↓
[ /dev/kfd + /dev/dri ]    ← GPU devices bind-mounted from Proxmox host
      ↓
[ RX 6700 XT ]             ← amdgpu driver on Proxmox host
Part 1 — Prepare the Proxmox Host

The amdgpu kernel driver lives on the Proxmox host. The LXC container gets raw access to the device files via bind mounts. Before creating the container, verify the GPU is properly recognized and note the device major numbers you'll need for the cgroup rules.

Verify GPU and KFD Device
SSH into your Proxmox host:
# Confirm amdgpu driver is bound
lspci -k | grep -A 2 "VGA"
# Expected: Kernel driver in use: amdgpu

# Confirm KFD node exists — required for ROCm compute
ls -la /dev/kfd
# Expected: crw-rw---- 1 root render 510, 0 ...

# If /dev/kfd is missing:
modprobe amdgpu
dmesg | grep -i kfd
Check Kernel Version Compatibility
ROCm 6.x requires kernel 5.15 or newer:
uname -r
# Proxmox 8.x ships with 6.x kernels — you're fine
# Proxmox 7.x may be on 5.15 — also fine
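If you would rather script this check than read the version by eye, the comparison could be wrapped in a small helper. This is a minimal sketch: `kernel_at_least` is a hypothetical name, and it compares only the major and minor components of the version string.

```shell
# Hypothetical helper: succeed if a kernel version meets a MAJOR.MINOR minimum.
# $1 = minimum (e.g. 5.15), $2 = version string (e.g. the output of `uname -r`)
kernel_at_least() {
  min_major=${1%%.*}
  min_minor=${1#*.}
  cur_major=${2%%.*}
  rest=${2#*.}
  cur_minor=${rest%%[!0-9]*}   # keep only the leading digits of the minor part
  [ "$cur_major" -gt "$min_major" ] ||
    { [ "$cur_major" -eq "$min_major" ] && [ "$cur_minor" -ge "$min_minor" ]; }
}
```

Usage on the host: kernel_at_least 5.15 "$(uname -r)" && echo "kernel OK"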
Identify Device Major Numbers
You need the major numbers for your DRI and KFD devices for the cgroup rules:
ls -la /dev/dri/
# crw-rw---- 1 root video 226, 0 ... card0
# DRI major = 226

ls -la /dev/kfd
# crw-rw---- 1 root render 510, 0 ...
# KFD major — commonly 510, but verify yours
⚠️ Write both major numbers down. You'll need them in the LXC config. The KFD major (commonly 510) can vary, and using the wrong number causes a silent failure.
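As a cross-check on reading the numbers out of the ls output, the major number can also be queried directly. The sketch below assumes GNU stat (as shipped on Proxmox) and uses a hypothetical helper name, `dev_major`.

```shell
# Hypothetical helper: print the decimal major number of a device node.
# GNU stat's %t format prints the major number in hexadecimal, so convert it.
dev_major() {
  hex=$(stat -c '%t' "$1") || return 1
  printf '%d\n' "0x$hex"
}
```

On the host, dev_major /dev/kfd and dev_major /dev/dri/card0 should print the same majors that ls -la showed.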
Part 2 — Create the LXC Container

The container must be privileged for the passthrough method used here. (Unprivileged containers can reach GPU devices only with additional UID/GID mapping, which this guide does not cover.) You'll also need to edit the LXC config file before first boot to add the cgroup device rules and bind mounts.

Download the Ubuntu 22.04 Template
pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
Or via the Proxmox web UI: your storage → CT Templates → Templates → download ubuntu-22.04-standard.
Create the LXC
In the Proxmox web UI → Create CT:
Field       Value                  Notes
Hostname    llm-server             Or whatever you prefer
Template    ubuntu-22.04-standard
Disk size   60 GB minimum          Models are large; a 14B Q4 is ~9 GB
CPU cores   8+
RAM         16384 MB (16 GB)       ROCm userspace is memory-hungry
Swap        4096 MB
Network     DHCP or static IP      Set a static IP if you plan to bookmark the WebUI URL
Privileged  Yes                    Required for the passthrough in this guide; in the Create CT dialog, uncheck "Unprivileged container"
Edit the LXC Config Before First Boot
On the Proxmox host, edit the config (replace <CTID> with your container ID, e.g. 100):
nano /etc/pve/lxc/<CTID>.conf
Add these lines at the bottom:
# GPU passthrough — DRI devices (display/render)
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm

# KFD device (compute — required for ROCm/OpenCL)
lxc.cgroup2.devices.allow: c 510:0 rwm

# Bind mount the devices into the container
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
⚠️ Replace 510 with the actual KFD major number you found in Part 1. If you use the wrong number, this line silently fails and ROCm won't see the GPU.
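To avoid typing the major numbers by hand, the config lines could be generated from the values you noted in Part 1. This is only a sketch: `gpu_lxc_conf` is a hypothetical helper, and the DRI minor numbers you pass should match the nodes that actually exist under /dev/dri on your host.

```shell
# Hypothetical helper: print the GPU passthrough lines for an LXC config.
# $1 = DRI major, $2 = KFD major; remaining arguments = DRI minor numbers.
gpu_lxc_conf() {
  dri_major=$1
  kfd_major=$2
  shift 2
  for minor in "$@"; do
    printf 'lxc.cgroup2.devices.allow: c %s:%s rwm\n' "$dri_major" "$minor"
  done
  printf 'lxc.cgroup2.devices.allow: c %s:0 rwm\n' "$kfd_major"
  printf 'lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir\n'
  printf 'lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file\n'
}
```

For example, gpu_lxc_conf 226 510 0 128 prints the rules for card0 and renderD128. Review the output before appending it to /etc/pve/lxc/<CTID>.conf.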
Start the Container and Verify GPU Visibility
pct start <CTID>
pct enter <CTID>

# Inside the container:
ls -la /dev/dri/
# Should show the same nodes as on the host, e.g. card0 and renderD128
# (plus card1 and renderD129 if a second GPU or an iGPU is present)

ls -la /dev/kfd
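# Must exist. If missing, recheck the lxc.mount.entry line for /dev/kfd
If /dev/kfd is missing, stop here and recheck the bind mount entry in the LXC config before continuing. If the node exists but ROCm later cannot open it, recheck the cgroup major number.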
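The visual check above could also be turned into a pass/fail test, which is handy if you rebuild the container often. A minimal sketch, with `check_char_devices` as a hypothetical helper name:

```shell
# Hypothetical helper: report whether each given path is a character device.
# Returns non-zero if any path is missing or is not a character device.
check_char_devices() {
  rc=0
  for node in "$@"; do
    if [ -c "$node" ]; then
      echo "ok       $node"
    else
      echo "MISSING  $node"
      rc=1
    fi
  done
  return "$rc"
}
```

Inside the container: check_char_devices /dev/kfd /dev/dri/card0 /dev/dri/renderD128 (adjust the list to the nodes your host exposes).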
Part 3 — Install ROCm Inside the Container

All commands from here run inside the LXC container. We're installing only the ROCm userspace libraries — the kernel driver already lives on the Proxmox host.

Install Prerequisites
apt update && apt upgrade -y
apt install -y wget gnupg2 curl git build-essential \
  python3-pip libnuma-dev libpci-dev \
  ca-certificates software-properties-common
Add the ROCm Repository
# As of 2025, ROCm 6.1.3 is stable — check repo.radeon.com for latest
wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
dpkg -i amdgpu-install_6.1.60103-1_all.deb
apt update
Install ROCm (Userspace Only, No Kernel Modules)
The --no-dkms flag is critical because the kernel driver is already on the Proxmox host:
amdgpu-install --usecase=rocm --no-dkms -y
This installs the ROCm runtime, HIP compiler toolchain, HSA runtime, and rocminfo/rocm-smi tools. Expect ~2–3GB download and several minutes.
Add Users to the Correct Groups
usermod -aG render,video root
Set the GFX Version Override
Navi 22 (gfx1031) is not on ROCm's official supported GPU list, but it works with a version override. Without this, ROCm will refuse to use the GPU:
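# /etc/environment takes plain KEY=VALUE lines (read by PAM at login)
echo 'HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/environment
# /etc/profile.d/*.sh is sourced by login shells, so the variable must be exported
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/profile.d/rocm.sh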

# Apply immediately for this session:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
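Because >> appends, re-running the commands above leaves duplicate lines in both files. If you expect to repeat this step, an idempotent variant could look like the sketch below. `set_env_line` is a hypothetical helper, not part of ROCm or Ubuntu.

```shell
# Hypothetical helper: set KEY=VALUE in a file, replacing any earlier line for KEY.
# $1 = file, $2 = key, $3 = value, $4 = optional line prefix (e.g. "export ")
set_env_line() {
  file=$1 key=$2 value=$3 prefix=${4:-}
  touch "$file"
  # Drop any existing assignment of KEY (with or without "export "), then append.
  # grep exits non-zero when it prints nothing, which is fine here.
  grep -v -E "^(export[[:space:]]+)?${key}=" "$file" > "$file.tmp" || true
  printf '%s%s=%s\n' "$prefix" "$key" "$value" >> "$file.tmp"
  # Write back through the original file so its mode and ownership are kept.
  cat "$file.tmp" > "$file"
  rm -f "$file.tmp"
}
```

Usage: set_env_line /etc/environment HSA_OVERRIDE_GFX_VERSION 10.3.0, then set_env_line /etc/profile.d/rocm.sh HSA_OVERRIDE_GFX_VERSION 10.3.0 "export ".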
Verify ROCm Sees the GPU
source /etc/environment
rocminfo
Look for a GPU agent with Name: gfx1031 (it may be reported as gfx1030 while the override is active) and Compute Unit: 40. Also try:
rocm-smi
# Should show GPU temp, VRAM usage, utilization
If rocminfo hangs or shows no agents: check that /dev/kfd is present and the cgroup major number is correct.
Part 4 — Install Ollama

Ollama runs as a systemd service. The key configuration is injecting HSA_OVERRIDE_GFX_VERSION and OLLAMA_HOST=0.0.0.0 into the service so the GPU override survives reboots and the API is accessible to Open WebUI in Docker.

Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
This installs the ollama binary to /usr/local/bin and creates a systemd service.
Configure the Systemd Service Override
mkdir -p /etc/systemd/system/ollama.service.d

cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0"
EOF
  • HSA_OVERRIDE_GFX_VERSION — without this, ROCm ignores the GPU entirely
  • OLLAMA_HOST=0.0.0.0 — makes Ollama listen on all interfaces so Open WebUI (in Docker) can reach it
systemctl daemon-reload
systemctl restart ollama
systemctl enable ollama
Verify Ollama is Running with GPU
systemctl status ollama
journalctl -u ollama -n 50
# Look for lines mentioning ROCm and a detected AMD GPU (gfx1030 or gfx1031); exact wording varies by Ollama version
If you see "CPU only" warnings, the HSA override may not have applied — double check the systemd override file.
Pull Your Model
ollama pull qwen2.5:14b
Downloads ~9GB. Other models worth having:
ollama pull qwen2.5-coder:14b   # Coding-focused variant
ollama pull phi4:14b             # Microsoft's reasoning model
ollama pull gemma3:12b           # Google's 12B, fits comfortably
Test a Quick Inference
ollama run qwen2.5:14b "Explain transformer attention in 3 sentences."
While it runs, check GPU utilization in a second terminal:
rocm-smi
# VRAM should jump to ~9–10GB during generation
# GPU utilization should be 80–100%
If VRAM stays at 0, Ollama is running on CPU. Go back to Part 3 and fix the ROCm setup before continuing.
Part 5 — Install Docker and Open WebUI

Open WebUI gives you a ChatGPT-like interface for your local models. It runs in Docker inside the same LXC container as Ollama.

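Install Docker
Don't use Ubuntu's own docker.io package, which tends to be outdated. Use Docker's official repo:
install -m 0755 -d /etc/apt/keyrings   # make sure the keyring directory exists
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  gpg --dearmor -o /etc/apt/keyrings/docker.gpg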

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null

apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Verify:
docker run hello-world
Deploy Open WebUI
docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://172.17.0.1:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
💡 Why 172.17.0.1? That's Docker's default bridge gateway: the IP the container uses to reach the host (where Ollama is listening). On Linux it is more reliable than host.docker.internal.
Verify:
docker ps
# Should show open-webui running, port 3000->8080
Access Open WebUI
Find the LXC's IP:
ip addr show eth0 | grep "inet "
Open in your browser: http://<LXC-IP>:3000
  • Create an admin account (local only, no cloud sign-in)
  • Go to Settings → Connections — confirm the Ollama URL is correct
  • Your pulled models should auto-appear in the model dropdown
Part 6 — Tuning and Quality of Life
Model Parameters for Best Results
In Open WebUI → Admin → Models, select your model and set a system prompt:
You are a helpful, harmless, and honest AI assistant. Be thorough but concise.
Ask clarifying questions when the request is ambiguous.
Acknowledge uncertainty rather than guessing. Use markdown formatting where helpful.
Recommended generation parameters:
Parameter        Value   Why
Temperature      0.7     Balanced: not too creative, not too rigid
Top-P            0.9     Prevents low-probability rambling
Top-K            40      Good default for 14B models
Context Length   8192    Qwen2.5 supports up to 128K, but 8K is VRAM-safe
Repeat Penalty   1.1     Prevents looping
Increase Context Window Safely
12GB VRAM can handle more context than the default 2048:
ollama show qwen2.5:14b --modelfile > qwen_custom.modelfile
# Edit the file and add/change:
# PARAMETER num_ctx 8192
# PARAMETER num_gpu 99

ollama create qwen2.5-14b-custom -f qwen_custom.modelfile
⚠️ VRAM limits at Q4_K_M quantization:
  • 4096 ctx: ~9.5 GB ✅
  • 8192 ctx: ~11 GB ✅ (tight)
  • 16384 ctx: OOM ❌
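Context length matters because the KV cache grows linearly with it. A back-of-the-envelope estimate is sketched below. The model figures in the usage line (48 layers, 8 KV heads, head dimension 128, 2-byte f16 cache entries) are assumptions about Qwen2.5 14B that you should confirm against the model card. The result covers only the KV cache, not the weights or compute buffers, so it explains the trend rather than the exact totals above.

```shell
# Rough KV-cache size in MiB:
#   2 (K and V) * layers * kv_heads * head_dim * bytes_per_element * context_length
# $1 = context length, $2 = layers, $3 = KV heads, $4 = head dim, $5 = bytes per element
kv_cache_mib() {
  echo $(( 2 * $2 * $3 * $4 * $5 * $1 / 1048576 ))
}
```

With the assumed figures, kv_cache_mib 8192 48 8 128 2 gives 1536 (MiB), and doubling the context to 16384 doubles that to 3072, which is roughly the headroom a 12GB card no longer has once the weights are loaded.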
Enable Ollama API Access from Your LAN
The service is already listening on 0.0.0.0:11434 from the override in Part 4. Test from another machine on your network:
curl http://<LXC-IP>:11434/api/generate \
  -d '{"model":"qwen2.5:14b","prompt":"Hello!","stream":false}'
If it fails, check for firewall rules on the Proxmox host: iptables -L -n | grep 11434 (the -n flag keeps ports numeric so the grep can match).
Part 7 — Troubleshooting
ROCm Doesn't See the GPU
# Check HSA is actually set:
env | grep HSA

# Re-run with explicit override:
HSA_OVERRIDE_GFX_VERSION=10.3.0 rocminfo

# Check device permissions:
ls -la /dev/kfd /dev/dri/*
# Your user must be in render and video groups

# After adding to groups, re-login or use newgrp:
newgrp render
Ollama Runs on CPU Only
journalctl -u ollama -n 100 --no-pager | grep -i "error\|warn\|cpu\|gpu\|rocm"

# Verify the override conf is being applied:
systemctl show ollama | grep HSA
# Should show: Environment=HSA_OVERRIDE_GFX_VERSION=10.3.0
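The grep above matches any line containing HSA. To pull out the value of one specific variable from the unit's environment, a small filter could help. This is a sketch that assumes the values contain no spaces; `unit_env_value` is a hypothetical name.

```shell
# Hypothetical helper: read `systemctl show <unit> -p Environment` output on stdin
# and print the value of one variable. Exits non-zero if the variable is not set.
# Assumes values contain no spaces (systemd quotes those, which is not handled here).
unit_env_value() {
  tr ' ' '\n' | sed 's/^Environment=//' | awk -F= -v key="$1" '
    $1 == key { print substr($0, length(key) + 2); found = 1 }
    END { exit !found }'
}
```

Usage: systemctl show ollama -p Environment | unit_env_value HSA_OVERRIDE_GFX_VERSION should print 10.3.0 if the override was applied.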
Open WebUI Can't Connect to Ollama
# Test connectivity from inside the container:
curl http://localhost:11434/api/tags

# Test from inside the Docker container:
docker exec -it open-webui curl http://172.17.0.1:11434/api/tags

# If that fails, check Ollama is binding to 0.0.0.0:
ss -tlnp | grep 11434
# Should show 0.0.0.0:11434 or *:11434, not 127.0.0.1:11434
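If you want a script to tell "listening on all interfaces" apart from "listening on loopback only", the ss output can be classified with a short filter. A sketch, with `listen_scope` as a hypothetical helper name:

```shell
# Hypothetical helper: read `ss -tln` output on stdin and classify how a port is bound.
# Prints "all" (any-address listener), "loopback", or "none".
listen_scope() {
  awk -v port="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == ("0.0.0.0:" port) || $i == ("*:" port) || $i == ("[::]:" port))
          any = 1
        else if ($i == ("127.0.0.1:" port) || $i == ("[::1]:" port))
          loop = 1
      }
    }
    END { print any ? "all" : (loop ? "loopback" : "none") }'
}
```

Usage: ss -tln | listen_scope 11434. A result of loopback means the OLLAMA_HOST override is not in effect, and Docker containers will not be able to reach Ollama.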
Expected Performance & Quick Reference

On RX 6700 XT with Qwen2.5 14B (Q4_K_M quantization):

Metric                        Expected
Prompt processing (prefill)   ~1500–2500 tokens/sec
Generation speed              ~15–25 tokens/sec
VRAM usage (idle)             ~8.5 GB
VRAM usage (during gen)       ~10–11 GB
Time to first token           ~1–3 seconds
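To compare your own numbers against this table, generation speed can be worked out from the eval_count and eval_duration fields (the latter in nanoseconds) that Ollama returns in a non-streaming /api/generate response. The helper below is a sketch with a hypothetical name, `tokens_per_sec`.

```shell
# Tokens per second from Ollama's eval_count and eval_duration (nanoseconds).
# $1 = eval_count, $2 = eval_duration in ns
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}
```

For example, 100 tokens generated in 5 seconds is tokens_per_sec 100 5000000000, which prints 20.0. The same formula applied to prompt_eval_count and prompt_eval_duration gives the prefill rate. Running ollama run with --verbose prints comparable timings without any scripting.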

All Service Commands

# Ollama
systemctl status ollama
systemctl restart ollama
journalctl -u ollama -f          # Live logs

# Models
ollama list                       # Installed models
ollama pull <model>               # Download a model
ollama rm <model>                 # Remove a model
ollama run <model>                # Interactive chat in terminal

# Docker / Open WebUI
docker ps                         # Check container status
docker restart open-webui         # Restart the UI
docker logs -f open-webui         # Live logs

# GPU monitoring
rocm-smi                          # GPU stats (temp, VRAM, utilization)
watch -n 1 rocm-smi               # Auto-refresh every second