Aider + Local LLM on Proxmox (AMD ROCm)

Part 1 — Prepare the Proxmox Host

The amdgpu kernel driver lives on the Proxmox host. You'll need the device major numbers before creating the LXC container.

Verify GPU and KFD Device

lspci -k | grep -A 2 "VGA"
# Expected: Kernel driver in use: amdgpu

ls -la /dev/kfd
# Expected: crw-rw---- 1 root render 510, 0 ...

# If /dev/kfd is missing:
modprobe amdgpu
dmesg | grep -i kfd

Identify Device Major Numbers for cgroup Rules

ls -la /dev/dri/
# Major number for DRI = 226 (look for: 226, 0 ... card0)

ls -la /dev/kfd
# Major number for KFD — commonly 510, verify yours
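
Rather than reading the major number off the ls output by eye, it can be printed directly. The following is a minimal sketch, assuming GNU stat (as shipped on a Proxmox host); dev_major is a hypothetical helper name, not part of any tool.

```shell
# Print the decimal major number of a device node.
# GNU stat's %t format is the major device type in hexadecimal.
dev_major() {
  printf '%d\n' "0x$(stat -c '%t' "$1")"
}

for dev in /dev/kfd /dev/dri/card0; do
  if [ -e "$dev" ]; then
    echo "$dev major: $(dev_major "$dev")"
  else
    echo "$dev not found" >&2
  fi
done
```

The KFD value printed here is the one to use in place of 510 in the LXC config.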

⚠️ Write both major numbers down. You'll need them in the LXC config. Using the wrong KFD major causes a silent failure: ROCm simply won't see the GPU.

Part 2 — Create the LXC Container

The container must be privileged — unprivileged containers cannot pass through GPU devices. Edit the LXC config before first boot to add the cgroup device rules.

Download the Ubuntu 22.04 Template

pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst

Create the LXC

In the Proxmox web UI → Create CT:

Field	Value	Notes
Hostname	`llm-coder`	—
Template	ubuntu-22.04-standard	—
Disk size	60 GB minimum	Models are large — Qwen2.5-Coder 14B is ~9GB
CPU cores	8+	—
RAM	16384 MB (16 GB)	ROCm is memory-hungry
Swap	4096 MB	—
Network	Set a static IP	You'll reference this IP from your laptop's Aider config
Privileged	✅ CHECK THIS	Required — unprivileged containers cannot pass through GPU devices

Edit the LXC Config Before First Boot

Replace <CTID> with your container ID:

nano /etc/pve/lxc/<CTID>.conf

Add these lines at the bottom:

# DRI devices (card nodes 0/1 and render nodes 128/129)
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm

# KFD compute device — replace 510 with your actual major number
lxc.cgroup2.devices.allow: c 510:0 rwm

# Bind mount into container
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
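
The allow rules above hard-code minors 0, 1, 128 and 129 and the example KFD major 510. As a sketch, assuming GNU stat on the host, the rules can instead be generated from the device nodes that actually exist; allow_rule is a hypothetical helper name.

```shell
# Print an LXC cgroup2 allow rule for one character device.
# %t and %T are GNU stat's hexadecimal major and minor device numbers.
allow_rule() {
  major=$(printf '%d' "0x$(stat -c '%t' "$1")")
  minor=$(printf '%d' "0x$(stat -c '%T' "$1")")
  echo "lxc.cgroup2.devices.allow: c $major:$minor rwm"
}

# Run on the Proxmox host
for dev in /dev/dri/card* /dev/dri/renderD* /dev/kfd; do
  # Skip glob patterns that matched nothing and any non-character nodes
  if [ -c "$dev" ]; then
    allow_rule "$dev"
  fi
done
```

Compare the printed lines with the ones in the config before starting the container.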

Start the container and verify:

pct start <CTID>
pct enter <CTID>

ls -la /dev/dri/        # Should show card0, card1, renderD128, renderD129
ls -la /dev/kfd         # Must exist

Part 3 — Install ROCm Inside the Container

All commands from here run inside the LXC container. Install only the ROCm userspace packages; --no-dkms is critical because the kernel driver lives on the Proxmox host.

Install Prerequisites

apt update && apt upgrade -y
apt install -y wget gnupg2 curl git build-essential \
  python3-pip libnuma-dev libpci-dev \
  ca-certificates software-properties-common

Add the ROCm Repository and Install

wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
dpkg -i amdgpu-install_6.1.60103-1_all.deb
apt update

# --no-dkms is critical: kernel driver is already on the Proxmox host
amdgpu-install --usecase=rocm --no-dkms -y

Set Permissions and GFX Version Override

Navi 22 (gfx1031) is not on ROCm's official support list. This override tells HSA to treat it as a supported gfx1030-class device:

usermod -aG render,video root

# /etc/environment takes plain KEY=VALUE lines; the profile.d script needs export
echo 'HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/environment
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/profile.d/rocm.sh
export HSA_OVERRIDE_GFX_VERSION=10.3.0
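
Because these commands append with >>, running them twice leaves duplicate lines. A small helper can make the step repeatable; ensure_line is a hypothetical name, and this is a sketch rather than part of the ROCm tooling.

```shell
# Append a line to a file only if that exact line is not already present.
ensure_line() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (run as root inside the container):
#   ensure_line 'HSA_OVERRIDE_GFX_VERSION=10.3.0' /etc/environment
#   ensure_line 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' /etc/profile.d/rocm.sh
```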

Verify ROCm Sees the GPU

rocminfo | grep -E "gfx|Compute Unit"
# Expect a GPU agent named gfx1030 while the override is exported (gfx1031 without it)
# and Compute Unit: 40 for an RX 6700 XT

rocm-smi
# Shows GPU temp, VRAM usage, utilization

If rocminfo hangs or shows no agents: recheck /dev/kfd is present and the cgroup major number matches.
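
Before digging into ROCm itself, a quick preflight can confirm the device nodes are usable from inside the container. This is a minimal sketch, assuming the render node is renderD128 as in Part 2; check_dev is a hypothetical helper.

```shell
# Return 0 if the path is a character device the current user can read and write.
check_dev() {
  [ -c "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

for dev in /dev/kfd /dev/dri/renderD128; do
  if check_dev "$dev"; then
    echo "ok: $dev"
  else
    echo "missing or not accessible: $dev" >&2
  fi
done
```

If either node is reported as missing, the problem is in the LXC config from Part 2, not in the ROCm install.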

Part 4 — Install and Configure Ollama

Ollama serves the model over a local API that Aider will call from your laptop. The key settings are injecting the GPU override into the systemd service and binding to all interfaces so it's accessible over LAN.

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Configure the Systemd Service Override

mkdir -p /etc/systemd/system/ollama.service.d

cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=60m"
EOF

HSA_OVERRIDE_GFX_VERSION: without this, ROCm ignores the GPU entirely
OLLAMA_HOST=0.0.0.0: binds to all interfaces so your laptop can reach it over LAN
OLLAMA_NUM_PARALLEL=1: one request at a time; each extra parallel slot reserves additional context memory, which a single user does not need
OLLAMA_KEEP_ALIVE=60m: keeps the model in VRAM between requests (default is 5m). Aider sessions are slow with cold loads

systemctl daemon-reload
systemctl restart ollama
systemctl enable ollama

# Verify it started cleanly and detected the GPU:
journalctl -u ollama -n 50 | grep -i "gpu\|rocm\|error"
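
After a restart the API can take a moment to come up. A sketch of a wait loop follows, assuming curl is installed and Ollama listens on its default port 11434; wait_for_ollama is a hypothetical helper, and /api/version is Ollama's version endpoint.

```shell
# Poll the Ollama API until it answers or the retry budget runs out.
# $1 = base URL, $2 = number of attempts (default 30), one second apart.
wait_for_ollama() {
  base_url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 2 "$base_url/api/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
  done
  return 1
}
```

For example, `wait_for_ollama http://127.0.0.1:11434 30 && ollama list` would run ollama list only once the server answers.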

Pull the Coding Model and Create a Tuned Variant

ollama pull qwen2.5-coder:14b
# ~9GB download — also consider a smaller fallback:
ollama pull qwen2.5-coder:7b    # ~4.5GB

Tune the model for coding workloads (larger context, lower temperature):

ollama show qwen2.5-coder:14b --modelfile > /root/coder14b.modelfile

Edit the file and ensure these parameters:

PARAMETER num_ctx 16384
PARAMETER num_gpu 99
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.05

num_ctx 16384 — 16K context fits in 12GB VRAM at this quantization, gives Aider room to load multiple files
temperature 0.2 — more deterministic, less "creative" with syntax
num_gpu 99 — forces all layers onto GPU
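
Instead of editing the file by hand, the parameters can be set with a small script. This is a sketch under the assumption that the Modelfile uses uppercase PARAMETER lines, as ollama show --modelfile emits; set_param is a hypothetical helper.

```shell
# Set "PARAMETER <name> <value>" in a Modelfile, replacing any existing
# line for the same parameter so repeated runs do not create duplicates.
set_param() {
  file=$1
  name=$2
  value=$3
  tmp=$(mktemp)
  # Keep every line except an existing setting for this parameter
  grep -v "^PARAMETER $name " "$file" > "$tmp" || true
  printf 'PARAMETER %s %s\n' "$name" "$value" >> "$tmp"
  # Write back in place so the original file's permissions are kept
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example:
#   set_param /root/coder14b.modelfile num_ctx 16384
#   set_param /root/coder14b.modelfile num_gpu 99
#   set_param /root/coder14b.modelfile temperature 0.2
#   set_param /root/coder14b.modelfile repeat_penalty 1.05
```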

ollama create qwen2.5-coder-14b-aider -f /root/coder14b.modelfile
ollama list   # Verify it appears

Verify LAN Access from Your Laptop

Before moving to Part 5, test connectivity from your laptop's terminal:

curl http://<LXC-IP>:11434/api/tags
# Should return JSON listing your models

# Also test a real inference call:
curl http://<LXC-IP>:11434/api/generate \
  -d '{"model":"qwen2.5-coder-14b-aider","prompt":"def fibonacci(n):","stream":false}'

Don't proceed to Part 5 until this curl returns a valid response.

Part 5 — Install and Configure Aider on Your Laptop

All commands from here run on your laptop, not the Proxmox server. Aider handles the file edits and Git commits locally — the Proxmox LXC just provides the inference engine over LAN.

Understand Aider's Model Prefix System

Prefix	Behavior
`ollama/qwen2.5-coder:14b`	Uses raw completions API. Works but misses system prompt formatting.
`ollama_chat/qwen2.5-coder:14b`	Uses chat completions API — proper role formatting. Use this one.

Always use the ollama_chat/ prefix for coding models.

Install Aider with pipx

Don't use bare pip install on your system Python; use pipx for isolation:

# Install pipx if you don't have it:
sudo apt install pipx       # Ubuntu/Debian
# or: brew install pipx     # macOS

pipx install aider-chat
pipx ensurepath             # Adds ~/.local/bin to PATH

# Verify:
aider --version

Ensure Your Project is a Git Repo

Aider uses Git to manage changes. It won't work properly in a plain directory:

cd /path/to/your/project

# If not already a repo:
git init
git add .
git commit -m "initial commit before aider session"

Aider makes a commit after every accepted change, giving you a clean undo trail.

Create a Persistent Config File

Instead of typing flags every session, create ~/.aider.conf.yml:

cat > ~/.aider.conf.yml << 'EOF'
# Ollama server on Proxmox LXC
model: ollama_chat/qwen2.5-coder-14b-aider
# Aider reads the Ollama endpoint from the OLLAMA_API_BASE environment variable
set-env:
  - OLLAMA_API_BASE=http://<YOUR_LXC_IP>:11434

# Behavior
auto-commits: true
dirty-commits: true
stream: true

# Context
map-tokens: 2048
max-chat-history-tokens: 4096

# UI
pretty: true
EOF

With this file in place, just run aider with no flags.

Launch Aider and Core Modes

cd /your/project
aider

Aider has three interaction modes:

Code mode (default) — ask it to make changes, it edits files directly
Ask mode — /ask <question> asks without editing
Architect mode — /architect <task> reasons about the approach first, then writes code. Closest to how Claude Code works.

Essential slash commands:

/add <file>      Add a file to the active context (model can edit it)
/drop <file>     Remove a file from context (save tokens)
/ls              List files currently in context
/run <command>   Run a shell command and feed output to the model
/ask <question>  Ask without making changes
/diff            Show the changes made since the last message
/undo            Revert the last commit Aider made
/clear           Clear chat history (keeps files in context)
/exit            Exit Aider

💡 Practical workflow: Add only files directly relevant to the task (/add). Review each change with /diff; with auto-commits enabled it is already committed, so use /undo if the change isn't right. Feed test failures directly with /run pytest tests/ -v.

Part 6 — Monitoring and Troubleshooting

Monitor GPU During a Session

Keep a second terminal open to your Proxmox LXC:

watch -n 1 rocm-smi

During active generation you should see: GPU utilization 80–100%, VRAM ~10–11GB (for 14B with 16K context), temp 60–80°C — normal for this card.

Aider Connects but Responses are Slow or Garbled

# Verify you're using ollama_chat/ not ollama/
# Check model is staying loaded (not reloading each request):
journalctl -u ollama -f
# You should NOT see "loading model" on every request
# If you do, OLLAMA_KEEP_ALIVE may not have applied:
systemctl show ollama | grep KEEP_ALIVE

"Model not found" Error

# On the server, list available models:
ollama list

# Make sure the exact name matches what's in your config
# Common mistake: config says qwen2.5-coder-14b-aider but you forgot
# to run `ollama create` with the custom modelfile

Aider Makes Edits but Git Commits Fail

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

Context Window Overflow Errors

If you see errors about context length:

Drop files from context: /drop anything not immediately needed
Clear history: /clear (removes conversation but keeps files)
Reduce num_ctx in your modelfile to 8192 if 16384 is causing OOM, then re-run ollama create so the change takes effect
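
To see why a smaller num_ctx relieves memory pressure, the key/value cache size can be estimated. This is a back-of-the-envelope sketch: the layer count, KV head count and head dimension are assumptions taken from the published Qwen2.5-14B configuration, and a 16-bit cache is assumed; actual usage in Ollama will differ somewhat.

```shell
# Rough KV-cache size in MiB for a given context length.
# Assumed architecture (Qwen2.5-14B): 48 layers, 8 key/value heads,
# head dimension 128, cache stored as 16-bit values (2 bytes each).
kv_cache_mib() {
  ctx=$1
  layers=48
  kv_heads=8
  head_dim=128
  bytes_per_value=2
  # Factor 2 accounts for storing both keys and values
  echo $(( 2 * layers * kv_heads * head_dim * bytes_per_value * ctx / 1024 / 1024 ))
}

kv_cache_mib 16384   # prints 3072
kv_cache_mib 8192    # prints 1536
```

Under these assumptions, halving the context from 16384 to 8192 frees roughly 1.5 GiB of VRAM on top of the model weights.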

Expected Performance & Quick Reference

On RX 6700 XT with Qwen2.5-Coder 14B (Q4_K_M) at 16K context:

Metric	Expected
Generation speed	12–20 tokens/sec
Prompt processing	~1000–2000 tokens/sec
VRAM (model loaded, idle)	~9.5 GB
VRAM (during generation)	~11–11.5 GB
Time to first token	1–4 seconds
Cold load time (not in VRAM)	8–15 seconds

With OLLAMA_KEEP_ALIVE=60m, cold loads only happen after 60 minutes of inactivity.

Quick Reference

# === On the Proxmox LXC (server) ===
systemctl status ollama
journalctl -u ollama -f
ollama list
rocm-smi
watch -n 1 rocm-smi

# === On your laptop (client) ===
aider                                          # Launch with config file defaults
aider --model ollama_chat/qwen2.5-coder:14b   # Override model

# Inside Aider:
/add <file>       # Add file to context
/drop <file>      # Remove file from context
/run <cmd>        # Run shell command, feed output to model
/ask <question>   # Ask without editing
/architect <task> # Reason first, then edit (most powerful mode)
/undo             # Revert last AI commit
/diff             # Show changes since the last message
/clear            # Reset chat history