[ Your Laptop ] Aider CLI (local) Git repo (local SSD) File edits happen here ↓ LAN :11434 [ Proxmox LXC Container ] Ollama service (0.0.0.0:11434) Qwen2.5-Coder 14B loaded in VRAM ↓ /dev/kfd + /dev/dri ← bind-mounted from host ↓ [ RX 6700 XT ] ← amdgpu driver on Proxmox host
Part 1 — Prepare the Proxmox Host

The amdgpu kernel driver lives on the Proxmox host. You'll need the device major numbers before creating the LXC container.

Verify GPU and KFD Device
lspci -k | grep -A 2 "VGA"
# Expected: Kernel driver in use: amdgpu

ls -la /dev/kfd
# Expected: crw-rw---- 1 root render 510, 0 ...

# If /dev/kfd is missing:
modprobe amdgpu
dmesg | grep -i kfd
Identify Device Major Numbers for cgroup Rules
ls -la /dev/dri/
# Major number for DRI = 226 (look for: 226, 0 ... card0)

ls -la /dev/kfd
# Major number for KFD — commonly 510, verify yours
⚠️
Write both major numbers down. You'll need them in the LXC config. Using the wrong KFD major causes a silent failure — ROCm simply won't see the GPU.
Part 2 — Create the LXC Container

The container must be privileged — unprivileged containers cannot pass through GPU devices. Edit the LXC config before first boot to add the cgroup device rules.

Download the Ubuntu 22.04 Template
pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
Create the LXC In the Proxmox web UI → Create CT:
FieldValueNotes
Hostnamellm-coder
Templateubuntu-22.04-standard
Disk size60 GB minimumModels are large — Qwen2.5-Coder 14B is ~9GB
CPU cores8+
RAM16384 MB (16 GB)ROCm is memory-hungry
Swap4096 MB
NetworkSet a static IPYou'll reference this IP from your laptop's Aider config
PrivilegedCHECK THISRequired — unprivileged containers cannot pass through GPU devices
Edit the LXC Config Before First Boot Replace <CTID> with your container ID:
nano /etc/pve/lxc/<CTID>.conf
Add these lines at the bottom:
# DRI devices (render nodes)
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm

# KFD compute device — replace 510 with your actual major number
lxc.cgroup2.devices.allow: c 510:0 rwm

# Bind mount into container
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
Start the container and verify:
pct start <CTID>
pct enter <CTID>

ls -la /dev/dri/        # Should show card0, card1, renderD128, renderD129
ls -la /dev/kfd         # Must exist
Part 3 — Install ROCm Inside the Container

All commands from here run inside the LXC container. Installing ROCm userspace only — --no-dkms is critical since the kernel driver lives on the Proxmox host.

Install Prerequisites
apt update && apt upgrade -y
apt install -y wget gnupg2 curl git build-essential \
  python3-pip libnuma-dev libpci-dev \
  ca-certificates software-properties-common
Add the ROCm Repository and Install
wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
dpkg -i amdgpu-install_6.1.60103-1_all.deb
apt update

# --no-dkms is critical: kernel driver is already on the Proxmox host
amdgpu-install --usecase=rocm --no-dkms -y
Set Permissions and GFX Version Override Navi 22 (gfx1031) is not on ROCm's official support list. This override tells HSA to treat it as a supported gfx1030-class device:
usermod -aG render,video root

echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/environment
echo 'HSA_OVERRIDE_GFX_VERSION=10.3.0' >> /etc/profile.d/rocm.sh
export HSA_OVERRIDE_GFX_VERSION=10.3.0
Verify ROCm Sees the GPU
rocminfo | grep -A 5 "gfx"
# Must show: Name: gfx1031 and Compute Unit: 40

rocm-smi
# Shows GPU temp, VRAM usage, utilization
If rocminfo hangs or shows no agents: recheck /dev/kfd is present and the cgroup major number matches.
Part 4 — Install and Configure Ollama

Ollama serves the model over a local API that Aider will call from your laptop. The key settings are injecting the GPU override into the systemd service and binding to all interfaces so it's accessible over LAN.

Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Configure the Systemd Service Override
mkdir -p /etc/systemd/system/ollama.service.d

cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=60m"
EOF
  • HSA_OVERRIDE_GFX_VERSION — without this, ROCm ignores the GPU entirely
  • OLLAMA_HOST=0.0.0.0 — binds to all interfaces so your laptop can reach it over LAN
  • OLLAMA_NUM_PARALLEL=1 — one request at a time; prevents VRAM fragmentation with a single user
  • OLLAMA_KEEP_ALIVE=60m — keeps model in VRAM between requests. Aider sessions are slow with cold loads (default is 5m)
systemctl daemon-reload
systemctl restart ollama
systemctl enable ollama

# Verify it started cleanly and detected the GPU:
journalctl -u ollama -n 50 | grep -i "gpu\|rocm\|error"
Pull the Coding Model and Create a Tuned Variant
ollama pull qwen2.5-coder:14b
# ~9GB download — also consider a smaller fallback:
ollama pull qwen2.5-coder:7b    # ~4.5GB
Tune the model for coding workloads (larger context, lower temperature):
ollama show qwen2.5-coder:14b --modelfile > /root/coder14b.modelfile
Edit the file and ensure these parameters:
PARAMETER num_ctx 16384
PARAMETER num_gpu 99
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.05
  • num_ctx 16384 — 16K context fits in 12GB VRAM at this quantization, gives Aider room to load multiple files
  • temperature 0.2 — more deterministic, less "creative" with syntax
  • num_gpu 99 — forces all layers onto GPU
ollama create qwen2.5-coder-14b-aider -f /root/coder14b.modelfile
ollama list   # Verify it appears
Verify LAN Access from Your Laptop Before moving to Part 5, test connectivity from your laptop's terminal:
curl http://<LXC-IP>:11434/api/tags
# Should return JSON listing your models

# Also test a real inference call:
curl http://<LXC-IP>:11434/api/generate \
  -d '{"model":"qwen2.5-coder-14b-aider","prompt":"def fibonacci(n):","stream":false}'
Don't proceed to Part 5 until this curl returns a valid response.
Part 5 — Install and Configure Aider on Your Laptop

All commands from here run on your laptop, not the Proxmox server. Aider handles the file edits and Git commits locally — the Proxmox LXC just provides the inference engine over LAN.

Understand Aider's Model Prefix System
PrefixBehavior
ollama/qwen2.5-coder:14bUses raw completions API. Works but misses system prompt formatting.
ollama_chat/qwen2.5-coder:14bUses chat completions API — proper role formatting. Use this one.
Always use the ollama_chat/ prefix for coding models.
Install Aider with pipx Don't use bare pip install on your system Python — use pipx for isolation:
# Install pipx if you don't have it:
sudo apt install pipx       # Ubuntu/Debian
# or: brew install pipx     # macOS

pipx install aider-chat
pipx ensurepath             # Adds ~/.local/bin to PATH

# Verify:
aider --version
Ensure Your Project is a Git Repo Aider uses Git to manage changes. It won't work properly in a plain directory:
cd /path/to/your/project

# If not already a repo:
git init
git add .
git commit -m "initial commit before aider session"
Aider makes a commit after every accepted change, giving you a clean undo trail.
Create a Persistent Config File Instead of typing flags every session, create ~/.aider.conf.yml:
cat > ~/.aider.conf.yml << 'EOF'
# Ollama server on Proxmox LXC
model: ollama_chat/qwen2.5-coder-14b-aider
ollama-api-base: http://<YOUR_LXC_IP>:11434

# Behavior
auto-commits: true
dirty-commits: true
stream: true

# Context
map-tokens: 2048
max-chat-history-tokens: 4096

# UI
pretty: true
EOF
With this file in place, just run aider with no flags.
Launch Aider and Core Modes
cd /your/project
aider
Aider has three interaction modes:
  • Code mode (default) — ask it to make changes, it edits files directly
  • Ask mode/ask <question> asks without editing
  • Architect mode/architect <task> reasons about the approach first, then writes code. Closest to how Claude Code works.
Essential slash commands:
/add <file>      Add a file to the active context (model can edit it)
/drop <file>     Remove a file from context (save tokens)
/ls              List files currently in context
/run <command>   Run a shell command and feed output to the model
/ask <question>  Ask without making changes
/diff            Show all changes made in this session
/undo            Revert the last commit Aider made
/clear           Clear chat history (keeps files in context)
/exit            Exit Aider
💡
Practical workflow: Add only files directly relevant to the task (/add). Review the diff before committing. Use /undo if the change isn't right. Feed test failures directly with /run pytest tests/ -v.
Part 6 — Monitoring and Troubleshooting
Monitor GPU During a Session Keep a second terminal open to your Proxmox LXC:
watch -n 1 rocm-smi
During active generation you should see: GPU utilization 80–100%, VRAM ~10–11GB (for 14B with 16K context), temp 60–80°C — normal for this card.
Aider Connects but Responses are Slow or Garbled
# Verify you're using ollama_chat/ not ollama/
# Check model is staying loaded (not reloading each request):
journalctl -u ollama -f
# You should NOT see "loading model" on every request
# If you do, OLLAMA_KEEP_ALIVE may not have applied:
systemctl show ollama | grep KEEP_ALIVE
"Model not found" Error
# On the server, list available models:
ollama list

# Make sure the exact name matches what's in your config
# Common mistake: config says qwen2.5-coder-14b-aider but you forgot
# to run `ollama create` with the custom modelfile
Aider Makes Edits but Git Commits Fail
git config --global user.email "[email protected]"
git config --global user.name "Your Name"
Context Window Overflow Errors If you see errors about context length:
  • Drop files from context: /drop anything not immediately needed
  • Clear history: /clear (removes conversation but keeps files)
  • Reduce num_ctx in your modelfile to 8192 if 16384 is causing OOM
Expected Performance & Quick Reference

On RX 6700 XT with Qwen2.5-Coder 14B (Q4_K_M) at 16K context:

MetricExpected
Generation speed12–20 tokens/sec
Prompt processing~1000–2000 tokens/sec
VRAM (model loaded, idle)~9.5 GB
VRAM (during generation)~11–11.5 GB
Time to first token1–4 seconds
Cold load time (not in VRAM)8–15 seconds

With OLLAMA_KEEP_ALIVE=60m, cold loads only happen after 60 minutes of inactivity.

Quick Reference

# === On the Proxmox LXC (server) ===
systemctl status ollama
journalctl -u ollama -f
ollama list
rocm-smi
watch -n 1 rocm-smi

# === On your laptop (client) ===
aider                                          # Launch with config file defaults
aider --model ollama_chat/qwen2.5-coder:14b   # Override model

# Inside Aider:
/add <file>       # Add file to context
/drop <file>      # Remove file from context
/run <cmd>        # Run shell command, feed output to model
/ask <question>   # Ask without editing
/architect <task> # Reason first, then edit (most powerful mode)
/undo             # Revert last AI commit
/diff             # Show all changes this session
/clear            # Reset chat history