Quick Reference

Home
Alpine Linux Boot Process
libinput on Alpine Linux
who on Alpine
ALSA Config
Android Hacking
Android Load apkx File
Artix Linux Install
Audio Stream Fu
Audio Recording (simple)
AWK
BASH quick ref
Benchmark with fio
BlackBerry as a Modem
BlackBerry Video Encoding
Building libcurl on Windows with VS
Building QEMU Images for Alpine Linux
for use with libvirt and terraform
Building TI MSP430 GCC
Certificate and Key Handling
Chart Drawing
Currency Converter
Custom Bootable Alpine Linux
Custom Alpine Kernel
Colourspace Conversion
ComfyUI on Fedora
Compile FMS from Source
Courier IMAP Server
Datasheets and Manuals
Decoding rtp Packets
Deduplication of Files
Deleting Old Devices
Delphi 7 Dynamic Linking
Delphi Compiler Warnings
Desktop Capture with FFMPEG
X11
Desktop Capture in Sway
Diffusers
(quick start)
Dovecot IMAP
Dovecot Replication
Docker HowTo (Simple Setup)
Remove Untagged Docker Images
Docker Xvfb Alpine Howto
Docker Storage
DomainKey Identified Mail with Exim
DRBD Over OpenVPN
Eclipse and Jetty
Eclipse Themes
Edge Detection
Electronics
Email, Bleeding Email
Emailing
ESP8266 Get Started
ESP8266 OpenSDK
Exchange 2010 and Lighttpd
Exim Process and Forward Incoming Mail
Exim with Dovecot
FFmpeg Webcam and v4l2 stuff
flickr image from file name
Force Video Output
KMS/FB and Weston
Fonts
Freenet Routing Fu
Flushing Filesystem Buffers
GDB in Windows
Gentoo eudev and udev
Gentoo omxplayer
Gentoo on Raspberry Pi
Gentoo initramfs
Gentoo Java Problems
Gentoo Stage4
Gentoo Update (portage)
Gentoo VirtualBox Install
GitLab Runner
git Quick Reference
git Workflow
GNUplot Curve Fitting
GNUplot Heart
GNUplotting
Quick Reference
GnuPG Simple
Hen*Plus
ImageMagick
Quick Reference
InspIRCd on Gentoo
Install Gentoo with Hyper-V
iproute2
iptables
iptables bwmon
IPv6 in IPv6 Tunnels STUB
IPv6 Setups STUB
ircclient
irssi
Java Anti-aliasing
Jetty JSP 9.2/9.3 Error
Jetty JSP 9.4 Error
LDAP Searching
and binding on Linux
llama.cpp on AMD AI Max+ 395
Logging in Java
LVM Mount on Gentoo
LVM Howto Rev 2
Video File Montage Creation
MPlayer play section of file
Install macOS X on Mac Pro
Maven for Java Development
MPlayer v4l Snapshots
mpv Quick Reference
MSP430 I2C
msys2 Home Modification
mtu Sizes
Multi Homed Hosts and ARP
Multiple RDP Sessions
NBD Connections
Get that block device on yer network
nginx and cgit
Oneliners
Oukitel C8 Hacking
GNU Parted Simple tutorial
PinePhone Modem
PinePhone Sleeping
Piping for msmtp

PixivUtil2 Install
powersave.sh
Powerdown with hdparm
qemu-img Quick Reference
qemu PCI Passthrough
qemu USB Passthrough
Quick VMs with QEMU
Quick VMs with QEMU UEFI
Reset Password in Windows
Routing Traffic in Windows
rpcontinued.xpi for Pale Moon
Screen Sharing with Sway
sed and grep
sfdisk howto
Shell and File Descriptors
Spring Security Upgrade from 3 to 4
Strong Host Model for Linux
SQLite3 Compile DLL with VS2010
SSL and Lighttpd
std::string to std::wstring and back
Sway IPC with swaymsg
Terminal Codes and socat
Traffic Control (tc)
In-depth with background and quirky routing and networking configuration
Traffic Control Again
Simpler method using ifb
UEFI Booting from Shell
Update Own Number with AT Commands
Videos, Interesting
VirtualBox to QEMU
Quick start guide
Visual Studio Reminders
whatsmeow on Alpine in Docker
Word and VBA Header Protection
wpa_cli
X11 TrackMan Marble
X11 Programming

llama.cpp on Strix Halo

`llama.cpp` on AMD Ryzen AI Max+ 395 w/Radeon 8060S

Configure the iGPU memory in Advanced to be Auto and iGPU Memory Size to be 0.5GB. This will allow the ROCm software to manage the memory split.

Using Artix Linux with OpenRC with the rocm-hip-sdk installed.

pacman -S rocm-hip-sdk

Get llama.cpp from github:

git clone --depth=1 https://github.com/ggml-org/llama.cpp

Build with cmake:

cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j$(nproc)

Listing the ROCm devices:

# llama-cli --list-devices
Available devices:
  ROCm0: AMD Radeon 8060S Graphics (64042 MiB, 77959 MiB free)

Run with a model:

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 build/bin/llama-server --host 0.0.0.0 \
    --port 8080 \
    --flash-attn on \
    --cache-prompt \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --gpu-layers 99 \
    --ctx-size 32768 \
    --mmproj ../models/Huihui-Qwen3.6-35B-A3B-abliterated-mmproj-BF16.gguf \
    --model ../models/Huihui-Qwen3.6-35B-A3B-abliterated-Q8_0.gguf

Running Ministral-3-8B-Reasoning-2512-Q8_0 produces:

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 build/bin/llama-cli --jinja --gpu-layers 99 \
    --ctx-size 32768 --model ../models/Ministral-3-8B-Reasoning-2512-Q8_0.gguf
[ Prompt: 836.3 t/s | Generation: 24.5 t/s ]

Pretty quick!

Running `llama.cpp` with Radeon RX 9070 XT (`gfx1201`)

This section will focus on compiling llama.cpp for a system that contains a AMD Ryzen 9 9950X. Which has an on-die GPU (gfx1036) and the discrete GPU, gfx1201.

cmake -S . -B build -DGGML_HIP=ON -DGGML_RPC=ON -DAMDGPU_TARGETS=gfx1201,gfx1036 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j$(nproc)

The process works much better on the RX 9070 XT (as expected). To see the list of devices that can run the model:

build/bin/llama-cli --list-devices

This should display something like:

Available devices:
  ROCm0: AMD Radeon RX 9070 (16304 MiB, 15770 MiB free)
  ROCm1: AMD Radeon Graphics (15617 MiB, 30068 MiB free)

The machine here has 32GiB of system RAM... strange numbers but whatever!

Running the Ministral-3-8B-Reasoning-2512-Q8_0.gguf model like this:

llama-cli --device ROCm0 --n-gpu-layers 99 --ctx-size 32768 \
    --model Ministral-3-8B-Reasoning-2512-Q8_0.gguf

Produces a quick response:

[ Prompt: 1416.6 t/s | Generation: 61.4 t/s ]

Running a model with the `llama.cpp` RPC Server

Add the compile time switch:

-DGGML_RPC=ON

Then see the page in References for more detail ;-)

TODO: Add more detail here

References

https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md

Running an MCP Server for File System Access

On Artix Linux and using podman.

pacman -S podman crun

Update /etc/containers/registries.conf so that docker.io is searched when an unqualified image is present in Dockerfile or specified on the command line.

unqualified-search-registries = ["docker.io"]

TODO: I should prefer my own solution than this random implementation. openaiclient

References

Last Updated 2026-05-06
2026-05-20

Quick Links: Techie Stuff | General | Personal | Quick Reference

Quick Reference

llama.cpp on AMD Ryzen AI Max+ 395 w/Radeon 8060S

Running llama.cpp with Radeon RX 9070 XT (gfx1201)

Running a model with the llama.cpp RPC Server

References

Running an MCP Server for File System Access

References

`llama.cpp` on AMD Ryzen AI Max+ 395 w/Radeon 8060S

Running `llama.cpp` with Radeon RX 9070 XT (`gfx1201`)

Running a model with the `llama.cpp` RPC Server