Crown Citadel Performance Lab

Ciru Inference Lab

Strix Halo local inference benchmarks: throughput, memory, serving, tuning, and quality.

NixOS • Linux 7.0.1 • RADV STRIX_HALO

Articles

Articles about running AI locally: hardware, settings, model behavior, tooling, and lessons from the lab.

Local AI Works When the Stack Is Built for It

A tuned Crown Citadel local stack finished a messy invoice sandbox in 5:49, while an Ollama + OpenClaw run stalled at 27:18 and ran out of context.


Your Local Model Settings Matter More Than You Think

How tuned llama.cpp settings changed prompt processing and generation speed on the same Strix Halo machine, with paired before/after bars and percent lifts.


UMA Memory / Speed Frontier

Peak effective UMA memory on x, standalone decode on y, and selected context speed as color.

Prompt / Prefill vs Decode

Context-side prompt processing on x and standalone decode on y, grouped by model family.

Context Curves

Only models with at least four measured context points and one 64k+ result are drawn, so the lines represent real scaling behavior rather than one-off tests.
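The inclusion rule above can be sketched as a small filter. This is a minimal illustration, not the dashboard's actual code: the function name `has_drawable_curve` and the data layout (a mapping from context length in tokens to measured throughput) are hypothetical, and "64k" is assumed to mean 65,536 tokens.

```python
MIN_POINTS = 4             # "at least four measured context points"
LONG_CONTEXT = 64 * 1024   # assumption: "64k" means 65,536 tokens


def has_drawable_curve(points):
    """Return True if a model qualifies for the Context Curves chart.

    `points` is a hypothetical mapping of context length (tokens) to
    measured throughput (tokens/s); None marks a context that was not measured.
    """
    measured = [ctx for ctx, tps in points.items() if tps is not None]
    has_long_result = any(ctx >= LONG_CONTEXT for ctx in measured)
    return len(measured) >= MIN_POINTS and has_long_result
```

A model with four measured points that stops at 32k would be left out, as would a model with a 128k result but only three points in total.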

Benchmark Atlas

Full benchmark matrix. Context columns show prompt-processing or combined prompt+generation throughput, while standalone decode is kept separate in the TG column.

Columns: Profile | Backend | KV / Shape | 4k | 8k | 16k | 32k | 64k | 80k | 100k | 128k | TG | Memory | Measurements

MTP Serving Lab

Speculative serving sweeps: draft settings, acceptance rate, request time, decode speed, and memory deltas.

Columns: Run | Config | Total | Prompt | Decode | Acceptance | Memory
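For readers unfamiliar with the Acceptance column, acceptance rate in speculative decoding is commonly defined as the share of drafted tokens that the target model accepts. The sketch below assumes that definition; the lab's harness may count differently, and the function and parameter names are hypothetical.

```python
def acceptance_rate(drafted_tokens, accepted_tokens):
    """Fraction of draft tokens accepted by the target model (0.0 to 1.0).

    Assumes the common definition accepted / drafted; the names are illustrative.
    """
    if drafted_tokens < 0 or not 0 <= accepted_tokens <= drafted_tokens:
        raise ValueError("accepted_tokens must be between 0 and drafted_tokens")
    if drafted_tokens == 0:
        return 0.0  # nothing was drafted, so nothing could be accepted
    return accepted_tokens / drafted_tokens
```

Under this definition, a run that drafts 200 tokens and has 150 accepted reports an acceptance rate of 0.75.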

Serving Request Runs

End-to-end local serving measurements: first-token delay, total request time, prompt size, generated output, and sampled memory.

Columns: Run | First token | Total | Prompt | Output | Memory
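The measurements in this table can be turned into approximate rates. The sketch below is one way to do that, assuming first-token delay is dominated by prompt processing and that the first output token arrives at that moment; the `RequestRun` type and its field names are hypothetical, not the lab's schema.

```python
from dataclasses import dataclass


@dataclass
class RequestRun:
    # Hypothetical record mirroring the table columns.
    first_token_s: float   # delay until the first output token
    total_s: float         # total request time
    prompt_tokens: int
    output_tokens: int


def prefill_rate(run):
    """Approximate prompt-processing speed in tokens/s.

    Treats the whole first-token delay as prefill, which overstates
    prefill time if queueing or network delay is included.
    """
    return run.prompt_tokens / run.first_token_s


def decode_rate(run):
    """Approximate decode speed in tokens/s after the first token."""
    decode_time = run.total_s - run.first_token_s
    remaining = run.output_tokens - 1  # first token is counted at first_token_s
    if decode_time <= 0 or remaining <= 0:
        return 0.0
    return remaining / decode_time
```

Separating the two rates this way shows whether a slow request is bound by prompt size or by generation length.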

Server Tuning Takeaways

Short conclusions from focused real-prompt tuning runs. This is not a launch-command reference; it only shows which tested direction is worth using.

Columns: Scenario | Use | Evidence | Caveat

Quality Suite Leaderboard

Completed coding and repair rows from the quality benchmark store. Suites require real task coverage before they can appear here; failed or incomplete rows stay out of the chart.

Columns: Profile | Test | Score | Evidence | Run

HermesAgent-20 Sweep

Official 20-scenario Hermes agent behavior runs. Score is verifier average across memory, workspace, skills, scheduling, delegation, recovery, and approval-boundary scenarios.
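As a rough illustration of how a verifier average could be computed, the sketch below takes an unweighted mean over per-scenario verifier scores. The unweighted mean, the 0 to 1 score range, and the scenario identifiers are assumptions; the suite may weight or scale scenarios differently.

```python
from statistics import mean


def sweep_score(scenario_scores):
    """Unweighted mean of per-scenario verifier scores.

    `scenario_scores` is a hypothetical mapping of scenario id -> score,
    assumed here to lie in the range 0.0 to 1.0.
    """
    if not scenario_scores:
        raise ValueError("no scenario scores to average")
    return mean(list(scenario_scores.values()))
```

With this approach every scenario counts equally, so one failed recovery scenario lowers the score by the same amount as one failed memory scenario.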

Quality context

Race Gallery

Each card explains what the race tests before you open it. These are focused head-to-head pages for speculative decoding, long-output behavior, task correctness, run health, and memory pressure.

Method Notes

How to read the dashboard without knowing the benchmark harness internals.