Crown Citadel Performance Lab

Ciru Inference Lab

NPU Bench

Strix Halo local inference benchmarks: throughput, memory, serving, tuning, and quality.

Updated NixOS • Linux 7.0.1 RADV STRIX_HALO

Articles

Articles about running AI locally: hardware, settings, model behavior, tooling, and lessons from the lab.

Local AI Works When the Stack Is Built for It

A tuned Crown Citadel local stack finished a messy invoice sandbox in 5:49, while an Ollama + OpenClaw run stalled at 27:18 and ran out of context.

Strix Halo Crown Citadel Local AI


Your Local Model Settings Matter More Than You Think

How tuned llama.cpp settings changed prompt processing and generation speed on the same Strix Halo machine, with paired before/after bars and percent lifts.

Strix Halo llama.cpp Tuning

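The percent lifts in that article compare tuned and untuned runs. Assuming the usual definition (relative change against the untuned baseline), the arithmetic is a one-liner. `percent_lift` is a hypothetical helper, and the throughput pairs are placeholders, not lab results.

```python
def percent_lift(before: float, after: float) -> float:
    """Relative change from the untuned baseline, in percent (positive = faster)."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return (after - before) / before * 100.0

# Hypothetical before/after pairs in tokens/s (not measured data).
pairs = {"prompt processing": (400.0, 520.0), "generation": (40.0, 46.0)}
for metric, (before, after) in pairs.items():
    print(f"{metric}: {percent_lift(before, after):+.1f}%")
# → prompt processing: +30.0%
# → generation: +15.0%
```

Because the lift is relative, a small absolute gain in decode speed can show a larger percentage than a big absolute gain in prompt processing.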

UMA Memory / Speed Frontier

The x-axis shows peak effective UMA memory, the y-axis shows standalone decode speed, and color shows throughput at the selected context length.
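A memory/speed frontier is usually read as a Pareto frontier: a run is on it if no other run uses the same or less memory while decoding at least as fast. The sketch below assumes that reading; the `Run` fields and run names are hypothetical, not the dashboard's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    name: str
    peak_mem_gib: float  # peak effective UMA memory (x-axis)
    decode_tps: float    # standalone decode tokens/s (y-axis)

def frontier(runs: list[Run]) -> list[Run]:
    """Return runs not dominated on (lower memory, higher decode)."""
    # Sort by memory ascending; among equal-memory runs, fastest first.
    ordered = sorted(runs, key=lambda r: (r.peak_mem_gib, -r.decode_tps))
    best = float("-inf")
    kept: list[Run] = []
    for r in ordered:
        # Keep a run only if it beats every cheaper-or-equal run seen so far.
        if r.decode_tps > best:
            kept.append(r)
            best = r.decode_tps
    return kept

# Hypothetical runs (not lab data).
runs = [Run("a", 20, 50), Run("b", 30, 45), Run("c", 40, 60), Run("d", 60, 58)]
print([r.name for r in frontier(runs)])  # → ['a', 'c']
```

Run "b" drops out because "a" is both smaller and faster; "d" drops out because "c" is smaller and faster.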


Prompt / Prefill vs Decode

The x-axis shows prompt-processing (prefill) throughput at the selected context, the y-axis shows standalone decode speed, and points are grouped by model family.

Context Curves

Only models with at least four measured context points, including at least one at 64k or longer, are drawn, so the lines show real scaling behavior rather than one-off tests.
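The drawing rule can be written as a small predicate. This is a sketch, assuming each model's results are a mapping from context length in tokens to throughput (`None` for unmeasured cells) and that "64k" means 65,536 tokens; `draws_curve` and its parameters are hypothetical names, not the dashboard's code.

```python
from typing import Optional

def draws_curve(points: dict[int, Optional[float]],
                min_points: int = 4,
                long_ctx: int = 64 * 1024) -> bool:
    """True if a model has >= min_points measured contexts, one of them >= long_ctx."""
    # Ignore cells that were never measured.
    measured = [ctx for ctx, tps in points.items() if tps is not None]
    return len(measured) >= min_points and any(ctx >= long_ctx for ctx in measured)

# Hypothetical throughput values in tokens/s.
short_only = {4096: 900.0, 8192: 850.0, 16384: 780.0, 32768: 650.0}
with_long = {**short_only, 65536: 480.0}
sparse_long = {65536: 480.0, 131072: 300.0}
print(draws_curve(short_only), draws_curve(with_long), draws_curve(sparse_long))
# → False True False
```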

Benchmark Atlas

Full benchmark matrix. Context columns show prompt-processing or combined prompt+generation throughput; standalone decode is kept separate in the TG (token generation) column.


Profile	Backend	KV / Shape	4k	8k	16k	32k	64k	80k	100k	128k	TG	Memory	Measurements
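Combined prompt+generation cells and the TG column measure different things. Assuming a combined figure is total tokens over total wall time with no serving overhead, it can be estimated from the two phase speeds; `combined_tps` is a hypothetical helper and the inputs are placeholders.

```python
def combined_tps(prompt_tokens: int, gen_tokens: int,
                 prompt_tps: float, decode_tps: float) -> float:
    """Total tokens over total time for a prompt+generation test (no overheads)."""
    elapsed_s = prompt_tokens / prompt_tps + gen_tokens / decode_tps
    return (prompt_tokens + gen_tokens) / elapsed_s

# Hypothetical 4k-context cell: 4096 prompt tokens at 1024 t/s (4 s),
# then 256 generated tokens at 32 t/s (8 s).
print(round(combined_tps(4096, 256, 1024.0, 32.0), 1))  # → 362.7
```

The combined number lands far above the 32 t/s decode rate because prefill tokens dominate the token count, which is why decode gets its own column.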

MTP Serving Lab

Speculative-decoding serving sweeps with MTP (multi-token prediction) drafts: draft settings, acceptance rate, request time, decode speed, and memory deltas.

Run	Config	Total	Prompt	Decode	Acceptance	Memory
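Acceptance rate links draft settings to decode speed. The sketch assumes acceptance is accepted draft tokens divided by drafted tokens, and uses the standard speculative-decoding estimate of tokens emitted per verification step, which assumes every draft token is accepted independently with probability `alpha`. Both helper names are hypothetical, and the lab's harness may define acceptance differently.

```python
def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens the target model accepted."""
    return accepted / drafted if drafted else 0.0

def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verify step with k draft tokens,
    assuming independent acceptance with probability alpha per token."""
    if alpha >= 1.0:
        return k + 1.0  # every draft accepted, plus the target's own token
    return (1 - alpha ** (k + 1)) / (1 - alpha)

print(acceptance_rate(30, 40))                        # → 0.75
print(round(expected_tokens_per_step(0.8, 3), 3))     # → 2.952
```

Longer drafts only pay off while acceptance stays high: with `alpha = 0` each step still yields one token, but the draft work is wasted.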

Serving Request Runs

End-to-end local serving measurements: first-token delay, total request time, prompt size, generated output, and sampled memory.

Run	First token	Total	Prompt	Output	Memory
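Phase speeds can be roughly backed out of these columns. This sketch assumes the whole first-token delay is prompt processing and that the remaining time is spent decoding the rest of the output; `RequestRun` and its fields are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass

@dataclass
class RequestRun:
    ttft_s: float       # first-token delay, seconds
    total_s: float      # total request time, seconds
    prompt_tokens: int
    output_tokens: int

    def prefill_tps(self) -> float:
        # Approximation: treats the entire first-token delay as prefill.
        return self.prompt_tokens / self.ttft_s

    def decode_tps(self) -> float:
        # Tokens after the first one, over the time after the first token.
        decode_time = self.total_s - self.ttft_s
        if decode_time <= 0:
            return float("nan")
        return (self.output_tokens - 1) / decode_time

run = RequestRun(ttft_s=2.0, total_s=12.0, prompt_tokens=8000, output_tokens=401)
print(run.prefill_tps(), run.decode_tps())  # → 4000.0 40.0
```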

Server Tuning Takeaways

Short conclusions from focused real-prompt tuning runs. This is not a launch-command reference; it only shows which of the tested directions is worth adopting.

Scenario	Use	Evidence	Caveat

Quality Suite Leaderboard

Completed coding and repair rows from the quality benchmark store. Suites require real task coverage before they can appear here; failed or incomplete rows stay out of the chart.


Profile	Test	Score	Evidence	Run
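The inclusion rule for this leaderboard amounts to a filter before sorting. The sketch assumes each row carries a status and a count of tasks run versus tasks in the suite, and that "real task coverage" means full coverage by default; `QualityRow`, its status values, and `leaderboard` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QualityRow:
    profile: str
    suite: str
    status: str          # hypothetical values: "completed", "failed", "running"
    tasks_run: int
    tasks_in_suite: int
    score: float

def leaderboard(rows: list[QualityRow], min_coverage: float = 1.0) -> list[QualityRow]:
    """Keep completed rows with enough task coverage, best score first."""
    eligible = [
        r for r in rows
        if r.status == "completed"
        and r.tasks_in_suite > 0
        and r.tasks_run / r.tasks_in_suite >= min_coverage
    ]
    return sorted(eligible, key=lambda r: r.score, reverse=True)

# Hypothetical rows (not lab data).
rows = [
    QualityRow("model-a", "coding", "completed", 50, 50, 0.72),
    QualityRow("model-b", "coding", "completed", 30, 50, 0.90),  # partial coverage
    QualityRow("model-c", "coding", "failed", 50, 50, 0.80),
    QualityRow("model-d", "repair", "completed", 40, 40, 0.65),
]
print([r.profile for r in leaderboard(rows)])  # → ['model-a', 'model-d']
```

Filtering before ranking keeps a high score from a partial run (model-b) from outranking complete runs.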

HermesAgent-20 Sweep

Official 20-scenario Hermes agent behavior runs. The score is the verifier's average across memory, workspace, skills, scheduling, delegation, recovery, and approval-boundary scenarios.
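A minimal sketch of the sweep score, assuming each scenario gets one verifier score and all 20 are weighted equally; `sweep_score` and the scenario names are hypothetical.

```python
from statistics import fmean

EXPECTED_SCENARIOS = 20

def sweep_score(verifier_scores: dict[str, float]) -> float:
    """Equal-weighted mean of verifier scores over a complete 20-scenario run."""
    if len(verifier_scores) != EXPECTED_SCENARIOS:
        # An incomplete run would make the average incomparable with others.
        raise ValueError(f"expected {EXPECTED_SCENARIOS} scenarios, got {len(verifier_scores)}")
    return fmean(verifier_scores.values())

# Hypothetical run: 15 scenarios pass fully, 5 fail.
scores = {f"scenario-{i:02d}": (1.0 if i < 15 else 0.0) for i in range(20)}
print(sweep_score(scores))  # → 0.75
```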


Race Gallery

Each card explains what the race tests before you open it. These are focused head-to-head pages for speculative decoding, long-output behavior, task correctness, run health, and memory pressure.

Method Notes

How to read the dashboard without knowing the benchmark harness internals.