process() block misses its
deadline. The audio callback has a hard budget — one block, ~5.3 ms at
256 samples / 48 kHz — and Hard Rule #6 (see CLAUDE.md) forbids
anything on that thread that could blow it: allocation, Mutex::lock,
RwLock, syscalls, file I/O, plugin instantiation, await. Break the
rule and the user hears a click, or worse, silence.
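The budget figure above is plain arithmetic: frames per block divided by the sample rate. A minimal sketch of that calculation follows; the function name `block_budget` is borrowed from the watchdog description on this page, and its signature here is an assumption, not the engine's actual API.

```rust
use std::time::Duration;

/// Time available to produce one block of `frames` samples at `sample_rate` Hz.
/// Hypothetical helper: integer math, rounded down to the nanosecond.
pub fn block_budget(frames: u32, sample_rate: u32) -> Duration {
    assert!(sample_rate > 0, "sample rate must be non-zero");
    Duration::from_nanos(u64::from(frames) * 1_000_000_000 / u64::from(sample_rate))
}
```

With these assumptions, `block_budget(256, 48_000)` comes out at roughly 5.33 ms, which matches the figure quoted above.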
This page covers the four tools that catch Hard Rule #6 violations,
each cutting the problem at a different point. Pick the cheapest one
that can fail in a way you’d care about.
| Tool | Catches | How it fires | Cost / availability |
|---|---|---|---|
| RtCheckAllocator | Any heap alloc on the audio thread | Aborts + prints a backtrace, instantly at the offending call | Debug builds only; zero in release |
| rt-watchdog | A block that blocks or hangs (mutex, syscall, runaway DSP, deadlock) | Warns + dumps the stuck thread’s stack | Debug builds, Unix only; zero in release |
| Wall-time histograms | Cumulative cost / “every block is at 90 %” | You drain + read a bucket distribution | Behind --features wall-time-histograms |
| Dropout counter | “Did any block overrun?” (no location) | Bumps an atomic + fires a UI event | Always on, all builds |
Which one?
./utils/build-rs stderr — not
the browser console). The allocator and the watchdog both print there.
RtCheckAllocator — allocations
What it is. A #[global_allocator] that wraps the system allocator
and aborts the process the instant any allocation, deallocation, or
reallocation happens while the audio thread is inside process().
Allocation is never legitimate on the audio thread, so the response is
maximal: stop the world at the exact call.
rust/crates/engine/songbird-engine/src/rt_alloc_check.rs
How it works. process() enters an AudioThreadGuard at the top
(callback_state.rs), which bumps a per-thread depth counter. The
allocator checks that counter on every alloc; non-zero ⇒ violation. On a
violation it writes a fixed message to stderr (allocation-free), then
disarms the guard, captures a Backtrace, prints it, and aborts:
callback_state.rs:NNNN frame tells you
which process() step owns it. Fix by moving the allocation off the
audio thread — pre-allocate a buffer, or push the work to a worker and
hand the result across via SPSC.
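The depth-counter mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of rt_alloc_check.rs: the thread-local name `AUDIO_DEPTH` and the `check` helper are hypothetical, and the sketch only counts violations in `AUDIO_THREAD_ALLOCS` where the real tool prints a backtrace and aborts. It also installs the allocator unconditionally, whereas the source gates it on debug_assertions.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

thread_local! {
    // Nesting depth of process() on this thread. Const-initialised so that
    // reading it from inside the allocator does not itself allocate.
    static AUDIO_DEPTH: Cell<u32> = const { Cell::new(0) };
}

/// Allocator calls observed while a guard was live on the calling thread.
pub static AUDIO_THREAD_ALLOCS: AtomicU64 = AtomicU64::new(0);

/// RAII marker: the current thread is inside process() while this is alive.
pub struct AudioThreadGuard(());

impl AudioThreadGuard {
    pub fn enter() -> Self {
        AUDIO_DEPTH.with(|d| d.set(d.get() + 1));
        AudioThreadGuard(())
    }
}

impl Drop for AudioThreadGuard {
    fn drop(&mut self) {
        AUDIO_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

pub struct RtCheckAllocator;

impl RtCheckAllocator {
    #[inline]
    fn check() {
        // try_with: the TLS slot may already be gone during thread teardown.
        let armed = AUDIO_DEPTH.try_with(|d| d.get() > 0).unwrap_or(false);
        if armed {
            // Sketch only: count the violation instead of reporting and aborting.
            AUDIO_THREAD_ALLOCS.fetch_add(1, Ordering::Relaxed);
        }
    }
}

unsafe impl GlobalAlloc for RtCheckAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        Self::check();
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        Self::check();
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        Self::check();
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: RtCheckAllocator = RtCheckAllocator;
```

The check is a thread-local read plus a branch, which is why the armed cost stays negligible next to the allocation itself. Threads that never enter a guard see a depth of zero and are unaffected.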
Cost & gating. The guard and the allocator install are both
#[cfg(debug_assertions)], so release builds compile it out entirely
and never abort. When armed, each alloc pays one TLS load + one branch
(~1–2 ns) — invisible against real allocator cost. The violation counter
AUDIO_THREAD_ALLOCS is readable for post-mortems.
rt-watchdog — blocking & expensive blocks
What it is. The time-domain sibling of the allocator. Allocations are instant and caught at the call; blocking (a contended Mutex::lock, a
read() syscall, an accidentally-O(N) loop, or a flat-out deadlock)
doesn’t allocate, so the allocator can’t see it. The watchdog catches a
block by its wall-clock cost and shows you exactly where the audio
thread was stuck.
rust/crates/engine/songbird-engine/src/rt_watchdog.rs
How it works.
- At the top of process() the audio thread publishes a per-block deadline = OVERRUN_FACTOR × block_budget (default 3×) via RtWatchdogGuard: lock-free atomic stores only, audio-thread-safe.
- A dedicated watchdog thread polls. If a block is still in flight past its deadline, it sends SIGUSR2 to the audio thread.
- The signal handler runs on the audio thread and walks its own stack (async-signal-safe: no alloc, no lock; it only records instruction pointers). Symbolization happens back on the watchdog thread.
- The watchdog logs the stalled stack.
callback_state.rs:NNNN frame names the step.
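The deadline handshake between the audio thread and the watchdog thread can be sketched with a single atomic. This is a simplified illustration, not the code in rt_watchdog.rs: `BlockDeadline`, `begin_block`, and `is_overdue` are hypothetical names, time is passed in as plain nanosecond counts so the logic is easy to follow, and the SIGUSR2 delivery and stack walk are left out entirely.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Multiple of the block budget a block may take before it counts as stalled.
pub const OVERRUN_FACTOR: u64 = 3;

/// Sentinel meaning "no block in flight".
const IDLE: u64 = 0;

/// Slot shared between the audio thread (writer) and the watchdog (reader).
pub struct BlockDeadline {
    deadline_ns: AtomicU64,
}

/// Clears the published deadline when the block finishes.
pub struct RtWatchdogGuard<'a> {
    slot: &'a BlockDeadline,
}

impl BlockDeadline {
    pub const fn new() -> Self {
        Self { deadline_ns: AtomicU64::new(IDLE) }
    }

    /// Audio thread: publish the deadline for the block starting at `now_ns`.
    /// One atomic store, no locks and no allocation.
    pub fn begin_block(&self, now_ns: u64, block_budget_ns: u64) -> RtWatchdogGuard<'_> {
        let allowance = OVERRUN_FACTOR.saturating_mul(block_budget_ns);
        // max(1) keeps a real deadline distinct from the IDLE sentinel.
        let deadline = now_ns.saturating_add(allowance).max(1);
        self.deadline_ns.store(deadline, Ordering::Release);
        RtWatchdogGuard { slot: self }
    }

    /// Watchdog thread: is a block still in flight past its deadline?
    pub fn is_overdue(&self, now_ns: u64) -> bool {
        let deadline = self.deadline_ns.load(Ordering::Acquire);
        deadline != IDLE && now_ns > deadline
    }
}

impl Drop for RtWatchdogGuard<'_> {
    fn drop(&mut self) {
        self.slot.deadline_ns.store(IDLE, Ordering::Release);
    }
}
```

In this sketch a polling thread would call `is_overdue` with its own clock reading and, on `true`, trigger whatever capture step it has. A block that finishes in time drops the guard, which resets the slot so the watchdog sees nothing in flight.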
Why warn, not abort. Unlike an allocation, a long block is sometimes
legitimately transient — the first block, plugin warm-up, a session
hot-swap. So the watchdog logs loudly (rate-limited to ~1/s) but does
not abort. The 3× default is generous enough that only a genuinely
pathological block trips it.
Workflow.
[rt-watchdog] line in the Rust
terminal, fix the named call site, re-run until it stops printing.
Limits. It won’t help with glitches that aren’t a single long block
(OS-level underruns), and on Linux the audio thread is SCHED_FIFO, so a
single-core busy-loop hang could starve the normal-priority watchdog
(fine on multicore and for sleeping blocks). For the “every block is a
little too slow” case, reach for the histograms instead.
Wall-time histograms — death by a thousand cuts
What it is. Lock-free per-block timing recorded into power-of-two µs buckets. Where the watchdog catches one pathological spike, the histogram reveals a distribution, which is useful when there’s no single offender but the steady-state cost is creeping toward the budget.
rust/crates/engine/songbird-engine/src/wall_time_histogram.rs
What’s recorded. Two global histograms, behind the feature flag:
- WRAP_BLOCK_HISTOGRAM: the loop-wrap branch of process() (does the loop-aware swap path stay cheap on heavy projects, or is the standby fallback firing?).
- FFT_BATCH_HISTOGRAM: the Signalsmith / WSOLA stretch batch (is the stretcher CPU the bottleneck vs. disk I/O?).
record() is ~30 ns (one elapsed, one leading_zeros, one
relaxed fetch_add) and tracks count, per-bucket distribution, and the
worst-case max_us since the last drain.
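A power-of-two bucket histogram of this kind can be sketched as below. It is an illustration under assumptions rather than the code in wall_time_histogram.rs: the bucket count, the `Snapshot` type, the bucket boundaries, and the signature of `record` are all hypothetical, and this version uses three relaxed atomic operations per sample for clarity, where the real `record()` is described as doing less.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Assumed bucket count; the last bucket absorbs everything larger.
const BUCKETS: usize = 32;

pub struct WallTimeHistogram {
    buckets: [AtomicU64; BUCKETS],
    count: AtomicU64,
    max_us: AtomicU64,
}

/// Plain-value copy of the histogram taken at drain time.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snapshot {
    pub count: u64,
    pub max_us: u64,
    pub buckets: [u64; BUCKETS],
}

/// Bucket 0 holds 0 µs; bucket i >= 1 holds [2^(i-1), 2^i) µs.
fn bucket_index(us: u64) -> usize {
    let bits = (u64::BITS - us.leading_zeros()) as usize;
    bits.min(BUCKETS - 1)
}

impl WallTimeHistogram {
    pub fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
            count: AtomicU64::new(0),
            max_us: AtomicU64::new(0),
        }
    }

    /// Lock-free and allocation-free: relaxed atomics only.
    pub fn record(&self, elapsed: Duration) {
        let us = u64::try_from(elapsed.as_micros()).unwrap_or(u64::MAX);
        self.buckets[bucket_index(us)].fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
        self.max_us.fetch_max(us, Ordering::Relaxed);
    }

    /// Read every counter and zero it, so the next window starts clean.
    pub fn snapshot_and_reset(&self) -> Snapshot {
        Snapshot {
            count: self.count.swap(0, Ordering::Relaxed),
            max_us: self.max_us.swap(0, Ordering::Relaxed),
            buckets: std::array::from_fn(|i| self.buckets[i].swap(0, Ordering::Relaxed)),
        }
    }
}
```

Under this bucket layout a 5300 µs block lands in bucket 13 (4096 to 8191 µs). Because the drain swaps each counter separately, a snapshot taken while recording continues may be off by a sample between fields, which is acceptable for a profiling aid.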
How to use it. Recording is off by default (zero cost otherwise).
Turn it on, then read the log file:
.profiling/wall-time.log (gitignored).
Reproduce the suspect path, then read the file — no UI or sync round-trip,
and it’s equally easy for a human to skim or an agent to parse:
max_us near the block budget,
~5300 µs at 256/48 k) is your signal that the steady-state cost — not a
one-off spike — is the dropout source. Idle windows (no samples) are
skipped, so the file only grows while something is being measured.
The reporter is started from main (a no-op unless the feature is
compiled in); the underlying API is WallTimeHistogram::snapshot_and_reset()
if you need to drain the histograms from somewhere else.
The dropout counter — the always-on signal
The cheapest detector runs in every build. After the DSP work, process() compares elapsed time to the block budget; on overrun it
bumps snapshot.dropout_count and fires a rate-limited Dropout engine
event for the UI.
rust/crates/engine/songbird-engine/src/controller/callback_state.rs
(search for dropout_count)
This is a yes/no, how-often signal — it confirms a dropout happened
and surfaces it in the UI, but tells you nothing about where. Treat it
as the smoke alarm: once it goes off, switch to the watchdog (for a
single spike or a hang) or the histograms (for a creeping distribution)
to find the fire. Note it can’t fire at all for a true hang — process()
never returns to reach the check — which is precisely the gap the
rt-watchdog fills.
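The overrun check and its rate-limited event can be sketched as below. This is a hedged illustration, not the code in callback_state.rs: the `DropoutDetector` type, the `end_block` method, and the one-event-per-interval policy are assumptions. Only the idea of bumping a `dropout_count` atomic and emitting a rate-limited event comes from the description above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Sentinel meaning "no event emitted yet".
const NEVER: u64 = u64::MAX;

/// Hypothetical always-on overrun detector, written for a single audio thread.
pub struct DropoutDetector {
    dropout_count: AtomicU64,
    last_event_ns: AtomicU64,
    min_event_gap_ns: u64,
}

impl DropoutDetector {
    pub fn new(min_event_gap: Duration) -> Self {
        Self {
            dropout_count: AtomicU64::new(0),
            last_event_ns: AtomicU64::new(NEVER),
            min_event_gap_ns: u64::try_from(min_event_gap.as_nanos()).unwrap_or(u64::MAX),
        }
    }

    /// Call after the DSP work. Returns true when a UI event should be sent.
    /// Every overrun is counted; events are limited to one per gap.
    pub fn end_block(&self, elapsed: Duration, budget: Duration, now_ns: u64) -> bool {
        if elapsed <= budget {
            return false;
        }
        self.dropout_count.fetch_add(1, Ordering::Relaxed);

        let last = self.last_event_ns.load(Ordering::Relaxed);
        if last != NEVER && now_ns.saturating_sub(last) < self.min_event_gap_ns {
            return false; // counted, but too soon for another event
        }
        self.last_event_ns.store(now_ns, Ordering::Relaxed);
        true
    }

    pub fn dropout_count(&self) -> u64 {
        self.dropout_count.load(Ordering::Relaxed)
    }
}
```

The check runs only after the block returns, so a block that never returns is never counted. That is the gap the watchdog covers.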
See also
- handoffs/perf/dropout_sources.md: audited symptom→cause map of real dropout sources with file:line evidence. Start here when a glitch correlates with a specific action.
- Profiling: samply / dhat / Instruments for the release-build “where’s the CPU/RSS going” question.
- Hard Rule #6 in CLAUDE.md: the constraint all four tools enforce.