Tracking Dropouts

A dropout is what the user hears when one process() block misses its deadline. The audio callback has a hard budget — one block, ~5.3 ms at 256 samples / 48 kHz — and Hard Rule #6 (see CLAUDE.md) forbids anything on that thread that could blow it: allocation, Mutex::lock, RwLock, syscalls, file I/O, plugin instantiation, await. Break the rule and the user hears a click, or worse, silence. This page covers the four tools that catch Hard Rule #6 violations, each cutting the problem at a different point. Pick the cheapest one that can fail in a way you’d care about.
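The ~5.3 ms figure is just frames divided by sample rate. A minimal sketch of that arithmetic, where `block_budget_us` is a hypothetical helper name and not something taken from the engine:

```rust
/// Hypothetical helper (not an engine API): the time one audio block is
/// allowed to take, in microseconds, for a given block size and sample rate.
pub fn block_budget_us(frames: u32, sample_rate_hz: u32) -> f64 {
    // frames / sample_rate gives seconds; scale to microseconds.
    f64::from(frames) * 1_000_000.0 / f64::from(sample_rate_hz)
}
```

For 256 frames at 48 kHz this gives roughly 5333 µs. Halving the block size halves the budget, which is why small-buffer setups surface Hard Rule #6 violations first.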

Tool	Catches	How it fires	Cost / availability
RtCheckAllocator	Any heap alloc on the audio thread	Aborts + prints a backtrace, instantly at the offending call	Debug builds only; zero in release
rt-watchdog	A block that blocks or hangs (mutex, syscall, runaway DSP, deadlock)	Warns + dumps the stuck thread’s stack	Debug builds, Unix only; zero in release
Wall-time histograms	Cumulative cost / “every block is at 90 %”	You drain + read a bucket distribution	Behind `--features wall-time-histograms`
Dropout counter	“Did any block overrun?” (no location)	Bumps an atomic + fires a UI event	Always on, all builds

Which one?

What's the symptom?
├─ App aborted with "AUDIO-THREAD ALLOCATION DETECTED"
│     → RtCheckAllocator already caught it. Read the backtrace it printed.
│
├─ Click coincides with a specific action (open plugin, load sample, switch device)
│     → rt-watchdog. The action triggers a blocking call; the watchdog
│       names it. (See the symptom map in handoffs/perf/dropout_sources.md.)
│
├─ Engine froze / transport stopped advancing and never recovered
│     → rt-watchdog. A true hang never returns, so only the watchdog can
│       tell you where the thread is parked.
│
├─ Constant low-level crackle, no single obvious cause
│     → Wall-time histograms. Likely many blocks near budget, not one
│       spike. The histogram shows the distribution.
│
└─ "Is this even a dropout?" / want a yes/no signal in the UI
      → Dropout counter (snapshot.dropout_count + the Dropout event).

When you only know “something glitches,” start the app, reproduce, and glance at the Rust terminal (the ./utils/build-rs stderr — not the browser console). The allocator and the watchdog both print there.

RtCheckAllocator — allocations

What it is. A #[global_allocator] that wraps the system allocator and aborts the process the instant any allocation, deallocation, or reallocation happens while the audio thread is inside process(). Allocation is never legitimate on the audio thread, so the response is maximal: stop the world at the exact call.

rust/crates/engine/songbird-engine/src/rt_alloc_check.rs

How it works. process() enters an AudioThreadGuard at the top (callback_state.rs), which bumps a per-thread depth counter. The allocator checks that counter on every alloc; non-zero ⇒ violation. On a violation it writes a fixed message to stderr (allocation-free), then disarms the guard, captures a Backtrace, prints it, and aborts:

[RtCheckAllocator] AUDIO-THREAD ALLOCATION DETECTED - Hard Rule #6 violation. Aborting.

   0: songbird_engine::rt_alloc_check::on_violation
   1: <RtCheckAllocator as GlobalAlloc>::alloc
   2: alloc::vec::Vec<T>::reserve                  ← what allocated
   3: songbird_engine::controller::callback_state::CallbackState::check_drum_pad_loads
          at .../controller/callback_state.rs:1534
   4: songbird_engine::controller::callback_state::CallbackState::process
   ...

How to read it. Skip frames 0–1 (the allocator itself). Frame 2 is the allocating call; the first callback_state.rs:NNNN frame tells you which process() step owns it. Fix by moving the allocation off the audio thread: pre-allocate a buffer, or push the work to a worker and hand the result across via SPSC.

Cost & gating. The guard and the allocator install are both #[cfg(debug_assertions)], so release builds compile them out entirely and never abort. When armed, each alloc pays one TLS load + one branch (~1–2 ns), which is invisible against real allocator cost. The violation counter AUDIO_THREAD_ALLOCS is readable for post-mortems.
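The depth-counter check described in this section can be sketched in a few lines. This is an illustration under assumptions, not the code in rt_alloc_check.rs: the names `AudioScope`, `CountingAllocator`, `DEPTH`, and `VIOLATIONS` are hypothetical, the sketch only inspects `alloc`, and it counts violations instead of printing a backtrace and aborting, so the behaviour can be observed without killing the process.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

thread_local! {
    // Per-thread nesting depth; non-zero means "inside the audio callback".
    // Const-initialised and without a destructor, so on platforms with native
    // TLS, reading it from inside the allocator does not itself allocate.
    static DEPTH: Cell<u32> = const { Cell::new(0) };
}

/// Hypothetical violation counter (the real allocator aborts instead).
pub static VIOLATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical RAII guard: created at the top of the callback, dropped at the end.
pub struct AudioScope;

impl AudioScope {
    pub fn enter() -> Self {
        DEPTH.with(|d| d.set(d.get() + 1));
        AudioScope
    }
}

impl Drop for AudioScope {
    fn drop(&mut self) {
        DEPTH.with(|d| d.set(d.get() - 1));
    }
}

/// Wraps the system allocator and checks the depth counter on every alloc.
pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // One TLS load and one branch on the hot path.
        if DEPTH.try_with(|d| d.get()).unwrap_or(0) != 0 {
            VIOLATIONS.fetch_add(1, Ordering::Relaxed);
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;
```

Because the counter is thread-local, allocations on worker or UI threads pass through untouched; only the thread that currently holds a guard is checked.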

rt-watchdog — blocking & expensive blocks

What it is. The time-domain sibling of the allocator. Allocations are instant and caught at the call; blocking (a contended Mutex::lock, a read() syscall, an accidentally-O(N) loop, or a flat-out deadlock) doesn’t allocate, so the allocator can’t see it. The watchdog catches a block by its wall-clock cost and shows you exactly where the audio thread was stuck.

rust/crates/engine/songbird-engine/src/rt_watchdog.rs

How it works.

At the top of process() the audio thread publishes a per-block deadline = OVERRUN_FACTOR × block_budget (default 3×) via RtWatchdogGuard — lock-free atomic stores only, audio-thread-safe.
A dedicated watchdog thread polls. If a block is still in flight past its deadline, it sends SIGUSR2 to the audio thread.
The signal handler runs on the audio thread and walks its own stack (async-signal-safe: no alloc, no lock — it only records instruction pointers). Symbolization happens back on the watchdog thread.
The watchdog logs the stalled stack.

Signalling a thread to sample its own stack is the technique sampling profilers and crash handlers use — it’s the only way to get a backtrace of the stuck thread from another thread.
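The deadline hand-off in the first two steps can be sketched with a single atomic. This is a simplified illustration, not the code in rt_watchdog.rs: `BlockDeadline`, `InFlight`, and `overdue_by` are hypothetical names, timestamps are plain nanosecond counts supplied by the caller (assumed non-zero, since 0 is reserved for "idle"), and the signal delivery and stack walk are left out entirely.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Multiplier from the text: deadline = OVERRUN_FACTOR x block budget.
pub const OVERRUN_FACTOR: u64 = 3;

/// Sentinel meaning "no block in flight" (an assumption of this sketch).
const IDLE: u64 = 0;

/// Shared between the audio thread (writer) and the watchdog thread (reader).
pub struct BlockDeadline {
    deadline_ns: AtomicU64,
}

/// Dropped at the end of the block; clears the published deadline.
pub struct InFlight<'a> {
    owner: &'a BlockDeadline,
}

impl BlockDeadline {
    pub const fn new() -> Self {
        Self { deadline_ns: AtomicU64::new(IDLE) }
    }

    /// Audio thread, top of the callback: one atomic store, no locks.
    pub fn begin_block(&self, now_ns: u64, budget_ns: u64) -> InFlight<'_> {
        self.deadline_ns
            .store(now_ns + OVERRUN_FACTOR * budget_ns, Ordering::Release);
        InFlight { owner: self }
    }

    /// Watchdog thread, on each poll: how far past the deadline is the
    /// in-flight block? `None` if idle or still within its deadline.
    pub fn overdue_by(&self, now_ns: u64) -> Option<u64> {
        match self.deadline_ns.load(Ordering::Acquire) {
            IDLE => None,
            deadline if now_ns > deadline => Some(now_ns - deadline),
            _ => None,
        }
    }
}

impl Drop for InFlight<'_> {
    fn drop(&mut self) {
        self.owner.deadline_ns.store(IDLE, Ordering::Release);
    }
}
```

In this sketch a `Some(_)` result is the point at which the real watchdog would signal the audio thread. A block that hangs never drops its guard, so the deadline stays published and the poll keeps reporting it.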

[rt-watchdog] AUDIO THREAD STALLED ~142.3ms — Hard Rule #6 (blocking/expensive work on the audio thread). Stack of the stuck thread:
   0: songbird_engine::rt_watchdog::imp::handle_sigusr2
   1: <signal trampoline>
   2: parking_lot::raw_mutex::RawMutex::lock_slow     ← the culprit
          at .../parking_lot/src/raw_mutex.rs:230
   3: ...::CallbackState::check_session_swap
          at .../controller/callback_state.rs:940
   4: ...::CallbackState::process
   ...

How to read it. Same as the allocator: skip frames 0–1 (handler + trampoline). The first frame below them is where the thread was at the instant it was sampled, and the callback_state.rs:NNNN frame names the step.

Why warn, not abort. Unlike an allocation, a long block is sometimes legitimately transient: the first block, plugin warm-up, a session hot-swap. So the watchdog logs loudly (rate-limited to ~1/s) but does not abort. The 3× default is generous enough that only a genuinely pathological block trips it.

Workflow.

./utils/build-rs                        # armed automatically in dev
SONGBIRD_RT_WATCHDOG_FACTOR=1 ./utils/build-rs   # fire on ANY over-budget block (noisier)
SONGBIRD_RT_WATCHDOG=off ./utils/build-rs        # silence it

Reproduce the glitch, read the [rt-watchdog] line in the Rust terminal, fix the named call site, and re-run until it stops printing.

Limits. It won’t help with glitches that aren’t a single long block (OS-level underruns). On Linux the audio thread is SCHED_FIFO, so a single-core busy-loop hang could starve the normal-priority watchdog (fine on multicore and for sleeping blocks). For the “every block is a little too slow” case, reach for the histograms instead.
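How the two environment variables in the workflow combine into one effective setting can be sketched as below. The variable names come from the commands above; everything else is an assumption of this sketch: the helper name `watchdog_factor`, the case-insensitive match on "off", accepting fractional factors, and falling back to the 3× default on an unparsable value. The engine may resolve these differently.

```rust
/// Default multiplier from the text (3x the block budget).
pub const DEFAULT_FACTOR: f64 = 3.0;

/// Hypothetical helper: resolve the effective watchdog factor from the raw
/// values of SONGBIRD_RT_WATCHDOG and SONGBIRD_RT_WATCHDOG_FACTOR.
/// Returns `None` when the watchdog is switched off.
pub fn watchdog_factor(switch: Option<&str>, factor: Option<&str>) -> Option<f64> {
    // Assumption: "off" (any case) disables the watchdog regardless of factor.
    if matches!(switch, Some(s) if s.trim().eq_ignore_ascii_case("off")) {
        return None;
    }
    // Assumption: missing, non-numeric, or non-positive values use the default.
    let parsed = factor
        .and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|f| f.is_finite() && *f > 0.0);
    Some(parsed.unwrap_or(DEFAULT_FACTOR))
}
```

A caller would pass `std::env::var("SONGBIRD_RT_WATCHDOG").ok().as_deref()` and the matching lookup for the factor; taking strings as parameters keeps the function testable without touching the process environment.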

Wall-time histograms — death by a thousand cuts

What it is. Lock-free per-block timing recorded into power-of-two µs buckets. Where the watchdog catches one pathological spike, the histogram reveals a distribution, which is useful when there’s no single offender but the steady-state cost is creeping toward the budget.

rust/crates/engine/songbird-engine/src/wall_time_histogram.rs

What’s recorded. Two global histograms, behind the feature flag:

WRAP_BLOCK_HISTOGRAM — the loop-wrap branch of process() (does the loop-aware swap path stay cheap on heavy projects, or is the standby fallback firing?).
FFT_BATCH_HISTOGRAM — the Signalsmith / WSOLA stretch batch (is the stretcher CPU the bottleneck vs. disk I/O?).
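The power-of-two bucketing behind both histograms can be sketched as follows. This is an illustration under assumptions rather than the contents of wall_time_histogram.rs: `SketchHistogram`, `bucket_index`, `drain`, and the 32-bucket size are hypothetical, and the caller passes the elapsed microseconds in directly.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of buckets in this sketch (an assumption); bucket i covers
/// [2^i, 2^(i+1)) microseconds, with 0 µs folded into bucket 0.
pub const BUCKETS: usize = 32;

/// floor(log2(us)) via leading_zeros, clamped to the last bucket.
pub fn bucket_index(us: u64) -> usize {
    let idx = (63 - (us | 1).leading_zeros()) as usize;
    idx.min(BUCKETS - 1)
}

pub struct SketchHistogram {
    buckets: [AtomicU64; BUCKETS],
    max_us: AtomicU64,
}

impl SketchHistogram {
    pub fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
            max_us: AtomicU64::new(0),
        }
    }

    /// Audio-thread side: relaxed atomics only, no locks, no allocation.
    pub fn record(&self, us: u64) {
        self.buckets[bucket_index(us)].fetch_add(1, Ordering::Relaxed);
        self.max_us.fetch_max(us, Ordering::Relaxed);
    }

    /// Reporter side: read every bucket and reset it to zero.
    /// Returns (per-bucket counts, worst case since the last drain).
    pub fn drain(&self) -> ([u64; BUCKETS], u64) {
        let counts = std::array::from_fn(|i| self.buckets[i].swap(0, Ordering::Relaxed));
        (counts, self.max_us.swap(0, Ordering::Relaxed))
    }
}
```

A 300 µs block lands in bucket 8 (256 to 512 µs) and a 2100 µs block in bucket 11 (2048 to 4096 µs). The drain is not an atomic snapshot across buckets, which is acceptable for a profiling aid but worth knowing when counts look off by one.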

Each record() is ~30 ns (one elapsed, one leading_zeros, one relaxed fetch_add) and tracks count, per-bucket distribution, and the worst-case max_us since the last drain.

How to use it. Recording is off by default (zero cost otherwise). Turn it on, then read the log file:

./utils/build-rs --wall-time     # one-shot; or toggle "7. wall-time" in ./utils/build-rs -o

When enabled, a background reporter drains the histograms every ~2 s and appends a readable block to .profiling/wall-time.log (gitignored). Reproduce the suspect path, then read the file — no UI or sync round-trip, and it’s equally easy for a human to skim or an agent to parse:

── wall-time @ unix=1748452800 +6.0s ──
  wrap-block: count=512 max=2100us
    256-512us           410  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
    512-1024us           78  ▇▇▇▇
    1024-2048us          22  ▇
    2048-4096us           2  ▇
  fft-batch: no samples

A heavy tail in the high buckets (or a max_us near the block budget, ~5300 µs at 256/48 k) is your signal that the steady-state cost — not a one-off spike — is the dropout source. Idle windows (no samples) are skipped, so the file only grows while something is being measured. The reporter is started from main (a no-op unless the feature is compiled in); the underlying API is WallTimeHistogram::snapshot_and_reset() if you need to drain the histograms from somewhere else.
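To put a number on "a heavy tail", one option is the share of blocks that fall in buckets at or above some threshold. A minimal sketch, assuming the bucket rows have already been parsed into (lower bound in µs, count) pairs; `tail_fraction` is a hypothetical helper and not part of the reporter:

```rust
/// Hypothetical helper: fraction of recorded blocks whose bucket starts at
/// or above `threshold_us`. `buckets` holds (lower_bound_us, count) pairs.
pub fn tail_fraction(buckets: &[(u64, u64)], threshold_us: u64) -> f64 {
    let total: u64 = buckets.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        return 0.0; // idle window: nothing was measured
    }
    let tail: u64 = buckets
        .iter()
        .filter(|&&(lower, _)| lower >= threshold_us)
        .map(|&(_, count)| count)
        .sum();
    tail as f64 / total as f64
}
```

For the wrap-block sample above, the pairs are (256, 410), (512, 78), (1024, 22), (2048, 2). With a 1024 µs threshold that is 24 of 512 blocks, or about 4.7 %. Since the worst case there (2100 µs) is well under the ~5300 µs budget, that window points away from steady-state cost as the dropout source.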

The dropout counter — the always-on signal

The cheapest detector runs in every build. After the DSP work, process() compares elapsed time to the block budget; on overrun it bumps snapshot.dropout_count and fires a rate-limited Dropout engine event for the UI.

rust/crates/engine/songbird-engine/src/controller/callback_state.rs (search for dropout_count)

This is a yes/no, how-often signal. It confirms a dropout happened and surfaces it in the UI, but tells you nothing about where. Treat it as the smoke alarm: once it goes off, switch to the watchdog (for a single spike or a hang) or the histograms (for a creeping distribution) to find the fire. Note that it can’t fire at all for a true hang (process() never returns to reach the check), which is precisely the gap the rt-watchdog fills.
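The end-of-block check can be sketched as a compare, a counter bump, and a rate limit on the UI event. This is an illustration under assumptions: `DropoutMeter` and `end_of_block` are hypothetical names, the minimum gap between events is a made-up parameter (the text only says the event is rate-limited), and the caller supplies timestamps so the sketch stays deterministic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical always-on overrun detector, owned by the audio thread.
pub struct DropoutMeter {
    /// Read by the UI side; bumped on every over-budget block.
    pub dropout_count: AtomicU64,
    /// Time (since stream start) at which the last event was emitted.
    last_event: Option<Duration>,
    /// Assumed rate limit: at most one event per `min_gap`.
    min_gap: Duration,
}

impl DropoutMeter {
    pub fn new(min_gap: Duration) -> Self {
        Self { dropout_count: AtomicU64::new(0), last_event: None, min_gap }
    }

    /// Call after the DSP work. Returns true when a Dropout event should be
    /// sent to the UI. Every overrun is counted; only some produce an event.
    pub fn end_of_block(&mut self, now: Duration, elapsed: Duration, budget: Duration) -> bool {
        if elapsed <= budget {
            return false;
        }
        self.dropout_count.fetch_add(1, Ordering::Relaxed);
        let due = self
            .last_event
            .map_or(true, |t| now.saturating_sub(t) >= self.min_gap);
        if due {
            self.last_event = Some(now);
        }
        due
    }
}
```

The sketch also makes the hang limitation concrete: `end_of_block` only runs if the block returns, so a block that never finishes is never counted.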
