| Mode | What it exercises | Typical time | Picks up failures in… |
|---|---|---|---|
| Rust tests | Engine, DSP, sync dispatch, state | seconds | Audio rendering, scheduling, undo, state mutations |
| WebSocket tap | Full sync engine + headless backend, no React | hundreds of ms | Command/event round-trips, state propagation, persistence |
| Vitest | React components in isolation | milliseconds per test | Component logic, Zustand store wiring, hook behavior |
| agent-browser e2e | Real UI in Chrome talking to a real headless backend | tens of seconds | Layout, drag-and-drop, pointer events, full-flow regressions |
Picking the right mode
- `./utils/validate` is the default. It runs Rust tests, Vitest, ESLint, and `tsc --noEmit`. For most changes, that’s the whole bar.
- agent-browser is opt-in. Don’t spin up the headless harness unless the user explicitly asked for browser verification, or the change can only be caught visually (a layout regression, a drag-and-drop gesture, a focus trap).
- Prefer the layer where the bug would actually live. A bug in the clip scheduler is a Rust test; a bug in dispatch wiring is a WebSocket tap; a bug in how a knob renders is Vitest; a bug in cross-panel state flow is agent-browser.
- Determinism beats realism. Rust tests on synthesized audio beat ear-tests on real audio; a WebSocket tap that asserts on specific events beats a browser snapshot you’d have to eyeball.
1. Rust tests
The Rust workspace is exhaustively unit-tested. Audio code is verified with deterministic output checks (RMS, peak amplitude, zero-crossing frequency estimation, spectral energy ratios, golden-output fingerprints), so engine work doesn’t need a sound card.

- Tests live in sibling `tests.rs` files, never in inline `#[cfg(test)] mod tests { … }` blocks. The parent file declares `#[cfg(test)] mod tests;`.
- For DSP atoms/molecules, use the golden-file pattern documented in the `dsp-golden-testing` skill.
- For end-to-end engine flows (load session → play → render → assert), see `rust/crates/engine/songbird-engine/tests/` and the `audio_pipeline_test` example.
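The deterministic checks listed above are plain arithmetic over a sample buffer, which is why they need no sound card. The sketch below shows RMS, peak amplitude, and zero-crossing frequency estimation on a synthesized sine. It is written in TypeScript for illustration only. The real checks are Rust tests in the workspace, and every function name here is hypothetical.

```typescript
// Hypothetical helpers illustrating deterministic audio checks; these are not
// the songbird workspace's actual test utilities.

/** Synthesize `seconds` of a sine wave at `freqHz` with peak `amplitude`. */
function synthSine(freqHz: number, amplitude: number, sampleRate: number, seconds: number): Float32Array {
  const n = Math.round(sampleRate * seconds);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    out[i] = amplitude * Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return out;
}

/** Root-mean-square level: sqrt(mean(x^2)). */
function rms(samples: Float32Array): number {
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) sumSq += samples[i] * samples[i];
  return Math.sqrt(sumSq / samples.length);
}

/** Largest absolute sample value. */
function peak(samples: Float32Array): number {
  let max = 0;
  for (let i = 0; i < samples.length; i++) max = Math.max(max, Math.abs(samples[i]));
  return max;
}

/**
 * Estimate the frequency of a pure tone by counting sign changes.
 * A sine crosses zero twice per cycle, so f ≈ crossings / (2 * duration).
 */
function zeroCrossingFrequency(samples: Float32Array, sampleRate: number): number {
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if ((samples[i - 1] < 0) !== (samples[i] < 0)) crossings++;
  }
  const duration = samples.length / sampleRate;
  return crossings / (2 * duration);
}

// Usage: one second of a 440 Hz tone at half amplitude, sampled at 48 kHz.
const tone = synthSine(440, 0.5, 48000, 1.0);
console.log(rms(tone).toFixed(4)); // ≈ 0.5 / √2 ≈ 0.3536
console.log(peak(tone).toFixed(4)); // ≈ 0.5
console.log(zeroCrossingFrequency(tone, 48000).toFixed(1)); // within about 1 Hz of 440
```

Because the input is synthesized, each assertion can use a tight numeric tolerance instead of a listening test. The zero-crossing estimate is slightly low here because the crossing at the very end of the buffer is not counted.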
When this is enough
- The change is pure backend (engine, DSP, state, dispatch handler logic).
- The change is observable in the engine’s output samples, emitted events, or `StateStore` mutations.
When this isn’t enough
- The change moves data across the Tauri / WebSocket boundary in a way Rust tests don’t exercise (use mode 2).
- The change affects how the React UI renders or behaves (use mode 3 or 4).
2. WebSocket tap (spoofing sync engine commands)
The headless server speaks the same WebSocket protocol the Tauri React UI uses. A test client that connects to `ws://localhost:<port>` and sends framed JSON can drive the entire backend (sync engine, state, audio engine, plugin host) without spinning up a browser. This is the fastest way to assert on cross-boundary behavior.
Wire protocol
The protocol is documented in `rust/crates/app/songbird-headless/src/main.rs`.
Short form:
Invoke a command (request/response):
`name` values are the same `"channel.action"` strings the React UI uses (`transport.play`, `mixer.volume`, `clip.add`, …). `event` values match the channel event names (`transport:state`, `mixer:audio_clip_peaks`, …).
Example: a Node.js tap
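The sketch below shows the bookkeeping side of a tap: it sends a command, matches the response by request id, and records pushed events for later assertions. It assumes a JSON frame shape with hypothetical `id`, `name`, `args`, `ok`, `result`, `error`, `event`, and `payload` fields. The authoritative field names are in `main.rs`, so treat every identifier here as an assumption. The tap accepts any object with a `send(string)` method, which means the same class works with a real WebSocket client or with a fake in a unit test.

```typescript
// Sketch of a WebSocket tap's request/response correlation. The frame shapes
// are ASSUMPTIONS for illustration; check songbird-headless/src/main.rs for the
// real protocol before relying on any field name.

interface SocketLike {
  send(data: string): void;
}

// Hypothetical outgoing frame: invoke a "channel.action" command.
interface InvokeFrame {
  id: number;
  name: string; // e.g. "transport.play"
  args: unknown;
}

// Hypothetical incoming frames: a response to an invoke, or a pushed event.
interface ResponseFrame {
  id: number;
  ok: boolean;
  result?: unknown;
  error?: string;
}
interface EventFrame {
  event: string; // e.g. "transport:state"
  payload: unknown;
}
type IncomingFrame = ResponseFrame | EventFrame;

class Tap {
  private nextId = 1;
  private readonly pending = new Map<number, { resolve: (value: unknown) => void; reject: (err: Error) => void }>();
  private readonly socket: SocketLike;
  readonly events: EventFrame[] = [];

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  /** Send a command; resolves with its result when the matching response arrives. */
  invoke(name: string, args: unknown = {}): Promise<unknown> {
    const frame: InvokeFrame = { id: this.nextId++, name, args };
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(frame.id, { resolve, reject });
      this.socket.send(JSON.stringify(frame));
    });
  }

  /** Feed every raw text message from the socket into this method. */
  handleMessage(raw: string): void {
    const frame = JSON.parse(raw) as IncomingFrame;
    if ("event" in frame) {
      this.events.push(frame); // keep pushed events for later assertions
      return;
    }
    const waiter = this.pending.get(frame.id);
    if (waiter === undefined) return; // response to an id we never sent
    this.pending.delete(frame.id);
    if (frame.ok) waiter.resolve(frame.result);
    else waiter.reject(new Error(frame.error ?? "command failed"));
  }

  /** Names of all events received so far, in arrival order. */
  eventNames(): string[] {
    return this.events.map((e) => e.event);
  }
}

// Wiring to a live server (not run here). After the socket's "open" event:
//   const tap = new Tap(ws);
//   ws.addEventListener("message", (ev) => tap.handleMessage(String(ev.data)));
//   await tap.invoke("transport.play");
//   // ...then assert on tap.eventNames() / tap.events
```

Keeping the socket behind a one-method interface means you can unit-test the correlation logic without a server, and the tap itself stays a thin layer over whatever WebSocket client the test harness provides.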
When this is the right tool
- You need to assert on events, not pixels.
- You’re verifying a multi-step backend flow (dispatch → state mutation → event emission → second dispatch).
- You want test runs that take ~1 s, not ~30 s.
- You’re exercising MCP-style scenarios (everything reachable via `dispatch_command` is reachable here too).
When this isn’t enough
- The bug is in how React subscribes to or renders the events (use mode 3 or 4).
- The bug only manifests with real user input timing (drag latency, pointer-event coalescing) — use mode 4.
Engine behavior under test
By default, `./utils/build/agent-headless.sh` runs the engine on a virtual audio clock (`--virtual-audio`), so transport advances, the clip scheduler fires, and meters emit without opening a real device. This means a WebSocket tap can assert on `transport:position` movement, level meters, clip triggers, and similar signals. Pass `--no-audio` if you want the engine quiet, or `--real-audio` if you genuinely need real device I/O through cpal.
3. Vitest (React component tests)
Component-level React tests live next to the component (`Foo.test.ts` next to `Foo.tsx`). They use Vitest + Testing Library.

- Mock the sync engine surface (`@/sync/api`); don’t spin up a real backend. Vitest is for component logic, not integration.
- Control what Zustand selectors return by seeding the store with `useFooStore.setState({ … })` in `beforeEach`.
- Don’t write a Vitest test that exercises real WebSocket / Tauri behavior; that’s mode 2 or mode 4.
When this is the right tool
- A component renders the wrong thing for a given prop combination.
- A custom hook computes the wrong value.
- A Zustand selector triggers the wrong re-renders.
When this isn’t enough
- The bug is in how the backend produces the data the component renders (use mode 1 or 2).
- The bug only appears when multiple components interact, or when real layout / pointer events are in play (use mode 4).
4. agent-browser (end-to-end)
Drives real Chrome via CDP against a real headless backend. Slow and context-heavy, but the only mode that catches layout regressions, drag-and-drop bugs, and full-stack flow problems. See `/.claude/skills/e2e-headless/SKILL.md`.

Browser-driver reference: `/.claude/skills/agent-browser/SKILL.md`.
Per-worktree ports
`agent-headless.sh` derives a stable port offset per worktree so that multiple worktrees rarely collide: `hash($PWD) mod 89 + 10`, which gives each worktree a deterministic slot in [10, 98].

Offsets 0–9 are reserved for explicit manual use (`launch.sh`, dev). On collision (roughly 22% odds once 7 worktrees are active, and higher with more), the script fails fast on a port-already-bound check; pass `--port-offset N` to override.
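The slot mapping and the quoted collision odds can be checked with a few lines of code. Only the `mod 89 + 10` mapping and the 89-slot count come from the description above. The sketch uses a 32-bit FNV-1a hash as a stand-in, because the text does not say which hash `agent-headless.sh` actually applies to `$PWD`.

```typescript
// Illustrative only: FNV-1a (over UTF-16 code units) stands in for whatever
// hash agent-headless.sh really uses on $PWD.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned 32-bit
  }
  return hash >>> 0;
}

/** Map a worktree path to an offset slot in [10, 98]: hash mod 89 + 10. */
function portOffset(worktreePath: string): number {
  return (fnv1a32(worktreePath) % 89) + 10;
}

/**
 * Birthday-problem odds that at least two of `worktrees` share a slot,
 * assuming the hash spreads paths uniformly over `slots` slots.
 */
function collisionProbability(worktrees: number, slots = 89): number {
  let pAllDistinct = 1;
  for (let i = 0; i < worktrees; i++) pAllDistinct *= (slots - i) / slots;
  return 1 - pAllDistinct;
}

console.log(portOffset("/home/me/songbird-feature-x")); // some value in [10, 98]
console.log(collisionProbability(7).toFixed(3)); // ≈ 0.215, the "roughly 22%" figure
```

Because the offset depends only on the path, a worktree keeps the same ports across restarts. The birthday calculation also shows why collisions are not negligible: 7 worktrees spread over 89 slots already carry about a one-in-five chance that two of them share a slot.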
When this is the right tool
- The user explicitly asked for browser verification.
- The change crosses multiple panels / organisms.
- The change involves pointer-event drag, focus management, or layout.
When this is too much
- The behavior is observable via mode 1, 2, or 3. Use those first — they’re faster, more reliable, and don’t burn agent context on screenshots.
Watching live
Start the agent-browser dashboard before driving the UI:

Choosing in practice: a worked example
You’re fixing a bug where audio clips don’t trigger when transport starts before the clip’s onset. Here’s how the modes stack up:

- Mode 1 (Rust tests). Yes: write a test in `songbird-engine` that builds a session with a clip at bar 3, calls `transport.play()` from bar 1, drives `process()` for enough frames to reach bar 3, and asserts the clip emitted audio. This is where the bug lives, so this is where the regression test goes. It is fast and deterministic.
- Mode 2 (WebSocket tap). Optional: you could write a tap that runs the same flow over WS and asserts on `transport:position` plus a meter event. But the bug isn’t on the boundary, so this would just be a second copy of the Rust test with more moving parts. Skip.
- Mode 3 (Vitest). No: the React side renders whatever the engine emits; if the engine emits the wrong samples, no component test catches it. Skip.
- Mode 4 (agent-browser). Only if you also changed the UI’s transport controls or playhead rendering. For an engine-only fix, this would just be a slow screenshot of a working DAW. Skip unless the user asked.
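In the Mode 1 test, "enough frames to reach bar 3" becomes plain arithmetic once the tempo, meter, sample rate, and block size are fixed. The values below (120 BPM, 4/4, 48 kHz, 512-frame blocks) and the helper names are assumptions chosen for illustration, not songbird defaults.

```typescript
// Hypothetical helpers for sizing a render loop in a test. Bars are 1-based
// ("bar 3"), the playhead starts at the top of `fromBar`, and tempo and meter
// are constant. All of these are assumptions for this sketch.
interface TimingAssumptions {
  bpm: number;
  beatsPerBar: number;
  sampleRate: number;
  blockSize: number; // frames rendered per process() call
}

/** Frames from the start of `fromBar` to the start of `toBar`. */
function framesBetweenBars(fromBar: number, toBar: number, t: TimingAssumptions): number {
  const beats = (toBar - fromBar) * t.beatsPerBar;
  const seconds = beats * (60 / t.bpm);
  return Math.round(seconds * t.sampleRate);
}

/**
 * process() calls needed so the rendered audio includes the onset frame.
 * Block k (0-based) covers frames [k * blockSize, (k + 1) * blockSize), so an
 * onset at frame F lands in block floor(F / blockSize); render that block too.
 */
function blocksToRenderOnset(fromBar: number, toBar: number, t: TimingAssumptions): number {
  return Math.floor(framesBetweenBars(fromBar, toBar, t) / t.blockSize) + 1;
}

const timing: TimingAssumptions = { bpm: 120, beatsPerBar: 4, sampleRate: 48000, blockSize: 512 };
console.log(framesBetweenBars(1, 3, timing)); // 2 bars = 8 beats = 4 s = 192000 frames
console.log(blocksToRenderOnset(1, 3, timing)); // 376
```

Rendering exactly 192000 / 512 = 375 blocks stops at frame 191999, one frame short of the onset. That kind of off-by-one can make a "clip never triggered" test pass or fail for the wrong reason. In practice, a test would render a few blocks past the onset before measuring the output.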