Node Graph Extension Plan
Living document — phases land incrementally. This top section is the at-a-glance status; phase sections below have the details. Scope: extend the existing audio graph with texture (visual) signals, a control-rate tier for off-audio-thread compute, and protocol-adapter nodes (OSC, etc.) so that visualizers, Veo video, image-conditioned ML generation, and external control surfaces all live in one model. Implementation strategy: incremental — ship one port type and one use case at a time, not a big-bang rewrite.
Phase status
| # | Phase | Status | Notes |
|---|---|---|---|
| 1 | Texture port + Veo viewer | ✅ shipped | VisualSlice + visual channel; Veo creates a VideoSource node wired to ViewportSink; framerate-aware drift controller; FE-side metadata probing (no hardcoded duration/fps). |
| 2 | Control-rate scheduler | ✅ shipped | ControlRateScheduler on DispatchContext; Veo LRO migrated; real ml.cancel_veo (cancel-by-nonce). No fixed tick rate; nodes self-pace. |
| 3 | Visualizer-as-node | ✅ partial | Kind + input-source picks persist via VisualizerSource node. Per-kind detailed settings + true audio_buffer port deferred. |
| 4 | Protocols channel (OSC + future) | ✅ scaffolding | One channel hosts all external-input adapters (OSC today; HID / gamepad / WebSocket / sensors as future adapter variants). UDP listener runs as a control-rate task; protocols:message events end-to-end. Routing UI deferred. |
| 5 | three.js compositor | ✅ shipped | Single shared `<GpuCanvas>` drives both visualizer and video modes. Video frames feed THREE.VideoTexture via VideoTextureGL. The `<video>` element stays in the DOM at `opacity-0 pointer-events-none` so the browser still does GPU decode + `rVFC`, but the visible output comes from the canvas, frame-synced through the GPU pipeline. (visibility:hidden would stop WebKit from compositing the layer, which suspends rVFC; opacity-0 keeps it firing.) |
| — | Image → ML audio (Lyria/Magenta) | 🚫 out of scope | Architecture preserves room (see “Out of scope” below). |
| — | Visual node editor (Max/PD) | 🚫 out of scope | Graph is real and addressable from code; UI a separate question. |
| — | Distributed / networked nodes | 🚫 out of scope | Songbird is single-process today. |
Motivation
The immediate forcing functions are Veo video clips (need a way to play generated MP4s synced to the song) and visualizers (currently a hardcoded panel with no routing). Looking forward, Lyria/Magenta-style ML models that take image embeddings as input are coming, and the team wants OSC for external control surfaces. Each of these has been treated as a one-off so far. They share enough structure that a common abstraction pays off, and the existing audio graph is already richer than “audio + MIDI”; it just doesn’t span all the signal types we now need.
What the graph is today
The Songbird audio engine (rust/crates/engine/songbird-engine) runs a
unified DAG on the audio thread. Nodes wrap a Processor trait
(PluginChainProcessor or SingleProcessor). Connections carry typed
signals between nodes:
| Signal type | How it flows today |
|---|---|
| Audio buffers | Port 0 (main stereo), ports 1+ (sidechain/aux). Multi-channel. |
| MIDI events | Same graph as audio, midi_per_node parallel buffer. |
| Sidechain audio | First-class via SidechainPort + SidechainConnection. |
| Sends/returns | Explicit nodes + Connections between fader and bus inputs. |
| Hardware I/O | OutputRouting::HardwareOutput(n) per node. |
| Modulation | Lfo, EnvelopeFollower, StepSequencer → ModulationRouting → plugin params. Audio-thread, sample-accurate. |
| Automation | Bezier/step/exp curves on plugin params, audio-thread. |
What the graph does not cover today:
- Texture / visual signals: visualizers read from RT meter buffers via ad-hoc subscriptions; the Veo player is wired to a global `videoFilePath`.
- Off-audio-thread compute: Lyria streaming, ML inference, audio separator, and time-stretch run around the graph as standalone modules that pre/post-process clip data. They’re not nodes.
- External I/O: OSC, network, gamepad/HID, sensors. Nothing.
- A formal control-rate tier: modulation and automation are control-rate in spirit but run on the audio thread because they’re cheap pure DSP. ML/OSC/decode can’t.
What we add
Two architectural primitives, in this order:
1. Texture as a port type
A `texture<2D, RGBA>` signal. Same shape as TouchDesigner TOPs: any node
that produces a per-frame texture, any node that consumes one. The Veo
player and the visualizer both become texture-source nodes feeding a
viewport sink (the Video tab and Visualizer tab respectively).
Why this is the right primitive: video and shaders and image
generators all output the same thing — a 2D RGBA buffer per frame.
They differ only in how the pixels are sourced. Downstream
(compositing, blending, sampling, displaying) doesn’t care. So the
abstraction isn’t “video node,” it’s “node with a texture port.”
Sub-distinctions inside texture-producing nodes:
- TextureSource (pure producer): video player, image, generator (noise, gradient), camera feed.
- TextureTransform (consumer + producer): shader, blend, color correct, compositor.
- TextureSink: viewport (Video tab, Visualizer tab), recorder, texture → audio (e.g. spectrogram-as-texture written back).
The first nodes (VeoSource, ViewportSink) prove the abstraction; subsequent texture nodes (shader, compositor) should be 50–100 LOC each.
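The three roles above can be sketched as traits. This is a minimal illustration rather than Songbird's actual API: `Frame`, `TextureSource`, `TextureTransform`, `TextureSink`, `SolidColor`, `Invert`, `LastFrame`, and `tick` are all hypothetical names, and a CPU-side `Vec<u8>` stands in for a GPU texture.

```rust
// Hypothetical sketch: none of these names come from the Songbird codebase.

/// One frame on a texture port: tightly packed 8-bit RGBA pixels.
#[derive(Clone, Debug, PartialEq)]
pub struct Frame {
    pub width: usize,
    pub height: usize,
    pub rgba: Vec<u8>, // length = width * height * 4
}

/// Pure producer: video player, image, generator, camera feed.
pub trait TextureSource {
    fn next_frame(&mut self) -> Frame;
}

/// Consumer + producer: shader, blend, color correct, compositor.
pub trait TextureTransform {
    fn apply(&mut self, input: &Frame) -> Frame;
}

/// Consumer: viewport, recorder.
pub trait TextureSink {
    fn present(&mut self, frame: &Frame);
}

/// Generator source: a 2x2 frame filled with one solid colour.
pub struct SolidColor(pub [u8; 4]);

impl TextureSource for SolidColor {
    fn next_frame(&mut self) -> Frame {
        let (width, height) = (2, 2);
        let rgba = self.0.iter().copied().cycle().take(width * height * 4).collect();
        Frame { width, height, rgba }
    }
}

/// Transform: inverts RGB and leaves alpha untouched.
pub struct Invert;

impl TextureTransform for Invert {
    fn apply(&mut self, input: &Frame) -> Frame {
        let mut out = input.clone();
        for px in out.rgba.chunks_exact_mut(4) {
            for c in &mut px[..3] {
                *c = 255 - *c;
            }
        }
        out
    }
}

/// Sink that remembers the last frame it was shown.
#[derive(Default)]
pub struct LastFrame(pub Option<Frame>);

impl TextureSink for LastFrame {
    fn present(&mut self, frame: &Frame) {
        self.0 = Some(frame.clone());
    }
}

/// Runs one tick of a source -> transform -> sink chain.
pub fn tick(
    src: &mut dyn TextureSource,
    fx: &mut dyn TextureTransform,
    sink: &mut dyn TextureSink,
) {
    let frame = src.next_frame();
    sink.present(&fx.apply(&frame));
}
```

The sink only sees a `Frame`, so swapping the generator for a video player would not change anything downstream, which is the property the port type is meant to provide.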
2. A control-rate scheduler tier
A second graph tick rate, running on a worker thread (not the audio thread). Targets:
- 30–60 Hz for visual updates and OSC drain.
- On-demand for ML inference (caller schedules a tick when input changes).
What runs on this tier:
- ML inference (Lyria stream wrapper, image→audio gen, embedding extractors)
- Veo decode driver (frame timing computed here, GPU upload happens on the render thread)
- OSC adapter (UDP socket reader)
- Network adapters (WebSocket, etc.)
- Sensor adapters (HID, gyroscope)
- Image processing (texture → embedding)
Nodes on this tier can `await`, hold locks, do file I/O, and talk to the network. None of that is allowed on the audio thread.
Texture and control-rate are independent
Importantly: the texture port type and the control-rate tier are orthogonal. You can have audio-thread texture work (rare; e.g. an offline render) and you can have control-rate audio work (e.g. ML that emits audio buffers via the existing Lyria ring). We name them together because both are needed for the immediate use cases, but the design keeps them separable.
Cross-tier texture access (gesture / camera / vision use cases)
A class of future nodes needs to consume textures from the control-rate tier, not just produce them. The driving example is laptop / webcam input for gesture control: a `Webcam` node captures
camera frames as a texture stream, a PoseExtractor (or hand-tracker,
face-mesh, optical-flow) node runs ML inference on those frames to emit
a vector or event<trigger> port, and downstream the existing
modulation-routing layer maps those values to plugin parameters or
triggers. The same pipeline shape covers depth sensors, AR markers,
and any computer-vision-driven control.
For this to work, the design needs three properties:
- Texture sources can run on the control-rate tier. Webcam capture isn’t part of the audio-rate world and shouldn’t pretend to be. The `texture` port type is independent of which tier produces it: a control-rate node can have a texture output port and present a new frame on each tick.
- Control-rate nodes can read texture ports. Pose detection, embedding extraction, and any vision ML need to sample texture data on the worker thread. This means the GPU/texture resource layer must allow reads from off-render-thread consumers (typically via readback into a CPU buffer; on browsers, `<canvas>` or `getImageData` from a video element; on native, wgpu staging buffers).
- Texture → vector/event/scalar conversions are first-class. Vision ML emits structured data (landmarks, embeddings, classifications), not pixels. Those become standard `vector` / `event<trigger>` / `scalar` ports, with no new port type, so a hand-position landmark driving a synth filter is the same kind of wire as an LFO or an OSC value driving the same parameter.
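As a toy illustration of a texture → vector conversion, the sketch below reduces a CPU readback buffer to a two-element vector. It is a stand-in for real vision ML, not a pose detector: `brightness_centroid` is a hypothetical function, and it assumes the frame has already been read back as tightly packed 8-bit RGBA.

```rust
/// Brightness-weighted centroid of an RGBA frame, normalised to [0, 1] on
/// both axes. Returns None for an all-black or malformed frame.
/// Hypothetical helper: a cheap stand-in for a landmark extractor.
pub fn brightness_centroid(rgba: &[u8], width: usize, height: usize) -> Option<[f32; 2]> {
    if width == 0 || height == 0 || rgba.len() != width * height * 4 {
        return None;
    }
    let (mut total, mut sum_x, mut sum_y) = (0.0f64, 0.0f64, 0.0f64);
    for (i, px) in rgba.chunks_exact(4).enumerate() {
        // Weight each pixel by R + G + B; alpha is ignored.
        let weight = px[0] as f64 + px[1] as f64 + px[2] as f64;
        total += weight;
        sum_x += weight * (i % width) as f64;
        sum_y += weight * (i / width) as f64;
    }
    if total == 0.0 {
        return None;
    }
    // A single-pixel axis has no extent, so report its midpoint.
    let norm = |sum: f64, extent: usize| {
        if extent > 1 { sum / total / (extent - 1) as f64 } else { 0.5 }
    };
    Some([norm(sum_x, width) as f32, norm(sum_y, height) as f32])
}
```

The output is a plain fixed-size float array, so downstream it would travel on a `vector` port like any other control signal.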
Why OSC needs the control-rate tier specifically
This came up in design discussion and is worth pinning down because it’s the cleanest illustration of the tier split. OSC arrives over UDP at unpredictable rates and times: a control surface might burst 30 messages in 5 ms during a fader sweep, an iPad patch might send at 60 Hz, a sensor might send at 100 Hz. Two constraints force OSC off the audio thread:
- Socket I/O is forbidden on the audio thread. `recv()` is a syscall and can block on the network stack. Even non-blocking reads require system calls and kernel buffer interaction. The audio thread’s ~5 ms block budget can’t absorb a syscall storm.
- OSC packet arrival isn’t aligned with audio blocks. Even if reads were free, you’d still have to buffer arbitrary-size bursts and drain them at some rate that isn’t necessarily a multiple of the audio block rate.
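One way to picture the split is a bounded queue: the worker-side reader pushes decoded messages, and the consumer drains whatever has accumulated once per block without ever waiting. The sketch below is an assumption-laden illustration: `ControlMsg`, `Producer`, `Consumer`, and `control_queue` are hypothetical names, `std::sync::mpsc` stands in for the lock-free SPSC ring a real audio path would use, and "latest value wins" is only one of the time-semantics options this plan leaves open.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// A decoded control message (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub struct ControlMsg {
    pub address: String,
    pub value: f32,
}

/// Worker-side handle: the socket reader calls `push` per decoded packet.
pub struct Producer(SyncSender<ControlMsg>);

/// Consumer-side handle: drained once per block, never waits.
pub struct Consumer(Receiver<ControlMsg>);

pub fn control_queue(capacity: usize) -> (Producer, Consumer) {
    let (tx, rx) = sync_channel(capacity);
    (Producer(tx), Consumer(rx))
}

impl Producer {
    /// Returns false when the queue is full or closed. The message is
    /// dropped instead of making the reader wait.
    pub fn push(&self, msg: ControlMsg) -> bool {
        self.0.try_send(msg).is_ok()
    }
}

impl Consumer {
    /// Drains everything queued since the last call and keeps only the
    /// newest value per address. A real-time implementation would avoid
    /// the HashMap allocation; it is used here for brevity.
    pub fn drain_latest(&self) -> HashMap<String, f32> {
        let mut latest = HashMap::new();
        while let Ok(msg) = self.0.try_recv() {
            latest.insert(msg.address, msg.value);
        }
        latest
    }
}
```

A 30-message fader burst collapses to a single value per address by the time the consumer looks at it, which keeps the per-block work bounded regardless of how bursty the sender is.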
Port type discipline
A small, semantic set of port types beats a per-protocol explosion.
| Port type | Carries |
|---|---|
| `audio_buffer` | Per-block PCM samples (existing). |
| `midi_event` | Per-block MIDI events (existing). |
| `texture` | Per-frame 2D RGBA (new). |
| `scalar` | Float: automation, mod, knob position. |
| `vector` | Fixed-size float array: embeddings, MFCC, EQ curves. |
| `event<note>` | Discrete pitched events (any source). |
| `event<trigger>` | Discrete fire-and-forget events. |
| `string` | Prompts, labels, OSC addresses, chat content. |
A consumer of `event<note>` doesn’t know or care whether the source is a hardware controller, an OSC `/synth/note` message, or a step sequencer. This keeps the type system small and protocols pluggable.
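A sketch of what that discipline could look like in code, assuming the simplest possible rule (ports connect when their types match exactly). `PortType`, `Origin`, `OutPort`, `InPort`, and `can_connect` are hypothetical names, not taken from the Songbird source.

```rust
/// The semantic port types from the table above (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PortType {
    AudioBuffer,
    MidiEvent,
    Texture,
    Scalar,
    Vector,
    EventNote,
    EventTrigger,
    Str,
}

/// Where a signal entered the system. Deliberately not part of the
/// connection check: protocols are adapters, not port types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    Hardware,
    Osc,
    StepSequencer,
}

#[derive(Clone, Copy, Debug)]
pub struct OutPort {
    pub ty: PortType,
    pub origin: Origin,
}

#[derive(Clone, Copy, Debug)]
pub struct InPort {
    pub ty: PortType,
}

/// Assumed rule: a connection is valid when the port types match exactly.
/// The origin of the signal is ignored.
pub fn can_connect(from: &OutPort, to: &InPort) -> bool {
    from.ty == to.ty
}
```

Adding a new protocol under this scheme would add an `Origin`-like adapter, not a `PortType` variant, so existing consumers stay untouched.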
Phasing
Each phase ships independently and proves the abstraction before the next is built.
Phase 1 — Texture port type + Veo viewer (validates texture) ✅
Status: shipped.
What landed:
- `VisualSlice` in `songbird-state` with `VisualNode` / `VisualEdge` / `VisualNodeKind { VideoSource, ViewportSink }` / `VisualPortKind { Texture }`. Side-routed persistence to `daw.visual.json` mirroring the mixer/plugin/ai/sections pattern (`#[serde(skip)]` in `StateManager`).
- New `visual` sync channel under `songbird-sync/src/channels/visual/` with `add_video_source`, `add_viewport_sink`, `connect`, `disconnect`, `remove_node`, `move_node`, `update_video_source_metadata`, `get_state` commands and a single `visual:state` event (full slice broadcast). All mutations are idempotent where it matters (sinks unique per target, edges dedupe on same from/to).
- Veo handler now creates a `VideoSource` node with a `start_beat` anchor + auto-connects to the singleton `video_tab` ViewportSink, then pushes `visual:state` before `ml:veo_complete` so the FE Zustand mirror is hot when the panel sees completion.
- FE: `useVisualStore` (Zustand mirror), `visual` channel registered in `wiring.ts`, `VideoPlayer` reads a VideoSource node from the graph and falls back to the legacy `videoFilePath` for drag-drop.
- Drift controller is now framerate-aware: `FRAME_THRESHOLD = max(33ms, 1.5/fps)`. The 24 fps Veo source no longer stutters because the threshold is wider than one frame interval. Audio position is also clamped to `[0, durationSeconds]` so 8 s clips in 30 s songs don’t constantly hard-seek past their own end.
- No hardcoded source metadata: Veo creates the node with zero values for duration / framerate / dimensions. The FE probes real values via `<video>.loadedmetadata` (duration + dimensions) and `requestVideoFrameCallback` interval averaging (framerate), then patches the node via the new `visual.update_video_source_metadata` command. The same path covers any container the browser plays, so drag-dropped arbitrary videos populate the same fields.
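The threshold and clamp arithmetic can be written out as a small sketch. The real drift controller lives in the frontend; the functions below (`frame_threshold_secs`, `clamp_to_clip`, `needs_seek`) are hypothetical and only restate the formula `max(33ms, 1.5/fps)` and the `[0, durationSeconds]` clamp. Treating an unprobed `fps` of zero as "use the 33 ms floor" is an assumption.

```rust
/// Drift threshold in seconds: max(33 ms, 1.5 frame intervals).
/// Assumption: while fps is still the unprobed zero value, use the floor.
pub fn frame_threshold_secs(fps: f64) -> f64 {
    const FLOOR_SECS: f64 = 0.033;
    if fps > 0.0 {
        FLOOR_SECS.max(1.5 / fps)
    } else {
        FLOOR_SECS
    }
}

/// Audio position clamped to the clip's own length, so a short clip in a
/// long song is never asked to seek past its end.
pub fn clamp_to_clip(audio_pos_secs: f64, duration_secs: f64) -> f64 {
    audio_pos_secs.clamp(0.0, duration_secs.max(0.0))
}

/// True when the video is further from the clamped audio position than the
/// framerate-aware threshold allows.
pub fn needs_seek(video_pos_secs: f64, audio_pos_secs: f64, duration_secs: f64, fps: f64) -> bool {
    let target = clamp_to_clip(audio_pos_secs, duration_secs);
    (video_pos_secs - target).abs() > frame_threshold_secs(fps)
}
```

At 24 fps the threshold works out to 1.5 / 24 = 62.5 ms, wider than the 41.7 ms frame interval. At 60 fps, 1.5 / 60 = 25 ms falls below the floor, so the 33 ms floor applies.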
Deferred:
- Multi-clip / playhead-based source selection. Phase 1 picks the latest VideoSource by Map insertion order; Phase 3+ should pick the active clip whose `[startBeat, startBeat + duration]` window contains the playhead.
- Snap `start_beat` to the live transport position when Veo generation is dispatched. Today it defaults to 0 because `transport.position_beats` lives on a runtime atomic, not the persisted slice. Wiring it through the dispatch context is a small follow-up, deferred so it doesn’t block Phase 2.
- Render video clips as blocks in the Arrangement view (so users can drag/move/delete them without the dispatch CLI).
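The playhead-based selection described above could look roughly like this. It is a sketch of the deferred behaviour, not shipped code: `VideoClip` and `active_clip` are hypothetical, the clip duration is assumed to be already expressed in beats, and the tie-break for overlapping windows (most recently added wins) is an assumption carried over from Phase 1's "latest" rule.

```rust
/// Hypothetical clip record; duration is assumed to be in beats already.
#[derive(Clone, Debug, PartialEq)]
pub struct VideoClip {
    pub id: u32,
    pub start_beat: f64,
    pub duration_beats: f64,
}

/// Returns the clip whose [start_beat, start_beat + duration] window
/// contains the playhead. `clips` is assumed to be in insertion order;
/// when windows overlap, the most recently added clip wins.
pub fn active_clip(clips: &[VideoClip], playhead_beat: f64) -> Option<&VideoClip> {
    clips.iter().rev().find(|c| {
        playhead_beat >= c.start_beat && playhead_beat <= c.start_beat + c.duration_beats
    })
}
```

Returning `None` in the gaps between clips gives the viewport an explicit "nothing to show" state instead of holding the last clip on screen.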
Phase 2 — Control-rate scheduler (validates the second tier) ✅
Status: shipped.
What landed:
- `songbird-sync/src/control_rate.rs`: `ControlRateScheduler` is a managed pool of long-running async tasks keyed by `TaskKey`, built on a shared multi-threaded tokio runtime (one worker per CPU). Cheap to clone (internals are `Arc`-wrapped), embedded in `DispatchContext` so every dispatch handler can reach it.
- API: `register(key, future) → key`, `cancel(&key) → bool`, `is_registered`, `keys`, `forget`. Registration is last-writer-wins: re-registering the same key aborts the prior task. Idempotent semantics for “ensure exactly one X is running” UI patterns.
- Veo’s LRO poll migrated: the dispatch handler now calls `ctx.control_rate.register(TaskKey::new(format!("veo:{nonce}")), ...)` instead of fire-and-forget `spawn_async`. `ml.cancel_veo` actually cancels the in-flight task now (it was a no-op stub before): by nonce when the FE supplies one, otherwise it cancels every `veo:*`-prefixed task as a safety net.
- Unit tests cover `register_and_cancel` (in-flight task is aborted before it can complete) and `register_replaces_existing_key` (re-registering aborts the prior task).
There is no fixed tick rate: nodes self-pace with `tokio::time::sleep`, `tokio::select!`, or stream loops. The scheduler is just “registry of long-lived async tasks with cancellation handles.” Simpler and more flexible than a fixed cadence.
Future nodes (OSC, ML, webcam) all slot in by registering an
async function with the scheduler — same pattern Veo now uses.
SPSC ring marshalling into the audio thread is unchanged from how
Lyria already does it; no new infrastructure.
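The last-writer-wins registry idea can be shown without tokio. The sketch below is not `ControlRateScheduler`: it uses plain threads and a cooperative cancellation flag where the real scheduler aborts async tasks, and `TaskRegistry`, `CancelToken`, and `cancel_prefix` are hypothetical names chosen to mirror the behaviour described above.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Cooperative cancellation flag handed to each task.
#[derive(Clone, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Registry of long-lived tasks keyed by string. Cheap to clone.
#[derive(Clone, Default)]
pub struct TaskRegistry {
    tasks: Arc<Mutex<HashMap<String, CancelToken>>>,
}

impl TaskRegistry {
    /// Last-writer-wins: cancels any task already registered under `key`,
    /// then runs `task` on its own thread.
    pub fn register<F>(&self, key: &str, task: F) -> CancelToken
    where
        F: FnOnce(CancelToken) + Send + 'static,
    {
        let token = CancelToken::default();
        let previous = self.tasks.lock().unwrap().insert(key.to_string(), token.clone());
        if let Some(prev) = previous {
            prev.cancel();
        }
        let for_task = token.clone();
        std::thread::spawn(move || task(for_task));
        token
    }

    /// Returns true if a task was registered under `key`.
    pub fn cancel(&self, key: &str) -> bool {
        match self.tasks.lock().unwrap().remove(key) {
            Some(token) => {
                token.cancel();
                true
            }
            None => false,
        }
    }

    pub fn is_registered(&self, key: &str) -> bool {
        self.tasks.lock().unwrap().contains_key(key)
    }

    /// Cancels every task whose key starts with `prefix`; returns the count.
    pub fn cancel_prefix(&self, prefix: &str) -> usize {
        let mut tasks = self.tasks.lock().unwrap();
        let keys: Vec<String> = tasks.keys().filter(|k| k.starts_with(prefix)).cloned().collect();
        for key in &keys {
            if let Some(token) = tasks.remove(key) {
                token.cancel();
            }
        }
        keys.len()
    }
}
```

Registering twice under `"veo:abc"` leaves exactly one live task, and `cancel_prefix("veo:")` plays the role of the safety-net path when no nonce is supplied.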
Phase 3 — Visualizer-as-node (validates cross-tier composition) ✅ (partial)
Status: kind + input-source persistence shipped. Per-kind detailed settings and a true `audio_buffer` port type are deferred.
What landed:
- New `VisualizerSource { panel_id, kind, input_source }` variant in `VisualNodeKind`. One node per panel id (the only panel today is `"main"`); future per-track inline visualizers mint their own ids.
- `visual.upsert_visualizer_source` command: singleton-per-panel, last-writer-wins, returns the node id. Idempotent so re-mounts don’t leak nodes.
- The Visualizer panel hydrates `visKind` and `inputSource` from the graph on first availability and writes them back on every change. A hydration guard prevents the writeback from clobbering the initial pull and from re-hydrating after the user edits locally.
- `input_source` is round-tripped as JSON because `VisInputSource` is a structural FE-side type with three variants (`main`, `track`, `input`). Carrying it as opaque JSON avoids re-deriving it Rust-side for a feature that never reads it Rust-side.
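The singleton-per-panel upsert can be sketched as follows. `VisualGraph`, the `VisualizerSource` struct layout, and the numeric node ids are hypothetical; the point is only that a second upsert for the same panel updates in place and returns the same id.

```rust
use std::collections::HashMap;

/// Hypothetical node record. `input_source` is carried as opaque JSON text
/// and never parsed on this side.
#[derive(Clone, Debug, PartialEq)]
pub struct VisualizerSource {
    pub node_id: u64,
    pub kind: String,
    pub input_source: String,
}

#[derive(Default)]
pub struct VisualGraph {
    by_panel: HashMap<String, VisualizerSource>,
    next_id: u64,
}

impl VisualGraph {
    /// Singleton-per-panel, last-writer-wins. Returns the node id, which is
    /// stable across repeated calls for the same panel.
    pub fn upsert_visualizer_source(&mut self, panel_id: &str, kind: &str, input_source: &str) -> u64 {
        if let Some(node) = self.by_panel.get_mut(panel_id) {
            node.kind = kind.to_string();
            node.input_source = input_source.to_string();
            return node.node_id;
        }
        self.next_id += 1;
        let node_id = self.next_id;
        self.by_panel.insert(
            panel_id.to_string(),
            VisualizerSource {
                node_id,
                kind: kind.to_string(),
                input_source: input_source.to_string(),
            },
        );
        node_id
    }

    pub fn get(&self, panel_id: &str) -> Option<&VisualizerSource> {
        self.by_panel.get(panel_id)
    }

    pub fn node_count(&self) -> usize {
        self.by_panel.len()
    }
}
```

Because the id is stable, a panel that re-mounts and upserts again ends up pointing at the node it already had instead of leaking a new one.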
Deferred:
- Per-kind settings (routing matrices, base params). The Visualizer panel’s `nearfieldRouting`, `scatterBase`, etc. stay React-side. Migrating these is a bigger refactor with no behavioral win for v1 (they reset to defaults on reload today, which is acceptable). When persistence becomes a requirement, model them as per-node `settings: serde_json::Value` and patch via a new `visual.update_visualizer_source_settings` command.
- AudioTap as a separate node. The plan called for splitting `VisualizerSource` into `AudioTap → VisualizerSource` over an `audio_buffer` port. We held `input_source` as a field on `VisualizerSource` for now: there is no FE-side or BE-side runtime use of an `audio_buffer` port (visualizers read `getRtBuffer()` directly), so introducing it as state today buys nothing but bookkeeping. When ML/embedding nodes need real audio_buffer routing, model `AudioTap` then.
- Visualizers don’t actually produce textures yet. They render to canvases owned by the panel, not to a `ViewportSink` consuming a texture port. Phase 1 + Phase 3 share the design intent; the full picture (every visualizer kind is a control-rate node producing texture) is a follow-up that lands once we want multi-output (recording the visualizer to disk, layering with Veo, etc.).
Phase 4 — Protocols channel + OSC adapter ✅ (scaffolding)
Status: scaffolding shipped. UDP listener works end-to-end via the control-rate scheduler. Routing UI deferred.
Channel naming. Songbird’s existing channels map to concepts (mixer, clip, transport, ml, recording), not technologies, so OSC shouldn’t be its own channel any more than “VST3” or “ReWire” should be. The shared concept is “protocol adapters that translate external traffic into Songbird’s typed ports.” OSC, HID, gamepad, WebSocket, sensors all fit, share the same lifecycle (start a listener, decode incoming traffic, emit typed events), and share the same downstream consumer (a routing UI that maps incoming addresses → Songbird parameters). One `protocols` channel hosts them all.
What landed:
- `rosc 0.10` added to workspace deps; pure-Rust OSC parser.
- New `protocols` sync channel under `songbird-sync/src/channels/protocols/`:
  - `AdapterConfig` enum: one variant per protocol (`AdapterConfig::Osc { port }` today; `Hid { … }`, `Websocket { … }`, etc. land as new arms). Statically exhaustive in dispatch; FE wire-encodes as JSON so adding an adapter doesn’t require a TS migration.
  - `protocols/adapters/` subdirectory: one file per protocol’s runtime. Adding a new adapter is one new file + one new arm.
  - `protocols.start_listener { config }`: config is the JSON-serialized `AdapterConfig`. Registers a control-rate task keyed `protocols:listener:<listener_id>` where `listener_id` is derived deterministically from the config (`"osc:8000"` etc.) so registration is idempotent.
  - `protocols.stop_listener { listener_id }`: real cancellation via Phase 2’s scheduler.
  - `protocols.list_listeners`: diagnostics.
  - `protocols:message { protocol, listenerId, source, payload }` event: `protocol` lets subscribers route by adapter without parsing payload.
  - `protocols:listener_status { protocol, listenerId, listening, error? }` event: bind/unbind/error lifecycle.
- OSC adapter (`protocols/adapters/osc.rs`) projects every standard `OscType` onto JSON: primitives (Int / Float / String / Long / Double / Bool / Char) become primitive JSON; structured types (Time / Color / Midi / Blob / Array) use a `{type, ...}` envelope.
- FE `protocols` channel is registered in `wiring.ts` with a diagnostic subscriber that logs to the `sync:protocols` console namespace.
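The deterministic listener id is what makes `start_listener` idempotent, and it is easy to sketch. The enum below mirrors the `AdapterConfig::Osc { port }` shape described above, but the method names (`protocol`, `listener_id`, `task_key`) are assumptions for illustration and may not match the real implementation.

```rust
/// Mirrors the shape described above. Only the Osc arm exists today;
/// future protocols would land as new arms.
#[derive(Clone, Debug, PartialEq)]
pub enum AdapterConfig {
    Osc { port: u16 },
}

impl AdapterConfig {
    /// Protocol tag carried on events so subscribers can route by adapter.
    /// (Hypothetical method name.)
    pub fn protocol(&self) -> &'static str {
        match self {
            AdapterConfig::Osc { .. } => "osc",
        }
    }

    /// Derived only from the config, so the same config always yields the
    /// same id, e.g. "osc:8000". (Hypothetical method name.)
    pub fn listener_id(&self) -> String {
        match self {
            AdapterConfig::Osc { port } => format!("osc:{}", port),
        }
    }

    /// Key under which the listener task would be registered with the
    /// control-rate scheduler. (Hypothetical method name.)
    pub fn task_key(&self) -> String {
        format!("protocols:listener:{}", self.listener_id())
    }
}
```

Because the key depends only on the config, starting the same listener twice maps onto the scheduler's last-writer-wins registration instead of binding a second socket.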
Deferred:
- Routing UI. Map `/track1/volume` → fader on track 1, with ranges, smoothing, and learn-mode (move a controller, and Songbird captures the address). This is its own substantial UX surface; picking the routing model (per-track? per-param? per-route persistence?) warrants its own design pass.
- Address namespacing / templates. TouchOSC-style “layouts” that pre-define a controller’s expected addresses. Out of scope for v1.
- Outbound OSC (sending values back to the controller for motorized faders / LED feedback). Same channel, mirror command shape; left for whoever builds the routing UI.
- Other adapters (HID, gamepad, WebSocket, sensors). Same control-rate-task pattern; nothing structurally new. Add them when you have a use case driving the design.
Calling `protocols.start_listener { config: '{"protocol":"osc","port":8000}' }` and sending OSC messages from any controller (TouchOSC, Max, Python’s `python-osc`, etc.) produces `protocols:message` events visible in the DevTools console. The architectural pattern (UDP → control-rate node → typed event → FE channel) is validated end-to-end. Building on this to wire concrete parameters is straightforward; whoever ships the routing UI inherits a working signal pipeline.
Phase 5 — three.js compositor ✅ (shipped)
Status: the Video tab and the existing GL visualizers share a single `<GpuCanvas>` at the Visualizer panel level. Switching between
mode === 'visualizer' and mode === 'video' doesn’t churn
GL/GPU contexts — the canvas stays mounted, only its content swaps.
Video display goes through THREE.VideoTexture on a fullscreen
quad; the bare <video> is kept in the DOM but rendered invisible.
What landed:
- `VideoTextureGL.tsx` (r3f component) wraps an `HTMLVideoElement` in `THREE.VideoTexture` and renders on a fullscreen-aligned plane with object-contain semantics. It always positions the plane (canvas-size fallback before the video has decoded its first frame) so it never renders as a 1px dot at origin.
- A module-scope `useActiveVideoElementStore` (Zustand) holds the currently-mounted `<video>` element. `VideoPlayer` writes to it via a stable `useCallback` ref-callback; the parent’s GpuCanvas reads from it via a small `<VideoTextureSlot>` helper. No prop drilling, no per-render ref-callback churn (which earlier caused an infinite render loop).
- The `<video>` is rendered at `opacity-0 pointer-events-none` so the browser still does GPU decode (VideoToolbox / equivalent) and `requestVideoFrameCallback` keeps firing on the same element the drift controller drives. Don’t switch to `visibility:hidden`: WebKit stops compositing hidden video layers, which suspends rVFC and leaves the canvas black.
- The shared GpuCanvas mounts whenever `mode === 'visualizer' || mode === 'video'`, giving a single context for the lifetime of the Visualizer panel.
- Drift controller tuned for canvas display: ±1 % max playbackRate nudge (was ±5 %) and a 2-frame deadband. The canvas presents every frame straight from the texture, so rate changes that were masked by the browser’s compositor are now directly visible.
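The ±1 % nudge and 2-frame deadband can be expressed as a small pure function. This is a sketch under assumptions, not the frontend's controller: `playback_rate` is a hypothetical name, drift is taken as video position minus audio position, and the proportional gain (correct the excess drift over roughly one second) is invented for illustration. Only the ±1 % clamp and the 2-frame deadband come from the text above.

```rust
/// Maximum playbackRate deviation from 1.0 (the ±1 % nudge).
pub const MAX_NUDGE: f64 = 0.01;
/// Drift within this many frame intervals is left alone.
pub const DEADBAND_FRAMES: f64 = 2.0;

/// `drift_secs` = video position - audio position (positive: video ahead).
/// Returns a playbackRate in [0.99, 1.01].
pub fn playback_rate(drift_secs: f64, fps: f64) -> f64 {
    if fps <= 0.0 {
        return 1.0; // framerate not probed yet: do not nudge
    }
    let deadband = DEADBAND_FRAMES / fps;
    if drift_secs.abs() <= deadband {
        return 1.0;
    }
    // Assumed gain: treat the drift beyond the deadband as a rate offset,
    // then clamp it to the ±1 % budget.
    let excess = drift_secs - deadband.copysign(drift_secs);
    1.0 - excess.clamp(-MAX_NUDGE, MAX_NUDGE)
}
```

When the video runs ahead the rate drops below 1.0, and when it lags the rate rises above 1.0. In both directions the change stays inside the ±1 % budget.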
When `glRenderer` is toggled off (debug setting), no GpuCanvas mounts in video mode and the `<video>` element flips to visible so the tab isn’t blank.
Path to compositor nodes from here:
- Pipeline executor (~200 lines): walks the visual graph in topological order, allocates `THREE.WebGLRenderTarget` per intermediate node, binds input textures to materials, and dispatches draws. Lets a `Veo → ColorCorrect → Composite → ViewportSink` chain “just work.”
- First shader node (~50 lines): a `THREE.ShaderMaterial` wrapper that takes a fragment shader + uniform schema and exposes input/output texture ports.
- Visualizers as overlays: existing GL visualizers can layer over Veo by rendering in the same scene as `VideoTextureGL`. They already use the same canvas now, so it’s a structural change inside one component, not cross-component plumbing.
Veo video plays through a `VideoTexture` in the Video tab with audio sync intact, and adding a one-shot color filter (sepia, grayscale, threshold) on top is genuinely 50 lines.
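The ordering step of the pipeline executor is a standard topological sort. Below is a minimal sketch using Kahn's algorithm over string node ids. `topo_order` is a hypothetical function: the real executor would run in the frontend against the visual graph's own node and edge types, and would allocate render targets as it walks.

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm over (from, to) edges. Returns the nodes in an order
/// where every node comes after all of its inputs, or None if the graph
/// has a cycle or an edge names an unknown node.
pub fn topo_order<'a>(nodes: &[&'a str], edges: &[(&'a str, &'a str)]) -> Option<Vec<String>> {
    let mut indegree: HashMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut outgoing: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        if !indegree.contains_key(from) {
            return None;
        }
        *indegree.get_mut(to)? += 1;
        outgoing.entry(from).or_default().push(to);
    }
    // Seed with source nodes in declaration order so the result is stable.
    let mut ready: VecDeque<&str> = nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(nodes.len());
    while let Some(node) = ready.pop_front() {
        order.push(node.to_string());
        if let Some(targets) = outgoing.get(node) {
            for &next in targets {
                let remaining = indegree.get_mut(next)?;
                *remaining -= 1;
                if *remaining == 0 {
                    ready.push_back(next);
                }
            }
        }
    }
    // Any node left unvisited is part of a cycle.
    if order.len() == nodes.len() {
        Some(order)
    } else {
        None
    }
}
```

Rejecting cycles up front matters here because a feedback loop in the visual graph would need an explicit ping-pong node rather than a plain edge.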
Open questions for the next iteration:
- Do the visualizer tab’s `<GpuCanvas>` and the Video tab’s `<GpuCanvas>` share a renderer / scene, or stay separate? Sharing buys natural overlay (visualizer-on-top-of-video) but the existing visualizer infrastructure is already mounted per panel. Probably keep separate scenes per panel and let the pipeline executor stitch when needed.
- WebGPU vs WebGL2 backend: three.js auto-picks WebGL2; opting into the WebGPU renderer is a one-line change but only worth it once a node demands compute shaders / multi-target rendering that WebGL2 can’t handle.
Canvas 2D has compositing primitives (`drawImage`, blend modes) but no programmable
shaders — hits a ceiling fast for compositing video + visualizers +
shader effects. Raw WebGPU is the right substrate but requires
building a full pipeline framework. Three.js is already in the
codebase (react_ui/src/lib/gpu/GpuCanvas.tsx and the existing
SpectrumGL / WaveformGL / CosmosGL / FlowShieldGL /
LightCubeGL / GeometricGL visualizers), abstracts WebGPU vs
WebGL2, and gives you VideoTexture, render targets, ShaderMaterial,
and a scene graph out of the box. The compositor we’d build on raw
WebGPU is roughly what three.js already is. So Phase 5 lands on
three.js — same substrate the existing GL visualizers use.
Scope:
- `VideoSource` → three.js `VideoTexture`. The Veo node still sources its frames from a hidden `<video>` element (browser does the GPU decode), but the texture lives in three.js, not in a DOM `<video>` rendered to the page. Audio sync stays the same: `requestVideoFrameCallback` is on the underlying `<video>`, and the drift controller already uses it.
- `ViewportSink` → `<GpuCanvas>` render target. Replaces the current bare `<video>` element in the Video tab with a three.js scene that renders the VideoTexture. The same `GpuCanvas` already hosts the visualizers, so sharing the canvas means the existing visualizer scenes can layer over Veo without glue code.
- Pipeline executor (~200 lines): walks the visual graph in topological order, allocates render targets, binds input textures to materials, dispatches draws. This is the piece that makes shader/compositor nodes “just work” once added.
- Future nodes become trivial: a shader node is a `THREE.ShaderMaterial` wrapping a fragment shader; a blend node is a two-input material; a feedback-buffer node is a render target ping-pong. Each is ~50 lines.
Veo video renders through a `VideoTexture`, audio sync intact (drift controller still
hooks the underlying <video> element), and a follow-up
“add a colour-correct shader node” exercise is genuinely 50 lines.
The existing GpuCanvas-based visualizers can render in the same
scene as the Veo video without bespoke code.
Open questions for implementation time:
- Should the visualizer scene and the Veo scene share one `GpuCanvas` or be separate three.js scenes composited at the viewport level? Sharing means cheaper GPU upload and natural layering; separating means cleaner per-node ownership. Probably separate scenes that the pipeline executor stitches into a composite.
- Does the FE-side audio drift controller need to move into the three.js render loop, or stay in its current `requestVideoFrameCallback` / `subscribeRtBuffer` form? Today’s form works fine and is decoupled from rendering, so keep it as-is.
- WebGPU backend selection: three.js auto-picks WebGL2 today; manually opting into the WebGPU renderer is a one-line change but only worth it if a node turns out to need WebGPU-only features (compute shaders, etc.). Defer until a concrete node demands it.
Out of scope (for now)
- Image → ML audio generation.
`ImageSource` / `Webcam` → `TextureToEmbedding` → `LyriaInput` / Magenta is exactly the composition this graph is designed to enable, but it’s deferred: it only ships once Phase 1 (texture) and Phase 2 (control-rate tier) prove themselves on simpler ground. The architecture below explicitly preserves room for it; we’re not building it yet.
- Replacing the audio-thread DAG. The existing track/insert/send/master topology stays as-is; new node types extend it, they don’t replace it.
- A user-facing visual node editor (Max/PD style). The graph is real and addressable from code; whether it gets a UI is a separate question.
- Multi-output sinks (e.g. recording the texture stream to disk). Easy to add once the abstraction holds, but not part of v1.
- Distributed / networked nodes. Songbird is a single-process app today.
Open questions
- GPU resource ownership. Does each texture-producing node own its output texture, or does the scheduler pool textures? The latter scales better but requires reference counting. Constraint from the vision/gesture use case: textures must be readable from the control-rate tier, not just the render thread — so whatever ownership model wins, it has to expose a readback path.
- Time semantics for the control-rate tier. Does a control-rate node tick produce a value tagged with a future audio-block timestamp (sample-accurate scheduling), or is it just “latest value wins”? Modulation is sample-accurate today; OSC probably can’t be.
- Persistence. Do graph topology changes go through the existing sync engine like everything else? (Probably yes — it’s just state.) Per-node settings serialize to the project file.
- Threading model. One worker thread for control-rate, or a pool? Network and ML probably want separate threads to avoid head-of-line blocking.
References
- Existing graph: `rust/crates/engine/songbird-engine/src/`
- Existing modulation/automation: same crate
- Lyria off-graph compute pattern (the closest thing to a control-rate node today): `rust/crates/integration/songbird-lyria/`
- Veo (this plan’s forcing function): `rust/crates/integration/songbird-veo/`
- TouchDesigner TOP/CHOP architecture is the design north star for the texture port type and node-graph composition discipline.