songbird-inference

Shared ML inference infrastructure for Songbird. Provides a trait-based backend abstraction, ONNX Runtime session management, execution provider selection, model weight download/cache/verification, and progress reporting.

ONNX Runtime Dynamic Loading

This crate uses ort with the load-dynamic feature. The ONNX Runtime shared library (libonnxruntime.dylib / .so / .dll) is loaded at runtime, not linked at compile time. This means:

The app compiles without the dylib present.
The dylib must be discoverable at runtime for inference to work.
If the dylib isn’t found, the app still runs — stem separation is simply unavailable (graceful degradation).
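The graceful-degradation behaviour described above can be sketched as a startup probe whose result gates the feature. This is a minimal illustration only: `OrtStatus`, `probe`, and `stem_separation_enabled` are hypothetical names, not part of this crate's API.

```rust
use std::path::PathBuf;

/// Outcome of the startup probe for the ONNX Runtime library.
/// Hypothetical type for illustration; not the crate's actual API.
#[derive(Debug, PartialEq)]
enum OrtStatus {
    /// The shared library was found at this path.
    Ready(PathBuf),
    /// No library was found; the app keeps running without inference.
    Missing,
}

/// Turns the result of a path lookup into a status instead of panicking.
fn probe(resolved: Option<PathBuf>) -> OrtStatus {
    match resolved {
        Some(path) => OrtStatus::Ready(path),
        None => OrtStatus::Missing,
    }
}

/// Features that need inference check the status before running.
fn stem_separation_enabled(status: &OrtStatus) -> bool {
    matches!(status, OrtStatus::Ready(_))
}
```

The point of the pattern is that a missing library is an ordinary value rather than a startup error, so the rest of the app never has to handle a failed launch.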

Setup for Development

# From the repo root — downloads the dylib for your platform
./utils/binaries/fetch-onnxruntime

This places the library in rust/target/onnxruntime/. The ort_init module finds it via the exe_dir/../onnxruntime/ search path, because that directory is a sibling of target/{debug,release}/, where the executable lives.

Setup for Production Builds

# 1. Fetch the dylib for the target platform
./utils/binaries/fetch-onnxruntime macos-arm64   # or linux-x64, windows-x64, etc.

# 2. Build with Tauri (dylib is bundled alongside the binary)
TAURI_CONFIG=crates/app/songbird-app/tauri.onnxruntime.json cargo tauri build

# Or use the release script (fetches dylib automatically)
./utils/release-rs

The tauri.onnxruntime.json overlay adds the dylib directory as a bundled resource. On macOS it lands in Contents/Resources/; on Linux and Windows it lands next to the binary.

Search Order (`ort_init::resolve_dylib_path`)

ORT_DYLIB_PATH environment variable (explicit override)
Same directory as the running executable
macOS: ../Frameworks/ and ../Resources/ relative to executable (app bundle)
exe_dir/../onnxruntime/ — dev workflow (fetch-onnxruntime drops the dylib here, sibling of target/{debug,release}/)
Tauri resource directory (if provided)
System library paths (/usr/local/lib, /opt/homebrew/lib, etc.)
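The search order above can be sketched as a function that builds an ordered candidate list and returns the first path that exists. This is an illustration under stated assumptions, not the body of `ort_init::resolve_dylib_path`: `candidate_paths`, `resolve_first_existing`, and `dylib_name` are hypothetical helpers, the library is assumed to sit directly inside each directory, and only the two system paths named in the list are included.

```rust
use std::path::{Path, PathBuf};

/// Platform-specific file name of the ONNX Runtime shared library.
fn dylib_name(os: &str) -> &'static str {
    match os {
        "macos" => "libonnxruntime.dylib",
        "windows" => "onnxruntime.dll",
        _ => "libonnxruntime.so",
    }
}

/// Builds the ordered candidate list (hypothetical helper, not the crate's API).
fn candidate_paths(
    env_override: Option<PathBuf>,
    exe_dir: &Path,
    resource_dir: Option<&Path>,
    os: &str,
) -> Vec<PathBuf> {
    let name = dylib_name(os);
    let mut out = Vec::new();
    // Step 1: explicit override (the value of ORT_DYLIB_PATH, if set).
    if let Some(path) = env_override {
        out.push(path);
    }
    // Step 2: same directory as the running executable.
    out.push(exe_dir.join(name));
    // Step 3: macOS app bundle locations.
    if os == "macos" {
        out.push(exe_dir.join("../Frameworks").join(name));
        out.push(exe_dir.join("../Resources").join(name));
    }
    // Step 4: dev workflow, sibling of target/{debug,release}/.
    out.push(exe_dir.join("../onnxruntime").join(name));
    // Step 5: Tauri resource directory, if the caller provided one.
    if let Some(dir) = resource_dir {
        out.push(dir.join(name));
    }
    // Step 6: system library paths (assumed subset).
    for dir in ["/usr/local/lib", "/opt/homebrew/lib"] {
        out.push(Path::new(dir).join(name));
    }
    out
}

/// Returns the first candidate that exists on disk, if any.
fn resolve_first_existing(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}
```

A caller would typically pass `std::env::var_os("ORT_DYLIB_PATH").map(PathBuf::from)` as the override and the parent of `std::env::current_exe()` as `exe_dir`. Keeping list construction separate from the filesystem check makes the ordering testable without touching the disk.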

Supported Platforms

Platform	Library	Execution Providers
macOS arm64	libonnxruntime.dylib	CoreML (ANE/GPU) → CPU
macOS x86_64	libonnxruntime.dylib	CPU
Linux x86_64	libonnxruntime.so	CUDA → CPU
Linux aarch64	libonnxruntime.so	CPU
Windows x86_64	onnxruntime.dll	DirectML → CUDA → CPU
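The arrows in the table read as a fallback chain: try the first provider and fall through to the next if it is unavailable. A minimal sketch of that idea follows. The `Provider` enum, `provider_chain`, and `pick_provider` are hypothetical and stand in for whatever the `providers` and `session` modules actually do; the availability check is passed in as a closure because how the crate probes providers is not described here.

```rust
/// Hypothetical stand-in for the crate's execution provider enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    CoreMl,
    Cuda,
    DirectMl,
    Cpu,
}

/// Preferred providers for a platform, most preferred first (per the table above).
/// `os` and `arch` use the same strings as std::env::consts::{OS, ARCH}.
fn provider_chain(os: &str, arch: &str) -> Vec<Provider> {
    match (os, arch) {
        ("macos", "aarch64") => vec![Provider::CoreMl, Provider::Cpu],
        ("linux", "x86_64") => vec![Provider::Cuda, Provider::Cpu],
        ("windows", "x86_64") => vec![Provider::DirectMl, Provider::Cuda, Provider::Cpu],
        // macOS x86_64, Linux aarch64, and anything unlisted: CPU only.
        _ => vec![Provider::Cpu],
    }
}

/// Walks the chain and returns the first available provider, defaulting to CPU.
fn pick_provider(chain: &[Provider], is_available: impl Fn(Provider) -> bool) -> Provider {
    chain
        .iter()
        .copied()
        .find(|p| is_available(*p))
        .unwrap_or(Provider::Cpu)
}
```

For the current machine, the chain would be `provider_chain(std::env::consts::OS, std::env::consts::ARCH)`.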

Version Compatibility

The ort crate version (2.0.0-rc.11) requires ONNX Runtime 1.23 or newer. The fetch script downloads 1.23.2 (the latest patch release in that minimum-required line). Using an older version causes a hard failure at ort::init_from with a clear “not compatible” error. When bumping ort, re-check the version constraint in ort_init::ORT_VERSION and in the fetch script.
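The version constraint can also be checked before initialization, for example in a script or a diagnostic command. The sketch below compares a version string against the 1.23 minimum using only the standard library. `Version`, `parse_version`, `MIN_ORT`, and `is_compatible` are hypothetical; the real check happens inside `ort::init_from`, and the type of `ort_init::ORT_VERSION` is not assumed here.

```rust
/// Semantic version triple. Field order matters: the derived ordering
/// compares major first, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

/// Parses "MAJOR.MINOR" or "MAJOR.MINOR.PATCH"; a missing patch counts as 0.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next().unwrap_or("0").parse().ok()?;
    // Reject trailing components such as "1.23.2.4".
    if parts.next().is_some() {
        return None;
    }
    Some(Version { major, minor, patch })
}

/// Minimum ONNX Runtime version stated above (hypothetical constant).
const MIN_ORT: Version = Version { major: 1, minor: 23, patch: 0 };

/// True when `found` parses and is at least the minimum version.
fn is_compatible(found: &str) -> bool {
    parse_version(found).map_or(false, |v| v >= MIN_ORT)
}
```

Unparseable strings are treated as incompatible, which errs on the side of reporting a problem early rather than failing later at initialization.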

Public API

InferenceBackend trait — uniform interface for ONNX, llama.cpp, MLX, and remote backends
BackendRegistry — routes inference requests by ModelCapability
SamplingParams / GeneratedToken — text generation parameters and streaming output
InferenceSession — wraps ort::Session with provider management
SessionConfig / ExecutionProvider — configuration types
ModelStore — model weight download, caching, verification, and local model listing
- ensure_model() — download + SHA-256 verify
- download_model_no_verify() — generic download for arbitrary GGUF URLs (no checksum)
- list_local_models() → Vec<LocalModelInfo> — scans cache for GGUF files and HF repo dirs
- list_cached() — returns paths for all cached ONNX and GGUF files
ModelManifest — describes a downloadable model (URL, checksum, size)
LocalModelInfo — metadata for a locally available model (id, name, backend, path, size, status)
ProgressSink trait — progress callbacks for downloads and inference
MiniLmEmbedder — text → 384-dim f32 embedding (requires embeddings feature)
OnnxBackend — InferenceBackend impl with tensor compute + embedding support
ort_init — dynamic library discovery and initialization
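To make the relationship between the backend trait and the registry concrete, here is a simplified sketch of capability-based routing. `Backend`, `Registry`, and `Capability` are hypothetical stand-ins for `InferenceBackend`, `BackendRegistry`, and `ModelCapability`; the method names, the capability variants, and the "first registered backend wins" rule are all assumptions, not the crate's actual signatures.

```rust
use std::collections::HashMap;

/// Hypothetical capability set; the real ModelCapability variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Capability {
    TensorCompute,
    Embedding,
    TextGeneration,
}

/// Simplified stand-in for the InferenceBackend trait.
trait Backend {
    fn name(&self) -> &str;
    fn capabilities(&self) -> Vec<Capability>;
}

/// Stub backend that advertises tensor compute and embeddings.
struct StubOnnx;

impl Backend for StubOnnx {
    fn name(&self) -> &str {
        "onnx"
    }
    fn capabilities(&self) -> Vec<Capability> {
        vec![Capability::TensorCompute, Capability::Embedding]
    }
}

/// Simplified stand-in for BackendRegistry: maps a capability to a backend.
#[derive(Default)]
struct Registry {
    backends: Vec<Box<dyn Backend>>,
    routes: HashMap<Capability, usize>,
}

impl Registry {
    /// Registers a backend; the first backend to claim a capability keeps it.
    fn register(&mut self, backend: Box<dyn Backend>) {
        let idx = self.backends.len();
        for cap in backend.capabilities() {
            self.routes.entry(cap).or_insert(idx);
        }
        self.backends.push(backend);
    }

    /// Looks up the backend that serves `cap`, if one is registered.
    fn route(&self, cap: Capability) -> Option<&dyn Backend> {
        self.routes.get(&cap).map(|&i| &*self.backends[i])
    }
}
```

Callers ask the registry for a capability rather than a specific backend, so adding a llama.cpp or MLX backend later would not change call sites.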

Feature flags

embeddings: enables MiniLmEmbedder and the tokenizers dependency. Used by songbird-agent for prompt classification.

This crate is depended on by songbird-separator (stem separation), songbird-agent (LLM pipeline and prompt classification), and the sync-engine ml.* commands. It lives in the ml/ group rather than core/ (too many external dependencies) or integration/ (it runs locally and is not an external service bridge).

Crate Structure

Module	Purpose
`backend`	`InferenceBackend` trait, error types
`backends/`	Concrete backends (ONNX, future: llama.cpp, MLX)
`model_store`	Download, cache, SHA-256 verify model weights
`ort_init`	Dynamic library discovery and initialization
`progress`	`ProgressSink` trait for long operations
`providers`	Execution provider enum and registration
`registry`	`BackendRegistry` routes by capability
`session`	ONNX session wrapper with provider fallback
`tensor`	ndarray ↔ ort tensor conversion helpers
