songbird-inference
Shared ML inference infrastructure for Songbird. Provides a trait-based backend abstraction, ONNX Runtime session management, execution provider selection, model weight download/cache/verification, and progress reporting.
ONNX Runtime Dynamic Loading
This crate uses ort with the load-dynamic feature. The ONNX Runtime
shared library (libonnxruntime.dylib / .so / .dll) is loaded at
runtime, not linked at compile time. This means:
- The app compiles without the dylib present.
- The dylib must be discoverable at runtime for inference to work.
- If the dylib isn’t found, the app still runs — stem separation is simply unavailable (graceful degradation).
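The graceful-degradation behavior can be pictured with a small sketch. The `RuntimeStatus` type and both functions are hypothetical names chosen for illustration; they are not part of this crate's API, and the real initialization path may be structured differently.

```rust
use std::path::PathBuf;

/// Hypothetical status type: records whether the ONNX Runtime dylib was found.
#[derive(Debug, PartialEq)]
enum RuntimeStatus {
    Available(PathBuf),
    Unavailable(String),
}

/// Turns the result of a dylib lookup into a status value instead of panicking,
/// so the rest of the app can keep running without inference.
fn probe_runtime(dylib: Option<PathBuf>) -> RuntimeStatus {
    match dylib {
        Some(path) => RuntimeStatus::Available(path),
        None => RuntimeStatus::Unavailable("ONNX Runtime library not found".to_string()),
    }
}

/// Features that need inference (such as stem separation) check the status
/// and are simply disabled when the runtime is missing.
fn stem_separation_enabled(status: &RuntimeStatus) -> bool {
    matches!(status, RuntimeStatus::Available(_))
}
```

The point of the design is that a missing library becomes ordinary data the UI can react to, rather than a startup failure.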
Setup for Development
fetch-onnxruntime places the dylib in rust/target/onnxruntime/. The ort_init module
finds it via the sibling-of-target search path (exe_dir/../onnxruntime/).
Setup for Production Builds
The tauri.onnxruntime.json overlay adds the dylib directory as a bundled
resource. On macOS it lands in Contents/Resources/, on Linux/Windows
next to the binary.
Search Order (ort_init::resolve_dylib_path)
- ORT_DYLIB_PATH environment variable (explicit override)
- Same directory as the running executable
- macOS: ../Frameworks/ and ../Resources/ relative to executable (app bundle)
- exe_dir/../onnxruntime/: dev workflow (fetch-onnxruntime drops the dylib here, sibling of target/{debug,release}/)
- Tauri resource directory (if provided)
- System library paths (/usr/local/lib, /opt/homebrew/lib, etc.)
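The search order above can be sketched as a function that builds an ordered candidate list and picks the first file that exists. This is an illustration only: `candidate_paths` and `first_existing` are hypothetical names, the parameter list is an assumption, and the real `ort_init::resolve_dylib_path` may differ in signature and detail.

```rust
use std::path::{Path, PathBuf};

/// Builds the ordered list of candidate dylib locations.
/// Hypothetical helper, not the crate's actual implementation.
fn candidate_paths(
    env_override: Option<&str>,  // value of ORT_DYLIB_PATH, if set
    exe_dir: &Path,              // directory containing the running executable
    resource_dir: Option<&Path>, // Tauri resource directory, if provided
    lib_name: &str,              // e.g. "libonnxruntime.dylib"
    is_macos: bool,
) -> Vec<PathBuf> {
    let mut out = Vec::new();
    // 1. Explicit override always wins.
    if let Some(p) = env_override {
        out.push(PathBuf::from(p));
    }
    // 2. Next to the executable.
    out.push(exe_dir.join(lib_name));
    // 3. macOS app bundle locations.
    if is_macos {
        out.push(exe_dir.join("../Frameworks").join(lib_name));
        out.push(exe_dir.join("../Resources").join(lib_name));
    }
    // 4. Dev workflow: sibling of target/{debug,release}/.
    out.push(exe_dir.join("../onnxruntime").join(lib_name));
    // 5. Tauri resource directory.
    if let Some(dir) = resource_dir {
        out.push(dir.join(lib_name));
    }
    // 6. System library paths (only the two named in the list above).
    for sys in ["/usr/local/lib", "/opt/homebrew/lib"] {
        out.push(Path::new(sys).join(lib_name));
    }
    out
}

/// Returns the first candidate that exists on disk, or None.
fn first_existing(candidates: &[PathBuf]) -> Option<&PathBuf> {
    candidates.iter().find(|p| p.is_file())
}
```

Keeping list construction separate from the filesystem check makes the ordering easy to unit-test without touching the disk.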
Supported Platforms
| Platform | Library | Execution Providers |
|---|---|---|
| macOS arm64 | libonnxruntime.dylib | CoreML (ANE/GPU) → CPU |
| macOS x86_64 | libonnxruntime.dylib | CPU |
| Linux x86_64 | libonnxruntime.so | CUDA → CPU |
| Linux aarch64 | libonnxruntime.so | CPU |
| Windows x86_64 | onnxruntime.dll | DirectML → CUDA → CPU |
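The provider column reads as a fallback chain: try the first provider and fall through toward CPU. A minimal sketch of that idea follows, assuming a hypothetical `Provider` enum and helper functions; the crate's own `ExecutionProvider` type and registration logic are not shown in this README and may look different. The `os` and `arch` strings match the values of `std::env::consts::OS` and `std::env::consts::ARCH`.

```rust
/// Hypothetical stand-in for the crate's execution provider enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    CoreMl,
    Cuda,
    DirectMl,
    Cpu,
}

/// Returns the preferred-first fallback chain for a platform,
/// mirroring the table above. Unknown platforms get CPU only.
fn provider_chain(os: &str, arch: &str) -> Vec<Provider> {
    match (os, arch) {
        ("macos", "aarch64") => vec![Provider::CoreMl, Provider::Cpu],
        ("linux", "x86_64") => vec![Provider::Cuda, Provider::Cpu],
        ("windows", "x86_64") => vec![Provider::DirectMl, Provider::Cuda, Provider::Cpu],
        _ => vec![Provider::Cpu],
    }
}

/// Picks the first provider in the chain that the caller reports as usable.
/// Falls back to CPU if nothing in the chain is available.
fn select_provider(chain: &[Provider], is_available: impl Fn(Provider) -> bool) -> Provider {
    chain
        .iter()
        .copied()
        .find(|p| is_available(*p))
        .unwrap_or(Provider::Cpu)
}
```

In practice one would call `provider_chain(std::env::consts::OS, std::env::consts::ARCH)` and pass a closure that probes each provider.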
Version Compatibility
The ort crate version (2.0.0-rc.11) requires ONNX Runtime >= 1.23.x.
The fetch script downloads 1.23.2 (latest patch in the minimum-required
line). Using an older version causes a hard failure at ort::init_from
with a clear “not compatible” error. When bumping ort, re-check the
version constraint in ort_init::ORT_VERSION and the fetch script.
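The constraint itself is a simple major.minor comparison. The sketch below shows how such a pre-flight check could look; it is not how ort performs its own check at `ort::init_from`, and `MIN_ORT`, `parse_major_minor`, and `is_compatible` are hypothetical names (the actual type and value of `ort_init::ORT_VERSION` are not shown here).

```rust
/// Assumed minimum (major, minor) taken from the ">= 1.23.x" statement above.
const MIN_ORT: (u32, u32) = (1, 23);

/// Parses the leading "major.minor" of a version string such as "1.23.2".
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the runtime version meets the minimum.
/// Tuples compare lexicographically, so (1, 23) >= (1, 23) and (2, 0) >= (1, 23).
/// Unparseable versions are treated as incompatible.
fn is_compatible(runtime_version: &str, minimum: (u32, u32)) -> bool {
    match parse_major_minor(runtime_version) {
        Some(found) => found >= minimum,
        None => false,
    }
}
```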
Public API
- InferenceBackend trait: uniform interface for ONNX, llama.cpp, MLX, and remote backends
- BackendRegistry: routes inference requests by ModelCapability
- SamplingParams / GeneratedToken: text generation parameters and streaming output
- InferenceSession: wraps ort::Session with provider management
- SessionConfig / ExecutionProvider: configuration types
- ModelStore: model weight download, caching, verification, and local model listing
  - ensure_model(): download + SHA-256 verify
  - download_model_no_verify(): generic download for arbitrary GGUF URLs (no checksum)
  - list_local_models() → Vec<LocalModelInfo>: scans cache for GGUF files and HF repo dirs
  - list_cached(): returns paths for all cached ONNX and GGUF files
- ModelManifest: describes a downloadable model (URL, checksum, size)
- LocalModelInfo: metadata for a locally available model (id, name, backend, path, size, status)
- ProgressSink trait: progress callbacks for downloads and inference
- MiniLmEmbedder: text → 384-dim f32 embedding (requires embeddings feature)
- OnnxBackend: InferenceBackend impl with tensor compute + embedding support
- ort_init: dynamic library discovery and initialization
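To illustrate what routing by capability means, here is a minimal, self-contained sketch. Every name in it (`Capability`, `Backend`, `Registry`, `StubBackend`) is a hypothetical stand-in; the real `InferenceBackend`, `BackendRegistry`, and `ModelCapability` signatures are not documented in this README and are likely richer.

```rust
/// Hypothetical capability set; the real ModelCapability variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    TensorCompute,
    Embedding,
    TextGeneration,
}

/// Hypothetical, stripped-down backend interface.
trait Backend {
    fn name(&self) -> &str;
    fn capabilities(&self) -> &[Capability];
}

/// Holds registered backends and hands out the first one that
/// advertises the requested capability.
struct Registry {
    backends: Vec<Box<dyn Backend>>,
}

impl Registry {
    fn new() -> Self {
        Registry { backends: Vec::new() }
    }

    fn register(&mut self, backend: Box<dyn Backend>) {
        self.backends.push(backend);
    }

    /// Registration order doubles as priority order in this sketch.
    fn route(&self, cap: Capability) -> Option<&dyn Backend> {
        for b in &self.backends {
            if b.capabilities().contains(&cap) {
                return Some(b.as_ref());
            }
        }
        None
    }
}

/// Stub used only to exercise the registry.
struct StubBackend {
    name: &'static str,
    caps: Vec<Capability>,
}

impl Backend for StubBackend {
    fn name(&self) -> &str {
        self.name
    }
    fn capabilities(&self) -> &[Capability] {
        &self.caps
    }
}
```

A caller registers backends once at startup and then asks for a capability rather than a concrete backend, which is what lets ONNX, llama.cpp, MLX, and remote backends sit behind one interface.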
Feature flags
- embeddings: enables MiniLmEmbedder and the tokenizers dependency. Used by songbird-agent for prompt classification.
This crate is used by songbird-separator (stem separation), songbird-agent (LLM pipeline
and prompt classification), and sync-engine ml.* commands. It lives in the ml/ group,
not core/ (too many external deps) and not integration/ (runs locally, not
an external service bridge).
Crate Structure
| Module | Purpose |
|---|---|
| backend | InferenceBackend trait, error types |
| backends/ | Concrete backends (ONNX, future: llama.cpp, MLX) |
| model_store | Download, cache, SHA-256 verify model weights |
| ort_init | Dynamic library discovery and initialization |
| progress | ProgressSink trait for long operations |
| providers | Execution provider enum and registration |
| registry | BackendRegistry routes by capability |
| session | ONNX session wrapper with provider fallback |
| tensor | ndarray ↔ ort tensor conversion helpers |