songbird-inference

Shared ML inference infrastructure for Songbird. Provides a trait-based backend abstraction, ONNX Runtime session management, execution provider selection, model weight download/cache/verification, and progress reporting.

ONNX Runtime Dynamic Loading

This crate uses ort with the load-dynamic feature. The ONNX Runtime shared library (libonnxruntime.dylib / .so / .dll) is loaded at runtime, not linked at compile time. This means:
  1. The app compiles without the dylib present.
  2. The dylib must be discoverable at runtime for inference to work.
  3. If the dylib isn’t found, the app still runs — stem separation is simply unavailable (graceful degradation).
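
The graceful-degradation behaviour in point 3 could be modelled roughly as below. This is a minimal sketch, not the crate's actual code: `StemSeparation`, `probe_stem_separation`, and `separate_stems` are hypothetical names, and the `lookup` closure stands in for whatever discovery routine the application really uses.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical application-level state; not part of songbird-inference.
#[derive(Debug, PartialEq)]
pub enum StemSeparation {
    Available { dylib: PathBuf },
    Unavailable { reason: String },
}

/// Decide feature availability from the outcome of a dylib lookup.
/// `lookup` is an assumed stand-in for the real discovery routine.
pub fn probe_stem_separation<F>(lookup: F) -> StemSeparation
where
    F: FnOnce() -> Option<PathBuf>,
{
    match lookup() {
        Some(dylib) => StemSeparation::Available { dylib },
        None => StemSeparation::Unavailable {
            reason: "ONNX Runtime shared library not found".to_string(),
        },
    }
}

/// Callers branch on the state and get an error instead of a crash.
/// The Ok value is a placeholder string; no inference is performed here.
pub fn separate_stems(state: &StemSeparation, track: &Path) -> Result<String, String> {
    match state {
        StemSeparation::Available { dylib } => Ok(format!(
            "would separate {} using {}",
            track.display(),
            dylib.display()
        )),
        StemSeparation::Unavailable { reason } => Err(reason.clone()),
    }
}
```

Probing once at startup and storing the result lets the rest of the app keep running and only surface an error when the user actually requests stem separation.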

Setup for Development

# From the repo root — downloads the dylib for your platform
./utils/binaries/fetch-onnxruntime

This places the library in rust/target/onnxruntime/. The ort_init module finds it via the sibling-of-target search path (exe_dir/../onnxruntime/).

Setup for Production Builds

# 1. Fetch the dylib for the target platform
./utils/binaries/fetch-onnxruntime macos-arm64   # or linux-x64, windows-x64, etc.

# 2. Build with Tauri (dylib is bundled alongside the binary)
TAURI_CONFIG=crates/app/songbird-app/tauri.onnxruntime.json cargo tauri build

# Or use the release script (fetches dylib automatically)
./utils/release-rs

The tauri.onnxruntime.json overlay adds the dylib directory as a bundled resource. On macOS it lands in Contents/Resources/, on Linux/Windows next to the binary.

Search Order (ort_init::resolve_dylib_path)

  1. ORT_DYLIB_PATH environment variable (explicit override)
  2. Same directory as the running executable
  3. macOS: ../Frameworks/ and ../Resources/ relative to executable (app bundle)
  4. exe_dir/../onnxruntime/ — dev workflow (fetch-onnxruntime drops the dylib here, sibling of target/{debug,release}/)
  5. Tauri resource directory (if provided)
  6. System library paths (/usr/local/lib, /opt/homebrew/lib, etc.)
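
The search order above can be sketched as a candidate list that is walked until a file exists. This is an illustration under assumptions, not the real `ort_init::resolve_dylib_path`: the helper names and signatures are hypothetical, the environment value is passed in as an argument instead of being read from `ORT_DYLIB_PATH`, and the system paths are limited to the two named in the list.

```rust
use std::path::{Path, PathBuf};

/// Platform-specific file name, matching the Supported Platforms table.
fn dylib_name() -> &'static str {
    if cfg!(target_os = "windows") {
        "onnxruntime.dll"
    } else if cfg!(target_os = "macos") {
        "libonnxruntime.dylib"
    } else {
        "libonnxruntime.so"
    }
}

/// Build candidates in the documented order (hypothetical helper).
pub fn candidate_paths(
    env_override: Option<&str>,
    exe_dir: &Path,
    resource_dir: Option<&Path>,
) -> Vec<PathBuf> {
    let name = dylib_name();
    let mut out = Vec::new();
    // 1. Explicit override (the value of ORT_DYLIB_PATH, if set).
    if let Some(p) = env_override {
        out.push(PathBuf::from(p));
    }
    // 2. Same directory as the running executable.
    out.push(exe_dir.join(name));
    // 3. macOS app bundle locations.
    if cfg!(target_os = "macos") {
        out.push(exe_dir.join("../Frameworks").join(name));
        out.push(exe_dir.join("../Resources").join(name));
    }
    // 4. Dev workflow: sibling of target/{debug,release}/.
    out.push(exe_dir.join("../onnxruntime").join(name));
    // 5. Tauri resource directory, when the caller provides one.
    if let Some(dir) = resource_dir {
        out.push(dir.join(name));
    }
    // 6. System library paths (only the two named in the list above).
    for dir in &["/usr/local/lib", "/opt/homebrew/lib"] {
        out.push(Path::new(dir).join(name));
    }
    out
}

/// Return the first candidate that exists on disk as a file.
pub fn resolve_first_existing(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}
```

Keeping candidate construction separate from the filesystem check makes the ordering easy to unit-test without a real dylib on disk.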

Supported Platforms

| Platform | Library | Execution Providers |
|---|---|---|
| macOS arm64 | libonnxruntime.dylib | CoreML (ANE/GPU) → CPU |
| macOS x86_64 | libonnxruntime.dylib | CPU |
| Linux x86_64 | libonnxruntime.so | CUDA → CPU |
| Linux aarch64 | libonnxruntime.so | CPU |
| Windows x86_64 | onnxruntime.dll | DirectML → CUDA → CPU |
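
The per-platform provider order can be read as a fallback chain: try each provider in turn and fall back to CPU. The sketch below encodes the platform list as data. The `Provider` enum and both functions are hypothetical and are not the crate's `ExecutionProvider` type; the OS and architecture strings follow the values of `std::env::consts::OS` and `std::env::consts::ARCH`.

```rust
/// Hypothetical provider enum used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    CoreMl,
    Cuda,
    DirectMl,
    Cpu,
}

/// Preference order per platform, mirroring the platform list above.
pub fn preferred_providers(os: &str, arch: &str) -> Vec<Provider> {
    match (os, arch) {
        ("macos", "aarch64") => vec![Provider::CoreMl, Provider::Cpu],
        ("linux", "x86_64") => vec![Provider::Cuda, Provider::Cpu],
        ("windows", "x86_64") => {
            vec![Provider::DirectMl, Provider::Cuda, Provider::Cpu]
        }
        // macOS x86_64, Linux aarch64, and anything unlisted: CPU only.
        _ => vec![Provider::Cpu],
    }
}

/// Pick the first provider the caller reports as usable, else CPU.
pub fn pick_provider<F>(order: &[Provider], is_available: F) -> Provider
where
    F: Fn(Provider) -> bool,
{
    order
        .iter()
        .copied()
        .find(|p| is_available(*p))
        .unwrap_or(Provider::Cpu)
}
```

A caller might pass `std::env::consts::OS` and `std::env::consts::ARCH` to `preferred_providers` and supply an availability check that attempts to register each provider.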

Version Compatibility

The ort crate version (2.0.0-rc.11) requires ONNX Runtime >= 1.23.x. The fetch script downloads 1.23.2 (latest patch in the minimum-required line). Using an older version causes a hard failure at ort::init_from with a clear “not compatible” error. When bumping ort, re-check the version constraint in ort_init::ORT_VERSION and the fetch script.
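
A pre-flight check of the stated constraint (ONNX Runtime >= 1.23.x) might look like the sketch below. It is an assumption-laden illustration: `MIN_ORT`, `parse_version`, and `is_compatible` are hypothetical names, the real constraint lives in `ort_init::ORT_VERSION` (whose type is not shown here), and the actual compatibility error is raised by ort itself.

```rust
/// Assumed minimum (major, minor), taken from the text above.
pub const MIN_ORT: (u32, u32) = (1, 23);

/// Parse "major.minor[.patch]" into a tuple; None if malformed.
pub fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    let minor = parts.next()?.parse::<u32>().ok()?;
    // A missing patch component is treated as 0.
    let patch = parts.next().unwrap_or("0").parse::<u32>().ok()?;
    // Reject trailing components such as "1.23.2.1".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True when the found runtime is at or above the assumed minimum.
/// Tuples compare lexicographically, so (1, 24) >= (1, 23) holds.
pub fn is_compatible(found: &str) -> bool {
    match parse_version(found) {
        Some((major, minor, _patch)) => (major, minor) >= MIN_ORT,
        None => false,
    }
}
```

Running a check like this in the fetch script's tests would flag a mismatch when ort is bumped, before it shows up as a runtime failure.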

Public API

  • InferenceBackend trait — uniform interface for ONNX, llama.cpp, MLX, and remote backends
  • BackendRegistry — routes inference requests by ModelCapability
  • SamplingParams / GeneratedToken — text generation parameters and streaming output
  • InferenceSession — wraps ort::Session with provider management
  • SessionConfig / ExecutionProvider — configuration types
  • ModelStore — model weight download, caching, verification, and local model listing
    • ensure_model() — download + SHA-256 verify
    • download_model_no_verify() — generic download for arbitrary GGUF URLs (no checksum)
    • list_local_models() → Vec<LocalModelInfo>: scans cache for GGUF files and HF repo dirs
    • list_cached() — returns paths for all cached ONNX and GGUF files
  • ModelManifest — describes a downloadable model (URL, checksum, size)
  • LocalModelInfo — metadata for a locally available model (id, name, backend, path, size, status)
  • ProgressSink trait — progress callbacks for downloads and inference
  • MiniLmEmbedder — text → 384-dim f32 embedding (requires embeddings feature)
  • OnnxBackend: InferenceBackend impl with tensor compute + embedding support
  • ort_init — dynamic library discovery and initialization
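
To illustrate how a registry can route requests by capability, here is a self-contained sketch. None of these types are the crate's own: `Capability`, `Backend`, `Registry`, and `FakeOnnx` are hypothetical stand-ins for `ModelCapability`, `InferenceBackend`, `BackendRegistry`, and `OnnxBackend`, whose real signatures are not shown in this README. The "first registered wins" rule is likewise an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical capability set; the real ModelCapability may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    TensorCompute,
    Embedding,
    TextGeneration,
}

/// Hypothetical, much-reduced stand-in for a backend trait.
pub trait Backend {
    fn name(&self) -> &str;
    fn capabilities(&self) -> Vec<Capability>;
}

pub struct Registry {
    backends: Vec<Box<dyn Backend>>,
    routes: HashMap<Capability, usize>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { backends: Vec::new(), routes: HashMap::new() }
    }

    /// Register a backend; the first backend to claim a capability keeps it.
    pub fn register(&mut self, backend: Box<dyn Backend>) {
        let idx = self.backends.len();
        for cap in backend.capabilities() {
            self.routes.entry(cap).or_insert(idx);
        }
        self.backends.push(backend);
    }

    /// Look up the backend that serves a capability, if any.
    pub fn route(&self, cap: Capability) -> Option<&dyn Backend> {
        self.routes.get(&cap).map(|&i| &*self.backends[i])
    }
}

/// Toy backend advertising the two capabilities listed for OnnxBackend.
pub struct FakeOnnx;

impl Backend for FakeOnnx {
    fn name(&self) -> &str {
        "onnx"
    }
    fn capabilities(&self) -> Vec<Capability> {
        vec![Capability::TensorCompute, Capability::Embedding]
    }
}
```

With only `FakeOnnx` registered, routing `Capability::Embedding` returns the ONNX backend, while `Capability::TextGeneration` returns `None` until a text-generation backend is added.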

Feature flags

  • embeddings: enables MiniLmEmbedder and the tokenizers dependency. Used by songbird-agent for prompt classification.

Depended on by songbird-separator (stem separation), songbird-agent (LLM pipeline and prompt classification), and sync-engine ml.* commands. The crate lives in the ml/ group rather than core/ (too many external deps) or integration/ (it runs locally and is not an external service bridge).

Crate Structure

| Module | Purpose |
|---|---|
| backend | InferenceBackend trait, error types |
| backends/ | Concrete backends (ONNX, future: llama.cpp, MLX) |
| model_store | Download, cache, SHA-256 verify model weights |
| ort_init | Dynamic library discovery and initialization |
| progress | ProgressSink trait for long operations |
| providers | Execution provider enum and registration |
| registry | BackendRegistry routes by capability |
| session | ONNX session wrapper with provider fallback |
| tensor | ndarray ↔ ort tensor conversion helpers |