The Nervous System for AI
A high-performance, modular, local-first AI orchestration library written in Rust.
Local-First Intelligence
Why Local?
In the age of Cloud AI, local execution offers absolute privacy, no network latency, and offline reliability. Your data never leaves your machine, and your intelligence isn't gated by a subscription or an internet connection.
The Difference
Unlike API-based services (OpenAI, Anthropic), which process data on remote servers, Rusty-Genius is built for on-device orchestration. While other options exist for remote AI usage, this library focuses exclusively on the unique challenges and performance requirements of local inference.
Configuration
Control behavior via environment variables or a local registry:
| Variable | Description | Default |
|---|---|---|
| `GENIUS_HOME` | Base configuration directory | `~/.config/rusty-genius` |
| `GENIUS_CACHE` | Model storage path | `$HOME/cache` |
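As a quick illustration, here is a minimal sketch of how these variables can be resolved with fallbacks to the defaults above. The library resolves these paths internally; the exact fallback logic shown here is an assumption.

```rust
use std::env;
use std::path::PathBuf;

// Minimal sketch: resolve GENIUS_HOME and GENIUS_CACHE, falling back to the
// documented defaults. Illustrative only; the library does this internally.
fn resolve_paths() -> (PathBuf, PathBuf) {
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    let genius_home = env::var("GENIUS_HOME")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(&home).join(".config/rusty-genius"));
    let genius_cache = env::var("GENIUS_CACHE")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(&home).join("cache"));
    (genius_home, genius_cache)
}

fn main() {
    let (config_dir, model_cache) = resolve_paths();
    println!("config dir:  {}", config_dir.display());
    println!("model cache: {}", model_cache.display());
}
```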
Injected Manifest API
Extend the registry by creating `registry.toml` in your home directory. This allows you to "inject" any HuggingFace model directly into the orchestrator.
```toml
[[models]]
name = "my-local-model"
repo = "Org/Repo-GGUF"
filename = "model.Q4_K_M.gguf"
quantization = "Q4_K_M"
```
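For reference, a manifest in this format can be deserialized with `serde` and the `toml` crate. The struct names below are hypothetical and chosen for illustration; `facecrab`'s internal types may differ, but the fields mirror the manifest shown above.

```rust
use serde::Deserialize;

// Hypothetical structs mirroring the registry.toml format shown above.
#[derive(Debug, Deserialize)]
struct Registry {
    models: Vec<ModelEntry>,
}

#[derive(Debug, Deserialize)]
struct ModelEntry {
    name: String,
    repo: String,
    filename: String,
    quantization: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Requires the `serde` (with "derive") and `toml` crates.
    let raw = std::fs::read_to_string("registry.toml")?;
    let registry: Registry = toml::from_str(&raw)?;
    for m in &registry.models {
        println!("{} -> {}/{} ({})", m.name, m.repo, m.filename, m.quantization);
    }
    Ok(())
}
```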
Main Components
Crate Cortex
The Muscle (Inference Engine). Handles KV caching, token streaming, and logit processing. Wraps llama.cpp for local execution or uses stubs for testing.
Crate Brainstem
The Orchestrator. The central event loop that manages state, delegates asset retrieval, and controls the lifecycle (TTL) of the inference engine.
Crate Facecrab
The Supplier. An autonomous asset authority. Resolves HuggingFace paths, manages the local registry, and downloads models via surf/smol.
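To make the division of labor concrete, here is a deliberately simplified, synchronous sketch of an orchestrator-style event loop delegating to a supplier and an inference engine. The names and types are illustrative only and are not the crate API; see the Usage section below for the real entry points.

```rust
// Conceptual sketch only: an event loop that delegates asset retrieval
// (the facecrab role) and inference (the cortex role). Not the crate API.
enum Request {
    LoadModel(String),
    Infer(String),
}

fn event_loop(requests: Vec<Request>) {
    for req in requests {
        match req {
            // Resolve and download the model asset before loading it.
            Request::LoadModel(name) => println!("resolving and downloading '{name}'"),
            // Hand the prompt to the inference engine and stream tokens back.
            Request::Infer(prompt) => println!("streaming tokens for '{prompt}'"),
        }
    }
}

fn main() {
    event_loop(vec![
        Request::LoadModel("my-local-model".into()),
        Request::Infer("Once upon a time...".into()),
    ]);
}
```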
Installation
Add Rusty-Genius to your project's `Cargo.toml`:
```toml
[dependencies]
rusty-genius = { version = "0.1.1", features = ["metal"] }
```
Try It Out
Run these commands in the terminal to test the crates locally:
1. Test Asset Downloader
```bash
cargo run -p facecrab --example downloader
```
2. Test Local Inference
Run the chat example with your preferred hardware acceleration:
```bash
cargo run -p rusty-genius --example basic_chat --features metal
cargo run -p rusty-genius --example basic_chat --features cuda
```
Usage
Initialize the orchestrator, download a model, and start chatting:
```rust
use rusty_genius::Orchestrator;
use rusty_genius::core::protocol::{AssetEvent, BrainstemInput, BrainstemOutput, InferenceEvent};
use futures::{StreamExt, sink::SinkExt, channel::mpsc};
use std::io::Write;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Core orchestration setup
    let mut genius = Orchestrator::new().await?;
    let (mut input, rx) = mpsc::channel(100);
    let (tx, mut output) = mpsc::channel(100);

    async_std::task::spawn(async move {
        if let Err(e) = genius.run(rx, tx).await {
            eprintln!("Orchestrator error: {}", e);
        }
    });

    // 2. Select model (downloads and verifies automatically)
    input.send(BrainstemInput::LoadModel("tiny-model".into())).await?;

    // 3. Submit prompt
    input.send(BrainstemInput::Infer {
        prompt: "Once upon a time...".into(),
        config: Default::default(),
    }).await?;

    // 4. Stream results
    while let Some(msg) = output.next().await {
        match msg {
            BrainstemOutput::Asset(a) => match a {
                AssetEvent::Complete(path) => println!("Model ready at: {}", path),
                AssetEvent::Error(e) => eprintln!("Download error: {}", e),
                _ => {}
            },
            BrainstemOutput::Event(e) => match e {
                InferenceEvent::Content(c) => {
                    print!("{}", c);
                    // Flush so streamed tokens appear immediately.
                    std::io::stdout().flush().ok();
                }
                InferenceEvent::Complete => break,
                _ => {}
            },
            BrainstemOutput::Error(err) => {
                eprintln!("Error: {}", err);
                break;
            }
        }
    }

    Ok(())
}
```
Inside the Cranium
Internal Crates
- rusty-genius-cortex - Inference engine
- rusty-genius-stem - Orchestration
- facecrab - Model asset management
Key Dependencies
- llama-cpp-2 - llama.cpp bindings
- async-std - async runtime
- surf - HTTP client (async)
OS Requirements
Rusty-Genius leverages hardware acceleration via `llama-cpp-2`. Requirements vary by platform and GPU availability.
macOS
```bash
xcode-select --install
brew install cmake
```
Apple Silicon (M1/M2/M3)
- Native Metal acceleration (Default)
- Optimized via ARM NEON and Accelerate
- Feature: `features = ["metal"]`
Intel Mac
- CPU-only or Accelerate framework
- Uses AMX/AVX2 instructions where available
Linux
```bash
apt install build-essential cmake libclang-dev
```
NVIDIA (CUDA)
- Requires CUDA Toolkit 11.x/12.x
- Feature: `features = ["cuda"]`
AMD (ROCm)
- Requires ROCm stack / hipBLAS
- Feature: `features = ["hipblas"]`
Intel / Generic GPU
- Requires Vulkan SDK
- Feature: `features = ["vulkan"]`
Windows
Native (MSVC)
- Supports `cuda` and `vulkan` backends
- Ensure `LIBCLANG_PATH` is set in the environment
WSL2 (Recommended)
- Best performance and compatibility
- WSL2 GPU Passthrough Guide
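If you want to confirm which backend a binary was built with, a small `cfg!` check like the sketch below works. It assumes your own crate declares features named after the backends listed above (`metal`, `cuda`, `hipblas`, `vulkan`), which is a convention for this example rather than something Rusty-Genius requires.

```rust
// Minimal sketch: report which acceleration backend this build enables.
// Assumes the calling crate defines features with the names used above.
fn backend() -> &'static str {
    if cfg!(feature = "cuda") {
        "CUDA"
    } else if cfg!(feature = "hipblas") {
        "ROCm / hipBLAS"
    } else if cfg!(feature = "vulkan") {
        "Vulkan"
    } else if cfg!(feature = "metal") {
        "Metal"
    } else {
        "CPU only"
    }
}

fn main() {
    println!("compiled backend: {}", backend());
}
```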