The Nervous System for AI
A high-performance, modular, local-first AI orchestration library written in Rust.
Local-First Intelligence
Why Local?
In the age of cloud AI, local execution offers absolute privacy, zero network latency, and offline reliability. Your data never leaves your machine, and your intelligence isn't gated by a subscription or an internet connection.
The Difference
Unlike API-based services (OpenAI, Anthropic), which process data on remote servers, Rusty-Genius is built for on-device orchestration. While other options exist for remote AI usage, this library focuses exclusively on the unique challenges and performance requirements of local inference.
Architecture
Crate Cortex
The Muscle (Inference Engine). Handles KV caching, token streaming, and logic processing. Wraps llama.cpp for local execution or uses stubs for testing.
Crate Brainstem
The Orchestrator. The central event loop that manages state, delegates asset retrieval, and controls the lifecycle (TTL) of the inference engine.
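The TTL lifecycle mentioned above can be sketched as an idle timeout: the engine stays resident while requests arrive and is unloaded once it has been idle past its TTL. `EngineHandle` and its methods are illustrative names for this sketch, not Brainstem's actual API.

```rust
use std::time::{Duration, Instant};

// Sketch (assumed behavior): unload the inference engine after a period
// of inactivity to free memory/VRAM.
struct EngineHandle {
    last_used: Instant,
    ttl: Duration,
    loaded: bool,
}

impl EngineHandle {
    fn new(ttl: Duration) -> Self {
        Self { last_used: Instant::now(), ttl, loaded: true }
    }

    // Called on every inference request: resets the idle clock.
    fn touch(&mut self) {
        self.last_used = Instant::now();
        self.loaded = true;
    }

    // Called periodically by the event loop: unload if idle past the TTL.
    fn tick(&mut self) {
        if self.loaded && self.last_used.elapsed() >= self.ttl {
            self.loaded = false; // drop weights, free resources
        }
    }
}

fn main() {
    let mut engine = EngineHandle::new(Duration::from_millis(10));
    engine.tick();
    assert!(engine.loaded); // still within TTL

    std::thread::sleep(Duration::from_millis(20));
    engine.tick();
    assert!(!engine.loaded); // idle past TTL, unloaded
}
```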
Crate Facecrab
The Supplier. An autonomous asset authority. Resolves Hugging Face paths, manages the local registry, and downloads models via surf/smol.
Command Line Usage
The ogenius CLI provides an interactive chat REPL and an OpenAI-compatible API server, with automatic model downloading from Hugging Face.
Quick Install
# Standard install (CPU only)
cargo install ogenius
# Install with Metal (macOS)
brew install cmake
cargo install ogenius --features metal
# Install with CUDA (cross-platform)
cargo install ogenius --features cuda
# Install with Vulkan (cross-platform)
cargo install ogenius --features vulkan
Basic Usage
# Download a model
ogenius download Qwen/Qwen2.5-1.5B-Instruct
# Start interactive chat
ogenius chat --model Qwen/Qwen2.5-1.5B-Instruct
# Run the API & Web Server (defaults to port 8080)
ogenius serve --model Qwen/Qwen2.5-1.5B-Instruct
Library Usage
Add Rusty-Genius to your project's Cargo.toml:
[dependencies]
rusty-genius = { version = "0.1.2", features = ["metal"] }
or use:
cargo add rusty-genius --features metal
Configuration
Control behavior via environment variables or manifest files:
| Variable | Description |
|---|---|
| GENIUS_HOME | Base config dir (~/.config/rusty-genius) |
| GENIUS_CACHE | Model storage & dynamic registry path |
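The resolution order implied by the table can be sketched as "explicit environment variable wins, otherwise fall back to the default under the user's config directory". The `resolve_home` function below is an illustrative sketch of that rule, not the crate's actual lookup code.

```rust
use std::path::PathBuf;

// Sketch of the assumed resolution rule for GENIUS_HOME:
// an explicit value wins; otherwise default to ~/.config/rusty-genius.
fn resolve_home(genius_home: Option<&str>, user_home: &str) -> PathBuf {
    match genius_home {
        Some(dir) => PathBuf::from(dir),
        None => PathBuf::from(user_home).join(".config").join("rusty-genius"),
    }
}

fn main() {
    // Explicit override wins.
    assert_eq!(
        resolve_home(Some("/tmp/genius"), "/home/alice"),
        PathBuf::from("/tmp/genius")
    );
    // Otherwise fall back to the default config directory.
    assert_eq!(
        resolve_home(None, "/home/alice"),
        PathBuf::from("/home/alice/.config/rusty-genius")
    );
    println!("ok");
}
```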
manifest.toml
Static, user-editable file in the config directory to extend the built-in model list.
registry.toml
Dynamically updated index in the cache directory tracking all downloaded models.
Extension API
Extend the registry by creating manifest.toml in your config directory (GENIUS_HOME, ~/.config/rusty-genius by default).
[[models]]
name = "my-local-model"
repo = "Org/Repo-GGUF"
filename = "model.Q4_K_M.gguf"
quantization = "Q4_K_M"
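The manifest fields map naturally onto a struct, and `repo` plus `filename` identify a file via Hugging Face's standard `resolve` download URL. The `ModelEntry` struct and `download_url` method below are illustrative names for this sketch, not the crate's actual types.

```rust
// Sketch: the manifest fields as a struct, plus the Hugging Face
// "resolve" URL a downloader would typically fetch for them.
struct ModelEntry {
    name: String,
    repo: String,
    filename: String,
    quantization: String,
}

impl ModelEntry {
    fn download_url(&self) -> String {
        format!(
            "https://huggingface.co/{}/resolve/main/{}",
            self.repo, self.filename
        )
    }
}

fn main() {
    let entry = ModelEntry {
        name: "my-local-model".into(),
        repo: "Org/Repo-GGUF".into(),
        filename: "model.Q4_K_M.gguf".into(),
        quantization: "Q4_K_M".into(),
    };
    println!(
        "{} ({}): {}",
        entry.name,
        entry.quantization,
        entry.download_url()
    );
}
```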
Try It Out
Run these commands in the terminal to test the crates locally:
1. Test Asset Downloader
cargo run -p facecrab --example downloader
2. Test Local Inference
Run the chat example with your preferred hardware acceleration:
cargo run -p rusty-genius --example basic_chat --features metal
cargo run -p rusty-genius --example basic_chat --features cuda
Usage
Initialize the orchestrator, download a model, and start chatting:
use rusty_genius::Orchestrator;
use rusty_genius::core::protocol::{AssetEvent, BrainstemInput, BrainstemOutput, InferenceEvent};
use futures::{StreamExt, sink::SinkExt, channel::mpsc};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Core orchestration setup
    let mut genius = Orchestrator::new().await?;
    let (mut input, rx) = mpsc::channel(100);
    let (tx, mut output) = mpsc::channel(100);

    async_std::task::spawn(async move {
        if let Err(e) = genius.run(rx, tx).await {
            eprintln!("Orchestrator error: {}", e);
        }
    });

    // 2. Select model (downloads and verifies automatically)
    input.send(BrainstemInput::LoadModel("tiny-model".into())).await?;

    // 3. Submit prompt
    input.send(BrainstemInput::Infer {
        prompt: "Once upon a time...".into(),
        config: Default::default(),
    }).await?;

    // 4. Stream results
    while let Some(msg) = output.next().await {
        match msg {
            BrainstemOutput::Asset(a) => match a {
                AssetEvent::Complete(path) => println!("Model ready at: {}", path),
                AssetEvent::Error(e) => eprintln!("Download error: {}", e),
                _ => {}
            },
            BrainstemOutput::Event(e) => match e {
                InferenceEvent::Content(c) => print!("{}", c),
                InferenceEvent::Complete => break,
                _ => {}
            },
            BrainstemOutput::Error(err) => {
                eprintln!("Error: {}", err);
                break;
            }
        }
    }
    Ok(())
}
Inside the Cranium
Internal Crates
- rusty-genius-cortex - Inference engine
- rusty-genius-stem - Orchestration
- facecrab - Model asset management
Key Dependencies
- llama-cpp-2 - llama.cpp bindings
- async-std - async runtime
- surf - http client (async)
OS Requirements
Rusty-Genius leverages hardware acceleration via llama-cpp-2. Requirements vary by platform and GPU availability.
macOS
Prerequisites: xcode-select --install and brew install cmake
Apple Silicon (M1/M2/M3)
- Native Metal acceleration (default)
- Optimized via ARM NEON and Accelerate
- Feature: features = ["metal"]
Intel Mac
- CPU-only or the Accelerate framework
- Uses AMX/AVX2 instructions where available
Linux
apt install build-essential cmake libclang-dev
NVIDIA (CUDA)
- Requires CUDA Toolkit 11.x/12.x
- Feature: features = ["cuda"]
AMD (ROCm)
- Requires the ROCm stack / hipBLAS
- Feature: features = ["hipblas"]
Intel / Generic GPU
- Requires the Vulkan SDK
- Feature: features = ["vulkan"]
Windows
Native (MSVC)
- Supports the cuda and vulkan backends
- Ensure LIBCLANG_PATH is set in the environment
WSL2 (Recommended)
- Best performance and compatibility
- WSL2 GPU Passthrough Guide