The one-click hybrid LLM solution
Automatically route your LLM coding tasks between frontier and local models while working directly in the coding tool you already use. Cut costs, keep data private, and build faster.
Free · v0.1.9 · Requires Apple Silicon · 18 GB+ RAM
Getting started
Up and running in minutes
Three clicks. No configuration. No terminal required.
Drag to Applications
One DMG, everything bundled for your Mac. No Homebrew, no Python, no dependencies to manage.
Download your models
Pick the local models you want. Glass Slipper downloads and tunes them for your Apple Silicon hardware, with no configuration required.
Connect and go
Add Glass Slipper as an MCP server in your coding tool. The router classifies every task and delegates automatically in the background.
Live demo
See it in action
The router classifies each task and automatically sends the routine ones to a local model in the background. You keep working as usual.
Real output from Claude Code with Glass Slipper installed
Features
The best of cloud and local, automatically
Frontier models handle the hard thinking. Local models handle the rest. Glass Slipper routes between them so you don't have to.
Intelligent task routing
The router classifies each incoming task and decides which model should handle it: frontier models for hard thinking, local models for the grunt work. It learns how to delegate better over time.
Tokenmax without the limits
Stop burning credits and hitting usage caps. Local models take on most of your tasks at the same quality, keeping your costs down while you keep shipping.
Stay in your tool
Glass Slipper works in the background with Claude Code, Codex, or Cursor. There's no new interface to learn, so you keep working exactly where you already are.
Privacy when you need it
Some data should never leave your machine. Tell your tool to route sensitive work through Glass Slipper and it stays on device.
Zero configuration
A single DMG with everything bundled: a Rust MCP server, a tuned harness, and a vendored llama-server for Apple Silicon. No tinkering with temperature, quantization, or top_k.
Fast on-device inference
Local inference is built and tuned for Apple Silicon, so routine tasks finish quickly on your machine instead of waiting on a round trip to the cloud.
Initial public release
- Intelligent router that classifies tasks and delegates between cloud and local models automatically
- MCP server with built-in local_summarize, local_explain, and local_review tools
- One-click model download and llama-server management from the menu bar
- Local inference tuned for Apple Silicon, so there's no tinkering with temperature, quantization, or top_k
- No telemetry, no auto-update, fully self-contained bundle (no Homebrew or Python required)
Stay in the loop
Monthly release notes. No spam, unsubscribe any time.