Butter sits between your application and AI providers, offering a unified OpenAI-compatible API with multi-provider routing, automatic failover, and sub-50μs overhead. Written in Go with a single dependency.
Everything you need to route AI traffic with confidence.
Drop-in replacement for any OpenAI SDK client. Just change the base URL.
Route models to OpenAI, Anthropic, and OpenRouter with priority or round-robin strategies.
Full streaming support with immediate per-chunk flush via SSE relay.
Configurable retries on specific HTTP status codes, with exponential backoff across providers.
Weighted random key selection with per-key model allowlists.
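As an aside, weighted random selection with allowlist filtering is simple to picture. The sketch below is purely illustrative — the key names, fields, and `pick_key` helper are hypothetical, not Butter's actual schema:

```python
import random

# Hypothetical key records: a weight plus an optional per-key model allowlist.
KEYS = [
    {"name": "key-a", "weight": 3, "models": {"gpt-4o-mini", "gpt-4o"}},
    {"name": "key-b", "weight": 1, "models": {"gpt-4o-mini"}},
    {"name": "key-c", "weight": 2, "models": None},  # None = all models allowed
]

def pick_key(model: str, rng: random.Random) -> dict:
    """Weighted random choice among keys whose allowlist permits `model`."""
    eligible = [k for k in KEYS if k["models"] is None or model in k["models"]]
    if not eligible:
        raise LookupError(f"no key allows model {model!r}")
    weights = [k["weight"] for k in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0]

print(pick_key("gpt-4o", random.Random(42))["name"])
```

Keys whose allowlist excludes the requested model are filtered out before the weighted draw, so a high-weight key never receives traffic for a model it cannot serve.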
Ordered hook chains for request/response processing with fail-open design. Built-in request logger and rate limiter included.
Built-in token bucket rate limiter with global or per-IP modes. Plugins can short-circuit requests before they reach providers.
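The token-bucket idea itself is compact: a counter refills at a steady rate up to a burst capacity, and each request spends one token. A minimal sketch — illustrative only, not Butter's actual limiter:

```python
import time
from typing import Dict, Optional

class TokenBucket:
    """Token-bucket sketch: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Per-IP mode is conceptually one bucket per client address;
# global mode is a single shared bucket.
buckets: Dict[str, TokenBucket] = {}

def allow_ip(ip: str) -> bool:
    return buckets.setdefault(ip, TokenBucket(rate=5, capacity=10)).allow()
```

When `allow` returns `False`, a gateway can reject the request immediately — this is the short-circuit: the provider is never contacted.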
Sandboxed WASM plugins via Extism for custom logic written in any language.
In-memory LRU and Redis caching to reduce costs and latency.
OpenTelemetry tracing and Prometheus metrics for production monitoring.
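Routing, keys, and retries are all driven by `config.yaml`. The sketch below is purely illustrative — every field name here is hypothetical; the real schema lives in `config.example.yaml` in the repo:

```yaml
# Illustrative sketch only — field names are hypothetical.
# See config.example.yaml for the real schema.
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
routing:
  strategy: priority          # or round-robin
  retry_statuses: [429, 500, 502, 503]
```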
Up and running in under a minute.
Download the latest binary from GitHub Releases, or build from source:
```shell
git clone https://github.com/temikus/butter.git
cd butter
go build -o pkg/bin/butter ./cmd/butter/
```
```shell
cp config.example.yaml config.yaml
export OPENAI_API_KEY="sk-..."
export OPENROUTER_API_KEY="sk-or-v1-..."
./pkg/bin/butter -config config.yaml
# {"level":"INFO","msg":"butter listening","address":":8080"}
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello!"}]
  }'
```
Works with any OpenAI-compatible client. Just change the base URL.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # Butter uses its own configured keys
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // Butter uses its own configured keys
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```
Minimal layers, maximum throughput.
Engineered for negligible overhead.
| Metric | Target |
|---|---|
| Per-request overhead (no plugins) | <50μs |
| Per-request overhead (built-in plugins) | <100μs |
| Per-request overhead (1 WASM plugin) | <150μs |
| Streaming TTFB overhead | <1ms |
| Memory at idle | <30MB |