Platform · Gateway
Token spend is a real problem. Gateway puts all of it — cloud and local, every provider — on one set of rails, with spend controls that actually work.
Unified routing
Anthropic, OpenAI, Google, local open-weight models — Gateway puts them all behind one endpoint. Your agents don’t need to know which provider they’re talking to. You get one bill, one usage dashboard, one place to manage everything. Built on LiteLLM and OpenRouter.
Spend controls
Set daily, weekly, or monthly spend limits across your entire account. Set limits per agent, per task type, or per model tier. Notifications when you’re approaching a limit, hard stops when you hit one.
All of this is surfaced in the otto app. You can adjust limits, review spend by agent, and see exactly what each task cost — in real time, not at the end of the billing cycle.
Local models
Running a local model shouldn’t mean downloading 30GB and figuring out the rest yourself. Gateway manages local model selection, provisioning, and routing the same way it handles cloud providers — pick from curated open-weight models, and they’re running under the same familiar interface. When you need to flex between cloud and local, Gateway handles the switch automatically.
Smart routing
Not every task needs a frontier model. Gateway will analyze the requirements of a task and route to the most cost-effective model that can handle it — saving spend without you having to think about it. Coming soon.
Private mode
For sensitive workloads or low-latency setups, flip Gateway to private mode — inference stays entirely on-device. No tokens leave your network. Requires Otto Pro hardware for local inference.
Pricing
Gateway is part of otto OS — not a SaaS product with a monthly seat fee. You were going to spend on tokens anyway. We sit at the OS layer and help you manage that spend better.
We take a small transaction fee on cloud spend to cover our own infrastructure and hedging costs. Local inference runs free through Gateway. You pay for what you use. Nothing else.