Guardrails as a Service
Per-user limits and visibility for realtime AI APIs
A realtime AI API proxy that adds token and cost tracking, usage limits, and enforcement guardrails without changing how you build. Point your client at the proxy, send identity headers, and guardrails apply automatically.
Free tier: up to 10M tokens monitored. No credit card required.
A thin proxy layer between your clients and AI providers
Tokenist acts as a WebSocket (and WebRTC) proxy between your application and realtime AI APIs. It supports multiple AI service providers and is designed to add minimal latency (under 10 ms). Traffic is relayed bidirectionally with lightweight interception for token counting and policy checks, so end users get nearly the same low-latency experience as calling the provider directly.
- Per-user accounting — Token and cost tracking by user and optional organization.
- Enforcement guardrails — Cost and token limits with immediate connection closure when exceeded.
- Blocklist — Block users by ID with optional reason and expiry.
- Admin API & dashboard — Query usage, set limits, and manage users without touching application code.
Everything you need to control realtime AI usage
Developer-friendly, minimal configuration. No SDK lock-in—just a thin proxy that enforces limits and keeps usage under your control.
Identity & headers
Clients send x-user-id (required) and, optionally, x-org-id on the WebSocket handshake. Runs in in-memory or MongoDB mode; MongoDB mode uses proxy API keys (ug_...).
Per-user usage & cost
Input and output tokens estimated from realtime events. Cost from configurable model pricing. In-memory (LRU) or MongoDB; optional Redis for multi-instance.
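As a rough illustration of how cost can follow from token counts and a configurable pricing table, here is a minimal sketch. The `PRICING` table, the model name, the per-1M-token rates, and the `estimateCostUsd` helper are all hypothetical placeholders, not Tokenist's actual configuration format or real provider prices.

```typescript
// Hypothetical pricing table: USD per 1M tokens. Placeholder values, not real prices.
interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
}

const PRICING: Record<string, ModelPricing> = {
  "example-realtime-model": { inputPer1M: 5, outputPer1M: 20 },
};

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost = input tokens at the input rate plus output tokens at the output rate.
function estimateCostUsd(model: string, usage: Usage): number {
  const pricing = PRICING[model];
  if (!pricing) {
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (
    (usage.inputTokens / 1_000_000) * pricing.inputPer1M +
    (usage.outputTokens / 1_000_000) * pricing.outputPer1M
  );
}
```

Keeping rates in a lookup table means a price change is a configuration update rather than a code change.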
Usage windows
When MongoDB is enabled: daily (UTC midnight), monthly, or rolling_24h. Default and per-user window configurable.
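To make the three window types concrete, the sketch below computes the start of the current window for a given moment. The `windowStart` helper is hypothetical, and it assumes that the monthly window resets on the first day of the calendar month at UTC midnight, which the description above does not confirm.

```typescript
type UsageWindow = "daily" | "monthly" | "rolling_24h";

// Returns the instant at which the current usage window began.
function windowStart(window: UsageWindow, now: Date): Date {
  switch (window) {
    case "daily":
      // Resets at UTC midnight.
      return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
    case "monthly":
      // Assumption: resets on the 1st of the month, UTC.
      return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
    case "rolling_24h":
      // Sliding window covering the previous 24 hours.
      return new Date(now.getTime() - 24 * 60 * 60 * 1000);
  }
}
```

Usage recorded at or after the returned instant counts toward the current window; anything earlier falls outside it.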
Guardrail thresholds
Per-user max_cost_usd and max_total_tokens. Enforced on connect and after each message; connection closed with defined close codes when exceeded.
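A threshold check of this kind might look like the sketch below. Only the `max_cost_usd` and `max_total_tokens` field names come from the description above; the `checkThresholds` function, the `Verdict` type, and the choice to treat a limit as exceeded only when usage is strictly greater than it are assumptions for illustration.

```typescript
interface Thresholds {
  max_cost_usd?: number;
  max_total_tokens?: number;
}

interface UsageTotals {
  costUsd: number;
  totalTokens: number;
}

type Verdict =
  | { allowed: true }
  | { allowed: false; reason: "cost" | "tokens" };

// Run on connect and after each message; an unset limit is treated as unlimited.
function checkThresholds(usage: UsageTotals, limits: Thresholds): Verdict {
  if (limits.max_cost_usd !== undefined && usage.costUsd > limits.max_cost_usd) {
    return { allowed: false, reason: "cost" };
  }
  if (limits.max_total_tokens !== undefined && usage.totalTokens > limits.max_total_tokens) {
    return { allowed: false, reason: "tokens" };
  }
  return { allowed: true };
}
```

A proxy following this pattern would close the connection as soon as the verdict is no longer allowed.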
Blocklist
Block by user ID with optional reason and expiry. Unblock and list blocked users via admin API. Blocked users cannot open new connections.
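The blocklist behavior described here (block by ID, optional reason and expiry, unblock, list) can be sketched with an in-memory map. The `Blocklist` class and its method names are hypothetical and say nothing about how Tokenist stores or exposes blocks through its admin API.

```typescript
interface BlockEntry {
  reason?: string;
  expiresAt?: Date; // omitted means the block never expires
}

class Blocklist {
  private entries = new Map<string, BlockEntry>();

  block(userId: string, entry: BlockEntry = {}): void {
    this.entries.set(userId, entry);
  }

  // Returns true if the user was blocked before the call.
  unblock(userId: string): boolean {
    return this.entries.delete(userId);
  }

  // Expired blocks are dropped lazily when they are looked up.
  isBlocked(userId: string, now: Date = new Date()): boolean {
    const entry = this.entries.get(userId);
    if (!entry) return false;
    if (entry.expiresAt && entry.expiresAt.getTime() <= now.getTime()) {
      this.entries.delete(userId);
      return false;
    }
    return true;
  }

  // Lists user IDs whose blocks are still active at `now`.
  list(now: Date = new Date()): string[] {
    return Array.from(this.entries.keys()).filter((id) => this.isBlocked(id, now));
  }
}
```

A connection handler would call `isBlocked` during the handshake and refuse the connection when it returns true.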
Admin HTTP API
Health, user usage, list users, set threshold, block/unblock, list blocked. With MongoDB: create user, rotate key, usage by period, org summary.
Dashboard
React + Next.js app for org-level visibility: total cost, filters by period (monthly/daily/rolling 24h), feature, and users. Refreshes on interval and focus.
Protocols & latency
WebSocket primary; WebRTC supported. Designed for sub-10ms added latency; bidirectional relay with lightweight parsing and policy checks.
Connection close codes
Consistent close codes so clients can handle failures.
Minimal integration
Point your OpenAI-style client at the proxy URL and send identity headers. No SDK lock-in.
Connect to proxy (WebSocket)
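// Requires the Node.js `ws` package: the browser WebSocket API cannot set custom headers
import WebSocket from 'ws';
// Identity headers are sent on the WebSocket handshake
const ws = new WebSocket(
  'wss://proxy.example.com/v1/realtime?model=gpt-4o-realtime-preview',
  {
    headers: {
      'x-user-id': 'user_abc123',
      'x-org-id': 'org_xyz', // optional
    },
  }
);
OpenAI client with baseURL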
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://proxy.example.com',
  apiKey: process.env.OPENAI_API_KEY, // or proxy key in MongoDB mode
});
// Identity for usage tracking (e.g. via custom fetch/headers)
// Tokenist reads x-user-id and x-org-id from WebSocket handshake
const realtime = await client.beta.realtime.connect({
  model: 'gpt-4o-realtime-preview',
  // ... pass user/org in your connection layer
});
Guardrails apply automatically. Over limit or blocked? Connection closes with a defined code (4003, 4004) so your client can handle it.
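On the client side, one way to act on these close codes is to map them to a small set of actions. The sketch below is an assumption-laden example: the text here does not say which of 4003 and 4004 means over limit and which means blocked, so both are treated as one "guardrail" group, and the `actionForClose` helper and its action names are hypothetical.

```typescript
// Codes named above; which one means "over limit" vs "blocked" is not specified here.
const GUARDRAIL_CLOSE_CODES = new Set<number>([4003, 4004]);

type CloseAction = "done" | "stop_and_notify" | "reconnect";

function actionForClose(code: number): CloseAction {
  if (code === 1000) {
    return "done"; // normal closure
  }
  if (GUARDRAIL_CLOSE_CODES.has(code)) {
    return "stop_and_notify"; // retrying would hit the same guardrail
  }
  return "reconnect"; // assumption: other closures are treated as transient
}
```

A `close` event handler could pass its `code` to `actionForClose` and skip automatic reconnection for guardrail closures, so a limited or blocked user does not trigger a retry loop.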
Simple, usage-based pricing
Pay for tokens monitored. Generous free tier; scale as you grow.
Free
Startups, small projects, early testing
Up to 10M tokens monitored
Overage: 10¢ per 1M extra tokens
- Full core enforcement
- Basic dashboards + raw log export
- Community/email support
Starter
Early commercial apps testing guardrails
$290/yr (~2 months free)
50M tokens monitored
Overage: 8¢ per 1M tokens
- Basic analytics
- Threshold alert emails
- Per-org dashboarding
Growth
Growing products with more users and activity
$1,990/yr (~2 months free)
200M tokens monitored
Overage: 6¢ per 1M tokens
- Rich dashboards + cohort token usage segmentation
- Slack alerts & webhook integrations
- Longer data retention (e.g. 90 days)
Pro
Serious usage and enterprise needs
$7,990/yr
1B tokens monitored
Overage: 4¢ per 1M tokens
- Priority support
- SLA guarantees
- Advanced alerting (anomalies, model impact)
- Export to external data stores
- Unlimited dashboards
Enterprise
Custom quota (1B+), dedicated support, SLA, onboarding. Custom limits for telemetry retention and org governance.
Typically $20,000+/yr — custom quotes based on volume and needs.
Contact sales
Optional add-ons: Premium Alerts & Automation +$49/mo, Dedicated Support/CSM +$150/mo, Longer Data Retention (360 days) +$100/mo.
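For a sense of how the overage rates in the tiers above translate into charges, here is a small worked calculation. It assumes overage is prorated per token rather than billed per started 1M block, and it does not model the billing period, neither of which is stated here; the `overageCents` helper is hypothetical.

```typescript
// Overage in cents for tokens monitored beyond the included quota.
// Assumption: prorated linearly per token, not rounded up to whole 1M blocks.
function overageCents(
  tokensMonitored: number,
  includedTokens: number,
  centsPer1M: number
): number {
  const extraTokens = Math.max(0, tokensMonitored - includedTokens);
  return (extraTokens / 1_000_000) * centsPer1M;
}

// Starter tier figures from the list above: 50M included, 8 cents per 1M overage.
const starterOverage = overageCents(65_000_000, 50_000_000, 8);
console.log(`Starter overage: ${starterOverage} cents`);
```

Under these assumptions, 65M tokens monitored on the Starter tier is 15M over quota, which comes to 120 cents, or $1.20.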
Frequently asked questions
Common questions about Tokenist and realtime AI guardrails.