Tokenist

Guardrails as a Service

Per-user limits and visibility for realtime AI APIs

A realtime AI API proxy that adds token and cost tracking, usage limits, and enforcement guardrails without changing how you build. Point your client at the proxy, send identity headers, and guardrails apply automatically.

Free tier: up to 10M tokens monitored. No credit card required.

[ Hero image placeholder: proxy flow / dashboard ]

A thin proxy layer between your clients and AI providers

Tokenist acts as a WebSocket (and WebRTC) proxy between your application and AI realtime APIs. It supports multiple AI providers and is designed to add minimal latency (sub-10ms). Traffic is relayed bidirectionally with lightweight interception for token counting and policy checks, so end-users get the same low-latency experience as calling the provider directly.

  • Per-user accounting — Token and cost tracking by user and optional organization.
  • Enforcement guardrails — Cost and token limits with immediate connection closure when exceeded.
  • Blocklist — Block users by ID with optional reason and expiry.
  • Admin API & dashboard — Query usage, set limits, and manage users without touching application code.
[ Image placeholder: architecture diagram — client → proxy → upstream ]

Everything you need to control realtime AI usage

Developer-friendly, minimal configuration. No SDK lock-in—just a thin proxy that enforces limits and keeps usage under your control.

Identity & headers

Clients send x-user-id (required) and optional x-org-id on the WebSocket handshake. In-memory or MongoDB modes; proxy API keys (ug_...) when using MongoDB.

Per-user usage & cost

Input and output tokens estimated from realtime events. Cost from configurable model pricing. In-memory (LRU) or MongoDB; optional Redis for multi-instance.
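To make the accounting model concrete, here is a minimal sketch of how cost could be derived from per-model pricing. This is illustrative only, not Tokenist source; the rates and field names are placeholders, and real configured pricing would come from your proxy config.

```typescript
// Illustrative sketch: cost from configurable per-model pricing,
// expressed as USD per 1M tokens. Rates below are placeholders.
interface ModelPricing {
  inputUsdPer1M: number;
  outputUsdPer1M: number;
}

function costUsd(p: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * p.inputUsdPer1M +
    (outputTokens / 1_000_000) * p.outputUsdPer1M
  );
}

// e.g. 1M input + 0.5M output tokens at $5 / $20 per 1M:
const example = costUsd({ inputUsdPer1M: 5, outputUsdPer1M: 20 }, 1_000_000, 500_000);
```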

Usage windows

When MongoDB is enabled: daily (UTC midnight), monthly, or rolling_24h. Default and per-user window configurable.
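The three window modes can be sketched as follows. This is an illustrative reading of the descriptions above (daily resets at UTC midnight, monthly on the 1st UTC, rolling_24h as a 24-hour sliding window), not Tokenist's actual implementation.

```typescript
// Illustrative sketch: resolving each window mode to its start time.
type UsageWindow = 'daily' | 'monthly' | 'rolling_24h';

function windowStart(mode: UsageWindow, now: Date): Date {
  switch (mode) {
    case 'daily':
      // Resets at UTC midnight.
      return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
    case 'monthly':
      // Resets on the 1st of the month, UTC.
      return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
    case 'rolling_24h':
      // Sliding window: usage in the last 24 hours counts.
      return new Date(now.getTime() - 24 * 60 * 60 * 1000);
  }
}
```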

Guardrail thresholds

Per-user max_cost_usd and max_total_tokens. Enforced on connect and after each message; connection closed with defined close codes when exceeded.
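The enforcement decision can be sketched like this. It is an assumption-laden illustration (whether the boundary is inclusive, and the exact field names, are guesses), but it shows the shape of the check that runs on connect and after each message.

```typescript
// Illustrative sketch of the threshold check. Field names and the
// inclusive boundary (>=) are assumptions, not documented behavior.
interface Usage { costUsd: number; totalTokens: number; }
interface Limits { max_cost_usd?: number; max_total_tokens?: number; }

// Returns close code 4004 when a threshold is exceeded, or null to allow.
function checkThresholds(usage: Usage, limits: Limits): number | null {
  if (limits.max_cost_usd !== undefined && usage.costUsd >= limits.max_cost_usd) {
    return 4004;
  }
  if (limits.max_total_tokens !== undefined && usage.totalTokens >= limits.max_total_tokens) {
    return 4004;
  }
  return null;
}
```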

Blocklist

Block by user ID with optional reason and expiry. Unblock and list blocked users via admin API. Blocked users cannot open new connections.
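A block call against the admin API might look like the sketch below. The endpoint path and payload field names are assumptions for illustration, not the documented Tokenist routes; check the admin API reference for the real shape.

```typescript
// Illustrative sketch: building a block request. The /admin/users/.../block
// path and the reason/expires_at field names are hypothetical.
interface BlockRequest {
  userId: string;
  reason?: string;
  expiresAt?: string; // ISO 8601; omit for an indefinite block
}

function buildBlockRequest(base: string, req: BlockRequest): { url: string; body: string } {
  return {
    url: `${base}/admin/users/${encodeURIComponent(req.userId)}/block`,
    body: JSON.stringify({ reason: req.reason, expires_at: req.expiresAt }),
  };
}

// Usage: POST the body with your admin credentials, e.g.
// await fetch(url, { method: 'POST', headers: { /* auth */ }, body });
```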

Admin HTTP API

Health, user usage, list users, set threshold, block/unblock, list blocked. With MongoDB: create user, rotate key, usage by period, org summary.

Dashboard

React + Next.js app for org-level visibility: total cost, filters by period (monthly/daily/rolling 24h), feature, and users. Refreshes on interval and focus.

Protocols & latency

WebSocket primary; WebRTC supported. Designed for sub-10ms added latency; bidirectional relay with lightweight parsing and policy checks.

Connection close codes

Consistent close codes so clients can handle failures.

  • 4001: Missing user ID
  • 4003: User blocked
  • 4004: Threshold exceeded
  • 4502: Upstream error

Minimal integration

Point your OpenAI-style client at the proxy URL and send identity headers. No SDK lock-in.

Connect to proxy (WebSocket)

// Node.js example using the `ws` package: browsers cannot set
// custom headers on a WebSocket handshake.
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://proxy.example.com/v1/realtime?model=gpt-4o-realtime-preview',
  {
    headers: {
      'x-user-id': 'user_abc123', // required
      'x-org-id': 'org_xyz',      // optional
    },
  }
);

OpenAI client with baseUrl

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://proxy.example.com',
  apiKey: process.env.OPENAI_API_KEY, // or a Tokenist proxy key (ug_...) in MongoDB mode
});

// Tokenist reads x-user-id and x-org-id from the WebSocket handshake,
// so pass them wherever your SDK version lets you set connection headers.
// The exact realtime entry point varies by SDK language and version.
const realtime = await client.beta.realtime.connect({
  model: 'gpt-4o-realtime-preview',
  // ... pass user/org in your connection layer
});

Guardrails apply automatically. Over limit or blocked? Connection closes with a defined code (4003, 4004) so your client can handle it.
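Client-side handling of those codes can be as simple as a switch. The close codes below come from Tokenist; the suggested actions are illustrative.

```typescript
// Sketch of client-side handling for Tokenist's close codes.
// The recommended actions here are suggestions, not requirements.
function onClose(code: number): string {
  switch (code) {
    case 4001: return 'missing x-user-id header: fix the handshake';
    case 4003: return 'user blocked: surface an account message';
    case 4004: return 'threshold exceeded: stop reconnecting until the window resets';
    case 4502: return 'upstream error: retry with backoff';
    default:   return 'normal or unknown closure';
  }
}

// e.g. with the `ws` package:
// ws.on('close', (code) => console.warn(onClose(code)));
```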

Simple, usage-based pricing

Pay for tokens monitored. Generous free tier; scale as you grow.

Free

Startups, small projects, early testing

$0/mo

Up to 10M tokens monitored

Overage: 10¢ per 1M extra tokens

  • Full core enforcement
  • Basic dashboards + raw log export
  • Community/email support
Get started free
Popular

Starter

Early commercial apps testing guardrails

$29/mo

$290/yr (~2 months free)

50M tokens monitored

Overage: 8¢ per 1M tokens

  • Basic analytics
  • Threshold alert emails
  • Per-org dashboarding
Start trial

Growth

Growing products with more users and activity

$199/mo

$1,990/yr (~2 months free)

200M tokens monitored

Overage: 6¢ per 1M tokens

  • Rich dashboards + cohort token usage segmentation
  • Slack alerts & webhook integrations
  • Longer data retention (e.g. 90 days)
Contact sales

Pro

Serious usage and enterprise needs

$799/mo

$7,990/yr

1B tokens monitored

Overage: 4¢ per 1M tokens

  • Priority support
  • SLA guarantees
  • Advanced alerting (anomalies, model impact)
  • Export to external data stores
  • Unlimited dashboards
Contact sales

Enterprise

Custom quota (1B+), dedicated support, SLA, onboarding. Custom limits for telemetry retention and org governance.

Typically $20,000+/yr — custom quotes based on volume and needs.

Contact sales

Optional add-ons: Premium Alerts & Automation +$49/mo, Dedicated Support/CSM +$150/mo, Longer Data Retention (360 days) +$100/mo.
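As a worked example of how overage billing combines with the tier quotas above (Starter numbers shown; the tier struct is just an illustration):

```typescript
// Illustrative overage math using the published Starter tier numbers:
// $29/mo base, 50M tokens included, 8 cents per extra 1M tokens.
interface Tier {
  baseUsd: number;
  includedTokens: number;
  overageUsdPer1M: number;
}

const STARTER: Tier = { baseUsd: 29, includedTokens: 50_000_000, overageUsdPer1M: 0.08 };

function monthlyBill(tier: Tier, tokensMonitored: number): number {
  const extra = Math.max(0, tokensMonitored - tier.includedTokens);
  return tier.baseUsd + (extra / 1_000_000) * tier.overageUsdPer1M;
}

// 60M tokens on Starter: $29 base + 10M extra at 8 cents/1M = $29.80.
const bill = monthlyBill(STARTER, 60_000_000);
```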

Frequently asked questions

Common questions about Tokenist and realtime AI guardrails.

What is Tokenist?

Tokenist is a Guardrails as a Service product: a realtime AI API proxy that adds per-user token and cost tracking, usage limits, and enforcement guardrails. You point your client at the proxy URL, send identity headers (x-user-id, optional x-org-id), and guardrails apply automatically—no SDK lock-in.