How It Works

The price you pay for inference changes constantly — providers cut rates, launch new tiers, adjust pricing overnight. Exostream gives you two things: a way to measure that moving price, and a way to fix it.

Fixed vs. Floating Price

Right now, every API call you make is priced at the floating rate — whatever the provider charges today. That rate moves. Sometimes in your favor (price cuts), sometimes against you (new premium models, tier changes). You can't budget against a moving target.

A fixed rate is an agreed price that doesn't change for the duration of a contract. You know exactly what you'll pay per million tokens next month, next quarter, regardless of what the market does.

FLOATING (today)

You pay spot price on every call. Your monthly bill is unpredictable. A 30% overnight price cut saves you money. A new model launch at 2x the price doesn't.

FIXED (with EICS)

You lock a rate for 1, 3, or 6 months. Your bill is known in advance. You miss surprise discounts, but you're protected from surprise spikes.

EICI — Exostream Inference Cost Index

EICI is a weighted benchmark that tracks the average cost of AI inference across 42 models and 7 providers. It's the floating rate — a single number that tells you what the market charges today.

When OpenAI cuts prices, when Anthropic launches a new tier, when Google adjusts Gemini — EICI reflects it. Updated daily. It's the reference price both sides of a swap agree on.

EICS — Exostream Inference Cost Swap

A swap is an agreement to exchange a floating price for a fixed one. With EICS, you swap your unpredictable per-token cost for a known fixed rate, settled against the EICI benchmark.

You choose a term (1M, 3M, or 6M). Exostream prices the swap using the EICI forward curve — a projection of where inference costs are headed based on historical decay rates. The result is a fair fixed rate for that period.

If the market moves above your fixed rate, you're protected. If it drops below, you're overpaying — but you bought predictability. That's the tradeoff.

The Tradeoff, Visualized

The green line is the EICI — the floating market rate, moving daily. The dashed line is your EICS fixed rate. Green shading means the market is above your rate and you're saving money. Red shading means the market dipped below and you're paying more than spot — the cost of certainty.

Market above your rate — you saveMarket below your rate — you overpayYour EICS fixed rate ($8.02/M)

Illustrative data. Real EICI values depend on constituent model pricing.

Why Fix Inference Costs?

Inference costs have been falling. So why would you lock in a fixed rate on something that's getting cheaper? Because the trend isn't guaranteed, and even if it holds, the rate and timing of decline are unpredictable.

Budget visibility

You can't run a P&L on "probably cheaper next quarter." Finance needs a number for the forecast. Procurement needs a number for the contract. A swap gives you that number.

Margin protection

If you sell a product with AI inside, your pricing to customers is sticky — you quoted them a rate. If inference costs spike, your margin disappears. A swap locks your unit economics for the term of the contract.

Long-term contract exposure

You signed a 12-month deal with a customer assuming inference at $X/M. If costs end up at 2X, you eat the difference. The swap hedges the gap between your revenue commitment and your cost exposure.

The growth-to-revenue regime change

Foundation model providers are in land-grab mode — pricing at or below cost to capture share. When they need to turn profitable (and they will), prices stop falling or reverse. A swap protects against the day the subsidies end.

Frontier model premium

Each new generation launches at a higher price than the last. If your product needs frontier capability, you're on a treadmill of increasingly expensive new models that pull the index up — even as older models get cheaper.

Rate of decline is uncertain

Even if you're confident prices will fall, how fast matters. Budget for a 30% decline, get 10%, and you've blown your forecast. A swap removes the "how much" and "when" from the equation.

Competitive dynamics are discontinuous

Price wars end overnight. Providers exit, get acquired, or pivot strategy. These aren't smooth curves — they're step functions you can't model in a spreadsheet.

Use Cases

Who uses the oracle, and why.

For AI Engineers

Track what you're actually spending across models.

κ tells you how exposed you are to price changes — high context = high exposure.

Forward curves show where prices are headed so you can plan migrations.

Cost alerts when a cheaper model crosses below your current one.

For Finance & Procurement Teams

Budget forecasting with forward curves, not guesswork.

The oracle publishes θ (decay rate) — how fast prices are falling per model.

Inference Budget Planner: input your monthly volume, get spot + forward cost projections.

"Inference costs fell 4.2% last month per the Exostream Index" — the number for your CFO.

For LLM Tooling Platforms

Replace hardcoded price tables with a live API feed.

Observability platforms: accurate auto-updating cost tracking.

Router/gateway projects: real-time pricing for cost-aware routing decisions.

One integration, all providers, always current.

For AI Analysts & Investors

Structured historical pricing data that doesn't exist anywhere else.

Depreciation analytics: half-life of pricing power per model generation.

Provider competitive intelligence: who's cutting fastest, where margins compress.

The EICI — a single benchmark for the market.

For Agentic Systems

MCP server: query pricing conversationally from any MCP-compatible agent.

x402 integration: dynamic, market-referenced pricing for machine-to-machine inference.

Cost-aware routing: agents that optimize spend in real-time against the oracle.

Methodology

The math behind the EICI benchmark and the pricing model that powers EICS swaps. Each model's cost is decomposed into a spot price, structural Greeks, and a decay rate that drives forward projections.

The Fundamental Equation

C(T, M, t) = S(T, M) * D(M, t)

Total expected cost equals spot cost times the decay factor. At spot (t = 0), D = 1 and you get the exact, observable cost.

Ticker Price beta

The anchor of the model: beta is the published output token price at the origin provider, in USD per million tokens ($/M).

MODELSync output reference price

MODEL.BBatch output price

Structural Greeks

Greek	Definition	Range
r_in	Input/output price ratio	0.20 - 0.50
r_cache	Cache price as fraction of output	0.01 - 0.10
r_think	Thinking token price ratio	0.50 - 1.00+
r_batch	Batch discount ratio	0.40 - 0.60

Effective Input Rate

r_in_eff = r_in_depth * (1 - eta) + r_cache * eta

Combines context-depth pricing (for tiered pricing models) with cache discounts. eta is your cache hit ratio (0 to 1).

kappa - The Task's Delta

kappa = 1 + (n_in / n_out) * r_in_eff

kappa is both the context cost multiplier and your delta to beta movements. If beta moves by $1/M, your task cost moves by kappa * n_out * 10^-6.

Spot Cost

S = beta * [n_out + n_in * r_in_eff + n_think * r_think] * 10^-6

Decay Rate theta

theta is the continuous monthly decay rate, estimated from historical price data. It absorbs all sources of price decline into a single continuous rate.

theta > 0Price declining (typical)

theta = 0Stable pricing

theta < 0Price increasing (rare)

Forward Price

beta_fwd(M, t) = beta(M) * e^(-theta(M) * t)

Published at standard tenors: 1M, 3M, 6M.

Calculator Guide

How to use the Exostream calculators to understand and forecast your inference costs.

Cost Calculator

The Cost Calculator prices a single API call. Open it from the Calculators page.

Step 1: Choose a model

Select the model from the dropdown. The ticker (e.g. OPUS-4.5, GPT-4.1) matches the ticker board on the home page.

Step 2: Set your token counts

n_in — How many tokens you send to the model (your prompt, context, documents). A typical page of text is ~500 tokens. A long document might be 30K-100K tokens.

n_out — How many tokens the model generates in response. A short answer is ~200 tokens. A detailed code generation might be 2K-4K tokens.

n_think — Only shown for reasoning models (o3, o4-mini, etc.). These models "think" before responding, consuming extra tokens internally.

Step 3: Set cache hit rate

eta — What percentage of your input tokens are cached from previous calls (0-100%). If you're sending the same system prompt repeatedly, your cache rate might be 50-80%. For unique prompts each time, set this to 0%.

Step 4: Set forward horizon

horizon — How far into the future to project the cost. "Spot" gives today's price. "3M" projects what this call will cost in 3 months, based on the model's historical price decay rate (theta).

Presets

Use the preset buttons to quickly load common workload profiles:

RAG — Retrieval-augmented generation: large context (30K in), short answer (800 out), high cache (60%)
Code Gen — Code generation: moderate context (5K in), longer output (2K out), some cache (20%)
Summarize — Document summarization: very large input (50K in), short summary (500 out), no cache

Reading the results

Spot Cost — The cost of this single call at current prices.

kappa — Your context cost multiplier. A kappa of 5 means your call costs 5x what it would if you only paid for output tokens. Higher kappa = more sensitive to model price changes.

Cache Savings — How much caching is saving you compared to zero caching.

System Canvas

The System Canvas models your entire AI system — multiple models, task types, and volume — to project monthly costs and unit economics. Open it from the Calculators page.

Choose a preset or customize

Start with a preset that matches your use case — GSD for agentic coding, SaaS for product-integrated AI, Trading for high-value analytics, etc. Each preset configures volume, task mix, model allocation, and unit economics to match real-world architectures.

Configure your system

Monthly API Calls — Total inference calls per month across all task types.

Task Distribution — What percentage of calls are Simple (short), Medium, Complex (long context), or Reasoning (chain-of-thought). Must sum to 100%.

Model Allocation — Which models handle what share of traffic. Add up to 6 models. Weights should sum to 100%.

Unit Economics

Set your revenue per task (p), overhead per task (t), and fixed monthly costs (F) to see profit margins, break-even volume, and how profitability scales with volume.

Optimization Levers

The canvas automatically ranks your biggest savings opportunities — switching to cheaper models, improving caching, waiting for theta decay, or downshifting heavy tasks. Each lever shows estimated monthly savings in dollars and percentage.

API

The Exostream API provides programmatic access to live pricing data, forward curves, and cost calculations. Free tier requires no API key (60 requests/hour per IP). See the full API Reference for curl examples, parameters, and response schemas.

Endpoints

Method	Endpoint	Description
GET	/v1/spots	Current spot prices (beta) for all models
GET	/v1/greeks	Full Greek sheet — spot prices plus structural Greeks and extrinsic parameters
GET	/v1/forwards/:ticker	Forward curve for a model at 1M, 3M, 6M tenors
POST	/v1/price	Calculate cost for a task profile (model, tokens, cache, horizon)
POST	/v1/compare	Compare all models for a task, ranked by cost
GET	/v1/history/:ticker	Historical price data with provenance markers
GET	/v1/ici	EICI value with constituents, weights, and confidence
POST	/v1/swap	Price an EICS swap — fair fixed rate, NPV, Greeks
GET	/v1/monte-carlo/:ticker	Monte Carlo simulation for future price distribution
GET	/v1/backtest	Forward curve backtest — tenor summaries and predictions

View full API Reference →

Research

Open data and diagnostics for quants, researchers, and anyone stress-testing the model.

Forward Curve Backtest

The EPM generates forward curves using theta-based exponential decay fitted to each model family's historical pricing. We backtest these projections against realised prices to measure model fit.

Important context: Forward curves in any market are not predictions — they are a consistent, reproducible basis for pricing future contracts. Interest rate forwards are routinely "wrong" about where rates end up. Oil futures miss by 30-50% over 6 months. The value is in providing a transparent, arbitrage-consistent reference that counterparties can agree on, not in forecasting accuracy.

AI inference pricing is especially volatile: providers make discrete strategic pricing decisions (50% overnight cuts, successor models launching at higher price points) that no smooth decay model can anticipate. The theta-based curve is a bootstrap mechanism — as swap market liquidity develops, the market itself will determine the forward curve through trading.

Current Backtest Results

Live data from GET /v1/backtest — results update as new price history accumulates.

Tenor	Median MAPE	Mean MAPE	Predictions	Notes
1M	~39%	~600%	54	Mean inflated by successor repricing events
3M	~52%	~303%	66	Wider tenor amplifies discrete price shocks
6M	~59%	~553%	44	Fewer data points, higher structural uncertainty

Why the error is high

Step-function repricing: Providers cut prices 30-50% overnight in competitive responses. A smooth decay model cannot capture discontinuous moves.

Successor launches at higher prices: New model generations sometimes launch above their predecessor (e.g. GPT-5 to GPT-5.2), inverting the decay assumption within a family.

Strategic pricing: AI model pricing is set by business strategy (margin targets, market share), not cost curves. No model-derived forward can anticipate a competitor's pricing decision.

Mean vs median divergence: Mean MAPE is 5-15x median, confirming a fat-tailed error distribution. A small number of extreme repricing events dominate the mean while median captures typical model fit.

Explore the data

GET https://api.exostream.ai/v1/backtest

# Returns: tenor summaries, family breakdowns,
# individual predictions with predicted vs actual prices,
# filtering metadata (audit corrections, step-function events)

Pull the raw data and run your own analysis. We publish everything.

Data Integrity

How Exostream ensures pricing data quality and reliability.

Three-tier sources

5 independent pricing archives (Helicone, LiteLLM, ArtificialAnalysis, Wayback Machine, pricepertoken) plus 3 token distribution datasets. No single source can corrupt the feed.

Cryptographic signing

Every price report signed with ed25519 at ingestion. Tampered entries rejected before index computation.

Circuit breaker

Auto-trips on >15% intraday moves. Stale prices (>48h) isolated from index.

Open methodology

Full model specification, weighting formula, and backtest data available via the /v1/backtest API endpoint. See the Accuracy Report for detailed validation methodology.

Open Questions

Areas we're actively investigating. Contributions welcome.

Regime-switching models: Can a hidden Markov model detect "competitive response" vs "steady decay" regimes to improve forward accuracy?
Jump-diffusion: Merton-style jump processes may better capture overnight repricing events than pure exponential decay.
Cross-provider contagion: When one provider cuts, how quickly and how much do competitors follow? Estimating contagion coefficients.
Implied theta from swaps: Once EICS contracts trade, market-implied decay rates vs model-derived theta — the basis trade.