How It Works
The price you pay for inference changes constantly — providers cut rates, launch new tiers, adjust pricing overnight. Exostream gives you two things: a way to measure that moving price, and a way to fix it.
Fixed vs. Floating Price
Right now, every API call you make is priced at the floating rate — whatever the provider charges today. That rate moves. Sometimes in your favor (price cuts), sometimes against you (new premium models, tier changes). You can't budget against a moving target.
A fixed rate is an agreed price that doesn't change for the duration of a contract. You know exactly what you'll pay per million tokens next month, next quarter, regardless of what the market does.
**Floating:** You pay the spot price on every call. Your monthly bill is unpredictable. A 30% overnight price cut saves you money. A new model launch at 2x the price doesn't.

**Fixed:** You lock a rate for 1, 3, or 6 months. Your bill is known in advance. You miss surprise discounts, but you're protected from surprise spikes.
EICI — Exostream Inference Cost Index
EICI is a weighted benchmark that tracks the average cost of AI inference across 42 models and 7 providers. It's the floating rate — a single number that tells you what the market charges today.
When OpenAI cuts prices, when Anthropic launches a new tier, when Google adjusts Gemini — EICI reflects it. Updated daily. It's the reference price both sides of a swap agree on.
EICS — Exostream Inference Cost Swap
A swap is an agreement to exchange a floating price for a fixed one. With EICS, you swap your unpredictable per-token cost for a known fixed rate, settled against the EICI benchmark.
You choose a term (1M, 3M, or 6M). Exostream prices the swap using the EICI forward curve — a projection of where inference costs are headed based on historical decay rates. The result is a fair fixed rate for that period.
If the market moves above your fixed rate, you're protected. If it drops below, you're overpaying — but you bought predictability. That's the tradeoff.
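The cash flows follow standard swap mechanics. A minimal sketch, assuming cash settlement of the difference against the EICI fixing (actual EICS settlement terms may differ):

```python
# Illustrative swap settlement sketch. Assumes cash settlement against the
# EICI fixing; Exostream's actual contract terms are not specified here.

def swap_settlement(eici_fix: float, fixed_rate: float, notional_mtok: float) -> float:
    """Cash flow to the fixed-rate payer for one settlement period.

    eici_fix      -- EICI fixing for the period ($ per million tokens)
    fixed_rate    -- agreed fixed rate ($ per million tokens)
    notional_mtok -- notional volume in millions of tokens
    """
    return (eici_fix - fixed_rate) * notional_mtok

# Market above your fixed rate: you receive the difference (protected).
print(swap_settlement(12.0, 10.0, 500))   # 1000.0
# Market below your fixed rate: you pay the difference (cost of certainty).
print(swap_settlement(9.0, 10.0, 500))    # -500.0
```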
The Tradeoff, Visualized
The green line is the EICI — the floating market rate, moving daily. The dashed line is your EICS fixed rate. Green shading means the market is above your rate and you're saving money. Red shading means the market dipped below and you're paying more than spot — the cost of certainty.
Illustrative data. Real EICI values depend on constituent model pricing.
Why Fix Inference Costs?
Inference costs have been falling. So why would you lock in a fixed rate on something that's getting cheaper? Because the trend isn't guaranteed, and even if it holds, the rate and timing of decline are unpredictable.
You can't run a P&L on "probably cheaper next quarter." Finance needs a number for the forecast. Procurement needs a number for the contract. A swap gives you that number.
If you sell a product with AI inside, your pricing to customers is sticky — you quoted them a rate. If inference costs spike, your margin disappears. A swap locks your unit economics for the term of the contract.
You signed a 12-month deal with a customer assuming inference at $X/M. If costs end up at 2X, you eat the difference. The swap hedges the gap between your revenue commitment and your cost exposure.
Foundation model providers are in land-grab mode — pricing at or below cost to capture share. When they need to turn profitable (and they will), prices stop falling or reverse. A swap protects against the day the subsidies end.
Each new generation launches at a higher price than the last. If your product needs frontier capability, you're on a treadmill of increasingly expensive new models that pull the index up — even as older models get cheaper.
Even if you're confident prices will fall, how fast matters. Budget for a 30% decline, get 10%, and you've blown your forecast. A swap removes the "how much" and "when" from the equation.
Price wars end overnight. Providers exit, get acquired, or pivot strategy. These aren't smooth curves — they're step functions you can't model in a spreadsheet.
Use Cases
Who uses the oracle, and why.
For AI Engineers
Track what you're actually spending across models.
κ, your context cost multiplier, tells you how exposed you are to price changes: high context = high exposure.
Forward curves show where prices are headed so you can plan migrations.
Cost alerts when a cheaper model crosses below your current one.
For Finance & Procurement Teams
Budget forecasting with forward curves, not guesswork.
The oracle publishes θ (decay rate) — how fast prices are falling per model.
Inference Budget Planner: input your monthly volume, get spot + forward cost projections.
"Inference costs fell 4.2% last month per the Exostream Index" — the number for your CFO.
For LLM Tooling Platforms
Replace hardcoded price tables with a live API feed.
Observability platforms: accurate auto-updating cost tracking.
Router/gateway projects: real-time pricing for cost-aware routing decisions.
One integration, all providers, always current.
For AI Analysts & Investors
Structured historical pricing data that doesn't exist anywhere else.
Depreciation analytics: half-life of pricing power per model generation.
Provider competitive intelligence: who's cutting fastest, where margins compress.
The EICI — a single benchmark for the market.
For Agentic Systems
MCP server: query pricing conversationally from any MCP-compatible agent.
x402 integration: dynamic, market-referenced pricing for machine-to-machine inference.
Cost-aware routing: agents that optimize spend in real-time against the oracle.
Methodology
The math behind the EICI benchmark and the pricing model that powers EICS swaps. Each model's cost is decomposed into a spot price, structural Greeks, and a decay rate that drives forward projections.
The Fundamental Equation
Total expected cost equals spot cost times the decay factor:

C(t) = C_spot * D(t), where D(t) = e^(-theta * t)

At spot (t = 0), D = 1 and you get the exact, observable cost.
Ticker Price beta
The anchor of the model: beta is the published output token price at the origin provider, in USD per million tokens ($/M).
Structural Greeks
| Greek | Definition | Range |
|---|---|---|
| r_in | Input/output price ratio | 0.20 - 0.50 |
| r_cache | Cache price as fraction of output | 0.01 - 0.10 |
| r_think | Thinking token price ratio | 0.50 - 1.00+ |
| r_batch | Batch discount ratio | 0.40 - 0.60 |
Effective Input Rate
Combines context-depth pricing (for tiered pricing models) with cache discounts. eta is your cache hit ratio (0 to 1).
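The formula itself appears to have been dropped from this page. A reconstruction consistent with the description above, for a single pricing tier, would blend the input and cache rates by hit ratio (the published formula may differ, and for tiered models r_in would itself depend on context depth):

```
r_eff = (1 - eta) * r_in + eta * r_cache
```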
kappa - The Task's Delta
kappa is both the context cost multiplier and your delta to beta movements. If beta moves by $1/M, your task cost moves by kappa * n_out * 10^-6.
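A definition consistent with the delta statement above (our reconstruction; the official formula is not shown on this page) is:

```
kappa = (n_in * r_eff + n_think * r_think + n_out) / n_out
```

Total cost then scales as kappa * n_out, so a $1/M move in beta moves the task cost by kappa * n_out * 10^-6 dollars, as stated.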
Spot Cost
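The body of this section appears to be missing. A spot-cost formula consistent with the kappa sensitivity above would be:

```
C_spot = beta * kappa * n_out * 10^-6    ($ per call)
```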
Decay Rate theta
theta is the continuous monthly decay rate, estimated from historical price data. It absorbs all sources of price decline into a single continuous rate.
Forward Price
Published at standard tenors: 1M, 3M, 6M.
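Assuming the theta-based exponential decay described in the Methodology and Research sections, the forward price at tenor t (in months) would take the form:

```
F(t) = beta * e^(-theta * t),    t = 1, 3, 6
```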
Calculator Guide
How to use the Exostream calculators to understand and forecast your inference costs.
Cost Calculator
The Cost Calculator prices a single API call. Open it from the Calculators page.
Step 1: Choose a model
Select the model from the dropdown. The ticker (e.g. OPUS-4.5, GPT-4.1) matches the ticker board on the home page.
Step 2: Set your token counts
n_in — How many tokens you send to the model (your prompt, context, documents). A typical page of text is ~500 tokens. A long document might be 30K-100K tokens.
n_out — How many tokens the model generates in response. A short answer is ~200 tokens. A detailed code generation might be 2K-4K tokens.
n_think — Only shown for reasoning models (o3, o4-mini, etc.). These models "think" before responding, consuming extra tokens internally.
Step 3: Set cache hit rate
eta — What percentage of your input tokens are cached from previous calls (0-100%). If you're sending the same system prompt repeatedly, your cache rate might be 50-80%. For unique prompts each time, set this to 0%.
Step 4: Set forward horizon
horizon — How far into the future to project the cost. "Spot" gives today's price. "3M" projects what this call will cost in 3 months, based on the model's historical price decay rate (theta).
Presets
Use the preset buttons to quickly load common workload profiles:
- RAG — Retrieval-augmented generation: large context (30K in), short answer (800 out), high cache (60%)
- Code Gen — Code generation: moderate context (5K in), longer output (2K out), some cache (20%)
- Summarize — Document summarization: very large input (50K in), short summary (500 out), no cache
Reading the results
Spot Cost — The cost of this single call at current prices.
kappa — Your context cost multiplier. A kappa of 5 means your call costs 5x what it would if you only paid for output tokens. Higher kappa = more sensitive to model price changes.
Cache Savings — How much caching is saving you compared to zero caching.
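The calculator's outputs can be reproduced by hand. A sketch using the Methodology formulas as reconstructed here (the exact formulas, the Greek values, and the $10/M beta are all assumptions for illustration), with the RAG preset's token counts:

```python
# Sketch of the Cost Calculator math. Formulas are reconstructed from the
# Methodology section; beta and the Greek defaults below are hypothetical.

def spot_cost(beta, n_in, n_out, n_think=0, eta=0.0,
              r_in=0.30, r_cache=0.05, r_think=1.0):
    """Return (cost in dollars at spot, kappa) for one call."""
    r_eff = (1 - eta) * r_in + eta * r_cache      # cache-blended input rate
    kappa = (n_in * r_eff + n_think * r_think + n_out) / n_out
    return beta * kappa * n_out * 1e-6, kappa

# RAG preset: 30K tokens in, 800 out, 60% cache; assumed beta = $10/M output.
cost, kappa = spot_cost(beta=10.0, n_in=30_000, n_out=800, eta=0.60)
print(f"spot cost ${cost:.4f}, kappa {kappa:.2f}")
```

With these inputs the call costs about five cents, and kappa is well above 1: the large context, not the short answer, dominates the bill.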
System Canvas
The System Canvas models your entire AI system — multiple models, task types, and volume — to project monthly costs and unit economics. Open it from the Calculators page.
Choose a preset or customize
Start with a preset that matches your use case — GSD for agentic coding, SaaS for product-integrated AI, Trading for high-value analytics, etc. Each preset configures volume, task mix, model allocation, and unit economics to match real-world architectures.
Configure your system
Monthly API Calls — Total inference calls per month across all task types.
Task Distribution — What percentage of calls are Simple (short), Medium, Complex (long context), or Reasoning (chain-of-thought). Must sum to 100%.
Model Allocation — Which models handle what share of traffic. Add up to 6 models. Weights should sum to 100%.
Unit Economics
Set your revenue per task (p), overhead per task (t), and fixed monthly costs (F) to see profit margins, break-even volume, and how profitability scales with volume.
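The break-even logic can be sketched directly from those inputs. Here c_inf, the per-task inference cost, is an assumed extra parameter (in the canvas it comes from your model allocation and task mix):

```python
# Break-even sketch using the canvas's unit-economics inputs.
# p, t, F come from the doc; c_inf (inference cost per task) is assumed.

def break_even_volume(p: float, t: float, c_inf: float, F: float) -> float:
    """Monthly task volume at which revenue covers all costs."""
    margin = p - t - c_inf            # contribution per task
    if margin <= 0:
        raise ValueError("no per-task margin: break-even is unreachable")
    return F / margin

# Hypothetical numbers: $0.50 revenue, $0.10 overhead, $0.05 inference,
# $7,000/month fixed costs.
print(break_even_volume(p=0.50, t=0.10, c_inf=0.05, F=7000))
```

Note how an inference cost spike that erodes the per-task margin raises break-even volume nonlinearly, which is exactly the exposure a swap caps.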
Optimization Levers
The canvas automatically ranks your biggest savings opportunities — switching to cheaper models, improving caching, waiting for theta decay, or downshifting heavy tasks. Each lever shows estimated monthly savings in dollars and percentage.
API
The Exostream API provides programmatic access to live pricing data, forward curves, and cost calculations. Free tier requires no API key (60 requests/hour per IP). See the full API Reference for curl examples, parameters, and response schemas.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/spots | Current spot prices (beta) for all models |
| GET | /v1/greeks | Full Greek sheet — spot prices plus structural Greeks and extrinsic parameters |
| GET | /v1/forwards/:ticker | Forward curve for a model at 1M, 3M, 6M tenors |
| POST | /v1/price | Calculate cost for a task profile (model, tokens, cache, horizon) |
| POST | /v1/compare | Compare all models for a task, ranked by cost |
| GET | /v1/history/:ticker | Historical price data with provenance markers |
| GET | /v1/ici | EICI value with constituents, weights, and confidence |
| POST | /v1/swap | Price an EICS swap — fair fixed rate, NPV, Greeks |
| GET | /v1/monte-carlo/:ticker | Monte Carlo simulation for future price distribution |
| GET | /v1/backtest | Forward curve backtest — tenor summaries and predictions |
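A request to `POST /v1/price` might be composed as below. The field names are assumptions inferred from the parameter list in the table above; consult the API Reference for the authoritative schema:

```python
# Sketch of a /v1/price request body. Field names are assumed from the
# endpoint table ("model, tokens, cache, horizon"); the real schema lives
# in the API Reference.
import json

payload = {
    "model": "OPUS-4.5",   # ticker, as returned by /v1/spots
    "n_in": 30_000,        # input tokens
    "n_out": 800,          # output tokens
    "eta": 0.6,            # cache hit rate (0-1)
    "horizon": "3M",       # spot | 1M | 3M | 6M
}

body = json.dumps(payload)
print(body)
# POST to https://api.exostream.ai/v1/price with Content-Type:
# application/json. Free tier: no API key, 60 requests/hour per IP.
```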
Research
Open data and diagnostics for quants, researchers, and anyone stress-testing the model.
Forward Curve Backtest
The EPM generates forward curves using theta-based exponential decay fitted to each model family's historical pricing. We backtest these projections against realised prices to measure model fit.
Important context: Forward curves in any market are not predictions — they are a consistent, reproducible basis for pricing future contracts. Interest rate forwards are routinely "wrong" about where rates end up. Oil futures miss by 30-50% over 6 months. The value is in providing a transparent, arbitrage-consistent reference that counterparties can agree on, not in forecasting accuracy.
AI inference pricing is especially volatile: providers make discrete strategic pricing decisions (50% overnight cuts, successor models launching at higher price points) that no smooth decay model can anticipate. The theta-based curve is a bootstrap mechanism — as swap market liquidity develops, the market itself will determine the forward curve through trading.
Current Backtest Results
Live data from GET /v1/backtest — results update as new price history accumulates.
| Tenor | Median MAPE | Mean MAPE | Predictions | Notes |
|---|---|---|---|---|
| 1M | ~39% | ~600% | 54 | Mean inflated by successor repricing events |
| 3M | ~52% | ~303% | 66 | Wider tenor amplifies discrete price shocks |
| 6M | ~59% | ~553% | 44 | Fewer data points, higher structural uncertainty |
Why the error is high
Step-function repricing: Providers cut prices 30-50% overnight in competitive responses. A smooth decay model cannot capture discontinuous moves.
Successor launches at higher prices: New model generations sometimes launch above their predecessor (e.g. GPT-5 to GPT-5.2), inverting the decay assumption within a family.
Strategic pricing: AI model pricing is set by business strategy (margin targets, market share), not cost curves. No model-derived forward can anticipate a competitor's pricing decision.
Mean vs median divergence: Mean MAPE is 5-15x median, confirming a fat-tailed error distribution. A small number of extreme repricing events dominate the mean while median captures typical model fit.
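The divergence is easy to reproduce: a single repricing-sized error drags the mean far above the median. The error values below are made up purely to illustrate the shape:

```python
# Illustration of mean/median MAPE divergence under fat tails.
# These absolute percentage errors are fabricated for the demo.
from statistics import mean, median

ape = [0.30, 0.35, 0.39, 0.42, 0.48, 6.00]  # one 600% repricing outlier
print(f"median {median(ape):.2f}, mean {mean(ape):.2f}")  # median ~0.41, mean ~1.32
```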
Explore the data
```
GET https://api.exostream.ai/v1/backtest
# Returns: tenor summaries, family breakdowns,
# individual predictions with predicted vs actual prices,
# filtering metadata (audit corrections, step-function events)
```
Pull the raw data and run your own analysis. We publish everything.
Data Integrity
How Exostream ensures pricing data quality and reliability.
Three-tier sources
5 independent pricing archives (Helicone, LiteLLM, ArtificialAnalysis, Wayback Machine, pricepertoken) plus 3 token distribution datasets. No single source can corrupt the feed.
Cryptographic signing
Every price report signed with ed25519 at ingestion. Tampered entries rejected before index computation.
Circuit breaker
Auto-trips on >15% intraday moves. Stale prices (>48h) isolated from index.
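The two rules above can be sketched as a single admission check. Thresholds come from the doc; the actual implementation is not public:

```python
# Minimal sketch of the circuit-breaker rules: reject >15% intraday moves
# and isolate reports older than 48 hours. Thresholds from the doc.
from datetime import datetime, timedelta, timezone

MAX_INTRADAY_MOVE = 0.15             # >15% intraday move trips the breaker
MAX_STALENESS = timedelta(hours=48)  # prices older than 48h are isolated

def accept_price(prev_price: float, new_price: float,
                 reported_at: datetime, now: datetime) -> bool:
    """Return True if a price report may enter index computation."""
    if now - reported_at > MAX_STALENESS:
        return False                            # stale: isolate from index
    move = abs(new_price - prev_price) / prev_price
    return move <= MAX_INTRADAY_MOVE            # trip on outsized moves

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(accept_price(10.0, 8.0, now, now))                        # 20% move -> False
print(accept_price(10.0, 9.0, now - timedelta(hours=72), now))  # stale -> False
print(accept_price(10.0, 9.0, now, now))                        # fresh 10% -> True
```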
Open methodology
Full model specification, weighting formula, and backtest data available via the /v1/backtest API endpoint. See the Accuracy Report for detailed validation methodology.
Open Questions
Areas we're actively investigating. Contributions welcome.
- Regime-switching models: Can a hidden Markov model detect "competitive response" vs "steady decay" regimes to improve forward accuracy?
- Jump-diffusion: Merton-style jump processes may better capture overnight repricing events than pure exponential decay.
- Cross-provider contagion: When one provider cuts, how quickly and how much do competitors follow? Estimating contagion coefficients.
- Implied theta from swaps: Once EICS contracts trade, market-implied decay rates vs model-derived theta — the basis trade.