Cut LLM API costs
by up to 55%.
No model changes. One extra API call. Tokelang compiles verbose prompts into smaller model-facing prompts and falls back to the original text when wording must stay exact.
Prompt -> Tokelang
search q1 sales data db compare revenue trends last quarter analyze spending spikes finance generate brief summary finance lead
Smaller when safe. Original when exact.
Tokelang compresses prompts whose meaning survives compaction and deliberately leaves exact prompts alone when fidelity matters more than savings.
"First, search for the Q1 sales data in the database. Then summarize the emerging trends in detail."
input
search q1 sales data db simple
output
summarize emerging trends detail simple
What hits the model is smaller.
Tokelang returns a smaller prompt when it clears the savings threshold and does not trip the passthrough guardrails.
// Representative compile response
{
  "compact": "input\nsearch q1 sales data db simple\noutput\nsummarize emerging trends detail simple"
}
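A minimal sketch of handling that response on the client side. The "compact" field name follows the representative response above; the empty-response passthrough shape and the length check are assumptions for illustration, not Tokelang's documented API.

```python
# Hypothetical handler for a Tokelang compile response.
# Assumption: a response without a usable "compact" field means passthrough.

def choose_prompt(original: str, response: dict) -> str:
    """Use the compact form only when present and actually smaller;
    otherwise fall back to the original text, whose wording stays exact."""
    compact = response.get("compact")
    if compact and len(compact) < len(original):
        return compact
    return original

original = ("First, search for the Q1 sales data in the database. "
            "Then summarize the emerging trends in detail.")
response = {"compact": "input\nsearch q1 sales data db simple\n"
                       "output\nsummarize emerging trends detail simple"}

model_facing = choose_prompt(original, response)  # the compact form here
```

The fallback branch is what the passthrough guardrails look like from the caller's side: when no smaller prompt comes back, the original text goes to the model unchanged.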
Drop it in front of your model.
Put Tokelang between your app and the model provider. Compile first, then forward the returned output with the decode system prompt.
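The compile-then-forward ordering can be sketched as a thin proxy layer. Here `compile_prompt` and `call_model` are stand-ins for the real HTTP calls, and `DECODE_SYSTEM_PROMPT` is a placeholder: the actual endpoint paths and decode system prompt text come from Tokelang, not from this sketch.

```python
# Hypothetical proxy layer: compile first, then forward to the provider.
# All three names below are stand-ins, not Tokelang's real API surface.

DECODE_SYSTEM_PROMPT = "Expand the compacted prompt and follow it."  # placeholder text

def compile_prompt(prompt: str) -> dict:
    # Stand-in for the Tokelang compile call; returns the representative shape.
    return {"compact": "input\nsearch q1 sales data db simple\n"
                       "output\nsummarize emerging trends detail simple"}

def call_model(system: str, user: str) -> str:
    # Stand-in for the model provider's chat endpoint.
    return f"[model reply to {len(user)} chars]"

def forward(prompt: str) -> str:
    """Compile first; forward the compact form with the decode system prompt,
    or the original text with no decode prompt when compilation passes through."""
    result = compile_prompt(prompt)
    compact = result.get("compact")
    if compact:
        return call_model(DECODE_SYSTEM_PROMPT, compact)
    return call_model("", prompt)
```

The app only gains one call: everything else, including the fallback to the original wording, happens inside `forward`.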
Prompts are not stored.
Tokelang Lite records usage metadata for the product but does not persist raw prompt text as part of compilation.