Tag: infrastructure

Nov 4, 2025 · Amit Kothari · AI
Cache the prompt, not the response - why most LLM caching fails
Your LLM API bills are eating your budget because you are caching the wrong thing. Most teams cache responses when they should cache prompts. Prompt caching reuses already-processed context instead of reprocessing it on every call, so cached input tokens are billed at a small fraction of the standard rate. Anthropic reports cost reductions of up to 90%.
ai · cost-optimization · performance · infrastructure

AI advisory services via Blue Sheen.