Caching Common Questions to Improve LLM Response Time
John: Why aren't LLMs caching answers to common questions to improve response time? "I've asked ChatGPT, 'When was OpenAI founded?' three different times. It's the exact same query." "It doesn't need to light the GPUs on fire for that question. So cache those results and give them to the user instantly." "LLMs felt slow for a really long time. They actually got slower once the reasoning models came out. It was like: close your phone and come back in 20 minutes. That doesn't have to be the end state."
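The idea is simple: keep a lookup table from query to previously generated answer, serve repeat queries from that table instantly, and only run inference on a cache miss. Below is a minimal sketch in Python, assuming a hypothetical `ask_llm()` client standing in for the real model call; production systems typically go further with semantic (embedding-based) matching and TTLs rather than exact string keys.

```python
import hashlib
import time

# Hypothetical placeholder for the real model call; any LLM client would go here.
def ask_llm(prompt: str) -> str:
    time.sleep(2)  # simulate slow GPU inference
    return f"(model answer to: {prompt})"

def normalize(prompt: str) -> str:
    # Collapse whitespace and casing so trivially different phrasings hit the same key.
    return " ".join(prompt.lower().split())

_cache: dict[str, str] = {}

def cached_ask(prompt: str) -> str:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key in _cache:
        return _cache[key]       # fast path: no GPUs lit on fire
    answer = ask_llm(prompt)     # slow path: real inference on a cache miss
    _cache[key] = answer
    return answer

if __name__ == "__main__":
    for _ in range(3):
        start = time.perf_counter()
        cached_ask("When was OpenAI founded?")
        print(f"answered in {time.perf_counter() - start:.2f}s")
```

Run it and the first call takes the full inference time while the two repeats return near-instantly, which is exactly the behavior being asked for with repeated questions like "When was OpenAI founded?"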