Caching Common Questions to Improve LLM Response Time
John: Why aren't LLMs caching answers to common questions to improve response time? "I've asked ChatGPT, 'When was OpenAI founded?' three different times. It's the exact same query." "It doesn't need to light the GPUs on fire for that question. So cache those results and give them to the user instantly." "LLMs felt slow for a really long time. They actually got slower once the reasoning models came out. It was like: close your phone and come back in 20 minutes. That doesn't have to be the end state."
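The idea is simple: keep a lookup table from query to previously generated answer, serve repeat queries from that table instantly, and only run inference on a cache miss. Below is a minimal sketch in Python, assuming a hypothetical `ask_llm()` client standing in for the real model call; production systems typically go further with semantic (embedding-based) matching and TTLs rather than exact string keys.

```python
import hashlib
import time

# Hypothetical placeholder for the real model call; any LLM client would go here.
def ask_llm(prompt: str) -> str:
    time.sleep(2)  # simulate slow GPU inference
    return f"(model answer to: {prompt})"

def normalize(prompt: str) -> str:
    # Collapse whitespace and casing so trivially different phrasings hit the same key.
    return " ".join(prompt.lower().split())

_cache: dict[str, str] = {}

def cached_ask(prompt: str) -> str:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key in _cache:
        return _cache[key]       # fast path: no GPUs lit on fire
    answer = ask_llm(prompt)     # slow path: real inference on a cache miss
    _cache[key] = answer
    return answer

if __name__ == "__main__":
    for _ in range(3):
        start = time.perf_counter()
        cached_ask("When was OpenAI founded?")
        print(f"answered in {time.perf_counter() - start:.2f}s")
```

Run it and the first call takes the full inference time while the two repeats return near-instantly, which is exactly the behavior being asked for with repeated questions like "When was OpenAI founded?"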