Finding signal on Twitter is more difficult than it used to be. We curate the best tweets on topics like AI, startups, and product development every weekday so you can focus on what matters.

Caching Common Questions to Improve LLM Response Time

John: Why aren't LLMs caching answers to common questions to improve response time? "I've asked ChatGPT, 'When was OpenAI founded?' three different times. It's the exact same query. It doesn't need to light the GPUs on fire for that question. So cache those results and give them to the user instantly. LLMs felt slow for a really long time, and they actually got slower once the reasoning models came out. It was like: close your phone and come back in 20 minutes. That doesn't have to be the end state."
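
A minimal sketch of the idea, assuming an exact-match, in-memory cache keyed by a normalized prompt. Here `call_llm` is a hypothetical stand-in for whatever model or API actually serves the query, and a production cache would also need eviction and a TTL, since even seemingly static answers can go stale.

```python
import hashlib

# Hypothetical stand-in for a real model call (an API request, local inference, etc.).
def call_llm(prompt: str) -> str:
    return f"(model answer for: {prompt})"

# Simple in-memory, exact-match response cache keyed by a hash of the normalized prompt.
_cache: dict[str, str] = {}

def _normalize(prompt: str) -> str:
    # Collapse trivial variation so "When was OpenAI founded?" and
    # " when was openai founded?  " hit the same entry.
    return " ".join(prompt.lower().split())

def answer(prompt: str) -> str:
    key = hashlib.sha256(_normalize(prompt).encode()).hexdigest()
    if key in _cache:
        return _cache[key]      # cache hit: returned instantly, no inference
    result = call_llm(prompt)   # cache miss: pay for inference once
    _cache[key] = result
    return result

# The second call is served from the cache despite the formatting differences.
print(answer("When was OpenAI founded?"))
print(answer(" when was OpenAI founded?  "))
```

Exact matching only fires when queries repeat more or less verbatim; systems that want broader coverage often layer on semantic caching (embedding similarity between prompts), but the exact-match version already handles the "same query, same answer" case described above.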

