Why New AI Models Go From Hyped to Trash in Weeks
Why does it feel like new models get incredibly hyped at launch, but then a few weeks later they're "trash"? I've been thinking about this a lot, and many things can be true at the same time. I'll give you my most optimistic & pessimistic takes.

Let's start positive. When new models come out, especially those that are state of the art (SOTA), they are genuinely incredible to try and use. Things that the prior generation of models sucked at... new models sometimes completely solve! This can feel like a massive unlock for building software, knowledge work, research, data analysis, etc. One new model we've been testing is incredibly good at writing accurate SQL queries, seemingly better than any other model we've tried. This is exciting! And it makes sense that people then share those opinions here.

Okay, more pessimistic. For better or for worse, people are incentivized to share colorful takes on pretty much anything. Some of these folks are relying on those X creator payouts for side cash. The views literally convert to dollars! This can create... tension. It's hard to tell when a take is honest and genuine versus sensationalized for Elon bucks. (Side note: the "paid partnership" labels on tweets are a step in the right direction, although I don't think they really solve the inherent issue with creator payouts, since the payout isn't a direct payment from company → creator.)

So there's lots of hype when new models drop. We see benchmarks where the numbers usually go up and to the right, but it's hard to tell whether that actually translates to better performance on the things we care about. The only way to really know is to try it, tinker, build things... but that takes time to do correctly, which is why the best takes on models are often a little delayed while people really "taste test" them.

There's another angle that makes the hype/hate hard to interpret: these models all have their own personalities/styles/quirks.
One person might love the verbosity and warmth of a model, while someone else completely hates it. At least for coding, it does seem like Codex/Opus/etc. are converging toward a similar style, but they are definitely still different (and people feel strongly about those differences!).

So people use the latest frontier models for weeks to a month, and then you notice the tides turning online. Opus was the best model in the world, and now people think it's dumb/slow/bad. Rinse and repeat for Codex or other models. It's helpful to remember that most people are busy happily building/shipping at this point!

Sometimes this feeling of model degradation is due to an actual issue! Maybe there was an inference bug, or provider downtime, or small updates/tweaks. The model checkpoints can change. However, I would argue this is not the majority case.

The best explanation, to me, is "hedonic adaptation". You can quickly get used to an improvement, so that what previously felt amazing and innovative now feels like your new baseline. Then it's no longer new/sexy. This is just how our brains are wired, and it's not really specific to AI models. The best way to combat it is to be aware of your own biases.

So... what should you do to make sense of all the takes on this site?

1. Try to read lots of opinions: not just official posts, but takes from a variety of people using the models for things *you're interested in*
2. Listen and take note of opinions, but make sure you're forming your own based on your usage/tinkering/experimentation
3. Stay skeptical of sensationalized posts about new models (the "it's so over / we're so back" cycle)