Finding signal on Twitter is more difficult than it used to be. We curate the best tweets on topics like AI, startups, and product development every weekday so you can focus on what matters.

Environment Slop and the Future of Model Training

The co-founders of @flappyairplanes call the current RL paradigm for model training "environment slop." They explain:

"The reinforcement paradigms of today are shockingly inefficient. You don't really get much generalization across tasks, you teach a model through one kind of learning and then you teach it the next one. It's kind of like whack-a-mole. We look at this and think it's kind of crazy. The next paradigm of AI will not be environment slop."

They continue:

"Human level intelligence is not the ceiling, it is merely the floor on what is possible. If you can train models with vastly less data and possibly more compute in very different ways, what is going to happen? We actually don't know. But I do think they'll be different and weird and they'll have interesting capabilities that we'll find really valuable ways to use."


