Environment Slop and the Future of Model Training


The co-founders of @flappyairplanes call the current reinforcement learning (RL) paradigm for model training "environment slop." They explain:

"The reinforcement paradigms of today are shockingly inefficient. You don't really get much generalization across tasks, you teach a model through one kind of learning and then you teach it the next one. It's kind of like whack-a-mole. We look at this and think it's kind of crazy. The next paradigm of AI will not be environment slop."

"Human level intelligence is not the ceiling, it is merely the floor on what is possible. If you can train models with vastly less data and possibly more compute in very different ways, what is going to happen? We actually don't know. But I do think they'll be different and weird and they'll have interesting capabilities that we'll find really valuable ways to use."


