Reinforcement Learning with Verifiable Rewards Chapter

Chapter 6 on RL with verifiable rewards (RLVR) is now available. It's essentially GRPO from scratch, and probably my favorite chapter so far. (The first 363 pages are done, yay!) I'm now working on the follow-up with more RLVR runs, more metrics and analyses, and extensions like policy clipping and KL.
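To make "GRPO from scratch" concrete, here is a minimal sketch of the two ideas the post names. It is not the chapter's code. The first idea is a *verifiable* reward, meaning a programmatic check of the final answer instead of a learned reward model. The second is GRPO's (Group Relative Policy Optimization) group-relative advantage: several completions are sampled for the same prompt, and each reward is normalized by the group's mean and standard deviation. The `Answer:` marker, the exact-match rule, the use of population standard deviation, and all function names are illustrative assumptions. The PPO-style ratio clipping at the end corresponds to the "policy clipping" extension the post mentions for the follow-up.

```python
import statistics


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the text after the last 'Answer:'
    marker exactly matches the reference answer, else 0.0."""
    marker = "Answer:"  # assumed output format, not from the source
    if marker not in completion:
        return 0.0
    answer = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if answer == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.
    Uses population std (an assumption; implementations differ)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every completion gets the same reward, all advantages are 0:
    # the group carries no learning signal for that prompt.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one token/sequence, where
    ratio = pi_new(a|s) / pi_old(a|s). Shown as the clipping extension."""
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    group = ["Reasoning... Answer: 42", "Answer: 41", "no final answer", "Answer: 42 "]
    rewards = [verifiable_reward(c, "42") for c in group]
    print(rewards)                             # [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))  # roughly [1, -1, -1, 1]
    print(clipped_surrogate(1.5, 1.0))         # ratio capped at 1.2
```

Because the advantage comes from comparing completions within a group, this setup needs no separate value network. That is the main simplification GRPO makes relative to PPO.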
