Reinforcement Learning with Verifiable Rewards Chapter
Ch 6 on RL with verifiable rewards is now available. Essentially GRPO from scratch, and probably my favorite chapter so far. (First 363 pages done, yay!) I'm now working on the follow-up with more RLVR runs, more metrics & analyses, and extensions like policy clipping and KL