Community Evals and Benchmark repositories launched
We just shipped Community Evals and Benchmark repositories for decentralized evals
> Scores you and model authors report are on leaderboards
> Benchmark datasets host live leaderboards of reported results
> You can open PRs to add scores, they live in model repositories.
Community Evals will expose scores currently distributed across model cards, papers, and benchmarks. It won't solve the differences in scores, but it is transparent!