Finding signal on Twitter is more difficult than it used to be. We curate the best tweets on topics like AI, startups, and product development every weekday so you can focus on what matters.

Community Evals and Benchmark repositories launched

We just shipped Community Evals and Benchmark repositories for decentralized evals 🤗
> Scores you and model authors report are on leaderboards 🙌🏻
> Benchmark datasets host live leaderboards of reported results 🚀
> You can open PRs to add scores, they live in model repositories.
Community Evals will expose scores currently distributed across model cards, papers, and benchmarks. It won’t solve the differences in scores, but it is transparent!


