AI Benchmarking Focuses on Coding Over Real Work Tasks
What a great illustration of the central problem of AI benchmarking for real work. Almost all of the effort is going into benchmarking coding, but coding is a small part of the jobs people actually do, which leaves the true trajectory of AI progress less clear. https://arxiv.org/pdf/2603.01203

