New AI benchmarks

A new benchmark called APEX-Agents suggests that AI models from every lab still fall short on real workplace tasks. “Faced with queries from real professionals, even the best models struggled to get more than a quarter of the questions right. The vast majority of the time, the model came back with a wrong answer or no answer at all.”

TechCrunch | Are AI agents ready for the workplace? A new benchmark raises doubts.