Humans: 92.7%
GPT-4o: 69.9%

However, they didn't evaluate any recent reasoning models. If they had, they'd have found that o3 gets 96.5%, beating humans.
submitted by /u/Separate-Way5095