Today’s AI models can describe what’s happening in a video. But what if you asked them why it’s happening, what it means emotionally or symbolically, or how events in different scenes connect?
A new benchmark called MMR-V challenges AI to go beyond just seeing and actually reason across long videos like a human would. Not just “the man picked up a coat,” but “what does that coat symbolize?” Not just “a girl gives a card,” but “why did she write it, and for whom?”
It turns out that even the most advanced AI models struggle with this. Humans score ~86% on these tasks. The best AI? Just 52.5%.
If you’re curious about where AI really stands with video understanding, and where it’s still falling short, this benchmark is one of the clearest tests yet.
submitted by /u/SlightLion7