J A B B Y A I

Loading

My favorite overall benchmark is livebench. If you click show subcategories for language average you will be able to rank by plot_unscrambling which to me is the most important benchmark for writing:

https://livebench.ai/

Vals is useful for tax and law intelligence:

https://www.vals.ai/models

The rest are interesting as well:

https://github.com/vectara/hallucination-leaderboard

https://artificialanalysis.ai/

https://simple-bench.com/

https://agi.safe.ai/

https://aider.chat/docs/leaderboards/

https://eqbench.com/creative_writing.html

https://github.com/lechmazur/writing

Please share your favorite benchmarks too! I’d love to see some long context benchmarks.

submitted by /u/Mr-Barack-Obama
[link] [comments]

Leave a Comment