My favorite overall benchmark is LiveBench. If you click "show subcategories" for the language average, you can rank by plot_unscrambling, which to me is the most important benchmark for writing:
Vals is useful for tax and law intelligence:
The rest are interesting as well:
https://github.com/vectara/hallucination-leaderboard
https://artificialanalysis.ai/
https://aider.chat/docs/leaderboards/
https://eqbench.com/creative_writing.html
https://github.com/lechmazur/writing
Please share your favorite benchmarks too! I’d love to see some long-context benchmarks.
submitted by /u/Mr-Barack-Obama