The last few weeks were a bit crazy with all the new generation of models, and this makes it a bit easier to compare them. I was particularly surprised at how badly R1 performed, and a
I've heard a lot of people, both in and outside of this sub, say that LLMs can't reason outside their training data, which is completely untrue. Here's my proof for why I believe that: MIT study shows language models defy
This paper introduces a test-time optimization method called R2-T2 that improves routing in mixture-of-experts (MoE) models without requiring retraining. The core idea is using gradient descent during inference to optimize how inputs get routed to different experts, particularly for multimodal