NEW PAPER: Unifying Probabilistic Learning in Transformers
What if attention, diffusion, reasoning and training were all the same thing?
Our paper proposes a novel, unified way of understanding AI — and it looks a lot like quantum mechanics.
Intelligent models should not be a melting pot of disparate structures. This work aims to take a first step toward unifying those ideas: next-token prediction, diffusion, attention, reasoning, test-time training… Can objects that seem so different all arise from the same framework? The paper includes a novel, exact derivation and explanation of attention. More interesting still, the framework (and so AI) appears to be an approximation of a quantum system.
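For readers who want the baseline object in front of them: below is a minimal sketch of standard scaled dot-product attention, the textbook formulation the paper claims to re-derive exactly. This is not the paper's derivation or framework; all names and shapes here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Textbook attention: Attn(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

# Toy usage: 4 query tokens attending over 6 key/value tokens of width 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal(s) for s in [(4, 8), (6, 8), (6, 8)])
print(attention(Q, K, V).shape)  # (4, 8)
```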
What do you think about the work? Please let me know; I'm eager for thoughts on the content or the ideas!
submitted by /u/LahmacunBear