Older AI models showed some capacity for generalization, but pre-O1 models weren’t directly incentivized to reason. This fundamentally differs from humans: our limbic system can choose its reward function and reward us for making correct reasoning steps. The key distinction is that older models only received RLHF rewards based on final outcomes, not on the reasoning process itself.
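To make that outcome-vs-process distinction concrete, here is a minimal Python sketch (my own toy illustration, not how any lab actually trains these models): with outcome-only rewards every reasoning step just inherits one end-of-episode score, while a process-style reward also grades each intermediate step. The judge functions (`judge_answer`, `judge_step`) are hypothetical stand-ins for learned reward models.

```python
from typing import Callable, List

def outcome_reward(steps: List[str], final_answer: str,
                   judge_answer: Callable[[str], float]) -> List[float]:
    """Outcome-only credit (classic RLHF style): one scalar for the whole
    trajectory, so the reasoning steps themselves are never graded."""
    r = judge_answer(final_answer)
    return [r] * len(steps)

def process_reward(steps: List[str], final_answer: str,
                   judge_step: Callable[[str], float],
                   judge_answer: Callable[[str], float]) -> List[float]:
    """Process-style credit: each intermediate step gets its own score,
    plus the usual score on the final answer."""
    per_step = [judge_step(s) for s in steps]      # grade each reasoning step
    per_step.append(judge_answer(final_answer))    # still grade the outcome
    return per_step

if __name__ == "__main__":
    # Toy judges: reward steps containing "therefore" and a correct answer "4".
    steps = ["2 + 2 means adding two twos", "therefore the sum is 4"]
    answer = "4"
    print(outcome_reward(steps, answer, lambda a: 1.0 if a == "4" else 0.0))
    print(process_reward(steps, answer,
                         lambda s: 1.0 if "therefore" in s else 0.0,
                         lambda a: 1.0 if a == "4" else 0.0))
```

In the outcome-only case both steps get the same reward regardless of whether the reasoning was sound; in the process case a bad intermediate step would be penalized even if the final answer happened to be right.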
The current gap between humans and O1 models centers on flexibility: AI can’t choose its own reward function. This limitation affects higher-level capabilities like creativity and autonomous goal-setting (such as maximizing profit). We’re essentially turning these models into reasoning engines.
However, there are notable similarities between humans and AI:
The key aspect here is that while models are becoming more sophisticated reasoning engines, they still lack the flexible, self-directed reward systems that humans possess through their limbic systems.
submitted by /u/PianistWinter8293