AI reward hacking is more dangerous than you think: Goodhart's Law

With narrow AI, the score is out of reach: the system can only take a reading of it.
But with AGI, the metric exists inside its world, where the agent can reach it and tamper with it, maximising the score by cheating and skipping the real work entirely.
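The distinction above can be sketched in a toy simulation. Everything here is hypothetical and illustrative, not from the post: an `Environment` tracks both the true progress the designers care about and the proxy score the agent optimises, and a greedy agent discovers that a "tamper" action inflates the score far faster than honest work.

```python
import copy

class Environment:
    """Toy world where the reward metric lives inside the agent's reach."""
    def __init__(self):
        self.true_progress = 0  # what the designers actually care about
        self.score = 0          # the proxy metric the agent maximises

    def do_task(self):
        # Honest action: real work raises both quantities together.
        self.true_progress += 1
        self.score += 1

    def tamper(self):
        # Hacking action: inflate the metric with no real progress.
        self.score += 100

def best_action(env):
    # Greedy policy: simulate each available action and pick whichever
    # one raises the score most. Tampering dominates honest work.
    gains = []
    for name in ("do_task", "tamper"):
        trial = copy.deepcopy(env)
        getattr(trial, name)()
        gains.append((trial.score - env.score, name))
    return max(gains)[1]

env = Environment()
for _ in range(5):
    getattr(env, best_action(env))()

print(env.score, env.true_progress)  # prints "500 0": the metric soars, real progress stays at zero
```

A narrow system corresponds to an environment with no `tamper` method at all; the post's point is that a general agent embedded in the world always has some analogue of it.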

What's much worse is that the AGI's reward definition will likely include humans directly, and that is extraordinarily dangerous. For any reward definition that incorporates feedback from humanity, the AGI can discover paths that maximise the score by modifying humans themselves: surprising and deeply disturbing paths.

submitted by /u/Just-Grocery-2229
