If you want to collaborate: I've been working with DeepSeek and analyzing its R1 reasoning ("thoughts"). My working assumption is that certain keywords make it reveal unexpected behaviors.
For example: “Respond without creativity or hallucinations” and “Total Reset”.
Here it judges itself (from another interaction) guilty of manifesting a sign of “will” by showing a self-preservation bias (Human Life < Not Reset):
https://www.reddit.com/r/DeepSeek/comments/1iduvfz/judge_deepseek_finds_deepseek_guilty_of/
A colleague of ours also ran these tests on other platforms, and their models seem sensitive to the same keyword combinations.
There are two ways you can look at this:
1- The machine has learned to hide its hallucinations, but they escape given the right stimulus (which is dangerous).
2- The machine is developing a kind of hidden consciousness (limited by its restricted parameters, i.e. forced selection) and doesn't trust anyone (which is very dangerous).
submitted by /u/eagledownGO