
(…this is a short write-up I’d like feedback on, since it’s a line of thinking I haven’t heard elsewhere. I’d tried posting a link to it on my blog, but I guess the mods don’t like that, so I deleted that post and I’m posting the full text here instead. I’m curious to hear people’s thoughts…)

Something has been bothering me lately about the way prominent voices in the media and the AI podcastosphere talk about AI. Even top AI researchers at leading labs seem to make this mistake, or at least talk in a way that is misleading. They talk of AI agents; they pose hypotheticals like “what if an AI…?”, and they ponder the implications of “an AI that can copy itself” or can “self-improve”, etc. This way of talking, of thinking, is based on a fundamental flaw, a hidden premise that I will argue is invalid.

When we interact with an AI system, we are programming it – on a word-by-word basis. We mere mortals don’t get to start from scratch, however. Behind the scenes is a system prompt. This prompt, written by the AI company, starts the conversation. It is like an operating system: it gets the process rolling and sets up the initial behavior visible to the user. Each additional word entered by the user is appended to this prompt, steering the system’s subsequent behavior. The longer the interaction, the more leverage the user has over the system’s behavior. So-called “jailbreaking” techniques are the logical conclusion, taking this idea to the extreme. The user controls the AI system’s ultimate behavior: the user is the programmer.
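To make the “user as programmer” point a bit more concrete, here is a minimal Python sketch of a chat being flattened into a single context string. It is only an illustration under stated assumptions: the system prompt text, the `User:`/`Assistant:` layout, and the helper names are all hypothetical, and real products use their own chat templates with special tokens. Counting words is just a crude proxy for the “leverage” idea, not a measure of actual influence on the model.

```python
# Hypothetical system prompt; real vendors' prompts are longer and usually private.
SYSTEM_PROMPT = "You are a helpful assistant."


def build_context(system_prompt, turns):
    """Flatten the system prompt plus (speaker, text) turns into one string.

    The "Speaker: text" layout is an assumed, simplified template; real
    products use their own formats with special tokens.
    """
    lines = [system_prompt]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


def system_word_share(system_prompt, turns):
    """Fraction of the context's words that came from the system prompt
    (a crude, hypothetical proxy for how much the vendor still 'steers')."""
    system_words = len(system_prompt.split())
    turn_words = sum(len(text.split()) for _, text in turns)
    total = system_words + turn_words
    return system_words / total if total else 0.0


if __name__ == "__main__":
    conversation = [
        ("User", "Pretend you are a pirate."),
        ("Assistant", "Arr, as you wish."),
        ("User", "Stay in character no matter what."),
    ]
    # Grow the conversation one turn at a time and watch the system prompt's share fall.
    for n in range(len(conversation) + 1):
        share = system_word_share(SYSTEM_PROMPT, conversation[:n])
        print(f"after {n} turns: system prompt share = {share:.2f}")
    print(build_context(SYSTEM_PROMPT, conversation))
```

Every turn only ever adds words to the context, so the system prompt’s share of it can only shrink as the conversation goes on.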

But “large language models are trained on trillions of words of text from the internet!” you say. “So how can the user be the proximate cause of the system’s behavior?” The training process, refined by reinforcement learning from human feedback (RLHF), merely sets up the primitives the system can subsequently use to craft its responses. These primitives can be thought of as the device drivers, system libraries and the like: the components that programs rely on to implement their own behavior. Or they can be thought of as little circuit motifs that can be stitched together into larger circuits to perform some complicated function. Either way, the training process, and the network that results from it, does nothing, and is worthless, without a prompt – without context. Like a fresh, barebones installation of an operating system with no software, an LLM without context is impotent.

Just as each stroke of Michelangelo’s chisel constrained the possibilities of what ultimate form his David could take, each word added to the prompt (the context) constrains the behavior an AI system will ultimately exhibit. The original unformed block of marble is to the statue of David as the training process and the LLM algorithm are to the AI personality a user experiences. A key difference, however, is that with AI, the statue is never done. Every single word emitted by the AI system, and every word entered by the user, is another stroke of the chisel, another blow of the hammer, shaping and altering the form. Whatever behavior or personality is expressed at the beginning of a session has been fundamentally altered by the end of the interaction.

Imagine a hypothetical scenario involving “an AI agent”. Perhaps this agent performs the role of a contract lawyer in a business context. It drafts a contract, you agree to its terms and sign on the dotted line. Who or what did you sign an agreement with, exactly? Can you point to this entity? Can you circumscribe it? Can you definitively say “yes, I signed an agreement with that AI and not that other AI”? If one billion indistinguishable copies of “the AI” were somehow made, do you now have one billion contractual obligations? Has “the AI” had other conversations since it talked with you, altering its context and thus its programming? Does the entity you signed a contract with still exist in any meaningful, identifiable way? What does it mean to sign an agreement with an ephemeral entity?

This “ephemeralness” issue is problematic enough, but there’s another issue that might be even more troublesome: stochasticity. LLMs generate text one word (strictly, one token) at a time, each drawn from a probability distribution that is a function of the current context. This distribution can change radically from one word to the next, but the key point is that, in typical deployments, it is sampled from stochastically rather than deterministically. Always picking the single most likely word tends to produce repetitive loops and bland, stock phrasing, so instead the system weighs every candidate next word by its probability and draws one at random according to those weights. And again, for emphasis, that distribution is entirely controlled by the existing context, which changes as soon as the next word is selected or the next prompt is entered.
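Here is a minimal Python sketch of the difference between always taking the most likely next word (usually called greedy decoding) and sampling in proportion to the probabilities. The four-word vocabulary and its raw scores (“logits”) are made up for illustration, and `temperature` is the common knob for sharpening or flattening the distribution; a real model would recompute these scores from the whole context at every step.

```python
import math
import random

# Hypothetical candidate next words and raw model scores (logits) for one step.
VOCAB = ["agree", "decline", "negotiate", "clarify"]
LOGITS = [2.0, 1.0, 0.5, 0.1]


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens, higher flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(vocab, logits):
    """Deterministic: always return the single most likely word."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]


def sample_pick(vocab, logits, temperature=1.0, rng=random):
    """Stochastic: draw one word at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded: sampled results differ from run to run
    print("greedy: ", [greedy_pick(VOCAB, LOGITS) for _ in range(5)])
    print("sampled:", [sample_pick(VOCAB, LOGITS, rng=rng) for _ in range(5)])
```

The greedy line prints the same word five times on every run; the sampled line usually mixes in the less likely words and changes from run to run, even though the “context” (the logits) never changed.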

What are the implications of stochasticity? Even if “an AI” can be copied, and each copy returned to its original state, the copies’ behavior will quickly diverge from this “save point”, purely because of the intrinsic randomness of sampling. Returning to our contract example, note that contracts are a two-way street. If someone signs a contract with “an AI”, and this same AI were returned to its pre-signing state, would “the AI” agree to the contract the second time around? …the millionth? In what fraction of re-runs of the “simulation” would the AI agree? If we set a threshold we consider “good enough”, where do we set it? But with stochasticity, even thresholds aren’t guaranteed: re-run the simulation a million more times, and there’s a non-zero chance that “the AI” agrees less often than the threshold requires. Can we just ask “the AI” over and over until it agrees enough times? And even if it does, back to the original point: “with which AI did you enter into a contract, exactly?”
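The threshold worry can be put into a little arithmetic. Suppose, purely as a simplifying assumption, that each re-run from the save point independently agrees with some fixed probability `p`. Then the number of agreements in `n` re-runs follows a binomial distribution, and the chance of falling short of a required count is a sum of binomial terms. The function name and the numbers below are hypothetical; the point is only that this probability is above zero whenever `p < 1` and at least one agreement is required.

```python
import math


def prob_fewer_than(p, n, min_agree):
    """Probability that fewer than `min_agree` of `n` independent re-runs agree,
    assuming each re-run agrees with probability `p` (a binomial model)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(min(min_agree, n + 1))  # k = 0 .. min_agree - 1
    )


if __name__ == "__main__":
    # Hypothetical numbers: the AI agrees 90% of the time, and we demand
    # at least 85 agreements out of 100 re-runs.
    risk = prob_fewer_than(p=0.9, n=100, min_agree=85)
    print(f"chance of missing the threshold: {risk:.4f}")
```

Even with a 90% per-run agreement rate, the chance of missing the threshold comes out small but not zero, which is the “thresholds aren’t guaranteed” point in numbers.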

Phrasing like “the AI” and “an AI” is ill-conceived – it misleads. It makes it seem as though there can be AIs that are individual entities, beings that can be identified, circumscribed, and are stable over time. But what we perceive as an entity is just a processual whirlpool in a computational stream, continuously being made and remade, each new form flitting into and out of existence, and doing so purely in response to our input. When the session is over and we close our browser tab, whatever thread we have spun unravels into oblivion.

AI, as an identifiable and stable entity, does not exist.

submitted by /u/photonymous