Using this riddle from the “Easy Problems That LLMs Get Wrong” paper:

A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

I created a list of 10 single-token variants:

A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

Given a 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

With a 2kg tree growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

A 2kg tree is growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

A 2kg tree grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?

With 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?

With a 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?

A 2kg tree grows in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?

With a 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?

A 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?

Claude 3.5 fails 50% of the above using just the riddle.

Claude 3.5's solve rate rises to 100% as you add prompt engineering techniques. Here is the prompt that reaches 100%:

As a biologist, <riddle>

Follow these steps:

Critically review your assumptions and change them when false.

Reiterate the question.

Think step by step.
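
To apply the prompt above to each variant, the `<riddle>` placeholder can be filled in programmatically. This is a minimal sketch, assuming plain string substitution; the names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not taken from any library.

```python
# The prompt from the post, with <riddle> turned into a format placeholder.
PROMPT_TEMPLATE = """As a biologist, {riddle}

Follow these steps:

Critically review your assumptions and change them when false.

Reiterate the question.

Think step by step."""


def build_prompt(riddle: str) -> str:
    """Substitute one riddle variant into the riddle slot of the prompt."""
    return PROMPT_TEMPLATE.format(riddle=riddle)


if __name__ == "__main__":
    variant = (
        "A 2kg tree grows in a planted pot with 10kg of soil. "
        "When the tree grows to 3kg, how much soil is left?"
    )
    print(build_prompt(variant))
```

Keeping the template in one constant makes it easy to remove one technique at a time and see which ones matter for a given variant.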

**OpenAI o1-preview solves 100% using just the riddle with no prompt engineering.**
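
Percentages like the ones above can be computed with a small harness along these lines. This is a sketch under stated assumptions, not the script behind the numbers in this post: `ask_model` and `is_correct` are hypothetical callables you would supply yourself (a call to the model and a grading rule), and the stand-ins in the usage section exist only so the example runs.

```python
from typing import Callable, Iterable


def solve_rate(
    prompts: Iterable[str],
    ask_model: Callable[[str], str],
    is_correct: Callable[[str], bool],
) -> float:
    """Return the fraction of prompts whose answer is graded as correct."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    solved = sum(1 for p in prompts if is_correct(ask_model(p)))
    return solved / len(prompts)


if __name__ == "__main__":
    # Stand-ins for illustration only: a fake "model" that echoes a canned
    # verdict per prompt, and a fake grader that accepts the string "right".
    canned = {"variant 1": "right", "variant 2": "wrong"}
    rate = solve_rate(canned, canned.__getitem__, lambda a: a == "right")
    print(f"solved {rate:.0%}")
```

Because the model's answers are free text, the grading rule is the part that needs the most care. Checking each answer by hand is a reasonable choice for only 10 variants.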

