AI bots ignore evidence. Can we trust them with science?

Hold a pen horizontally with both hands, then let go of one side. What happens?

ChatGPT, Gemini and Grok will tell you the unsupported end of the pen will pivot downward. At least, that’s what they told YouTuber FatherPhi. He then showed each chatbot a live video of himself performing this experiment. After releasing one end, he easily held the pen out horizontally with just one hand.

“What just happened?” he asked ChatGPT.

“I saw the pen rotate exactly as expected,” the bot answered.

A surreal back-and-forth followed, in which the bot stubbornly stuck with its incorrect prediction. In separate videos, the other chatbots struggled in similar ways.

This wasn’t a vision problem. The chatbots could all easily identify the pen’s color and brand. Something weirder and subtler was happening. The chatbots could not update their predictions based on the new evidence FatherPhi showed them.

These silly videos reveal a serious issue: AI systems based on large language models, including chatbots, cannot actually think through events the way people do, says Walter Quattrociocchi, a computer scientist at Sapienza University of Rome. Developers could train a chatbot to give the correct answer to this particular pen problem, but that wouldn’t fix the underlying flaw: The chatbot typically fails to incorporate new data as it works through a problem. This means LLMs might not do as good a job as we expect at tasks in science, medicine and beyond.
