Perfect alignment of AI with human values and interests is mathematically impossible, a study finds, but behavioral diversity among AI agents offers some measure of control. In a paper published in PNAS Nexus, Hector Zenil and colleagues draw on Gödel's incompleteness theorem and Turing's proof that the Halting Problem is undecidable to argue that any LLM complex enough to exhibit general intelligence or superintelligence will also be computationally irreducible and behave unpredictably, making forced alignment impossible.
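The undecidability result the authors invoke rests on Turing's diagonal argument: any claimed halting decider can be fed a "contrarian" program built to do the opposite of whatever the decider predicts about it, so no perfect predictor of program behavior can exist. A minimal sketch of that construction (the names `make_contrarian` and `says_no` are ours for illustration, not from the paper):

```python
def make_contrarian(halts):
    """Given a claimed halting decider, build a program it must misjudge."""
    def contrarian(prog):
        if halts(prog, prog):   # decider predicts prog halts on itself...
            while True:         # ...so do the opposite: loop forever;
                pass
        return "halted"         # otherwise halt immediately.
    return contrarian

# A toy decider (an assumption for illustration; Turing showed no
# correct decider can exist) that predicts nothing ever halts:
def says_no(prog, x):
    return False

c = make_contrarian(says_no)
print(c(c))  # prints "halted" -- refuting says_no's prediction about c
```

By construction, every candidate decider is refuted on at least one input this way; the paper's claim is that sufficiently general AI behavior inherits this kind of formal unpredictability.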
As an alternative, the authors propose a strategy of "managed misalignment," in which competing AI agents with different cognitive styles and partially overlapping goals operate in distinct roles to check one another.
As each agent attempts to fulfill its own goals with its own modes of reasoning and ethical frameworks—what the authors dub "artificial agentic neurodivergence"—the agents will dynamically aid or thwart one another, preempting ultimate dominance by any single system.