In 1950, in a paper entitled ‘Computing Machinery and Intelligence’1, Alan Turing proposed his ‘imitation game’. Now known as the Turing test, it addressed a question that seemed purely hypothetical: could machines display the kind of flexible, general cognitive competence that is characteristic of human thought, such that they could pass as human to interrogators who did not know whether they were conversing with a person or a machine?
Three-quarters of a century later, the answer looks like ‘yes’. In March 2025, the large language model (LLM) GPT-4.5, developed by OpenAI in San Francisco, California, was judged by humans in a Turing test to be human 73% of the time — more often than actual humans were2. Moreover, readers even preferred literary texts generated by LLMs over those written by human experts3.
This is far from all. LLMs have achieved gold-medal performance at the International Mathematical Olympiad, collaborated with leading mathematicians to prove theorems4, generated scientific hypotheses that have been validated in experiments5, solved problems from PhD exams, assisted professional programmers in writing code, composed poetry and much more — including chatting 24/7 with hundreds of millions of people around the world. In other words, LLMs have shown many signs of the sort of broad, flexible cognitive competence that was Turing’s focus — what we now call ‘general intelligence’, although Turing did not use the term.