
By Gavin Boyle
A new study revealed that AI chatbots are becoming more prone to hallucinations and, in some cases, are more likely to serve users made-up facts and information than the truth.
“Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in [ChatGPT models] o3 and o4-mini,” OpenAI’s Gaby Raila told the New York Times, per PC Gamer.
This defense of its newer models comes as the Times found that o3 hallucinated 33% of the time on the PersonQA benchmark test — a set of questions about public figures. o4-mini, OpenAI's upcoming model, performed even worse, hallucinating 48% of the time. Meanwhile, o1, the company's previous model, hallucinated closer to 15% of the time on the same test.
The newer models performed even worse on the SimpleQA test, which presses the chatbot on more general information. o3 hallucinated 51% of the time, and o4-mini 79% of the time, compared to o1’s 44% hallucination rate.
This apparent decline in accuracy comes as the company shifts how its models operate, training them to "think" more like a human.
“We trained these models to spend more time thinking through problems before they respond, much like a person would,” OpenAI said last December when o1 was released. “Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.”
This shift produced a massive increase in the models' academic scores, with the company touting PhD-level intelligence in subjects like math, physics, biology and more. On the Mathematics Olympiad exam, for example, o1 scored 83%, compared to the 13% achieved by the company's earlier non-reasoning models.
“o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it,” OpenAI CEO Sam Altman said around the release of the new model. “But also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning.”
The rise in hallucinations highlights a larger issue with how AI is currently trained: models will not admit when they do not know something.
This becomes a major problem when people turn to chatbots with questions that would traditionally be reserved for professionals with specialized training, such as doctors or financial advisors. For those seeking help with their mental health, a chatbot can sometimes encourage harmful behavior rather than provide real help.
Nonetheless, AI remains a remarkable tool, especially for creative endeavors, but users should know how often it hallucinates and double-check the information it provides.
Read Next: ChatGPT Convinces Some People to Think They’re God