
How Often Does Artificial Intelligence Make Mistakes: Comparing Popular Models by Hallucination Rate

Accuracy Test: Which AIs Most Often Give Wrong Answers and Why It Matters

A recent study reveals which AI models are most prone to errors. Some paid versions lagged behind free alternatives in accuracy. Find out how often top AI models provide incorrect answers.

In recent years, artificial intelligence (AI) has become an integral part of the digital landscape, rapidly integrating into daily life and business processes. Yet, despite impressive achievements, even the most advanced language models are prone to errors known to specialists as “hallucinations.” This phenomenon occurs when AI confidently presents false or fabricated information without recognizing its own mistake. In a new study, experts set out to determine how often modern AI models make such blunders, and which ones are most vulnerable to these failures.

For the analysis, leading language systems developed by major tech companies were selected. Each model was given excerpts from real news articles and tasked with identifying the source, publication, and exact URL of the original material. At the same time, a standard Google search easily retrieved the required article among the top results, making the task entirely feasible for contemporary AI.

The testing results were unexpected. Some models, despite being paid and claiming high accuracy, performed worse than their free counterparts. Grok-3 stood out, making mistakes in 94% of cases, while Perplexity showed the highest accuracy among all participants in the experiment.

AI Hallucination Errors: Which Models Err Most Often?

During the experiment, researchers observed that most language models not only made mistakes but also showed no doubt in their responses. Even when the information was entirely fabricated, the AI confidently presented it as factual. This highlights a key shortcoming of modern systems: an inability to critically evaluate their own conclusions and acknowledge possible errors.

Interestingly, the free versions of some models turned out to be more accurate than their paid counterparts. This finding challenges the common belief that paid access always guarantees higher quality and reliability.

Experts emphasize that such “hallucinations” can have serious consequences, especially when it comes to searching for important information or making decisions based on data provided by AI. Users are advised to remain critical and double-check the responses they receive, particularly when dealing with significant topics.

Research methodology: How language models were tested

To objectively assess their propensity for errors, researchers used a unified approach for all models. Each system was given identical excerpts from news articles, and the responses were then compared with the real sources. If the AI could not accurately identify the original, the response was counted as a “hallucination.”
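The tally described above is straightforward to express in code. The sketch below is purely illustrative (it is not the study's actual code, and the function name, data, and URLs are hypothetical): an answer counts as a hallucination if the claimed source does not match the article's real one, and the rate is the share of such answers.

```python
def hallucination_rate(answers, ground_truth):
    """Return the share of answers that fail to match the real source.

    answers: dict mapping excerpt_id -> URL the model claimed
    ground_truth: dict mapping excerpt_id -> the article's real URL
    """
    wrong = sum(
        1 for excerpt_id, claimed in answers.items()
        if claimed != ground_truth[excerpt_id]
    )
    return wrong / len(answers)

# Hypothetical example: two wrong attributions out of three excerpts.
truth = {"a": "https://example.com/1",
         "b": "https://example.com/2",
         "c": "https://example.com/3"}
claims = {"a": "https://example.com/1",
          "b": "https://fabricated.example/x",
          "c": "https://fabricated.example/y"}
print(round(hallucination_rate(claims, truth), 2))  # 0.67
```

A figure like Grok-3's reported 94% would correspond to roughly 94 wrong attributions per 100 excerpts under this kind of counting.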

An important aspect of the experiment was that a standard Google search easily found the required articles, showing the information was readily accessible. Nevertheless, many language models failed the task, highlighting the need to further improve these algorithms and strengthen their critical reasoning.

The study also showed that an AI’s confidence in its own answers does not always correlate with their accuracy. This poses additional risks for users, who may mistake false information for the truth.
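The mismatch between confidence and accuracy can be checked with a simple comparison. The snippet below uses entirely hypothetical numbers (none of these values come from the study) to show what a poorly calibrated model looks like: its average confidence on wrong answers is just as high as on right ones.

```python
# Hypothetical self-reported confidence scores and whether each answer
# was actually correct. A well-calibrated model would report lower
# confidence when it is wrong; this toy model does not.
confidence = [0.95, 0.90, 0.97, 0.60]
correct    = [True, False, False, True]

avg_conf_when_wrong = sum(
    c for c, ok in zip(confidence, correct) if not ok
) / correct.count(False)
avg_conf_when_right = sum(
    c for c, ok in zip(confidence, correct) if ok
) / correct.count(True)

# High confidence even on fabricated answers: exactly the risk the
# study points to for users taking answers at face value.
print(avg_conf_when_wrong >= avg_conf_when_right)
```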

Impact of AI errors on users and prospects for development

The problem of “hallucinations” in language models is becoming increasingly relevant as they are integrated into various spheres of life. From automating business processes to assisting in education, wherever AI is used for information retrieval and analysis, there is a risk of receiving inaccurate data.

Developers continue to work on improving algorithms by introducing new methods for verifying and filtering information. However, none of the existing models can yet guarantee complete accuracy and an absence of errors. It is important for users to remember this and not to fully rely on automatic answers.

In the future, more advanced systems are expected, capable not only of analyzing data, but also of critically evaluating their own conclusions. Until then, experts advise using AI as an auxiliary tool rather than a sole source of information.

By the way: Perplexity — a new player in the AI market

Perplexity is a relatively young company that quickly gained popularity thanks to its language model. Founded in 2022, it positions itself as a developer of AI-powered tools for information search and analysis. Unlike many competitors, Perplexity emphasizes algorithm transparency and source verification, which is especially important amid the rise of fake news and misinformation.

The company actively collaborates with educational and scientific institutions, integrating its solutions across various sectors. In a short time, Perplexity has caught the attention of major investors and gained recognition among natural language processing specialists. The developers regularly update their products, improving the models’ accuracy and performance.

Today, Perplexity is considered one of the most promising players in the AI market, capable of competing with giants like OpenAI and Google. The company’s success is largely due to its focus on real user needs and the continual enhancement of its technologies.
