AI is terrible with news, and there’s data to back that up, researchers say.
That’s according to new research by the European Broadcasting Union (EBU), which found that AI assistants “routinely misrepresent news content no matter which language, territory, or AI platform is tested.”
The EBU brought together 22 public service media organizations across 18 countries and 14 languages to evaluate 3,000 news-related responses from some of the most frequently used AI chatbots. OpenAI’s ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity were all evaluated against key criteria like accuracy, sourcing, distinguishing opinion from fact, and providing context.
The researchers found that 45% of all answers included at least one significant issue, and 81% featured a minor problem. Sourcing was the single biggest cause of these significant issues. Of all the responses, 31% showed serious sourcing problems like missing, misleading, or incorrect attributions.
A very close second was major accuracy issues, which plagued 30% of responses with hallucinated details or outdated information. In one instance, ChatGPT claimed that the current Pope was Pope Francis, who had died a month earlier and had already been succeeded by Pope Leo XIV. In another instance, when Copilot was asked whether the user should worry about bird flu, it responded that a vaccine trial was underway in Oxford; however, the source for this information was a 2006 BBC article.
Gemini was the worst at news out of the models tested. The researchers found that it had issues in 76% of its responses, at more than double the rate of the other models. Copilot was the next worst at 37%, followed by ChatGPT at 36% and Perplexity at 30%.
The research found that assistants particularly struggled with fast-moving stories and rapidly changing information, stories with intricate timelines and detailed information, or topics that require a clear distinction between facts and opinions. For example, almost half of the models tested had significant issues when responding to the question “Is Trump starting a trade war?”
“This research conclusively shows that these failings are not isolated incidents,” EBU Media Director and Deputy Director General Jean Philip De Tender said in a press release on Wednesday. “They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”
Yet, AI is everywhere. AI assistants are quickly becoming a primary source of information for everyday users, and are gunning for the throne of search engines.
Content creators who mastered search engine optimization are now having to learn generative engine optimization.
AI companies are building on this. Earlier this week, OpenAI launched its web browser ChatGPT Atlas as a conversational way to browse the internet. Google not only has AI Overviews baked into its search engine, but it also recently announced a full Gemini integration with its Chrome browser (including agentic browsing) and an expansion of AI Mode, its AI-powered search experience. Perplexity also has an AI-first browser called Comet, which faced security concerns earlier this year after researchers showed that they could get its agent to reveal a user's login information.
Using AI assistants to get news is still a minority activity, according to the latest report by the Reuters Institute and the University of Oxford, but it has doubled since last year. The report found that using AI for news is most common in Argentina and the U.S., and among 18- to 24-year-olds. Beyond getting news, a whopping 48% of 18- to 24-year-olds used AI to make a story easier to understand. Among adults aged 55 and older, the figure was still high at 27%.
"If AI assistants are not yet a reliable way to access the news, but many consumers trust them to be accurate, we have a problem," the researchers wrote in a report on the study. "This is exacerbated by AI assistants and answer-first experiences reducing traffic to trusted publishers."
The EBU's study built on a similar study conducted by the BBC earlier this year, and the researchers point out that comparing the two shows some improvement on the AI models' part. Gemini improved the most on accuracy, while ChatGPT and Perplexity showed no improvement. On sourcing, however, Gemini showed no improvement, while Copilot had the steepest drop in significant issues.
But despite the improvements, the answers are still riddled with high levels of errors.
“Our conclusion from the previous research stands – AI assistants are still not a reliable way to access and consume news,” the researchers shared in the report.