AI Chatbots Flunk News Summaries: Major Study Finds Frequent Errors
Nearly half of AI-generated news summaries had serious problems in a BBC/EBU study. Common errors included missing citations, fabricated details and out-of-context answers; Google’s Gemini was the worst performer with a 76 % error rate.
AI‑powered chatbots are rapidly being integrated into search engines and personal assistants, but can they summarize breaking news reliably? A joint European Broadcasting Union (EBU) and BBC study involving 22 public‑service media organizations across 18 countries posed that question by asking seven consumer AI models to answer 105 news‑related queries in 14 languages. The study generated 3,062 responses and found that almost half contained serious problems.
Key Findings: Sourcing and Accuracy Problems
The researchers judged each AI answer for accuracy, context and sourcing. They discovered that 45 % of the AI‑generated answers contained at least one significant issue. The most frequent problem was failure to cite sources—31 % of responses either lacked citations or linked to unrelated material. Accuracy errors, such as fabricating quotes or misrepresenting facts, appeared in 20 % of answers. Out‑of‑context responses (e.g., summarizing the wrong story) occurred in about 14 % of cases.
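To put those percentages in absolute terms, a rough back-of-the-envelope calculation against the 3,062 responses looks like this (counts are rounded estimates; the study itself reports only percentages):

```python
# Approximate counts implied by the reported percentages.
# Rounded estimates only; the study publishes percentages, not raw counts.
total_responses = 3062

rates = {
    "at least one significant issue": 0.45,
    "sourcing problems (missing or unrelated citations)": 0.31,
    "accuracy errors (fabricated quotes, misstated facts)": 0.20,
    "out-of-context answers": 0.14,
}

for issue, rate in rates.items():
    print(f"{issue}: ~{round(total_responses * rate)} of {total_responses} responses")
# at least one significant issue: ~1378 of 3062 responses
# sourcing problems: ~949, accuracy errors: ~612, out-of-context: ~429
```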
Gemini Struggles While GPT‑4 Performs Best
Among the seven AI tools tested—Google’s Gemini, OpenAI’s GPT‑4 and GPT‑3.5, Anthropic’s Claude 3, Perplexity, You.com and Mistral—the error rates varied widely. Google’s Gemini model produced erroneous or poorly sourced answers in 76 % of cases, making it the worst performer. GPT‑4 had the best record, but even it returned at least one notable issue in about 21 % of responses. You.com and Perplexity also struggled, each with roughly half of their answers failing on sourcing or accuracy.
Why AI Misrepresents the News
Researchers concluded that large language models often make simple factual mistakes, fabricate details or misattribute quotes because they are designed to predict plausible text rather than retrieve verified information. When chatbots rely on their own statistical inferences instead of vetted sources, they may invent news or link to unrelated pages. The problem is amplified when dealing with multiple languages or less‑covered regions because training data may be sparse.
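To illustrate the distinction the researchers draw, the sketch below contrasts a free-form completion with a retrieval-grounded answer that refuses to respond without a vetted source and always attaches citations. It is a minimal illustration only: `generate_text` and `search_verified_outlets` are hypothetical stand-ins, stubbed here so the example runs, not part of any real assistant’s API.

```python
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    text: str

def generate_text(prompt: str, context: list[str] | None = None) -> str:
    # Hypothetical model call, stubbed so the sketch runs standalone.
    if context:
        return f"Summary based on {len(context)} retrieved article(s)."
    return "Plausible-sounding but unverified summary."

def search_verified_outlets(question: str) -> list[Source]:
    # Hypothetical lookup against a vetted news index, stubbed for the sketch.
    return [Source(url="https://example.org/report", text="Full article text")]

def answer_ungrounded(question: str) -> str:
    # A bare language model continues the prompt with statistically plausible
    # text; nothing ties the output to any verified report.
    return generate_text(prompt=question)

def answer_grounded(question: str) -> str:
    # Retrieval-first flow: fetch vetted articles, answer only from them,
    # and attach explicit citations so readers can click through.
    sources = search_verified_outlets(question)
    if not sources:
        return "No verified reporting found for this query."
    answer = generate_text(prompt=question, context=[s.text for s in sources])
    citations = ", ".join(s.url for s in sources)
    return f"{answer}\n\nSources: {citations}"

if __name__ == "__main__":
    print(answer_ungrounded("What happened at today's summit?"))
    print(answer_grounded("What happened at today's summit?"))
```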
Implications for News Consumers and Publishers
Incorrect news summaries threaten public understanding, especially when AI tools are presented as neutral assistants. The researchers warn that AI companies must improve transparency about data sources and encourage users to click through to original reports. The study recommends that generative models incorporate clear citations, prioritize reliable outlets and provide warnings when confidence is low. Without such safeguards, AI news summaries risk spreading misinformation at scale.
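One way to picture those recommendations is a response format that carries its sources and a confidence flag alongside the summary. The structure below is a hypothetical illustration of that idea, not a format proposed in the study:

```python
from dataclasses import dataclass, field

@dataclass
class NewsAnswer:
    # Hypothetical response format reflecting the study's recommendations:
    # the summary always travels with its citations and a confidence estimate.
    summary: str
    source_urls: list[str] = field(default_factory=list)
    confidence: float = 0.0  # model-estimated, 0.0 to 1.0

    def render(self) -> str:
        lines = [self.summary]
        if self.source_urls:
            lines.append("Sources: " + ", ".join(self.source_urls))
        else:
            lines.append("Warning: no citations available for this answer.")
        if self.confidence < 0.5:
            lines.append("Warning: low confidence; check the original reporting.")
        return "\n".join(lines)

# Example: a low-confidence answer is rendered with an explicit warning.
print(NewsAnswer(summary="Parliament passed the budget today.",
                 source_urls=["https://example.org/budget-story"],
                 confidence=0.4).render())
```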
What’s Next?
This investigation is part of an ongoing News Integrity Initiative among public broadcasters to ensure that AI tools enhance rather than erode public trust. The EBU called on developers to reduce hallucinations and be transparent about how AI models gather and rank news content. While chatbots remain useful for basic searches, the findings suggest that news‑related queries should still be cross‑checked with trusted sources. As AI continues to evolve, collaboration between technology companies and journalists will be crucial to prevent the spread of false or misleading information.