Fact checking with ChatGPT

Monday, June 12, 2023

A datum is one discrete value in a collection of values that we commonly refer to as data. From these nuclear particles of philosophical and scientific thought we assemble what we understand as facts. And through the facts that we hear and read, we come to know the world of things that we cannot directly observe for ourselves. This is how we know “facts” about our immune system, our economy, and our cosmos, for example.

The process of establishing, debating, and communicating what is and is not fact is fraught. In mathematics, a fact is something that can be subjected to the rigors of mathematical reasoning and demonstrated to be true.

In the broader disciplines of science, facts are observable and repeatable: they are the empirically derived truths that, stacked upon one another, bring us to ever higher levels of understanding about the nature of reality. But agreeing upon a fact, particularly in the published word, has proven to be a timeless challenge thanks to the differing needs of publishers, wordsmiths, influencers, and the countless counter-movements that seek to obscure facts or simply to dispute them.

Fake news, and misinformation and disinformation more generally, has been a problem dating back to at least the advent of the printing press. The concept is explored as far back as 1646 in Sir Thomas Browne’s Pseudodoxia Epidemica, and the term “fake news” was first used in English around 1890, according to Merriam-Webster.

The early days of the internet saw an explosion in the capacity to freely publish information, both true and false. That freedom of publishing, combined with the advent of generative AI, has exponentially accelerated the creation and publication of content. As a result, the slow, reactive process of fact checking is ever more outpaced by the creators of fake news, misinformation, and disinformation.

How, then, do consumers of news and information discern fact from fiction? Professor Preslav Nakov and his Postdoctoral Fellow Liangming Pan at MBZUAI, together with their colleagues from UC Santa Barbara, Nanyang Technological University, and the National University of Singapore, believe they can redirect the power of ChatGPT into the service of fact checking.

MBZUAI Deputy Department Chair and Professor of Natural Language Processing Preslav Nakov and his co-authors demonstrate in their latest paper that ChatGPT – the very engine by which bad actors churn out fake news at an astronomical rate – can be harnessed to fact-check published information just as quickly.

In their paper, “Fact-Checking Complex Claims with Program-Guided Reasoning,” which has been accepted at ACL 2023, Nakov et al. lay out the development of ProgramFC, a system that takes complex claims, breaks them into their constituent parts, or data, and checks each one in order to issue a verdict: whether the claim is true or false, and why. The authors test their system against several industry benchmarks, with the result that ProgramFC significantly outperforms the systems it was compared against.

The promise and pitfalls of LLMs

Large language models (LLMs) such as ChatGPT have become globally known both for their exploits and for their failures. Ask a question and you will get a surprisingly human and nuanced answer that might well be incorrect. One would be forgiven, then, for assuming that using ChatGPT to fact-check claims made on the internet is a bit counterintuitive. But in the realm of science, counterintuitive ideas sometimes turn out to be quite useful.

In their frustration with the state of present-day fact checking, Nakov and his co-authors opened up ChatGPT and began giving the system various tasks related to fact-checking. With clear direction and some structured testing, the result is an empirically better way of fact-checking claims.

One key aspect is chain-of-thought reasoning, which the researchers posit both reduces the burden on the large language model and allows for more flexibility in the fact-checking itself. Better still, ProgramFC uses “reasoning programs to provide clear explanations of its reasoning process to the user,” according to the paper.

Not only is the system capable of breaking down and scoring tracts of content quickly, but it is also capable of avoiding the black box trap that many AI systems fall into today: leaving human users unable to fully understand how the system came to its conclusions.

“The proliferation of disinformation in various forms, especially in social media, has made automated fact-checking a crucial application of natural language processing,” Nakov said. “We’ve set out to develop a fact-checking system that is efficient, that is explainable, and that can apply complex reasoning to ensure we have a high rate of success, and ProgramFC is an excellent step in that direction.

“ProgramFC takes complex claims and decomposes them into a plan, or a sequence of simple reasoning steps. In essence, it translates complex claims into a brief computer program that implements the logical plan for fact-checking. The program is composed of three basic elements: answering simple questions, fact-checking simple claims, and solving logical expressions.”

To illustrate the system, the authors offer an example. In the paper they break down and interrogate the compound claim, “James Cameron and the director of the film Interstellar were born in Canada.”

The system uses Codex to decompose this complex claim into a reasoning program consisting of three reasoning steps: where James Cameron was born; who directed Interstellar; and where that person was born. After making this ‘blueprint’ of reasoning, ProgramFC delegates each sub-task to a suitable external agent, such as a question-answering engine.

“In the end, the system determines, based on the fact that Christopher Nolan was the director of Interstellar, and he was born in the U.K., that the claim is false. And further, the system breaks this down into simple human language, so that users can both fact-check and understand complex claims quickly,” explained Liangming Pan, the main developer of the system.
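
The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' implementation: the function names `question`, `verify`, and `predict`, and the small hard-coded lookup tables standing in for the external agents, are assumptions made for this example. They mirror the three basic elements Nakov describes: answering simple questions, fact-checking simple claims, and solving logical expressions.

```python
from typing import List, Tuple

# Toy stand-ins for the external agents (for example, a question-answering
# engine). The entries are hard-coded for illustration only.
TOY_ANSWERS = {
    "Who directed the film Interstellar?": "Christopher Nolan",
}
TOY_BIRTH_COUNTRY = {
    "James Cameron": "Canada",
    "Christopher Nolan": "U.K.",
}


def question(text: str) -> str:
    """Answer a simple question (hypothetical question-answering sub-task)."""
    return TOY_ANSWERS[text]


def verify(person: str, country: str) -> bool:
    """Fact-check a simple claim of the form '<person> was born in <country>'."""
    return TOY_BIRTH_COUNTRY.get(person) == country


def predict(expression: bool) -> str:
    """Turn the value of a logical expression into a verdict."""
    return "true" if expression else "false"


def check_claim() -> Tuple[str, List[str]]:
    """Reasoning program for the compound claim:
    'James Cameron and the director of the film Interstellar were born in Canada.'
    Returns the verdict and a human-readable trace of each step.
    """
    trace = []

    # Step 1: where was James Cameron born?
    fact_1 = verify("James Cameron", "Canada")
    trace.append(f"James Cameron was born in Canada: {fact_1}")

    # Step 2: who directed Interstellar?
    director = question("Who directed the film Interstellar?")
    trace.append(f"Director of Interstellar: {director}")

    # Step 3: where was that person born?
    fact_2 = verify(director, "Canada")
    trace.append(f"{director} was born in Canada: {fact_2}")

    # The compound claim holds only if both simple claims hold.
    return predict(fact_1 and fact_2), trace


if __name__ == "__main__":
    label, steps = check_claim()
    for step in steps:
        print(step)
    print("Verdict:", label)
```

Because each step records what it checked and what it found, the trace doubles as the kind of plain-language explanation the article describes. In a real system, the lookup tables would be replaced by calls to retrieval and question-answering models.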

 


Interestingly, the team’s system became more effective, relative to benchmarks, as the scenarios became more complex. Essentially, the deeper the claim, and the more steps necessary to fact-check it fully, the more capable the system became of arriving at an accurate conclusion compared with rival systems. The authors do note, however, that implicit claims such as “Aristotle couldn’t have used a laptop” are more difficult to handle and that the system scored worse in such scenarios, presenting an area for further improvement.

Eventually, the team believes that the model could be of interest both to the general public and to the fact-checkers they aim to support. The team also acknowledges that training models on GPUs and TPUs has a substantial carbon footprint, something a number of research teams at MBZUAI are working to address.
