The standard approach to training AI-image detectors is based on supervised learning, a technique used widely across AI. With this approach, a detector is fed huge sets of real and fake images, each labeled as such, and learns to identify the characteristics that set them apart.
This works well in many, but not all, cases. For example, if you train a detector on images from one AI-image generator, it might detect images made by that system nearly every time. But give it images created by a different generator that it wasn’t exposed to in training, and its performance will drop.
This is an important limitation because it’s not practical to train detectors on data from every image generator, explains Mingming Gong, Affiliated Associate Professor of Machine Learning at MBZUAI. And with the rapid pace of development, new generative models are being released all the time.
Gong and colleagues from MBZUAI, the Hong Kong University of Science and Technology, Hong Kong Baptist University, and other institutions, have developed a new approach for detecting AI-generated images that holds the potential to address this problem — at least for now. Instead of training on large sets of real and synthetic images, their approach focuses on identifying deep structural patterns found only in real photos.
Gong and his colleagues will present their method at the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) in San Diego, California. In addition to Gong, the authors of the study are Yonggang Zhang, Jun Nie, Xinmei Tian, Kun Zhang, and Bo Han.
The researchers call their system ‘consistency verification’, or ConV. The key concept that makes it work is what’s known as a data manifold. In a high-dimensional space, representations of all possible natural images occupy only a small, curved region of that space – the manifold – while representations of generated images fall outside it. The relationship between an image’s representation and the manifold makes it possible to tell real from fake. The challenge is that the manifold can’t be measured directly.
To solve this, the researchers developed a consistency test. They take an image and create slightly modified versions of it, adjusting brightness, rotation, blur, and other characteristics. The original and each transformed version are run through DINOv2, a computer-vision model trained only on natural images, which converts each into a mathematical representation. The difference between the original and transformed representations is then computed.
For real images, the transformed versions stay on the manifold and move along what’s known as a tangent space, so the difference between the original and transformed representations remains small. The same edits push transformed versions of generated images further from the manifold, producing bigger differences, and ConV classifies those images as synthetic.
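In code, the idea looks roughly like the sketch below. It is a minimal illustration, not the authors’ exact implementation: it uses the publicly available DINOv2 weights as the feature extractor, but the specific edits, the averaging rule, and the `consistency_score` and `THRESHOLD` names are illustrative assumptions.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Load DINOv2, a vision model trained only on natural images, as the feature extractor.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
dinov2.eval()

# Standard preprocessing for DINOv2 (224 x 224 crops, ImageNet normalization).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# A few mild, content-preserving edits: brightness, rotation, and blur.
edits = [
    T.ColorJitter(brightness=0.3),
    T.RandomRotation(degrees=10),
    T.GaussianBlur(kernel_size=5),
]

@torch.no_grad()
def consistency_score(image: Image.Image) -> float:
    """Average shift in DINOv2 representation between an image and its edited versions.

    Small shifts are consistent with the natural-image manifold; large shifts suggest
    the image lies off the manifold, i.e. that it may be AI-generated.
    """
    original = dinov2(preprocess(image).unsqueeze(0))
    shifts = []
    for edit in edits:
        edited = dinov2(preprocess(edit(image)).unsqueeze(0))
        shifts.append(torch.norm(original - edited, dim=-1).item())
    return sum(shifts) / len(shifts)

# Usage sketch: flag an image as likely generated if its score exceeds a threshold
# calibrated on a held-out set of real photos.
# score = consistency_score(Image.open("photo.jpg").convert("RGB"))
# is_generated = score > THRESHOLD
```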
The great benefit of ConV over a supervised learning approach is that it relies on fitting the distribution of natural images rather than the distribution of generated images. As Gong explains, “this means that our method could generalize across different generative models, and even to new ones, and it would be less costly than supervised models to do this.”
At the same time, the researchers acknowledge that as generators improve, the gap between the natural image manifold and generated images will shrink. To address this challenge, they developed a trained version of their system, F-ConV, which uses a technique called normalizing flows. It maps the distribution of natural images to a Gaussian distribution, making it easier to identify generated images in relation to the manifold.
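To make the normalizing-flow idea concrete, here is a hedged sketch of that ingredient: fit a small flow on the DINOv2 features of real photos so they map toward a standard Gaussian, then score new images by how unlikely their features look under that flow. The generic RealNVP-style coupling layers and the `FeatureFlow` and `log_likelihood` names below are illustrative choices, not the architecture or training recipe used in the paper.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: half of the feature dimensions predict
    an affine transform that is applied to the other half."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        scale, shift = self.net(x1).chunk(2, dim=1)
        scale = torch.tanh(scale)              # bounded scales for numerical stability
        z2 = x2 * torch.exp(scale) + shift
        log_det = scale.sum(dim=1)             # log |det Jacobian| of this layer
        return torch.cat([x1, z2], dim=1), log_det

class FeatureFlow(nn.Module):
    """A stack of coupling layers that maps natural-image features toward N(0, I)."""
    def __init__(self, dim: int, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([AffineCoupling(dim) for _ in range(n_layers)])

    def forward(self, x):
        log_det = torch.zeros(x.shape[0], device=x.device)
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
            x = x.flip(dims=[1])               # alternate which half gets transformed
        return x, log_det

def log_likelihood(flow: FeatureFlow, features: torch.Tensor) -> torch.Tensor:
    """Log-density of image features under the flow. Low values indicate features
    far from the natural-image distribution, i.e. likely generated images."""
    z, log_det = flow(features)
    log_prob_z = -0.5 * (z ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(dim=1)
    return log_prob_z + log_det

# Training sketch: maximize log_likelihood on DINOv2 features of real photos only,
# then flag test images whose log-likelihood falls below a calibrated threshold.
```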
The researchers tested ConV on several benchmark datasets containing images from different image generators, including diffusion and transformer-based models. On the ImageNet benchmark, ConV achieved an average detection score of 87.1% across generators, as measured by AUROC, a standard metric. This was competitive with trained models and far superior to an untrained baseline. F-ConV did even better, with an average score of 93.77%.
They also tested their detectors on frames taken from videos generated by OpenAI’s Sora. Since Sora isn’t publicly available, the researchers couldn’t train a traditional detector on its output. But ConV and F-ConV outperformed supervised learning methods that had been trained on data from other generators.
Gong says the results were better than expected: “You’d think there would be some degradation in performance because we only look at natural data, but that wasn’t the case.” He was also surprised to see how well ConV generalized across generators.
There are serious implications for this work, as there is a real need for detectors that can accurately identify generated content. Deepfakes can be used to undermine politicians around key moments like elections, or to discredit or embarrass individuals.
But while ConV showed impressive results, even without training, Gong says that because the AI field is progressing so rapidly, it’s hard to know whether detectors will work on the fake images of the future. “It’s possible at some point that generative models will exactly match natural images,” he says, adding that researchers will then need to develop entirely new ways to identify AI-generated content.