Using AI to detect congenital conditions before birth

Monday, April 07, 2025

There are more than 130 million births across the world each year. Unfortunately, nearly 8 million babies are born with congenital diseases, only half of which are detected before birth due to limitations in today’s technologies, explains Mohammad Yaqub, associate professor of computer vision at MBZUAI. Innovations, including those made possible by AI, could lead to dramatic improvements in health outcomes for millions of children each year.

In an effort to improve diagnosis of diseases before birth, Yaqub and colleagues from MBZUAI and Corniche Hospital in Abu Dhabi have developed a new foundation model that has the potential to identify ailments like congenital heart defects before babies are born. The team’s system, called FetalCLIP, is the first foundation model designed specifically to analyze fetal ultrasound images, and it outperformed other foundation models on several ultrasound analysis tasks.

Fadillah Maani, Numan Saeed, Tausifa Saleem, Zaid Farooq, and Hussain Alasmawi of MBZUAI and Werner Diehl, Ameera Mohammad, Gareth Waring, Saudabi Valappi, and Leanne Bricker of Corniche Hospital contributed to the study.

The work is part of a larger initiative by Yaqub and his team to improve fetal and maternal health. It has included collaborations with researchers from Abu Dhabi Health Services Company (SEHA) and other institutions to develop new machine learning methods for analyzing fetal ultrasound images.

The work is made possible by policies Abu Dhabi has established related to using medical data for AI research. “When it comes to AI and healthcare, one of the major challenges we find as researchers is getting access to medical data,” Yaqub says. “Abu Dhabi has developed a system for researchers like me to have access to medical data to train models while ensuring that privacy is maintained, and it can have great benefits for patients.”

Challenges of analysis and access

Ultrasound is the most common method for monitoring fetal development. It’s affordable and extremely effective, providing physicians with insights into the health and development of babies in the womb. That said, fetal ultrasound images are complex and can be difficult to analyze, even for the most skilled physicians.

Early in development, “a fetal heart is not much larger than a couple of centimeters and beats at a rate of 160 beats per minute,” Yaqub says. What’s more, these images can vary significantly depending on the circumstances in which they are generated, making them difficult to interpret.

It’s long been known that AI-powered tools have the potential to improve analysis of ultrasounds, but today’s models don’t perform at the high level required by clinicians. This is mainly due to a lack of fetal ultrasounds that can be used to adequately train models. “Compared to other domains, there are fewer healthcare data available for training and there are few datasets related to fetal ultrasounds,” Yaqub says, as data sharing is often limited to protect patient privacy and comply with other regulations. It’s possible, however, to maintain privacy while using real-world data to train systems, he says.

Yaqub and his MBZUAI colleagues partnered with clinicians from Corniche Hospital, which specializes in women’s and newborn care, to build the largest paired image-text dataset of its kind. It includes more than 200,000 fetal ultrasound images produced at the hospital and more than 2,000 image-caption pairs from a fetal ultrasound textbook. Some images from the hospital included annotations from physicians, but the images weren’t paired with medical information about the mother or baby. All the images were anonymized to protect patient privacy.

Benefits of foundation models

AI researchers have developed other, more general foundation models, which are typically multimodal, meaning they are trained on images, text, and other kinds of data. They’re called foundation models because other applications can be built on top of them, benefiting from their ability to derive insights from large datasets.

One general-purpose foundation model is OpenAI’s CLIP (contrastive language-image pre-training). CLIP has been trained on an enormous and broad set of images and text, but it is not specialized to any one type of data. As the researchers write in their study, most “AI solutions for fetal ultrasound rely on limited datasets and do not achieve the level of generalizability required for robust clinical deployment, particularly in detecting rare fetal conditions.”

Yaqub says that he and his team wanted to build a specialized foundation model so that it could make an impact in hospitals and clinics. He also wanted to create a system that other developers could use to build applications for specific tasks related to fetal ultrasound analysis. The model therefore needed to exhibit “generalizability,” meaning it could perform well on tasks it wasn’t explicitly trained to do.

How FetalCLIP works

FetalCLIP was adapted from OpenCLIP, an open-source general-purpose vision-language model. CLIP-style systems use a technique called contrastive learning, which trains a model to produce similar representations for pieces of data that belong together, such as an image and its caption.

For FetalCLIP, the researchers used OpenAI’s GPT-4o to generate standardized captions for the 200,000 ultrasound images. The captions were based on clinician labels, gestational age, and other characteristics from the images. The researchers also used the more than 2,000 image-caption pairs from a fetal ultrasound textbook.
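The article doesn’t spell out the exact prompt or metadata schema the team used, but conceptually the captioning step might look like the following sketch, in which the field names, prompt wording, and helper function are illustrative assumptions rather than the study’s actual pipeline:

```python
# Hypothetical sketch of turning clinician metadata into a standardized
# caption with GPT-4o. Field names, prompt wording, and this helper are
# illustrative assumptions, not the study's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_caption(view_label: str, gestational_age_weeks: float) -> str:
    prompt = (
        "Write one concise, clinically phrased caption for a fetal "
        f"ultrasound image. View: {view_label}. "
        f"Gestational age: {gestational_age_weeks} weeks."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: generate_caption("four-chamber heart view", 22.5)
```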

Following a contrastive learning approach, the researchers trained FetalCLIP to maximize the similarity of representations of ultrasound images and their respective captions while minimizing the similarity of representations of unpaired images and captions. Doing so aligned physical aspects of the fetuses with descriptions found in the captions.
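In CLIP-style training, this objective is commonly implemented as a symmetric cross-entropy loss over an image-text similarity matrix: each image should score highest against its own caption, and each caption against its own image. A minimal PyTorch sketch of that standard loss (illustrative, not FetalCLIP’s actual training code):

```python
# Minimal sketch of the standard CLIP-style contrastive loss; FetalCLIP's
# actual training code and hyperparameters may differ.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so dot products become cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with caption j
    logits = image_emb @ text_emb.T / temperature

    # Matching pairs sit on the diagonal; all other entries are negatives
    targets = torch.arange(len(logits), device=logits.device)

    # Symmetric cross-entropy: image-to-text plus text-to-image
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```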

Results

The researchers compared FetalCLIP with other systems on a range of tasks, and it outperformed all of them. It even surpassed SonoNet, a model built specifically for fetal ultrasound analysis, on a classification task.

FetalCLIP achieved an F1 score of 87.1% on zero-shot classification of standard fetal views, outperforming SonoNet by 17.2%, UniMed-CLIP by 37.6%, BiomedCLIP by 40.5%, and CLIP by 60.1%, the researchers noted.
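Zero-shot classification here means the model is never fine-tuned on the task: each candidate view is described in a short text prompt, and the model picks the prompt whose embedding best matches the image. A minimal sketch using the open_clip library, where the checkpoint name and view labels are placeholders (a released FetalCLIP checkpoint would slot in the same way):

```python
# Hypothetical sketch of zero-shot fetal-view classification with a
# CLIP-style model via open_clip. The checkpoint and view labels are
# placeholders, not FetalCLIP's actual release.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # stand-in weights
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

VIEWS = ["fetal brain", "four-chamber heart", "abdomen", "femur"]
prompts = tokenizer([f"An ultrasound image of the {v}." for v in VIEWS])

image = preprocess(Image.open("scan.png")).unsqueeze(0)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(prompts)
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

print(VIEWS[probs.argmax().item()])  # most likely view
```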

FetalCLIP also showed a nearly 7% improvement compared to the next-best-performing model in detecting congenital heart defects. “We found that this foundation model is able to diagnose congenital heart disease at a rate that is near the level of a very experienced doctor,” Yaqub says.

While FetalCLIP was able to estimate gestational age 83.5% of the time, it was less accurate on images captured early or late in pregnancy. The researchers believe this is because most of the images used to train the system were from the second trimester; including more images from the first and third trimesters could improve performance on ultrasounds taken early or late in pregnancy. Estimating the age of fetuses more accurately could provide physicians with more insight into how a baby is developing, Yaqub says.

Making a difference

By training on a large and specialized dataset, FetalCLIP learned to recognize subtle anatomical patterns that more general systems tended to miss. That focus is what gives the model the potential to support real-world applications in hospitals and clinics.

Yaqub noted that the development of FetalCLIP was made possible by the participation of Corniche Hospital and by support from the Abu Dhabi Department of Health, which fosters the use of clinical data for AI research.

He and his team are releasing FetalCLIP to the public so that researchers and clinicians can develop innovative applications with the model. Applications built on FetalCLIP could be particularly beneficial in remote locations where clinicians are scarce, Yaqub says.

With better tools, physicians may be able to identify disease earlier in development and take steps to improve a baby’s health. “If you can detect and measure, you can intervene, which will lead to better outcomes for our children, which is, I’m sure, important to every one of us,” Yaqub adds.
