The story of the discovery of antibiotics is often used to illustrate the role chance has played in the advancement of science. When scientist Alexander Fleming returned to his lab following a long absence, he found that mold had contaminated petri dishes in which he was growing bacteria. Interestingly, the mold, which Fleming identified as Penicillium notatum, was killing the bacteria.
Fleming’s serendipitous revelation led to the development of penicillin and other antibiotics that have saved countless lives over the past century. Infections that were once deadly can now be cured with a few pills.
But humanity’s good fortune in the fight against bacterial infections is rapidly shifting. Microbes are evolving. New strains of bacteria are emerging that are resistant to once effective medicines. Indeed, the rise of drug-resistant bacteria has outpaced the development of new treatments. A recent study estimated that antimicrobial resistance (AMR) will be responsible for up to 2 million deaths a year by 2050 if no solution is found.
Scientists at the Mohamed bin Zayed University of Artificial Intelligence have developed a new machine-learning method for analyzing electronic medical records that could help physicians identify patients who are at risk for AMR. The team’s innovation was published earlier this year in the journal Scientific Reports. The work was led by Shahad Hardan, a Ph.D. candidate in machine learning at MBZUAI.
Analyzing electronic health records
The great success of antibiotics, and the frequency with which they are administered, have contributed to their growing ineffectiveness. Doctors often prescribe antibiotics without knowing for certain that a patient is in fact suffering from a bacterial infection. This is because determining the true cause of an infection takes time, and waiting days for lab results is often not an option when infections can kill quickly.
Hardan and her team’s machine-learning approach is designed to predict whether a patient will experience AMR when prescribed a particular antibiotic or when infected with a particular bacterium. The system makes these predictions by analyzing the patient’s electronic health record (EHR).
“EHRs are very accessible and hospitals in many countries have a unified way of storing them. But EHRs contain all different kinds of data, including demographics, lab results and notes from physicians,” said Mohammad Yaqub, associate professor of computer vision at MBZUAI and an author of the study.
The significant amount of information contained in EHRs makes it difficult for physicians to interpret them under the tight time constraints encountered in the clinic. “Machine learning can analyze this information much faster,” Yaqub said.
Yaqub explained that a major challenge in developing a system that can predict AMR based on EHRs is devising a data pipeline that allows algorithms to interpret different types of information. “We had to come up with a way to make sense of time-independent and time-dependent data in an intelligent way so that we could utilize the data properly,” he said.
Few studies have explored the application of machine-learning models in AMR prediction using EHR data. Previous research focused on what are known as single modality datasets, drawing on only one kind of data in an EHR. Other approaches required significant input by humans to make a prediction.
This study by the MBZUAI scientists represents an advancement in real-time AMR prediction as it makes use of three types of data from patient records — time-invariant data, such as patient age and gender; time-series data, such as lab results; and clinical notes, which are descriptive notes added to a record by clinicians.
To make sense of this information, the researchers developed a framework that included three separate encoders to handle the three types of data. After the data is processed by each encoder, it is fused and a prediction about the likelihood of AMR is made.
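The encode-then-fuse arrangement can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three encoder functions are toy stand-ins (the study used learned neural encoders), the fusion step is plain concatenation followed by a logistic output, and the names `encode_static`, `encode_time_series`, `encode_notes` and `fuse_and_predict` are hypothetical.

```python
import math
from typing import List, Sequence


def encode_static(features: Sequence[float]) -> List[float]:
    """Toy stand-in for the time-invariant encoder: pass the features through."""
    return [float(x) for x in features]


def encode_time_series(series: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for the time-series encoder: average each variable over time."""
    n_steps = len(series)
    n_vars = len(series[0])
    return [sum(step[j] for step in series) / n_steps for j in range(n_vars)]


def encode_notes(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Toy stand-in for the notes encoder: count vocabulary terms in the note."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]


def fuse_and_predict(
    static_vec: Sequence[float],
    series_vec: Sequence[float],
    notes_vec: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Concatenate the three encodings and map them to a probability."""
    fused = list(static_vec) + list(series_vec) + list(notes_vec)
    if len(fused) != len(weights):
        raise ValueError("weights must match the length of the fused vector")
    logit = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient record, with made-up values and untrained weights.
static_vec = encode_static([0.67, 1.0])                      # e.g. scaled age, sex flag
series_vec = encode_time_series([[1.0, 2.0], [3.0, 4.0]])    # two lab values, two time steps
notes_vec = encode_notes("prior gentamicin course noted", ["gentamicin", "sepsis"])
risk = fuse_and_predict(static_vec, series_vec, notes_vec, weights=[0.0] * 6, bias=0.0)
print(risk)  # 0.5, because all weights are zero
```

In a trained system the weights would be learned from labeled records, and each toy encoder would be replaced by a model suited to its data type.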
This arrangement is similar to the way a human would address the problem, Yaqub said. “A physician would take account of the different kinds of data separately then digest them,” he said. “It’s likely that the protocol physicians follow is ideal, but the problem with humans is we can only look at a small amount of data at any given moment in time.”
Fusion and prediction
The scientists used a massive dataset called MIMIC-IV, which contains detailed EHR data collected over the course of a decade from intensive care units at a large hospital in Boston. The team used a preprocessing method called FIDDLE to prepare the dataset for analysis by the encoders.
A linear encoder was used to encode the time-invariant data. A language model called ClinicalBERT, which is based on Google’s BERT (bidirectional encoder representations from transformers), was used to encode the physician notes. Three different methods were used to encode the time-series data: long short-term memory (LSTM), StarTransformer and transformer encoder. The three methods were thoroughly compared.
Four fusion mechanisms — attention fusion, tensor fusion, MAGBERT and Multimodal InfoMax (MMIM) — were tested to explore how the performance of the system could be improved.
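To give a sense of what a fusion mechanism does, here is a rough sketch of tensor fusion as it is commonly described in the multimodal learning literature: each modality's embedding is extended with a constant 1, and the outer product of the extended vectors is flattened into a single vector. This is an illustration under that assumption, not the study's implementation, and the function name `tensor_fusion` is hypothetical.

```python
from itertools import product
from typing import List, Sequence


def tensor_fusion(*embeddings: Sequence[float]) -> List[float]:
    """Flattened outer product of the embeddings, each extended with a constant 1.

    The appended 1 keeps each modality's own features, and every pairwise
    interaction, in the result alongside the full three-way products.
    """
    extended = [[1.0] + [float(x) for x in emb] for emb in embeddings]
    fused = []
    for combo in product(*extended):
        value = 1.0
        for x in combo:
            value *= x
        fused.append(value)
    return fused


print(tensor_fusion([2.0], [3.0]))  # [1.0, 3.0, 2.0, 6.0]
```

The size of the fused vector grows multiplicatively with the embedding sizes: embeddings of length 2, 1 and 3 produce (2+1) × (1+1) × (3+1) = 24 values. That growth is one reason lighter alternatives such as attention-based fusion are often compared against it.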
The researchers tested the system’s ability to predict AMR based on patient history in relation to a particular antibiotic, gentamicin, and in relation to a specific pathogen, P. aeruginosa.
By area under the receiver operating characteristic curve (AUROC), a metric of how well a model separates resistant from non-resistant cases, the MMIM fusion mechanism performed best in predicting gentamicin resistance, with a score of 76%. Similarly, for the P. aeruginosa dataset, MMIM outperformed the other methods with an AUROC of 69%.
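For readers unfamiliar with AUROC, the metric can be read as the probability that a randomly chosen resistant case receives a higher predicted score than a randomly chosen non-resistant case, with ties counted as half. The sketch below computes it directly from that definition; the labels and scores are made up for illustration and are not from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = resistant, 0 = not resistant.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so scores of 76% and 69% indicate useful but imperfect discrimination.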
A call to action
The fight against AMR is still in its early stages. However, innovations like this one developed by Hardan, Yaqub and their colleagues offer a promising glimpse into a future in which machine learning will aid clinical decision-making. An additional benefit is that because the pipeline uses data already collected by many hospitals, the system can be seamlessly integrated into existing hospital workflows.
“AMR is a very challenging problem, and we are just scratching the surface,” Yaqub said. “There hasn’t been a lot of work done in this domain and we see this initiative as a call to the community to help address this very serious threat of AMR.”