Large language models, which can integrate and process large amounts of biomedical data, have great potential for modeling complex diseases and discovering functional biomolecules for therapeutics. To model complex diseases and identify potential drug targets for them, we built a language model trained on the insurance claims of about 123 million people in the US. The model gives a unified representation of all common complex diseases, which enables us to predict the genetic parameters of these diseases and efficiently discover unique genetic loci related to them. We then developed models based on protein language models to efficiently discover remote homologs and functional biomolecules from nature, such as signal peptides. With these models, we can identify remote homologs 22 times faster than PSI-BLAST and discover diverse functional peptides with less than 20% sequence similarity to known ones. Finally, we developed an RNA language model that captures the relationship between RNA sequence and structure, which enables effective RNA structure prediction and reverse design. Within two months, we designed and experimentally validated 19 RNA aptamers that are structurally similar, yet sequence dissimilar, to known light-up aptamers. More importantly, 10 of the designed aptamers show higher fluorescence than the native Mango-I. Together, these projects demonstrate the great potential of large language models for advancing fundamental computational biology research and transformational development.
Yu Li is an Assistant Professor in the Department of Computer Science and Engineering at CUHK, where he leads the Artificial Intelligence in Healthcare (AIH) group. He is also a Visiting Assistant Professor at MIT/Harvard, working with Prof. James Collins. He works at the intersection of machine learning, healthcare, and bioinformatics, developing new machine learning methods to solve computational problems in biology and healthcare. His work has been published in top venues such as Nature Biotechnology, Nature Methods, Nature Computational Science, and Nature Communications, as well as at top machine learning and computational biology conferences. In 2022, he was named to the Forbes 30 Under 30 Asia list in the Healthcare & Science category. He obtained his Ph.D. in computer science from KAUST in 2020 and was nominated for the KAUST Alumni Change Makers Awards in 2022. Before that, he received his bachelor's degree in Biosciences with first-class honors from the University of Science and Technology of China (USTC). He received the Department Exemplary Teaching Award in 2024.