2024-06-17

Tsinghua scholar focuses on digitizing life with a dual-core drive of AI and mechanistic models

The life sciences are undergoing a major transformation driven by digital technology. At the forefront of this latest wave of technological change is the digital twin, a precise virtual model of a physical object.

A digital twin of life is a precise model of a living system. It can greatly improve our ability to understand and intervene in complex biological systems, and it has potential applications in cell factory design, optimization of industrial fermentation conditions, drug development, and personalized diagnosis and treatment.

Li Feiran, an assistant professor at Tsinghua University Shenzhen International Graduate School, has long worked on digitizing life. By integrating AI with systems biology methods, she has developed cutting-edge digital life frameworks that span from microbial models to more complex human cell models. She has produced multiple research results in synthetic biology, such as elucidating cell metabolism and guiding the design of cell factories, as well as in medicine and health.

For her research on life digitization and for developing the first deep learning method to predict enzyme parameters, Li Feiran was named one of MIT Technology Review's "35 Innovators Under 35" for the China region in 2023.

Dr. Li Feiran graduated from the Department of Biology and Biological Engineering at Chalmers University of Technology in Sweden, where she studied under Professor Jens Nielsen, a foreign academician of the Chinese Academy of Engineering, and then completed postdoctoral research in his laboratory. Her current research centers on developing and applying digital life models. This includes building and analyzing metabolic and regulatory models of microorganisms, mammalian cells, organs, and the human body; exploring the "dark matter" of cellular metabolism; mining new pathways and enzymes; and developing deep learning models to clarify the relationships among protein sequence, function, and parameters.

Focusing on digital cells and building metabolic models with an AI and mechanism dual-core drive

As a child, Li Feiran loved reading science fiction such as "The Three-Body Problem," "Mirror," and "Dune," which planted in her a desire to explore the limitless possibilities of future technology. "From then on, I developed a strong interest in virtual worlds and looked forward to using new technologies to turn what science fiction describes into reality. These experiences fostered my curiosity and spirit of exploration, and they also gave me a more open and innovative way of thinking in my later research," Li Feiran recalls.

As an undergraduate, Li Feiran worked in biochemical engineering and synthetic biology, mainly engineering microbial strains to improve product yields. In the process, however, she found that methods such as gene overexpression or knockout often failed to deliver the expected improvements. "This made me wonder: could we build mathematical models to predict the changes inside a strain and the effects of a modification, and so raise the success rate of engineering and guide microbial modification more rationally?"

From then on, Li Feiran focused on digital cells (digital life) and systematically studied the modeling and analysis of microbial metabolism, later expanding her modeling targets from microorganisms to human cells. In 2017, she joined Academician Jens Nielsen's team at Chalmers University of Technology in Sweden, where she worked on improving the simulation accuracy of digital life models and broadening their scope, moving from explanatory models to predictive ones.

Previously, microbial metabolic modeling relied mainly on mechanistic models: precise mathematical models built from the internal mechanisms of a system or the mass-transfer mechanisms of a production process. Such models usually require a deep understanding of the biological system in order to describe and predict its behavior clearly. Where a biological process is not fully understood, accurate mechanistic models are hard to build, and their predictions fall short.

"At that time, artificial intelligence was booming," Li Feiran explains. "AI and deep learning models excelled at prediction but lacked interpretability, whereas mechanistic models were highly interpretable. So we decided to combine the strengths of the two and proposed a dual-core model framework driven by mechanistic models and artificial intelligence. It pairs the interpretability of mechanistic models with the predictive power of deep learning, allowing us to model life processes comprehensively, from the known to the unknown, with models that are both predictive and interpretable."

Building on this dual-core concept, Li Feiran tackled a bottleneck in constructing digital life models: the slow experimental measurement of enzyme parameters. She developed DLKcat, the first deep learning method for predicting enzyme activity parameters (turnover numbers, kcat). Given only an enzyme's protein sequence and its substrate, the model predicts the enzyme's activity, and it can be applied to enzymes from any species. It speeds up the understanding of the relationship between protein sequence, structure, and function, and it has the potential to become a practical, general-purpose prediction tool for enzyme engineering and design. Beyond helping researchers understand and predict the behavior of complex biological systems more accurately, it opens up more possibilities for designing and optimizing efficient cell factories.
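The article does not describe how DLKcat's predictions are consumed downstream. One established way a predicted kcat can feed a mechanistic model is the enzyme-constrained formulation, in which a reaction's flux is capped by the turnover number times the enzyme's abundance (v ≤ kcat·[E]). The minimal sketch below shows only that conversion. The function name `max_flux`, the unit conventions, and the numbers are illustrative assumptions, not DLKcat's interface.

```python
# Minimal sketch: turning a predicted turnover number (kcat) into an upper
# bound on reaction flux, as in enzyme-constrained metabolic models.
# The function name and all numbers below are hypothetical.

SECONDS_PER_HOUR = 3600.0


def max_flux(kcat_per_s: float, enzyme_mmol_per_gdw: float) -> float:
    """Return the maximum flux v_max = kcat * [E] in mmol/gDW/h.

    kcat_per_s: turnover number in 1/s (e.g. a model-predicted value).
    enzyme_mmol_per_gdw: enzyme abundance in mmol per gram dry weight.
    """
    if kcat_per_s < 0 or enzyme_mmol_per_gdw < 0:
        raise ValueError("kcat and enzyme abundance must be non-negative")
    # kcat is per second, while metabolic-model fluxes are usually per hour.
    return kcat_per_s * SECONDS_PER_HOUR * enzyme_mmol_per_gdw


if __name__ == "__main__":
    # Hypothetical enzyme: kcat = 10 1/s, abundance = 0.001 mmol/gDW.
    print(max_flux(10.0, 0.001))
```

In this formulation, a faster enzyme (larger kcat) needs less protein to carry the same flux. That is why large-scale kcat predictions are useful when measured values are missing for most reactions.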

Enzymes are among the most critical components in synthetic biology. According to Li Feiran, the publication of this work triggered a wave of research on enzyme parameter prediction, and the paper was selected as one of twelve featured papers in the "Machine Learning in Catalysis" column of Nature Catalysis.

Building on this deep learning method, Li Feiran then created GotEnzymes, a very large open-source enzyme database containing enzyme activity parameters for more than 20 million enzyme-substrate pairs. According to public information, this is more than 1,500 times the amount of experimentally measured enzyme activity data collected in the mainstream BRENDA and SABIO-RK databases.

Multiple models built, with translation into applications under way

After joining Tsinghua University Shenzhen International Graduate School, Li Feiran set up her own laboratory to continue building digital twin models and to apply them in synthetic biology and healthcare. To model these two scenarios precisely, the team mainly develops two types of models centered on eukaryotes: models of eukaryotic microorganisms and models of the human body.

For eukaryotic microorganisms, the goal is an end-to-end design process for applications such as cell factory design, greatly improving design efficiency while cutting time and costs. For the human body, the team models different organs and tissues to simulate the exchange of substances and energy between organs, providing guidance for personalized health management and treatment.

According to Li Feiran, the current work is based mainly on metabolic models, the team's starting point, and extends from metabolic networks to regulatory network models of protein translation, transcription, modification, and related functions. Recently published work includes a genome-scale metabolic model and a protein secretion model of Saccharomyces cerevisiae, as well as a human genome-scale metabolic model. In total, more than 300 yeast models, more than 1,000 models of Saccharomyces cerevisiae for industrial applications, and human models covering nearly 20 organs for different populations have been built.

In 2022, Li Feiran proposed pcSecYeast, a complex protein secretion model. It expands the number of reactions covered from 4,000 to 37,000, including detailed processes by which proteins are synthesized and modified into their mature forms inside the cell. The model successfully predicted systematic engineering targets for protein cell factories, offering a new approach to rationally identifying targets and designing cell factories for producing industrial or pharmaceutical proteins.

Meanwhile, the team is building comprehensive digital twin human models driven by both mechanisms and AI. They have already built digital twin human metabolic models for five population groups spanning life stages from infancy to old age, including infants, adult men, adult women, and the elderly. "Our goal is to reveal how the metabolism of drug combinations and of food differs across populations. In the future, we hope to go beyond the population level and model each individual. For example, if everyone could undergo whole-exome or whole-genome sequencing, we could build a personalized digital twin for each person to serve their health management and personalized treatment needs."

According to Li Feiran, the team is working to translate these models into applications. One focus is accelerating the use of the enzyme parameter prediction models in the enzyme industry to improve the accuracy of enzyme engineering and de novo design. Another is future collaboration with sequencing or health management companies to build whole-body digital human models and, combined with genome sequencing, digital models of individuals, applying these digital twin human models to health management, personalized diet recommendations, and lifestyle advice.

"Basic research is like raising a child. We hope to see our research truly applied, driving industrial transformation and even changing the way things are done. Shenzhen is developing rapidly and offers abundant opportunities, and we hope to see more industries adopt and apply cutting-edge technologies from the laboratory," Li Feiran says.

Driving a qualitative leap in models

Genome-scale metabolic models are a class of mathematical models that systematically describe cellular metabolism and can simulate the relationship between genomic information and metabolic phenotypes. They provide an interpretive framework for metabolism-related experimental data and make whole-cell metabolic simulation simpler.

Since the world's first genome-scale metabolic network model, for Haemophilus influenzae, was built in 1999, thousands of metabolic network models have been constructed for species around the world. By 2019, more than 6,000 had been built, according to statistics. These models are widely used in biomanufacturing and in life and health, with applications in systems biology, metabolic engineering, drug development, enzyme function prediction, and other areas.
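Constraint-based genome-scale metabolic models are typically organized around a stoichiometric matrix S (rows are metabolites, columns are reactions) and the steady-state assumption S·v = 0, meaning that every internal metabolite is produced as fast as it is consumed. The article does not go into this formulation. The minimal sketch below checks the balance for a hypothetical three-reaction toy network; the network and the function name `is_steady_state` are illustrative assumptions. Real models contain thousands of reactions and are usually analyzed with linear programming, for example flux balance analysis, in dedicated toolkits such as COBRApy.

```python
# Minimal sketch of the steady-state mass-balance constraint S·v = 0 used in
# constraint-based genome-scale metabolic models. The toy network is hypothetical.

from typing import Sequence


def is_steady_state(
    S: Sequence[Sequence[float]], v: Sequence[float], tol: float = 1e-9
) -> bool:
    """Return True if every metabolite is balanced, i.e. |(S·v)_i| <= tol.

    S: stoichiometric matrix, one row per metabolite, one column per reaction.
    v: flux vector, one entry per reaction.
    """
    for row in S:
        if len(row) != len(v):
            raise ValueError("each row of S must have one entry per reaction")
        # Net production rate of this metabolite under the fluxes v.
        net = sum(coef * flux for coef, flux in zip(row, v))
        if abs(net) > tol:
            return False
    return True


# Toy network: uptake (-> A), conversion (A -> B), secretion (B ->).
S = [
    [1, -1, 0],  # metabolite A
    [0, 1, -1],  # metabolite B
]

if __name__ == "__main__":
    print(is_steady_state(S, [2.0, 2.0, 2.0]))  # True: all fluxes balanced
    print(is_steady_state(S, [2.0, 1.0, 1.0]))  # False: A accumulates
```

Flux balance analysis adds bounds on each flux and an objective, such as biomass production, and then searches this balanced space for the optimum.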

"Over the development of genome-scale metabolic network models, the number of models has leapt. In the early days, a lack of relevant knowledge meant that building a single model took a long time. With the arrival of big data, AI, and automated model-building tools, building models has become steadily easier," said Li Feiran. She built one model of a prokaryote during her three-year master's program; thanks to technological advances, she built a total of 1,700 models during her doctoral studies.

Li Feiran added that over the past 30 years, life models have leapt in number and have shown some very prominent applications. However, she and her team believe this is far from enough, and the field is still at a relatively early stage. Modeling still relies on methods similar to those used more than 20 years ago, and the field has not yet seen a revolutionary breakthrough. Given the complexity of biological systems, what current models can simulate and predict remains relatively limited. The field therefore needs to move from quantitative change to qualitative change, integrating advanced technologies such as artificial intelligence and digital twins to push models toward greater accuracy and predictive power.