Wengong Jin planned to research language processing for his computer science PhD. But when Jin learned about research on machine learning for drug discovery at the MIT Computer Science and Artificial Intelligence Laboratory, he told his advisor, Regina Barzilay, that he’d had a change of heart.
“She thought I was jet lagged, because I’d just come over from China and I was proposing a really big switch,” he said.
Jin, now a fellow at the Eric and Wendy Schmidt Center, stayed the course. Six years later, he and a team of researchers have come up with a new kind of model to automatically design antibodies — holding huge potential for immunotherapy.
Meanwhile, another Eric and Wendy Schmidt Center Fellow, PhD candidate Adit Radhakrishnan, recently developed a simple yet powerful method for virtually screening new drug candidates. That framework appears in a study published this April in Proceedings of the National Academy of Sciences.
“A number of research institutes have started using machine learning to answer key questions in biology. But at the Eric and Wendy Schmidt Center, as Jin’s and Radhakrishnan’s research shows, our goal is to also go in the other direction, by using biomedical problems to drive advances in machine learning,” said Caroline Uhler, co-Director of the Eric and Wendy Schmidt Center, a core member of the Broad Institute, and professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems and Society at MIT.
Discovering drugs has traditionally been a labor-intensive process, with researchers toiling away for years to test millions of molecules only to come up with a handful of candidates. Now, researchers like Jin and Radhakrishnan are working to automate that process.
“The idea is that we don't need experts to get a cup of coffee and then work all night trying to figure out a new molecule, but rather, to let the machine do the heavy lifting,” Jin said.
During his PhD, Jin was part of a research team that developed a machine-learning algorithm to speed up antibiotic discovery. The researchers found a new antibiotic that was effective against bacteria that are resistant to multiple drugs. In this instance, the team provided the model with roughly a million possible compounds to sort through.
That left Jin and other researchers wondering: Could they use artificial intelligence to design molecules from scratch?
The answer was yes. Jin and other researchers developed a generative model that designed antibodies — Y-shaped proteins that bind to viruses, bacteria, and other pathogens, activating our bodies’ immune response — that could neutralize the SARS-CoV-2 virus. Their findings were published earlier this year in a paper at the International Conference on Learning Representations.
"The new model can propose in a couple of seconds an antibody that has a high likelihood of working — totally changing the game,” said Jin.
While researchers had worked on generative models for antibody discovery before, those models could only come up with a protein’s amino acid sequence — not its shape. In contrast, the new model, which represents the antibody as a graph, simultaneously designs both the sequence and structure of its binding region. “Whether or not the antibody is the right shape to bind to a virus or other pathogen is crucial to its success,” said Jin.
"While human experts have methods to generate neutralizing antibodies, it takes time and effort. The task becomes even more challenging when additional properties need to be enforced. As our understanding of disease biology and immune system deepens, the number of such desired characteristics will continue to grow. Computational methods for antibody design are particularly useful to address this challenge,” said Regina Barzilay, the AI faculty lead for the MIT Jameel Clinic for Machine Learning in Health.
And, because so many types of data are structured as networks, the model also represents an advance in the field of machine learning. “It’s an example of how biology proposed a new problem for machine learning to solve,” said Jin.
Adit Radhakrishnan's father had pursued a mathematics education in India prior to immigrating to the U.S. He instilled in his son a love of math, which led the younger Radhakrishnan to pursue a PhD of his own in electrical engineering and computer science at MIT.
Radhakrishnan researches the fundamentals of deep learning, a kind of artificial intelligence loosely modeled on the human brain that can learn from unstructured data. Understanding why deep learning succeeds, and using that knowledge to build new models for healthcare and genomics, underpins much of Radhakrishnan’s research as an Eric and Wendy Schmidt Center fellow.
Over the past few years, deep learning has been widely adopted in biological applications, with researchers increasingly turning to it to screen potential new drugs. Performing well on such tasks typically takes very large deep learning models, which often require significant computing power. Moreover, the complexity of these models makes it hard for scientists to understand why they make a given prediction, shedding little light on why a proposed drug could work.
To get around the complexities of deep learning, Radhakrishnan and other researchers, including Uhler and Mikhail Belkin, a professor at the Halıcıoğlu Data Science Institute at the University of California, San Diego, turned to an older class of machine learning models: kernel methods. Before the recent wave of deep learning, kernel methods were a prominent and computationally simple approach to machine learning tasks. These models have recently become popular again because they can stand in for very large deep learning models at a much lower computational cost.
The team came up with a simple yet highly adaptable kernel framework that was able to predict the effect that a drug has on gene expression, a measure of how cells change in response to a drug. “In contrast to the expertise needed to train large deep learning models to solve a particular problem, it takes about three lines of code to train the kernel method to do the same task,” said Radhakrishnan.
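To give a sense of how compact a kernel method can be, here is a minimal sketch of kernel ridge regression, a standard member of that family. It is not the researchers' framework: the Gaussian kernel, the `bandwidth` and `ridge` values, the function names, and the synthetic sine-curve data are all assumptions chosen for illustration, standing in for real drug and gene expression measurements.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * bandwidth^2)).

    Illustrative kernel choice only; the article does not say which kernel was used.
    """
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


# Synthetic stand-in data (hypothetical): learn y = sin(x) from 50 samples.
X_train = np.linspace(0.0, 2.0 * np.pi, 50).reshape(-1, 1)
y_train = np.sin(X_train[:, 0])
X_test = np.linspace(0.1, 6.1, 25).reshape(-1, 1)
y_test = np.sin(X_test[:, 0])

ridge = 1e-3  # small regularization term, assumed for numerical stability

# The core of the method really is about three lines:
K = gaussian_kernel(X_train, X_train)                                # 1. compare training points
alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)   # 2. solve a linear system
y_pred = gaussian_kernel(X_test, X_train) @ alpha                    # 3. predict on new points

mse = float(np.mean((y_pred - y_test) ** 2))
print(f"test mean squared error: {mse:.2e}")
```

Training amounts to solving one linear system, with no iterative optimization, which is part of why such models are cheap to fit and easier to inspect than a large neural network.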
The framework has uses beyond biology; the researchers demonstrated, for example, that it could be used by video streaming providers to predict how a viewer would rank a particular movie they hadn’t yet seen. And the framework allows researchers to gain insights into how more complex deep learning models function.
According to Radhakrishnan, who is not trained as a biologist, the best part of being a fellow at the Eric and Wendy Schmidt Center is that the center puts machine learning experts and biologists in constant conversation with each other.
“You don’t just have computational researchers running their methods on a biology dataset without a biologist in the mix. You can get continuous feedback on: Is this actually useful?” said Radhakrishnan. “So it gives you a much more guided focus on what biological problems are important and what computational methods are missing.”