Winning the opening phase of the Autoimmune Disease Machine Learning Challenge marked a major milestone for ETH Zürich PhD student Kalin Nonchev, and a strong start to an ambitious global competition. “This challenge gave me the opportunity to test our methods in a completely different setting – autoimmune disease instead of cancer,” he said. “It was very rewarding but challenging, and it highlighted the importance of building flexible, modular models that can be efficiently adapted to new disease contexts.” His model went on to place second in the next phase, Crunch 2, cementing his standing in a competition that drew nearly 1,000 participants from 62 countries. To learn more about the top participants’ experiences, read on.
Launched on October 28, 2024, by the Eric and Wendy Schmidt Center and the Klarman Cell Observatory (KCO) at the Broad Institute and hosted on the CrunchDAO platform, the three-part challenge aimed to improve the diagnosis of inflammatory bowel disease (IBD) by applying machine learning to real biological data. Participants built models that integrated spatial transcriptomics with histopathology images, tools that could eventually enable earlier detection and personalized treatment for patients with chronic conditions.
“It’s exciting to see these challenges become an annual tradition that brings together interdisciplinary teams from around the world,” said Caroline Uhler, director of the Eric and Wendy Schmidt Center and the Andrew (1956) and Erna Viterbi Professor of Engineering at MIT. “What sets our challenges apart is not just the scale or the scientific ambition, but the fact that we follow through with experimental validation. The challenges are powerful because they bridge computational predictions and biological testing.”
“This challenge demonstrates how introducing the global machine learning community to a biological problem can accelerate scientific and clinical discoveries,” added Ramnik Xavier, director of the KCO, gastroenterologist at Massachusetts General Hospital, and Kurt J. Isselbacher Professor of Medicine at Harvard Medical School. “Looking beyond the boundaries of one domain reveals opportunities we wouldn’t find alone.”
The challenge consists of three parts, or Crunches. In Crunch 1, participants predicted gene expression in spatial transcriptomics data from matched pathology images, and in Crunch 2, they predicted the expression of genes unseen in Crunch 1. Finally, in Crunch 3, participants identified gene markers of pre-cancerous regions. The gene panels developed by top performers will be tested in patient samples through lab experiments at the Broad (see the full, detailed specifications here).
CrunchDAO, a data science competition platform, hosted the challenge. “The beauty of ML challenges is that it brings a community together – collective intelligence has immense potential and can lead to progress much faster than an individual working alone,” said Jean Herelle, founder and CEO of CrunchDAO. “CrunchDAO was really excited to partner with the Schmidt Center to bring a new type of data science problem with a biological focus to our community. It was a perfect fit because, like our other challenges, this one had a purpose-driven mission.”
Top-performing teams in the challenge approached the task from a range of backgrounds and perspectives, but they shared a few key strategies.
Kalin Nonchev (1st in Crunch 1, 2nd in Crunch 2)
For Kalin Nonchev, the Autoimmune Disease Machine Learning Challenge wasn’t just about building a high-performing model – it was also an opportunity to push his research into new terrain. With a background in computer science and bioinformatics, Kalin had been developing DeepSpot, a multimodal framework for spatial transcriptomics, in the context of cancer.
“Adapting DeepSpot to this new environment involved hands-on troubleshooting and provided deeper insight into the unique characteristics of spatial transcriptomics across diverse tissue regions, particularly in complex diseases like inflammatory bowel disease,” he said. “The histology images, spatial structure, and technical artifacts differed significantly from what we had trained on before.”
What stood out most to Kalin, though, was the community. “The challenge drew competitors from around the world, fostering a friendly atmosphere despite the rivalry, and it was genuinely fun,” Kalin said. “It was clear that everyone shared a strong passion for advancing spatial biology methods. Having an independent platform like CrunchDAO to test ideas, compare methods, and see where your approach stands was extremely motivating.”
Manfred Seiwald (2nd in Crunch 1)
Manfred Seiwald, a senior software developer in the Department of Biosciences and Medical Biology at the University of Salzburg, secured second place in Crunch 1. Like most top-performing models in Crunch 1, his model used embeddings from a histopathology foundation model as its image encoder. A key feature of his approach was the use of shared decoders, trained jointly on histopathology images and gene expression data, to predict gene expression. This design allowed his model to align visual and molecular features effectively without complex adaptations.
Team PathBio (3rd in Crunch 1)
Team PathBio brought together deep learning researchers from Stanford University and Sichuan University: Sen Yang (team lead, Stanford University), Jinxi Xiang (Stanford University), Wei Yuan (Sichuan University), Yijiang Chen (Stanford University), and Xiyue Wang (Stanford University).
Though experienced in medical image analysis, none of the team members had worked with spatial biology data before. “This challenge offered us invaluable insights into spatial biology, significantly expanding our understanding beyond our prior expertise in deep learning and medical image analysis,” they said. “One particularly impactful aspect was discovering the intricate relationship between tissue morphology and spatial gene expression patterns.”
That realization shaped their modeling strategy, which captured spatial context by applying a successive attention mechanism and Gaussian masking to small patches of the histopathology images. “We observed that spatially adjacent tissue pixels often harbor similar cell populations, thereby exhibiting similar gene expression profiles,” they explained.
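PathBio’s exact architecture is not described here, but the general idea behind Gaussian masking can be illustrated with a small sketch. Assuming each image patch has a feature vector and a grid coordinate, the hypothetical function below adds a log-Gaussian distance penalty to self-attention scores, so that each patch attends mostly to its spatial neighbors; `sigma` sets how far that neighborhood extends. The function name and the simplification of skipping learned query, key, and value projections are illustrative assumptions, not the team’s implementation.

```python
import numpy as np

def gaussian_masked_attention(features, coords, sigma=1.0):
    """Self-attention over image patches, biased toward spatially nearby patches.

    features: (n, d) patch embeddings (hypothetically, from a pathology image encoder)
    coords:   (n, 2) patch grid positions
    sigma:    width of the Gaussian neighborhood, in grid units
    Returns (attended_features, attention_weights).
    """
    d = features.shape[1]
    # Scaled dot-product scores; features double as queries, keys, and values
    logits = features @ features.T / np.sqrt(d)
    # Adding the log of a Gaussian kernel multiplies each attention weight by
    # exp(-dist^2 / (2 * sigma^2)) before normalization
    sq_dist = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    logits = logits - sq_dist / (2.0 * sigma ** 2)
    # Row-wise softmax, shifted by the row max for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights
```

Shrinking `sigma` makes each patch attend almost only to itself, while a very large `sigma` recovers ordinary attention that ignores patch position.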
The experience inspired the team’s broader goals. “We picked up a wealth of knowledge in bioinformatics during the hands-on experimentations of the challenge, from processing complex genomic datasets to understanding the biological context behind autoimmune disease markers,” they said. “The nature of spatial biology data co-registered with digital pathology data at the pixel level has opened a lot of doors and sparked many new ideas for us as future research directions, and we can’t wait to experiment and explore these ideas.”
Alexis Gassmann – Tarandros (4th in Crunch 1, 1st in Crunch 2)
Alexis Gassmann, a freelance data scientist and organic farmer, stood out in both phases of the challenge with a sophisticated modeling pipeline that combined several modern machine learning techniques. He was the only participant to apply contrastive learning to learn a shared latent space for image and molecular data. He also employed a mixture-of-experts approach, a flexible and scalable technique in which multiple “expert” subnetworks are trained to handle different parts of a problem. A spatial coordinate attention mechanism further enabled his model to capture both local cell organization and global image features.
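Contrastive learning of a shared latent space is commonly trained with a symmetric InfoNCE (CLIP-style) objective. The sketch below assumes paired embeddings in which row i of the image matrix and row i of the expression matrix describe the same tissue spot: matched pairs are pulled together, and every other pairing in the batch acts as a negative. It illustrates the general technique only; the function names and the `temperature` default of 0.07 are assumptions, and Alexis’s actual loss, encoders, and mixture-of-experts components are not reproduced here.

```python
import numpy as np

def _l2_normalize(x, eps=1e-12):
    # Project each row onto the unit sphere so dot products are cosine similarities
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(img_emb, expr_emb, temperature=0.07):
    """CLIP-style contrastive loss; row i of each (n, d) matrix is the same spot."""
    z_img = _l2_normalize(img_emb)
    z_expr = _l2_normalize(expr_emb)
    logits = z_img @ z_expr.T / temperature  # (n, n) similarity matrix
    idx = np.arange(logits.shape[0])
    # Image -> expression: each image row should pick its own expression column
    loss_i2e = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Expression -> image: each expression column should pick its own image row
    loss_e2i = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2e + loss_e2i)
```

In a real pipeline, both inputs would come from trainable encoders and the loss would be minimized with automatic differentiation; NumPy is used here only to make the computation explicit.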
Alexis ranked among the top performers in Crunch 1. In Crunch 2, his similarity-based method, built on contrastive learning, enabled accurate cross-modal transfer of gene expression from single-cell to spatial data, ultimately earning him first place.
Building on his earlier work was key to Alexis’s success. “It was demanding but highly rewarding,” he said. “The challenge was structured in three parts that built on each other, making it feel like solving a solution step by step. I enjoyed designing a solution that progressed from predicting specific gene expression from histology images then generalizing across the full transcriptome and using them to identify early biomarkers of colorectal cancer.”
His success across both phases demonstrated the power of combining deep representation learning with biologically informed structure – an approach that proved both adaptable and effective across distinct modeling tasks.
Marios Gavrielatos and Konstantinos Kyriakidis (5th in Crunch 1, 3rd in Crunch 2)
Marios Gavrielatos, a researcher at the Mayo Clinic, and Konstantinos Kyriakidis, a researcher at UC Santa Cruz, returned to this competition after winning the Cancer Immunotherapy Machine Learning Competition in 2023. This time, they brought a hybrid model that combined convolutional neural networks (CNNs) and fully connected multilayer perceptrons (MLPs) to analyze spatial gene expression in IBD.
In Crunch 1, their model enhanced embeddings from pretrained histopathology models to make gene-level predictions, incorporating the statistical properties of the outputs to refine accuracy. For Crunch 2, they developed a multi-channel CNN that encoded structured input built from the five nearest neighbors in single-cell space, followed by an MLP that generalizes to genes unseen in training. This architecture captured local spatial relationships while preserving flexibility across gene sets.
For both Marios and Konstantinos, the challenge offered a chance to sharpen their research skills and apply their training in new ways. “This year's challenge allowed me to fully demonstrate my research capabilities,” said Marios. “I had significantly more time to dedicate compared to last year... working from the ground up. Successfully contributing to our top-performing solution validated my ability to rapidly master new domains.”
Konstantinos added: “The point of these competitions, at least for me, is to expand the current boundaries of thinking... Every challenge adds a new layer of knowledge to my thinking toolbox.”
They also credited their biology backgrounds – Marios with degrees in Biology and Biomedical Data Science and Konstantinos with degrees in Pharmaceutical Sciences – for helping them bridge the machine learning and biological domains.
“These challenges showcase, once again, the importance of interdisciplinary collaboration,” they said. “People with different backgrounds and experiences work together and create a unique flow of ideas that sometimes can be transformative. We hope we see more challenges like these in the future!”
Despite the complexity of the task, common trends emerged among the top-performing submissions. These submissions are being analyzed by Schmidt Center scientists, including postdoctoral fellows Kai Cao and Luezhen Yuan, computational scientist Rajshikhar Gupta, and Orr Ashenberg, director of computational biology at the KCO.
High-performing models in Crunch 1 commonly use foundation models trained on vast histopathology image datasets, such as UNI, TIAToolbox, and Path, to derive meaningful representations of the images. They then align single-cell gene expression and histopathology imaging data in a shared representation. Some successful methods also incorporate the spatial arrangement of cells through positional encoding or self-attention.
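Positional encoding can take several forms; one common choice is a fixed sinusoidal encoding of each spot’s coordinates, concatenated with or added to its image embedding. The hypothetical helper below encodes 2D (x, y) positions this way, using the base of 10000 from the original Transformer formulation. It is a generic sketch and is not taken from any specific submission.

```python
import numpy as np

def sinusoidal_encoding_2d(coords, dim):
    """Fixed sinusoidal encoding of 2D coordinates.

    coords: (n, 2) array of (x, y) positions, e.g. spot centers in pixels
    dim:    output size; must be divisible by 4 (a sin and a cos block per axis)
    Returns an (n, dim) array with values in [-1, 1].
    """
    coords = np.asarray(coords, dtype=float)
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    quarter = dim // 4
    # Geometrically spaced frequencies, as in the Transformer positional encoding
    freqs = 1.0 / (10000.0 ** (np.arange(quarter) / quarter))
    parts = []
    for axis in range(2):
        angles = coords[:, axis:axis + 1] * freqs[None, :]  # (n, quarter)
        parts.append(np.sin(angles))
        parts.append(np.cos(angles))
    # Layout: [sin(x), cos(x), sin(y), cos(y)]
    return np.concatenate(parts, axis=1)
```

A model could then use, for example, `np.concatenate([image_emb, sinusoidal_encoding_2d(coords, 64)], axis=1)` as its per-spot input, where `image_emb` stands in for the foundation-model features.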
“While the difference in performance between ranked models is modest, all of them significantly outperform the baseline,” said Raj.
Performance in Crunch 2 depends heavily on the accuracy of Crunch 1, because Crunch 2 methods take Crunch 1’s predictions as input. Most submitted methods for Crunch 2 follow relatively simple strategies, which fall into two broad categories: similarity-based methods and MLP-based methods.
The similarity-based methods compute the similarity between spatial and single-cell data either directly, or after alignment of the two modalities in a shared latent space, based on the genes provided in Crunch 1. After establishing neighbor relationships, they predict the expression of genes unseen in Crunch 1 by transferring expression from single-cell to spatial data using weighted averages. In contrast, MLP-based methods train a neural network on single-cell data to map from the genes seen in training to the unseen genes. Once trained, the model takes Crunch 1’s predictions as input and outputs the imputed expression for the unseen genes.
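The similarity-based strategy can be sketched in a few lines. Assuming the Crunch 1 predictions and the single-cell reference share the same gene columns, the hypothetical function below computes cosine similarity between each spot and every cell, keeps the k most similar cells, and returns a similarity-weighted average of those cells’ expression for the unseen genes. Cosine similarity, `k=5`, and non-negative similarity weights are illustrative assumptions; participants differed in how they measured similarity and in whether they first aligned the two modalities in a shared latent space.

```python
import numpy as np

def knn_expression_transfer(spot_seen, cell_seen, cell_unseen, k=5):
    """Impute unseen genes for each spot from its k most similar single cells.

    spot_seen:   (n_spots, g) predicted expression of the Crunch 1 genes per spot
    cell_seen:   (n_cells, g) single-cell expression of the same g genes
    cell_unseen: (n_cells, u) single-cell expression of the u genes to impute
    Returns:     (n_spots, u) imputed expression
    """
    eps = 1e-12
    # Cosine similarity between every spot and every cell on the shared genes
    a = spot_seen / (np.linalg.norm(spot_seen, axis=1, keepdims=True) + eps)
    b = cell_seen / (np.linalg.norm(cell_seen, axis=1, keepdims=True) + eps)
    sim = a @ b.T  # (n_spots, n_cells)

    # Indices and similarities of the k most similar cells for each spot
    k = min(k, sim.shape[1])
    nn_idx = np.argsort(-sim, axis=1)[:, :k]  # (n_spots, k)
    nn_sim = np.take_along_axis(sim, nn_idx, axis=1)

    # Non-negative similarities become weights that sum to 1 per spot
    weights = np.clip(nn_sim, 0.0, None) + eps
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted average of the neighbors' unseen-gene expression
    return np.einsum("sk,sku->su", weights, cell_unseen[nn_idx])
```

The MLP-based alternative replaces this neighbor lookup with a learned mapping: a network is fit on single-cell data from seen to unseen genes and then applied to the Crunch 1 predictions.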
“It’s remarkable how, despite the simplicity of the design, the participating models proved to be highly effective, particularly in predicting highly expressed genes,” said Kai.
“These top models seem to have potential in predicting a cell type’s spatial distribution and other more demanding downstream tasks from H&E image alone,” added Luezhen.
The use of these diverse but complementary strategies shows how a variety of machine learning approaches can converge toward solving real biological challenges.
While Kai and Raj continue the formal benchmarking of top methods in Crunch 1 and Crunch 2, other team members, including Luezhen and Orr, are analyzing the Crunch 3 submissions.
“We are in the process of preparing gene panels based on the participants’ proposed potential markers of dysplasia for experimental validation at the Broad Institute,” said Orr. “Experimental validation is an important part of the challenge and represents a key step toward clinical translation, turning model predictions into potential diagnostic tools for IBD and related diseases.”
Stay tuned for updates on the final results, as well as the launch of our next machine learning challenge – the second in our Cell Perturbation Prediction Challenge series, which will focus on shifting adipocyte cell states in vitro.
The Autoimmune Disease Machine Learning Challenge is hosted in collaboration with the Eric and Wendy Schmidt Center and the Klarman Cell Observatory at the Broad Institute of MIT and Harvard, along with the Crunch Foundation, Foundry, Harvard's Laboratory for Innovation Science, the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems and Society at MIT, and the Mass General Hospital Center for the Study of Inflammatory Bowel Disease.