Machine learning method reveals chromosome locations in individual cell nuclei

The tool — scGHOST — could open new avenues to understanding gene regulation in health and disease. 
Illustration of the 3D genome structure of chromatin
Ruochi Zhang
Marylee Williams, Carnegie Mellon University’s School of Computer Science
May 13, 2024

Researchers from Carnegie Mellon University’s School of Computer Science and the Broad Institute of MIT and Harvard have made a significant advancement toward understanding how the human genome is organized inside a single cell. This knowledge is crucial for analyzing how DNA structure influences gene expression and disease processes. 

In a paper published by the journal Nature Methods, Ray and Stephanie Lane Professor of Computational Biology Jian Ma and former Ph.D. students Kyle Xiong and Ruochi Zhang introduce scGHOST, a machine learning method that detects subcompartments — a specific type of 3D genome feature in the cell nucleus — and connects them to gene expression patterns. Zhang is currently a postdoctoral fellow at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard.

Ruochi Zhang, postdoctoral fellow at the Eric and Wendy Schmidt Center

In human cells, chromosomes aren’t arranged linearly but are folded into 3D structures. Researchers are particularly interested in 3D genome subcompartments because they reveal where chromosomes are located spatially inside the nucleus. 

“One of the ultimate goals of single-cell biology is to elucidate the connections between cellular structure and function across a wide variety of biological contexts,” Ma said. “In this case, we are exploring how chromosome organization within the nucleus correlates with gene expression.” 

While new technologies allow the study of these structures at the single-cell level, poor data quality can hinder precise understanding. scGHOST addresses this problem by using graph-based machine learning to enhance the data, making it easier to pinpoint how chromosomes are spatially organized. scGHOST builds on two methods that Ma’s research group previously developed: Higashi and its successor, Fast Higashi, which focuses on scHi-C embeddings and imputations.

“Graph and hypergraph representation learning are integral to these methods and scGHOST, as they allow for a more nuanced and detailed exploration of the complex interactions within the genome,” said Zhang.

With the ability to accurately identify 3D genome subcompartments, scGHOST adds to the growing array of single-cell analysis tools scientists use to delineate the intricate molecular landscape of complex tissues, such as those in the brain. Ma anticipates that scGHOST could open new avenues to understanding gene regulation in health and disease. 

Read more about their work in Nature Methods. Additionally, learn more about this research in a February 8, 2023, Models, Inference and Algorithms talk by Zhang.

Adapted from a news story posted on the CMU School of Computer Science’s website.
