A Machine Learning-Based Framework for Correcting Batch Effects in Microbiome Data

Tags:
Health-Biomedicine

Analyses of human microbiome data have identified intriguing compositional shifts associated with a wide range of diseases, prompting an explosion of metagenomics-based microbiome studies. Comparing or pooling data from multiple studies, however, is extremely challenging because of study-specific batch effects, which arise from variation in sample collection, preservation, sequencing, and processing.

In this project, we assess various approaches for addressing this challenge, implementing previously introduced methods, adapting methods developed for other domains, and developing new approaches inspired by machine learning and data science techniques.

Specifically, we use these methods to learn a mapping from each dataset to a large target dataset of healthy microbiome samples. We then apply the learned mappings to translate all microbiome samples from each dataset into the space of the shared target dataset, allowing researchers to reliably compare and pool samples from multiple datasets.
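
As a rough illustration of the "learn a mapping, then translate" idea, the sketch below fits a simple per-taxon affine map in centered log-ratio (CLR) space and applies it to a source dataset. This is not the project's method: the choice of CLR, the location-scale alignment, the decision to fit on the source study's healthy samples, the function names (`clr`, `fit_mapping`, `apply_mapping`), and the simulated counts are all assumptions made for the example.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a (samples x taxa) count matrix."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def fit_mapping(source_healthy, target_healthy, eps=1e-8):
    """Learn a per-taxon scale and shift aligning source to target in CLR space.

    Assumption: both matrices share the same taxa (columns) and contain
    only healthy samples, so differences reflect batch rather than disease.
    """
    src, tgt = clr(source_healthy), clr(target_healthy)
    scale = (tgt.std(axis=0) + eps) / (src.std(axis=0) + eps)
    shift = tgt.mean(axis=0) - scale * src.mean(axis=0)
    return scale, shift

def apply_mapping(counts, mapping):
    """Translate any samples from the source study into the target space."""
    scale, shift = mapping
    return clr(counts) * scale + shift

# Simulated counts (hypothetical): 4 taxa, with the source study shifted
# relative to the target to mimic a batch effect.
rng = np.random.default_rng(0)
target = rng.poisson(lam=[50, 20, 5, 100], size=(200, 4))
source = rng.poisson(lam=[80, 10, 5, 60], size=(40, 4))

mapping = fit_mapping(source, target)
mapped = apply_mapping(source, mapping)
```

After mapping, the per-taxon means and spreads of the source samples match those of the target in CLR space. A linear alignment like this can only remove location and scale differences; it cannot capture nonlinear or taxon-interaction effects, and the mapped rows no longer sum exactly to zero as true CLR vectors do.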

This framework has the potential to advance microbiome science by making use of the large volume of data currently available and by facilitating large-scale, comprehensive, and more powerful meta-analyses.