
Integrating Multi-Omics Data and Electronic Health Records for Early-Onset Type 2 Diabetes Prediction: A Comparative Evaluation of Advanced Hybrid Ensemble Classifiers

Authors
  • Billy Elly

    Lautech
    Author
Keywords:
Type 2 Diabetes Prediction, Multi-Omics Integration, Hybrid Ensemble Classifiers, Electronic Health Records, Machine Learning, Precision Medicine, Early Detection
Abstract

Type 2 diabetes (T2D) is a significant global health challenge, affecting over 500 million individuals worldwide, with early-onset cases rising at an alarming rate. Current predictive models predominantly rely on a single data source, typically either electronic health records (EHRs) or genomic data, and so fail to capture the complex interplay of genetic, metabolic, and clinical factors that characterizes diabetes etiology. This study addresses this gap by developing and evaluating a hybrid ensemble classification framework that integrates multi-omics data (genomics, metabolomics) with longitudinal EHRs for early-onset T2D prediction. Using data from 42,256 participants in the All of Us Research Program cohort, we implemented and compared six hybrid ensemble architectures: Random Forest-XGBoost stacking, Support Vector Machine-Multilayer Perceptron (SVC+MLP) voting, a hypergraph neural network with transformer attention, weighted voting ensembles, a deep neural network with multi-modal fusion, and gradient boosting with microbiome integration. The hypergraph-based hybrid framework achieved the highest predictive performance, with an AUROC of 89.64%, accuracy of 89.58%, and F1-score of 88.20%, significantly outperforming single-model baselines (p < 0.001). SHapley Additive exPlanations (SHAP) analysis identified fasting plasma glucose, polygenic risk scores for beta-cell function, HbA1c, body mass index, age, and specific metabolomic markers (glycine, butyrate-associated metabolites) as the most influential predictors. The findings demonstrate that hybrid ensemble classifiers integrating multi-modal biomedical data predict early T2D more accurately than traditional single-source approaches.
This framework provides a replicable, clinically implementable methodology for precision diabetes screening and has implications for proactive intervention strategies, healthcare resource allocation, and personalized medicine protocols.
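As a rough illustration of two ideas named in the abstract, weighted voting across base classifiers and evaluation by AUROC, accuracy, and F1-score, the following minimal Python sketch combines the predicted probabilities of two hypothetical base models and scores the result. It is not the study's implementation: the function names, the equal weights, the 0.5 decision threshold, and the six-sample toy data are all assumptions made for this example, and none of the numbers correspond to the reported results.

```python
from typing import List, Sequence, Tuple


def weighted_soft_vote(prob_by_model: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average of each model's positive-class probability, per sample."""
    total = sum(weights)
    n_samples = len(prob_by_model[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_by_model)) / total
        for i in range(n_samples)
    ]


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and F1-score after thresholding the ensemble score."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    acc = sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


# Toy data (made up for illustration): 1 = early-onset T2D, 0 = no T2D.
y_true = [1, 0, 1, 0, 1, 0]
# Hypothetical positive-class probabilities from two base models,
# e.g. one trained on EHR features and one on omics features.
model_a = [0.9, 0.2, 0.7, 0.4, 0.3, 0.1]
model_b = [0.8, 0.3, 0.6, 0.7, 0.4, 0.2]

scores = weighted_soft_vote([model_a, model_b], weights=[1.0, 1.0])
ensemble_auroc = auroc(y_true, scores)
ensemble_acc, ensemble_f1 = accuracy_and_f1(y_true, scores)

print(f"AUROC={ensemble_auroc:.3f} accuracy={ensemble_acc:.3f} F1={ensemble_f1:.3f}")
```

In practice the weights would typically be tuned on a validation split rather than fixed, and the pairwise AUROC loop used here, which is quadratic in the number of samples, would be replaced by a rank-based computation for a cohort of this size.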

Published
06/27/2026
Section
Articles
License

Copyright (c) 2026 Billy Elly (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Integrating Multi-Omics Data and Electronic Health Records for Early-Onset Type 2 Diabetes Prediction: A Comparative Evaluation of Advanced Hybrid Ensemble Classifiers. (2026). The Science Post, 2(2). https://www.thesciencepostjournal.com/index.php/tsp/article/view/151