Predicting Gestational Diabetes Mellitus (GDM) Risk in Geographically and Socioeconomically Diverse Populations: Leveraging XGBoost and Random Forest on Stratified Electronic Health Records

Authors
  • Abilly Elly

    Texas University
Keywords:
Gestational Diabetes Mellitus, Machine Learning, XGBoost, Random Forest, Electronic Health Records, Predictive Modeling, Health Equity, Prenatal Risk Stratification
Abstract

Gestational Diabetes Mellitus (GDM) is a common pregnancy complication, with an estimated global prevalence of approximately 14%, and it poses significant risks to maternal and fetal health. Current diagnosis relies on the oral glucose tolerance test (OGTT) performed at 24–28 weeks of gestation, which delays intervention and leaves early-pregnancy data unused for prediction. Although machine learning has shown promise in disease prediction, existing models are developed predominantly on single populations, which limits their generalizability across geographically and socioeconomically diverse groups. This study addresses that gap by developing and validating a hybrid predictive framework that applies XGBoost and Random Forest classifiers to stratified Electronic Health Records (EHR) from 27,561 pregnancies across multiple healthcare settings. The framework uses clinical, demographic, and obstetric history features collected during the first trimester (8–14 weeks). The ensemble model achieved an accuracy of 89.4% and an AUROC of 0.904, outperforming traditional logistic regression baselines (AUROC 0.817). Feature importance analysis identified maternal age, pre-pregnancy BMI, family history of diabetes, and prior GDM history as the most influential predictors. Performance was consistent across socioeconomic strata, with subgroup AUROC values ranging from 0.881 to 0.904. The result is a replicable, interpretable framework for early GDM risk stratification that can support timely intervention and personalized prenatal care, with implications for clinical practice, health policy, and predictive analytics in obstetrics.
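
The subgroup comparison reported in the abstract depends on computing AUROC separately within each stratum. The sketch below illustrates that calculation only; it is not the authors' code. The function names, the subgroup labels, and the toy scores are hypothetical, and the scores stand in for whatever risk probabilities a fitted ensemble would produce. AUROC is computed here from its rank interpretation: the probability that a randomly chosen GDM case receives a higher score than a randomly chosen non-case, with ties counted as one half.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(records):
    """Compute AUROC per stratum from (subgroup, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {group: auroc(ys, ss) for group, (ys, ss) in grouped.items()}


# Toy data (hypothetical): subgroup name, true GDM label, predicted risk score.
toy_records = [
    ("low_income", 1, 0.9), ("low_income", 0, 0.2),
    ("low_income", 1, 0.4), ("low_income", 0, 0.5),
    ("high_income", 1, 0.8), ("high_income", 0, 0.1),
    ("high_income", 1, 0.7), ("high_income", 0, 0.3),
]
subgroup_auroc = auroc_by_subgroup(toy_records)
for group, value in sorted(subgroup_auroc.items()):
    print(f"{group}: AUROC = {value:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; at the scale of tens of thousands of pregnancies a sort-based rank computation or an established library routine would be the more practical choice.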

Published
06/27/2026
Section
Articles
License

Copyright (c) 2026 Abilly Elly (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Predicting Gestational Diabetes Mellitus (GDM) Risk in Geographically and Socioeconomically Diverse Populations: Leveraging XGBoost and Random Forest on Stratified Electronic Health Records. (2026). The Science Post, 2(2). https://www.thesciencepostjournal.com/index.php/tsp/article/view/152