Predicting Gestational Diabetes Mellitus (GDM) Risk in Geographically and Socioeconomically Diverse Populations: Leveraging XGBoost and Random Forest on Stratified Electronic Health Records
- Authors
Abilly Elly
Texas University
- Keywords:
- Gestational Diabetes Mellitus, Machine Learning, XGBoost, Random Forest, Electronic Health Records, Predictive Modeling, Health Equity, Prenatal Risk Stratification
- Abstract
Gestational Diabetes Mellitus (GDM) is a common pregnancy complication, with a global prevalence of approximately 14%, and poses significant risks to maternal and fetal health. Current diagnosis relies on the oral glucose tolerance test (OGTT) performed at 24–28 weeks of gestation, which delays intervention and does not exploit the predictive potential of early-pregnancy data. Although machine learning has shown promise in disease prediction, existing models are predominantly developed on single populations, which limits their generalizability across geographically and socioeconomically diverse groups. This study addresses that gap by developing and validating a hybrid predictive framework that uses XGBoost and Random Forest classifiers on stratified Electronic Health Records (EHR) from 27,561 pregnancies across multiple healthcare settings. The framework incorporates clinical, demographic, and obstetric history features collected during the first trimester (8–14 weeks). The ensemble model achieved an accuracy of 89.4% and an AUROC of 0.904, significantly outperforming traditional logistic regression baselines (AUROC 0.817). Feature importance analysis identified maternal age, pre-pregnancy BMI, family history of diabetes, and prior GDM history as the most influential predictors. Performance remained consistent across socioeconomic strata, with subgroup AUROC values ranging from 0.881 to 0.904. This research provides a replicable, interpretable framework for early GDM risk stratification that enables timely interventions and personalized prenatal care. The findings have implications for clinical practice, health policy, and the advancement of predictive analytics in obstetrics.
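The subgroup AUROC comparison reported in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `auroc` and `auroc_by_stratum`, the stratum labels, and the toy records are all hypothetical, and the scores stand in for whatever risk probabilities a fitted model would output. AUROC is computed here with the standard Mann-Whitney rank formulation, using average ranks for tied scores.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the mean of their 1-based ranks (i + 1 .. j).
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


def auroc_by_stratum(records):
    """Group (stratum, label, score) records and compute AUROC per stratum."""
    groups = defaultdict(lambda: ([], []))
    for stratum, label, score in records:
        groups[stratum][0].append(label)
        groups[stratum][1].append(score)
    return {s: auroc(ys, ps) for s, (ys, ps) in groups.items()}


# Hypothetical toy records: (stratum, true GDM label, predicted risk score).
# These values are made up for illustration and are not study data.
records = [
    ("stratum_a", 1, 0.9), ("stratum_a", 0, 0.2),
    ("stratum_a", 1, 0.6), ("stratum_a", 0, 0.7),
    ("stratum_b", 1, 0.8), ("stratum_b", 0, 0.1),
    ("stratum_b", 0, 0.3), ("stratum_b", 1, 0.3),
]

per_stratum = auroc_by_stratum(records)
for stratum, value in sorted(per_stratum.items()):
    print(stratum, round(value, 3))
```

Reporting the metric per stratum rather than only on the pooled cohort is what makes a claim of consistency across subgroups checkable: a model can score well overall while performing poorly in a small stratum.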
- Published
- 06/27/2026
- Section
- Articles
- License
Copyright (c) 2026 Abilly Elly (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
