Addressing Severe Data Sparsity and Imbalance in Rural Telehealth Diabetes Screening: A Comparative Study of Cost-Sensitive XGBoost and Synthetic Over-Sampling Random Forest Pipelines
- Authors
Abilly Elly
Texas University
- Keywords:
- Diabetes Screening, Rural Telehealth, Data Imbalance, Cost-Sensitive Learning, XGBoost, Random Forest, SMOTE
- Abstract
Diabetes mellitus remains a critical global health challenge, and rural populations face disproportionate barriers to early screening and diagnosis. Machine learning (ML) approaches are promising for diabetes risk stratification, but their use in rural telehealth is constrained by two related problems: data sparsity and class imbalance, in which diabetic cases are significantly underrepresented. This study addresses these limitations through a comparative analysis of two ML pipelines for diabetes screening in resource-constrained rural settings. We propose and evaluate a Cost-Sensitive XGBoost (CS-XGB) pipeline that incorporates class-weighted optimization, and a Random Forest pipeline enhanced with the Synthetic Minority Over-sampling Technique (SMOTE-RF) for imbalanced medical data. On the Pima Indians Diabetes dataset, used as a benchmark, the CS-XGB pipeline achieved an accuracy of 89.4%, a minority-class recall of 0.91, and an AUC of 0.94, outperforming both the SMOTE-RF pipeline (accuracy 87.6%, recall 0.88, AUC 0.92) and conventional baseline models. In this comparison, cost-sensitive learning handled the imbalance better than over-sampling and did so without introducing synthetic-data bias. Feature importance analysis identified glucose, BMI, and age as the strongest predictors, consistent with the clinical literature. This research provides a replicable framework for deploying robust, interpretable diabetes screening tools in rural telehealth systems, with practical implications for improving early detection in underserved populations.
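The two imbalance strategies compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the weighting scheme or the SMOTE settings, so the negative-to-positive ratio used here (the common heuristic for XGBoost's `scale_pos_weight` parameter), the helper names `positive_class_weight` and `smote_like`, and the toy (glucose, BMI) values are all assumptions made for illustration.

```python
import math
import random


def positive_class_weight(labels):
    """Ratio of negatives to positives, a common heuristic for the
    positive-class weight (e.g. XGBoost's scale_pos_weight)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples")
    return n_neg / n_pos


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical values, not taken from the study).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
minority = [(150.0, 33.0), (165.0, 36.5), (180.0, 31.0)]

weight = positive_class_weight(labels)       # 6 negatives / 3 positives
new_points = smote_like(minority, n_new=3)   # 3 + 3 positives vs 6 negatives
```

The cost-sensitive route leaves the training data untouched and only rescales the loss for the minority class, whereas the over-sampling route adds interpolated records that never occurred in practice. That difference is the "synthetic data bias" trade-off the abstract refers to.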
- Published
- 06/27/2026
- Section
- Articles
- License
Copyright (c) 2026 Abilly Elly (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
