Generative Adversarial Networks (GANs) for Synthetic Numerical Biomarker Upsampling to Improve Machine Learning Classification in Rare Cancer Staging
- Authors
- Sunday Sunday (Ladoke Akintola University of Technology)
- Keywords
- Generative Adversarial Networks, Cancer Staging, Numerical Biomarker Upsampling, Rare Cancer, Machine Learning Classification, Synthetic Data
- Abstract
Accurate cancer staging is fundamental to determining prognosis and guiding treatment decisions, yet rare cancers pose a particular challenge: limited patient samples constrain the training of robust machine learning classifiers. Numerical biomarker datasets for rare cancers are typically small and imbalanced, which leads to poor model generalization and suboptimal staging accuracy. This study proposes a framework that uses Generative Adversarial Networks (GANs) to upsample numerical biomarker data synthetically and thereby improve machine learning classification performance for cancer staging. Using retrospective data from The Cancer Genome Atlas (TCGA), biomarkers were selected with a hybrid feature selection approach that combines DNA mutation data with Random Forest ranking, and the mRNA expression data for the selected biomarkers were then augmented using a GAN architecture. Classification was performed using 1-Dimensional Convolutional Neural Networks (1DCNN), Deep Neural Networks (DNNs), and Random Forest classifiers. The proposed methodology achieved a classification accuracy of 89.4% for cancer stage prediction on the augmented dataset, an improvement of 17.3 percentage points over the baseline accuracy of 72.1% achieved with the original data alone. Notably, the augmented datasets demonstrated superior performance even when only 30% of the original samples were used, suggesting that clinical data collection requirements could be reduced substantially. The framework offers a replicable, non-invasive approach for improving cancer staging accuracy in rare malignancies, with implications for clinical decision-making, treatment planning, and resource allocation. The results support GAN-based upsampling as a viable strategy for overcoming data scarcity in precision oncology.
- Published
- 06/26/2026
- Section
- Articles
- License
Copyright (c) 2026 Sunday Sunday (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
