
Generative Adversarial Networks (GANs) for Synthetic Numerical Biomarker Upsampling to Improve Machine Learning Classification in Rare Cancer Staging

Authors
  • Sunday Sunday

    Ladoke Akintola University of Technology
Keywords:
Generative Adversarial Networks, Cancer Staging, Numerical Biomarker Upsampling, Rare Cancer, Machine Learning Classification, Synthetic Data
Abstract

Accurate cancer staging is fundamental to determining prognosis and guiding treatment decisions, yet rare cancers pose a particular challenge: the small number of available patient samples limits the training of robust machine learning classifiers. This research addresses a critical gap, namely that numerical biomarker datasets for rare cancers are typically small and class-imbalanced, which leads to poor model generalization and suboptimal staging accuracy. This study proposes a framework that uses Generative Adversarial Networks (GANs) to generate synthetic numerical biomarker samples (upsampling) in order to improve machine learning classification for cancer staging. Using retrospective data from The Cancer Genome Atlas (TCGA) and a hybrid feature selection approach that combines DNA mutation data with Random Forest ranking, mRNA expression data for the selected biomarkers were augmented with a GAN architecture. Classification was performed with one-dimensional Convolutional Neural Networks (1DCNN), Deep Neural Networks (DNNs), and Random Forest classifiers. The proposed methodology achieved a classification accuracy of 89.4% for cancer stage prediction on the augmented dataset, an improvement of 17.3 percentage points over the baseline accuracy of 72.1% obtained with the original data alone. Notably, the augmented datasets still delivered superior performance when built from only 30% of the original samples, suggesting that clinical data collection requirements could be substantially reduced. The framework offers a replicable, non-invasive approach to improving cancer staging accuracy in rare malignancies, with implications for clinical decision-making, treatment planning, and resource allocation. This research establishes GAN-based upsampling as a viable strategy for overcoming data scarcity in precision oncology.
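The abstract does not specify the GAN architecture. What follows is therefore only a minimal sketch of GAN-based upsampling for a numeric biomarker matrix, written in Python with NumPy alone. Every detail is an assumption made for illustration: the hypothetical functions `train_tabular_gan` and `upsample_class`, the tanh generator, the quadratic logistic-regression discriminator, z-score scaling, training one GAN per minority stage, and all hyperparameters. It is not the author's implementation, and the random toy data only stand in for TCGA mRNA expression values.

```python
import numpy as np


def sigmoid(s):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * s))


def train_tabular_gan(X, z_dim=8, hidden=32, steps=2000, batch=64, lr=0.01, seed=0):
    """Hypothetical sketch: train a tiny GAN on X (rows = patients, cols = biomarkers).

    Assumed generator: z -> tanh(z W1 + b1) W2 + b2.
    Assumed discriminator: logistic regression on [x, x**2] (a quadratic critic).
    Returns a function that samples synthetic rows in the original scale.
    """
    rng = np.random.default_rng(seed)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mu) / sd  # train in z-scored space
    n, d = Xs.shape

    W1 = rng.normal(0, 0.5, (z_dim, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, d)); b2 = np.zeros(d)
    v = np.zeros(2 * d); c = 0.0  # discriminator weights and bias

    def generate(m):
        z = rng.normal(size=(m, z_dim))
        h = np.tanh(z @ W1 + b1)
        return z, h, h @ W2 + b2

    def features(x):
        return np.hstack([x, x ** 2])

    for _ in range(steps):
        # Discriminator step: binary cross-entropy, real -> 1, fake -> 0.
        real = Xs[rng.integers(0, n, batch)]
        _, _, fake = generate(batch)
        phi = np.vstack([features(real), features(fake)])
        target = np.concatenate([np.ones(batch), np.zeros(batch)])
        grad_s = sigmoid(phi @ v + c) - target  # dBCE / dlogit
        v -= lr * phi.T @ grad_s / (2 * batch)
        c -= lr * grad_s.mean()

        # Generator step: non-saturating loss -log D(G(z)), backpropagated by hand.
        z, h, fake = generate(batch)
        grad_s = sigmoid(features(fake) @ v + c) - 1.0
        dx = grad_s[:, None] * (v[:d] + 2.0 * fake * v[d:]) / batch
        dh = dx @ W2.T * (1.0 - h ** 2)  # uses W2 before its update
        W2 -= lr * h.T @ dx
        b2 -= lr * dx.sum(axis=0)
        W1 -= lr * z.T @ dh
        b1 -= lr * dh.sum(axis=0)

    return lambda m: generate(m)[2] * sd + mu


def upsample_class(X, y, label, n_new, **gan_kwargs):
    """Hypothetical helper: append n_new GAN samples for one (rare) stage label."""
    sampler = train_tabular_gan(X[y == label], **gan_kwargs)
    X_new = sampler(n_new)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, label)])


# Toy example: 200 majority-stage rows, 20 rare-stage rows, 5 biomarkers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(2, 1, (20, 5))])
y = np.array([0] * 200 + [1] * 20)
X_aug, y_aug = upsample_class(X, y, label=1, n_new=180, steps=500)
print(X_aug.shape, np.bincount(y_aug))  # (400, 5) [200 200]
```

A design note on this sketch: synthetic rows should be generated from, and appended to, the training split only, so that accuracy is still measured on held-out real patients when comparing against the original-data baseline.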

Published
06/26/2026
Section
Articles
License

Copyright (c) 2026 Sunday Sunday (Author)


This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Generative Adversarial Networks (GANs) for Synthetic Numerical Biomarker Upsampling to Improve Machine Learning Classification in Rare Cancer Staging. (2026). The Science Post, 2(2). https://www.thesciencepostjournal.com/index.php/tsp/article/view/140