Research Article

Predicting Type 2 Diabetes Using Random Forest and XGBoost Algorithms: A Comparative Machine Learning Approach

Volume: 17, Number: April, May, June 2026 (published April 30, 2026)

Abstract

Purpose: This study compares the random forest (RF) and XGBoost (XGB) ensemble learning algorithms for predicting Type 2 diabetes.

Methods: The widely used PIMA Indians Diabetes dataset was used for model development. After data pre-processing, the models were evaluated with 5-fold cross-validation repeated five times, using accuracy, precision, sensitivity, specificity, F1-score, and area under the ROC curve (AUC).

Results: The RF model achieved an accuracy of 0.817 and an AUC of 0.874; the XGB model achieved an accuracy of 0.791 and an AUC of 0.874. In both models, glucose was the most important feature for predicting diabetes.

Conclusion: RF and XGB showed comparable discriminative performance within a reproducible analytical framework, with no statistically significant difference in AUC.
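The evaluation protocol summarised in the abstract (5-fold cross-validation repeated five times, scored with accuracy, precision, sensitivity, specificity, F1-score and AUC) can be sketched in plain Python. This is not the authors' code. The function names, the fixed seed, the unstratified shuffling and the rank-based (Mann-Whitney) AUC formulation are assumptions chosen for illustration. The sketch also assumes that label 1 denotes diabetes. The RF and XGB models themselves are omitted; in practice, library equivalents such as scikit-learn would normally be used.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from the 2x2 confusion matrix (assumed: 1 = diabetic, 0 = non-diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(a: float, b: float) -> float:
        # Return 0.0 instead of raising when a class is absent from the fold
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # recall, true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(
    n_samples: int, n_splits: int = 5, n_repeats: int = 5, seed: int = 42
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_idx, test_idx); indices are reshuffled before each repeat."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    for r in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for k in range(n_splits):
            test = idx[k::n_splits]  # every n_splits-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, k, train, test


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(auc([1, 0, 1, 1, 0, 0], [0.9, 0.3, 0.4, 0.8, 0.6, 0.1]))
    print(sum(1 for _ in repeated_kfold(768)))  # 5 folds x 5 repeats = 25 splits
```

In a full pipeline, each model would be fitted on every `train` split. The metrics would then be computed on the matching `test` split and averaged over the 25 folds. Using predicted probabilities for `auc` and thresholded labels for `binary_metrics` mirrors the usual separation between discrimination and classification metrics.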

Details

Primary Language: English
Subjects: Clinical Sciences (Other)
Journal Section: Research Article
Publication Date: April 30, 2026
Submission Date: August 5, 2025
Acceptance Date: March 23, 2026
Published in Issue: Year 2026, Volume 17, Number: April, May, June 2026

EndNote
Emre İE (April 1, 2026) Predicting Type 2 Diabetes Using Random Forest and XGBoost Algorithms: A Comparative Machine Learning Approach. Acıbadem Üniversitesi Sağlık Bilimleri Dergisi 17 April, May, June 2026