Research Article

Performance of Large Language Models on Official Periodontology Questions: A 13-Year Analysis of the Turkish Dental Specialization Examination

Volume: 17, Number: January, February, March 2026 (published February 20, 2026)

Abstract

Background: This study systematically evaluated the performance of large language models (LLMs) on official periodontology questions from the Turkish Dental Specialization Examination (DUS).

Methods: A total of 180 text-based questions (159 multiple-choice questions (MCQs) and 21 combination-type MCQs (C-MCQs)) drawn from 13 years of examinations (2012–2024) were categorized into nine domains. In April 2025, eight LLMs were tested: ChatGPT-4o, ChatGPT-4o mini (OpenAI), Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash (Google DeepMind), Copilot (Microsoft), DeepSeek-V3 (DeepSeek), and Qwen 2.5-Max (Alibaba Cloud). Each question was submitted independently through each model's official interface. Accuracy rates were compared across models, domains, years, and question types using Pearson’s chi-square test, with Cramér’s V and phi coefficients reported as effect sizes.

Results: Accuracy differed significantly by domain (χ²(8, N = 1440) = 38.20, p < .001, Cramér’s V = .163). Gemini 2.5 Pro achieved the highest performance, scoring 100% in six domains and ≥87.5% in the others. ChatGPT-4o mini and Qwen 2.5-Max underperformed, particularly in the Periodontium and Periodontal Treatment domains. Year-based analysis showed stable performance across 2012–2024 (χ²(12, N = 1440) = 14.51, p = .269). No significant difference emerged between MCQs and C-MCQs (χ²(1, N = 1440) = 1.42, p = .233).

Conclusion: LLM accuracy in periodontology is domain- and model-dependent. Advanced systems such as Gemini 2.5 Pro show potential as supportive tools for education and clinical decision-making, yet persistent weaknesses in reasoning- and calculation-intensive areas underscore the need for expert oversight.
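As a rough check on the reported effect size, the sketch below recomputes Cramér’s V from the domain chi-square statistic using the standard formula V = sqrt(χ² / (N · min(r − 1, c − 1))). It assumes the domain analysis used a 9 × 2 contingency table (nine domains by correct/incorrect), which is inferred from the 8 degrees of freedom and is not stated in the abstract. The function name `cramers_v` and the variable names are illustrative only.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V for an n_rows x n_cols contingency table."""
    # The smaller table dimension minus one bounds the statistic to [0, 1].
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))


# 180 questions answered by each of 8 models gives the reported N.
n_questions = 180
n_models = 8
n_responses = n_questions * n_models  # 1440

# Assumed table shape: 9 domains x 2 outcomes (correct / incorrect).
v_domain = cramers_v(38.20, n_responses, n_rows=9, n_cols=2)
print(round(v_domain, 3))  # 0.163
```

Under that assumption the recomputed value agrees with the reported Cramér’s V of .163, and N = 1440 is consistent with 180 questions answered by eight models.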

Keywords

Supporting Institution

No specific grant or financial support was received from any institution for this research.

Ethical Statement

Ethical approval was not required for this study as it involved analysis of publicly available data/literature and did not include human or animal subjects.

Thanks

Not applicable

Details

Primary Language: English
Subjects: Clinical Sciences (Other)
Journal Section: Research Article
Publication Date: February 20, 2026
Submission Date: November 3, 2025
Acceptance Date: December 11, 2025
Published in Issue: Year 2026 Volume: 17 Number: January, February, March 2026

EndNote
Erişken Y, Karaaslan F (February 1, 2026) Performance of Large Language Models on Official Periodontology Questions: A 13-Year Analysis of the Turkish Dental Specialization Examination. Acıbadem Üniversitesi Sağlık Bilimleri Dergisi 17 January, February, March 2026