Performance of Large Language Models on Official Periodontology Questions: A 13-Year Analysis of the Turkish Dental Specialization Examination
Abstract
Background: This study systematically evaluated the performance of large language models (LLMs) on official periodontology questions from the Turkish Dental Specialization Examination (DUS).
Methods: A total of 180 text-based questions, comprising 159 standard multiple-choice questions (MCQs) and 21 combination-type MCQs (C-MCQs), were drawn from 13 examination years (2012–2024) and categorized into nine domains. In April 2025, eight LLMs were tested: ChatGPT-4o, ChatGPT-4o mini (OpenAI), Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash (Google DeepMind), Copilot (Microsoft), DeepSeek-V3 (DeepSeek), and Qwen 2.5-Max (Alibaba Cloud). Each question was submitted to each model independently via its official interface. Accuracy rates were compared across models, domains, years, and question types using Pearson's chi-square test, with Cramér's V and phi coefficients reported as effect sizes.
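The analysis code itself is not published with the abstract; the sketch below is only an illustration of how such a comparison could be run, building a factor-by-correctness contingency table and reporting Pearson's chi-square statistic with Cramér's V. The DataFrame layout and the column names "domain" and "correct" are assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the study's actual analysis pipeline.
# Assumes each model response is coded correct (1) / incorrect (0) and stored
# with its domain label in a pandas DataFrame (hypothetical column names).
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(table: np.ndarray) -> float:
    """Cramér's V effect size for an r x c contingency table."""
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

def accuracy_test(df: pd.DataFrame, factor: str):
    """Pearson chi-square test of accuracy across the levels of `factor`
    (e.g. domain, year, model, question type), with Cramér's V as effect size."""
    table = pd.crosstab(df[factor], df["correct"]).to_numpy()
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    return chi2, p, dof, cramers_v(table)
```

For a 2 × 2 comparison such as MCQ versus C-MCQ, Cramér's V computed this way reduces to the phi coefficient, which matches the effect sizes named in the methods.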
Results: Accuracy differed significantly by domain (χ²(8, N = 1440) = 38.20, p < .001, Cramér’s V = .163). Gemini 2.5 Pro achieved the highest performance, scoring 100% in six domains and ≥87.5% in the remaining ones. ChatGPT-4o mini and Qwen 2.5-Max underperformed, particularly in the Periodontium and Periodontal Treatment domains. Year-based analysis showed stable performance across 2012–2024 (χ²(12, N = 1440) = 14.51, p = .269), and no significant difference emerged between MCQs and C-MCQs (χ²(1, N = 1440) = 1.42, p = .233).
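As a consistency check (assuming a 9 × 2 domain-by-correctness table, so min(r − 1, c − 1) = 1), the reported effect size follows directly from the chi-square statistic and sample size:

$$
V = \sqrt{\frac{\chi^2}{N\,\min(r-1,\,c-1)}} = \sqrt{\frac{38.20}{1440 \times 1}} \approx .163
$$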
Conclusion: LLM accuracy in periodontology is domain- and model-dependent. Advanced systems such as Gemini 2.5 Pro show potential as supportive tools for education and clinical decision-making, yet persistent weaknesses in reasoning- and calculation-intensive areas underscore the need for expert oversight.
Keywords
Supporting Institution
No specific grant or financial support was received from any institution for this research.
Ethical Statement
Ethical approval was not required for this study as it involved analysis of publicly available data/literature and did not include human or animal subjects.
Acknowledgments
Not applicable
Details
Primary Language
English
Subjects
Clinical Sciences (Other)
Journal Section
Research Article
Publication Date
February 20, 2026
Submission Date
November 3, 2025
Acceptance Date
December 11, 2025
Published in Issue
Year 2026, Volume 17, Issue: January–March 2026