Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions

Home Page
About
Submit A Journal
Submit A Conference
Submit Paper/Book
- Submit a Preprint
- Submit a Book
Contact

Journal of Health Sciences and Medicine
Cilt: 8 Sayı: 3
Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usabili...

Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions

Authors : İsmail Dal

Pages : 524-528

Doi:10.32322/jhsm.1660603

View : 61 | Download : 69

Publication Date : 2025-05-30

Article Type : Research Paper

Abstract :Aims: This study aims to evaluate the usefulness and reliability of artificial intelligence (AI) applications in thoracic surgery internship education and exam preparation. Methods: Claude Sonnet 3.7 AI was provided with core topics covered in the 5th-year thoracic surgery internship and was instructed to generate a 20-question multiple-choice exam, including an answer key. Four thoracic surgery specialists assessed the AI-generated questions using the Delphi panel method, classifying them as correct, minor error, or major error. Major errors included the absence of the correct answer among choices, incorrect AI-marked answers, or contradictions with established medical knowledge. A second exam was manually created by a thoracic surgery specialist and evaluated using the same methodology. Seven volunteer 5th-year medical students completed both exams, and the correlation between their scores was statistically analyzed. Results: Among AI-generated questions, 8 (40%) contained major errors, while 1 (5%) had a minor error. The expert-generated exam had a perfect accuracy rate, whereas the AI-generated exam had significantly lower accuracy (p=0.001). Median scores were 75 (67-100) for the AI exam and 85 (70-95) for the expert exam. No significant correlation was found between students’ scores (r=0.042, p=0.929). Conclusion: AI-generated questions had a high error rate (40% major, 5% minor), making them unreliable for unsupervised use in medical education. While AI may provide partial benefits under expert supervision, it currently lacks the accuracy required for independent implementation in thoracic surgery education.
Keywords : Yapay Zekâ, Göğüs Cerrahisi Eğitimi, Çoktan Seçmeli Testler, Delphi Tekniği.

ORIGINAL ARTICLE URL

* There may have been changes in the journal, article,conference, book, preprint etc. informations. Therefore, it would be appropriate to follow the information on the official page of the source. The information here is shared for informational purposes. IAD is not responsible for incorrect or missing information.

Index of Academic Documents
İzmir Academy Association
CopyRight © 2023-2026