IAD Index of Academic Documents
  • Uludağ Üniversitesi Tıp Fakültesi Dergisi
  • Volume: 51 Issue: 2

Evaluating the Performance of Large Language Models in Generating Impressions for Radiology Reports

Authors : Hasan Emin Kaya, Dilek Sağlam, Zeynep Yazıcı, Gökhan Gökalp
Pages : 305-309
DOI : 10.32708/uutfd.1653680
Publication Date : 2025-08-28
Article Type : Research Paper
Abstract : This study evaluated and compared the performance of three popular large language models (LLMs) in generating impressions for radiology reports in Turkish. ChatGPT, Gemini, and Copilot were each used to generate impressions for 50 anonymized radiology reports using a “few-shot” prompt. Three radiologists scored the impressions on a Likert scale according to four criteria: whether the impression included all relevant information from the report, provided an appropriate summary of the report, contained no misleading information, and could be added to the report without modification. Friedman's test was used to assess whether the scores differed among the LLMs. The 50 reports comprised 32 magnetic resonance examinations, 11 computed tomography examinations, 5 ultrasound examinations, and 2 fluoroscopy examinations. Of these, 15 were neuroradiology studies, 14 were musculoskeletal studies, 13 were abdominal studies, and 8 were thoracic radiology studies. The median scores for the models’ outputs were 4 and 5, indicating that the radiologists generally found the models successful in generating impressions. No statistically significant difference was found among the models in including all relevant information, providing an appropriate summary, avoiding misleading information, or being suitable for inclusion in the report without modification (p = 0.607, 0.327, 0.629, and 0.089, respectively). In conclusion, ChatGPT, Gemini, and Copilot were found to be successful in generating impressions for radiology reports in Turkish, and no significant difference in performance was detected among the models.
Keywords : radiology, artificial intelligence, large language models
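
The abstract reports a Friedman test comparing the scores of three models on the same set of reports. As a rough illustration of how such a comparison can be computed, the following is a minimal Python sketch using only the standard library. It is not the authors' analysis code: the function names (average_ranks, friedman_three_models) and the toy scores are hypothetical, and the p-value shortcut assumes exactly three models, in which case the chi-square reference distribution has 2 degrees of freedom and its survival function is exp(-Q/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_three_models(rows):
    """Tie-corrected Friedman statistic Q and p-value for exactly 3 related samples.

    rows: one (score_a, score_b, score_c) tuple per report.
    """
    k = 3
    n = len(rows)
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("expected a non-empty list of 3-score rows")

    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
        # Each group of t tied scores in a row contributes t * (t^2 - 1).
        tie_term += sum(t * (t * t - 1) for t in Counter(row).values())

    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")

    q = (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
         - 3 * n * (k + 1)) / correction
    p = math.exp(-q / 2)  # chi-square survival function, valid for df = 2 only
    return q, p


# Made-up Likert scores for illustration only; NOT data from the study.
toy_scores = [(4, 5, 4), (5, 5, 4), (4, 4, 5), (5, 4, 4), (3, 4, 5)]
q_stat, p_value = friedman_three_models(toy_scores)
print(f"Q = {q_stat:.3f}, p = {p_value:.3f}")
```

Under these assumptions, a p-value above the chosen significance level (as with the four values reported in the abstract) would mean the test found no evidence that the models' scores differ. Likert scores produce many ties within a report, which is why the sketch applies the tie correction.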
