Comparative Performance Evaluation of Multimodal Large Language Models, Radiologist, and Anatomist in Visual Neuroanatomy Questions


Uludağ Üniversitesi Tıp Fakültesi Dergisi
Volume:50 Issue:3

Authors : Yasin Celal Güneş, Mehmet Ülkir

Pages : 551-556

Doi:10.32708/uutfd.1568479


Publication Date : 2025-01-12

Article Type : Research Paper

Abstract : This study examined the performance of four multimodal large language models (LLMs), GPT-4V, GPT-4o, LLaVA, and Gemini 1.5 Flash, on multiple-choice visual neuroanatomy questions and compared them with a radiologist and an anatomist. Using a cross-sectional design, the study evaluated responses to 100 visual questions sourced from the Radiopaedia website, and differences in accuracy were analyzed using the McNemar test. The radiologist achieved the highest accuracy (90%), followed by the anatomist (67%). Among the multimodal LLMs, GPT-4o performed best (45%), followed by Gemini 1.5 Flash (35%), GPT-4V (22%), and LLaVA (15%). The radiologist significantly outperformed both the anatomist and all multimodal LLMs (p<0.001). GPT-4o significantly outperformed GPT-4V and LLaVA (p<0.001), but the difference between GPT-4o and Gemini 1.5 Flash was not significant (p=0.123). Gemini 1.5 Flash significantly outperformed both LLaVA (p<0.001) and GPT-4V (p=0.004). These results show a substantial performance gap between multimodal LLMs and medical professionals: although multimodal LLMs hold great potential in medicine, they have not yet matched medical experts in correctly identifying neuroanatomical regions.
Keywords : neuroanatomy, large language models, GPT-4o, Gemini 1.5 Flash
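Because every rater answered the same 100 questions, the McNemar test compares two raters using only the discordant questions. Let b be the number of questions that rater A answered correctly and rater B missed, and let c be the reverse. The difference b - c always equals the difference in total correct answers. For example, GPT-4o (45 correct) versus Gemini 1.5 Flash (35 correct) gives b - c = 10. The size of b + c, however, cannot be recovered from the reported accuracies. The sketch below shows the exact (binomial) form of the test. It assumes the exact variant; the abstract does not say whether the exact test or the chi-square version with continuity correction was used. The function names and the input data are hypothetical.

```python
from math import comb


def discordant_counts(a_correct, b_correct):
    """Count discordant pairs from per-question correctness flags.

    Hypothetical helper: a_correct[i] / b_correct[i] are True when rater A / B
    answered question i correctly.
    Returns (b, c): b = A right and B wrong, c = A wrong and B right.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("both raters must answer the same questions")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant-pair counts.

    Under H0 (equal accuracy), b ~ Binomial(b + c, 0.5); the p-value is twice
    the smaller tail probability, capped at 1.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical discordant counts (not from the study): b - c = 10,
    # matching the 45 vs 35 correct-answer gap, with b + c = 30.
    print(round(mcnemar_exact(20, 10), 4))
```

With these hypothetical counts the p-value is about 0.099. The same 10-question gap can give a very different p-value depending on how many questions the two raters disagreed on. This is why the McNemar test needs the paired, per-question responses rather than the accuracy percentages alone.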


