A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o

Medical Records
Volume:7 Issue:1

Authors : Mustafa Azizoğlu, Sergey Klyuev

Pages : 201-205

DOI : 10.37990/medr.1601528

Publication Date : 2025-01-15

Article Type : Research Paper

Abstract :

Aim: The rapid evolution of artificial intelligence (AI) has revolutionized medicine, with tools like ChatGPT and Google Gemini enhancing clinical decision-making. ChatGPT's advancements, particularly with GPT-4, show promise in diagnostics and education. However, variability in accuracy and limitations in complex scenarios emphasize the need for further evaluation of these models in medical applications. This study aimed to assess the accuracy of, and agreement between, ChatGPT 4.o and Gemini AI in answering questions on bladder-related conditions, including neurogenic bladder, vesicoureteral reflux (VUR), and posterior urethral valve (PUV).

Material and Method: This study, conducted in October 2024, compared the accuracy of ChatGPT 4.o and Gemini AI on 51 questions about neurogenic bladder, VUR, and PUV. The questions were randomly selected from pediatric surgery and urology materials, and the responses were evaluated using accuracy metrics and statistical analysis of the models' performance and agreement.

Results: ChatGPT 4.o and Gemini AI demonstrated similar accuracy across neurogenic bladder, VUR, and PUV questions, with correct response rates of 66.7% and 68.6%, respectively, and no statistically significant differences (p>0.05). Combined accuracy across all topics was 67.6%. Inter-rater reliability between the two models was strong (κ=0.87).

Conclusion: ChatGPT 4.o and Gemini AI showed comparable accuracy across key bladder-related conditions, with no significant differences in performance.

Keywords : ChatGPT, Gemini, artificial intelligence, bladder
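
The percentages reported in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the listing: the counts of 34 and 35 correct answers are inferred as the only whole numbers out of 51 that round to 66.7% and 68.6%, and the 2x2 agreement table passed to the kappa function is hypothetical, chosen to be consistent with those counts. The listing does not report the actual cross-tabulation, nor whether agreement was measured on correctness or on the answers themselves.

```python
# Illustrative sketch: counts are inferred or hypothetical, not taken from the paper.

def accuracy(correct: int, total: int) -> float:
    """Proportion of questions answered correctly."""
    return correct / total


def cohen_kappa(both_correct: int, only_a: int, only_b: int, both_wrong: int) -> float:
    """Cohen's kappa for two raters from a 2x2 correct/incorrect table."""
    n = both_correct + only_a + only_b + both_wrong
    p_observed = (both_correct + both_wrong) / n
    a_correct = (both_correct + only_a) / n
    b_correct = (both_correct + only_b) / n
    # Agreement expected by chance, from each rater's marginal rates.
    p_expected = a_correct * b_correct + (1 - a_correct) * (1 - b_correct)
    return (p_observed - p_expected) / (1 - p_expected)


TOTAL_QUESTIONS = 51   # stated in the abstract
CHATGPT_CORRECT = 34   # inferred: 34/51 rounds to 66.7%
GEMINI_CORRECT = 35    # inferred: 35/51 rounds to 68.6%

chatgpt_acc = accuracy(CHATGPT_CORRECT, TOTAL_QUESTIONS)
gemini_acc = accuracy(GEMINI_CORRECT, TOTAL_QUESTIONS)
combined_acc = accuracy(CHATGPT_CORRECT + GEMINI_CORRECT, 2 * TOTAL_QUESTIONS)

# Hypothetical table: 33 both correct, 1 ChatGPT only, 2 Gemini only, 15 both wrong.
kappa = cohen_kappa(both_correct=33, only_a=1, only_b=2, both_wrong=15)

print(f"ChatGPT: {chatgpt_acc:.1%}, Gemini: {gemini_acc:.1%}, combined: {combined_acc:.1%}")
print(f"kappa (hypothetical table): {kappa:.2f}")
```

Under these assumptions the combined accuracy is 69 correct answers out of 102, which matches the reported 67.6%, and the hypothetical table yields a kappa that rounds to the reported 0.87. Other tables could produce the same rounded value, so this is a consistency check rather than a reconstruction of the study's data.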
