IAD Index of Academic Documents
Journal of Health Sciences and Medicine
Volume: 8, Issue: 5

Evaluation of the performance of current Artificial Intelligence Chatbots regarding patient information after coronary artery bypass surgery

Authors: Gökhan Yüksel, Selami Gürkan
Pages: 879-883
DOI: 10.32322/jhsm.1752483
Views: 50 | Downloads: 117
Publication Date: 2025-09-16
Article Type: Research Paper
Abstract:
Aims: This study aims to evaluate the performance of current Artificial Intelligence (AI) chatbots in providing patient information after coronary artery bypass grafting (CABG) surgery.
Methods: On July 1, 2025, a standardized medical prompt concerning the recovery process after CABG was submitted to ten prominent AI chatbots: GPT-4o, GPT-4.1, Grok-4, Claude Opus-4, DeepSeek R1, Gemini Pro, Microsoft Copilot, Llama 4, Mistral Large 2, and Perplexity Sonar. Each response was assessed with two validated scoring systems: the modified Ensuring Quality Information for Patients (mEQIP) instrument and the newly developed Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. Readability was evaluated with the average reading level consensus (ARLC) calculator, which aggregates eight standard readability formulas.
Results: Among the tested chatbots, Perplexity Sonar achieved the highest mEQIP score (91.7%) and the highest QAMAI score (29/30), while Gemini Pro received the lowest scores in both evaluations (72.2% mEQIP, 25/30 QAMAI). The average mEQIP score across all platforms was 80.43%, and the mean QAMAI score was 27/30, indicating generally high-quality responses. Readability assessment revealed that DeepSeek R1 provided the most comprehensible content (ARLC: 9.92, equivalent to a reading age of 15-16 years), while Llama 4 produced the most complex output (ARLC: 14.69, age 23+). The average ARLC across all chatbots was 11.9, which corresponds to college-level reading difficulty and exceeds the recommended sixth- to eighth-grade readability level for patient education materials.
Conclusion: AI chatbots show promising capabilities in delivering post-CABG patient information, often achieving high scores in quality assessments. However, inconsistencies remain in readability, completeness, and source transparency. Despite the increasing sophistication of AI-generated health information, elevated reading levels and inconsistent citation practices may hinder accessibility for general patient populations. To enhance their role in patient education, future chatbot iterations should prioritize user-centered design, compliance with medical guidelines, and content simplification.
Keywords: Coronary Artery Bypass Grafting, Artificial Intelligence, Chatbots


* There may have been changes to the journal, article, conference, book, or preprint information. It is therefore advisable to consult the official page of the source. The information here is shared for informational purposes only; IAD is not responsible for incorrect or missing information.


İzmir Academy Association
Copyright © 2023-2026