IAD Index of Academic Documents
Journal of Health Sciences and Medicine
Volume: 8, Issue: 5

Evaluation of the performance of current Artificial Intelligence Chatbots regarding patient information after coronary artery bypass surgery

Authors: Gökhan Yüksel, Selami Gürkan
Pages: 879-883
DOI: 10.32322/jhsm.1752483
Views: 50 | Downloads: 117
Publication Date: 2025-09-16
Article Type: Research Paper
Abstract:
Aims: This study aims to evaluate the performance of current Artificial Intelligence (AI) chatbots in providing patient information after coronary artery bypass grafting (CABG) surgery.
Methods: On July 1, 2025, a standardized medical prompt concerning the recovery process after CABG was submitted to ten prominent AI chatbots: GPT-4o, GPT-4.1, Grok-4, Claude Opus-4, DeepSeek R1, Gemini Pro, Microsoft Copilot, Llama 4, Mistral Large 2, and Perplexity Sonar. Each response was assessed with two validated scoring systems: the modified Ensuring Quality Information for Patients (mEQIP) instrument and the newly developed Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. Readability was evaluated with the average reading level consensus (ARLC) calculator, which aggregates eight standard readability formulas.
Results: Among the tested chatbots, Perplexity Sonar achieved the highest mEQIP score (91.7%) and the highest QAMAI score (29/30), while Gemini Pro received the lowest scores in both evaluations (72.2% mEQIP, 25/30 QAMAI). The average mEQIP score across all platforms was 80.43%, and the mean QAMAI score was 27/30, indicating generally high-quality responses. Readability assessment revealed that DeepSeek R1 provided the most comprehensible content (ARLC: 9.92, equivalent to a reading age of 15-16 years), while Llama 4 produced the most complex output (ARLC: 14.69, age 23+). The average ARLC across all chatbots was 11.9, which corresponds to college-level reading difficulty and exceeds the recommended sixth- to eighth-grade readability level for patient education materials.
Conclusion: AI chatbots show promising capabilities in delivering post-CABG patient information, often achieving high scores in quality assessments. However, inconsistencies remain in readability, completeness, and source transparency. Despite the increasing sophistication of AI-generated health information, elevated reading levels and inconsistent citation practices may hinder accessibility for general patient populations. To enhance their role in patient education, future chatbot iterations should prioritize user-centered design, compliance with medical guidelines, and content simplification.
Keywords: Coronary Artery Bypass Grafting, Artificial Intelligence, Chatbots


* There may have been changes to the journal, article, conference, book, or preprint information. It is therefore advisable to consult the official page of the source. The information here is shared for informational purposes only; IAD is not responsible for incorrect or missing information.


İzmir Academy Association
Copyright © 2023-2026