IAD Index of Academic Documents
  • Anatolian Current Medical Journal
  • Volume: 7 Issue: 5

Fabricated or accurate? Ethical concerns and citation hallucination in AI-generated scientific writing on musculoskeletal topics

Authors : Ertuğrul Safran, Adem Çalı
Pages : 695-702
DOI : 10.38053/acmj.1746227
Publication Date : 2025-09-15
Article Type : Research Paper
Abstract : Aims: Large language models (LLMs) such as ChatGPT are increasingly used in academic and clinical writing. While these tools can generate coherent and domain-specific text, concerns persist regarding the accuracy of their automatically generated references. In musculoskeletal rehabilitation, a field heavily reliant on current evidence, the reliability of citations is especially critical. Yet systematic evaluations of citation accuracy in AI-generated scientific content are lacking. This study aimed to evaluate the reference accuracy of scientific texts generated by ChatGPT (GPT-4) in response to musculoskeletal rehabilitation prompts, and to determine whether reference accuracy improves following structured post-generation verification.
Methods: ChatGPT was prompted to generate four scientific paragraphs on musculoskeletal rehabilitation topics (manual therapy, ACL reconstruction, low back pain, and rotator cuff repair), each including 10 references with DOIs. A total of 40 references were scored for citation quality on a 3-point scale (0=fabricated, 1=partially correct, 2=fully accurate). After the initial evaluation, ChatGPT was asked to verify and revise its references. Scores before and after this step were compared descriptively and with Wilcoxon signed-rank tests, and effect sizes (r) were calculated to estimate the magnitude of improvement.
Results: Only 7.5% of references were fully accurate in the initial generation, while 42.5% were completely fabricated. The remaining 50% were partially correct. After verification, the proportion of fully accurate references rose to 77.5%. Wilcoxon signed-rank testing confirmed a statistically significant improvement in accuracy across all prompts (W=561.0, p<0.001, r=0.60). The most common errors included invalid DOIs, fabricated article titles, and mismatched metadata.
Conclusion: ChatGPT can generate coherent scientific content, but its initial references are frequently inaccurate or fabricated. Structured post-generation verification significantly improves reference accuracy, as confirmed by statistical testing. These findings suggest that LLMs may be integrated as drafting tools in academic and clinical musculoskeletal contexts, but only when accompanied by strict human-led verification of citations.
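
The before/after comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names are hypothetical, the ten paired scores are made-up illustrative values (not the study's data), and the effect-size convention r = |z| / sqrt(number of pairs) is an assumption, since the abstract does not state which denominator was used.

```python
import math
from collections import Counter


def score_distribution(scores):
    """Percentage of references at each score (0=fabricated, 1=partial, 2=accurate)."""
    counts = Counter(scores)
    total = len(scores)
    return {s: 100.0 * counts.get(s, 0) / total for s in (0, 1, 2)}


def wilcoxon_signed_rank(before, after):
    """Paired Wilcoxon signed-rank test using a normal approximation.

    Returns (w_plus, n_nonzero, z, r). Zero differences are dropped and
    tied absolute differences share their average rank. The effect size
    r = |z| / sqrt(number of pairs) is one common convention (assumed here).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within each tie group.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of ranks belonging to positive (improved) differences.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w_plus - mean) / math.sqrt(var)
    r = abs(z) / math.sqrt(len(before))
    return w_plus, n, z, r


# Hypothetical scores for ten references, before and after verification.
before = [0, 0, 1, 1, 0, 2, 1, 0, 1, 1]
after = [2, 1, 2, 2, 2, 2, 2, 1, 1, 2]

print(score_distribution(before))
print(score_distribution(after))
print(wilcoxon_signed_rank(before, after))
```

In practice `scipy.stats.wilcoxon` would usually be preferred, and with small samples or many ties an exact p-value is more appropriate than the normal approximation; the standard-library version above is used only to keep the sketch self-contained.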
Keywords : ChatGPT, artificial intelligence, musculoskeletal rehabilitation, scientific writing, reference accuracy, citation hallucination


* Details of the journal, article, conference, book, preprint, etc. may have changed. Please refer to the official page of the source for current information. The information here is shared for informational purposes only. IAD is not responsible for incorrect or missing information.

