- Anadolu Bil Meslek Yüksekokulu Dergisi
- Volume: 20, Issue: 72
Assessing language model performance in biomedical question-answering: A case study using the langchain framework on the CliCR Dataset
Authors: Feras Almannaa, Ferdi Sönmez
Pages: 167-188
Publication Date: 2025-12-12
Article Type: Research Paper
Abstract: This paper focuses on developing and implementing a biomedical question-answering (BQA) system using large language models (LLMs) and the CliCR dataset, in combination with the LangChain framework. The study evaluates several models, including GPT-3.5, GPT-4, LLAMA3, and Mistral, in handling clinical questions. Key methodologies include data preparation, prompt engineering, and model adaptation. The evaluation employs metrics such as precision, recall, F1-score, BLEU scores, and embedding-based metrics. Results show that using the entire case context significantly outperforms chunking and vector store indexing methods. Notably, GPT-4 achieved an exact match score of 44.7%, surpassing human experts. Although fine-tuning improves domain-specific performance, there is a risk of overfitting. This research adds to the progress in BQA systems, with possible benefits for clinical decision-making and medical education.
Keywords: biomedical question-answering, CliCR, evaluation, LLM, LangChain, GPT, Mistral, LLAMA, Cohere, RAG, prompt engineering
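The exact match and F1 metrics mentioned in the abstract can be sketched as follows. This is a minimal illustration of standard token-level QA scoring, not the paper's actual evaluation script; real CliCR-style scorers typically also strip punctuation and articles during normalization (simplified here as an assumption).

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase and split on whitespace (a simplification: full QA
    # evaluation scripts usually also remove punctuation and articles).
    return text.lower().split()

def exact_match(prediction: str, gold: str) -> bool:
    # True only when the normalized answers are identical.
    return normalize(prediction) == normalize(gold)

def token_f1(prediction: str, gold: str) -> float:
    # Token-level overlap: precision over predicted tokens,
    # recall over gold tokens, combined into an F1 score.
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "acute renal failure" against a gold answer of "renal failure" is not an exact match but still earns partial credit (F1 = 0.8).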
