- Researcher
- Volume: 05 Issue: 01
Bridging the Language Gap in RAG: A Case Study on Turkish Retrieval and Generation
Authors : Erdoğan Bikmaz, Mohammed Briman, Serdar Arslan
Pages : 38-49
Publication Date : 2025-07-31
Article Type : Research Paper
Abstract : With the rise of Large Language Models (LLMs) and LLM-based Retrieval-Augmented Generation (RAG) systems, there is high demand for RAG applications that use LLM reasoning capabilities to handle text-intensive systems in multilingual settings. However, RAG components are developed primarily for English, which limits their ability to retrieve and assemble precise multilingual information for LLMs to answer from, particularly in Turkish. In this work, we explore the effects of building comprehensive RAG systems for Turkish question-answer retrieval and generation tasks. We experiment with fine-tuning two major components on Turkish data: the embedding model used for data ingestion and retrieval, and a reranker model that ranks the retrieved documents by their relevance to a query. We evaluate four RAG systems using six evaluation metrics. Experimental results show that fine-tuning retrieval components on Turkish data improves the accuracy of LLM responses and leads to better context construction.
Keywords : retrieval-augmented generation, large language models, embedding, information retrieval
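The two components named in the abstract, an embedding-based retriever followed by a reranker, can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function and the term-overlap `rerank` scorer are toy stand-ins for the fine-tuned embedding and reranker models, and all function names and parameters (`retrieve`, `rerank`, `build_context`, `k`, `top_n`) are hypothetical, chosen only to show the shape of a retrieve-then-rerank pipeline.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=3):
    """First stage: keep the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query, candidates, top_n=2):
    """Second stage (toy stand-in for a reranker model): score each
    candidate by the fraction of query terms it contains."""
    q_terms = set(query.lower().split())

    def score(doc):
        d_terms = set(doc.lower().split())
        return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_context(query, docs, k=3, top_n=2):
    """Retrieve k candidates, then rerank and keep top_n as LLM context."""
    return rerank(query, retrieve(query, docs, k), top_n)
```

In a real system, `embed` and the scoring function inside `rerank` would be replaced by learned models; the abstract's claim is that fine-tuning those two models on Turkish data improves the context passed to the LLM.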
