From Raw Data to Intelligent Questions: LLM-Powered Generation with Human Validation

Authors :Abdülkadir AKYILDIZ, Süleyman EKEN
Pages :90-142
Abstract :Human-in-the-loop data analytics, a long-standing research interest, focuses on evaluating, understanding, and formally reasoning about human participation in data management in order to develop optimized systems and techniques that treat humans as first-class citizens alongside data. This study aims to develop a question generation system, supported by large language models (LLMs) and a human in the loop, that works from heterogeneous sources (text, audio, video, and other documents). The research problem is the need to collect and process a wide variety of data sources and to generate meaningful, leveled, and diverse types of questions from them. The proposed system first extracts text from the contents using faster-whisper, PyMuPDF, and other Python-based libraries. It then merges the extracted texts and, when the merged text exceeds the token limit of the LLMs, condenses it with summarization techniques such as Luhn and TextRank. Six question types (multiple choice, true/false, fill-in-the-blank, open-ended, matching, and short answer) are generated automatically using multiple API-based and locally deployed LLMs. Feedback from educators and domain experts is also integrated into the system's revision phase to improve the linguistic and content quality of the generated questions. The models are evaluated on questions generated for four grade levels (5th, 6th, 7th, and 8th), covering 60 topics across three subjects (Turkish, social studies, and science). Evaluation metrics include relevance, question type, difficulty level, correctness, clarity, and overall quality. "aya-expanse" achieves the best evaluation scores among the large language models evaluated. This study makes an important contribution to the automated and efficient production of educational content.
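The abstract states that the merged text is condensed with Luhn or TextRank only when it exceeds an LLM's token limit. The Python sketch below illustrates that step with a simplified Luhn-style extractive summarizer. It is not the authors' implementation, and several parts are assumptions. Tokens are approximated by whitespace-separated words, although real LLM tokenizers count differently. The stopword list, the thresholds `min_freq` and `max_gap`, and the helper names (`split_sentences`, `luhn_scores`, `fit_to_budget`) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, minimal English stopword list; a real system would need a
# fuller, language-specific list (e.g. for Turkish source material).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with", "as", "by", "be", "was"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    return re.findall(r"\w+", sentence.lower())


def luhn_scores(sentences: list[str], min_freq: int = 2, max_gap: int = 4) -> list[float]:
    # Luhn-style scoring: words that recur at least min_freq times (excluding
    # stopwords) are "significant". Within each sentence, significant words
    # separated by at most max_gap other words form a cluster, and a cluster
    # scores (significant words)^2 / cluster span. A sentence's score is the
    # best score among its clusters.
    freq = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
    significant = {w for w, c in freq.items() if c >= min_freq}
    scores = []
    for s in sentences:
        positions = [i for i, w in enumerate(tokenize(s)) if w in significant]
        best = 0.0
        if positions:
            start = prev = positions[0]
            count = 1
            for p in positions[1:]:
                if p - prev - 1 <= max_gap:  # gap small enough: extend cluster
                    count += 1
                else:  # close the current cluster and start a new one
                    best = max(best, count * count / (prev - start + 1))
                    start, count = p, 1
                prev = p
            best = max(best, count * count / (prev - start + 1))
        scores.append(best)
    return scores


def fit_to_budget(texts: list[str], max_tokens: int) -> str:
    # Merge the extracted texts; summarize only if the merge is over budget.
    merged = " ".join(t.strip() for t in texts)
    if len(merged.split()) <= max_tokens:
        return merged
    sentences = split_sentences(merged)
    scores = luhn_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:  # greedily keep the highest-scoring sentences that fit
        n = len(sentences[i].split())
        if used + n <= max_tokens:
            chosen.append(i)
            used += n
    # Restore the original sentence order so the summary stays readable.
    return " ".join(sentences[i] for i in sorted(chosen))


texts = [
    "Photosynthesis converts light energy. Plants use photosynthesis to make sugar.",
    "The weather was nice today. Photosynthesis happens in chloroplasts of plants.",
]
print(fit_to_budget(texts, 12))
# → Photosynthesis converts light energy. Plants use photosynthesis to make sugar.
```

With a 12-word budget, the off-topic sentence about the weather scores zero and is dropped, along with the lower-scoring chloroplast sentence that no longer fits. A TextRank variant would replace `luhn_scores` with a graph-based ranking of sentence similarity while keeping the same budget-fitting loop.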
Keywords :Large language models, Human-in-the-loop, Generative AI, Question generation, AI in education
DOI:10.5281/zenodo.16009204
PDF URL :https://www.izmirakademi.org/books/The_Age_of_Generative_Artificial_Intelligence/cp5/pdf/cp5.pdf