- Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Dergisi
- Volume: 27 Issue: 80
- Discovering Latent Themes in Heart Disease Article Abstracts: A Topic Modeling Approach
Authors : Burcu Baştürk, Aytuğ Onan
Pages : 216-223
Doi:10.21205/deufmd.2025278007
Publication Date : 2025-05-23
Article Type : Research Paper
Abstract : Heart disease is a global public health problem, and its extensive literature requires in-depth analysis to uncover specific themes and relationships. This study aimed to identify latent themes in 5,000 heart disease-related abstracts retrieved from PubMed using topic modeling techniques, and to compute coherence scores for the resulting models. The original abstracts were paraphrased using ChatGPT and NLTK (Natural Language Toolkit), followed by extensive preprocessing, including tokenization, stop-word removal, stemming, and lemmatization. For feature extraction, the text was vectorized using TF-IDF (term frequency-inverse document frequency). Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), and Non-Negative Matrix Factorization (NMF) were applied to reveal key thematic structures. Coherence scores were calculated and compared across different numbers of topics (5 to 50) for each model and annotation method. This approach provides a methodology for summarizing large amounts of information, allowing researchers to navigate the complex landscape of heart disease literature efficiently and identify critical areas of focus. The findings aim to improve understanding of heart disease and support future research in this area.
Keywords : Heart Disease, Topic Modeling, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), Coherence Scores, Natural Language Processing
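As a rough illustration of two steps named in the abstract, the sketch below computes TF-IDF weights and a topic coherence score on a toy corpus. The abstract does not say which TF-IDF variant or which coherence measure was used, so the plain tf * log(N / df) weighting and the UMass coherence measure shown here are assumptions, and the documents and function names are hypothetical.

```python
import math
from collections import Counter

# Toy stand-ins for preprocessed abstracts (hypothetical data, not from the study).
docs = [
    ["heart", "failure", "patient", "risk"],
    ["heart", "attack", "risk", "factor"],
    ["patient", "surgery", "valve", "heart"],
    ["diet", "exercise", "risk", "factor"],
]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def umass_coherence(top_terms, docs):
    """Average of log((D(w_i, w_j) + 1) / D(w_j)) over ordered pairs of top terms.

    Assumes every term in top_terms occurs in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_count(*terms):
        # Number of documents containing all of the given terms.
        return sum(all(t in s for t in terms) for s in doc_sets)

    scores = []
    for i in range(1, len(top_terms)):
        for j in range(i):
            co_occurrences = doc_count(top_terms[i], top_terms[j])
            scores.append(
                math.log((co_occurrences + 1) / doc_count(top_terms[j]))
            )
    return sum(scores) / len(scores)


weights = tfidf(docs)
score = umass_coherence(["heart", "risk", "patient"], docs)
print(weights[0])
print(score)
```

In a setup like the one the abstract describes, the top terms would come from a fitted LDA, LSA, or NMF model, and the score would be recomputed for each candidate number of topics so the models can be compared.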
