- Mühendislik Bilimleri ve Araştırmaları Dergisi
- Volume: 7 Issue: 1
Medical Text Classification Using Semisupervised Learning and Bert-Based Models
Authors : Fatih Soygazi, Damla Oğuz
Pages : 60-69
Doi:10.46387/bjesr.1597329
Publication Date : 2025-04-30
Article Type : Research Paper
Abstract : Medical text classification organizes complex medical texts but faces challenges such as insufficient training data. This paper proposes a novel method for categorizing medical texts based on a dataset of health problem abstracts and their labels. We applied data representation techniques to our labeled dataset and employed various machine learning algorithms for text classification. Initial results were unsatisfactory because labeled data was limited. To address this, we applied data augmentation techniques using an unlabeled dataset, with BERT-based models (BioBERT, ClinicalBERT) used to enrich the labeled data. Two voting mechanisms, hard voting and soft voting, were employed to validate new labeled records and add them to the dataset. After the labeled data was augmented, the machine learning algorithms were re-applied. The results demonstrate that our approach significantly improves the performance of medical text classification, addressing the challenges posed by limited labeled data and improving overall accuracy.
Keywords : BioBERT, ClinicalBERT, Clinical Text Classification, Data Augmentation, Voting Mechanisms
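The abstract does not spell out how hard and soft voting decide which pseudo-labeled records are accepted, so the following is only a minimal sketch under stated assumptions. It assumes each model (for example BioBERT and ClinicalBERT) has already produced a list of class probabilities for one unlabeled text. The function names `hard_vote` and `soft_vote`, the unanimity rule for hard voting, and the `threshold` value of 0.8 are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def hard_vote(model_probs: Sequence[Sequence[float]]) -> Optional[int]:
    """Hard voting (assumed rule): accept a label only if every model
    predicts the same class; otherwise reject the record (None)."""
    votes = [argmax(p) for p in model_probs]
    return votes[0] if all(v == votes[0] for v in votes) else None


def soft_vote(model_probs: Sequence[Sequence[float]],
              threshold: float = 0.8) -> Optional[int]:
    """Soft voting (assumed rule): average the class probabilities across
    models and accept the top class only if its mean probability reaches
    the threshold. The threshold is an illustrative choice."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    best = argmax(avg)
    return best if avg[best] >= threshold else None


# Hypothetical probabilities from two models for one unlabeled abstract.
biobert_probs = [0.9, 0.1]
clinicalbert_probs = [0.8, 0.2]
agreed_label = hard_vote([biobert_probs, clinicalbert_probs])
confident_label = soft_vote([biobert_probs, clinicalbert_probs])
```

A record whose vote returns `None` would stay in the unlabeled pool; only accepted records would be appended to the labeled dataset before the classifiers are retrained.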
