- Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi
- Volume: 14 Issue: 4
The effect of text representation and model selection on classification performance: A comprehensive comparison of TF-IDF, BoW and Transformer-based methods on the Covid19-FNIR dataset
Authors : Muhammet Sinan Başarslan, Fatih Bal
Pages : 1447-1461
DOI : 10.28948/ngumuh.1694988
Publication Date : 2025-10-15
Article Type : Research Paper
Abstract : This study evaluates the performance of various machine learning (ML) models on the Covid19-FNIR dataset, split into 80% training and 20% testing, using Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW) text vectorization. Transformer-based models such as DistilBERT, RoBERTa, and ALBERT were integrated with classical ML algorithms and ensemble methods such as Stacking, Hard Voting, and Soft Voting. Stacking achieved the highest performance with both representations: 92.62% Accuracy (Acc) and 92.51% F1-score (F1) with TF-IDF, and 92.29% Acc and 92.41% F1 with BoW. Hard Voting with BoW yielded the highest Recall (95.23%). Classical models such as Logistic Regression (LR) and Support Vector Machine (SVM) performed better with BoW, reaching 90.98% and 90.51% Acc, respectively. Overall, TF-IDF produced balanced outcomes, while BoW offered higher Recall and Precision in specific cases. These results highlight the significance of both model and text representation choices in achieving optimal classification performance.
Keywords : Fake news, ML, Text representation, Pre-trained
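To make the contrast between the two text representations and the two voting schemes concrete, the following is a minimal standard-library sketch, not the authors' pipeline. The toy sentences, the function names, and the smoothed IDF formula (one common variant) are assumptions chosen for illustration; the paper's actual preprocessing, models, and hyperparameters are not given in the abstract. Stacking, which trains a meta-learner on the base models' outputs, is left out for brevity.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Bag of Words: raw term counts per document over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF: term counts reweighted by a smoothed inverse document frequency.

    Assumption: idf(t) = ln((1 + n) / (1 + df(t))) + 1, with no length
    normalization. Other variants exist and would give different numbers.
    """
    vocab, counts = bow_vectors(docs)
    n = len(docs)
    # df[j] = number of documents that contain vocabulary term j
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, rows


def hard_vote(labels):
    """Hard Voting: majority label across models (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft Voting: average class probabilities across models, then argmax."""
    n_classes = len(probabilities[0])
    avg = [
        sum(p[k] for p in probabilities) / len(probabilities)
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: avg[k])


# Hypothetical toy corpus, not taken from Covid19-FNIR.
docs = [
    "vaccine cures covid",
    "covid vaccine trial results",
    "miracle cure shocks doctors",
]
vocab, bow = bow_vectors(docs)
_, tfidf = tfidf_vectors(docs)

# Hypothetical outputs of three classifiers for one article (class 1 = fake).
model_probs = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
model_labels = [max(range(2), key=lambda k: p[k]) for p in model_probs]

print("hard vote:", hard_vote(model_labels))
print("soft vote:", soft_vote(model_probs))
```

In the first toy document, BoW gives "covid" and "cures" the same count, while TF-IDF weights "cures" higher because it occurs in fewer documents. The voting example is built so that the two schemes disagree: two weakly confident models outvote one highly confident model under Hard Voting, but not under Soft Voting.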
