- Fırat Üniversitesi Mühendislik Bilimleri Dergisi
- Volume: 37 Issue: 1
Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP
Authors : Hakkı Halil Babacan, Ramazan Oğuz, Yahya Kemal Beyitoğlu
Pages : 83-92
DOI : 10.35234/fumbd.1469178
Publication Date : 2025-03-27
Article Type : Research Paper
Abstract : Cognitive distortions are errors in thinking that lead individuals to perceive reality in a misleading way, and they are strongly associated with psychopathologies. Accurately identifying and classifying these distortions can therefore enhance the effectiveness of cognitive-behavioral therapy (CBT). This study investigates the effectiveness of deep learning and NLP techniques for the automatic detection of cognitive distortions. A RoBERTa model was trained on English synthetic data generated by GPT-4 (2000 examples) and on the dataset from Shreevastava and Foltz (1590 cognitive distortion examples, 933 non-distortion examples). Three scenarios were tested: the original dataset, the synthetic dataset, and their combination. The results showed that synthetic data is a strong resource: accuracy rates were 60.67% (original), 94.51% (synthetic), and 77.18% (combined). The GPT-4-based dataset yielded near-perfect F1 scores in some categories. ROC curve analyses showed that the GPT-4 dataset had the highest AUC value (0.80). The study shows that synthetic data expands the potential of AI applications in clinical psychology and offers a way to develop effective models while preserving patient privacy. Future research should test synthetic data with different models and compare it with real clinical data.
Keywords : Cognitive distortion, machine learning, natural language processing, GPT-4, depression
