A Turkish Word Frequency Tool: LexiTR Frequency
Authors : Taner Sezer, Özay Karadağ
Pages : 266-276
DOI : 10.16916/aded.1636416
Publication Date : 2025-04-30
Article Type : Research Paper
Abstract : Word frequency is a fundamental concept in linguistics, computational linguistics, natural language processing (NLP), and language education, and it plays a critical role in understanding the characteristics and usage patterns of a word. This study introduces the "Turkish Word Frequency Tool" (TWFT), developed as part of the LexiTR Project, along with its features. TWFT is based on a balanced corpus of over 193 million words drawn from four distinct text types: academic, social media, fictional, and informative texts. TWFT serves as a scalable online platform that lets researchers examine word usage trends across different text types. It enables comprehensive analyses through real-time querying, graphical data representation, and both raw and normalized frequency values. Additionally, it provides API support, presenting word frequency information in a structured format. By filling a significant gap in the existing literature, TWFT aims to establish a consistent, transparent, and comprehensive foundation for linguistic research and natural language processing applications.
Keywords : Frequency, word list, tokenization, TS Tokenizer, LexiTR