- Turkish Journal of Electrical Engineering and Computer Science
- Volume:25 Issue:1
An online approach for feature selection for classification in big data
Authors : NASRIN BANU NAZAR, RADHA SENTHILKUMAR
Pages : 163-171
Article Type : Research Paper
Abstract : Feature selection (FS), also known as attribute selection, is the process of selecting a subset of relevant features for use in model construction. It improves classification accuracy by removing irrelevant and noisy features. FS can be implemented with either batch learning or online learning. Current FS methods are executed in batch mode. However, these techniques require long execution times and large storage space because they process the entire dataset at once. Because batch learning does not scale, it cannot be used for large data. In the present study, a scalable and efficient Online Feature Selection (OFS) approach based on the Sparse Gradient (SGr) technique was proposed to select features from the data online. In this approach, the feature weights are proportionally decremented based on a threshold value, which drives the weights of insignificant features to zero. To demonstrate the efficiency of this approach, an extensive set of experiments was conducted on 13 real-world datasets ranging from small to large. The experimental results showed a 15% improvement in classification accuracy, which is considered significant compared with the existing methods.
Keywords : Data analysis, data preprocessing, big data analytics, feature selection, online learning
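The abstract describes the SGr update only in words: feature weights are shrunk proportionally against a threshold until insignificant ones reach zero. Because the exact rule is not given here, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a truncated-gradient-style rule applied to an online logistic-regression learner. After each gradient step, any weight whose magnitude is at most `threshold` is moved toward zero by `learning_rate * gravity` and clipped at exactly zero. The function names, the parameters `learning_rate`, `gravity` and `threshold`, their default values, and the synthetic data stream are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: truncated-gradient-style shrinkage on online logistic
# regression, assumed for illustration; NOT the SGr algorithm from the paper.
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def truncate(w, alpha, theta):
    """Shrink w toward zero by alpha, but only if |w| <= theta.

    Small weights are clipped at exactly 0.0 instead of crossing zero;
    weights with |w| > theta are left unchanged.
    """
    if 0.0 <= w <= theta:
        return max(0.0, w - alpha)
    if -theta <= w < 0.0:
        return min(0.0, w + alpha)
    return w


def online_sparse_logreg(stream, n_features, learning_rate=0.1, gravity=0.05, threshold=0.5):
    """One pass of online logistic regression with shrinkage after every example.

    `stream` yields (x, y) pairs, x a list of n_features floats, y in {0, 1}.
    Each example is used once and discarded, so memory is O(n_features).
    All parameter names and defaults are hypothetical.
    """
    w = [0.0] * n_features
    alpha = learning_rate * gravity  # shrinkage applied to small weights per step
    for x, y in stream:
        # Prediction uses the weights from before this example's update.
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(n_features):
            # Gradient step on the logistic loss: d/dw_i = (p - y) * x_i
            w[i] -= learning_rate * (p - y) * x[i]
            w[i] = truncate(w[i], alpha, threshold)
    return w


def selected_features(w):
    """Indices of features whose weight was not driven to exactly zero."""
    return [i for i, wi in enumerate(w) if wi != 0.0]


def synthetic_stream(n, seed=0):
    """Hypothetical demo data: only features 0 and 1 set the label; 2-4 are noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
        yield x, 1 if x[0] + x[1] > 0 else 0


if __name__ == "__main__":
    weights = online_sparse_logreg(synthetic_stream(3000), n_features=5)
    print([round(v, 3) for v in weights])
    print("selected:", selected_features(weights))
```

The script makes a single pass over 3,000 synthetic examples and stores only five weights, which reflects the storage argument for online FS. The informative weights for features 0 and 1 quickly grow past `threshold` and escape shrinkage. The noise weights are repeatedly pulled back toward zero. Whether a given noise weight is exactly zero at the end depends on the stream, so a practical selector might also apply a magnitude cut-off to the final weights.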