PARABOLIC FILTER MEL FREQUENCY CEPSTRAL COEFFICIENT AND FUSION OF FEATURES FOR SPEAKER AGE CLASSIFICATION

Mohammed Muntaz OSMAN; Osman BÜYÜK

Pages : 2177-2191

Publication Date : 2021-10-05

Article Type : Research Paper

Abstract : Speech is an acoustic signal that originates at the inner end of the human vocal tract and is radiated as an audio wave at the outer end. The structure and length of the vocal tract cause differences in the features extracted from utterances that are similar in content but spoken by different speakers. As a person grows, the length of his/her vocal tract changes, which in turn gradually modifies speech characteristics. The mel frequency cepstral coefficient (MFCC), which uses triangular band-pass filter banks, is widely regarded as the most popular feature in speech processing applications. To improve the accuracy of speaker age classification, a new spectral feature set named the parabolic filter mel frequency cepstral coefficient (PFMFCC) is proposed in this study. PFMFCC uses parabolic band-pass filter banks instead of triangular ones. This feature extraction technique uses 30 parabolic band-pass filters to extract 42 features from each 20 ms speech frame. These features are applied to three classical classifiers, namely the Gaussian mixture model (GMM), cosine score, and probabilistic linear discriminant analysis (PLDA). The aGender database, consisting of 47 hours of German speech uttered by a total of 852 speakers, is used in this study. The new PFMFCC feature achieved accuracies of 51.01%, 56.01% and 58.14% with the cosine score, GMM and PLDA classifiers, respectively, on the female dataset. Similarly, it achieved accuracies of 50.44%, 52.74% and 57.23% with the cosine score, GMM and PLDA classifiers, respectively, on the male dataset. Using a fusion of seven feature sets, overall accuracies of 60.18%, 52.17% and 56.35% are obtained with the cosine score, GMM and PLDA classifiers, respectively, across all seven speaker age classes. With the cosine score, the feature fusion improved the overall accuracy by 2.55% compared to a previous speaker age classification study carried out on the same database.
Keywords : Parabolic filter, feature fusion, speaker age, classification, accuracy
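
The abstract states that PFMFCC replaces the triangular mel filters of MFCC with parabolic ones but does not give the filter equation. The following is a minimal sketch of one plausible construction, not the authors' implementation. It assumes a downward-opening parabola in the mel domain that peaks at 1 on each filter's centre and falls to 0 at the centres of the neighbouring filters, the common 2595 log10(1 + f/700) mel formula, an 8 kHz sampling rate, and a 256-point FFT. The function names and all default values other than the 30 filters are hypothetical.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert Hz to mel (common 2595*log10 formula; an assumption here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def parabolic_filterbank(n_filters=30, n_fft=256, sample_rate=8000):
    """Return an (n_filters, n_fft // 2 + 1) matrix of parabolic mel filters.

    Assumed shape: w(m) = max(0, 1 - ((m - c) / d)^2), where m is the mel
    value of an FFT bin, c the filter centre and d the mel spacing between
    adjacent centres. The paper's exact definition may differ.
    """
    # Frequency (Hz, then mel) of every non-negative FFT bin.
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bin_mel = hz_to_mel(bin_hz)

    # n_filters + 2 equally spaced mel points: band edges plus centres.
    points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    centers = points[1:-1]
    half_width = points[1] - points[0]

    # Normalised distance of each bin from each centre, then the parabola.
    z = (bin_mel[None, :] - centers[:, None]) / half_width
    return np.clip(1.0 - z ** 2, 0.0, None)


def log_filterbank_energies(power_spectrum, filterbank, eps=1e-10):
    """Apply the filter bank to one frame's power spectrum and take the log."""
    return np.log(filterbank @ power_spectrum + eps)
```

For a 20 ms frame at the assumed 8 kHz rate (160 samples, zero-padded to 256), `log_filterbank_energies(np.abs(np.fft.rfft(frame, 256)) ** 2, parabolic_filterbank())` yields 30 log energies. A discrete cosine transform of these would give cepstral coefficients, as in standard MFCC extraction. How the 42 features per frame are composed is not stated in the abstract.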
