Speech Emotion Feature Extraction and Classification
Abstract
Speech emotion recognition has become a prominent research area in recent years owing to its wide range of applications across industries. A key challenge in the field is extracting emotional states from raw audio waveforms. Emotions are a way for individuals to communicate their moods and mental states; people experience a range of emotions, including sadness, joy, neutrality, disgust, anger, surprise, fear, and calmness. This paper focuses on extracting emotional features from audio waveforms. To this end, we restructured an existing state-of-the-art multilayer perceptron (MLP) design: the network uses a single hidden layer of 300 units, a batch size of 256, and training is limited to 500 epochs. On a limited dataset drawn from the song portion of RAVDESS, the MLP classifier achieves 95.65% accuracy for speech emotion recognition, considerably higher than results reported in other studies in this field.
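The abstract specifies the classifier configuration (one hidden layer of 300 units, batch size 256, at most 500 training epochs) but not the feature pipeline. The sketch below shows how such an MLP could be set up with scikit-learn's `MLPClassifier`; the feature matrix here is synthetic stand-in data (the paper's actual acoustic features, e.g. from RAVDESS audio, are not detailed in the abstract), and the 8 classes mirror the 8 emotions listed.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for acoustic feature vectors (hypothetical:
# the paper's real features would come from the audio waveforms).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 400, 40, 8  # 8 emotion classes
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)
X += y[:, None] * 0.5  # shift class means so the toy task is learnable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters as stated in the abstract: 300 hidden units,
# batch size 256, training capped at 500 epochs.
clf = MLPClassifier(
    hidden_layer_sizes=(300,),
    batch_size=256,
    max_iter=500,
    random_state=0,
)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"toy test accuracy: {acc:.3f}")
```

This is a minimal configuration sketch, not the authors' implementation; the reported 95.65% accuracy refers to their experiments on RAVDESS songs, not to this toy data.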