Engineering World
E-ISSN: 2692-5079 An Open Access, Peer Reviewed Journal of Selected Publications in Engineering and Applied Sciences
Volume 6, 2024
Hubert-LSTM: A Hybrid Model for Artificial Intelligence and Human Speech
Author:
Abstract: Speech emotion recognition (SER) is a critical component of human-computer interaction, facilitating seamless communication between people and machines. In this paper, we propose a hybrid model that integrates HuBERT, a state-of-the-art self-supervised speech model, with a Long Short-Term Memory (LSTM) network, known for its effectiveness in sequence modeling tasks, to improve emotion recognition accuracy on speech audio files. We use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) for our investigation, chosen for its complexity and open accessibility. Our hybrid model combines the semantic features extracted by HuBERT with the LSTM's ability to capture temporal relationships in audio sequences, thereby improving emotion recognition performance. In experiments on a subset of actors from the RAVDESS dataset, our model reached a maximum accuracy of 89.1%, outperforming existing approaches.
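The hybrid idea described in the abstract, an LSTM reading frame-level features produced by a speech encoder, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `FeatureLSTMClassifier`, the hidden size of 128, the single LSTM layer, and the use of random tensors in place of real HuBERT outputs are all assumptions made for illustration. The feature dimension of 768 matches the base HuBERT configuration, and the eight labels are the RAVDESS speech emotion classes.

```python
import torch
import torch.nn as nn

# The eight emotion classes used in RAVDESS speech recordings.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


class FeatureLSTMClassifier(nn.Module):
    """Hypothetical LSTM head over precomputed frame-level speech features."""

    def __init__(self, feature_dim=768, hidden_dim=128, num_classes=len(EMOTIONS)):
        super().__init__()
        # The LSTM models temporal relationships across the feature frames.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # A linear layer maps the final hidden state to emotion logits.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):
        # features: (batch, frames, feature_dim)
        _, (h_n, _) = self.lstm(features)
        # h_n[-1] is the last layer's final hidden state: (batch, hidden_dim)
        return self.classifier(h_n[-1])


torch.manual_seed(0)
model = FeatureLSTMClassifier()

# Random stand-in for encoder output: 4 utterances, 50 frames, 768 dims each.
# In a real pipeline these would come from a pretrained speech encoder.
dummy_features = torch.randn(4, 50, 768)

logits = model(dummy_features)  # shape: (4, 8)
predicted = [EMOTIONS[i] for i in logits.argmax(dim=1).tolist()]
```

Because the model is untrained and the inputs are random, the predicted labels carry no meaning; the sketch only shows how the tensor shapes flow from frame-level features to one emotion label per utterance.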
Pages: 159-169
DOI: 10.37394/232025.2024.6.17