Multimedia Research, ISSN: 2582-547X

Emotion Recognition from Speech Signals Using DCNN with Hybrid GA-GWO Algorithm

Volume 2 | Issue 4 | October 2019

Abstract

Emotion recognition from speech signals has recently become an extensively studied research topic because speech is a fast and natural way for humans to communicate. Numerous studies have addressed this topic. Drawing on these previously examined models, this paper develops a method for accurately recognizing emotions from speech signals. To study the multimodal fusion of speech features, a Deep Convolutional Neural Network (DCNN) model is proposed. Moreover, a hybrid Genetic Algorithm (GA)-Grey Wolf Optimization (GWO) algorithm, which combines features of both the GA and GWO techniques, is presented for training the network. Finally, the developed recognition model is verified and compared with existing techniques in terms of diverse performance measures: Accuracy, Sensitivity, Precision, Specificity, False Positive Rate (FPR), False Discovery Rate (FDR), False Negative Rate (FNR), F1-Score, Negative Predictive Value (NPV), and Matthews Correlation Coefficient (MCC).
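The performance measures listed in the abstract all derive from confusion-matrix counts. The following is a minimal sketch using their standard textbook definitions; it is not the paper's evaluation code. It assumes a binary (one-vs-rest) view of a single emotion class. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard measures from binary confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each
    prediction outcome occurs at least once).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "fpr": 1 - specificity,    # false positive rate
        "fnr": 1 - sensitivity,    # false negative rate
        "fdr": 1 - precision,      # false discovery rate
        "npv": npv,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Hypothetical counts for one emotion class versus all others.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class emotion task, one plausible approach is to compute these per class and then average them, although the abstract does not state how the measures are aggregated.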



DOI: https://doi.org/10.46253/j.mr.v2i4.a2


Keywords: Speech; Recognition Model; Emotion; Neural Network; Optimization

Publisher: Resbee Info Technologies Pvt Ltd