Multimedia Research ISSN: 2582-547X

An Approach for Speech Enhancement Using Deep Convolutional Neural Network

Volume 2 | Issue 1 | January 2019

Abstract

Speech is a primary and universal medium of human communication. Additive or background noise in the channel degrades signal quality, and speech enhancement techniques have been introduced to suppress it. This paper proposes a speech enhancement approach using a Deep Convolutional Neural Network (DCNN). First, a noise signal is added to the clean speech signal to generate the noisy speech signal. Next, the noisy signal is divided into frames, and Fractional Delta-Amplitude Modulation Spectrogram (FD-AMS) features are extracted from the frames. Finally, the extracted features are provided as input to the DCNN, which generates an optimized estimate of the speech signal. The proposed method is evaluated on the NOIZEUS database using two metrics, Perceptual Evaluation of Speech Quality (PESQ) and Root Mean Square Error (RMSE), and is compared with existing speech enhancement techniques. The results show that the proposed method achieves a higher PESQ and a lower RMSE than the existing techniques, indicating the superiority of the proposed speech enhancement.
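The front end of the pipeline described in the abstract (noise addition, framing) and the RMSE metric can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mix_at_snr`, `frame_signal`, and `rmse`, the target SNR, the sample rate, and the frame and hop lengths are all assumptions chosen for illustration, and the FD-AMS feature extraction and the DCNN itself are not reproduced here.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaling the noise to a target SNR in dB.

    Mixing at a fixed SNR is an assumption; the abstract only says that
    noise is added to the clean signal.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")
    # Choose gain g so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def rmse(reference, estimate):
    """Root mean square error between two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 8000  # illustrative value, not taken from the paper
    # Stand-in for clean speech: one second of a 440 Hz tone.
    clean = [math.sin(2 * math.pi * 440 * t / sample_rate)
             for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    frames = frame_signal(noisy, frame_len=256, hop=128)

    print("number of frames:", len(frames))
    # RMSE of the unprocessed noisy signal: the baseline an enhancer should beat.
    print("RMSE (noisy vs. clean):", rmse(clean, noisy))
```

In a full system, each frame would pass through the feature extractor and the network, and the RMSE would be computed between the clean signal and the network's estimate instead of the noisy input. PESQ is defined by a standardized perceptual model and is omitted from this sketch.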



DOI : https://doi.org/10.46253/j.mr.v2i1.a5


Keywords: Speech enhancement, framing, feature extraction, Fractional Delta-Amplitude Modulation Spectrogram, Deep Convolutional Neural Network.

Publisher Information

Publisher: Resbee Info Technologies Pvt Ltd