Archives of Acoustics, 44, 3, pp. 429–437, 2019
DOI: 10.24425/aoa.2019.129259

Speech Enhancement Using Sliding Window Empirical Mode Decomposition and Hurst-based Technique

Selvaraj POOVARASAN
Bharathiar University
India

Eswaran CHANDRA
Bharathiar University
India

The most challenging task in speech enhancement is tracking non-stationary noise over long speech segments at low Signal-to-Noise Ratio (SNR). Various speech enhancement techniques have been proposed, but they are inaccurate in tracking highly non-stationary noise. The Empirical Mode Decomposition and Hurst-based (EMDH) approach was therefore proposed to enhance signals corrupted by non-stationary acoustic noise. In EMDH, Hurst exponent statistics are used to identify and select the set of Intrinsic Mode Functions (IMFs) most affected by the noise components, and the speech signal is then reconstructed from the least corrupted IMFs. Although this increases the SNR, its time and resource consumption are high, and its performance under non-stationary noise still leaves room for significant improvement. Hence, in this article, the EMDH approach is enhanced with a Sliding Window (SW) technique. In the resulting SWEMDH approach, EMD is computed on a small window that slides along the time axis, and the window is chosen according to the frequency band of the signal. To prevent discontinuities in the IMFs between windows, the total number of modes and the number of sifting iterations must be set a priori. For each mode, the number of sifting iterations is determined by decomposing many signal windows with the standard algorithm and averaging the number of sifting steps. With this approach, the time complexity is reduced significantly while a suitable decomposition quality is maintained. The experimental results show considerable improvements in speech enhancement under non-stationary noise environments.
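The abstract describes the SWEMDH pipeline only in outline. The pure-Python sketch below illustrates its main ingredients under stated assumptions; it is not the authors' implementation, and every function and parameter name (`sliding_window_emd`, `emd_fixed`, `sift`, `hurst_rs`, `reconstruct`, `win`, `hop`, `sift_counts`, `drop`) is hypothetical. Each window is decomposed into a fixed number of modes using a fixed, per-mode number of sifting iterations (`sift_counts`; per the abstract these would come from averaging the sifting steps of a standard EMD over many windows, whereas the values used here are arbitrary). Envelopes are piecewise linear rather than the cubic splines of standard EMD, overlapping windows are stitched by simple averaging (an assumed rule), and the Hurst exponent of each mode is estimated by rescaled-range (R/S) analysis. Which modes count as noise-dominated is left to the caller.

```python
import math

# Sketch only (assumptions, not the paper's code): linear envelopes instead of
# cubic splines, and overlapping windows are stitched by simple averaging.

def _extrema(x):
    """Interior indices of local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def _envelope(x, idx):
    """Piecewise-linear envelope through x at idx, held flat past the end knots."""
    env = [x[idx[0]] if t <= idx[0] else x[idx[-1]] for t in range(len(x))]
    for a, b in zip(idx, idx[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    return env

def sift(x, n_iter):
    """One IMF candidate from exactly n_iter sifting passes (fewer if h runs out of extrema)."""
    h = list(x)
    for _ in range(n_iter):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = _envelope(h, maxima), _envelope(h, minima)
        h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
    return h

def emd_fixed(x, sift_counts):
    """len(sift_counts) modes plus a residue; sift_counts[k] is fixed a priori."""
    residue, imfs = list(x), []
    for n_iter in sift_counts:
        imf = sift(residue, n_iter)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue

def sliding_window_emd(x, win, hop, sift_counts):
    """Decompose windows of `win` samples advanced by `hop`; average the overlaps."""
    assert 0 < hop <= win, "hop must not exceed the window length"
    n = len(x)
    acc = [[0.0] * n for _ in range(len(sift_counts) + 1)]  # modes + residue
    cnt = [0] * n
    starts = list(range(0, max(n - win, 0) + 1, hop))
    if starts[-1] + win < n:
        starts.append(n - win)  # make the last window reach the end
    for s in starts:
        imfs, res = emd_fixed(x[s:s + win], sift_counts)
        for m, comp in enumerate(imfs + [res]):
            for i, v in enumerate(comp):
                acc[m][s + i] += v
                if m == 0:
                    cnt[s + i] += 1
    return [[v / c for v, c in zip(row, cnt)] for row in acc]

def hurst_rs(x, min_chunk=8):
    """Hurst exponent as the slope of log(mean R/S) versus log(chunk size)."""
    pts, size = [], min_chunk
    while size <= len(x) // 2:
        rs = []
        for s in range(0, len(x) - size + 1, size):
            chunk = x[s:s + size]
            mean = sum(chunk) / size
            cum = lo = hi = 0.0
            for v in chunk:  # range of the cumulative deviation from the mean
                cum += v - mean
                lo, hi = min(lo, cum), max(hi, cum)
            sd = math.sqrt(sum((v - mean) ** 2 for v in chunk) / size)
            if sd > 0:
                rs.append((hi - lo) / sd)
        if rs:
            pts.append((math.log(size), math.log(sum(rs) / len(rs))))
        size *= 2
    if len(pts) < 2:
        raise ValueError("signal too short for R/S analysis")
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    num = sum((p - mx) * (q - my) for p, q in pts)
    return num / sum((p - mx) ** 2 for p, _ in pts)

def reconstruct(modes, drop):
    """Sum every mode (the residue is the last one) whose index is not in `drop`."""
    return [sum(m[i] for k, m in enumerate(modes) if k not in drop)
            for i in range(len(modes[0]))]
```

For example, `modes = sliding_window_emd(x, win=128, hop=64, sift_counts=[6, 4, 3])` returns three IMFs followed by a residue, `[hurst_rs(m) for m in modes[:-1]]` gives one H estimate per IMF, and `reconstruct(modes, drop)` rebuilds the signal without the modes listed in `drop`. With `drop` empty, the original signal is recovered up to rounding, because every window's modes and residue sum to that window. Fixing the number of modes and sifting passes for all windows is what the abstract gives as the way to avoid discontinuities: it keeps mode k roughly comparable across neighbouring windows, so averaging the overlaps is at least plausible.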
Keywords: Speech Enhancement; Empirical Mode Decomposition; Intrinsic Mode Functions; Hurst exponent; Sliding Window EMD


Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)