Eliminating the negative effect of highly non-stationary environmental noise is a long-standing research topic in speech recognition, and it remains an important challenge. Traditional unsupervised signal processing methods appear to have reached a performance ceiling on this problem. Data-driven supervised approaches, particularly those based on deep learning, have recently emerged as potential alternatives. In this article, we comprehensively summarise the most representative recently developed deep learning approaches to this problem, with the aim of providing guidelines for those entering the field of environmentally robust speech recognition. We categorise these approaches into single- and multi-channel techniques, and describe each at the front-end, the back-end, and the joint framework of speech recognition systems. We also discuss the pros and cons of these approaches and the relationships among them, which may benefit future research.

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
