Methods for Capturing Spectro-Temporal Modulations in Automatic Speech Recognition

Source: Acta Acustica united with Acustica, Volume 88, Number 3, May/June 2002, pp. 416-422(7)

Publisher: European Acoustics Association

Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns that are well matched by the receptive fields of cortical neurons. To improve the performance of automatic speech recognition (ASR) systems, several approaches are presented, all of which aim to capture spectro-temporal modulations. Secondary features are derived from the output of a perception model in order to model the tuning of neurons to different envelope fluctuations. Three types of secondary features are introduced: the product of two or more windows (sigma-pi cells) of variable size in the spectro-temporal representation, a fuzzy-logical combination of windows, and a Gabor function that models the shape of the receptive fields of cortical neurons. The approaches are tested on a simple isolated word recognition task and compared with a standard Hidden Markov Model recognition system. The results show that all types of secondary features are suitable for ASR. Gabor secondary features, in particular, yield robust performance in additive noise, comparable and in some conditions superior to that of the Aurora 2 reference system.
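
To make two of the feature types named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the spectro-temporal representation is a plain list of lists indexed as spec[frequency][time]. All function names, the rectangular window layout, the use of window means, and the Gabor parameters (omega_f, omega_t, sigma_f, sigma_t) are hypothetical choices made for illustration; the abstract does not specify them.

```python
import cmath
import math


def window_mean(spec, t0, f0, dt, df):
    """Mean of spec[f][t] over the window f0..f0+df-1, t0..t0+dt-1."""
    total = 0.0
    for f in range(f0, f0 + df):
        for t in range(t0, t0 + dt):
            total += spec[f][t]
    return total / (dt * df)


def sigma_pi_feature(spec, windows):
    """Product of window means (assumed form of a sigma-pi style feature).

    windows is a list of (t0, f0, dt, df) tuples; the layout is hypothetical.
    """
    product = 1.0
    for (t0, f0, dt, df) in windows:
        product *= window_mean(spec, t0, f0, dt, df)
    return product


def gabor_kernel(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Complex 2D Gabor: Gaussian envelope times a complex sinusoid.

    The kernel is centred in an n_freq x n_time patch; omega_* are radian
    modulation frequencies per bin, sigma_* are envelope widths in bins.
    """
    fc = (n_freq - 1) / 2.0
    tc = (n_time - 1) / 2.0
    kernel = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            off_f, off_t = f - fc, t - tc
            envelope = math.exp(
                -0.5 * ((off_f / sigma_f) ** 2 + (off_t / sigma_t) ** 2)
            )
            carrier = cmath.exp(1j * (omega_f * off_f + omega_t * off_t))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_feature(spec, kernel, f0, t0):
    """Magnitude of the inner product of the kernel with the patch at (f0, t0)."""
    acc = 0j
    for i, row in enumerate(kernel):
        for j, k in enumerate(row):
            acc += spec[f0 + i][t0 + j] * k.conjugate()
    return abs(acc)
```

In this sketch, a Gabor kernel whose omega_t matches the temporal modulation rate of a patch gives a larger magnitude than a kernel tuned to a different rate, which is the sense in which such a feature is "tuned" to a particular envelope fluctuation.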

Document Type: Research Article

Publication date: 01 May 2002
