Publication Details
Temporal processing for feature extraction in speech recognition, shortened version of habilitation thesis
automatic speech processing, speech recognition, features for speech recognition, temporal filtering, neural networks, data-driven techniques
Temporal processing for feature extraction in speech recognition
Speech recognition is a booming research field with a large number of applications in telecommunications (especially mobile), the automobile industry, consumer electronics, military and security, etc. Speech recognition systems are classically built from three basic blocks: feature extraction, acoustic matching and language modeling. While the last two are trained on data (annotated databases for acoustics and large speech corpora for the LM), the feature extraction block is often neglected, and in most cases mel-frequency cepstral coefficients (MFCC) are used. This work concentrates on two techniques intended to improve feature extraction. The first is temporal filtering of feature trajectories with filters derived from data by Linear Discriminant Analysis (LDA). This technique is shown to improve the recognition accuracy of isolated Czech words, confirming previous results on US English obtained by our colleagues from OGI Portland. The second part of the work concentrates on a more revolutionary approach to feature extraction using TRAPs (temporal patterns), whose fundamentals were also laid at OGI. Several experiments were conducted on three databases during the author's stay at OGI. Although we have shown that TRAPs are comparable to MFCCs only on a small-vocabulary recognition task, we believe that the combination of frequency-band processing and neural nets will become very important in the next decade, and that they will become standard blocks of feature extraction.
@INBOOK{FITPUB7240,
  author = "Jan \v{C}ernock\'{y}",
  title = "Temporal processing for feature extraction in speech recognition, shortened version of habilitation thesis",
  pages = "1--30",
  booktitle = "V\v{e}deck\'{e} spisy VUT",
  series = "Edice Habilita\v{c}n\'{i} a inaugura\v{c}n\'{i} spisy, sv. 112",
  year = 2003,
  location = "Brno, CZ",
  publisher = "Publishing house of Brno University of Technology VUTIUM",
  ISBN = "80-214-2395-1",
  language = "english",
  url = "https://www.fit.vut.cz/research/publication/7240"
}