Publication Details
Combination of MFCC and TRAP features for LVCSR of meeting data
Grézl František, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
speech recognition, TRAP, feature extraction, feature combination, HLDA
The aim of this work is to examine TempoRAl Patterns (TRAPs) based feature extraction for the task of large vocabulary continuous speech recognition (LVCSR). Previously, TRAP-based features were mainly used in conjunction with hybrid NN-HMM recognition systems (the connectionist approach). In this work, we use a Tandem-TRAPs system to generate speech features, which are then used as input to a standard GMM-HMM system. This approach allows more precise modeling of phonetic context (context-dependent models), which is important for LVCSR. Experiments are carried out on the ICSI meetings database. For TRAP processing, it is shown that the use of frequency differentiation and local operators can significantly improve recognition performance. The performance obtained with TRAP-based features is compared with that of conventional MFCC features. Although stand-alone TRAP-based features never outperform MFCC in our experiments, we report an improvement over MFCC when TRAP-based and MFCC features are combined. The combined features are created by concatenating the original feature streams and then applying Heteroscedastic Linear Discriminant Analysis (HLDA) to perform decorrelation and dimensionality reduction. Compared to previous work, the main advantage comes from HLDA, which combines the two feature streams optimally without the strong assumptions imposed on the data by previously used transforms (such as PCA and LDA).
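The combination step described in the abstract (concatenate two frame-synchronous feature streams, then apply a linear transform for decorrelation and dimensionality reduction) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain LDA as a simpler stand-in, not the HLDA used in the paper, and the function names, feature dimensions, class labels, and random data are all hypothetical rather than taken from the publication.

```python
import numpy as np


def concatenate_streams(mfcc, trap):
    """Stack two frame-synchronous feature streams along the feature axis."""
    if mfcc.shape[0] != trap.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.hstack([mfcc, trap])


def lda_projection(features, labels, out_dim):
    """Estimate a plain LDA projection matrix (a stand-in, NOT HLDA).

    Returns a (feature_dim, out_dim) matrix whose columns whiten the pooled
    within-class covariance and keep the most discriminative directions.
    """
    n_frames, dim = features.shape
    global_mean = features.mean(axis=0)
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x = features[labels == c]
        class_mean = x.mean(axis=0)
        centered = x - class_mean
        within += centered.T @ centered
        diff = (class_mean - global_mean)[:, None]
        between += x.shape[0] * (diff @ diff.T)
    within /= n_frames
    between /= n_frames

    # Whiten the within-class covariance (assumed positive definite here).
    w_vals, w_vecs = np.linalg.eigh(within)
    whiten = w_vecs / np.sqrt(w_vals)  # scales column j by 1/sqrt(w_vals[j])

    # Diagonalize the between-class covariance in the whitened space.
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ between @ whiten)
    order = np.argsort(b_vals)[::-1][:out_dim]
    return whiten @ b_vecs[:, order]


# Hypothetical toy data: 600 frames, 4 classes, 13-dim "MFCC", 10-dim "TRAP".
rng = np.random.default_rng(0)
frames = 600
labels = rng.integers(0, 4, size=frames)
mfcc = rng.normal(size=(frames, 13)) + 0.5 * labels[:, None]
trap = rng.normal(size=(frames, 10)) - 0.3 * labels[:, None]

combined = concatenate_streams(mfcc, trap)       # 23-dim concatenated stream
projection = lda_projection(combined, labels, out_dim=3)
reduced = combined @ projection                  # decorrelated, reduced stream
```

Plain LDA assumes that all classes share one covariance matrix and can retain at most (number of classes - 1) useful dimensions; the abstract attributes the gain of HLDA to its avoiding such strong assumptions, so this sketch shows only the overall data flow, not the paper's transform.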