Training Kaldi models with custom features
The Kaldi Speech Recognition Toolkit is a freely available toolkit that offers many tools for research on automatic speech recognition (ASR). It lets us train an ASR system from scratch. That covers everything from feature extraction (MFCC, FBANK, i-vector, fMLLR, …) and GMM and DNN acoustic model training to decoding with advanced language models, and the results are state of the art.
While Kaldi offers a great deal of flexibility at every stage, sometimes we need to experiment with features that the Kaldi repository does not provide. Kaldi stores features in its archive (ark) format, so custom features must first be converted to ark before we can use them in experiments. The goal of this post is to explain how to extract custom features and store them in ark format using MATLAB and Python.
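To make the target format concrete, here is a minimal Python sketch. It is not the post's own MATLAB or Python script. It writes a feature matrix directly in the layout Kaldi uses for a binary float32 `Matrix` entry in an archive. Each entry consists of the utterance key and a space, the `\0B` binary header, the `FM ` token, the row and column counts (each a size byte `\x04` followed by an int32), and finally the raw row-major float32 data. Several assumptions apply. Features are held as a NumPy array of shape (frames, dims). The host is little-endian, because Kaldi writes in host byte order. The function names `write_kaldi_matrix`, `read_kaldi_matrix` and `write_archive` are hypothetical helpers, not part of Kaldi. The returned offset is what a `feats.scp` line of the form `<key> <ark-path>:<offset>` points at.

```python
import struct

import numpy as np


def write_kaldi_matrix(fout, key, mat):
    """Append one float32 matrix to a binary Kaldi archive (ark).

    Hypothetical helper. Assumes a little-endian host (Kaldi uses host byte order).
    Returns the byte offset that a .scp entry for this key should point at.
    """
    mat = np.ascontiguousarray(mat, dtype="<f4")  # float32, little-endian, row-major
    if mat.ndim != 2:
        raise ValueError("features must be a 2-D (frames x dims) matrix")
    rows, cols = mat.shape
    fout.write(key.encode("utf-8") + b" ")  # utterance key followed by one space
    offset = fout.tell()                    # scp offsets point at the binary header
    fout.write(b"\0B")                      # binary-mode header
    fout.write(b"FM ")                      # token marking a float32 matrix
    fout.write(b"\x04" + struct.pack("<i", rows))  # size byte + int32 row count
    fout.write(b"\x04" + struct.pack("<i", cols))  # size byte + int32 column count
    fout.write(mat.tobytes())               # raw float32 data, row by row
    return offset


def read_kaldi_matrix(fin):
    """Read back one (key, matrix) entry written by write_kaldi_matrix."""
    key = b""
    while True:
        ch = fin.read(1)
        if ch in (b" ", b""):  # the key ends at the first space
            break
        key += ch
    if fin.read(2) != b"\0B" or fin.read(3) != b"FM ":
        raise ValueError("expected a binary float32 matrix")
    fin.read(1)  # skip the int32 size byte
    rows = struct.unpack("<i", fin.read(4))[0]
    fin.read(1)  # skip the int32 size byte
    cols = struct.unpack("<i", fin.read(4))[0]
    data = np.frombuffer(fin.read(4 * rows * cols), dtype="<f4")
    return key.decode("utf-8"), data.reshape(rows, cols)


def write_archive(ark_path, scp_path, feats):
    """Write a dict {utt_id: matrix} to ark_path plus a matching scp index."""
    with open(ark_path, "wb") as ark, open(scp_path, "w") as scp:
        for key in sorted(feats):  # Kaldi recipes expect keys in sorted order
            offset = write_kaldi_matrix(ark, key, feats[key])
            scp.write(f"{key} {ark_path}:{offset}\n")
```

If Kaldi is installed and on the `PATH`, you can sanity-check the output with `copy-feats scp:feats.scp ark,t:-`, which prints the matrices as text. Writing the bytes by hand avoids any dependency beyond NumPy. The trade-off is that this sketch handles only float32 matrices, not Kaldi's compressed or double-precision variants.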