Building a better Jarvis

Open source speech recognition toolkit Kaldi now offers TensorFlow integration

Jane Elizabeth
© Shutterstock / studiostoks

The future is looking brighter for robot butlers and virtual personal assistants. Automatic speech recognition just got a little better: the popular open source speech recognition toolkit Kaldi now offers integration with TensorFlow.

Robot butlers and virtual personal assistants are a mainstay of science fiction. Who among us hasn’t wanted a Jarvis or a Rosie to help make their lives easier? Our fascination with this has helped fuel the boom in the current crop of virtual assistants, from Alexa to Siri and Cortana.

But, as anyone who has struggled to make themselves understood knows, the technology is far from perfect. In particular, voice recognition software has difficulty with speakers who have accents or who do not speak English as a first language. That's before we even get to picking out purposeful speech against a busy background, or the ever-elusive problem of conversational syntax and diction. (They're not super great with sarcasm, either.)


Many speech recognition teams rely on Kaldi, the open source speech recognition toolkit. Now Kaldi offers TensorFlow integration, helping researchers and developers explore and deploy deep learning models in their Kaldi speech recognition pipelines. Thanks to this collaboration, the Kaldi community will be able to build better and more powerful voice recognition systems. It also gives TensorFlow users a way to explore voice recognition with the help of the large Kaldi community. Win/win.

The trials and tribulations of automatic speech recognition (ASR)

Voice recognition isn't easy. The traditional view of an ASR system is of a processing pipeline, where a series of modules operate on output from previous ones. Raw audio data enters at one end of the pipeline and a transcription of recognized speech emerges from the other. In Kaldi's case, these transcriptions are then post-processed to support end-user applications.
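The pipeline view can be sketched in a few lines of Python. This is a toy illustration of the idea, not Kaldi's actual architecture: the stage functions (`frame`, `energy_features`, `toy_acoustic_model`, `toy_decoder`) are hypothetical stand-ins for real components such as feature extraction, an acoustic model and a decoder, and the "recognized" output is just speech/silence labels rather than words.

```python
from typing import Callable, List, Sequence

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[object], object]


def run_pipeline(stages: Sequence[Stage], audio: List[float]) -> object:
    """Feed raw audio through each stage in order, passing outputs along."""
    data: object = audio
    for stage in stages:
        data = stage(data)
    return data


def frame(samples, size=4):
    # Hypothetical framing stage: split raw samples into fixed-size chunks.
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def energy_features(frames):
    # Toy feature extractor: mean squared amplitude of each frame.
    return [sum(x * x for x in f) / len(f) for f in frames]


def toy_acoustic_model(features, threshold=0.1):
    # Stand-in for an acoustic model: label each frame as speech or silence.
    return ["speech" if e > threshold else "sil" for e in features]


def toy_decoder(labels):
    # Stand-in for a decoder: collapse runs of repeated labels into segments.
    out = []
    for label in labels:
        if not out or out[-1] != label:
            out.append(label)
    return " ".join(out)


if __name__ == "__main__":
    audio = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
    stages = [frame, energy_features, toy_acoustic_model, toy_decoder]
    print(run_pipeline(stages, audio))  # prints "sil speech sil"
```

Because each stage only sees the previous stage's output, one module (say, the toy acoustic model) can be swapped for a neural network without touching the rest of the pipeline.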


Advances in machine learning and neural networks have revolutionized the field of ASR. Deep neural networks have replaced many of the existing ASR modules, leading to improved word recognition accuracy. Of course, these models need vast amounts of training data to improve, and TensorFlow makes that training easier to handle.

However, there are still several challenges to developing ASR systems:

  • Algorithms – Deep learning algorithms need to be tailored to the specific task at hand for the best results. However, it is difficult to change or adapt these algorithms once they are deployed.
  • Data – ASR systems need data like a fish needs water. Unfortunately, the appropriate data is sometimes insufficient or simply not available.
  • Scale – Lots of data needs lots of computing power.

Integrating TensorFlow with Kaldi helps solve some of these problems. Deploying TensorFlow models into the Kaldi production modules is straightforward, making it easier for anyone working with Kaldi. Development time in Kaldi is reduced as well, thanks to TensorFlow’s tools and models.


Additionally, this benefits TensorFlow developers, who previously lacked easy access to an ASR platform with a robust community. This integration lets TensorFlow users incorporate existing Kaldi speech processing pipelines directly and cleanly into their ML applications.

The goal of this collaboration between Kaldi and TensorFlow is to bring two open source communities closer together and support future development and discoveries in the field of ASR. If you’re interested in using Kaldi with TensorFlow, check out the repo here! If you have a Kaldi setup and want to run TensorFlow, head here!

Jane Elizabeth
Jane Elizabeth is an assistant editor for
