Synchronizing text and audio: is there an NLP/speech-to-text library to do this?

I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural language processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language.

Desired, but not required:

  • Open Source
  • Compatible with American English out-of-the-box
  • Cross-platform
  • Thoroughly documented

Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance.

What I've found so far:

Answers


Forced Alignment

It sounds like you want to do forced alignment between your audio and the known text.

Pretty much every research- or industry-grade speech recognition system can do this. Forced alignment is an important part of training a recognizer on data that has no phone-level alignments between the audio and the transcript, so the capability is usually built in.
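To make the idea concrete, here is a toy sketch of the dynamic programming step behind forced alignment. It assumes you already have, for every audio frame, a log-likelihood for each token (phone or word) in the transcript; in a real system those scores come from the recognizer's acoustic model. The function name `forced_align`, the frame format, and the numbers below are all made up for illustration and are not part of any library.

```python
import math

def forced_align(frames, tokens):
    """Toy Viterbi forced alignment (illustrative only).

    frames: list of dicts, frames[t][token] = log-likelihood of token at frame t
            (hypothetical format; a real acoustic model would supply these).
    tokens: the known transcript, as a sequence of tokens in order.
    Returns a list of (token, start_frame, end_frame), end exclusive.
    """
    T, N = len(frames), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per token")
    NEG = -math.inf
    # best[t][j]: best score of a path that sits in token j at frame t
    best = [[NEG] * N for _ in range(T)]
    # advanced[t][j]: True if that best path entered token j exactly at frame t
    advanced = [[False] * N for _ in range(T)]
    best[0][0] = frames[0][tokens[0]]
    for t in range(1, T):
        for j in range(N):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else NEG
            if move > stay:
                prev, advanced[t][j] = move, True
            else:
                prev = stay
            if prev > NEG:
                best[t][j] = prev + frames[t][tokens[j]]
    # Backtrace from the last frame of the last token to recover boundaries.
    segments, j, end = [], N - 1, T
    for t in range(T - 1, 0, -1):
        if advanced[t][j]:
            segments.append((tokens[j], t, end))
            end, j = t, j - 1
    segments.append((tokens[j], 0, end))
    segments.reverse()
    return segments

# Made-up scores for five frames and a two-token transcript.
example_frames = [
    {"h": -0.1, "i": -3.0},
    {"h": -0.2, "i": -2.5},
    {"h": -2.0, "i": -0.3},
    {"h": -3.0, "i": -0.1},
    {"h": -3.0, "i": -0.2},
]
print(forced_align(example_frames, ["h", "i"]))
# [('h', 0, 2), ('i', 2, 5)]
```

The key constraint is that the token order is fixed by the transcript, so the search only has to decide where each boundary falls. Multiply the frame indices by your frame step to get times.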

Alignment with CMUSphinx

The Sphinx4-1.0 beta 5 release of CMU's open source speech recognition system now includes a demo showing how to align a transcript with long speech recordings.
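Separately from that demo, if all you have is a recognizer that returns words with timestamps, a cruder text-level approach is to match its output against the known text and copy the timings across. This is not forced alignment proper, just sequence matching with Python's standard `difflib`. The `(word, start, end)` input format and the function name `transfer_timestamps` are assumptions for the sake of the sketch.

```python
from difflib import SequenceMatcher

def transfer_timestamps(transcript_words, recognized):
    """Copy timings from recognizer output onto the known transcript.

    transcript_words: list of words from the known text.
    recognized: list of (word, start_sec, end_sec) tuples
                (hypothetical format; adapt to your recognizer's output).
    Returns a list of (transcript_word, start_sec, end_sec); words the
    recognizer missed or misheard get (word, None, None).
    """
    ref = [w.lower() for w in transcript_words]
    hyp = [w.lower() for w, _, _ in recognized]
    timed = [(w, None, None) for w in transcript_words]
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            timed[block.a + k] = (transcript_words[block.a + k], start, end)
    return timed

# Made-up recognizer output with one misheard word ("crown" for "brown").
example_transcript = ["The", "quick", "brown", "fox"]
example_recognized = [
    ("the", 0.0, 0.2),
    ("quick", 0.2, 0.6),
    ("crown", 0.6, 1.0),
    ("fox", 1.0, 1.4),
]
print(transfer_timestamps(example_transcript, example_recognized))
```

Words left without timings can be interpolated from their neighbours or handed to a proper aligner for just that stretch of audio.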

