Microsoft releases Cortana-like speech-to-text technology to select developers

Microsoft today announced a private preview of the Custom Recognition Intelligence Service (CRIS), a highly customizable tool that can give applications Siri-like speech-to-text functionality. Microsoft is also opening public previews for two sets of application programming interfaces (APIs): one that can determine who is speaking in audio recordings, and one that can analyze what appears in videos.

All of this technology falls under Project Oxford, an initiative to give third-party developers access to the artificial intelligence that Microsoft has built up through the years. Google is also moving down this path, for instance with the release of the Cloud Vision API.

Last month, Microsoft announced an emotion detection tool in Project Oxford and said that a public beta for speaker recognition would be available by the end of the year. That beta is now available, according to a blog post today from Ryan Galgon, a senior program manager in Microsoft Technology and Research. The speech APIs can both verify and identify speakers, while the video APIs can track faces, detect motion in videos with stationary backgrounds, and stabilize video content.

But the more interesting tool here is CRIS. Here’s the high-level description Microsoft provided last month:

This tool … makes it easier for people to customize speech recognition for challenging environments, such as a noisy public space. For example, a company could use it to help a team better use speech recognition tools while working on a loud shop floor or busy shopping center. It also could be used to help an app better understand people who have traditionally had trouble with voice recognition, such as non-native speakers or those with disabilities.

When developers sign up for the service, Microsoft asks whether they are familiar with speech-to-text toolkits such as HTK, Kaldi, and SRILM, or are merely users of personal digital assistants from Google, Apple, or, of course, Microsoft itself.

Indeed, as Galgon wrote, “The past few years witnessed tremendous improvement in the performance of speaker recognition systems.” Now developers will be able to build on the technology Microsoft has developed in this area.

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.