For Nuance, the cloud was its ticket to bringing its voice recognition technology to smartphones and tablets.
It used to be that you had to buy the company’s popular Dragon Naturally Speaking software, install it (and pray your computer was fast enough to keep up), and spend even more time training it before you could get anything close to decent voice dictation. That process wouldn’t fly on mobile devices, where we’re used to jumping into new apps in seconds. The obvious solution? Move the processing work to the cloud.
Nuance has actually been offering cloud-powered interactive voice recognition (IVR) services — which enable things like voice-activated customer help phone lines — for almost a decade, long before we started using the “cloud” term for practically everything powered by servers. A significant portion of the company’s healthcare transcription service is also powered by the cloud.
Editor’s note: Our upcoming CloudBeat conference, Sept. 9-Sept. 10 in San Francisco, will be tackling revolutionary cases of enterprise cloud usage. Register today!
So when the iPhone launched in 2007 and sparked the new smartphone craze, Nuance knew exactly what it had to do. The company formed some early deals to embed its voice recognition tech into devices (powering things like voice dialing), but it also began replicating its experience with cloud-based IVR to bring large-scale voice recognition to mobile devices.
“As smartphones began to come out and you had the whole app metaphor launch across these devices, it became apparent very quickly that there’s a huge opportunity there,” Richard Mack, the company’s vice president of corporate communications, told me in a recent chat.
Nuance ended up powering the voice tech behind Apple’s Siri and Android solutions like Samsung’s S Voice (Nuance also contributed some open source code to the Android project early on, but products like Google Now are powered by Google’s own voice tech). Nuance went on to release its own mobile apps like Dragon Dictation, and it also launched Nina, a platform that lets any company implement Siri-like functionality directly within their apps.
According to Mack, Nuance also noticed that voice processing speed, accuracy, and overall performance were higher on its cloud-based platform than on the desktop. Mostly, that’s because Nuance can offer a consistent experience from its own servers — when the voice processing happens on your own computer, any number of things can slow it down. Some of Nuance’s partners (like Apple) handle voice processing on their own servers, while others use a combination of their servers and Nuance’s, Mack tells me.
As someone who used to spend hours installing Dragon, training users how to use it, and helping those users manage their dictation profiles (which were incredibly valuable, since they contained all of the user’s voice training), I find it astonishing just how simple things are in the voice arena today. I’ve dictated large portions of some of my reviews simply by reclining on the couch and speaking into Nuance’s dictation apps. I don’t have to worry about managing my voice profile manually — it’s all stored on Nuance’s servers — and training was minimal.
Nuance is also gearing up to go further into the cloud. The company is working on a new project, dubbed Wintermute, which will serve as a cloud-based personal assistant, and potentially even unify your voice profile across multiple devices. Eventually, Dragon Naturally Speaking on the desktop will also incorporate more cloud functionality, Mack tells me.
http://www.youtube.com/watch?v=A5pzzJaWfbo
Even though Nuance has been in the voice recognition business for some time now, Google is quickly ramping up its own efforts. Voice commands sit at the heart of Google Now, Google Glass, and the Moto X, and it has also hired renowned futurist Ray Kurzweil to work on language processing and artificial intelligence. (Kurzweil, it’s worth noting, founded the digital imaging company that would become ScanSoft, which ended up merging with Nuance in 2005.)
Mack doesn’t see Google as much of a threat, however.
“You need to look across the vertical markets, Google is by far the biggest competition we have in the mobile space, but I don’t see them being significant,” he said. “I don’t see them being interested in health care, or open-ended dictation on tablets and desktops … At the end of the day Google is still largely a search company, and they’re looking to build technology to power their search.”
He added, “Where I think we differ, is that people like Samsung, Apple, and others view us as someone with different motives and a different endgame, trying to build the best speech recognition.”