
Samsung and IBM show how Watson has improved conversational speech recognition

Watson

Image Credit: Atomic Taco/Flickr

IBM has made a big leap forward in the ability of its Watson artificial intelligence computer to recognize conversational speech. Last year, Watson recognized English conversational speech with an 8 percent word error rate. Now IBM’s Watson team has knocked that rate down to 6.9 percent.
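For context, word error rate is the standard accuracy metric for speech recognition: the number of word substitutions, insertions, and deletions in the recognizer’s transcript, divided by the number of words in a human reference transcript. The sketch below illustrates the calculation; the function and the example sentences are illustrative only, not IBM’s test material.

```python
# Minimal word error rate (WER) sketch: WER is the word-level edit distance
# (substitutions + insertions + deletions) between the recognizer output and
# a reference transcript, divided by the number of reference words.
# The transcripts below are made-up examples, not IBM's evaluation data.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # ~0.167
```

By this measure, a 6.9 percent error rate means roughly one word in fourteen is transcribed incorrectly.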

Above: IBM Watson logo

Image Credit: IBM

The achievement shows that AI is getting smarter and smarter — and that we’re all going to be replaced by robots some day. IBM Watson general manager David Kenny announced the breakthrough in Watson’s conversational capabilities for developers at the Samsung Developer Conference in San Francisco today.


The Watson team included Kenny, Tom Sercu, Steven Rennie, and Jeff Kuo. Watson had its finest moment in 2011 when it beat the reigning human champion on the “Jeopardy” television quiz show.

To put this result in perspective, back in 1995, a “high-performance” IBM recognizer achieved a 43 percent error rate. Spurred by a series of Defense Advanced Research Projects Agency evaluations over the past couple of decades, IBM’s system improved steadily. Most recently, the advent of deep neural networks was critical in helping achieve the 8 percent and 6.9 percent results, said George Saon, lead scientist in the IBM Watson Group, in a blog post. The ultimate goal is to reach or exceed human accuracy, which is estimated to be around 4 percent on this benchmark, known as the Switchboard task.


IBM said it made improvements in both acoustic and language modeling.

“On the acoustic side, we use a fusion of two powerful, deep neural networks that predict context-dependent phones from the input audio,” Saon said. “The models were trained on 2000 hours of publicly available transcribed audio from the Switchboard, Fisher, and CallHome corpora.”
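Saon’s post doesn’t spell out how the two networks are combined, but one common approach in acoustic modeling (an assumption here, not a detail IBM confirmed) is to fuse the models’ per-frame phone posteriors, for example by averaging their log probabilities before decoding. A rough sketch:

```python
import numpy as np

# Hypothetical sketch of score fusion between two acoustic models.
# Each model outputs, per audio frame, a probability distribution over
# context-dependent phone classes; here we mix their log posteriors and
# renormalize. This is a generic technique, not IBM's published recipe.

def fuse_posteriors(post_a: np.ndarray, post_b: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """post_a, post_b: (num_frames, num_context_dependent_phones) posteriors."""
    log_mix = weight * np.log(post_a + 1e-10) + (1 - weight) * np.log(post_b + 1e-10)
    # Shift for numerical stability, then renormalize each frame.
    log_mix -= log_mix.max(axis=1, keepdims=True)
    fused = np.exp(log_mix)
    return fused / fused.sum(axis=1, keepdims=True)

# Toy posteriors over 4 phone classes for 2 frames (made-up numbers).
a = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
b = np.array([[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]])
print(fuse_posteriors(a, b))
```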

Saon added, “We are currently working on integrating these technologies into IBM Watson’s state-of-the-art speech-to-text service. By exposing our acoustic and language models to increasing amounts of real-world data, we expect to bridge the gap in performance between the ‘lab setting’ and the deployed service.”
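For developers who want to try the deployed service, Watson’s speech-to-text offering is exposed as an HTTP API. The snippet below is a minimal sketch only; the endpoint URL, authentication scheme, and parameters are assumptions about how the service is typically called, and may differ from IBM’s current documentation.

```python
import requests

# Hypothetical sketch of calling the Watson Speech to Text service over HTTP.
# The URL, credentials, and audio file below are placeholders, not values
# from the article; consult IBM's documentation for the exact details.

SERVICE_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"  # assumed
USERNAME = "your-service-username"   # placeholder
PASSWORD = "your-service-password"   # placeholder

with open("sample.wav", "rb") as audio:
    response = requests.post(
        SERVICE_URL,
        auth=(USERNAME, PASSWORD),
        headers={"Content-Type": "audio/wav"},
        data=audio,
    )

print(response.json())  # JSON transcript with alternatives and confidence scores
```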

 
