Making the case for both HTML5 development and Chrome 11’s nifty new speech input functionality, developer Robert Oschler has managed to put together a demo of voice-driven YouTube controls in just three days.

While far from a final release, the demo shows how little effort it now takes to add something as complicated as voice commands to a web page. It’s also reminiscent of the YouTube Instant demo created by Stanford student Feross Aboukhadijeh, which ended up landing him a job offer from Google (though admittedly, YouTube Instant required much more hacking).

Oschler wrote to VentureBeat in an email:

I was able to knock together a demo in just 3 days that provided a simple yet full-featured YouTube video search demo driven by speech input, using HTML5 and Chrome 11’s (beta) HTML5 speech input integration via Google’s speech servers. The actual speech integration was achieved with a single HTML INPUT element that carried the webkit tag for speech input. The rest of the time was spent learning the YouTube REST API and adding some JavaScript to facilitate the demo.

The voice commands allow you to search, play and pause videos.

Specifically, Oschler tells me that he relied heavily on the YouTube Data API, which he had never worked with before. The actual speech input feature in Chrome required only one line of HTML code to implement: <input name="speechInput1" id="speechInput1" size=64 type="text" x-webkit-speech />

He went on to describe what this one line of HTML accomplishes:

The “x-webkit-speech” attribute tells Chrome that the INPUT box is to be managed as a speech input box. With that one line of HTML, with no effort required on your part, Chrome does all this for you:

– Paints a microphone icon next to the INPUT box for the user to press to activate speech recognition
– Puts up a series of small floating information bubbles that tell the user when to start talking and show the volume of the speech as the user talks, puts up a status box as the recorded audio input is being sent to Google’s speech servers, and, if an error occurs, offers the user the option to try again or cancel the speech input.
– Does all the work of recording the audio from the user’s microphone, contacting Google’s speech recognition servers, managing the sending of the recorded audio, and waiting for the results from the speech servers, which are displayed in the INPUT element once they are received from Google’s servers.
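Once Chrome has placed the recognized text in the INPUT element, the page still has to decide what that text means. The article does not show how Oschler's demo does this, so the following is only a minimal TypeScript sketch of one way a transcript could be mapped to the three commands mentioned above (search, play and pause). The `VoiceCommand` type, the `parseVoiceCommand` function and the "search for" phrasing are all assumptions made for illustration, not code from the demo.

```typescript
// Hypothetical command shape: the article only says the demo can
// search, play and pause videos.
type VoiceCommand =
  | { kind: "play" }
  | { kind: "pause" }
  | { kind: "search"; query: string };

// Map the text returned by speech recognition to a command.
// Assumption: "play" and "pause" are spoken as single words, and
// anything else is treated as a search query.
function parseVoiceCommand(transcript: string): VoiceCommand {
  // Normalize case and whitespace, since recognizers vary in both.
  const text = transcript.trim().toLowerCase().replace(/\s+/g, " ");

  if (text === "play") return { kind: "play" };
  if (text === "pause") return { kind: "pause" };

  // Drop an optional leading "search" or "search for".
  const query = text.replace(/^search( for)? /, "");
  return { kind: "search", query };
}
```

In a page like the one described, a function of this kind would be called with the INPUT element's value each time a recognition result arrives, and the resulting command would then be handed to whatever code drives the video player and the YouTube search.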

Web developers already exploring the new voice input feature are likely familiar with its capabilities. Still, it’s another thing to see an independent developer actually building something with the code.

Photo via thms.nl on Flickr
