Speech recognition is one of those technologies that took a long time to get off the ground, but in today’s world of smartphones, wearables, and smart home devices, it’s suddenly everywhere.
From Siri and Google Now on smartphones to Android Wear devices like Samsung’s Gear Live to the latest efforts from the crowdfunding scene, such as the Vocca, we are surrounded by gadgets capable of understanding the spoken word like never before.
But before all of these gadgets entered our lives, there was Dragon NaturallySpeaking software. Now in its 13th version for Windows and 4th generation for the Mac (under the label Dragon Dictate), the software is developed by Nuance, a company whose voice recognition technology also powers Siri and plenty of other services.
Despite the proliferation of speech recognition software for smartphones and tablets, there are still plenty of reasons to consider a full-blown version of this software for your computer. Let’s take a look at why.
The Good: Accuracy. I mean like 99% accuracy.
If you’re like me, the most important feature of a speech recognition package is its ability to, well, recognize speech. Dragon Dictate 4 excels at this. I haven’t run previous versions of Dragon Dictate for Mac, but Nuance claims that version 4 is “up to 99% accurate” which they say is an improvement (though they don’t indicate how much of an improvement) over version 3 of the software.
Once you install the software, it invites you to set up a profile. It asks you to create a profile for each person who will be speaking and, ideally, to indicate whether that person will be actively dictating or being transcribed from a recording (more on that part later). After setting up your profile, you begin a 5-10 minute training exercise in which you familiarize Dragon Dictate with your voice by reading a set of prepared passages. When that’s done, you’re good to go.
For dictation, you can choose from various input options, including your computer’s built-in mic, or the recommended (and included) Nuance-approved noise-canceling USB headset microphone. Or, if you really want to walk and talk, you can download the free Dragon Dictate Remote app for iOS or Android and babble away to your heart’s content anywhere your Wi-Fi will take you. Amazingly, after calibrating each of these options, I found them all to be highly accurate in a quiet environment, with the provided USB headset offering the best results overall.
How good is it? I dictated this entire review using Dragon Dictate and only had to make one or two minor corrections per paragraph for misunderstood words. The errors usually involved short words like “a,” which the software kept hearing as “and.” I did a fair amount of editing for things like capitalization and symbols such as “%,” but I suspect that had more to do with my unfamiliarity with the proper ways to insert these than with any deficiency in the program. If you want to see an amusing video of Dragon Dictate in action, check out David Pogue’s review.
This most recent version of Dragon is very impressive. But frankly, if the only reason you’re looking at speech recognition software is the ability to do some basic dictation, you’re probably better off with your system’s built-in software. Mac OS X Mountain Lion and Mavericks, for example, pack some very capable speech recognition — especially if you use the “Enhanced Dictation” option, which gives you the ability to do offline and continuous dictation. There are, however, three very strong reasons to look at spending the $199 for Dragon Dictate.
1. It gives you something known as Full Text Control. Dictating text is all very well and good, but if you’re serious about going hands-free, you need a way to adjust text that you’ve just dictated. See that word “adjust” in the previous sentence? I was able to go back and italicize it after I finished dictating the sentence by simply saying “italicize adjust.” While not available in all apps (the latest version of Pages being a notable exception), Full Text Control gives you an almost magical ability to change the appearance of any text you’ve dictated.
2. It goes beyond dictation. If you’re physically unable to do much in the way of typing or mousing, Dragon Dictate gives you access to a wide variety of system-level commands. Opening applications, saving files from within those applications, navigating system dialog windows — this can all be done through the mastery of Dragon Dictate’s voice commands. Better still, you aren’t limited to a specified set of commands as is the case with OS X’s voice recognition.
For instance, let’s say you wanted to be able to open Gmail. Normally, this would involve opening Safari (or another browser) and then navigating to Gmail.com. But by creating a custom command, you can immediately open this URL simply by saying “Open Gmail.”
Speaking of Gmail, if this is your email service of choice, you will love that Nuance has created a free plugin for Safari and Firefox that enables Full Text Control in these browsers. This means you can navigate Gmail entirely with your voice using commands such as “Click Compose,” “Next Field,” and “Click Send.”
3. True transcription ability. This feature is the main reason I was interested in trying out Dragon Dictate, and it’s one of the only areas where the Mac version of the software exceeds the PC version. And though transcription was already a feature of version 3, Nuance has expanded its feature set as well as its file compatibility. As a journalist, I spend a lot of time interviewing sources for stories both on the phone and in person. Typically, I record the interview using my iPhone’s built-in Voice Memos app and then begin the painstaking task of working through the 30 minutes of audio to find the best quotes. It’s incredibly time-consuming. So I was eager to try Dragon Dictate’s transcription service.
You start by either creating a new Profile (if the person you’re transcribing is not you) or by adding a transcription source to your existing profile. You then need to provide a sample audio file (supported formats are: mp3, aif, aiff, wav, mp4, m4a and m4v) of at least 90 seconds in duration.
Dragon Dictate will show you its best guess at what the speaker has said during the sample period. You then need to correct any mistakes it has made along the way so that it can “learn” what those problem words are. Once done, Dragon Dictate can then whisk through an entire audio file, converting speech to text with reasonable accuracy.
There is, however, one big drawback: Unlike normal dictation, where the speaker utters punctuation words such as “comma” or “period,” there are no such indicators in a recorded interview. As you might expect, this means that your finished transcription ends up being one very long run-on sentence. Pages and pages of it. While this was good enough for my purposes (I just need to quickly identify interesting sound bites), it might seem frustratingly basic for someone who wants the equivalent of dictated text. (See the Competition section below for a full example.)
The Bad: Price (and transcription if pushed to its limits)
Nuance has priced Dragon Dictate v4 for the Mac at a starting point of $199. You can buy other versions for $299 each, but these simply offer upgraded hardware (like a Bluetooth headset for wireless dictation), not better or different software features.
As I pointed out earlier, you can achieve decent, if not perfect, basic dictation with your existing OS (if you’re running Mavericks), so $199 is buying you the other three capabilities: system control, Full Text Control, and audio file transcription. That means you’ll only get maximum value out of your investment if your intent is to go virtually hands-free on your Mac.
For me, the price would have been easily justified for the transcription portion alone had it not been for one glitch: You need to have really clean and clear audio for the transcription service to work even moderately well. My first attempt to use it was with an interview I had recorded by setting my iPhone next to my landline speaker phone. Unfortunately, the person I was interviewing was on his cellphone, which dropped the call quality substantially — but since this is the new normal for me when doing interviews, I had hoped that Dragon Dictate would be up to the task.
Dragon Dictate couldn’t understand it at all. I had to correct every single word in the sample audio portion, and even then it still couldn’t make out what my subject was saying. Nuance suggests that the transcription function is ideal for capturing lectures (if you’re a student) or speeches. That might well be the case, but in order for Dragon Dictate to deliver on that promise, you’ll need to set up your recording device as close to your source as possible.
In fairness, Nuance does say that the transcription feature is only meant for single-speaker recordings, so using it as an interview transcription tool might not be a fair test of its abilities. As you’ll see below, the transcription function can be very accurate when used as intended.
The Competition
Nuance is so dominant in the speech recognition space that its technology is baked into Apple’s Siri and similar products from other companies. There have even been rumors that Apple might buy Nuance to prevent that technology from falling into rival Samsung’s hands.
The company is dominant to the point that, as I mentioned above, there isn’t a competitive product that does everything Dragon Dictate can do. In fact, as far as the Mac goes, there aren’t any true speech recognition competitors at all (it seems Nuance has a habit of buying them and then closing and/or integrating them).
But there are a few inexpensive or free products that will give you dictation capabilities. These can also be used as a real-time transcription service for a recorded audio file if you place the playback device near your computer’s mic.
Google “transcription software for Mac,” “free transcription software,” and “Mac voice recognition software,” and you’ll come up with these options:
Wreally Transcribe: a plug-in for Chrome that is free to try for seven days but afterwards costs $20 if you want the full version. The interface is very simple, deceptively so. It looks like someone mashed together an audio player with a text editor beneath it. I almost missed the button that lets you use it as a speech-recognition transcriber, which leverages the speech-recognition engine built into Chrome (note: it must have an Internet connection).
Dictation.io Chrome App: a bare bones but free speech-recognition app that also uses the speech-recognition engine built into Chrome and also needs an Internet connection.
Of course you don’t really need to look for additional software. Mac OS X Mavericks comes with “Enhanced Dictation”: it’s free, obviously, and doesn’t need an Internet connection. (Dictation in OS X Mountain Lion does require an Internet connection, and it isn’t as robust as Mavericks’ version.)
To test all three of these against Dragon Dictate, I recorded nine minutes of sample audio from none other than Lieutenant Hikaru Sulu, otherwise known as George Takei, using the built-in Voice Memos app on my iPhone. You can listen to what Takei actually said here. Using the first 90 seconds as a test case, here’s what each option produced:
Wreally Transcribe Chrome App:
isaura to the galaxy driving used starship with a qu made of people from all over the world many different races different cultures many different heritages ambition was the Explorer Street 200 feet. You like to do bows legal window well in the grandson of immigrants from Japan America boldly going to be new opportunities my mother was born in Sacramento California my father was the San Franciscan and married in Los Angeles and I was born i4 years old when Pearl Harbor was bombed on December 7 1941 by Japan overnight the world was plugged into a world war America Sudley Japanese Americans American citizens of Japanese ancestry dawn with suspicion and fear and I was like a tree
Dictation.io Chrome App
do so cm driving used starship with a crew made up of people from all over the world many different races make different cultures many different heritages all working together ambition what’s the Explorer Street Zeros new life new civilizations to boldly go before im the grandson of immigrants from Japan lands of America bo Diddley going to the world you are my mother was born in Sacramento California San Francisco married in Los Angeles and I was born I was 4 years old when Pearl Harbor was bombed on December 7 1941 by Japan overnight the world into a world war Americas outlet Japanese Americans American citizens of Japanese ancestry dawn with suspicion and fear and I was like a tree
Mac OS X Mavericks Enhanced Dictation
Of the Starship Enterprise I sort through the galaxy driving a huge starship with a group made up of people from all over this world many different races many different cultures many different heritages all working together at automation was to explore strange new worlds to seek out
The new civilizations to boldly go where no one has gone before well I am the grandson of immigrants from Javad gate length of America boldly going to a strange New World seeking new opportunities my mother was born in Sacramento California I father was a San Francisco they met and married in Los Angeles and I was born there I was four years old when Pearl Harbor was bombed on December 7, 1941 by Japan and overnight the world was plugged into world war America suddenly was swept up by Mr. Japanese-Americans American citizens of Japanese ancestry or looked on with suspicion and see what outright hatred
Dragon Dictate for Mac v4
It’s fairly clear from this comparison that Dragon Dictate has the edge, even before any training has been done to help it better understand Takei’s speech (though really, if you can’t understand that man’s perfect diction and dulcet tones, there’s something wrong).
Training Dragon Dictate to work with the full audio file involves correcting mistakes in the 90-second sample audio, which is easy thanks to the pop-up windows that let you listen to discrete chunks of audio to see whether they match the transcribed text.
Once I corrected the minor errors in the sample passage, Dragon Dictate proceeded to transcribe the full nine minutes of audio. Here’s how it performed:
Of the Starship Enterprise I soared through the galaxy driving a huge Starship with a crew made up of people from all over this world many different races many different cultures many different heritages all working together our mission was to explore strange new worlds to seek out new life and new civilization boldly go where no one has gone before well I am the grandson of immigrants from Japan who came to America boldly going to a strange new world seeking new opportunities my mother was born in Sacramento California my father was a San Franciscan they met and married in Los Angeles and I was born there I was four years old when Pearl Harbor was bombed on December 7 1941 by Japan an overnight the world was plunged into world America suddenly was swept up by mysterious Japanese-Americans American citizens of Japanese ancestry what looked on with suspicion and fear and with outright hatred simply because we happened to look like the people that bombed Pearl Harbor and the hysteria grew and grew until on February 1942 the President of the United States Franklin Delano Roosevelt ordered all Japanese-Americans on the West Coast of America to be summarily rounded up with no charges with no trial with no due process due process is the core pillar of our justice system that all disappeared we were to be rounded up and imprisoned in 10 Barb wire prison camps in some of the most desolate places in America the blistering hot desert of Arizona this sultry swamps of Arkansas the wastelands of Wyoming Idaho Utah Colorado and to have the most desolate places in California on April 20 I celebrated my fifth birthday and just a few weeks after my birthday my parents got my younger brother my baby sister me up very early one morning and they dressed us have my brother and I were in the living room looking out the front window and resolve to soldiers marching up hard drive they carried bayonets on their right they stopped at the front porch and then on the door my father answered 
it and the soldiers order this out of our home I father gave my brother needs small luggage is Terry and we walked out and stood on the driveway waiting for our mother to come out and when my mother finally came out she had our baby sister and one arm a huge duffel bag and the other and tears were streaming down both her cheeks I will never be able to forget that’s the it is burned into my member we were taken from our home and loaded onto train cars with other Japanese-American families there are guards stationed at both ends of each car as if we were criminals we were taken two thirds of the way across the country rocking on that train for four days and three nights to the swamps of Arkansas I still remember the barb wire fence confined I remember the tall century tower with the machine guns pointed at us I remember the Searchlight it followed me when I made the night runs for my barracks to the latrine but the five-year-old me I thought it was kind of nice that they let the weight community I was a child too young to understand the circumstances of my being there children are amazingly adaptable what would be grotesquely abnormal became my normality in prisoner of war camps it became routine for me to live that three times a day three lousy food in the noisy meso it became normal for me to go with my father to bathe in the mass shop being in a person a barb wire. 
Became my normality when the war ended we’ve ever since and given a one-way ticket to anywhere in the United States my parents decided to go back home to Los Angeles but Los Angeles was not a welcoming place we were pennies everything had been taken from us and the hostility was intense our first home was on skid row in the lowest part of our city living with derelicts drunkards and crazy people the stench of urine all of on the street in the alley in the hall it was a horrible experience and for us kids it was terrorized I remember once a drunkard came staggering down fell down right in front of us and through my baby sister said Mama let’s go back home because behind barbed wires was for us hope my parents worked hard to get back on their feet we’d lost everything they
Obviously, it’s not perfect. There are mistakes in terms of both accuracy and omission of words. But given that the text above was generated in just a few moments, with less than 5 minutes’ worth of training from me, the results are pretty amazing.
My only criticism is that once Dragon Dictate exits training mode and moves on to the full transcription of the audio file, there is no way to select specific chunks of transcribed text to hear the matching audio portion (which is how the training section works). Given that Nuance is no doubt aware that transcriptions won’t be perfect, it would sure be nice to have a quick way to check the original recording.
Conclusion
Nuance Dragon Dictate 4 for Mac is very powerful speech recognition software that not only provides greater accuracy in converting speech to text than competing products but also offers a suite of voice-driven tools that enable a nearly hands-free computing experience on the Mac.
Still, some features can be a bit buggy, and others, such as audio file transcription, may not fulfill a need perfectly.
The $199 price point may scare some away, and if you’re looking for straight dictation, it’s too expensive. But if you’re shopping for a full suite of speech recognition tools that includes the ability to transcribe audio files at faster-than-real-time speed, you won’t find a better product at any price.