A General Survey of

SPEECH RECOGNITION PROGRAMS

(Updated September 5, 2016)

 

Itamar Even-Zohar

For a comparison of the SR applications see Table #1.

For a list of available languages see Table #2.

For other tables see Comparative Tables.

 

Table of Contents

Languages served by Speech Recognition

Manufacturers of Speech Recognition

A Historical Note

Microsoft Speech Recognition: Office 2003, Vista and Windows 7

Discussion Forums / Support Groups for MS Speech

A Comparative Evaluation

Factors involved with SR Functionality

Concluding Remarks

Information

 

Languages served by Speech Recognition

Speech Recognition (“SR”) is available for a limited number of languages. There are competing programs for the following:

For Windows

English (US, UK, Australian, South East Asian, Indian), Spanish, French, German, Dutch, Italian, Mandarin Chinese (Simplified Chinese and Traditional Chinese), Japanese, and Swedish.

 

SR applications for Brazilian Portuguese, Catalan, Austrian German, and Arabic have been discontinued.

 

(For a complete list of the languages available for Windows, with details, see Table #2.)

For Macintosh

English (US and UK), German, Italian, and Spanish.

 

For Android and iPhone

Nuance provides full speech recognition for both Android and iPhone with its various Dragon speech versions. For Android, look for “Swipe & Dragon”. It provides SR in many more languages than those available for either Windows or Mac computers (for example, Arabic, Hebrew, Scandinavian languages, and more).

Manufacturers of Speech Recognition

For Windows, Android, and iPhone

Only two companies now provide major speech recognition applications for Windows: Nuance and Microsoft. Nuance provides Dragon NaturallySpeaking (DNS for short), which works under all versions of Windows, including Windows 10. The latest Premium version is 13 (for English, French, German, Spanish, Italian, Dutch, Chinese, and Japanese). The latest Professional Individual version is 15; as of September 2016 it is available for English and German, while French and Spanish are served by “Professional Individual” versions that are probably compatible with DNS Pro 14. Nuance also provides SR for Android and iPhone under “Dragon” and “Swipe & Dragon.”

Microsoft provides Office SR in Office 2003 under Windows XP, and “Windows Speech Recognition” (WSR) as a built-in feature in Windows Vista and later versions.

Various companies offer secondary SR applications, i.e. ones that are built on an engine developed by a major company and add various features to it. Among those, Linguatec seems to be the most professional and promising. Its Voice Pro, formerly based on IBM’s ViaVoice engine, has been based on the Windows Speech Recognition engine since version 12. It currently offers an SR application only for German.

 

For Macintosh

The Mac has been served by Nuance since 2010. For more information, see Nuance’s page for Mac.

 

 

A Historical Note

Four major companies have produced speech recognition programs since the early 1990s: IBM, Dragon, L&H, and Philips. Dragon’s product, “Dragon NaturallySpeaking”, is the only one still developed and marketed. It is currently sold by Nuance, which has taken over ScanSoft, which had replaced the former Lernout & Hauspie (L&H), who had in their turn purchased both Dragon and Kurzweil. Nuance has also taken over support for IBM’s ViaVoice, but no upgrade is offered for new Windows versions, and the company recommends DNS instead.

When ScanSoft bought L&H, they decided to make Dragon NaturallySpeaking (DNS for short) their main SR product, transferring features from L&H’s VoiceXpress into it (since Version 6). With the discontinuation of Philips FreeSpeech 2000, several of its 14 supported languages, notably Austrian German and Catalan, no longer have any SR support. Swedish, however, is now served by a Swedish company, Voxit, which maintains and sells VoiceXpress (I am grateful to Oscar Abrahamsson for this information).

 

Microsoft Speech Recognition: Office 2003 under Windows XP

Microsoft entered the SR arena when it provided speech recognition for Office 2002, and then a much improved version in Office 2003 (October 2003), both under Windows XP. If you install Office 2003 under later versions of Windows, its built-in speech recognition won’t work.

 

Microsoft Speech Recognition: Windows Vista, 7, 8 & 10

Windows Speech Recognition (known as WSR; released January 2007) took a bolder step by making speech recognition a built-in feature of the operating system, first in Windows Vista, and later in Windows 7, 8, and 10. This is an unacknowledged revolution in the sense that SR became available on an unprecedented scale to any Windows user whose language is supported (see below). The User Interface, the repertoire of commands, and the command-and-control features (controlling the entire computer by voice) have all been much improved. A Macro Feature was released in beta in spring 2008, and had its version 1 release in January 2009. It makes it possible to create text macros, command substitutions, and sophisticated add-ons, thus making WSR highly flexible and personally customizable. A rich library of macros is freely available from the Microsoft Speech Yahoo Forum.

 

WSR supports various languages, all available through Language Packs downloadable directly from Microsoft’s server. The following languages are currently supported: English US, English UK, French, Spanish, German, Chinese (Simplified and Traditional), and Japanese.

 

Windows 7, 8, 8.1, and 10 offer no new UI features for WSR. It works the same way as under Vista, including the Macro Feature add-on, but various basic recognition functions have been greatly improved under Windows 7 according to Eric Brown of the Microsoft SR Team. See here for details. I have no knowledge of any such improvements under Windows 8, 8.1, or 10.

 

A toolkit sold by Martin Markoe, written by Brad Trott, offers basic transcription and basic macroing for WSR. It can be purchased from here. Brad Trott has also written a Guidebook for Macro Writing, available from the same site. A Macro Library for English, Spanish, and French has been created on the Yahoo MS-Speech Group.

 

For a list of desiderata for WSR see here.

 

Discussion Forums / Support Groups for MS Speech

In April 2003 I set up a Discussion Forum for Microsoft SR on Yahoo (http://groups.yahoo.com/groups/ms-speech), and a Speech Website with various materials (http://speech.even-zohar.com).

 

SR applications based on WSR

Dr. Reinhard Busch, of Linguatec Sprachtechnologien, has provided me with a brief description of Voice Pro 12. Voice Pro 12 is an SR application based on the WSR engine but with extended acoustic models, new language models for various domains (general, medical, legal), and extended functionality. For example, it allows full transcription of WAV and MP3 files, similar to DNS 10. The transcribed text is output together with the original dictated audio (text-aligned), so one can make corrections based on the original recording and add unknown words to the speaker profile. In addition, it includes the Macro Tool and approximately 1,000 built-in macro commands (called “SmartCommands”) to allow for more intuitive interaction. As VP 12 relies on the WSR engine for German, the user needs to switch to a full German Windows interface.

For more information see: http://www.linguatec.de/products/stt/voice_pro

 

A Comparative Evaluation

There are no accepted criteria for an “objective” evaluation of SR, mostly because people have different experiences and appreciate different features. A combination of high accuracy with a reasonable User Interface is appreciated by most users, though naturally people differ in their evaluation of these features and in what “high accuracy” really means. While most users consider SR not worthwhile at anything less than 98% accuracy, there is no agreement about how to calculate the figure.
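To make the calculation problem concrete, here is a minimal sketch of one possible convention: count the word-level substitutions, insertions, and deletions between what was said and what was recognized, and divide by the number of words spoken. This is an assumption chosen for illustration, not a method endorsed by any of the vendors discussed here; the function names `word_errors` and `word_accuracy` are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Smallest number of word substitutions, insertions, and deletions
    that separates the recognized text from the text actually spoken."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j]: errors between the reference words seen so far
    # and the first j recognized words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # spoken word was dropped
                current[j - 1] + 1,      # extra word was inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - (errors / words spoken). Assumed convention only."""
    spoken = len(reference.split())
    if spoken == 0:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / spoken


if __name__ == "__main__":
    said = "please send the report to the whole team today"
    heard = "please send the report to the hole team today"
    print(f"errors: {word_errors(said, heard)}")
    print(f"accuracy: {word_accuracy(said, heard):.1%}")
```

Under this convention a single misrecognized word in a nine-word sentence already brings accuracy down to about 89%, and a 98% figure means no more than two errors per hundred words. Other choices (counting punctuation and capitalization, or counting characters instead of words) would give different figures for the same dictation, which is one reason users' reported numbers are hard to compare.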

 

A prospective or a new user should take into account that speech recognition, like speech itself, is a very personal matter. A program that works successfully for one person may not work as successfully for another.

 

We constantly hear contradictory testimonies on the various SR discussion forums about failure and success. Some people are sworn supporters of one application, bitterly complaining about some other. This is to be expected in view of the enormous complexity of speech and the large range of voices and language varieties. In short, as is the case with selecting any other product, much depends on personal preferences as regards the various features provided by the various programs.

 

The following is therefore based on my personal experience and some currently accepted evaluations among veteran users.

 

  1. DNS seems to still maintain its reputation as the leading program in terms of accuracy, User Interface and various important features, as well as its advantage of being functional under different Windows versions. However, with Windows Speech Recognition constantly improving, DNS must provide more advanced capabilities to justify its leading position. The latest versions, 13 and 15, offer splendid improvements, especially in accuracy and speed. Unfortunately, Nuance has changed very little in design, architecture, multilingual use, and the repertoire of commands. Macroing is still available under the Professional versions only.

 

  2. Microsoft Speech 2003 is very accurate, but it suffers from User Interface deficiencies and lacks basic features. It offers SR only for US English, Chinese, and Japanese. Moreover, Microsoft has stopped developing it as part of its overall policy to stop selling and supporting Windows XP. It is no longer an option for current users.

 

  3. Windows Speech Recognition (WSR) has an improved UI, a superb set of commands and extended command-and-control, a powerful macro tool, as well as seven supported languages. Its greatest advantage is of course the fact that it is a built-in feature of the OS. Its disadvantages are a UI that is still relatively rudimentary and the lack of basic features, like transcription and the preservation of the speech data (the audio element of SR). The latter, by the way, was offered by Microsoft Office 2003 speech recognition.

 

For a summary comparison of the various features supported by the SR applications see Table #1.

 

Factors involved with SR Functionality

The functionality of a Speech Recognition application depends on various factors, such as the quality of the computer, the quality of the microphone, and the quality of one’s voice and speech.

 

Computer

Even the best application will not perform very well on a weak machine, in spite of the so-called “minimal requirements” advertised by the manufacturer. A powerful machine, on the other hand, can improve speech recognition to an unimaginable level. A powerful machine combines at least the following: a powerful CPU, a large amount of random access memory (RAM), a good sound card, and a large disk. For XP, a minimum of 1GB RAM would be necessary; for the later versions of Windows, a minimum of 4GB.

 

Microphone

A good microphone can make a lot of difference. There is a fair variety of brands, and good advice from experienced people can make your SR a success story. For information, consultation and products I recommend Speech Recognition Solutions, originally founded by Martin Markoe, one of the greatest experts on SR.

 

Voice and manner of dictation

As Martin Markoe puts it, “the other factor besides a microphone is actually the quality of a person's voice and if they're actually going to use a dictation style of talking as opposed to a conversational style.”

 

Concluding Remarks

On the whole, as Chuck Runquist once wrote, “The whole issue of accuracy is a combination of hardware, software, and user skill.” In its current state, no one should expect SR to render excellent results without some learning and adjustment. Although the quality of SR has splendidly improved, one still must be determined to succeed. Just as the program needs to learn the way people use a language, the speakers themselves must also learn something about themselves and the way they speak. Sometimes, because of the peculiar phonetic clashes characteristic of a given language, one must learn to change one’s speech habits. For example, short pauses must be introduced between certain words, and some words must be pronounced a bit differently than in everyday ordinary speech. In short, novices must pay attention to the instances where they go wrong, and try to see how these can be handled by certain strategies, much as people who have to speak in public learn to make themselves heard and understood. People who want to succeed with speech recognition are strongly advised to consult not only the manuals for their applications, but also the various discussion forums, which provide further documentation. (See links on my speech website: http://speech.even-zohar.com).

 

Information

Information about the various programs can be found on the Websites of their respective companies:

 

Nuance (DNS)

http://www.nuance.com/naturallyspeaking

Microsoft

http://www.microsoft.com/speech

Linguatec

http://www.linguatec.de/products/stt/voice_pro

Voxit

http://shop.voxit.se/taligenknning/vx-pro-53-usb-headset (Swedish)

 

Comparative Tables

 
