A General Survey of
SPEECH RECOGNITION PROGRAMS

For a list of available languages see Table #2.
For other tables see Comparative Tables.

Languages served by Speech Recognition
Manufacturers of Speech Recognition
Microsoft Speech Recognition: Office 2003, Vista and Windows 7
Discussion Forums / Support Groups for MS Speech
Factors involved with SR Functionality

Languages served by Speech Recognition

Speech Recognition (“SR”) is available for a limited number of languages. There are competing programs for the following:

For Windows

English (US,

SR applications for Brazilian Portuguese, Catalan, Austrian German, and Arabic are no longer available.

(For a complete list of the languages available for Windows, with details, see Table #2.)

For Macintosh

English (US and UK), German, Italian, and Spanish.

Manufacturers of Speech Recognition

For Windows

Only two companies now provide major speech recognition applications for Windows: Nuance and Microsoft. Nuance provides Dragon NaturallySpeaking (DNS for short; latest version: 10.1; version 11 was announced in August 2010), which works under Windows 98, 2000, XP, Vista, and Windows 7. Microsoft provides Office SR in Office 2003 under Windows XP, and “Windows Speech Recognition” (WSR) as a built-in feature of Windows Vista and Windows 7 (all versions).

Various companies offer secondary SR applications, i.e. ones that are built on an engine developed by a major company and add various features to it. Among those, Linguatec seems to be the most professional and promising. Its Voice Pro, formerly based on IBM’s ViaVoice engine, has been based on the Windows Speech Recognition engine since version 12. It currently offers an SR application only for German, but the company is also considering other languages.

For Macintosh

The Mac is currently served by MacSpeech, which was acquired by Nuance on February 16, 2010. For more information see http://www.macspeech.com.

A Historical Note

Four major companies have produced speech recognition programs since the early 1990s: IBM, Dragon, L&H, and Philips. Dragon’s product, “Dragon NaturallySpeaking” (DNS), is the only one still developed and marketed, currently by Nuance, the successor of ScanSoft, which had bought the former Lernout & Hauspie (L&H); L&H had in its turn purchased both Dragon and Kurzweil. Nuance has also taken over support for IBM’s ViaVoice, but no upgrade is offered for new Windows versions (Vista/Windows 7), and the company recommends DNS instead. When ScanSoft bought L&H, they decided to make DNS their main SR product, transferring features from L&H’s VoiceXpress into it (since Version 6).

With the discontinuation of Philips FreeSpeech 2000, several of its 14 supported languages no longer have any SR support, notably Austrian German and Catalan. Swedish, however, is now served by a Swedish company, Voxit (http://www.voxit.se), which maintains and sells VoiceXpress (I am grateful to Oscar Abrahamsson for this information). The latest version, 5.3, was issued on December 2, 2005, but the company claims it can function under Vista.

Microsoft Speech Recognition: Office 2003, Vista and Windows 7

Microsoft entered the SR arena when it provided speech recognition for Office 2002 under Windows XP, and then much improved it in Office 2003 (October 2003).

Windows Speech Recognition (known as WSR; released January 2007) took a bolder step by making speech recognition a built-in feature of the operating system, first in Windows Vista and later in Windows 7. This is an unacknowledged revolution, in the sense that SR is now available on an unprecedented scale to any Windows user whose language is supported (see below). The user interface, the repertoire of commands, and the command-and-control features (controlling the entire computer by voice) have all been much improved. A Macro Feature was released in beta in spring 2008, and version 1 followed in January 2009. It makes it possible to create text macros, command substitutions, and sophisticated add-ons, making WSR considerably more flexible and customizable.

WSR supports various languages, all available to users of Windows Enterprise and Windows Ultimate through Language Packs downloadable directly from Microsoft’s server via the Windows Update feature. The following languages are currently supported: English (US), English (UK), French, Spanish, German, Chinese (Simplified and Traditional), and Japanese.

Windows 7 offers no new UI features for WSR. It works the same way as under Vista, including the Macro Feature add-on, but according to Eric Brown of the Microsoft SR Team, various basic recognition functions have been greatly improved. See here for details.

A toolkit written by Brad Trott and sold by Martin Markoe now offers basic transcription and basic macro support for WSR. It can be purchased from here. Brad Trott has also written a Guidebook for Macro Writing, available from the same site. A Macro Library for English, Spanish, and French has been created on the Yahoo MS-Speech Group.

For a list of desiderata for WSR see here.

Discussion Forums / Support Groups for MS Speech

In April 2003 I set up a Discussion Forum for Microsoft SR on Yahoo (http://groups.yahoo.com/groups/ms-speech), and a Speech Website with various materials (http://speech.even-zohar.com). There is also a lot of valuable information on the Speech Computing forum (http://www.speechcomputing.com/forum), which is dedicated to all SR applications.

SR applications based on WSR

Dr. Reinhard Busch, of Linguatec Sprachtechnologien, has provided me with a brief description of Voice Pro 12. Voice Pro 12 is an SR application based on the WSR engine, but with extended acoustic models, new language models for various domains (general, medical, legal), and extended functionality. For example, it allows full transcription of WAV and MP3 files, similar to DNS 10. The transcribed text is output together with the original dictated audio (text-aligned), so one can make corrections based on the original recording and add unknown words to the speaker profile. In addition, it includes the Macro Tool and approximately 1,000 built-in macro commands (called “SmartCommands”) that allow for more intuitive interaction. As VP 12 relies on the WSR engine for German, the user needs to switch to a full German Windows interface.

For more information see: http://www.linguatec.de/products/stt/voice_pro

A Comparative Evaluation

There are no accepted criteria for an “objective” evaluation of SR, mostly because people have different experiences and appreciate different features. Most users value a combination of high accuracy and a reasonable user interface, though naturally people differ in how they weigh these features and in what they take “high accuracy” to mean. While most users consider SR not worthwhile at anything less than 98% accuracy, there is no agreement about how to calculate the figure.
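
To illustrate why the figure is disputed, here is a minimal Python sketch of one possible convention: count word-level substitutions, insertions, and deletions against a reference transcript, and divide by the number of reference words. This is an assumption made for illustration, not a method endorsed by any of the programs discussed here; the function names `word_errors` and `word_accuracy` are hypothetical, and other conventions (counting punctuation, capitalization, or corrections made by voice) would yield different numbers for the same dictation.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the reference transcript into the recognized text."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j recognized words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 minus (errors / number of reference words)."""
    total = len(reference.split())
    if total == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / total


if __name__ == "__main__":
    dictated = "please send the report to the office today"
    recognized = "please send a report to the office"
    print(word_errors(dictated, recognized))    # one substitution, one deletion
    print(word_accuracy(dictated, recognized))
```

Under this convention, 98% accuracy means roughly two errors in every hundred dictated words. Note that the result can even be negative when the recognizer inserts many spurious words, which is one reason different users report different figures.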

A prospective or new user should take into account that speech recognition, like speech itself, is a very personal matter. A program that works successfully for one person may not work as successfully for another.

We constantly hear contradictory reports of failure and success on the various SR discussion forums. Some people are sworn supporters of one application and complain bitterly about another. This is to be expected in view of the enormous complexity of speech and the large range of voices and language varieties. In short, as with selecting any other product, much depends on personal preferences regarding the features provided by the various programs.

The following is therefore based on my personal experience and on some evaluations currently accepted among veteran users.

For a summary comparison of the various features supported by the SR applications see Table #1.

Factors involved with SR Functionality

The functionality of a Speech Recognition application depends on various factors, such as the quality of the computer, the quality of the microphone, and the quality of one’s voice and speech.

Computer

Even the best application will not perform well on a weak machine, in spite of the so-called “minimal requirements” advertised by the manufacturer. A powerful machine, on the other hand, can improve speech recognition dramatically. A powerful machine combines at least the following: a powerful CPU, a large amount of random access memory (RAM), a good sound card, and a large disk. For XP, a minimum of 1 GB RAM would be necessary; for

Microphone

A good microphone can make a lot of difference. There is a fair variety of brands, and good advice from experienced people can make your SR a success story. For information, consultation and products I recommend Martin Markoe’s Website:

Voice and manner of dictation

As Martin Markoe puts it, “the other factor besides a microphone is actually the quality of a person's voice and if they're actually going to use a dictation style of talking as opposed to a conversational style.”

Concluding Remarks

On the whole, as Chuck Runquist once wrote, “The whole issue of accuracy is a combination of hardware, software, and user skill.” In its current state, no one should expect SR to render excellent results without some learning and adjustment. Although the quality of SR has improved splendidly, one still must be determined to succeed. Just as the program needs to learn how a person uses the language, speakers must also learn something about themselves and the way they speak. Sometimes, because of the peculiar phonetic clashes characteristic of a given language, one must learn to change one’s speech habits. For example, short pauses must be introduced between certain words, and some words must be pronounced a bit differently than in ordinary everyday speech. In short, novices must pay attention to the instances where they go wrong and try to work out strategies for handling them, much as people who have to speak in public learn to make themselves heard and understood. People who want to succeed with speech recognition are strongly advised to consult not only the manuals for their applications but also the various discussion forums, which provide further documentation. (See links on my speech website: http://speech.even-zohar.com).

Information

Information about the various programs can be found on the Websites of their respective companies:

Nuance (DNS and ViaVoice)
Microsoft
Linguatec
Voxit: http://www.voxit.se (Swedish)