A General Survey of Speech Recognition Programs
For a list of available languages, see Table #2.

For other tables, see Comparative Tables.
Languages served by Speech Recognition
Manufacturers of Speech Recognition
Microsoft Speech Recognition: Office 2003, Vista and Windows 7
Discussion Forums / Support Groups for MS Speech
Factors involved with SR Functionality

Languages served by Speech Recognition
Speech Recognition (“SR”) is available for a limited number of languages. There are competing programs for the following:

For Windows: English (US, …

SR applications for Brazilian Portuguese, Catalan, Austrian German, and Arabic are no longer available. (For a complete list of, and details about, the available languages for Windows, see Table #2.)

For Macintosh: English (US and UK), German, Italian, and Spanish.
Manufacturers of Speech Recognition

For Windows

Only two companies now provide major speech recognition applications for Windows: Nuance and Microsoft. Nuance provides Dragon NaturallySpeaking (DNS for short; latest version: 10), which works under Windows 98, 2000, XP, Vista, and Windows 7. Microsoft provides Office SR in Office 2003 under Windows XP, and “Windows Speech Recognition” (WSR) as a built-in feature of Windows Vista and Windows 7 (all versions).

Various companies offer secondary SR applications, i.e. applications built on an engine developed by a major company, with additional features. Among these, Linguatec seems to be the most professional and promising. Its Voice Pro, formerly based on IBM’s ViaVoice engine, has been based on the Windows Speech Recognition engine since version 12. It currently offers an SR application only for German, but the company is considering other languages, too.

For Macintosh

The Mac is served by ViaVoice, which is sold by Nuance, and by iListen. iListen offers downloadable language packs for US and UK English, German, Italian, and Spanish. For more about iListen, go to http://www.macspeech.com (I am grateful to Jeff Neal for this information).
A Historical Note

Four major companies have produced speech recognition programs since the early 1990s: IBM, Dragon, L&H, and Philips. Dragon’s product, “Dragon NaturallySpeaking”, is the only one still developed and marketed, currently by Nuance, which has taken over ScanSoft; ScanSoft had in turn replaced the former Lernout & Hauspie (L&H), which had earlier purchased both Dragon and Kurzweil. Nuance has also taken over support for IBM’s ViaVoice, but no upgrade is offered for Windows Vista, and the company recommends DNS instead. When ScanSoft bought L&H, it decided to make Dragon NaturallySpeaking (DNS) its main SR product, transferring features from L&H’s VoiceXpress into it (since Version 6). With the discontinuation of Philips FreeSpeech 2000, several of its 14 supported languages no longer have any SR support, notably Austrian German and Catalan. Swedish, however, is now served by a Swedish company, Voxit (http://www.voxit.se), which maintains and sells VoiceXpress (I am grateful to Oscar Abrahamsson for this information). The latest version, 5.3, was issued on December 2, 2005, but the company claims it functions under …
Microsoft Speech Recognition: Office 2003, Vista and Windows 7

Microsoft entered the SR arena when it provided speech recognition for Office 2002 under Windows XP, and then much improved it in Office 2003 (October 2003). Windows Speech Recognition (known as WSR; released January 2007) took a bolder step by making Speech Recognition a built-in feature of the operating system. This is an unacknowledged revolution in the sense that SR is now available on an unprecedented scale to any Windows user whose language is supported (see below). The user interface, the repertoire of commands, and the command-and-control features (controlling the entire computer by voice) have all been much improved. A beta Macro Feature was released in spring 2008, followed by a first full release in January 2009, making it possible to create text macros, command substitutions, and sophisticated add-ons. This makes WSR considerably more flexible and customizable.

WSR supports various languages, all available to users of Windows Vista Ultimate through Language Packs downloaded directly from Microsoft’s servers via Windows Update. The following languages are currently supported: English US, English UK, French, Spanish, German, Chinese (Simplified and Traditional), and Japanese.

WSR is also a built-in feature of Windows 7. In its current state (January–July 2009), no new UI features have been added. It works the same way as under Vista, including the Macro Feature add-on, but various basic recognition functions have been greatly improved, according to Eric Brown of the Microsoft SR Team. See here for details. For a list of desiderata for WSR, see here.

Discussion Forums / Support Groups for MS Speech

In April 2003 I set up a Discussion Forum for Microsoft SR on Yahoo (http://groups.yahoo.com/groups/ms-speech), and a Speech Website with various materials (http://speech.even-zohar.com). There is also a lot of valuable information on the Speech Computing forum (http://www.speechcomputing.com/forum), which covers all applications.
SR applications based on WSR
Dr. Reinhard Busch of Linguatec Sprachtechnologien has provided me with a brief description of Voice Pro 12. Voice Pro 12 is an SR application based on the WSR engine, but with extended acoustic models, new language models for various domains (general, medical, legal), and extended functionality: for example, it allows full transcription of WAV and MP3 files, similar to DNS 10. The transcribed text is output together with the original dictated audio (text-aligned), so one can make corrections based on the original recording and add unknown words to the speaker profile. In addition, it includes the Macro Tool and approx. 1,000 built-in macro commands (called “SmartCommands”) for more intuitive interaction. As VP 12 relies on the WSR engine for German, the user needs to switch to a full German version of Windows. For more information see: http://www.linguatec.de/products/stt/voice_pro

A Comparative Evaluation
There are no accepted criteria for an “objective” evaluation of SR, mostly because people have different experiences and appreciate different features. Most users appreciate a combination of high accuracy with a reasonable user interface, though naturally people differ in how they weigh these features and in what “high accuracy” really means. Most users consider SR not worthwhile below 98% accuracy, but there is no agreement about how to calculate that figure.

A prospective or new user should take into account that speech recognition, like speech itself, is a very personal matter. A program that works successfully for one person may not work as well for another. We constantly hear contradictory testimonies of failure and success on the various SR discussion forums. Some people are sworn supporters of one application and bitterly complain about another. This is to be expected, given the enormous complexity of speech and the wide range of voices and language varieties. In short, as with selecting any other product, much depends on personal preferences regarding the features the various programs provide. The following is therefore based on my personal experience and on evaluations currently accepted among veteran users.
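Because there is no agreed way of calculating an accuracy percentage, the following Python sketch illustrates one common approach and why two figures can differ for the same dictation. It is an illustration under our own assumptions, not how any SR product computes its figures; the names `align` and `Alignment` are hypothetical. It aligns the recognized words with the words actually spoken using a word-level edit distance and counts substitutions (S), deletions (D), and insertions (I) against the N spoken words. “Percent correct”, (N - S - D) / N, ignores inserted words, while “accuracy”, (N - S - D - I) / N (that is, 1 minus the word error rate), penalizes them; both conventions are in common use in speech research.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Error counts from aligning recognized words against spoken words."""
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def percent_correct(self) -> float:
        # (N - S - D) / N: inserted words are not penalized.
        n = self.reference_words
        return 100.0 * (n - self.substitutions - self.deletions) / n

    @property
    def accuracy(self) -> float:
        # (N - S - D - I) / N, i.e. 100 * (1 - word error rate).
        n = self.reference_words
        errors = self.substitutions + self.deletions + self.insertions
        return 100.0 * (n - errors) / n


def align(reference: str, hypothesis: str) -> Alignment:
    """Word-level minimum edit distance alignment (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (errors, subs, dels, ins) for ref[:i] versus hyp[:j];
    # tuples compare lexicographically, so min() picks the fewest errors.
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # every spoken word so far was dropped
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # every recognized word so far was inserted
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, n_ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (e, s, d, n_ins)  # words match
            else:
                diagonal = (e + 1, s + 1, d, n_ins)  # substitution
            e, s, d, n_ins = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, n_ins)  # spoken word missing
            e, s, d, n_ins = cost[i][j - 1]
            insertion = (e + 1, s, d, n_ins + 1)  # extra recognized word
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[-1][-1]
    return Alignment(subs, dels, ins, len(ref))


if __name__ == "__main__":
    spoken = "please send the report to the office before noon today"
    recognized = "please send the the report to office before noon today"
    result = align(spoken, recognized)
    print(f"{result.percent_correct:.1f}% correct, {result.accuracy:.1f}% accuracy")
    # prints: 90.0% correct, 80.0% accuracy
```

In this made-up example, one extra “the” and one dropped “the” out of ten spoken words give 90% by the first measure but 80% by the second. At the 98% level that most users consider the minimum, only about two errors per hundred spoken words are tolerated.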
A toolkit sold by Martin Markoe and written by Brad Trott now offers basic transcription and basic macro support for WSR. It can be purchased from here. Brad Trott has also written a Guidebook for Macro Writing, available from the same site.

For a summary comparison of the features supported by the various SR applications, see Table #1.
Factors involved with SR Functionality

The functionality of a Speech Recognition application depends on various factors, such as the quality of the computer, the quality of the microphone, and the quality of one’s voice and speech.

Computer

The best application will not perform well on a weak machine, in spite of the so-called “minimal requirements” advertised by the manufacturer. A powerful machine, on the other hand, can improve speech recognition dramatically. A powerful machine combines at least the following: a powerful CPU, a large amount of random access memory (RAM), a good sound card, and a large disk. For XP, a minimum of 1 GB of RAM would be necessary; for …

Microphone

A good microphone can make a big difference. There is a fair variety of brands, and good advice from experienced people can make your SR a success story. For information, consultation and products, I recommend Martin Markoe’s website: …

Voice and manner of dictation

As Martin Markoe puts it, “the other factor besides a microphone is actually the quality of a person's voice and if they're actually going to use a dictation style of talking as opposed to a conversational style.”
Concluding Remarks

On the whole, as Chuck Runquist once wrote, “The whole issue of accuracy is a combination of hardware, software, and user skill.” In its current state, no one should expect SR to deliver excellent results without some learning and adjustment. Although the quality of SR has improved considerably, one still must be determined to succeed. Just as the program must learn the way people use a language, speakers must also learn something about themselves and the way they speak. Sometimes, because of the phonetic clashes peculiar to a given language, one must change one’s speech habits. For example, short pauses must be introduced between certain words, and some words must be pronounced a bit differently than in ordinary everyday speech. In short, novices must pay attention to the instances where they go wrong and try to see how these can be handled with certain strategies, much like people who have to speak in public and make themselves heard and understood. People who want to succeed with speech recognition are strongly advised to consult not only the manuals for their applications but also the various discussion forums, which provide further documentation. (See links on my speech website: http://speech.even-zohar.com.)
Information

Information about the various programs can be found on the websites of their respective companies:

Nuance (DNS and ViaVoice)
Microsoft: http://www.microsoft.com/speech
Linguatec
Voxit: http://www.voxit.se (Swedish)