DISCUSSING THE QUALITY OF VOICE RECOGNITION PROGRAMS IN COMPARATIVE TERMS |
Itamar Even-Zohar |
(itamarez@post.tau.ac.il) |
July 2000 |
Comment, July 2009: Most of the specific information in this document is obsolete.
IBM no longer produces ViaVoice and its excellent Millennium version does not
work under the new versions of Windows. DNS has actually merged with
VoiceXpress and added many valuable features. In the meantime, Microsoft
offered in 2003 its first advanced speech recognition program under Office
2003, and in 2006 – Windows Speech Recognition (WSR) under Windows Vista and
most recently under Windows 7. This has changed the SR scene completely,
since out of many manufacturers, only two major ones are still in the race –
Nuance with Dragon NaturallySpeaking (DNS) and Microsoft with its WSR. In spite of the irrelevance of most of the specific information,
I thought this document should stay on my Speech Page because it is an
attempt at a systematic comparison between applications on the basis of
relevant criteria. I also believe it might be of some interest to people who
wish to learn something about the history of SR. So much has taken
place in such a short span of time.
|
Having tested all four major voice recognition programs, I thought it might be a good idea to attempt a comparative evaluation based on a set of criteria rather than on a full description of each of them. I believe that some of the criteria suggested below have received insufficient attention, or none at all, in the various professional reviews. |
|
The programs I've tested all run under Windows: |
· Dragon's NaturallySpeaking "Preferred" version -- U.S. and UK English, Spanish |
· L&H VoiceXpress Professional version 5 -- U.S. English |
· Philips' Free Speech 2000 -- U.S. and UK English, Spanish of Northern Latin America, Italian, French, and German. |
· ViaVoice Millennium PRO (release 7) -- U.S. English. |
|
All of these programs have been basically tested with their built-in editors (FreeSpeech 2000, however, considers WordPad as its domestic editor). I have also checked -- very superficially -- how some of them function in Microsoft Word 2000. If you've got a national version of Microsoft Word before version 2000, it is not likely to work. |
|
I would like to emphasize that I am not a native speaker of any of the languages I've tested. However, at least English, French and Spanish have pretensions to be international languages, so one would expect a well-designed dictation program to be able to accommodate various hybrid pronunciations. At any rate, I believe that my evaluation as a non-native speaker can be a good indicator of the quality of the program. If I managed to achieve a high degree of accuracy with one or another program, I assume that a native speaker can reach the same level with even less effort. |
|
The programs will be referred to as follows: |
|
DNS -- Dragon's NaturallySpeaking "Preferred", version 4 |
FS -- Philips' Free Speech 2000 |
VVMP -- ViaVoice Millennium PRO, release 7 |
VX -- L&H VoiceXpress Professional, version 5 |
|
Hardware connectivity
|
|
1. Program supports a USB microphone. |
|
DNS -- supports many USB microphones; offers its own USB microphone as part of one of its packages. |
FS -- did not support a USB microphone until February 2000, when Service Pack 2 was made available on its Website. |
VVMP -- does not support a USB microphone. This is, however, contested by some experts and users: Martin Markoe claims that at least a number of computers can take a USB microphone with VVMP (and that the IBM people are wrong about this). Other users have confirmed that. My own conclusion is based on testing two different Telex USB microphones -- a headset and the M-560 desktop microphone. Neither worked on any of my machines, and I have received a written confirmation from IBM that they do not support a USB microphone with their Windows versions. They do, however, support such a device on their versions for the Mac. |
VX -- supports many USB microphones. |
|
Flexible installation
|
|
1. Program allows adding/removing of features any time after installation has been completed. |
|
DNS -- theoretically allows adding features any time after installation. In reality, if you select a feature to be installed, the program executes full installation. As for removal, I've not been able to arrive at any conclusions. |
FS -- (?) |
VVMP -- allows both adding and removal. |
VX -- (?) |
|
Training
|
|
1. Program allows -- after the obligatory first reading -- training further texts in portions (i.e., the user is allowed to stop, enabling the program to elaborate/learn/analyze the trained materials). |
|
DNS -- allows further reading any time, including the same texts (i.e., texts you have already read do not disappear from the list), but does not allow reading of portions of texts -- you must go through a text in full. |
FS -- allows further reading, stopping at any time and telling the program to process the results. |
VVMP -- allows further reading any time, eliminating from the list texts already trained. Does not allow portions. |
VX -- (?) |
|
|
2. During training -- including the obligatory first reading -- Program allows skipping words/going back to re-read words (for example, if the program does not understand the user). |
|
DNS -- fully compatible with the above description. |
FS -- fully compatible with the above description. |
VVMP -- only partly compatible with the above description; you cannot skip a word, but you can go back. However, when you go back and wish to return, you are not brought back to the same spot, but to the next segment. |
VX -- fully incompatible with the above description: if the program does not recognize a word, it gets stuck. Your only option is to terminate and start again from the beginning, hoping that this time the program will allow you to continue. |
|
3. Program
allows continuous reading, even though mistakes are
made (it leaves it to the user to decide if they want to go back and re-read
the misrecognized words). |
|
DNS -- fully compatible with the above description. |
FS -- fully compatible with the above description. You are told explicitly that you are not obliged to say everything correctly. |
VVMP -- I'm not sure; I think that basically it is compatible with the above description. |
VX -- fully incompatible with the above description. |
|
4. Program does not require extensive training after the first obligatory reading (taking about fifteen minutes). (In contradistinction to requiring repeated training, even more than six times, in order to achieve acceptable results.) |
DNS -- requires repeated training, preferably every month or so. |
FS -- I have not arrived at any conclusions. |
VVMP -- actually requires only very short basic training. |
VX -- requires repeated training. |
|
Learning
|
|
1. Program accumulates
data from materials dictated by the user for improving voice recognition
without forcing the user to do repetitive trainings. |
|
DNS -- makes changes to the user's Speech Files, if the user accepts the program's suggestion to save them after each session. It is not clear how often it is advisable to do so. The changes, however, seem to take into account the user's corrections only, not accumulated or updated information about the general pronunciation of the user. |
FS -- seems to update itself in the background as far as explicit corrections are concerned. Same remark as for DNS. |
VVMP -- tells the user from time to time that sufficient materials have been accumulated to improve recognition. It seems that this does not apply to corrections (in contradistinction to DNS), which seem to be adopted by the program on a continual basis. If my analysis is correct, then VVMP is unique among all voice recognition programs in this respect. |
VX -- seems to update itself in the background as far as explicit corrections are concerned. Same remark as for DNS. |
|
2. Program improves voice recognition when user repeats the misrecognized item until recognized (in contradistinction to explicit correction by means of the correction window/box). |
|
I believe only FS complies with this. |
|
3. Program
improves voice recognition when user types the misrecognized item (in contradistinction
to explicit correction by means of the correction window/box). |
|
I believe that only VVMP complies with this. |
|
Management of memory
|
|
1. Program
allows the running of several applications, without causing the system to
freeze or collapse. |
|
I have checked all programs on two Pentium II computers, a desktop and a notebook, both with 128 MB RAM, with CPUs of 450 and 360 MHz respectively. |
|
DNS -- does not exhaust the computer's resources: it allows working with 2-3 more applications, including Microsoft Word. It seldom causes the system to freeze or collapse. |
FS -- exhausts almost completely the resources, thus allowing only WordPad. It often causes the system to freeze or collapse. |
VVMP -- does not exhaust the computer's resources, allowing several applications to work on top of itself, including Microsoft Word. However, the computer's memory is almost completely exhausted. |
VX -- I have not been able to arrive at any conclusions. |
|
2. Program can handle advanced CPUs. |
|
Steven J. Kleinman reports: "VVMP seems to work well (for me, at least) on a Pentium II 400, Pentium III 500, and AMD Athlon 700". |
|
Updating and backups
|
|
1. Program allows easy transfer of Speech Files. |
|
DNS -- has no built-in procedure for transferring Speech Files. You must do that manually and hope it will work. Not really worthwhile doing that on a regular basis, so you cannot keep two machines equal to each other. |
FS -- theoretically has a built-in procedure, but in my experience, I could not transfer the same profile. I could only transfer a profile from one computer to the other and have it renamed. |
VVMP -- has a built-in procedure, but would not allow replacing a profile used for initial training. In such cases, you must import the profile, delete the original one, and rename the imported one. The procedure, however, is very simple and straightforward. |
VX -- (?) |
|
2. Program
saves changed Speech Files in the background (in contradistinction to saving
them each time program is exited). |
|
It seems that only DNS saves Speech Files in the foreground. I cannot make up my mind whether this is annoying or helpful. There are sessions when you would not wish to save your corrections and alterations; with DNS, you can be sure that no traces will be kept of that session. This may not be the case with all of the other programs. |
|
Working with various applications
|
|
1. Program allows dictation and navigation in various applications. |
|
DNS -- allows dictation into most programs, but a full range of commands is available in a restricted range: Microsoft Word, Corel WordPerfect and several others. |
FS -- allows dictation to Microsoft WordPad only; does not offer an editor of its own. |
VVMP -- like DNS. |
VX -- like DNS. |
|
Transcription
|
|
1. Program
allows transcription of recorded files from any source (i.e., not necessarily
from a dedicated recorder). |
|
DNS -- fully complies with this. |
FS -- fully complies with this. |
VVMP -- A mystery: something is mentioned in the Help File about installing a transcription program, but I have not been able to find out whether such a program can be purchased. |
|
Dictation
|
|
1. Program allows different modes for resolving problems. For example, Dictation Only Mode vs. Commands Only Mode, or Spelling Mode, Numbers Mode, etc. |
|
DNS -- has no explicit Dictation Only or Commands Only modes (but has designated keys to force the one or the other). |
FS -- has Dictation Only Mode and Commands Only Mode. The trouble is, you must turn off Dictation Mode if you wish to execute commands. |
VVMP -- does not have Dictation Only/Commands Only modes. Offers only Spelling Mode and Numbers Mode. |
VX -- has Dictation Only Mode, Commands Only Mode, Spell Only Mode, in addition to "Normal Mode" (combining Dictation with Commands). |
|
2. Program has a key to force recognition of items as either dictation or commands. |
|
I believe that only DNS has this feature (absolutely indispensable and sorely missed in the other programs). |
|
Auxiliary features during dictation
|
|
1. Program shows processing of material during dictation through a result box (in contradistinction to keeping the user waiting for the text to appear [on weaker computers than Pentium III]). |
|
DNS and FS have a result box; VVMP and VX do not. I believe previous versions of ViaVoice offered this feature. Perhaps the speed with which VVMP processes the user's speech makes this feature unnecessary. |
|
Correction
|
|
1. Program
allows easy correction during dictation (in contradistinction to allowing
correction after dictation). |
|
DNS seems to be the only program to allow easy and straightforward correction. VVMP and VX offer somewhat similar possibilities, but obviously inferior to DNS. VVMP does not allow selecting an item from the list of alternatives for editing, and often leaves you stuck in the correction window. VX has even fewer alternatives and often behaves erratically, so before you know it you get a mistaken item. |
|
Steven J. Kleinman comments: "I believe VVMP meets all the criteria except the ability to edit selected alternatives in the box. Realistically, many people find the VVMP correction mechanism inferior. However, it is (in my opinion) quick and efficient if you always maintain the open correction window off to the side. Also, you should edit the setting to permit more than the normal default number of correction alternatives. If you do this, you usually have the proper correction available and it is very rapid". |
|
2. Program
allows flexible corrections (such as quickly locating the items to be corrected,
efficient and fast procedures in the correction box, useful alternatives in
the correction box, ability to edit selected alternatives in the box, etc.). |
|
I believe that only DNS fully answers this description. |
|
3. Program allows saving Speech Files for delayed correction. |
|
FS, VVMP, VX -- comply with this. |
DNS -- does not comply with this. |
|
4. Program produces a guessed string when it does not recognize the dictated item. This allows either immediate correction or continuation of dictation. (In contradistinction to no response accompanied by the message "unrecognized...", which forces the user to get immediately involved in direct customization.) |
|
DNS, FS, VVMP -- comply with this. |
VX -- does not comply with this (generating no response accompanied by the message "unrecognized...", which forces the user to get immediately involved in direct customization). |
|
Navigation
|
|
1. Program offers
useful commands for extensive navigation (such as being able to
select/correct/delete items backwards and forwards by specifying location of
words/sentences/paragraphs). |
|
DNS -- fully complies with this. |
FS -- does not offer extensive navigation. |
VVMP -- offers restricted navigation in its own editor; offers relatively extensive possibilities in Microsoft Word. |
VX -- offers possibilities somewhere between DNS and FS. |
|
Editing
|
|
1. Program allows
easy and quick textual editing (such as: inserting new items into the written
texts, deleting/replacing/changing items).
This includes smooth cooperation between voice, mouse, and keyboard;
for example, if you click with the mouse on a certain spot, you expect the
program to insert the text about to be dictated into that spot. |
|
Only DNS seems to comply with this in full. VVMP creates many difficulties in
editing. For example, if you put the
cursor at a certain spot in an already dictated text, you often get the item
you wanted to be inserted at that spot instead of the immediately preceding
item. As a consequence, you must take
extra precautions, including clicking the mouse
twice to make sure the program knows where you are. While DNS offers useful commands such as
"insert before/after", all other programs force you either to give
a series of commands or simply to move around with your mouse (which also gets
you into trouble with VVMP). |
|
|
Alternative commands
|
|
1. Program offers built-in alternative commands (allowing the user to select those commands which work best for them. This involves two different aspects: [a] there are commands which are repeatedly misrecognized, while others are performed unproblematically; [b] there are commands that users may find easier to pronounce, or feel more comfortable with.) |
|
FS, DNS, and VX seem to offer a reasonable variety of alternative commands. This does not mean that there are no problems as a result of missing synonyms. For example, various commands in DNS Spanish do not work with my operating system ("Hebrew enabled Windows 98"); had it been possible to create alternatives, it could have solved such problems. As for FS, however, the general range of commands is very limited. VVMP offers hardly any alternatives. Sometimes, however, you may hit such alternatives by chance. For example, since I learned VVMP after long use of DNS, I automatically issued the command "open-paren" for "open-parenthesis", and it worked, though undocumented. (I later created a macro "left-paren".) |
|
2. Program
makes it possible to create (customize) alternative commands to the built-in
ones. (This may be necessary if the
built-in stock does not offer convenient solutions.) |
|
Only FS offers a built-in procedure for creating alternative commands. VVMP (and I believe VX, too) offers application macros, which can sometimes help replace various commands. DNS, on the other hand, offers only dictation macros, with which you can replace various punctuation commands that you cannot train successfully. |
|
3. Program
makes it possible to create new commands for functions that are not covered
by any built-in command (this would be roughly equivalent to "navigation
macros"). |
|
FS, VVMP, and VX at least partially offer this
possibility. DNS offers none. |
|
Macros and vocabulary extension
|
|
1. Program
allows extensive dictation macros (in contradistinction to allowing macros
that are no longer than a line). |
|
DNS -- only plain text, no longer than roughly a line is available. |
FS -- extensive macros are available, both text and commands. |
VVMP -- extensive macros are available, both text and commands. However, the so-called application macros cannot use a script language, nor make use of built-in VVMP commands or macros. |
VX -- somewhere between FS and VVMP. |
|
2. Program
allows navigation macros (in contradistinction to dictation macros). |
|
DNS -- no navigation macros available. |
FS -- navigation macros available. |
VVMP -- navigation macros available, but only recordings of keystrokes; no script language and no integration of built-in commands. |
VX -- navigation macros available, but I can't specify with what limitations. |
|
3. Program
allows controlling the behavior of dictation macros (such as determining
spaces and capitalization before and after these macros). |
|
DNS -- no built-in procedure for controlling the behavior of dictation macros. However, knowledgeable users can download a little program created by Joel xxx, allowing such control. |
FS -- (?) |
VVMP -- allows controlling the behavior of dictation macros as a built-in procedure. |
VX -- (?) |
|
4. Program has
extensive and efficient procedures for augmenting the vocabulary. (This includes easy use on the one hand,
and adding adequate information on the other.) |
|
All programs have this feature. I'm not sure any longer about the ease of use in each of them. |
|
Languages
|
|
1. Program can handle more than one language in one and the same document. |
|
FS is the only program that offers multilingual work (up to six languages in one single package). |
|
Flexible spelling (English)
|
|
1. Program for
dictating in English allows choosing American or British spelling
irrespectively of the user's pronunciation (for example, a Canadian whose
pronunciation is closer to the U.S. variety would then be able to get a
document written in British spelling, according to what is required in
Canada). |
|
None of the four voice-recognition programs offers this
possibility. If you prefer U.S.
spelling, you must use a U.S. program, even if your pronunciation is more
British-like. |
|
Language varieties -- Pronunciations and accents
|
|
1. Program is
flexible enough to accommodate a large variety of accents (in
contradistinction to being designed for a heavily marked specific
accent). This applies mainly to the
English language, but also to some extent to French, Spanish, and German. |
|
DNS -- both U.S. and UK English could accommodate my hybrid pronunciation of English (which nevertheless is more British-like). I had to struggle a little bit more with the U.S. version (since I pronounce "better" rather than "bedder", etc.), and I have not succeeded in teaching the program the difference between "there" and "their", and similar words. |
FS -- I could not achieve any reasonable results with either U.K. or U.S. English. I've seen that native speakers did not report otherwise. On the other hand, I have achieved quite satisfactory results with Italian, Spanish, and German, and tolerable results (i.e., at any rate better than English) with French. |
VVMP -- I have achieved the best level of accuracy with this program after a few hours of use. I had to teach it non-American pronunciations of highly frequent words, but after that it functioned astoundingly well. |
VX -- although I started working with version 1.1 of this program, and also tested UK English for version 4 and U.S. English for versions 4 and 5, I have not been able to reach any satisfactory results. I suppose this program is tailored very specifically for U.S./U.K. accents. |
|
2. Program can
accommodate, in addition to native accents, non-native ones. This is particularly relevant to the
English language, in view of its pretension to function as an international
language. |
|
This parameter is discussed in the section above. |
|
Text-to-speech
|
|
1. Program
reads text with a high quality, close to human voice (in contradistinction to
low quality machine voice). |
|
DNS -- basically offers only one voice per language: Gordon for English, and Rafael for Spanish. The voices are not really agreeable, and cannot, in my opinion, be very helpful for checking the dictated texts. |
FS -- although FS offers six languages in its regular package, only the main language in the package is accompanied by a text-to-speech device. The English voice is very mediocre. |
VVMP -- more or less like DNS and FS. |
VX -- has an almost natural voice ("Jennifer"), undoubtedly superior to all of the other voices. |
|
2. Program
allows user to select a voice to their taste. |
|
L&H offer additional voices, which can be used by some of the other programs. However, VVMP does not allow that. |
|
3. Program
displays the text converted to speech during reading (thus making it easier
for the user to follow). |
|
Only VVMP offers this feature. |
|
Documentation
|
|
1. Program
should be accompanied by as extensive and clear documentation as possible. |
|
Only DNS offers extensive and clear documentation. All of the other programs force you to scour their help screens frantically, trying to chase down the commands you so badly need. I think a user need not insist on printed documentation (such as the excellent DNS manuals for version 4); a well-edited PDF file would suffice. |
|
Conclusions
|
|
No doubt every user has different criteria for evaluating products. And since people talk in so many different ways, there is no guarantee that a program that works well for one user will work as well for another. However, there is accumulated evidence about the accuracy achieved by the various programs. The first two positions are awarded almost unanimously to VVMP and DNS. Since DNS has so many superior features in matters of flexibility of commands, flexibility of corrections, and documentation, some reviewers have tended to prefer it to VVMP. From my own point of view, although I very much deplore the absence of indispensable commands in VVMP -- in comparison with DNS and VX -- what I believe should count most of all is the level of accuracy, and here my experience has been beyond any doubt or hesitation in favor of VVMP. |
|
This text was dictated with ViaVoice Millennium PRO. |
|