DISCUSSING THE QUALITY OF VOICE RECOGNITION PROGRAMS IN COMPARATIVE TERMS

Itamar Even-Zohar

(itamarez@post.tau.ac.il)

July 2000

Hardware connectivity

Flexible installation

Training

Learning

Management of memory

Updating and backups

Working with various applications

Transcription

Dictation

Auxiliary features during dictation

Correction

Navigation

Editing

Alternative commands

Macros and vocabulary extension

Languages

Flexible spelling (English)

Language varieties -- Pronunciations and accents

Text-to-speech

Documentation

Conclusions

 

Comment, July 2009:

 

Most of the specific information in this document is obsolete. IBM no longer produces ViaVoice, and its excellent Millennium version does not work under the new versions of Windows. DNS has actually merged with VoiceXpress and added many valuable features. In the meantime, Microsoft offered in 2003 its first advanced speech recognition program under Office 2003, and in 2006 Windows Speech Recognition (WSR) under Windows Vista and most recently under Windows 7. This has changed the SR scene completely, since out of many manufacturers, only two major ones are still in the race: Nuance with Dragon NaturallySpeaking (DNS) and Microsoft with its WSR.

 

In spite of the irrelevance of most of the specific information, I thought this document should stay on my Speech Page because it is an attempt at a systematic comparison between applications on the basis of relevant criteria. I also believe it might be of some interest to people who wish to learn something about the history of SR. So much has taken place in such a short span of time.

Having tested all four of the major voice recognition programs, I thought it might be a good idea to attempt a comparative evaluation based on a set of criteria rather than on a full description of each of them. I believe that some of the criteria suggested in the following have been given little or no attention in the various professional reviews.

 

The programs I've tested, all running under Windows:

                 Dragon's NaturallySpeaking "Preferred" version -- U.S. and UK English, Spanish

                 L&H VoiceXpress Professional version 5 -- U.S. English

                 Philips' Free Speech 2000 -- U.S. and UK English, Spanish of Northern Latin America, Italian, French, and German.

                 ViaVoice Millennium PRO (release 7) -- U.S. English.

 

All of these programs have been basically tested with their built-in editors (FreeSpeech 2000, however, considers WordPad as its domestic editor). I have also checked -- very superficially -- how some of them function in Microsoft Word 2000. If you've got a national version of Microsoft Word before version 2000, it is not likely to work.

 

I would like to emphasize that I am not a native speaker of any of the languages I've tested. However, at least English, French and Spanish have pretensions to be international languages, so one would expect a well-designed dictation program to be able to accommodate various hybrid pronunciations. At any rate, I believe that my evaluation as a non-native speaker can be a good indicator of the quality of the program. If I managed to achieve a high degree of accuracy with one or another program, I assume that a native speaker can reach the same level with even less effort.

 

The programs will be referred to as follows:

 

DNS -- Dragon's NaturallySpeaking "Preferred", version 4

FS -- Philips' Free Speech 2000

VVMP -- ViaVoice Millennium PRO, release 7

VX -- L&H VoiceXpress Professional, version 5

 

Hardware connectivity

 

1. Program supports a USB microphone.

 

DNS -- supports many USB microphones; offers its own USB microphone as part of one of its packages.

FS -- did not support a USB microphone until February 2000, when Service Pack 2 was made available on the Philips website.

VVMP -- does not support a USB microphone. This is, however, contested by some experts and users: Martin Markoe claims that at least a number of computers can take a USB microphone with VVMP (and that the IBM people are wrong about this). Other users have confirmed that. My own conclusion is based on testing two different Telex USB microphones -- a headset and the M-560 desktop microphone. Neither worked on any of my machines, and I have received a written confirmation from IBM that they do not support a USB microphone with their Windows versions. They do, however, support such a device on their versions for the Mac.

VX -- supports many USB microphones.

 

Flexible installation

 

1. Program allows adding/removing of features any time after installation has been completed.

 

DNS -- theoretically allows adding features any time after installation. In reality, if you select a feature to be installed, the program executes full installation. As for removal, I've not been able to arrive at any conclusions.

FS -- (?)

VVMP -- allows both adding and removal.

VX -- (?)

 

Training

 

1. Program allows -- after the obligatory first reading -- training further texts in portions (i.e., the user is allowed to stop, enabling the program to elaborate/learn/analyze the trained materials).

 

DNS -- allows further reading any time, including the same texts (i.e., texts you have already read do not disappear from the list), but does not allow reading of portions of texts -- you must go through a text in full.

FS -- allows further reading, stopping at any time, and telling the program to process the results.

VVMP -- allows further reading any time, eliminating from the list texts already trained. Does not allow portions.

VX -- (?)

 

2. During training -- including the obligatory first reading -- Program allows skipping words/going back to re-read words (for example, if the program does not understand the user).

 

DNS -- fully compatible with the above description.

FS -- fully compatible with the above description.

VVMP -- only partly compatible with the above description; you cannot skip a word, but you can go back. However, when you go back and wish to return, you are not brought back to the same spot, but to the next segment.

VX -- fully incompatible with the above description: if the program does not recognize a word, it gets stuck. Your only option is to terminate and start again from the beginning, hoping that this time the program will allow you to continue.

 

3. Program allows continuous reading, even though mistakes are made (it leaves it to the user to decide if they want to go back and re-read the misrecognized words).

 

DNS -- fully compatible with the above description.

FS -- fully compatible with the above description. You are told explicitly that you are not obliged to say everything correctly.

VVMP -- I'm not sure; I think that basically it is compatible with the above description.

VX -- fully incompatible with the above description.

 

4. Program does not require extensive training after the first obligatory reading (taking about fifteen minutes). (In contradistinction to requiring repeated training, even more than six times, in order to achieve acceptable results.)

DNS -- requires repeated training, preferably every month or so.

FS -- I have not arrived at any conclusions.

VVMP -- actually requires only very short basic training.

VX -- requires repeated training.

 

Learning

 

1. Program accumulates data from materials dictated by the user for improving voice recognition without forcing the user to do repetitive trainings.

 

DNS -- makes changes to the user's Speech Files, if the user accepts the program's suggestion to save them at the end of each session. It is not clear how often it is advisable to do so. The changes, however, seem to take into account the user's corrections only, not accumulated or updated information about the general pronunciation of the user.

FS -- seems to update itself in the background as far as explicit corrections are concerned. Same remark as for DNS.

VVMP -- tells the user from time to time that sufficient materials have been accumulated to improve recognition. It seems that this does not apply to corrections (in contradistinction to DNS), which seem to be adopted by the program on a continual basis. If my analysis is correct, then VVMP is unique among all voice recognition programs in this respect.

VX -- seems to update itself in the background as far as explicit corrections are concerned. Same remark as for DNS.

 

2. Program improves voice recognition when user repeats the misrecognized item until recognized (in contradistinction to explicit correction by means of the correction window/box).

 

I believe only FS complies with this.

 

3. Program improves voice recognition when user types the misrecognized item (in contradistinction to explicit correction by means of the correction window/box).

 

I believe that only VVMP complies with this.

 

Management of memory

 

1. Program allows the running of several applications, without causing the system to freeze or collapse.

 

I have checked all programs with 2 Pentium II computers, a desktop and a notebook, both with 128 MB RAM, CPUs: 450 and 360 MHz respectively.

 

DNS -- does not exhaust the computer's resources: it allows working with 2-3 more applications, including Microsoft Word. It seldom causes the system to freeze or collapse.

FS -- exhausts almost completely the resources, thus allowing only WordPad. It often causes the system to freeze or collapse.

VVMP -- does not exhaust the computer's resources, allowing several applications to work on top of itself, including Microsoft Word. However, the computer's memory is almost completely exhausted.

VX -- I have not been able to arrive at any definite conclusions.

 

2. Program can handle advanced CPUs.

 

Steven J. Kleinman reports: "VVMP seems to work well (for me, at least) on a Pentium II 400, Pentium III 500, and AMD Athlon 700".

 

Updating and backups

 

1. Program allows easy transfer of Speech Files.

 

DNS -- has no built-in procedure for transferring Speech Files. You must do that manually and hope it will work. Not really worthwhile doing that on a regular basis, so you cannot keep two machines equal to each other.

FS -- theoretically has a built-in procedure, but in my experience, I could not transfer the same profile. I could only transfer a profile from one computer to the other and have it renamed.

VVMP -- has a built-in procedure, but would not allow replacing a profile used for initial training. In such cases, you must import the profile, delete the original one, and rename the imported one. The procedure, however, is very simple and straightforward.

VX -- (?)

 

2. Program saves changed Speech Files in the background (in contradistinction to saving them each time program is exited).

 

It seems that only DNS saves Speech Files in the foreground. I cannot make up my mind whether this is annoying or helpful. There are sessions when you would not wish to save your corrections and alterations; with DNS, you can be sure that no traces will be kept of that session. This may not be the case with the other programs.

 

Working with various applications

 

1. Program allows dictation and navigation in various applications.

 

DNS -- allows dictation into most programs, but the full range of commands is available only in a restricted set of applications: Microsoft Word, Corel WordPerfect and several others.

FS -- allows dictation to Microsoft WordPad only; does not offer an editor of its own.

VVMP -- like DNS.

VX -- like DNS.

 

Transcription

 

1. Program allows transcription of recorded files from any source (i.e., not necessarily from a dedicated recorder).

 

DNS -- fully complies with this.

FS -- fully complies with this.

VVMP -- A mystery: something is mentioned in the Help File about installing a transcription program, but I have not been able to find out whether such a program can be purchased.

 

Dictation

 

1. Program allows different modes for resolving problems. For example, Dictation Only Mode vs. Commands Only Mode, or Spelling Mode, Numbers Mode, etc.

 

DNS -- has no explicit Dictation Only or Commands Only modes (but has designated keys to force the one or the other).

FS -- has Dictation Only Mode and Commands Only Mode. The trouble is, you must turn off Dictation Mode if you wish to execute commands.

VVMP -- does not have Dictation Only/Commands Only modes. Offers only Spelling Mode and Numbers Mode.

VX -- has Dictation Only Mode, Commands Only Mode, Spell Only Mode, in addition to "Normal Mode" (combining Dictation with Commands).

 

2. Program has a key to force recognition of items as either dictation or commands.

 

I believe that only DNS has this feature (absolutely indispensable and sorely missed in the other programs).

 

Auxiliary features during dictation

 

1. Program shows processing of material during dictation through a result box (in contradistinction to keeping the user waiting for the text to appear [on weaker computers than Pentium III]).

 

DNS and FS have a result box; VVMP and VX do not. I believe previous versions of ViaVoice offered this feature. Perhaps the speed with which VVMP processes the user's speech makes this feature unnecessary.

 

Correction

 

1. Program allows easy correction during dictation (in contradistinction to allowing correction after dictation).

 

DNS seems to be the only program to allow easy and straightforward correction. VVMP and VX offer somewhat similar possibilities, but obviously inferior to those of DNS. VVMP does not allow selecting an item from the list of alternatives for editing, and often leaves you stuck in the correction window. VX has even fewer alternatives and often behaves erratically, so before you know it you get a mistaken item.

 

Steven J. Kleinman comments: "I believe VVMP meets all the criteria except the ability to edit selected alternatives in the box. Realistically, many people find the VVMP correction mechanism inferior. However, it is (in my opinion) quick and efficient if you always maintain the open correction window off to the side. Also, you should edit the setting to permit more than the normal default number of correction alternatives. If you do this, you usually have the proper correction available and it is very rapid".

 

2. Program allows flexible corrections (such as quickly locating the items to be corrected, efficient and fast procedures in the correction box, useful alternatives in the correction box, ability to edit selected alternatives in the box, etc.).

 

I believe that only DNS fully answers to this description.

 

3. Program allows saving Speech Files for delayed correction.

 

FS, VVMP, VX -- comply with this.

DNS -- does not comply with this.

 

4. Program produces a guessed string when it does not recognize the dictated item. This allows either immediate correction or continuation of dictation. (In contradistinction to no response accompanied by the message "unrecognized...", which forces the user to get immediately involved in direct customization.)

 

DNS, FS, VVMP -- comply with this.

VX -- does not comply with this (generating no response accompanied by the message "unrecognized...", which forces the user to get immediately involved in direct customization).

 

Navigation

 

1. Program offers useful commands for extensive navigation (such as being able to select/correct/delete items backwards and forwards by specifying location of words/sentences/paragraphs).

 

DNS -- fully complies with this.

FS -- does not offer extensive navigation.

VVMP -- offers restricted navigation in its own editor; offers relatively extensive possibilities in Microsoft Word.

VX -- offers possibilities somewhere between DNS and FS.

 

Editing

 

1. Program allows easy and quick textual editing (such as: inserting new items into the written texts, deleting/replacing/changing items). This includes smooth cooperation between voice, mouse, and keyboard; for example, if you click with the mouse on a certain spot, you expect the program to insert the text about to be dictated into that spot.

 

Only DNS seems to comply with this in full. VVMP creates many difficulties in editing. For example, if you put the cursor at a certain spot in an already dictated text, you often get the item you wanted inserted at that spot in place of the immediately preceding item. As a consequence, you must take extra precautions, including clicking the mouse twice to make sure the program knows where you are. While DNS offers useful commands such as "insert before/after", all other programs force you to either give a series of commands or simply move around with your mouse (which also gets you into trouble with VVMP).

 

 

Alternative commands

 

1. Program offers built-in alternative commands (allowing the user to select those commands which work best for them). This involves two different aspects: [a] there are commands which are repeatedly misrecognized, while others are performed unproblematically; [b] there are commands that users may find easier to pronounce, or be more comfortable with.

 

FS, DNS, and VX seem to offer a reasonable variety of alternative commands. This does not mean that no problems arise from missing synonyms. For example, various commands in DNS Spanish do not work with my operating system ("Hebrew enabled Windows 98"); had it been possible to create alternatives, it could have solved such problems. As for FS, however, the general range of commands is very limited. VVMP hardly offers any alternatives. Sometimes, however, you may hit such alternatives by chance. For example, since I learned VVMP after a long use of DNS, I automatically issued the command "open-paren" for "open-parenthesis", and it worked, though undocumented. (I later created a macro "left-paren".)

 

2. Program makes it possible to create (customize) alternative commands to the built-in ones. (This may be necessary if the built-in stock does not offer convenient solutions.)

 

Only FS offers a built-in procedure for creating alternative commands. VVMP (and, I believe, VX too) offers application macros, which can sometimes help replace various commands. DNS, on the other hand, offers only dictation macros with which you can replace various punctuation commands that you cannot train successfully.

 

3. Program makes it possible to create new commands for functions that are not covered by any built-in command (this would be roughly equivalent to "navigation macros").

 

FS, VVMP, and VX at least partially offer this possibility. DNS offers none.

 

Macros and vocabulary extension

 

1. Program allows extensive dictation macros (in contradistinction to allowing macros that are no longer than a line).

 

DNS -- only plain text, no longer than roughly a line, is available.

FS -- extensive macros are available, both text and commands.

VVMP -- extensive macros are available, both text and commands. However, the so-called application macros cannot use a script language, nor make use of built-in VVMP commands or macros.

VX -- somewhere between FS and VVMP.

 

2. Program allows navigation macros (in contradistinction to dictation macros).

 

DNS -- no navigation macros available.

FS -- navigation macros available.

VVMP -- navigation macros available, but only recordings of keystrokes; no script language and no integration of built-in commands.

VX -- navigation macros available, but I can't specify with what limitations.

 

3. Program allows controlling the behavior of dictation macros (such as determining spaces and capitalization before and after these macros).

 

DNS -- no built-in procedure for controlling the behavior of dictation macros. However, knowledgeable users can download a little program created by Joel xxx, allowing such control.

FS -- (?)

VVMP -- allows controlling the behavior of dictation macros as a built-in procedure.

VX -- (?)

 

4. Program has extensive and efficient procedures for augmenting the vocabulary. (This includes easy use on the one hand, and adding adequate information on the other.)

 

All programs have this feature. I'm not sure any longer about the ease of use in each of them.

 

Languages

 

1. Program can handle more than one language in one and the same document.

 

FS is the only program that offers multilingual work (up to six languages in one single package).

 

Flexible spelling (English)

 

1. Program for dictating in English allows choosing American or British spelling irrespective of the user's pronunciation (for example, a Canadian whose pronunciation is closer to the U.S. variety would then be able to get a document written in British spelling, according to what is required in Canada).

 

None of the four voice-recognition programs offers this possibility. If you prefer U.S. spelling, you must use a U.S. program, even if your pronunciation is more British-like.

 

Language varieties -- Pronunciations and accents

 

1. Program is flexible enough to accommodate a large variety of accents (in contradistinction to being designed for a heavily marked specific accent). This applies mainly to the English language, but also to some extent to French, Spanish, and German.

 

DNS -- both U.S. and UK English could accommodate my hybrid pronunciation of English (which nevertheless is more British-like). I had to struggle a little bit more with the U.S. version (since I pronounce "better" rather than "bedder", etc.), and I have not succeeded in teaching the program the difference between "there" and "their", and similar words.

FS -- I could not achieve any reasonable results with either U.K. or U.S. English. I've seen that native speakers did not report otherwise. On the other hand, I have achieved quite satisfactory results with Italian, Spanish, and German, and tolerable results (i.e., at any rate better than English) with French.

VVMP -- I have achieved the best level of accuracy with this program after a few hours of use. I had to teach it non-American pronunciations of highly frequent words, but after that it functioned astoundingly well.

VX -- although I started working with version 1.1 of this program, and also tested UK English for version 4 and U.S. English for versions 4 and 5, I have not been able to reach any satisfactory results. I suppose this program is tailored very specifically for U.S./U.K. accents.

 

2. Program can accommodate, in addition to native accents, non-native ones. This is particularly relevant to the English language, in view of its pretension to function as an international language.

 

This parameter is discussed in the section above.

 

Text-to-speech

 

1. Program reads text with a high quality, close to human voice (in contradistinction to low quality machine voice).

 

DNS -- basically offers only one voice per language: Gordon for English, and Rafael for Spanish. The voices are not really agreeable, and cannot, in my opinion, be very helpful for checking the dictated texts.

FS -- although FS offers six languages in its regular package, only the main language in the package is accompanied by a text-to-speech device. The English voice is very mediocre.

VVMP -- more or less like DNS and FS.

VX -- has an almost natural voice ("Jennifer"), undoubtedly superior to all of the other voices.

 

2. Program allows user to select a voice to their taste.

 

L&H offer additional voices, which can be used by some of the other programs. However, VVMP does not allow that.

 

3. Program displays the text converted to speech during reading (thus making it easier for the user to follow).

 

Only VVMP offers this feature.

 

Documentation

 

1. Program should be accompanied by as extensive and clear documentation as possible.

 

Only DNS offers extensive and clear documentation. All of the other programs force you to frantically scrutinize their help screens, trying to chase the commands you so badly need. I think a user need not insist on printed documentation (such as the excellent DNS manuals for version 4); a well-edited PDF file would suffice.

 

Conclusions

 

No doubt every user has different criteria for evaluating products. And since people talk in so many different ways, there is no guarantee that if the program worked well for one user it would work as well for another. However, there is accumulated evidence about the accuracy achieved by the various programs. The first two positions are awarded almost unanimously to VVMP and DNS. Since DNS has got so many superior features in matters of flexibility of commands, flexibility of corrections, and documentation, some reviewers have tended to prefer it to VVMP. From my own point of view, although I deplore very much the absence of indispensable commands in VVMP -- in comparison with DNS and VX -- what I believe should count most of all is the level of accuracy, and here my experience has been beyond any doubt or hesitation in favor of VVMP.

 

This text was dictated with ViaVoice Millennium PRO.