A Re-Capitulation of MS-SR Problems and a Wish List

(April-May 2003, updated July 2004)

Itamar Even-Zohar

 

I was asked by David Mowatt of The Speech User Experience Group within Microsoft to answer a few questions about Microsoft speech recognition. Here are a few of my replies (slightly edited for the sake of clarity, and augmented with new issues).

 

[WHAT ARE THE MAIN PROBLEMS WITH MICROSOFT'S SPEECH RECOGNITION?]

 

MS SR still has annoying bugs.  Some of these have been fixed in Office 2003, and the overall performance of the application has been significantly improved. Here are a few examples of the bugs:

 

[1] An extra space is inserted after a left parenthesis or a left quote if you pause after dictating the name of the sign (to prevent this you must say "left paren-xxxx" without pausing, and that's not the way people dictate!) (not fixed in Office 2003);

 

[2] An annoying lack of indispensable commands, such as "delete previous word", "delete previous n words" and "select previous n words", as well as "caps on" and "caps off" for beginning and ending capitalization (CAPS ON in MS SR activates the CAPS ON key), and more;

 

[3] An awkward switching between dictation and voice commands, utterly unacceptable at the current stage of speech recognition applications (certainly a relic from archaic stages). For example, it is unthinkable that any user should wish to switch to voice command mode in the middle of dictation just to move backwards or forwards by one character or one word;

 

[4] No explicit and direct facility to create dictation macros, and no ability to create alternative commands or strings of commands (command macros);

 

[5] No ability to save voice/speech files for compatibility with other computers, or for restoration in case of deterioration in the functionality of the program, or for cases of reinstallation on a new computer. Likewise, no ability to export a list of additional words (import is feasible from a file on screen);

This has been partly fixed by a little application provided by David Mowatt (link available in the Yahoo Microsoft Group LINKS section).

 

[6] Very scarce facilities for editing and post-editing dictated text, either during or after dictation;

 

[7] Moving in and out of dictating numbers is problematic. If you wish to dictate, for example, a single-digit number, you must go through the unstable command "force num". If you dictate a word and wish to add a number to it (for example, "DNS version 7"), you must specifically say "space bar" before you say "force num [+ the desired number]", because otherwise the number will be appended to the word. Another deficiency is the absence of such commands as DNS’ "format that number" / "format that spelled out" in case you change your mind. And, last but not least, it is really annoying to get such numbers as 2,003 when you mean 2003 (date) (this last problem was fixed in Office 2003);

 

[8] If you switch to playing back your text without turning the microphone off before clicking on the Speak button, the text simply collapses and disappears, if not immediately, then after a few seconds, and irretrievably (this problem seems to have been fixed in Office 2003). The same happens when you check "playback original audio when correcting" in Configuration (I have not yet checked that in the new beta version);

 

[9] There is no way to extract the sound component from the text and export it as a separate sound file;

 

[10] Personally, I also have trouble with the imposition of a monolithic system of punctuation. For example, it is very difficult to circumvent the American double dash. I believe that even DNS or ViaVoice U.S. English do not impose this standard. (For my American friends who might be puzzled by this remark, I should note that the British long dash is not attached to the preceding and following words, but has one space before and one space after, as in most European languages.)

 

 

[WHAT SHOULD BE ADDED OR IMPROVED?]

 

[1] First and foremost, I would like the main bugs to be fixed;

 

[2] I would like to have a full and seamless use of this application in other applications (such as Eudora). For the moment, it seems that MS-SR is fully functional only in Microsoft Word and in Microsoft WordPad;

 

[3] I would like to have the facility of TRANSCRIPTION of voice dictated files (DNS is the winner so far in that area);

 

[4] More information should be given to the users. There is a need for a fuller manual, and a fully reliable list of available commands (there are currently discrepancies between the lists built into the application and those published on Microsoft's Web site). In addition, more general information is highly desirable, such as some description of the nature of the engine, the size of the vocabulary, the options for augmenting the vocabulary, etc.

 

In addition, I would like to have the following features (on various and different levels):

 

[1] The ability to create Dictation and Navigation macros;

 

[2] The ability to save voice files and transfer them to other computers, or use them for restoration;

 

[3] The ability to export custom words, customized commands, etc.;

 

[4] One single program for all languages, with the ability to install any of the desired languages without having to purchase those languages in various countries all around the globe. Personally, I would like to have UK English as soon as possible (not for British spelling, but for a better recognition level in my case), and subsequently several major European languages, as well as Swedish and Catalan;

 

[5] Moreover, I would like to have the ability to dictate in several languages in the same document. Philips FreeSpeech 2000 created such an option (to dictate in 14 languages into the same document), but the application was crude and has been discontinued. The question of multilingualism was discussed between David Mowatt and myself in a later exchange.