A response to Dvorak’s Opinion that “SR has come to its end” (January 2002)

(Published on the PC Mag Online Discussion Forum in January 2002; the original posting can no longer be traced)


Itamar Even-Zohar


[I thought it might be a good idea to reproduce here my response to Dvorak’s statement that SR had reached “its end”. The situation has not changed since these lines were written.]


Dvorak believes that speech recognition has reached a dead end.  I believe that Dvorak could not be more wrong.  He is spreading disbelief and pessimism where the development of speech recognition has been more than positive.  Anyone who remembers speech recognition from the mid-90s must realize what a long way it has come since.  In the mid-90s, more precisely until 1997, speech recognition was indeed a matter for aficionados, or rather for people who saw that it was a promising device even though it was not yet a working tool.  With almost all of the speech recognition programs released after 1997, almost anyone could do real work.  The trouble is that these programs could not meet the needs or nature of all users.  Some programs responded better than others to particular users.  Only by experimenting with all of them could you more or less find out which one suited you best.  Such a procedure is of course expensive, though not terribly so, and most people would balk at investing so many hours of work just to find out which tool suited them best.  But those who do a lot of writing would have found such an investigation worthwhile, because once the proper program was found, all the hours invested in finding it paid off very well in working time and efficiency.


Unfortunately, or perhaps fortunately for most people, the alternatives have now shrunk to only two, and perhaps three in the long run: IBM's ViaVoice, the combined DNS (Dragon NaturallySpeaking) and VoiceXpress by ScanSoft, and Microsoft SR.


Another aspect that has been completely ignored by Dvorak is the question of languages.  Speech recognition works beautifully, and almost impeccably, for such languages as Spanish and Italian.  It also works very well for German, definitely better than for English.  English is the most complicated language for which to build a good working speech recognition tool.  Undoubtedly, the sound pattern of English is far more impenetrable than that of the other languages mentioned above.  As a result, from the point of view of the user, more effort must be invested in training a speech recognition program for English than is required for the other languages.  In spite of that, one can reach a level of accuracy of about 98% for English.  Such results are much better than the results of typing, because most people make more mistakes when typing, not to mention all the other disadvantages of keyboard usage.


Perhaps one of the difficulties Dvorak and other native speakers of American English are having is their unease with articulating their native language.  Although there is no need to articulate artificially, a very heavy native "natural" pronunciation may indeed be an obstacle to good speech recognition.  I am beginning to suspect that, being a non-native speaker of English, I am not at any disadvantage vis-à-vis native speakers.  Perhaps, being a non-native speaker, I take care to articulate better than they do.  This is an aspect that should be studied seriously by the speech recognition community.


As regular users of speech recognition, it is in our interest that the speech recognition companies -- the few that have remained -- should not despair, but go on developing this useful tool.  This does not mean they should not be criticized or asked to improve their products and, most particularly, to listen to what their users have to say.  But dismissing the whole endeavor is really unhelpful.


Itamar Even-Zohar