A response to Dvorak’s Opinion that “SR has come to
its end” (January 2002)
(Published for some time on the PC Mag Online Discussion Forum, January 2002,
but it can no longer be traced)
[I thought it might be a good idea to reproduce here
my response to Dvorak’s statement that SR had reached “its end”. The situation
has not changed since these lines were written.]
Dvorak believes that speech recognition has reached
a dead end. I believe that Dvorak
could not be more wrong. He is spreading
disbelief and pessimism where the development of speech recognition has been
more than positive. Anyone who remembers
speech recognition from the mid-90s must realize how far speech
recognition has come since. In the
mid-90s, or more exactly until 1997, speech recognition was indeed a matter for
aficionados, that is, for people who saw that this was a promising device
in spite of the fact that it wasn't yet a working tool. With almost all of the voice recognition
programs available after 1997, almost anyone could do real work. The trouble is, these programs could not meet
the needs or nature of all users. Some
programs responded better than others to some users. Only by experimenting with all of
them could you find out, more or less, which one suited you best. Such a procedure is of course expensive,
though not terribly so, and most people would balk at investing so many hours of
work just to find out which tool suited them best. But those who do lots of writing would have
found such an investigation worthwhile, because once the proper program was
found, all those hours invested in finding it paid off very well in terms of
working time and efficiency.
Unfortunately, or perhaps fortunately for most
people, the alternatives have now shrunk to only two, and perhaps three in the
long run: IBM's ViaVoice, the combined DNS and VoiceXpress by ScanSoft, and
Microsoft SR.
Another aspect that has been completely ignored by
Dvorak is the question of languages.
Speech recognition works beautifully, and almost impeccably, for such
languages as Spanish and Italian. It
also functions very well for German, definitely better than for
English. English is the most difficult
language for which to have a well-functioning speech recognition tool. Undoubtedly, the sound pattern of English is
far more impenetrable than that of the other languages mentioned above. As a result, from the point of view of the
user, greater energy must be invested in training a speech recognition program
for English than is required for the other languages. In spite of that, one can reach a level of
accuracy of about 98% for English. Such
results are much better than those of typing, because most people make more
mistakes when typing, not to mention all the other disadvantages of keyboard
usage.
One of the difficulties Dvorak and other
native speakers of American English are having is probably their unease with
articulating their native language.
Although there is no need to articulate artificially, a very heavy
native "natural" pronunciation may indeed be an obstacle to good
speech recognition. I am beginning to
suspect that, being a non-native speaker of English, I am at no
disadvantage vis-à-vis native speakers. Perhaps, being a non-native speaker, I take
care to articulate better than they do.
This is an aspect that should be studied seriously by the speech
recognition community.
As regular users of speech recognition, we have
an interest in seeing that the speech recognition companies -- the few that have
remained -- do not despair, but go on developing this useful tool. This does not mean they should not be
criticized or asked to improve their products and, most particularly, to listen
to what their users have to say. But
dismissing the whole endeavor is really unhelpful.
Itamar Even-Zohar