(April 2003, October 2004, April 2007) |
|
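Multilingualism in speech recognition can mean different things: |
|
1. Producing speech recognition applications for several languages. |
2. Bundling together speech recognition applications for several languages. |
3. Allowing the use of several speech recognition applications under one
operating system without conflicts. |
4. Producing a speech recognition application that would allow switching
between several languages in one and the same document. |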
|
Let me now
briefly discuss these topics. |
|
1. Producing
speech recognition applications for several languages. |
Between 1990 and 2007, all four companies engaged in producing SR
applications offered speech recognition for several languages. However, two
of the four (VX and Philips) have discontinued their SR applications, and IBM
no longer develops ViaVoice, which is currently sold by Nuance, the company
that has taken over DNS. The main languages have been: U.S. and UK English,
Chinese, French, Spanish, German, Italian, Japanese, and Dutch. DNS (latest:
Version 9) additionally offers Australian, Indian, and Southeast Asian
English. ViaVoice (latest: Version 10.5) no longer offers Brazilian
Portuguese and Arabic. Swedish is offered by a Swedish company which
continues VoiceXpress. Philips, which offered fourteen languages (among them
Austrian German, Catalan, and Swedish), has discontinued its application. |
|
As for the new contender, Microsoft, it has fully integrated speech
recognition into its new OS, Windows Vista. Currently, the following
languages have full SR in Vista: U.S. and UK English, Chinese, French,
German, Japanese, and Spanish. As for speech recognition for Office 2003
under Windows XP, only U.S. English, Japanese and Simplified Chinese are
available. |
|
For more
details see Languages Supported. |
|
2. Bundling
together speech recognition applications for several languages. |
With the exception of Philips (which offered all of its 14 languages in one
single bundle, or made it possible to get additional languages on request),
all other speech recognition companies offered their various languages only
in those countries where the relevant languages are (believed to be) used. In
this supposedly globalized age, such international companies as Microsoft,
Nuance and IBM still think in terms of language provinces cut off from each
other and serving only local communities. It seems to be inconceivable to the
designers of these applications that people may need more than one language.
This has not prevented IBM, by the way, from creating a unique architecture
for ViaVoice that does not allow using all of its language applications on
the same computer and under the same operating system if one of the languages
is installed in a different version. Since version 7, DNS has introduced a
similar architecture that makes it impossible to keep different versions for
different languages and run them alternately under the same operating
system. |
|
Although DNS never offered any multilingual bundle before version 5, it seems
that, in view of the growing international need for English, all DNS speech
recognition languages are now bundled at least with English. However, this
information is not always made explicit on the various Internet pages of the
respective electronic stores. For version 9, Spanish, German and Italian are
each bundled with all the various English variants, while the Dutch package
bundles Dutch, German, English, and French. |
|
Purchasing the various language applications of both DNS and ViaVoice is not
a simple operation. As there is no international store for these products,
one must locate in each of the relevant countries an Internet store that is
willing not only to sell the product but also to ship it abroad. For example,
Amazon UK is not prepared to ship an upgrade for ViaVoice outside of the
European Union. Of course, for each language package purchased from a
different store in a different country you get a new headset microphone, and
you pay separately for shipping, handling, customs and brokerage. It is a
very time- and energy-consuming operation. |
|
As for Windows Vista SR, the additional languages can be downloaded from the
Microsoft website as “Language Packages” if you have purchased Windows Vista
Ultimate or Enterprise. |
|
3. Allowing the use of several speech recognition
applications under one operating system without conflicts. |
This capability, much as it might be taken for granted, has not been
implemented by all manufacturers of SR, as described above for IBM and DNS.
From the point of view of programming aesthetics, I truly agree that the
architecture of ViaVoice (and currently DNS) is beautiful. One single engine
is used for all installed languages, thus creating both aesthetic neatness
and economical functionality. However, commercial factors have greatly
contributed to destroying the advantage of this unique architecture. Because
of the costs involved in producing upgrades, and because most languages other
than English produce high-quality results (and therefore do not need
far-reaching modifications as badly as English does), IBM has not found it
necessary to offer upgrades for most of these languages. As a result, those
who needed a better English version, and who promptly upgraded from version 7
to versions 8, 9, and 10, simply could not go on using the other languages
they had purchased for quite a significant sum of money. |
|
By the way, the beautiful architecture of ViaVoice (and DNS) has not been
accompanied by any explicit information. None of the various language manuals
for these programs has ever mentioned multilingualism, or explained that
particular architecture and what it entails. A bug that created language
confusion in the HELP file of ViaVoice has never been fixed or explained, and
for a long time was not even acknowledged. |
|
This
architecture makes it possible to install more than one language, and then
switch between the various languages more quickly than by unloading and
loading a standalone application. This is simply carried out by switching
between users. |
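As a rough illustration of the arrangement described above, here is a minimal Python sketch of one shared engine with a user profile per language. It is not how ViaVoice or DNS are actually built, since their internals are not documented here. The names `SharedEngine`, `install_language`, `add_user`, `switch_user` and `VersionMismatch` are hypothetical, and the rule that a language pack must match the engine version is an assumption modelled on the behaviour reported in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


class VersionMismatch(Exception):
    """Raised when a language pack was built for another engine version."""


@dataclass
class SharedEngine:
    """Hypothetical single engine that serves every installed language."""
    version: str
    languages: Set[str] = field(default_factory=set)
    profiles: Dict[str, str] = field(default_factory=dict)  # user -> language
    active_user: Optional[str] = None

    def install_language(self, language: str, pack_version: str) -> None:
        # Assumed rule: a pack for a different engine version is refused.
        if pack_version != self.version:
            raise VersionMismatch(
                f"{language} pack is for version {pack_version}, "
                f"engine is version {self.version}"
            )
        self.languages.add(language)

    def add_user(self, user: str, language: str) -> None:
        if language not in self.languages:
            raise ValueError(f"{language} is not installed")
        self.profiles[user] = language

    def switch_user(self, user: str) -> str:
        """Activate a user profile and return the language now in use."""
        language = self.profiles[user]  # KeyError if the profile is unknown
        self.active_user = user
        return language


engine = SharedEngine(version="10")
engine.install_language("English", "10")
engine.install_language("French", "10")
engine.add_user("anna_en", "English")
engine.add_user("anna_fr", "French")
print(engine.switch_user("anna_fr"))
```

Switching language is then only a profile lookup, which is why it is quicker than unloading and loading a standalone application. The same sketch also shows the drawback: a pack left at an older version can no longer be installed once the engine has been upgraded.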
|
4. Producing
a speech recognition application that would allow switching between several
languages in one and the same document. |
Having a speech recognition application that allows switching between
languages even within the same document, without the lengthy procedures of
unloading and loading different modules, users, or even different standalone
applications, is not a utopia. As a matter of fact, Philips offered this
possibility in the now discontinued speech recognition application FreeSpeech
2000. As described above, FreeSpeech 2000 was shipped with one main language
and an additional set of thirteen languages. If you did not get all of these
at once, as far as I recall, you could request them from Philips
headquarters. As far as I remember, this applied to English or French, or
perhaps German and Spanish, as main languages. It became more complicated
when I looked in vain for the combination of those fourteen languages with
Swedish as the main language, for a friend of mine, a journalist working for
the Swedish press. |
|
FreeSpeech 2000 made it possible to work with as many of the fourteen
languages as you liked. For many international offices, I believe such a
bundle must have been most efficient. Even if they did not need to create
multilingual documents, they could still switch quickly and easily between
languages in order to create documents in various languages. This arrangement
beautifully liberated any prospective user from the need to wander between
countries and department stores all over the world, desperately looking for
unheard-of products. I still believe it is also the best solution from the
marketing point of view. |
|
However, FreeSpeech 2000 offered more than just a bundle of languages. It
offered the ability to switch between languages even in the middle of a
sentence. The trouble was that the implementation of this procedure was far
from unproblematic. It often crashed under Windows 98 and caused all sorts of
damage to the operating system (which was quite shaky anyway). On the whole,
it was rather crude, requiring the user, for instance, to switch all the time
between dictation and voice commands. The levels of accuracy were unstable;
personally, I obtained the worst results with English, and the best with
Italian, Spanish, Swedish, and even Catalan. Nevertheless, in none of these
languages did it reach results comparable to those of its competitors, except
for the languages where there were no competitors (such as Swedish, Catalan,
or Austrian German). On the other hand, in many other respects it was more
advanced than the currently available applications. For example, it offered
very extensive and versatile tools for creating alternative commands. The
program was designed to work under Windows 95, 98 or NT, and had to be
abandoned if one switched over to Windows 2000 and later XP. Philips decided
to discontinue it rather than upgrade it to work with these new versions of
the Windows operating system. |
|
In spite of all its deficiencies, however, FreeSpeech 2000 was at least
moving in the right direction, and it offered an array of solutions for
various problems connected with multilingualism. When I was recently asked by
David Mowatt of the Microsoft Corporation how I viewed multilingual usage, I
responded that I basically believe FreeSpeech 2000 could still serve as a
model, naturally to be improved and brought up to the current state of the
art. |
|
Basically,
the design could be as follows: |
Of course there is one more possibility, which I have used extensively in VV
for a lot of commands that didn't quite work. For example, instead of "open
quote" and "close quote" I had BQ (pronounced BEE-KEW) and SQ; instead of
"PERIOD--NEW PARAGRAPH" I had PNP; I had GTB for "Go to bottom", etc. Such
commands are not language-dependent, but they are perhaps less easy to
memorize. Their advantage, however, is that they could stay the same in any
language (though the letter names are pronounced differently). |
David Mowatt's Comment: "That’s
an interesting idea, although it reduces discoverability. Customisation
is something that more advanced users might be able to master with sufficient
help files, but not something that would work for all users. It is simply a
really hard problem to solve!" |
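The abbreviation idea discussed above can be pictured as a two-tier lookup: a small table of letter commands shared by all languages, and a table of spoken phrases for each language. The Python sketch below is only an illustration. The four abbreviations and the English phrases come from the examples in the text, while the action names, the table layout and the `resolve` function are hypothetical.

```python
from typing import Optional

# Language-independent abbreviations, taken from the examples in the text.
SHARED_ABBREVIATIONS = {
    "BQ": "open_quote",
    "SQ": "close_quote",
    "PNP": "period_new_paragraph",
    "GTB": "go_to_bottom",
}

# Language-dependent phrases; only English examples are given in the text,
# so other languages are simply absent from this illustrative table.
PHRASES = {
    "English": {
        "open quote": "open_quote",
        "close quote": "close_quote",
        "go to bottom": "go_to_bottom",
    },
}


def resolve(utterance: str, language: str) -> Optional[str]:
    """Return the action for a spoken command, or None if it is unknown."""
    key = utterance.strip()
    # Abbreviations are checked first and ignore the dictation language.
    if key.upper() in SHARED_ABBREVIATIONS:
        return SHARED_ABBREVIATIONS[key.upper()]
    # Full phrases are only valid in the language they were defined for.
    return PHRASES.get(language, {}).get(key.lower())


print(resolve("BQ", "French"))
print(resolve("open quote", "English"))
print(resolve("open quote", "French"))
```

The abbreviation resolves whatever the dictation language is, whereas the full phrase only works in the language it was defined for. The discoverability problem raised in the comment remains: nothing in the table helps a new user guess that BQ exists.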
|
2. Language
Switching |
David Mowatt's Comment: "Yes.
That was the conclusion that I was being drawn towards". |
This will also allow us to dictate, say, in English, then say "switch over to
French", and immediately dictate in French. But perhaps this particular
detail is feasible even now. |
Actually, in the end no easy switching mechanism was provided for Windows
Speech Recognition under Vista. If you wish to dictate in a different
language, you must change the computer’s main language (“Display Language”).
The entire UI is then switched over to that language. This does not look like
a particularly attractive solution if you wish to go back and forth between
dictations in various languages. |
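By contrast, the spoken switch suggested earlier ("switch over to French", then dictate in French) could be sketched as follows. This is a toy model in Python that works on already recognized text, not on audio, and it does not describe any existing product. The set of installed languages, the exact command wording and the `dictate` function are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumed set of installed languages for this illustration.
INSTALLED = {"english", "french", "german"}

# Assumed wording of the spoken switch command.
SWITCH_COMMAND = re.compile(r"^switch over to (\w+)$", re.IGNORECASE)


def dictate(utterances: List[str], start_language: str) -> List[Tuple[str, str]]:
    """Tag each dictated utterance with the language active when it was spoken.

    A switch command that names an installed language changes the active
    language and is not written into the document; anything else is text.
    """
    language = start_language
    document = []
    for utterance in utterances:
        match = SWITCH_COMMAND.match(utterance.strip())
        if match and match.group(1).lower() in INSTALLED:
            language = match.group(1).lower()
            continue
        document.append((language, utterance))
    return document


result = dictate(
    ["Good morning", "switch over to French", "Bonjour"],
    start_language="english",
)
print(result)
```

The point of the sketch is that the switch happens inside the dictation stream and leaves the user interface language untouched, which is exactly what changing the Display Language does not offer.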