On Language Tags and Font Selection in KOffice

Sivan Toledo

This document proposes adding language tags to text in KOffice applications and explains how KOffice should use the language information.

 

Many documents contain text in more than one language. This screenshot shows part of an article in an Israeli daily newspaper. As you can see, English names (in this case, names of computer games) are printed in Latin letters, not translated or transliterated into Hebrew. This situation is quite common.

 

In documents that contain text in two or more languages, it is helpful to tag text with language information, so that applications can determine the language of each piece of text. This is similar to the way applications tag text with a style (e.g., normal or heading).

 

Documents and document templates can have default languages associated with particular styles, so most of the time it is not necessary to set the language manually. For example, I should be able to create a template in which all the styles default to Hebrew. When I use this template to create a document, all the text is implicitly tagged as Hebrew, except for parts that I explicitly mark as another language (say, by selecting the text, right-clicking, and choosing a language from a small menu).
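
The lookup rule described above (a style supplies the default language, and an explicit tag on a piece of text overrides it) can be sketched in a few lines of Python. This is only an illustration: the names Style, Run, and effective_language are hypothetical and do not correspond to any existing KOffice class or file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Style:
    """A paragraph or character style with a default language (hypothetical)."""
    name: str
    default_language: str  # e.g. "he" for Hebrew


@dataclass
class Run:
    """A piece of text; language is None unless the user tagged it explicitly."""
    text: str
    style: Style
    language: Optional[str] = None


def effective_language(run: Run) -> str:
    """Return the explicit tag if present, otherwise the style's default."""
    if run.language is not None:
        return run.language
    return run.style.default_language


# A template whose "normal" style defaults to Hebrew.
normal = Style(name="normal", default_language="he")

# Only the English game name carries an explicit tag.
paragraph = [
    Run("שלום ", normal),
    Run("Quake", normal, language="en"),
    Run(" שלום", normal),
]

languages = [effective_language(run) for run in paragraph]
```

With this rule, a user who writes only in the template's default language never has to touch the language setting at all.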

 

Language tagging is useful for several purposes:

  • Font selection. Many non-Latin fonts support only one script (such as Hebrew or Arabic). When such fonts are used in multilingual documents, no single font can render all the text. Therefore, a style (e.g., normal paragraph text or title) should be associated with more than one font, such as “Hadassah for Hebrew and Bookman for everything else”. This issue is critical for producing good-quality bilingual documents. MS Word and StarOffice address it properly, and KOffice needs to address it as well. More on this later.
  • Glyph selection. OpenType fonts support language-based glyph selection. For example, a font may include two glyphs for the same accented Unicode letter, one for Romanian and another for Turkish. Without language tagging, the application cannot select the correct glyph (except on a whole-document basis, using the locale or the document language).
  • Automatic text insertion. For example, I insert dates into letters, but I write letters in both Hebrew and English. When I write an English letter, I want the date to show “May 5, 2002”, but in a Hebrew letter I want “5 במאי 2002”. I should be able to accomplish this by tagging the date field as either Hebrew or English, and the automatic text menu should give me options accordingly. (Currently it uses the KDE locale, which does not allow me to insert Hebrew dates into some documents and English dates into others.)
  • Spell checking. The spell checker should use the dictionary for whatever language a word or sentence is in.
  • Hyphenation. Different languages have different hyphenation rules, and applications should hyphenate words according to the language of each word. (I also think that one of the hyphenation options should be “don’t hyphenate words that are not in the dominant language of the document”, since hyphenating such words can be confusing. For example, in predominantly Hebrew documents it is best to hyphenate only the Hebrew words and not the occasional English word.)
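
To make the date example from the list above concrete, here is a minimal Python sketch of a date field whose output depends on its language tag rather than on the desktop locale. The function name format_date_field and the two-letter tags "en" and "he" are assumptions made for illustration; nothing here reflects how KOffice actually formats dates.

```python
from datetime import date

ENGLISH_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

HEBREW_MONTHS = [
    "ינואר", "פברואר", "מרץ", "אפריל", "מאי", "יוני",
    "יולי", "אוגוסט", "ספטמבר", "אוקטובר", "נובמבר", "דצמבר",
]


def format_date_field(d: date, language: str) -> str:
    """Format a date according to the language tag of the field (hypothetical)."""
    if language == "he":
        # Hebrew order: day, then the month with the prefix "ב" ("in"), then year.
        return f"{d.day} ב{HEBREW_MONTHS[d.month - 1]} {d.year}"
    if language == "en":
        return f"{ENGLISH_MONTHS[d.month - 1]} {d.day}, {d.year}"
    # Unknown tag: fall back to a language-neutral ISO 8601 date.
    return d.isoformat()


letter_date = date(2002, 5, 5)
english_text = format_date_field(letter_date, "en")
hebrew_text = format_date_field(letter_date, "he")
```

The same dispatch on the language tag would apply to the other uses in the list: choosing a spelling dictionary or a set of hyphenation rules for each tagged piece of text.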

 

Therefore, I think that KOffice 1.2 should add language tags to text (in the DTDs) and a mechanism to set the language of text. I also think that the “View” menu should have an option that marks text according to its language using, say, the background color, to assist in language tagging and to make tagging errors easy to fix. Although some of the features that language tagging enables may be quite far off for KOffice (e.g., glyph selection using OpenType features), I think that it is important to build the tagging mechanism into KOffice as soon as possible.

 

I would also like to note that GNOME/GTK 2.0 applications now use Pango to render text, and Pango uses language- or script-specific rendering engines, so these applications probably already tag text with language.

 

Finally, I want to show how one selects fonts in MS Word, to convince readers that language tagging is really necessary:

As you can see, for Latin text I chose Georgia, but since Georgia does not support Hebrew, I get to choose another font for non-Latin scripts. StarOffice has a similar font selection dialog in its non-Latin versions.

 

One may think that the solution is to use fonts with sufficient Unicode coverage for the document, but this is not really a desirable solution, since it restricts font selection too much. I often use Hadassah for Hebrew and Bookman or Raleigh for English, since these fonts look good and look good together, and I would not be able to do so if I had to choose one font.

 

I propose an interface that is both simpler and more general than the MS Word interface. I propose to leave the font selection menu pretty much as is, but to add a button labeled “add font for other languages”. Only when I click it do I get another tab in the dialog. The new tab is another font selection menu, but with a list in which I can mark languages. The first font is used by default, but for text in the languages that I marked for the second font, the application uses the second font. This mechanism is better for two reasons. First, most users in countries that use only Latin script will never need to click this button or deal with multiple languages for a style. Second, it allows users to define more than two fonts (e.g., for English/Hebrew/Arabic documents, which are common in Israel in packaging and government forms); the MS Word solution does not permit this.
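
The data that such a dialog would edit can be sketched as follows, in Python. This is a rough illustration under stated assumptions: StyleFonts, add_font_for_languages, and font_for are hypothetical names, and "ExampleArabicFont" is a placeholder rather than a real font.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple


@dataclass
class StyleFonts:
    """Fonts for one style: a default plus optional per-language fonts."""
    default_font: str
    # Each entry pairs a font with the languages marked for it in its tab.
    extra_fonts: List[Tuple[str, FrozenSet[str]]] = field(default_factory=list)

    def add_font_for_languages(self, font: str, languages: Iterable[str]) -> None:
        """Equivalent of clicking "add font for other languages" and marking languages."""
        self.extra_fonts.append((font, frozenset(languages)))

    def font_for(self, language: str) -> str:
        """Return the first extra font marked for this language, else the default."""
        for font, languages in self.extra_fonts:
            if language in languages:
                return font
        return self.default_font


# A trilingual style: the default font plus two language-specific fonts.
normal = StyleFonts(default_font="Bookman")
normal.add_font_for_languages("Hadassah", ["he"])
normal.add_font_for_languages("ExampleArabicFont", ["ar"])  # placeholder name

chosen = {lang: normal.font_for(lang) for lang in ["en", "he", "ar", "fr"]}
```

A style that never receives an extra font behaves exactly like a single-font style, which is why users who write in only one script are not affected.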

 

Another note on language tagging in MS Office documents: it is not explicit, and I am not sure exactly how it is done. It is clearly related to the keyboard setting. I think that this would be difficult to do using XKB (which is what people in Israel use for Hebrew/English entry), but it might be possible using KDE’s international keyboard utility. In any case, I think that KOffice should also enable explicit tagging, not only keyboard-related implicit tagging.

Last updated on Sunday, May 05, 2002