Bug 24139 - implement OpenType "locale" feature
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: P3 normal with 3 votes
Target Milestone: Future
Assigned To: Nobody; OK to take it and work on it
Depends on: 458972 449292 524107
Reported: 2000-01-16 19:27 PST by Erik van der Poel
Modified: 2011-07-02 13:39 PDT

Description Erik van der Poel 2000-01-16 19:27:53 PST
Subject: RE: Unicode Cyrillic GHE DE PE TE in Serbian
Date: Sun, 16 Jan 2000 17:00:25 -0800 (PST)
From: John Hudson <tiro@tiro.com>
To: "Unicode List" <unicode@unicode.org>

At 04:11 PM 16-01-00 -0800, Janko Stamenovic wrote:

>I made the web page which contains some conclusions from the discussion
>here about the subject (Serbian Cyrillic GHE DE PE TE):

>http://195.228.137.80/SerbianCyr.htm

Janko, thank you for putting this page together, and I hope it focusses
attention on the requirements of Serbian typography. You repeatedly state
on the page,

        Up to now there is no known software which will render the same
        Unicode characters differently if they are tagged as text in Serbian!

To this I would only add that the encoding and font solutions _are_ in
place, and that what is lacking is application support.

The good news is that, because the OT 'locale' layout feature is not
limited in its application to a single language, it should be an attractive
solution for applications to enable national and linguistic typographic
variants. This means that support for Serbian typography does not rely
solely on application developers knowing about or caring about Serbian.
Font developers will need to deal directly with Serbian in GSUB table
entries, so they _will_ have to know and care; I hope your webpage can find
a permanent home to help educate them.

Also good news is the fact that the Adobe Cyrillic font standard already
contains the Serbian forms in unencoded positions, so there are already a
sizeable number of fonts containing these glyphs waiting to be converted to
the OpenType format.

When I have a chance, I will send you a set of OT lookups to handle these
substitutions. These will be importable into Microsoft's Visual OpenType
Layout Tool (currently in beta testing), so font developers can use them as
sample code or apply them directly to fonts containing the appropriate
variant glyph forms.

John Hudson

Tiro Typeworks
Vancouver, BC
www.tiro.com
tiro@tiro.com
Comment 1 ekrock's old account (dead) 2000-06-06 17:02:13 PDT
Marking FUTURE and helpwanted. Afraid this won't make FCS.
Comment 2 Frank Tang 2001-05-07 23:42:15 PDT
Erik has resigned. Reassigning all his bugs to ftang for now.
Comment 3 Frank Tang 2001-05-08 00:18:43 PDT
Marking all FUTURE NEW bugs as ASSIGNED after the move from erik to ftang.
Comment 4 Frank Tang 2002-07-16 00:08:45 PDT
Reassigning to shanjian, milestone Future. I don't think this is an important issue for now.
Comment 5 Shanjian Li 2002-09-04 16:24:48 PDT
Is there any OT font available to test if we implement this feature?
Comment 6 Frank Tang 2005-03-01 23:58:20 PST
shanjian has not been working on Mozilla for 2 years and these bugs are still
here. Marking them WONTFIX. If you want to reopen one, find a good owner first.
Comment 7 Travis Chase 2005-03-02 03:51:28 PST
Mass reopen after Frank Tang's WONTFIX debacle. The spam is his responsibility, not my own.
Comment 8 Travis Chase 2005-03-02 04:09:59 PST
Mass reassigning Frank Tang's old bugs that he closed as WONTFIX and that had to be
reopened. The spam is his fault, not my own.
Comment 9 Erik van der Poel 2005-05-13 14:20:03 PDT
See also:

http://www.unicode.org/mail-arch/unicode-ml/y2005-m05/0131.html

For user/password, see:

http://www.unicode.org/mail-arch/
Comment 10 Damjan Georgievski 2008-04-23 06:40:35 PDT
The DejaVu font supports SRB and MKD language-specific glyphs. This can be seen, for example, with the http://fontmatrix.net tool.

Also, Pango has implemented this feature since 1.17, as can be seen for example with:
pango-view --font="DejaVu Serif Italic 24px" --text="б г д п т" --language=mk
notice the difference with:
pango-view --font="DejaVu Serif Italic 24px" --text="б г д п т" --language=ru

Pango will recognize from the system locale which OpenType language to use by default. Firefox doesn't. In this case it shows the Russian Cyrillic glyphs.
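
The locale-to-language step described above can be sketched roughly as follows. This is only an illustration, not Pango's actual code: the helper names are hypothetical, and the tag table is a small hand-picked subset of the OpenType language system tags (MKD, SRB, RUS, BGR), with 'dflt' standing in for the default language system.

```python
# Hypothetical helpers: map a POSIX locale name to an OpenType language
# system tag. The table is a tiny illustrative subset, not a full registry.
OT_LANGUAGE_TAGS = {
    "mk": "MKD",  # Macedonian
    "sr": "SRB",  # Serbian
    "ru": "RUS",  # Russian
    "bg": "BGR",  # Bulgarian
}


def locale_to_language(locale_name):
    """Extract the language subtag, e.g. 'mk_MK.UTF-8' -> 'mk'."""
    # Drop the '.codeset' and '@modifier' parts, then the territory.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return base.split("_", 1)[0].lower()


def opentype_language_tag(locale_name, default="dflt"):
    """Return the OpenType language tag for a locale, or the default tag."""
    return OT_LANGUAGE_TAGS.get(locale_to_language(locale_name), default)


if __name__ == "__main__":
    for name in ("mk_MK.UTF-8", "ru_RU.UTF-8", "C"):
        print(name, "->", opentype_language_tag(name))
```

With a mapping like this, a shaper that receives "MKD" can pick the Macedonian forms, while an unknown locale simply falls back to the font's default forms.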
Comment 11 Damjan Georgievski 2008-07-31 16:24:05 PDT
Here's a nice, I think visual enough, explanation of the feature we need - as implemented in Pango.
http://library.gnome.org/misc/release-notes/2.20/index.html.en#rndevelopers-pango
(look at figure 24)

I propose stealing the Pango code :)
Comment 12 Damjan Georgievski 2008-07-31 16:56:22 PDT
I talked to gandalf in Whistler; he pointed out that smontagu is the right person to address this issue. Adding him to CC.
Comment 13 Damjan Georgievski 2008-10-07 11:56:10 PDT
ping
Comment 14 Karl Tomlinson (ni?:karlt) 2008-10-07 15:47:26 PDT
This kind of works on Linux, if intl/locale/src/langGroups.properties (installed as res/langGroups.properties) is edited to treat mk uniquely:

  -mk=x-cyrillic
  +mk=mk

but there is a quirk with caching (bug 458972).

A more general solution than making each special language unique (and modifying every other list of langGroups in Mozilla) would be to pass the language information to gfx. i.e. not the langGroup unless we don't know the language because the langGroup is guessed from the encoding.
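
A minimal sketch of that idea, under stated assumptions: the two dictionaries below are small illustrative subsets written for this example (not the real contents of langGroups.properties or of any charset table), and font_language_hint is a hypothetical name, not an existing gfx API.

```python
# Illustrative subset of a langGroups.properties-style mapping. Note that
# mk, sr and ru all collapse into the same group, losing the distinction.
LANG_GROUPS = {
    "mk": "x-cyrillic",
    "sr": "x-cyrillic",
    "ru": "x-cyrillic",
    "en": "x-western",
}

# Illustrative encoding-to-langGroup fallback (assumed for this sketch).
ENCODING_GROUPS = {
    "windows-1251": "x-cyrillic",
    "iso-8859-2": "x-central-euro",
}


def font_language_hint(dom_lang, encoding):
    """Return what the text code would receive in this sketch.

    Prefer the real language when the document supplies one; only fall
    back to a langGroup guessed from the encoding when it does not.
    """
    if dom_lang:
        return ("lang", dom_lang.lower())
    return ("langGroup", ENCODING_GROUPS.get(encoding.lower(), "x-unicode"))


if __name__ == "__main__":
    print(font_language_hint("mk", "UTF-8"))
    print(font_language_hint(None, "windows-1251"))
```

The point of returning a tagged pair is that the receiving code can tell a real language ("mk") apart from a guess ("x-cyrillic") and only apply language-specific glyph forms in the first case.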
Comment 15 Damjan Georgievski 2008-10-17 04:27:59 PDT
So karlt, are you going to commit that change to res/langGroups.properties?
It seems correct to me that Gecko should prefer the mk font script for the mk language.

Are there any shortcomings to that change? (I did not notice any locally.)
Comment 16 Karl Tomlinson (ni?:karlt) 2008-10-19 15:28:38 PDT
(In reply to comment #15)
> So karlt, are you going to commit that change to res/langGroups.properties?

No.

> Are there any shortcomings to that change? (I did not notice any, localy here)

I expect that there would be no user interface for selecting preferred fonts for the mk langGroup.

That would also only be a partial solution.  It seems from my understanding that the OpenType feature is general to any language and so Mozilla should always provide the language (if known), without ever morphing it into a "langGroup".  Let me know if there is good data that there is only a small set of "special" languages that may be important.

I don't really understand the purpose of langGroups.  langGroups such as x-unicode and x-user-def suggest that langGroups are intended to represent encodings rather than languages.

One purpose that langGroups do seem to help with is in choosing default fonts.
A user probably doesn't want to choose fonts individually for en_US, en_GB, fr_FR, fr_CA, de, it, etc.  Would users wish to have an interface to choose different preferred fonts for mk, sr, ru, etc?  Or would they be surprised that, having set a general cyrillic font, sr and mk still render using a different font?

BTW, what does the x-central-euro langGroup represent?  Languages that are usually written in Latin script, but may be written in Cyrillic?  Or is the distinction from x-western or x-cyrillic because of a different encoding?
Comment 17 Damjan Georgievski 2008-10-20 08:25:08 PDT
I have no idea about langGroups either.. (I wonder how you even found that hack for mk) ... 

The idea in Mozilla, that the user sets separate fonts for the different alphabets {Edit->Preferences->Content->Advanced (for fonts)}, is really strange and confusing.

A single set of options for a sans, serif and mono font would be enough. 

Then, Mozilla should try to use the OT locl feature according to the lang= attribute in the HTML page.
Comment 18 Damjan Georgievski 2008-10-20 08:27:18 PDT
Now I wonder... clearly Firefox (on Linux) has a mechanism to do the correct thing vis-à-vis the Macedonian alphabet and the serif italic letters.

Why is that not working by default?
Comment 19 Erik van der Poel 2008-10-20 08:31:38 PDT
Language groups were originally intended to bridge the gap between the old
encoding-based font selection in Netscape 1.1 and onwards and the new encoding-
and language-based font selection in Gecko. When there is no language info in
the HTTP header and HTML tags, Gecko selects a font based on the detected
encoding (which is mapped to a language group). Hence we have rather strange
language groups called x-unicode and x-user-defined, which aren't really groups
of languages.

The x-central-euro group was originally intended to be mapped from Central
European encodings like ISO-8859-2 and from Central European languages. It was
not intended to deal with languages that can be written in either Latin or
Cyrillic.

One way to deal with mk is to give it its own language group, as you have done
in your experiments. This is a bit of a hack since it doesn't really deal with
scripts.

A cleaner solution might be to select fonts based not only on document encoding
and DOM node language, but also on substring script. I have no idea whether
this would be a big change. It might be.
Comment 20 Karl Tomlinson (ni?:karlt) 2008-10-20 13:13:38 PDT
(In reply to comment #18)
> Why is that not working by default?

It is not working because the language from the DOM node is not passed to the font handling code.  Instead only a less-specific langGroup (x-cyrillic) is provided, and so the font handling code needs to guess the intended language.

(In reply to comment #19)

Thanks for the history, Erik.

> One way to deal with mk is to give it its own language group, as you have done
> in your experiments. This is a bit of a hack since it doesn't really deal with
> scripts.

Yes, this doesn't feel quite right to me, but mainly because AIUI there are many similarities between mk, sr and other languages using Cyrillic script.
If we really wanted to take this path, we could consider script subcodes in deciding to which langGroup the full language code maps, and maybe that is a reasonable solution to bug 192636.

> A cleaner solution might be to select fonts based not only on document
> encoding and DOM node language, but also on substring script.

It's not so much the font selection that is the issue in this particular bug but providing the language to the font code for glyph selection.  However, I do see that font-selection is important when dealing with different script codes.

Providing the DOM language code would be sufficient for the cases here AIUI.
I'm not clear on what you have in mind with "substring script".
The subcode component of HTML4 language codes is intended as a country code.

http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.1

However, extension to also interpret ISO 15924 script subcodes (bug 192636) seems sensible.  Even then, passing the full DOM language code to font handling should be sufficient to let font handling interpret and do the right thing AIUI.

> I have no idea whether this would be a big change. It might be.

Passing the full DOM language code to gfx (or a langGroup when only an encoding is known) would mean changing quite a few APIs and related code, but I don't imagine it would be a complicated change.  Performance may be an issue though with more strings stored and fewer (word or font) cache hits due to increasing distinction between languages.
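
The cache-hit concern can be illustrated with a toy word cache. This is a sketch only: WordCache and run are hypothetical names, the "shaping" result is a placeholder string, and nothing here reflects the actual gfx cache design.

```python
class WordCache:
    """Toy shaped-word cache, optionally keyed on language as well."""

    def __init__(self, include_language):
        self.include_language = include_language
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def shape(self, word, font, lang):
        # Adding the language to the key keeps results for mk and ru
        # apart, at the cost of more entries and fewer hits.
        key = (word, font, lang) if self.include_language else (word, font)
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # Placeholder for a real shaping result.
            self.entries[key] = f"glyphs({word!r}, {font}, {lang})"
        return self.entries[key]


def run(cache):
    """Shape the same word in one font for three languages."""
    for lang in ("ru", "mk", "sr"):
        cache.shape("дом", "DejaVu Serif", lang)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("without language in key:", run(WordCache(False)))
    print("with language in key:   ", run(WordCache(True)))
```

Without the language in the key, the second and third lookups are hits but reuse whatever was shaped first; with it, every language gets its own entry and every lookup here is a miss.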
Comment 21 Erik van der Poel 2008-10-20 17:46:58 PDT
When I said "substring script", I was referring to the script of each character or string of contiguous characters. For example, the script of "manga" is Latn. It would be redundant to say <span lang="ja-Latn">manga</span>, since we can tell from the string itself (manga) that the script is Latn. I.e. the "-Latn" is redundant in this case.

I think we need to distinguish font selection from text measurement/drawing (as you say). There may well be performance ramifications if we need to look at the script of each character.

However, once we have selected a font, we still need to pass the language down to the text measurement/drawing routines, so that they can do their thing with OpenType language features.
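
To make "substring script" concrete, here is a rough sketch that splits a string into runs by script. It is deliberately crude: Python's standard library has no Unicode Script property, so rough_script (a hypothetical helper) guesses from the first word of each character's Unicode name, which is an assumption that only holds for simple cases like these.

```python
import unicodedata
from itertools import groupby


def rough_script(ch):
    """Crude script guess from the first word of the Unicode name."""
    if not ch.isalpha():
        return "Common"
    first_word = unicodedata.name(ch, "").split(" ", 1)[0]
    known = {"LATIN": "Latn", "CYRILLIC": "Cyrl", "GREEK": "Grek"}
    return known.get(first_word, "Unknown")


def script_runs(text):
    """Group contiguous characters that share the same guessed script."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=rough_script)
    ]


if __name__ == "__main__":
    print(script_runs("manga"))
    print(script_runs("mangaдом"))
```

The script of each run comes from the characters themselves, which is why a "-Latn" subtag on "manga" adds nothing; the language, by contrast, cannot be recovered this way and still has to be passed down separately.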
Comment 22 Damjan Georgievski 2009-04-10 14:57:21 PDT
Interesting, my test case works fine on Firefox 3.5b4-pre-mk on Linux.

Is this to be expected? for all locales or not? for all platforms or ?
Comment 23 Karl Tomlinson (ni?:karlt) 2009-04-13 16:51:24 PDT
(In reply to comment #22)
> Interesting, my test case works fine on Firefox 3.5b4-pre-mk on Linux.

What testcase is that?

> Is this to be expected? for all locales or not? for all platforms or ?

It probably is only working for your locale or languages in the LANGUAGE environment variable.
Comment 24 Damjan Georgievski 2009-04-13 20:06:43 PDT
> > Interesting, my test case works fine on Firefox 3.5b4-pre-mk on Linux.
> 
> What testcase is that?

http://damjan.softver.org.mk/italic/index.en.html

> > Is this to be expected? for all locales or not? for all platforms or ?
> 
> It probably is only working for your locale or languages in the LANGUAGE
> environment variable.

True, unsetting LANG (it was LANG=mk_MK.UTF-8) and then starting Firefox reverts to the Russian version of the glyphs in my test case.

The lang attribute in the <body> tag doesn't change anything.

On the whole this is an acceptable scenario: people using an mk locale would see Macedonian italic letters.

What about the other platforms?
Comment 25 Damjan Georgievski 2009-11-16 08:26:24 PST
Does this issue now depend on https://bugzilla.mozilla.org/show_bug.cgi?id=449292 ?
Comment 26 Jonathan Kew (:jfkthame) 2009-11-16 08:40:05 PST
Yes, bug 449292 will enable us to fix this. (It's currently working in not-yet-reviewed, work-in-progress patches.)
Comment 27 Jonathan Kew (:jfkthame) 2010-07-20 07:34:08 PDT
Just an update: with the landing of bug 449292 and bug 511339, this is now fixed on Mac OS X. It's also fixed on Windows if you set the gfx.font_rendering.harfbuzz.level preference to 1. Not yet implemented on Linux.
Comment 28 Jonathan Kew (:jfkthame) 2011-07-02 13:39:35 PDT
Now that we're using the harfbuzz font backend by default (for "simple" scripts) on all platforms, this should be working. The actual behavior is still dependent on appropriate language/locale settings, and on fonts having the appropriate OpenType tables.
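
Why the result depends on both the language setting and the font's tables can be shown with a toy model. This is not HarfBuzz code: the dictionary layout, the glyph names and apply_locl are all invented for illustration. The only behavior it relies on is that a language system missing from the font falls back to the default one.

```python
# Toy model of a font's substitution data:
#   language system tag -> feature tag -> {glyph: localized variant}
# Glyph names and layout are invented for this sketch.
TOY_FONT = {
    "dflt": {},
    "SRB": {"locl": {"be": "be.srb", "ge": "ge.srb", "de": "de.srb",
                     "pe": "pe.srb", "te": "te.srb"}},
    "MKD": {"locl": {"ge": "ge.mkd", "de": "de.mkd",
                     "pe": "pe.mkd", "te": "te.mkd"}},
}

# A font with no language-specific data at all.
PLAIN_FONT = {"dflt": {}}


def apply_locl(font, glyphs, lang_tag):
    """Substitute localized forms if the font has them for lang_tag."""
    # Unknown language systems fall back to the default one.
    langsys = font.get(lang_tag, font.get("dflt", {}))
    substitutions = langsys.get("locl", {})
    return [substitutions.get(glyph, glyph) for glyph in glyphs]


if __name__ == "__main__":
    text = ["ge", "de", "pe", "te"]
    print(apply_locl(TOY_FONT, text, "SRB"))
    print(apply_locl(TOY_FONT, text, "RUS"))
    print(apply_locl(PLAIN_FONT, text, "SRB"))
```

Only the first call changes the glyphs: the second has no matching language system, and the third has the right language but a font without the necessary table.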

Resolving as FIXED, as a result of bug 449292 and its linked bugs.
