get rid of nsUnicodeRange

NEW
Unassigned

Core
Layout: Text
6 years ago
6 years ago

People

(Reporter: jfkthame, Unassigned)

Tracking

Trunk

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
We use nsUnicodeRange to map Unicode characters to "ranges" for the purpose of font selection. This looks like a legacy of the old (pre-Unicode) world of multiple codepages and charset-specific fonts.

It would be better to base font preferences/selection on the Unicode script property of the text, which is more accurate and useful than block-based "ranges".

Moreover, the data in nsUnicodeRange is badly out of date, so that even within the limitations of its model, it is incorrect for any recent version of the Unicode standard. As a result, font-selection behavior will be inconsistent (e.g. between the "original" Arabic block and the Arabic Supplement or Arabic Extended blocks, to take one example).
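
To make the inconsistency concrete, here is a minimal sketch of a block-based lookup driven by a stale table. The `Range` enum, `kStaleBlocks` table, and `LookupByBlock` function are hypothetical names invented for illustration; they are not the actual nsUnicodeRange data or API. The table is assumed to know the original Arabic block but not the later Arabic Supplement or Arabic Extended-A blocks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical range identifiers; the real nsUnicodeRange constants differ.
enum class Range { Unknown, Latin, Arabic };

struct BlockEntry {
  uint32_t first;
  uint32_t last;
  Range range;
};

// Illustrative stale table (an assumption, not the real data): it covers the
// original Arabic block (U+0600..U+06FF) but predates Arabic Supplement
// (U+0750..U+077F) and Arabic Extended-A (U+08A0..U+08FF).
static const BlockEntry kStaleBlocks[] = {
  {0x0000, 0x024F, Range::Latin},
  {0x0600, 0x06FF, Range::Arabic},
};

// Block-based lookup: any code point missing from the table falls through
// to Range::Unknown, so it cannot pick up the Arabic font preferences.
Range LookupByBlock(uint32_t ch) {
  assert(ch <= 0x10FFFF);  // must be a valid Unicode code point
  for (const BlockEntry& e : kStaleBlocks) {
    if (ch >= e.first && ch <= e.last) {
      return e.range;
    }
  }
  return Range::Unknown;
}
```

With such a table, U+0627 resolves to the Arabic range while U+0750 and U+08A0 do not, even though all three carry the Arabic script property.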

Comment 1

6 years ago
> Moreover, the data in nsUnicodeRange is badly out of date, so that
> even within the limitations of its model, it is incorrect for any
> recent version of the Unicode standard. As a result, font-selection
> behavior will be inconsistent (e.g. between the "original" Arabic
> block and the Arabic Supplement or Arabic Extended blocks, to take one
> example).

This only indicates that the data is out of date; it's not really a
solid reason for switching to, for example, the script code.

The nsUnicodeRange code is used to tie unicode ranges to lang groups and pref font selections.  I don't see how we change one without changing the others.  I think we could definitely make font selection much simpler but there are several components to this, not just nsUnicodeRange.  What's the pref UI for font selection?  How are locale-specific font settings defined?

I should point out there are lots of subtleties that matter to users in specific locales, particularly users in Japan for example.  Basing font prefs on script instead has lots of problems because it doesn't easily allow distinguishing Japanese/TradChinese/SimpChinese and I'm not sure how you would deal with codepoints classified as common since those include codepoints that are in fact specific to Japanese.

So I think this bug needs to be more than "get rid of nsUnicodeRange" and more about how locale-specific settings determine font selection for different scripts.
(Reporter)

Comment 2

6 years ago
(In reply to John Daggett (:jtd) from comment #1)
> I should point out there are lots of subtleties that matter to users in
> specific locales, particularly users in Japan for example.  Basing font
> prefs on script instead has lots of problems because it doesn't easily allow
> distinguishing Japanese/TradChinese/SimpChinese and I'm not sure how you
> would deal with codepoints classified as common since those include
> codepoints that are in fact specific to Japanese.

But nsUnicodeRange doesn't really contribute there, either - it just returns kRangeSetCJK for large blocks, which is no more helpful than SCRIPT_HAN. In either case, we need to look at page language or system locale or something in order to prioritise among the Japanese/TradChinese/SimpChinese font preferences. That'll still be the case - replacing "unicode range" with "script" as a primary basis doesn't alter the need to consider language in certain situations.
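
A rough sketch of that point: whether the primary key is a range or a script, Han text still needs a language hint to choose among the Japanese, Traditional Chinese, and Simplified Chinese font preferences. Everything here is hypothetical (the `Script` enum, the `PickCJKFontGroup` function, and the use of "ja", "zh-TW", and "zh-CN" as group names); it is not taken from the existing font-selection code.

```cpp
#include <cassert>
#include <string>

// Hypothetical script codes standing in for a real script-property lookup.
enum class Script { Latin, Han, Hiragana, Katakana, Common };

// Chooses a font-preference group for CJK text.
// pageLang: language tag of the content, if any (assumed already normalized).
// fallbackGroup: group derived from, e.g., the system locale; used when the
// page language does not settle the question or the script is not CJK.
std::string PickCJKFontGroup(Script script, const std::string& pageLang,
                             const std::string& fallbackGroup) {
  assert(!fallbackGroup.empty());
  // Kana is unambiguous: only Japanese uses it.
  if (script == Script::Hiragana || script == Script::Katakana) {
    return "ja";
  }
  // Han is shared, so the script alone cannot decide; consult the language.
  if (script == Script::Han) {
    if (pageLang == "ja") return "ja";
    if (pageLang == "zh-TW") return "zh-TW";
    if (pageLang == "zh-CN") return "zh-CN";
  }
  return fallbackGroup;
}
```

The same language check would be needed after a kRangeSetCJK result, which is why the choice of primary key does not remove it.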

Comment 3

6 years ago
(In reply to John Daggett (:jtd) from comment #1)
> The nsUnicodeRange code is used to tie unicode ranges to lang groups and
> pref font selections.  I don't see how we change one without changing the
> others.  I think we could definitely make font selection much simpler but
> there are several components to this, not just nsUnicodeRange.  What's the
> pref UI for font selection?  How are locale-specific font settings defined?

I bet users won't notice the difference between script codes and unicode ranges as far as UI goes.

Comment 4

6 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> I bet users won't notice the difference between script codes and unicode
> ranges as far as UI goes.

Obviously not, but they will if characters whose script code is 'common' are mapped differently from those whose script code is 'Han, Hiragana, Katakana'.  Another example of potential problems here:

FF21..FF3A    ; Latin # L&  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A    ; Latin # L&  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z

Those are commonly used in Japanese and only supplied in Japanese fonts.  Mapping script codes to fonts is *not* a simple exercise; you would probably need to create a "special ranges" mapping that dealt with ranges like the ones above.  In which case, you're back to doing what nsUnicodeRange does already...
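
The "special ranges" idea might look roughly like the sketch below: a small override table consulted before the script-based mapping. All names (`Script`, `FontGroup`, `kOverrides`, `GroupForScript`, `SelectGroup`) are hypothetical, and the script is passed in by the caller as a stand-in for a real script-property lookup. Only the two fullwidth Latin ranges quoted above are listed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical script codes and font-preference groups, for illustration only.
enum class Script { Common, Latin, Han, Hiragana, Katakana };
enum class FontGroup { Unspecified, Western, Japanese, CJK };

struct Override {
  uint32_t first;
  uint32_t last;
  FontGroup group;
};

// Fullwidth Latin letters have the script property Latin, but (per the
// comment above) are commonly used in Japanese and supplied by Japanese fonts.
static const Override kOverrides[] = {
  {0xFF21, 0xFF3A, FontGroup::Japanese},  // FULLWIDTH LATIN CAPITAL LETTER A..Z
  {0xFF41, 0xFF5A, FontGroup::Japanese},  // FULLWIDTH LATIN SMALL LETTER A..Z
};

// Plain script-to-group mapping, used when no override applies.
FontGroup GroupForScript(Script script) {
  switch (script) {
    case Script::Latin:    return FontGroup::Western;
    case Script::Han:      return FontGroup::CJK;  // still needs a language hint
    case Script::Hiragana:
    case Script::Katakana: return FontGroup::Japanese;
    case Script::Common:   return FontGroup::Unspecified;
  }
  return FontGroup::Unspecified;
}

// Override ranges win over the script property.
FontGroup SelectGroup(uint32_t ch, Script script) {
  assert(ch <= 0x10FFFF);  // must be a valid Unicode code point
  for (const Override& o : kOverrides) {
    if (ch >= o.first && ch <= o.last) {
      return o.group;
    }
  }
  return GroupForScript(script);
}
```

The override table is itself range data that has to be maintained against new Unicode versions, which is the trade-off this comment is pointing at.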

Comment 5

6 years ago
Created attachment 593581
Unicode 6.0 script ranges

Comment 6

6 years ago
I should add that dealing with the "common" range is where I think a script code ==> font mapping has the most problems.  Script-specific punctuation and symbols are often classified as "common" even though the only fonts supporting those characters are fonts for that script.

Comment 7

6 years ago
I'd like to note here that I plan to use [a possibly improved and updated version of] nsUnicodeRange for bug 722299
(Reporter)

Comment 8

6 years ago
(In reply to Simon Montagu from comment #7)
> I'd like to note here that I plan to use [a possibly improved and updated
> version of] nsUnicodeRange for bug 722299

The IDN display algorithm depends on the Unicode script property of the characters, which we have available in a much more up-to-date and accurate form via gfxUnicodeProperties::GetScriptCode. Doesn't that provide a replacement for what nsUnicodeRange could offer?