Closed Bug 723045 Opened 12 years ago Closed 5 years ago

get rid of nsUnicodeRange

Categories

(Core :: Layout: Text and Fonts, defect)

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

Attachments

(2 files, 1 obsolete file)

We use nsUnicodeRange to map Unicode characters to "ranges" for the purpose of font selection. This looks like a legacy of the old (pre-Unicode) world of multiple codepages and charset-specific fonts.

It would be better to base font preferences/selection on the Unicode script property of the text, which is more accurate and useful than block-based "ranges".

Moreover, the data in nsUnicodeRange is badly out of date, so that even within the limitations of its model, it is incorrect for any recent version of the Unicode standard. As a result, font-selection behavior will be inconsistent (e.g. between the "original" Arabic block and the Arabic Supplement or Arabic Extended blocks, to take one example).
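
To make the inconsistency concrete, here is a minimal sketch in Python. The two tables are illustrative assumptions, not the actual contents of nsUnicodeRange; only the block boundaries (Arabic U+0600..U+06FF, Arabic Supplement U+0750..U+077F, Arabic Extended-A U+08A0..U+08FF) come from the Unicode standard. The names range_for, STALE_RANGES and CURRENT_RANGES and the "arabic" label are hypothetical.

```python
# Hypothetical stale table: it predates the later Arabic blocks,
# so it only knows the original Arabic block.
STALE_RANGES = [
    (0x0600, 0x06FF, "arabic"),  # Arabic
]

# The same block-based model, brought up to date with later blocks.
CURRENT_RANGES = [
    (0x0600, 0x06FF, "arabic"),  # Arabic
    (0x0750, 0x077F, "arabic"),  # Arabic Supplement
    (0x08A0, 0x08FF, "arabic"),  # Arabic Extended-A
]


def range_for(ch, table):
    """Return the range label for a character, or None if no entry covers it."""
    cp = ord(ch)
    for start, end, label in table:
        if start <= cp <= end:
            return label
    return None


if __name__ == "__main__":
    for ch in ("\u0628", "\u0750"):  # original block, then Arabic Supplement
        print(f"U+{ord(ch):04X}", range_for(ch, STALE_RANGES), range_for(ch, CURRENT_RANGES))
```

With the stale table, a character from Arabic Supplement gets no range at all, so it would take a different font-selection path from a character in the original block even though both belong to the same script.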
> Moreover, the data in nsUnicodeRange is badly out of date, so that
> even within the limitations of its model, it is incorrect for any
> recent version of the Unicode standard. As a result, font-selection
> behavior will be inconsistent (e.g. between the "original" Arabic
> block and the Arabic Supplement or Arabic Extended blocks, to take one
> example).

This only indicates that the data is out of date; it's not really a
solid reason for switching to the script code, for example.

The nsUnicodeRange code is used to tie unicode ranges to lang groups and pref font selections.  I don't see how we change one without changing the others.  I think we could definitely make font selection much simpler but there are several components to this, not just nsUnicodeRange.  What's the pref UI for font selection?  How are locale-specific font settings defined?

I should point out there are lots of subtleties that matter to users in specific locales, particularly users in Japan for example.  Basing font prefs on script instead has lots of problems because it doesn't easily allow distinguishing Japanese/TradChinese/SimpChinese and I'm not sure how you would deal with codepoints classified as common since those include codepoints that are in fact specific to Japanese.

So I think this bug needs to be more than "get rid of nsUnicodeRange" and more about how locale-specific settings determine font selection for different scripts.
(In reply to John Daggett (:jtd) from comment #1)
> I should point out there are lots of subtleties that matter to users in
> specific locales, particularly users in Japan for example.  Basing font
> prefs on script instead has lots of problems because it doesn't easily allow
> distinguishing Japanese/TradChinese/SimpChinese and I'm not sure how you
> would deal with codepoints classified as common since those include
> codepoints that are in fact specific to Japanese.

But nsUnicodeRange doesn't really contribute there, either - it just returns kRangeSetCJK for large blocks, which is no more helpful than SCRIPT_HAN. In either case, we need to look at page language or system locale or something in order to prioritise among the Japanese/TradChinese/SimpChinese font preferences. That'll still be the case - replacing "unicode range" with "script" as a primary basis doesn't alter the need to consider language in certain situations.
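
A rough sketch of that point, in Python: whether the coarse answer is kRangeSetCJK or SCRIPT_HAN, something language-based still has to order the CJK font preferences. The function name cjk_pref_order, the default order, and the way the page language is matched are all assumptions made for illustration, not the actual Gecko logic.

```python
# Hypothetical default order in which CJK font preferences are tried
# when nothing else is known about the text.
DEFAULT_CJK_ORDER = ["ja", "zh-CN", "zh-TW"]


def cjk_pref_order(page_lang, default_order=DEFAULT_CJK_ORDER):
    """Return the CJK lang groups to try, with the page language first.

    If page_lang is not one of the known groups (or is None), the
    default order is returned unchanged.
    """
    if page_lang in default_order:
        rest = [group for group in default_order if group != page_lang]
        return [page_lang] + rest
    return list(default_order)


if __name__ == "__main__":
    print(cjk_pref_order("zh-TW"))
    print(cjk_pref_order("en"))
```

The classification of the character never enters this function; it only decides that the CJK preferences are relevant at all, which is why swapping ranges for scripts leaves this step as it is.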
(In reply to John Daggett (:jtd) from comment #1)
> The nsUnicodeRange code is used to tie unicode ranges to lang groups and
> pref font selections.  I don't see how we change one without changing the
> others.  I think we could definitely make font selection much simpler but
> there are several components to this, not just nsUnicodeRange.  What's the
> pref UI for font selection?  How are locale-specific font settings defined?

I bet users won't notice the difference between script codes and unicode ranges as far as UI goes.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> I bet users won't notice the difference between script codes and unicode
> ranges as far as UI goes.

Obviously not, but they will if characters whose script code is 'common' are mapped differently from those whose script code is Han, Hiragana, or Katakana.  Another example of potential problems here:

FF21..FF3A    ; Latin # L&  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A    ; Latin # L&  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z

Those are commonly used in Japanese and only supplied in Japanese fonts.  Mapping script codes to fonts is *not* a simple exercise; you would probably need to create a "special ranges" mapping that dealt with ranges like the ones above.  In which case, you're back to doing what nsUnicodeRange does already...
I should add that dealing with the "common" range is where I think a script code ==> font mapping has the most problems.  Script-specific punctuation and symbols are often classified as "common" even though the only fonts supporting those characters are fonts for that script.
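
For illustration, a "special ranges" layer on top of a script-based mapping might look roughly like the Python sketch below. Everything here is hypothetical: the table contents, the group labels, and the font_group function are assumptions, and the caller is expected to supply the Unicode script name, since the Python standard library does not expose the script property. Only the two fullwidth Latin ranges are taken from the data quoted above.

```python
# Hypothetical override table: ranges whose script property alone would
# send them to the wrong font group.
SPECIAL_RANGES = [
    (0xFF21, 0xFF3A, "ja"),  # FULLWIDTH LATIN CAPITAL LETTER A..Z
    (0xFF41, 0xFF5A, "ja"),  # FULLWIDTH LATIN SMALL LETTER A..Z
]

# Hypothetical script-to-group mapping; "Common" is deliberately absent
# because the script alone says nothing about which fonts cover it.
SCRIPT_TO_GROUP = {
    "Latin": "x-western",
    "Hiragana": "ja",
    "Katakana": "ja",
}


def font_group(cp, script):
    """Pick a font group for code point cp, whose script name is given.

    Special ranges win over the script mapping; None means the script
    gives no answer and some other fallback would be needed.
    """
    for start, end, group in SPECIAL_RANGES:
        if start <= cp <= end:
            return group
    return SCRIPT_TO_GROUP.get(script)


if __name__ == "__main__":
    print(font_group(0xFF21, "Latin"))   # fullwidth A, overridden
    print(font_group(0x0041, "Latin"))   # plain A, script mapping
    print(font_group(0x3001, "Common"))  # ideographic comma, no answer
```

The last case is the troublesome one: a code point classified as common falls through both tables, so every such range would need its own override entry, which is how the override table ends up resembling a block-based range table again.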
I'd like to note here that I plan to use [a possibly improved and updated version of] nsUnicodeRange for bug 722299
(In reply to Simon Montagu from comment #7)
> I'd like to note here that I plan to use [a possibly improved and updated
> version of] nsUnicodeRange for bug 722299

The IDN display algorithm depends on the Unicode script property of the characters, which we have available in a much more up-to-date and accurate form via gfxUnicodeProperties::GetScriptCode. Doesn't that provide a replacement for what nsUnicodeRange could offer?
As noted above, nsUnicodeRange is badly out of date, not having been touched in years. The only user of these mappings is WhichPrefFontSupportsChar. While I'd really like to rework the whole language-font-prefs code more extensively, that's a project for another day (or year), involving re-thinking the UI as well as the internal mechanism. For now, though, we can at least get rid of nsUnicodeRange by using Unicode block data from ICU instead, which means WhichPrefFontSupportsChar will be based on the latest Unicode repertoire, and we can drop the (obsolete) tables here.
Attachment #9042463 - Flags: review?(jwatt)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Attachment #9042463 - Flags: review?(jwatt) → review+

jwatt: the phab revision is just the same patch, rebased to current m-c; could you carry over your r+ so it can land? Thanks!

Flags: needinfo?(jwatt)
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10af6c739d9c
Remove nsUnicodeRange and instead use ICU to look up Unicode blocks. r=jwatt
Flags: needinfo?(jwatt)
Attachment #9042463 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

RIP
