Closed Bug 723045 (opened 13 years ago, closed 6 years ago)

get rid of nsUnicodeRange

Tracking

Status: RESOLVED FIXED

Milestone: mozilla68

Tracking Flags: firefox68 (Tracking: ---, Status: fixed)

People

(Reporter: jfkthame, Assigned: jfkthame)

Details

Attachments

(2 files, 1 obsolete file)

Unicode 6.0 script ranges (13 years ago), John Daggett (:jtd), 123.50 KB, text/plain
Remove nsUnicodeRange and instead use ICU to look up Unicode blocks (6 years ago, obsolete), Jonathan Kew [:jfkthame], 28.72 KB, patch, jwatt: review+
Bug 723045 - Remove nsUnicodeRange and instead use ICU to look up Unicode blocks. r?jwatt (6 years ago), Jonathan Kew [:jfkthame], 47 bytes, text/x-phabricator-request

Jonathan Kew [:jfkthame]

Assignee

Description

13 years ago

We use nsUnicodeRange to map Unicode characters to "ranges" for the purpose of font selection. This looks like a legacy of the old (pre-Unicode) world of multiple codepages and charset-specific fonts. It would be better to base font preferences/selection on the Unicode script property of the text, which is more accurate and useful than block-based "ranges". Moreover, the data in nsUnicodeRange is badly out of date, so that even within the limitations of its model, it is incorrect for any recent version of the Unicode standard. As a result, font-selection behavior will be inconsistent (e.g. between the "original" Arabic block and the Arabic Supplement or Arabic Extended blocks, to take one example).
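To make the staleness point concrete, here is a minimal Python sketch of a block lookup: a binary search over a sorted table of inclusive code point ranges. The block boundaries are real Unicode block ranges, but both tables are small hypothetical excerpts and do not reproduce nsUnicodeRange's actual data. The `STALE_BLOCKS` table simply omits the later Arabic blocks to show what an outdated table does.

```python
from bisect import bisect_right

# Each row: (first, last, name), an inclusive range, sorted by first code point.
CURRENT_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0780, 0x07BF, "Thaana"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
]

# Hypothetical stale table that predates the supplementary Arabic blocks.
STALE_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0780, 0x07BF, "Thaana"),
]


def block_name(table, cp):
    """Return the name of the block containing cp, or None if no row covers it."""
    starts = [first for first, _, _ in table]
    i = bisect_right(starts, cp) - 1  # index of the last row whose start is <= cp
    if i >= 0 and cp <= table[i][1]:
        return table[i][2]
    return None


for cp in (0x0628, 0x0750, 0x08A0):
    print(f"U+{cp:04X}", block_name(CURRENT_BLOCKS, cp), block_name(STALE_BLOCKS, cp))
```

With the stale table, U+0750 (Arabic Supplement) and U+08A0 (Arabic Extended-A) fall through to `None`. Any font-selection step keyed on the result would therefore treat them differently from U+0628 in the original Arabic block.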

John Daggett (:jtd)

Comment 1

13 years ago

> Moreover, the data in nsUnicodeRange is badly out of date, so that even within the limitations of its model, it is incorrect for any recent version of the Unicode standard. As a result, font-selection behavior will be inconsistent (e.g. between the "original" Arabic block and the Arabic Supplement or Arabic Extended blocks, to take one example).

This is just an indication that it's out of date; it's not really a solid reason for switching to the script code, for example.

The nsUnicodeRange code is used to tie Unicode ranges to lang groups and pref font selections. I don't see how we change one without changing the others. I think we could definitely make font selection much simpler, but there are several components to this, not just nsUnicodeRange. What's the pref UI for font selection? How are locale-specific font settings defined?

I should point out there are lots of subtleties that matter to users in specific locales, particularly users in Japan, for example. Basing font prefs on script instead has lots of problems, because it doesn't easily allow distinguishing Japanese/TradChinese/SimpChinese, and I'm not sure how you would deal with codepoints classified as Common, since those include codepoints that are in fact specific to Japanese.

So I think this bug needs to be more than "get rid of nsUnicodeRange" and more about how locale-specific settings determine font selection for different scripts.

Jonathan Kew [:jfkthame]

Assignee

Comment 2

13 years ago

(In reply to John Daggett (:jtd) from comment #1)
> I should point out there are lots of subtleties that matter to users in specific locales, particularly users in Japan for example. Basing font prefs on script instead has lots of problems because it doesn't easily allow distinguishing Japanese/TradChinese/SimpChinese and I'm not sure how you would deal with codepoints classified as common since those include codepoints that are in fact specific to Japanese.

But nsUnicodeRange doesn't really contribute there, either: it just returns kRangeSetCJK for large blocks, which is no more helpful than SCRIPT_HAN. In either case, we need to look at page language or system locale or something in order to prioritise among the Japanese/TradChinese/SimpChinese font preferences. That'll still be the case; replacing "unicode range" with "script" as a primary basis doesn't alter the need to consider language in certain situations.
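The language-based prioritisation described above could look roughly like the following sketch. It assumes a character has already been classified as Han, whether via kRangeSetCJK or a script code. The function name, the tag-matching rules, and the `zh-CN` default are illustrative assumptions, not Gecko's actual logic.

```python
def han_lang_group(page_lang, system_locale, default="zh-CN"):
    """Choose a Han font-preference group: page language first, then system locale.

    Hypothetical rules: ja* -> "ja"; Traditional Chinese tags -> "zh-TW";
    any other zh* tag -> "zh-CN"; otherwise fall back to `default`.
    """
    for tag in (page_lang, system_locale):
        if not tag:
            continue  # missing or empty tag: try the next source
        t = tag.lower()
        if t == "ja" or t.startswith("ja-"):
            return "ja"
        if t.startswith("zh-hant") or t in ("zh-tw", "zh-hk", "zh-mo"):
            return "zh-TW"
        if t == "zh" or t.startswith("zh-"):
            return "zh-CN"
    return default


print(han_lang_group("ja-JP", "en-US"))
print(han_lang_group(None, "zh-Hant-HK"))
```

Checking the page language before the system locale matches the order suggested in the comment. The hard-coded default is only a placeholder for whatever a real implementation would fall back to.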

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 3

13 years ago

(In reply to John Daggett (:jtd) from comment #1)
> The nsUnicodeRange code is used to tie unicode ranges to lang groups and pref font selections. I don't see how we change one without changing the others. I think we could definitely make font selection much simpler but there are several components to this, not just nsUnicodeRange. What's the pref UI for font selection? How are locale-specific font settings defined?

I bet users won't notice the difference between script codes and Unicode ranges as far as UI goes.

John Daggett (:jtd)

Comment 4

13 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> I bet users won't notice the difference between script codes and unicode ranges as far as UI goes.

Obviously not, but they will if characters that are script code 'common' are mapped differently from script code 'Han, Hiragana, Katakana'.

Another example of potential problems here:

FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z

Those are commonly used in Japanese and only supplied in Japanese fonts. Mapping script codes to fonts is *not* a simple exercise; you would probably need to create a "special ranges" mapping that dealt with ranges like the one above. In which case, you're back to doing what nsUnicodeRange does already...
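A "special ranges" override of the kind suggested here might be sketched as a small table consulted before the script-based mapping. The two ranges come from the Scripts.txt lines quoted in the comment. The `SCRIPT_TO_GROUP` table and the group names are assumptions for illustration. The caller must also supply the script, because Python's standard library has no script-property lookup.

```python
# Hypothetical script -> font-preference group mapping (illustrative names only).
SCRIPT_TO_GROUP = {
    "Latin": "x-western",
    "Arabic": "ar",
    "Hiragana": "ja",
    "Katakana": "ja",
}

# Hypothetical "special ranges", checked before the script mapping.
# The ranges are the fullwidth Latin letters from Scripts.txt.
SPECIAL_RANGES = [
    (0xFF21, 0xFF3A, "ja"),  # FULLWIDTH LATIN CAPITAL LETTER A..Z
    (0xFF41, 0xFF5A, "ja"),  # FULLWIDTH LATIN SMALL LETTER A..Z
]


def font_group(cp, script, default="x-unicode"):
    """Return a font-preference group for cp, given its Unicode script.

    An override range containing cp wins; otherwise the script decides,
    and unmapped scripts get `default`.
    """
    for first, last, group in SPECIAL_RANGES:
        if first <= cp <= last:
            return group
    return SCRIPT_TO_GROUP.get(script, default)


print(font_group(0x0041, "Latin"))  # LATIN CAPITAL LETTER A
print(font_group(0xFF21, "Latin"))  # FULLWIDTH LATIN CAPITAL LETTER A
```

Both characters have script Latin, but only the override sends the fullwidth one toward Japanese fonts. The sketch therefore needs a range table alongside the script data, which is the trade-off the comment points out.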

John Daggett (:jtd)

Comment 5

13 years ago

Attached file Unicode 6.0 script ranges

John Daggett (:jtd)

Comment 6

13 years ago

I should add that dealing with the "common" range is where I think a script code ==> font mapping has the most problems. Script-specific punctuation and symbols are often classified as "common" even though the only fonts supporting those characters are fonts for that script.
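One widely used heuristic for Common characters is to let them take the script of the surrounding text. The sketch below assumes per-character script values are already known, and the function is hypothetical, not taken from the patch. As the comment implies, the heuristic helps only when such context exists. A lone U+3002 IDEOGRAPHIC FULL STOP, whose script is Common, has nothing to inherit from.

```python
# Script values that carry no script identity of their own.
UNRESOLVED = {"Common", "Inherited"}


def resolve_common(scripts):
    """Give Common/Inherited entries the nearest preceding real script.

    Entries before the first real script take that first real script instead.
    If the run contains no real script at all, it is returned unchanged.
    """
    first_real = next((s for s in scripts if s not in UNRESOLVED), None)
    resolved = []
    current = first_real
    for s in scripts:
        if s in UNRESOLVED:
            resolved.append(current if current is not None else s)
        else:
            current = s
            resolved.append(s)
    return resolved


# HIRAGANA LETTER NO followed by IDEOGRAPHIC FULL STOP (script Common)
print(resolve_common(["Hiragana", "Common"]))
# A lone IDEOGRAPHIC FULL STOP: no context to borrow from
print(resolve_common(["Common"]))
```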

Simon Montagu :smontagu

Comment 7

13 years ago

I'd like to note here that I plan to use [a possibly improved and updated version of] nsUnicodeRange for bug 722299.

Jonathan Kew [:jfkthame]

Assignee

Comment 8

13 years ago

(In reply to Simon Montagu from comment #7)
> I'd like to note here that I plan to use [a possibly improved and updated version of] nsUnicodeRange for bug 722299

The IDN display algorithm depends on the Unicode script property of the characters, which we have available in a much more up-to-date and accurate form via gfxUnicodeProperties::GetScriptCode. Doesn't that provide a replacement for what nsUnicodeRange could offer?
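For comparison, a script-based check of the kind IDN display relies on can be sketched as follows. This is a simplified take on the "Highly Restrictive" level from Unicode Technical Standard #39. It ignores Script_Extensions and the identifier profile, and it expects the caller to supply each character's script, for example from a lookup like gfxUnicodeProperties::GetScriptCode. It is not Mozilla's actual IDN code.

```python
# Characters with these script values do not count toward mixed-script checks.
IGNORED = {"Common", "Inherited"}

# Multi-script sets permitted by UTS #39 "Highly Restrictive" (besides single-script).
ALLOWED_COMBOS = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]


def is_highly_restrictive(scripts):
    """Simplified check: one script, or a subset of an allowed CJK combination."""
    used = {s for s in scripts if s not in IGNORED}
    if len(used) <= 1:
        return True
    return any(used <= combo for combo in ALLOWED_COMBOS)


# A label mixing Latin and Cyrillic lookalikes is rejected.
print(is_highly_restrictive(["Latin", "Cyrillic", "Latin"]))
```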

Jonathan Kew [:jfkthame]

Assignee

Comment 9

6 years ago

Attached patch Remove nsUnicodeRange and instead use ICU to look up Unicode blocks (obsolete)

As noted above, nsUnicodeRange is badly out of date, not having been touched in years. The only user of these mappings is WhichPrefFontSupportsChar. While I'd really like to rework the whole language-font-prefs code more extensively, that's a project for another day (or year), involving re-thinking the UI as well as the internal mechanism. For now, though, we can at least get rid of nsUnicodeRange by using Unicode block data from ICU instead, which means WhichPrefFontSupportsChar will be based on the latest Unicode repertoire, and we can drop the (obsolete) tables here.

Attachment #9042463 - Flags: review?(jwatt)
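Per the comment above, the patch uses ICU's Unicode block data in place of the hand-maintained tables. ICU's C API exposes this lookup as `ublock_getCode`. The block is then presumably mapped to a font-preference language group. The mapping below is a hypothetical excerpt written in Python, with block names as strings and illustrative group names. It shows how grouping all the Arabic blocks together removes the inconsistency noted in the description, while ambiguous CJK blocks still defer to a language-based fallback.

```python
# Hypothetical block -> font-preference language group mapping (excerpt).
# None marks blocks whose group cannot be decided from the block alone.
BLOCK_TO_LANG_GROUP = {
    "Basic Latin": "x-western",
    "Latin-1 Supplement": "x-western",
    "Arabic": "ar",
    "Arabic Supplement": "ar",
    "Arabic Extended-A": "ar",
    "Hiragana": "ja",
    "Katakana": "ja",
    "Hangul Syllables": "ko",
    "CJK Unified Ideographs": None,  # ambiguous: needs page language or locale
}


def pref_lang_group(block, fallback):
    """Map a block name to a pref-font language group, or use fallback if unmapped."""
    group = BLOCK_TO_LANG_GROUP.get(block)
    return fallback if group is None else group


print(pref_lang_group("Arabic Supplement", "x-unicode"))
print(pref_lang_group("CJK Unified Ideographs", "ja"))
```

Because the block data comes from ICU rather than a local table, new blocks appear automatically when ICU is updated. Blocks that are missing from the mapping fall back instead of being misclassified.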

Jonathan Kew [:jfkthame]

Assignee

Updated

6 years ago

Assignee: nobody → jfkthame

Status: NEW → ASSIGNED

Jonathan Watt [:jwatt]

Updated

6 years ago

Attachment #9042463 - Flags: review?(jwatt) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 10

6 years ago

Attached file Bug 723045 - Remove nsUnicodeRange and instead use ICU to look up Unicode blocks. r?jwatt

Jonathan Kew [:jfkthame]

Assignee

Comment 11

6 years ago

jwatt: the phab revision is just the same patch, rebased to current m-c; could you carry over your r+ so it can land? Thanks!

Jonathan Kew [:jfkthame]

Assignee

Updated

6 years ago

Flags: needinfo?(jwatt)

Pulsebot

Comment 12

6 years ago

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/10af6c739d9c Remove nsUnicodeRange and instead use ICU to look up Unicode blocks. r=jwatt

Jonathan Kew [:jfkthame]

Assignee

Updated

6 years ago

Flags: needinfo?(jwatt)

Jonathan Kew [:jfkthame]

Assignee

Updated

6 years ago

Attachment #9042463 - Attachment is obsolete: true

Andrei Ciure[:aciure]

Comment 13

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/10af6c739d9c

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Stuart Parmenter

Comment 14

6 years ago

RIP