Closed Bug 1418724 Opened 6 years ago Closed 6 years ago

Wrong font for Chinese text on Mac

Product/Component: Core :: Layout: Text and Fonts (defect)
Version: 57 Branch
Hardware: Unspecified
OS: macOS
Priority: Not set
Severity: normal
Status: RESOLVED FIXED (target: mozilla59; firefox59: fixed)
Reporter: wyuenho, Assigned: jfkthame
References: Depends on 1 open bug, Blocks 1 open bug
Keywords: intl
Attachments: 7 files

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36

Steps to reproduce:

1. Go to Facebook
2. Type this into the status box

"The quick brown fox jumps over the lazy dog.

天地玄黃,宇宙洪荒。
喺咁
Summary: Wrong font selection for Chinese on Mac → Wrong font for Chinese text on Mac
Component: Untriaged → Layout: Text
Keywords: intl
OS: Unspecified → Mac OS X
Product: Firefox → Core
FYI, this was tested on macOS 10.13.1, on a completely clean profile and preferences.

The way Firefox resolves fonts on the Mac for Chinese has gone spectacularly wrong here. Let me see if I can find out which version started.
Also, I remember adding the actual and expected results; any idea why they aren't showing up?
Facebook is using the "system-ui" font, which the OS should resolve to different fonts according to the language order set in System Preferences > Languages. FWIW Chrome does indeed resolve to different fonts, which IMHO gives a more pleasant UX.

A minimal test case for the above text would be:

<p style="font-family: SF Optimized, system-ui, sans-serif; font-size: 72px; font-weight: 300">
天地玄黃,宇宙洪荒。<br>
喺咁㗎喇
</p>

In Chrome, the font used to render the above changes according to which CJK language comes first in the System Preference setting:

1. Simplified Chinese: .PingFang SC
2. Traditional Chinese (Hong Kong) or Cantonese (Traditional): .PingFang HK
3. Traditional Chinese (Taiwan): .PingFang TC
4. Japanese: .Hiragino Kaku Gothic Interface, .PingFang SC
5. Korean: .Apple SD Gothic NeoI, .PingFang SC

In Firefox 58.0b4, the above (as shown in Inspector > Fonts) is always rendered using a mix of Hiragino Kaku Gothic ProN W3 and PingFang SC Thin, irrespective of the System Preferences setting.

As a user in Hong Kong, I find Chrome's behaviour preferable.
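The Chrome behaviour listed above amounts to a lookup keyed on the first CJK language in the system list. A minimal sketch in Python, assuming BCP 47-style language tags as keys; the tags and the function name are illustrative and not taken from Chrome's source, while the font names are the ones observed above:

```python
# Fonts Chrome was observed to use for system-ui, keyed by the first CJK
# language in the macOS language list. The keys are assumed tag spellings.
CHROME_SYSTEM_UI_CJK_FONTS = {
    "zh-Hans": [".PingFang SC"],
    "zh-Hant-HK": [".PingFang HK"],
    "yue-Hant": [".PingFang HK"],
    "zh-Hant-TW": [".PingFang TC"],
    "ja": [".Hiragino Kaku Gothic Interface", ".PingFang SC"],
    "ko": [".Apple SD Gothic NeoI", ".PingFang SC"],
}


def fonts_for_system_languages(languages):
    """Return the fonts for the first language in the list that has an entry."""
    for lang in languages:
        for prefix, fonts in CHROME_SYSTEM_UI_CJK_FONTS.items():
            # Match the tag itself or any longer tag that extends it.
            if lang == prefix or lang.startswith(prefix + "-"):
                return fonts
    return []  # no CJK language in the list


print(fonts_for_system_languages(["en-US", "zh-Hant-HK", "ja-JP"]))
```

Note that a non-CJK first entry such as English does not hide the CJK entries behind it, which is the behaviour being asked for here.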
Anyway...

Actual Results:

1. The Latin portion seems to be SFProDisplay-Regular
2. The Traditional Chinese portion is a mix of 2 fonts
3. Everything besides 喺咁 is in STHeiti
4. 喺咁 are in PingFangTC-Light

Expected Results:

The font-family declaration on Facebook is "SF Optimized", system-ui, -apple-system, system-ui, ".SFNSText-Regular", sans-serif.

The expected result should be SFProDisplay-Regular for Latin, PingFangTC-Regular for the rest on macOS >= 10.11. The fact that Firefox resolved to 2 different fonts to display Traditional Han is very mystifying to me.
I think this is basically the issue that bug 1212731 was intended to address; a patch set was landed there that would (IIUC) have resulted in better behavior here, but it was then backed out in bug 1244017.
Depends on: 1212731
BTW, the results would be better if the content were tagged as lang="zh-hk", because then the CSS generic sans-serif would be resolved using Firefox's Chinese (HK) font prefs. But in the absence of a lang tag, we don't know which of the CJK font preferences to favor. (I'm a bit surprised we don't pick up the System Prefs language ordering here, actually; that would help, I think, even in the absence of a full implementation of bug 1212731.)

Hmm... checking the code, we're supposed to pay attention to system locale, but only as one of several factors. If you go to Preferences and make sure Chinese is included in the list of "preferred languages for displaying pages", and that Chinese (HK) is listed ahead of Japanese (if it's also in the list), does that help? (Restarting Firefox may be needed for changes there to be reflected in font selection.)
Flags: needinfo?(wyuenho)
According to my testing, you should get the "right" result if Chinese is listed (ahead of Japanese, if present) in the "preferred languages" of Firefox's prefs; *or* if no CJK languages are listed there, you should also get the right result if Chinese is listed *first* in the System Prefs languages list. But if (for example) English is first and Chinese second, the bad fallback (choosing a Japanese font for most of the characters) will occur.
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> Hmm... checking the code, we're supposed to pay attention to system locale,
> but only as one of several factors. If you go to Preferences and make sure
> Chinese is included in the list of "preferred languages for displaying
> pages", and that Chinese (HK) is listed ahead of Japanese (if it's also in
> the list), does that help? (Restarting Firefox may be needed for changes
> there to be reflected in font selection.)

For me, it only helps if Chinese (HK) is the very first item in the System Preferences list.

In the intl.accept_languages list, indeed it works if zh-hk is above other CJK languages, as described in bug 677919.

Firefox does need to be restarted in both cases.

(BTW the correct default in font.name-list.sans-serif.zh_HK should be PingFang HK, instead of PingFang TC.)
Jonathan, what macOS and Firefox versions are you running, and have you removed ~/Library/Preferences/org.mozilla.firefox.plist?

I'm on macOS 10.13.1, starting Firefox with /Applications/Firefox.app/Contents/MacOS/firefox --safe-mode --migration.

I've picked Safe mode and Don't import anything.

Now this will give us a completely pristine environment.

As to changing System Preferences Preferred languages ordering, I don't see any difference whatsoever with any ordering, so I don't think Firefox is picking it up at all.

Let's isolate this issue with https://codepen.io/wyuenho/pen/JOMmpy?editors=1000; there seem to be a lot of things wrong here.

The whole problem seems to be related to how Firefox deals with font-weight.

font-weight: 300 - for some strange reason Firefox picked PingFangSC-Thin, and only for "喺咁㗎".

1) font-weight should apply to the entire text node.
2) The font should be the "Light" variant, because "Light" corresponds to font-weight: 300, Thin should be 100.
3) The font's script is wrong. According to System Preferences, Firefox should pick zh_Hant_HK first, but for whatever reason it decided to jump straight to zh and resolves to PingFangSC. Smells like Firefox has messed up resolving Unicode's CLDR data http://www.unicode.org/reports/tr35/#Locale_Inheritance .
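Point 2 can be checked against the weight-matching rule in CSS Fonts Level 3. A minimal Python sketch of that rule, assuming weights in multiples of 100 and the common weight names listed on MDN; the function name is illustrative and this is not Gecko's implementation:

```python
# Common weight names for the numeric CSS font-weight values (per MDN).
CSS_WEIGHT_NAMES = {
    100: "Thin", 200: "Extra Light", 300: "Light", 400: "Regular",
    500: "Medium", 600: "Semi Bold", 700: "Bold", 800: "Extra Bold",
    900: "Black",
}


def match_weight(desired, available):
    """Pick a face weight for `desired` following CSS Fonts Level 3."""
    available = sorted(set(available))
    if desired in available:
        return desired
    lighter = [w for w in reversed(available) if w < desired]  # descending
    heavier = [w for w in available if w > desired]            # ascending
    if desired == 400 and 500 in available:
        return 500  # 400 checks 500 first, then behaves like "< 400"
    if desired == 500 and 400 in available:
        return 400  # 500 checks 400 first, then behaves like "< 400"
    # Above 500: heavier faces first. Otherwise: lighter faces first.
    order = heavier + lighter if desired > 500 else lighter + heavier
    return order[0] if order else None


chosen = match_weight(300, [100, 300, 400])
print(chosen, CSS_WEIGHT_NAMES[chosen])
```

Under this rule a family that has a 300 face yields Light for font-weight: 300; Thin (100) is only reached when no 300 face is available, because lighter faces are tried before heavier ones.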
Flags: needinfo?(wyuenho)
Upon further investigation, Chrome is also wrong on the Mac. Based on the Codepen above, Chrome seems to be rendering "PingFangHK-Thin", but for some reason it's using PingFangHK-Regular on Facebook. I think Safari is the only one that is correct on the Mac here: it uses "PingFangHK-Light" on both Facebook and the Codepen.

To summarize:

1. The same text node resolves to 2 different fonts for at least Traditional Chinese when `font-family` is at initial or returns an invalid font, disregarding both System Preferences' language ordering and Firefox's own preferred language settings.
2. The fallback font for Traditional Chinese first resolves to "Hiragino Kaku Gothic ProN W3", which doesn't even exist; only the W4 variant exists on the Mac. When that font isn't found, Firefox picks Hiragino Sans W4 for some reason.
3. For code points that don't have a glyph in the currently selected font, a different font is picked, in this case it is using the OS's default font.
4. When font-weight is set and more than one fallback font is in use, if the requested weight exists in only one of those fonts, only the portion of the text rendered with that font gets the requested weight.
5. Firefox gets font-weight: 300 wrong. According to [MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/font-weight), the Light variant of a font should be used, not the Thin variant.
6. Fallback should follow the OS's language ordering first, then Firefox's language ordering, and then use only the font resulting from that, and no more. (Use 1 font and 1 font only.)
7. If a glyph is not found in a fallback font, Firefox should render a placeholder.

Firefox tries too hard to always render some text, but the way it achieves this goal is all wrong.
(In reply to Yuen Ho Wong from comment #10)
> Jonathan what macOS and Firefox version are you running, and have you rm
> ~/Library/Preferences/org.mozilla.firefox.plist ?
> 
> I'm on macOS 10.13.1, starting Firefox with
> /Applications/Firefox.app/Contents/MacOS/firefox --safe-mode --migration.

I'm running 10.12, so may not see exactly the same results as you, but the general problem is clearly present here as well.

There are actually a couple of parts to this, which both need to be fixed for us to get a better result. I have a set of patches that should improve things substantially; I just need to clean things up a little before posting them for review.

First, the code in gfxPlatformFontList::AppendCJKPrefLangs is supposed to add the appropriate fonts from Firefox's prefs to the end of the font list, so that they'll be used if the specific families named in CSS font-family don't support the CJK characters that are present. This code does look at the intl.accept_languages setting, which corresponds to "preferred languages" in the Firefox preferences, and checks for any CJK language codes found there. This is why "in the intl.accept_languages list, indeed it works if zh-hk is above other CJK languages" (comment 9). But many users probably don't ever customize that list.

AppendCJKPrefLangs also tries to respect the system locale setting, but there's a problem here: it calls OSPreferences::GetSystemLocale, which returns just a single locale code. On macOS, this will correspond to the first-preference language in System Preferences. But it doesn't deal with a list of language or locale preferences; and so if the System Preferences list is something like English / Chinese(HK) / Chinese(CN), the Chinese preference will be lost (and English doesn't help resolve the CJK font prefs at all).

Finally, it uses a hard-coded sequence that happens to put Japanese ahead of Chinese. And this is why we get the bad result when neither intl.accept_languages nor the (single) system locale tells us to prefer a Chinese font.
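The three steps described so far can be modelled outside Gecko. A minimal Python sketch of the pre-patch ordering, under stated assumptions: the function names and the tag matching are hypothetical, and the hard-coded sequence is the one quoted later in this thread, which may not match the source exactly:

```python
# Hard-coded last-resort sequence as quoted in this thread (assumption).
HARD_CODED_ORDER = ("ja", "zh-CN", "zh-TW", "zh-HK", "ko")

# Language codes that map to a CJK font pref group (illustrative table).
KNOWN = {"ja": "ja", "ko": "ko", "zh-cn": "zh-CN",
         "zh-tw": "zh-TW", "zh-hk": "zh-HK"}


def to_cjk(tag):
    """Map a language tag to a CJK pref language, or None if it is not CJK."""
    key = tag.strip().lower().replace("_", "-")
    return KNOWN.get(key) or KNOWN.get(key.split("-")[0])


def cjk_order_before_patch(accept_languages, system_locale):
    """Model: accept-languages first, then ONE system locale, then defaults."""
    order = []

    def append(lang):
        if lang and lang not in order:
            order.append(lang)

    for tag in accept_languages.split(","):
        append(to_cjk(tag))
    append(to_cjk(system_locale))   # only the first system language is seen
    for lang in HARD_CODED_ORDER:
        append(lang)
    return order


# System list English / Chinese (HK) / Chinese (CN): only "en-US" gets through.
print(cjk_order_before_patch("en-US, en", "en-US"))
```

With English first in both lists, nothing CJK is contributed by either source, so the hard-coded sequence decides and Japanese comes first.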

So to fix this, AppendCJKPrefLangs should be using OSPreferences::GetSystemLocales (plural), not GetSystemLocale (singular), so as to request a sorted list of the system's preferred locales. That will allow us to respect the user's preference for a specific Chinese locale ahead of Japanese, for example, even if it's not the first thing in the system list.
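The effect of switching from a single system locale to a sorted list can be shown with a small Python model. This is a sketch under assumptions, not the patch itself: the function names and tag matching are hypothetical, and the hard-coded sequence is the one quoted later in this thread:

```python
# Hard-coded last-resort sequence as quoted in this thread (assumption).
HARD_CODED_ORDER = ("ja", "zh-CN", "zh-TW", "zh-HK", "ko")

KNOWN = {"ja": "ja", "ko": "ko", "zh-cn": "zh-CN",
         "zh-tw": "zh-TW", "zh-hk": "zh-HK"}


def to_cjk(tag):
    """Map a language tag to a CJK pref language, or None if it is not CJK."""
    key = tag.strip().lower().replace("_", "-")
    return KNOWN.get(key) or KNOWN.get(key.split("-")[0])


def cjk_order_with_locale_list(accept_languages, system_locales):
    """Model: accept-languages, then EVERY system locale in order, then defaults."""
    order = []
    for tag in list(accept_languages) + list(system_locales) + list(HARD_CODED_ORDER):
        lang = to_cjk(tag)
        if lang and lang not in order:
            order.append(lang)
    return order


# English first, but Chinese (HK) and Chinese (CN) further down the system list.
print(cjk_order_with_locale_list(["en-US"], ["en-GB", "zh-HK", "zh-CN"]))
```

Because the whole system list is walked, a Chinese locale in second or third place still ends up ahead of Japanese.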

The second problem is that currently, the macOS implementation of OSPreferences::GetSystemLocales doesn't actually return the full list; it only returns the first entry. So fixing gfxPlatformFontList::AppendCJKPrefLangs won't actually help until we also fix the OSPreferences service implementation. I think the way to do this is to use CFLocaleCopyPreferredLanguages instead of the current code based on CFLocaleCopyCurrent.
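A full language list from macOS would carry tags such as "zh-Hant-HK" rather than the shorter codes used by the font prefs, so some mapping step is needed. A minimal Python sketch of one possible mapping, assuming BCP 47-style input; the function names and the script/region rules are hypothetical and are not taken from the OSPreferences code:

```python
def font_pref_lang(tag):
    """Map a system language tag to a CJK font pref language, or None."""
    parts = tag.replace("_", "-").split("-")
    lang, rest = parts[0].lower(), parts[1:]
    if lang in ("ja", "ko"):
        return lang
    if lang not in ("zh", "yue"):
        return None
    # A 4-letter subtag is a script, a 2-letter subtag is a region.
    script = next((p.title() for p in rest if len(p) == 4), None)
    region = next((p.upper() for p in rest if len(p) == 2), None)
    if script == "Hans":
        return "zh-CN"
    if region in ("HK", "MO"):
        return "zh-HK"
    if script == "Hant" or region == "TW":
        return "zh-TW"
    if region in ("CN", "SG"):
        return "zh-CN"
    # Cantonese without further hints: assume Hong Kong. Bare "zh" stays
    # unresolved because it does not say which variant is wanted.
    return "zh-HK" if lang == "yue" else None


def system_cjk_langs(preferred_languages):
    """Keep every CJK entry of the system list, in the user's order."""
    result = []
    for tag in preferred_languages:
        lang = font_pref_lang(tag)
        if lang and lang not in result:
            result.append(lang)
    return result


print(system_cjk_langs(["en-GB", "zh-Hant-HK", "zh-Hans-CN", "ja-JP"]))
```

The point of the list-based call is visible here: the CJK entries survive even though the first entry is English.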

Finally, it would be nice to make these settings "live", so that changes will immediately be reflected in the browser without needing to restart.
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Thanks! Sounds like we are heading in the right direction here. Will your patch fix the problem with font-weight picking the wrong font variant? If not, I'll look around to see whether it's been reported, and file a bug if it hasn't.
Also, I can confirm that changing Firefox's language preferences does have an effect for the test cases in the Codepen, just not on Facebook; that's probably a separate issue.

One last thing I'd like to add: even though moving zh_HK to the top in Firefox's language preferences will resolve a fallback font for the language, the fallback should not be PingFang for Chinese. PingFang is a sans-serif font; fallback fonts should all be serif, so the font should be SongTi TC for Traditional Chinese and SongTi SC for Simplified Chinese.
(In reply to Jonathan Kew (:jfkthame) from comment #12)
> First, the code in gfxPlatformFontList::AppendCJKPrefLangs is supposed to
> add the appropriate fonts from Firefox's prefs to the end of the font list,
> so that they'll be used if the specific families named in CSS font-family
> don't support the CJK characters that are present.

Thanks for the pointer so we know where to look.

At first glance, however, am I correct in thinking that the fallback mechanism **doesn't** actually take into account whether the current CSS font-family is sans/serif/monospace, and always takes whatever is defined in the font.default.<lang> pref? (As wyuenho notes in comment 14, currently the defaults for CJK languages are sans-serif, which might have been a good choice a decade ago but is questionable now.)
(In reply to Roger So from comment #15)
> From a first glance, however, am I correct in thinking that the fallback
> mechanism **doesn't** actually take into account whether the current CSS
> font-family is sans/serif/monospace, and always take whatever is defined in
> the font.default.<lang> pref? 

Yes, once we get to the fallback stage I think that's right. This sounds like something worth a separate bug; if the CSS includes a generic, it would be good to take that into account here.

> (As wyuenho in comment 14 notes, currently the
> defaults for CJK languages are sans-serif, which might have been a good
> choice a decade ago but questionable now.)

Not being a CJK user myself, I don't really have an opinion on which is a better default. Users can configure this in Preferences, but of course most people probably never touch those settings. If you think serif would be a better choice than sans-serif for the CJK languages, that also sounds like a topic for a separate bug (where we should get additional input from more of the Firefox CJK developers and community).
Comment on attachment 8930202 [details] [diff] [review]
part 3 - Make the OSPreferences service return the user's list of preferred languages from macOS, not just a single locale code

:gandalf, also tagging you for feedback here as OSPreferences is primarily your baby. :) This seems to me like the right thing to do, but wanted to check if you have any concerns with it.
Attachment #8930202 - Flags: feedback?(gandalf)
Using the set of patches above, I see substantially improved behavior with the testcase from comment 4, for example. Adding or reordering CJK languages in the intl.accept_languages pref has the expected result, as does adding/reordering CJK languages in System Preferences (if they haven't been already given a specific order by intl.accept_languages). And both kinds of change are "live", with changes to the preferred fonts appearing immediately on the open page.
Comment on attachment 8930202 [details] [diff] [review]
part 3 - Make the OSPreferences service return the user's list of preferred languages from macOS, not just a single locale code

hah! Seems like you're fixing bug 1337065 :)

Glad to see that!
Attachment #8930202 - Flags: feedback?(gandalf) → feedback+
Attachment #8930204 - Flags: review?(gandalf) → review+
WRT comment 4, although both Chrome and Safari fall back to PingFang when system-ui is set, when no font-family is set, both fall back to the default serif font set for Chinese, which is SongTi. I've been trying to build Firefox all day and still haven't gotten there. What font does your patch set give on the first case of my Codepen in comment 10?
With no CJK languages in either intl.accept_languages or the System Prefs language list, I get mainly Hiragino Sans, and PingFang SC for the characters Hiragino doesn't support. But if I add Chinese (HK) to either of those lists, I get PingFang TC; and if I add Chinese (CN) I get PingFang SC.

If you want to test a build with these patches on your system, you can find one at https://queue.taskcluster.net/v1/task/Fo814g75R5Cb7hcTz0Spag/runs/0/artifacts/public/build/target.dmg (from the tryserver run https://treeherder.mozilla.org/#/jobs?repo=try&revision=68fff702a89a0d6cd4287f026875031dc77b9974).
Thanks for the DMG! I can confirm your test result, with additional information.

Using the same code in an actual HTML5 file (it's got a doctype, html, head, title and body tags, with the Codepen code inside body), without setting <meta charset="UTF-8">, when no CJK language is set, all the CJK code points are rendered as binary junk. I can only get your test result when <meta charset="UTF-8"> is set. This is probably due to Firefox defaulting to Western text encoding as opposed to Unicode when the document charset isn't set. Chrome and Safari seem to be able to render the expected result without a charset. I think in this day and age we can probably default to UTF-8 when charset is not specified. Anyway, separate issue...
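The "binary junk" is what UTF-8 bytes look like when decoded with a Western single-byte encoding. A minimal Python sketch of the effect, assuming windows-1252 as the Western fallback; the function name is illustrative, and Python's cp1252 codec differs from the browsers' table for a handful of undefined bytes, hence errors="replace":

```python
def decode_document(raw_bytes, declared_charset=None, fallback="windows-1252"):
    """Decode page bytes with the declared charset, else the fallback encoding."""
    # errors="replace" keeps the sketch from raising on bytes that Python's
    # cp1252 table leaves undefined.
    return raw_bytes.decode(declared_charset or fallback, errors="replace")


page = "天".encode("utf-8")            # three bytes: E5 A4 A9

print(decode_document(page, "utf-8"))  # charset declared
print(decode_document(page))           # no charset: Western fallback
```

With the charset declared the character round-trips; without it, each of the three bytes is shown as a separate Western character.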

When charset="UTF-8", but still no CJK in intl.accept_languages or the System Prefs, this patch just gets us back to where we were, which is the hardcoded list kicking in and giving Hiragino Sans and PingFang SC. When zh-hk is in intl.accept_languages, we are in the situation described in comment 15. I'll file a bug for this.

To finish this issue off, can we just remove that hardcoded ja > zh-CN > zh-TW > zh-HK > ko chain? There's got to be a better mechanism than this. How about sending all the code points to all the default fonts in all of the CJK languages and see which one comes back with the highest number of glyphs found? When there's a tie, if all the code points are HanS, pick zh-CN, if HanT and has no HKSCS, pick zh-TW, else zh-hk. If ko, pick ko.
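The coverage-counting part of this proposal can be sketched in a few lines of Python. The coverage sets below are toy data made up for illustration, not real font coverage, and the HanS/HanT/HKSCS tie-break is not implemented; ties simply fall back to a caller-supplied order:

```python
def pick_cjk_lang(text, coverage, tie_order):
    """Pick the language whose default font covers the most characters of text.

    coverage: dict mapping a language code to the set of characters its
              default font supports (hypothetical data in this sketch).
    tie_order: languages in preference order; the earliest one wins a tie.
    """
    def covered(lang):
        glyphs = coverage.get(lang, set())
        return sum(1 for ch in text if ch in glyphs)

    # max() returns the first maximal element, so tie_order breaks ties.
    return max(tie_order, key=covered)


# Toy coverage data: the "ja" font lacks the Cantonese-specific characters.
toy_coverage = {
    "ja": set("天地玄黃宇宙洪荒"),
    "zh-HK": set("天地玄黃宇宙洪荒喺咁㗎喇"),
    "ko": set(),
}

print(pick_cjk_lang("天地玄黃,宇宙洪荒。喺咁㗎喇", toy_coverage,
                    ["ja", "zh-HK", "ko"]))
```

Every character of the text is tested against every candidate font, so the cost grows with both the text length and the number of fonts.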
(In reply to Yuen Ho Wong from comment #27)
> There's got to be a better mechanism than this. How about sending all the
> code points to all the default fonts in all of the CJK languages and see
> which one comes back with the highest number of glyphs found?

I believe this is tracked by bug 543200.
Attachment #8930200 - Flags: review?(m_kato) → review+
Attachment #8930201 - Flags: review?(m_kato) → review+
Attachment #8930202 - Flags: review?(m_kato) → review+
Attachment #8930203 - Flags: review?(m_kato) → review+
(In reply to Yuen Ho Wong from comment #27)
> Thanks for the DMG! I can confirm your test result, with additional
> information.
> 
> Using the same code in an actual HTML5 file (it's got a doctype, html head
> title and body tag, Codepen code inside body), without setting <meta
> charset="UTF-8"> , when no CJK set, all the CJK code points are rendered in
> binary junk. I can only get your test result when <meta charset="UTF-8"> is
> set. This is probably due to Firefox's defaulting to Western text encoding
> as opposed to Unicode when document charset isn't set. Chrome and Safari
> seems to be able to render the expected result without a charset. I think
> this day and age we can probably default to UTF-8 not when charset is not
> specified. Anyway, separate issue...

The default used when no charset is provided can be set in Preferences (see the "Fallback Text Encoding" setting at the bottom of the "Advanced" font settings dialog). I believe the initial setting is supposed to be based on the system locale -- though it seems wrong, in that case, that on macOS where I have LANG=en_GB.UTF-8 (from checking "env" in Terminal), Firefox still seems to default to Western. There may be a bug worth investigating there...

> When charset="UTF-8", but still no CJK in intl.accept_languages or the
> System Prefs, this patch just get back to where we were, which is the
> hardcoded list kicking in and giving Hiragino Sans and PingFang SC. When
> zh-hk is in intl.accept_languages, we are in comment 15. I'll file a bug for
> this.

Thanks. Please include a mention of the discussion here in the new bug, to help provide background.

> To finish this issue off, can we just remove that hardcoded ja > zh-CN >
> zh-TW > zh-HK > ko chain? There's got to be a better mechanism than this.
> How about sending all the code points to all the default fonts in all of the
> CJK languages and see which one comes back with the highest number of glyphs
> found? When there's a tie, if all the code points are HanS, pick zh-CN, if
> HanT and has no HKSCS, pick zh-TW, else zh-hk. If ko, pick ko.

I'm not sure how feasible this is.... if there's just a small amount of text, it might be OK, but testing "all the code points" vs "all the default fonts in all the CJK languages" could easily become unacceptably expensive. If the user opens a large plain-text CJK document with no language or charset metadata, we're not going to be able to do this level of analysis before we can start any layout or rendering.

I agree the current behavior isn't great, but it's not clear to me how best to deal with this in the case where no suitable font has been specified, and no "hints" in the form of language or charset are available from the document, the user's Firefox prefs, or the system environment.

Maybe we could "sniff" a limited amount of content in such a document, apply language-recognition heuristics to guess which CJK language it most likely is, and then use that as a last-resort "hint" for the font fallback process in the rest of the document. If you'd like to file a followup bug about this, too, I think it's worth considering if we can somehow improve things.
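One crude form such sniffing could take is a script check over the first part of the document. A minimal Python sketch, assuming a fixed sample size and simple Unicode block tests; the function name and thresholds are hypothetical, and this heuristic cannot tell the Chinese variants apart:

```python
def sniff_cjk_language(text, sample_size=1024):
    """Guess "ja", "ko" or "zh" from the first sample_size characters, else None."""
    kana = hangul = han = 0
    for ch in text[:sample_size]:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana
            kana += 1
        elif 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            han += 1                      # CJK Unified Ideographs (+ Ext. A)
    if hangul and hangul >= kana:
        return "ko"
    if kana:
        return "ja"
    if han:
        return "zh"   # variant (CN/TW/HK) would need further analysis
    return None


for sample in ("天地玄黃,宇宙洪荒。", "こんにちは世界", "안녕하세요", "hello"):
    print(sample, sniff_cjk_language(sample))
```

Only a bounded prefix is inspected, so the cost stays constant regardless of document size, at the price of a much weaker guess.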
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/01682a146116
part 1 - Improve handling of default font prefs for CJK languages, to recognize a potential ordered list of preferred locales from the OS, not just a single selected locale. r=m_kato
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c69d9176968
part 2 - Force reflow of all content when locale prefs change, so that we can pick up modifications to the preferred ordering of CJK fonts without requiring restart or page reload. r=m_kato
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa0a9f25efcd
part 3 - Make the OSPreferences service return the user's list of preferred languages from macOS, not just a single locale code. r=m_kato
https://hg.mozilla.org/integration/mozilla-inbound/rev/593ae48605f5
part 4 - Register for notifications from macOS when the user's locale preferences are updated. r=m_kato
https://hg.mozilla.org/integration/mozilla-inbound/rev/9308d8831364
part 5 - Update a couple of tests to be more tolerant of variations in the exact form of the OS regional prefs. r=gandalf
Decoding documents with UTF-8 by default across the board is tracked in bug 1071816. I'll continue the discussion there; I don't agree with the decision to default to UTF-8 for file: URLs and stay Western for docs loaded from the network.

font-weight: 300 not resolving to the Light variant is tracked by bug 1122693.

Text clustering is tracked by bug 543200. Jonathan, you filed it 8 years ago; has there been any progress since last year?

Once again thanks for fixing this bug!