Closed Bug 1756400 Opened 2 years ago Closed 2 years ago

Wrong fonts selected on Linux for zh-TW/HK pages when the system locale is zh-CN

Categories

(Core :: Layout: Text and Fonts, defect)

RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: coelacanthushex, Assigned: jfkthame)

Attachments

(6 files)

Attached image Pic 1.png

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

I tested this on Firefox 97.0.1 and on 99.0a1 (2022-02-21).

There are two test pages.

  1. https://pgp.coelacanthus.moe/pgp-policy.zh-TW.html
  2. https://asaba.sakuragawa.moe/2021/07/%E4%BF%AE%E5%BE%A9-fedora-gnu-linux-%E7%B3%BB%E7%B5%B1%E4%B8%8B%E7%9A%84%E9%8D%B5%E7%9B%A4%E5%8A%9F%E8%83%BD%E5%8D%80%EF%BC%88f-%E5%8D%80%EF%BC%89%E6%8C%89%E9%8D%B5/

Steps:

  1. Set your locale to zh-CN (or just launch Firefox with the LANG=zh_CN.UTF-8 environment variable). Install Noto Serif CJK and one of Menlo, Monaco, Consolas, "Courier New".
  2. Open the pages above. Check all the Chinese characters on page 1 and the monospace font on pages 1 and 2.

Actual results:

Noto Serif CJK (and Menlo, Monaco, Consolas, "Courier New") are not detected although they are installed; Firefox falls back to the generic serif and monospace families. That is not all: it also does not use the fonts I set in fontconfig properly. For example, monospace uses non-fixed-width fonts and serif uses sans-serif fonts (whereas these fallback and serif settings work properly on pages in other languages).
See Pic 1 and 2.

If you change the locale to en_GB (or launch Firefox with that locale), the problem disappears.

Expected results:

Firefox should use the installed fonts in the fallback sequence properly, and it should use the proper serif and monospace fonts,
as it does in the en_GB locale.
See Pic 3 and 4.

Attached image Pic 2.png
Attached image Pic 3.png
Attached image Pic 4.png

The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core

Additional info:
I use Arch Linux.

I'm hoping jfkthame might have an idea about what's going on, or suggestions on how to troubleshoot further.

Severity: -- → S3
Flags: needinfo?(jfkthame)

I can't seem to reproduce behavior like this on my (ubuntu) Linux machine; launching Firefox with LANG=zh_CN.UTF-8 doesn't affect the fonts that I see on the example pages (Noto Serif CJK TC etc).

Celeste, could you please provide the output of running

LANG=zh_CN.UTF-8 fc-pattern -c :family="-moz-sentinel"

and

LANG=zh_CN.UTF-8 fc-pattern -c :family="Noto Serif CJK TC,-moz-sentinel"

on your system? And for comparison, the same with LANG=en_GB.UTF-8. I'm curious what the fontconfig configuration may be doing with this.
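For anyone who wants to collect all four outputs in one go, here is a minimal Python sketch that runs the two patterns under both locales. The function names are made up for illustration; it assumes `fc-pattern` is on the PATH and that `LC_ALL` is not set in the environment (it would override `LANG`).

```python
import os
import subprocess

# The two locales and the two family patterns requested in the comment above.
LOCALES = ["zh_CN.UTF-8", "en_GB.UTF-8"]
FAMILIES = ["-moz-sentinel", "Noto Serif CJK TC,-moz-sentinel"]


def diagnostic_commands():
    """Return (locale, argv) pairs, one per locale/pattern combination."""
    return [
        (locale, ["fc-pattern", "-c", f":family={family}"])
        for locale in LOCALES
        for family in FAMILIES
    ]


def run_diagnostics():
    """Run each command with LANG set to the given locale and collect stdout."""
    results = {}
    for locale, argv in diagnostic_commands():
        env = dict(os.environ, LANG=locale)  # copy of the environment with LANG overridden
        completed = subprocess.run(
            argv, env=env, capture_output=True, text=True, check=True
        )
        results[(locale, argv[-1])] = completed.stdout
    return results
```

Calling `run_diagnostics()` returns a dictionary keyed by (locale, pattern), which makes it easy to diff the zh_CN and en_GB results side by side.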

Flags: needinfo?(jfkthame) → needinfo?(coelacanthus)

Here are the outputs. I also determined which part of my customized fontconfig leads to this problem; it looks like this:

<match>
  <test name="lang" compare="contains">
    <string>zh_CN</string>
  </test>
  <edit mode="prepend" name="family">
    <string>Roboto</string>
    <string>Noto Sans CJK SC</string>
  </edit>
</match>

So there is a communication problem: Firefox passes LANG=zh_CN to fontconfig even on a zh-TW page. In theory this configuration should not affect font selection for zh-TW content, but it does, because Firefox passes the wrong lang. (I also realize that this configuration is itself somewhat problematic: it should be set separately for sans, serif, and monospace.)

LANG=zh_CN.UTF-8 fc-pattern -c :family="-moz-sentinel"

Pattern has 5 elts (size 16)
family: "Roboto"(w) "Noto Sans CJK SC"(w) "-moz-sentinel"(s) "Roboto"(w) "Noto Sans CJK SC"(w) "Noto Sans CJK TC"(w) "Noto Sans CJK JP"(w) "Noto Sans CJK KR"(w) "HanaMinA"(w) "HanaMinB"(w) "Twemoji"(w) "Twemoji"(w) "Blobmoji"(w) "Twemoji"(w) "Twemoji"(w) "Twemoji"(w) "Noto Sans"(w) "DejaVu Sans"(w) "Verdana"(w) "Arial"(w) "Albany AMT"(w) "Luxi Sans"(w) "Nimbus Sans L"(w) "Nimbus Sans"(w) "Nimbus Sans"(w) "Helvetica"(w) "Nimbus Sans"(w) "Nimbus Sans L"(w) "Lucida Sans Unicode"(w) "BPG Glaho International"(w) "Tahoma"(w) "Nachlieli"(w) "Lucida Sans Unicode"(w) "Yudit Unicode"(w) "Kerkis"(w) "ArmNet Helvetica"(w) "Artsounk"(w) "BPG UTF8 M"(w) "Waree"(w) "Loma"(w) "Garuda"(w) "Umpush"(w) "Saysettha Unicode"(w) "JG Lao Old Arial"(w) "GF Zemen Unicode"(w) "Pigiarniq"(w) "B Davat"(w) "B Compset"(w) "Kacst-Qr"(w) "Urdu Nastaliq Unicode"(w) "Raghindi"(w) "Mukti Narrow"(w) "malayalam"(w) "Sampige"(w) "padmaa"(w) "Hapax Berbère"(w) "MS Gothic"(w) "UmePlus P Gothic"(w) "Microsoft YaHei"(w) "Microsoft JhengHei"(w) "WenQuanYi Zen Hei"(w) "WenQuanYi Bitmap Song"(w) "AR PL ShanHeiSun Uni"(w) "AR PL New Sung"(w) "MgOpen Modata"(w) "VL Gothic"(w) "IPAMonaGothic"(w) "IPAGothic"(w) "Sazanami Gothic"(w) "Kochi Gothic"(w) "AR PL KaitiM GB"(w) "AR PL KaitiM Big5"(w) "AR PL ShanHeiSun Uni"(w) "AR PL SungtiL GB"(w) "AR PL Mingti2L Big5"(w) "MS ゴシック"(w) "ZYSong18030"(w) "TSCu_Paranar"(w) "NanumGothic"(w) "UnDotum"(w) "Baekmuk Dotum"(w) "Baekmuk Gulim"(w) "KacstQura"(w) "Lohit Bengali"(w) "Lohit Gujarati"(w) "Lohit Hindi"(w) "Lohit Marathi"(w) "Lohit Maithili"(w) "Lohit Kashmiri"(w) "Lohit Konkani"(w) "Lohit Nepali"(w) "Lohit Sindhi"(w) "Lohit Punjabi"(w) "Lohit Tamil"(w) "Meera"(w) "Lohit Malayalam"(w) "Lohit Kannada"(w) "Lohit Telugu"(w) "Lohit Oriya"(w) "LKLUG"(w) "Noto Sans"(w) "FreeSans"(w) "Arial Unicode MS"(w) "Arial Unicode"(w) "Code2000"(w) "Code2001"(w) "URW Gothic"(w) "Nimbus Sans"(w) "Nimbus Sans Narrow"(w) "Noto Sans CJK SC"(s) "sans-serif"(w) "Twemoji"(w) "Roya"(w) "Koodak"(w) 
"Terafik"(w) "Helvetica"(w) "sans-serif"(w) "ITC Avant Garde Gothic"(w) "URW Gothic"(w) "sans-serif"(w) "sans-serif"(w) "Helvetica"(w) "Helvetica Narrow"(w) "Nimbus Sans Narrow"(w)
hintstyle: 1(i)(w)
lang: "zh-CN"(w)
lcdfilter: 1(i)(w)
prgname: "fc-pattern"(s)

LANG=zh_CN.UTF-8 fc-pattern -c :family="Noto Serif CJK TC,-moz-sentinel"

Pattern has 5 elts (size 16)
family: "Roboto"(w) "Noto Sans CJK SC"(w) "Noto Serif CJK TC"(s) "-moz-sentinel"(s) "Roboto"(w) "Noto Sans CJK SC"(w) "Noto Sans CJK TC"(w) "Noto Sans CJK JP"(w) "Noto Sans CJK KR"(w) "HanaMinA"(w) "HanaMinB"(w) "Twemoji"(w) "Twemoji"(w) "Blobmoji"(w) "Twemoji"(w) "Twemoji"(w) "Twemoji"(w) "Noto Sans"(w) "DejaVu Sans"(w) "Verdana"(w) "Arial"(w) "Albany AMT"(w) "Luxi Sans"(w) "Nimbus Sans L"(w) "Nimbus Sans"(w) "Nimbus Sans"(w) "Helvetica"(w) "Nimbus Sans"(w) "Nimbus Sans L"(w) "Lucida Sans Unicode"(w) "BPG Glaho International"(w) "Tahoma"(w) "Nachlieli"(w) "Lucida Sans Unicode"(w) "Yudit Unicode"(w) "Kerkis"(w) "ArmNet Helvetica"(w) "Artsounk"(w) "BPG UTF8 M"(w) "Waree"(w) "Loma"(w) "Garuda"(w) "Umpush"(w) "Saysettha Unicode"(w) "JG Lao Old Arial"(w) "GF Zemen Unicode"(w) "Pigiarniq"(w) "B Davat"(w) "B Compset"(w) "Kacst-Qr"(w) "Urdu Nastaliq Unicode"(w) "Raghindi"(w) "Mukti Narrow"(w) "malayalam"(w) "Sampige"(w) "padmaa"(w) "Hapax Berbère"(w) "MS Gothic"(w) "UmePlus P Gothic"(w) "Microsoft YaHei"(w) "Microsoft JhengHei"(w) "WenQuanYi Zen Hei"(w) "WenQuanYi Bitmap Song"(w) "AR PL ShanHeiSun Uni"(w) "AR PL New Sung"(w) "MgOpen Modata"(w) "VL Gothic"(w) "IPAMonaGothic"(w) "IPAGothic"(w) "Sazanami Gothic"(w) "Kochi Gothic"(w) "AR PL KaitiM GB"(w) "AR PL KaitiM Big5"(w) "AR PL ShanHeiSun Uni"(w) "AR PL SungtiL GB"(w) "AR PL Mingti2L Big5"(w) "MS ゴシック"(w) "ZYSong18030"(w) "TSCu_Paranar"(w) "NanumGothic"(w) "UnDotum"(w) "Baekmuk Dotum"(w) "Baekmuk Gulim"(w) "KacstQura"(w) "Lohit Bengali"(w) "Lohit Gujarati"(w) "Lohit Hindi"(w) "Lohit Marathi"(w) "Lohit Maithili"(w) "Lohit Kashmiri"(w) "Lohit Konkani"(w) "Lohit Nepali"(w) "Lohit Sindhi"(w) "Lohit Punjabi"(w) "Lohit Tamil"(w) "Meera"(w) "Lohit Malayalam"(w) "Lohit Kannada"(w) "Lohit Telugu"(w) "Lohit Oriya"(w) "LKLUG"(w) "Noto Sans"(w) "FreeSans"(w) "Arial Unicode MS"(w) "Arial Unicode"(w) "Code2000"(w) "Code2001"(w) "URW Gothic"(w) "Nimbus Sans"(w) "Nimbus Sans Narrow"(w) "Noto Sans CJK SC"(s) "sans-serif"(w) "Twemoji"(w) 
"Roya"(w) "Koodak"(w) "Terafik"(w) "Helvetica"(w) "sans-serif"(w) "ITC Avant Garde Gothic"(w) "URW Gothic"(w) "sans-serif"(w) "sans-serif"(w) "Helvetica"(w) "Helvetica Narrow"(w) "Nimbus Sans Narrow"(w)
hintstyle: 1(i)(w)
lang: "zh-CN"(w)
lcdfilter: 1(i)(w)
prgname: "fc-pattern"(s)

LANG=en_GB.UTF-8 fc-pattern -c :family="-moz-sentinel"

Pattern has 5 elts (size 16)
family: "-moz-sentinel"(s) "Roboto"(w) "Noto Sans CJK SC"(w) "Noto Sans CJK TC"(w) "Noto Sans CJK JP"(w) "Noto Sans CJK KR"(w) "HanaMinA"(w) "HanaMinB"(w) "Twemoji"(w) "Twemoji"(w) "Blobmoji"(w) "Twemoji"(w) "Twemoji"(w) "Twemoji"(w) "Noto Sans"(w) "DejaVu Sans"(w) "Verdana"(w) "Arial"(w) "Albany AMT"(w) "Luxi Sans"(w) "Nimbus Sans L"(w) "Nimbus Sans"(w) "Nimbus Sans"(w) "Helvetica"(w) "Nimbus Sans"(w) "Nimbus Sans L"(w) "Lucida Sans Unicode"(w) "BPG Glaho International"(w) "Tahoma"(w) "Nachlieli"(w) "Lucida Sans Unicode"(w) "Yudit Unicode"(w) "Kerkis"(w) "ArmNet Helvetica"(w) "Artsounk"(w) "BPG UTF8 M"(w) "Waree"(w) "Loma"(w) "Garuda"(w) "Umpush"(w) "Saysettha Unicode"(w) "JG Lao Old Arial"(w) "GF Zemen Unicode"(w) "Pigiarniq"(w) "B Davat"(w) "B Compset"(w) "Kacst-Qr"(w) "Urdu Nastaliq Unicode"(w) "Raghindi"(w) "Mukti Narrow"(w) "malayalam"(w) "Sampige"(w) "padmaa"(w) "Hapax Berbère"(w) "MS Gothic"(w) "UmePlus P Gothic"(w) "Microsoft YaHei"(w) "Microsoft JhengHei"(w) "WenQuanYi Zen Hei"(w) "WenQuanYi Bitmap Song"(w) "AR PL ShanHeiSun Uni"(w) "AR PL New Sung"(w) "MgOpen Modata"(w) "VL Gothic"(w) "IPAMonaGothic"(w) "IPAGothic"(w) "Sazanami Gothic"(w) "Kochi Gothic"(w) "AR PL KaitiM GB"(w) "AR PL KaitiM Big5"(w) "AR PL ShanHeiSun Uni"(w) "AR PL SungtiL GB"(w) "AR PL Mingti2L Big5"(w) "MS ゴシック"(w) "ZYSong18030"(w) "TSCu_Paranar"(w) "NanumGothic"(w) "UnDotum"(w) "Baekmuk Dotum"(w) "Baekmuk Gulim"(w) "KacstQura"(w) "Lohit Bengali"(w) "Lohit Gujarati"(w) "Lohit Hindi"(w) "Lohit Marathi"(w) "Lohit Maithili"(w) "Lohit Kashmiri"(w) "Lohit Konkani"(w) "Lohit Nepali"(w) "Lohit Sindhi"(w) "Lohit Punjabi"(w) "Lohit Tamil"(w) "Meera"(w) "Lohit Malayalam"(w) "Lohit Kannada"(w) "Lohit Telugu"(w) "Lohit Oriya"(w) "LKLUG"(w) "Noto Sans"(w) "FreeSans"(w) "Arial Unicode MS"(w) "Arial Unicode"(w) "Code2000"(w) "Code2001"(w) "URW Gothic"(w) "Nimbus Sans"(w) "Nimbus Sans Narrow"(w) "sans-serif"(w) "Twemoji"(w) "Roya"(w) "Koodak"(w) "Terafik"(w) "Helvetica"(w) "sans-serif"(w) "ITC Avant 
Garde Gothic"(w) "URW Gothic"(w) "sans-serif"(w) "sans-serif"(w) "Helvetica"(w) "Helvetica Narrow"(w) "Nimbus Sans Narrow"(w)
hintstyle: 1(i)(w)
lang: "en"(w)
lcdfilter: 1(i)(w)
prgname: "fc-pattern"(s)

LANG=en_GB.UTF-8 fc-pattern -c :family="Noto Serif CJK TC,-moz-sentinel"

Pattern has 5 elts (size 16)
family: "Noto Serif CJK TC"(s) "-moz-sentinel"(s) "Roboto"(w) "Noto Sans CJK SC"(w) "Noto Sans CJK TC"(w) "Noto Sans CJK JP"(w) "Noto Sans CJK KR"(w) "HanaMinA"(w) "HanaMinB"(w) "Twemoji"(w) "Twemoji"(w) "Blobmoji"(w) "Twemoji"(w) "Twemoji"(w) "Twemoji"(w) "Noto Sans"(w) "DejaVu Sans"(w) "Verdana"(w) "Arial"(w) "Albany AMT"(w) "Luxi Sans"(w) "Nimbus Sans L"(w) "Nimbus Sans"(w) "Nimbus Sans"(w) "Helvetica"(w) "Nimbus Sans"(w) "Nimbus Sans L"(w) "Lucida Sans Unicode"(w) "BPG Glaho International"(w) "Tahoma"(w) "Nachlieli"(w) "Lucida Sans Unicode"(w) "Yudit Unicode"(w) "Kerkis"(w) "ArmNet Helvetica"(w) "Artsounk"(w) "BPG UTF8 M"(w) "Waree"(w) "Loma"(w) "Garuda"(w) "Umpush"(w) "Saysettha Unicode"(w) "JG Lao Old Arial"(w) "GF Zemen Unicode"(w) "Pigiarniq"(w) "B Davat"(w) "B Compset"(w) "Kacst-Qr"(w) "Urdu Nastaliq Unicode"(w) "Raghindi"(w) "Mukti Narrow"(w) "malayalam"(w) "Sampige"(w) "padmaa"(w) "Hapax Berbère"(w) "MS Gothic"(w) "UmePlus P Gothic"(w) "Microsoft YaHei"(w) "Microsoft JhengHei"(w) "WenQuanYi Zen Hei"(w) "WenQuanYi Bitmap Song"(w) "AR PL ShanHeiSun Uni"(w) "AR PL New Sung"(w) "MgOpen Modata"(w) "VL Gothic"(w) "IPAMonaGothic"(w) "IPAGothic"(w) "Sazanami Gothic"(w) "Kochi Gothic"(w) "AR PL KaitiM GB"(w) "AR PL KaitiM Big5"(w) "AR PL ShanHeiSun Uni"(w) "AR PL SungtiL GB"(w) "AR PL Mingti2L Big5"(w) "MS ゴシック"(w) "ZYSong18030"(w) "TSCu_Paranar"(w) "NanumGothic"(w) "UnDotum"(w) "Baekmuk Dotum"(w) "Baekmuk Gulim"(w) "KacstQura"(w) "Lohit Bengali"(w) "Lohit Gujarati"(w) "Lohit Hindi"(w) "Lohit Marathi"(w) "Lohit Maithili"(w) "Lohit Kashmiri"(w) "Lohit Konkani"(w) "Lohit Nepali"(w) "Lohit Sindhi"(w) "Lohit Punjabi"(w) "Lohit Tamil"(w) "Meera"(w) "Lohit Malayalam"(w) "Lohit Kannada"(w) "Lohit Telugu"(w) "Lohit Oriya"(w) "LKLUG"(w) "Noto Sans"(w) "FreeSans"(w) "Arial Unicode MS"(w) "Arial Unicode"(w) "Code2000"(w) "Code2001"(w) "URW Gothic"(w) "Nimbus Sans"(w) "Nimbus Sans Narrow"(w) "sans-serif"(w) "Twemoji"(w) "Roya"(w) "Koodak"(w) "Terafik"(w) "Helvetica"(w) 
"sans-serif"(w) "ITC Avant Garde Gothic"(w) "URW Gothic"(w) "sans-serif"(w) "sans-serif"(w) "Helvetica"(w) "Helvetica Narrow"(w) "Nimbus Sans Narrow"(w)
hintstyle: 1(i)(w)
lang: "en"(w)
lcdfilter: 1(i)(w)
prgname: "fc-pattern"(s)

Flags: needinfo?(coelacanthus)

Thanks! I wondered if there might be something like that going on.

So there are a couple of problems here that we need to figure out how to avoid. First, when we call FcConfigSubstitute to ask fontconfig to apply its configuration, we're not passing it the language of the page, and so it defaults to the system locale. I think we can fairly easily fix this (though untested as yet), which should give more consistent behavior.
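To make the first problem concrete, here is a toy Python model of it. This is not fontconfig's real matching logic and not the actual patch: `normalize_lang` and `substitute` are hypothetical helpers, and the rule is hard-coded to mirror the reporter's zh_CN prepend configuration.

```python
def normalize_lang(tag):
    """Treat 'zh_CN.UTF-8', 'zh_CN' and 'zh-CN' as the same tag (a simplification)."""
    return tag.split(".")[0].replace("_", "-").lower()


def substitute(families, page_lang, system_locale, pass_page_lang):
    """Model a substitution step with one lang-conditional prepend rule.

    If the caller does not pass the page language, the system locale is
    used instead, which is the behavior described in the comment above.
    """
    lang = page_lang if (pass_page_lang and page_lang) else system_locale
    if normalize_lang(lang) == "zh-cn":
        # Mirrors the reporter's rule: prepend these families for zh_CN.
        return ["Roboto", "Noto Sans CJK SC"] + list(families)
    return list(families)


# zh-TW page on a zh_CN system, page language not passed: the rule fires.
print(substitute(["Noto Serif CJK TC"], "zh-TW", "zh_CN.UTF-8", False))
# Same page with the page language passed: the rule does not fire.
print(substitute(["Noto Serif CJK TC"], "zh-TW", "zh_CN.UTF-8", True))
```

In this model, passing the page language keeps a zh_CN-only rule from touching a zh-TW page, while a genuine zh-CN page would still get the prepended families.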

The second problem is that with this zh_CN configuration, the FcConfigSubstitute call ends up prepending a couple of fixed family names ahead of the family name that we pass. This means those families will always take precedence over whatever the content requests. I'm not sure what, if anything, we should try to do about that.... I think the resulting behavior is quite surprising for users, because it means that font-family ends up being (mostly) ignored, and the fontconfig-prepended names get used instead. But arguably that's what the user asked for by setting such a config, so that's what they get. (This is why you're seeing all the Latin characters in Roboto when the zh_CN configuration is in effect, instead of the expected monospace fonts.)
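The prepending is visible directly in the fc-pattern output above: any name that appears ahead of the sentinel but was not in the requested list must have come from the configuration. A minimal Python sketch of that check follows. The helper names are hypothetical, the parsing assumes the `"Name"(x)` layout seen in the pasted output, and it says nothing about how Firefox itself uses the sentinel internally.

```python
import re


def parse_families(family_line):
    """Return (name, marker) pairs from an fc-pattern 'family:' line."""
    body = family_line.split(":", 1)[1]
    # Each entry looks like "Name"(w) or "Name"(s) in the output above.
    return re.findall(r'"([^"]*)"\((.)\)', body)


def config_prepended(family_line, requested, sentinel="-moz-sentinel"):
    """List families placed ahead of the sentinel that were not requested."""
    names = [name for name, _ in parse_families(family_line)]
    head = names[: names.index(sentinel)]
    return [name for name in head if name not in requested]


# Shortened versions of the zh_CN and en_GB outputs posted earlier.
zh_line = ('family: "Roboto"(w) "Noto Sans CJK SC"(w) '
           '"Noto Serif CJK TC"(s) "-moz-sentinel"(s) "Roboto"(w)')
en_line = 'family: "Noto Serif CJK TC"(s) "-moz-sentinel"(s) "Roboto"(w)'

print(config_prepended(zh_line, ["Noto Serif CJK TC"]))
print(config_prepended(en_line, ["Noto Serif CJK TC"]))
```

With the zh_CN sample the check reports Roboto and Noto Sans CJK SC as config-inserted; with the en_GB sample it reports nothing.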

I've pushed a try job at https://treeherder.mozilla.org/jobs?repo=try&revision=2f138b375e964e92ce7706c1568168298bbf75b8 with a patch that I think should help here. Once the build is available (or if you care to build locally with the patch), if you can test it with your configuration and confirm whether it fixes the issue, that would be great - thanks!

Flags: needinfo?(coelacanthus)
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

That does not fix it; it is even worse: the monospace font on en-locale pages is no longer fixed-width.
But I noticed that there were duplicates in my fontconfig, and when I removed the duplicates, the problem miraculously disappeared.
The duplicates are located here:
https://github.com/CoelacanthusHex/dotfiles/blob/master/fontconfig/.config/fontconfig/conf.d/50-generic.conf#L16-L33
So it may not be a Firefox bug; sorry to bother you.
But I don't understand the reason for this problem. As I understand it, fontconfig deduplicates the font list, so if I do the same prepend twice, the result should be the same. Could you explain the reason for this problem to me?

Flags: needinfo?(coelacanthus)

(The first line of my previous comment describes the behavior before I removed the duplicates.)

Odd... I don't know why the duplicate fontconfig entries would have caused this.

In any case, I think looking into this did expose a real issue with how Firefox interacts with fontconfig, so the patches here should improve the handling of cases where the document language is different from the system language, and there are language-specific fontconfig rules in place (as well as some emoji-font configurations I've seen in the past where an emoji font is prepended to all font lists).

I'm glad to help Firefox improve.
OK, I tested the patch after removing the duplicates, and it works well.
I also tried different combinations of page language, CSS, and fontconfig settings, and they all work fine.
So I think the test can be considered passed.

Regarding the strange behavior of fontconfig when a prepend is repeated, I think it is worth investigating, but the priority does not need to be high, because this should not happen in a normal configuration.

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f8f857f310d
Pass the content language (if any) to fontconfig when calling FcConfigSubstitute, to get appropriate substitutions for the page rather than system-locale defaults. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/3f5225bc9ec4
Improve font-selection behavior with fontconfig setups that prepend family names to all patterns. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Depends on: 1758286
Regressions: 1758286
Regressions: 1763175