nightly don't use default font when html lang="ja" and charset is UTF-8

Status: NEW
Assignee: Unassigned
Product: Core
Component: Internationalization
Priority: P3
Severity: normal
Opened: 7 months ago
Updated: 7 months ago

Reporter: Takeshi Ichimaru (a.k.a. Ayakawa)
Version: 55 Branch
Tracking flags: firefox55 affected
Attachments: 2

(Reporter)

Description

7 months ago
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170318030202

Steps to reproduce:

1. In the font settings for "Japanese", set Proportional=Sans-serif, Serif=Yu Mincho, Sans-serif=Yu Gothic.
   In the font settings for "Latin", set Proportional=Serif, Serif=Times New Roman, Sans-serif=Arial.
2. Open an HTML page that is encoded in UTF-8, has <html lang="ja">, and does not specify a font. Sample URL: http://ayakawa.o.oo7.jp/test_utf8.html
  


Actual results:

draw text with Serif(Yu Mincho)


Expected results:

draw text with Sans-serif(Yu Gothic)
(Reporter)

Comment 1

7 months ago
If the "Latin" font settings are changed to Proportional=Sans-serif, the text is drawn with Sans-serif (Yu Gothic).

If the HTML is encoded in Shift_JIS, the text is drawn with Sans-serif (Yu Gothic). Sample URL: http://ayakawa.o.oo7.jp/test_sjis.html
(Reporter)

Comment 2

7 months ago
build 20170314 ok
build 20170315 fail

Comment 3

7 months ago
Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b1c8b28b9fa2a8424db940d4b657eb59b3f01ff3&tochange=b45d664c0a7b10d6a54ceae884f2f8956f10bbec

Regressed by: Bug 1346674
Blocks: 1346674
Status: UNCONFIRMED → NEW
Component: Untriaged → Untriaged
Ever confirmed: true
Keywords: regression
Product: Firefox → Core
Thank you for reporting and apologies for the regression.

I'm taking the bug and will investigate it.
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Reporter, can you please provide me:

 - what language is your operating system in?
 - what language is your Firefox User Interface in?

?
Flags: needinfo?(ayakawa.m)

Comment 6

7 months ago
Created attachment 8848781 [details]
screenshot

Screenshot on Windows10 Japanese edition and Nightly en-US build.
In that case, it seems that we just use the app locale when selecting font (and previously we used OS locale) instead of using the locale from "lang" attribute?
Flags: needinfo?(m_kato)
(Reporter)

Comment 8

7 months ago
Created attachment 8848816 [details]
screenshot with ja build

Does "app locale" mean "general.useragent.locale"?
I tested with the Nightly ja build, and the bug still occurs.
(Reporter)

Comment 9

7 months ago
Oops, sorry for omitting my environment.

I tested on Windows 10 Japanese edition with the Nightly ja build (20170318).
Thank you! I'm building ja debug build on Windows 10 to investigate it.
Flags: needinfo?(ayakawa.m)
I'm confused by this.

I expect that this has something to do with this change: https://hg.mozilla.org/integration/autoland/diff/2fcd6a97a1b8/gfx/thebes/gfxPlatformFontList.cpp

but I don't know why.

This function starts by checking if lang is set and sets prefs based on it and I didn't touch that part. All my changes do is alter what we do later and it shouldn't affect preferences?

Especially if someone is on Firefox ja-JP and Windows ja-JP, my change should not affect them (before the patch, the middle block of code in `AppendCJKPrefLangs` was taking `ja-JP` from the OS, and now it takes it from the app, but that shouldn't matter for this scenario).

Comment 12

7 months ago
I can also reproduce on Nightly ja and Windows10.0 Japanese Edition.
(Reporter)

Comment 13

7 months ago
Do these results give a hint?

This bug appears only when the HTML is encoded in UTF-8.
sample : http://ayakawa.o.oo7.jp/test_utf8.html

If the HTML is encoded in Shift_JIS, EUC-JP, or ISO-2022-JP, the bug does not occur.
sample : http://ayakawa.o.oo7.jp/test_sjis.html
         http://ayakawa.o.oo7.jp/test_eucjp.html
         http://ayakawa.o.oo7.jp/test_iso2022jp.html
status-firefox55: --- → affected
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #7)
> In that case, it seems that we just use the app locale when selecting font
> (and previously we used OS locale) instead of using the locale from "lang"
> attribute?

When content is UTF-8, lang is nsGkAtoms::unicode in nsPresContext::UpdateCharSet.  After your fix, mLangService->GetLocaleLanguage() returns UX locale instead of OS locale.  For compatibility, GetLocaleLanguage should return 1st OS locale atom (multiple-locale support will be used for Android N+ only, so it will be bug 1255242).  Or, nsLanguageAtomService should have a method to return OS locale, then use it in nsPresContext::UpdateCharSet.
Flags: needinfo?(m_kato)
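
The fallback described in the comment above can be sketched roughly as follows. This is a simplified Python model, not Gecko code: the function names, the charset table, and the locale-to-group mapping are all assumptions made for illustration. It only shows the idea that a legacy charset pins the language group, while UTF-8 gives a generic "unicode" group that is then replaced by whichever locale the language service reports.

```python
# Simplified model of the fallback discussed above; NOT actual Gecko code.
# Every name and table in this sketch is hypothetical.

# Legacy encodings imply a language group directly; UTF-8 does not.
CHARSET_TO_LANG_GROUP = {
    "shift_jis": "ja",
    "euc-jp": "ja",
    "iso-2022-jp": "ja",
    "utf-8": "x-unicode",
}


def locale_to_lang_group(locale: str) -> str:
    """Map a locale such as 'ja-JP' or 'en-US' to a font language group."""
    primary = locale.split("-")[0].lower()
    return "ja" if primary == "ja" else "x-western"


def document_lang_group(charset: str, fallback_locale: str) -> str:
    """Return the language group used to look up the default font prefs."""
    group = CHARSET_TO_LANG_GROUP.get(charset.lower(), "x-unicode")
    if group == "x-unicode":
        # No hint from the encoding: fall back to the locale reported by
        # the language service (OS locale before the change, app/UI locale
        # after it, according to the comment).
        group = locale_to_lang_group(fallback_locale)
    return group


# Japanese Windows with an en-US Nightly build, as in the screenshot case.
os_locale, app_locale = "ja-JP", "en-US"
print(document_lang_group("utf-8", os_locale))       # fallback to OS locale
print(document_lang_group("utf-8", app_locale))      # fallback to app locale
print(document_lang_group("shift_jis", app_locale))  # charset decides
```

In this model the two UTF-8 calls differ only in which locale is passed as the fallback, which is the difference the comment attributes to the change in GetLocaleLanguage.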

Updated

7 months ago
Priority: -- → P1

Updated

7 months ago
Component: Untriaged → Internationalization
(Also, I think that there seems to be another bug that nsPresContext::UpdateCharSet doesn't reference content's lang well..)
Am I correct that we're talking about the code in gfxPlatformFontList::AppendCJKPrefLangs ?

> For compatibility, GetLocaleLanguage should return 1st OS locale atom

I don't understand why. Let's say I'm on "fr" OS and I installed "ja" Firefox. Why would you take locale from the OS and not from Firefox?

And why would you take locale from Firefox or OS and not from "lang" attribute on the HTML tag?

> Or, nsLanguageAtomService should have a method to return OS locale, then use it in nsPresContext::UpdateCharSet.

We do now have intl/locale/OSPreferences which has "GetSystemLocales", so I can switch it easily, I just would like to understand why would we even look into the OS or App in the scenario where lang="ja"?

And if one of them has to be consulted, then why OS and not the App locale?

> (Also, I think that there seems to be another bug that nsPresContext::UpdateCharSet doesn't reference content's lang well..)

Does it mean that we should take lang="ja" over OS or App locale?
Flags: needinfo?(m_kato)
Oh, you're saying that it's not gfxPlatformFontList::AppendCJKPrefLangs, it's nsPresContext::UpdateCharSet which uses mLangService (which is nsLanguageAtomService) which uses LocaleService::GetAppLocales now.

So, the solution here might be to just use OSPreferences in nsPresContext::UpdateCharSet instead of mLangService.

But I still don't get why would we look up OS preferences and not app preferences.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #17)
> Oh, you're saying that it's not gfxPlatformFontList::AppendCJKPrefLangs,
> it's nsPresContext::UpdateCharSet which uses mLangService (which is
> nsLanguageAtomService) which uses LocaleService::GetAppLocales now.
> 
> So, the solution here might be to just use OSPreferences in
> nsPresContext::UpdateCharSet instead of mLangService.
> 
> But I still don't get why would we look up OS preferences and not app
> preferences.

Please forget Firefox OS.  This is desktop browser application.  We (including reporter) talk about the behavior of normal desktop application.

We should reference both.

Example, my environment is Firefox English and OS locale setting is Japanese.  So when browsing Japanese content, I want that it should render by Japanese font even if it doesn't have lang="ja".

Also, if a Chinese user has an OS in a Chinese locale but uses a non-Chinese version of Firefox (e.g. Nightly with the English UI), he or she wants content to be rendered with a Chinese font even if it doesn't have lang="cn".

Google's developer explains this situation in bug 1255242 (this is a new feature of Android N).  Even if the UI is English, we should also use the OS settings to select a better font.
Flags: needinfo?(m_kato)
> Example, my environment is Firefox English and OS locale setting is Japanese.  So when browsing Japanese content, I want that it should render by Japanese font even if it doesn't have lang="ja".

Ok, I think I have two questions about this:


1) You say that "when browsing Japanese content, we should use Japanese font".

I agree with that. But I also believe that this should be the case irrelevant of the locale of my operating system or my browser.

Is that true?

2) You say that, an example is that your OS is "ja" and your browser is "en-US"

What if the values are reversed - your browser is "ja" and your OS is "en-US"?

To be honest, I feel like your OS should not affect your browser. If you installed "en-US" - your experience should be "en-US'. If you installed "ja" (we have nightly builds for ja), you should have the "ja" experience.

I struggle to see the value in skipping browser locale choice to follow OS locale choice.
Flags: needinfo?(m_kato)

Comment 20

7 months ago
(In reply to Makoto Kato [:m_kato] (PTO until 3/20) from comment #15)
> (Also, I think that there seems to be another bug that
> nsPresContext::UpdateCharSet doesn't reference content's lang well..)

AFAICT this bug is actually about this particular problem, not OS vs browser language.

We just made this problem noticeable to people who had had this bug masked by their OS language.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #19)

> 1) You say that "when browsing Japanese content, we should use Japanese
> font".
> 
> I agree with that. But I also believe that this should be the case
> irrelevant of the locale of my operating system or my browser.
> 
> Is that true?

You know, even for the same code point, the glyphs in Japanese, Simplified Chinese, and Traditional Chinese fonts aren't the same.

This is a regression, and changing fonts is too sensitive an issue, so we should go back to the original behavior. (Some bugs have already been opened because of your fix.)

In this case, we aren't yet discussing which language the default font should follow.  Most Japanese users set the system locale to Japanese whatever the language of the OS.  And many Nightly users use the English UI, since the localizations aren't complete and we didn't provide localized versions of Nightly until recently. So many people still use the English version of Firefox even if their system locale isn't English.

So we should keep using the system locale for the default font on Unicode/UTF-8 content.  If we can use the HTML lang attribute for the default font, we might have to use it instead of the system locale.  But using the lang attribute might cause preference issues.

Also, for font enumeration, we should consult the UI locale too.

Nakano-san, do you have any additional suggestions?


> 2) You say that, an example is that your OS is "ja" and your browser is
> "en-US"
> 
> What if the values are reversed - your browser is "ja" and your OS is
> "en-US"?

Priority is

1. en-US "as default font"
2. Japanese.
 
> To be honest, I feel like your OS should not affect your browser. If you
> installed "en-US" - your experience should be "en-US'. If you installed "ja"
> (we have nightly builds for ja), you should have the "ja" experience.
> 
> I struggle to see the value in skipping browser locale choice to follow OS
> locale choice.

Font selection is too difficult :-<.  If all platforms used the matchOS pref, it would be simpler.
Flags: needinfo?(m_kato)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #19)
> > Example, my environment is Firefox English and OS locale setting is Japanese.  So when browsing Japanese content, I want that it should render by Japanese font even if it doesn't have lang="ja".
> 
> Ok, I think I have two questions about this:
> 
> 
> 1) You say that "when browsing Japanese content, we should use Japanese
> font".
> 
> I agree with that. But I also believe that this should be the case
> irrelevant of the locale of my operating system or my browser.
> 
> Is that true?

First, it's difficult to decide which language is the primary language of each web page. For example, lang attribute may not be specified explicitly, wrong lang attribute value may be set, etc. When a legacy encoding was used, the encoding was a very good hint.  However, UTF-8 is now used in most web pages. So, we've lost a big hint for *guessing* the language of the web content.

Next, there is another big difference in the preferred font of each language, namely serif vs. sans-serif. If Gecko judges that a web page should be rendered with fonts for en-US but the actual content is written in Japanese, the Japanese characters are rendered with Japanese serif fonts.  However, Japanese characters (most of which are Chinese characters) have many more strokes than Western characters.  Therefore, serif glyphs are very difficult to read (and ugly) on screen, where the resolution is lower than in print. So, misjudging the language may cause this problem.

Finally, the default font for the user's primary language must be the easiest font for reading *any* character, because users see that font most of the time.  Therefore, consulting the OS locale is the safest approach when Gecko is not sure of the language of the content.

So, I think that it's not true.

> 2) You say that, an example is that your OS is "ja" and your browser is
> "en-US"
> 
> What if the values are reversed - your browser is "ja" and your OS is
> "en-US"?
> 
> To be honest, I feel like your OS should not affect your browser. If you
> installed "en-US" - your experience should be "en-US'. If you installed "ja"
> (we have nightly builds for ja), you should have the "ja" experience.
> 
> I struggle to see the value in skipping browser locale choice to follow OS
> locale choice.

I think that's wrong. Some users may use the English locale even though they spend most of their time on web sites in their primary language. Using the English version of the browser really helps when reporting bugs. Imagine that you want to report a bug with an error message: you need to *find* the message in the resources, because translating the words back to English won't restore the original wording in most cases.


Another possible hint is the accept-language setting, because a user who needs a different UI language version has to set it properly.  Actually, we already use it when deciding the preferred order of CJKT fonts.  However, I strongly agree that such a behavior change is too sensitive. We should change it only in Nightly for a couple of cycles to get feedback.
> First, it's difficult to decide which language is the primary language of each web page. For example, lang attribute may not be specified explicitly, wrong lang attribute value may be set, etc.

1) I don't understand why you both, :m_kato and :masayuki, keep bringing some examples ("what if my OS is ja and browser is en-US") while ignoring others ("what if my OS is en-US and browser is ja").
2) You also repetitively say that nothing can be trusted - browser locale may be non-JA because user wants to use en-US, "lang" tag can be "wrong", but somehow OS locale is the North Star that we should follow closely.

This bug's title is: "nightly don't use default font when html lang="ja" and charset is UTF-8"

I believe the fix for this bug is to respect the "lang" tag on HTML.

There are two more bugs: bug 1348259 and bug 1348299 which may be about scenario where we don't have lang="ja".

I'm going to follow :m_kato recommendation to switch to OSPreferences so that we don't change the behavior without discussion, but I believe that we should talk about it. I'll do this in bug 1348259.

This bug should imho stay open for the discussion on why we follow OS locale and not use lang="ja" when present.

My theory here is that there are multiple cohorts of users:
1) users who have en-US OS and ja browser
2) users who have ja OS and en-US browser
3) users who have ja OS and ja browser
4) users who have en-US OS and en-US browser

All four of them may visit "ja" website and should get "ja" fonts. 

The last group needs a hint from the website itself, either by some unicode chars or lang attribute.
The third group can take either OS or app locale as a hint.
The second group has been working before my patch and are vocal about the "regression" now.
The first group was *not* working for years, and the new behavior makes it work for them, so we won't get complaints from them.

My argument is that we should do much more to display Japanese fonts properly based on the webpage, not the UI locale of the OS or browser.
If we fix the heuristics for (4) group, we won't need to rely on OS locale at all.
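
The four cohorts above can be tabulated with a small sketch. This is a hypothetical Python model, not Gecko code: it assumes the page itself gives no usable language hint, and that exactly one locale (OS or app) is consulted depending on a `policy` argument invented for this example.

```python
# Hypothetical model of the four cohorts; not Gecko code.
# Assumes the page gives no usable hint, so one locale decides the fonts.


def hint_locale(os_locale: str, app_locale: str, policy: str) -> str:
    """Return the locale consulted under the given policy ('os' or 'app')."""
    return os_locale if policy == "os" else app_locale


def gets_ja_fonts(os_locale: str, app_locale: str, policy: str) -> bool:
    """True if the consulted locale would lead to Japanese default fonts."""
    return hint_locale(os_locale, app_locale, policy).startswith("ja")


cohorts = [
    ("en-US", "ja"),     # 1) en-US OS, ja browser
    ("ja", "en-US"),     # 2) ja OS, en-US browser
    ("ja", "ja"),        # 3) ja OS, ja browser
    ("en-US", "en-US"),  # 4) en-US OS, en-US browser
]

for number, (os_locale, app_locale) in enumerate(cohorts, start=1):
    by_os = gets_ja_fonts(os_locale, app_locale, "os")
    by_app = gets_ja_fonts(os_locale, app_locale, "app")
    print(f"cohort {number}: OS={os_locale:5} app={app_locale:5} "
          f"OS policy={by_os!s:5} app policy={by_app!s:5}")
```

Under these assumptions, cohorts 1 and 2 swap outcomes when the policy changes, cohort 3 is unaffected, and cohort 4 gets no Japanese fonts from either locale.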
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #23)
> > First, it's difficult to decide which language is the primary language of each web page. For example, lang attribute may not be specified explicitly, wrong lang attribute value may be set, etc.
> 
> 1) I don't understand why you both, :m_kato and :masayuki, keep bringing
> some examples ("what if my OS is ja and browser is en-US") while ignoring
> others ("what if my OS is en-US and browser is ja").

I'm commenting from my experience, and I don't think there are many users who run a localized app on an OS whose locale is en-US.

> 2) You also repetitively say that nothing can be trusted - browser locale
> may be non-JA because user wants to use en-US, "lang" tag can be "wrong",
> but somehow OS locale is the North Star that we should follow closely.

Yes. Basically, if the user cannot understand the language of OS, it's too difficult to use such environment. Actually, I have a lot of VM images which are Traditional Chinese, Simplified Chinese, Korean and Russian for testing specific issue on each localized Windows. However, I don't understand most UI of the OSes. When I need to configure the OS settings, I checked same OS's Japanese or English version. So, using OS locale is the safest hint for users. Actually, you got a lot of regression reports after changing to prefer application UI locale.

> This bug's title is: "nightly don't use default font when html lang="ja" and
> charset is UTF-8"
> 
> I believe the fix for this bug is to respect the "lang" tag on HTML.

No, this has "regression" keyword and actually you broke Nightly's rendering result.

We might improve font switching with lang attribute in the pages. But it's different issue. For now, we should restore the previous behavior. Current rendering result is too ugly for me due to serif fonts for Japanese characters.

> There are two more bugs: bug 1348259 and bug 1348299 which may be about
> scenario where we don't have lang="ja".

Only the first font's metrics are used for computing line-height, underline position, etc., for performance reasons and for consistent results.

So, they are really caused by the change of default language at choosing fonts for web contents whose locale is unclear.

> My theory here is that there are multiple cohorts of users:
> 1) users who have en-US OS and ja browser
> 2) users who have ja OS and en-US browser
> 3) users who have ja OS and ja browser
> 4) users who have en-US OS and en-US browser
> 
> All four of them may visit "ja" website and should get "ja" fonts. 

If the lang attribute is specified explicitly, ideally, yes. But some web content may lie about its language. For example, if a web site is localized from an original language, the template may carry the original language's lang attribute value, and the translator may not change it.

Although, I'm not familiar with current actual web sites. So, this issue could become less than when I worked around here.
> Yes. Basically, if the user cannot understand the language of OS, it's too difficult to use such environment.

You're making the argument that the user will select OS in the locale they understand (say, "ja"), but then will use the browser in locale they presumably don't ("en-US"), and that's the intended behavior?

> Actually, I have a lot of VM images which are Traditional Chinese, Simplified Chinese, Korean and Russian for testing specific issue on each localized Windows. However, I don't understand most UI of the OSes. When I need to configure the OS settings, I checked same OS's Japanese or English version.

I'm sorry to point it out, but as you can see, you cannot claim to represent the users. Your way of using the software is very specific to your domain of expertise.
I do not believe that Firefox behavior should be optimized for your use case.

> So, using OS locale is the safest hint for users. Actually, you got a lot of regression reports after changing to prefer application UI locale.

That's biased.

For X years you serve a product that ignores the browser locale, and follows the OS locale. It's pretty obvious that whoever wanted to use "ja" Firefox on "en-US" Windows had to give up because it didn't work.
So you end up self-selecting a population of users who stick to "ja" Windows and then use any browser locale they want because it doesn't matter.

Obviously, if we change the behavior and start expecting users who want "ja" Firefox to use "ja" Firefox, instead of "en-US" Firefox with "ja" Windows, those people will be vocal, while the first group will not.

I do not believe that this constitutes a reliable data point that we should base our judgement on.

See https://en.wikipedia.org/wiki/Self-selection_bias for more details.

> No, this has "regression" keyword and actually you broke Nightly's rendering result.
> We might improve font switching with lang attribute in the pages. But it's different issue.

I do not believe that you are correct. This bug has been reported informing us that when HTML element has lang="ja",we ignore it.
We ignore it now, and we ignored it before my patch.

As :pike pointed out, the only difference is that *you* didn't see it because the bug was masked by the fact that you use Windows in "ja" and Firefox picked that.
Reverting the patch does not fix the bug described here.

> So, they are really caused by the change of default language at choosing fonts for web contents whose locale is unclear.

I provided a patch in bug 1348299 to use both, OS locale and browser locale.

> Although, I'm not familiar with current actual web sites. So, this issue could become less than when I worked around here.

I sure hope it does, because lying to the browser forces us to do dirty hacks that will never end up well for anyone :)

Anyway,  I provided patches for all behaviors except of the `toLocaleFormat`. I hope to get it working for you guys again and continue working on:

a) recognizing lang attribute
b) using app locale together with os locale to guess which of the CJK locales range did we identify in the website
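
Item (b) could look something like the sketch below. This is a hypothetical Python illustration, not the patch from bug 1348299: the function names, the default order, and the choice to consult app locales before OS locales are all assumptions.

```python
# Hypothetical sketch of combining app and OS locales into a CJK
# preference order; NOT the actual patch and NOT Gecko code.
from typing import List, Optional

# Assumed fallback order once the locale hints are exhausted.
CJK_DEFAULT_ORDER = ["ja", "ko", "zh-CN", "zh-HK", "zh-TW"]


def cjk_lang(locale: str) -> Optional[str]:
    """Map a locale to a CJK font language, or None if it is not CJK."""
    parts = locale.split("-")
    primary = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else ""
    if primary in ("ja", "ko"):
        return primary
    if primary == "zh":
        return f"zh-{region}" if region in ("HK", "TW") else "zh-CN"
    return None


def cjk_pref_order(app_locales: List[str], os_locales: List[str]) -> List[str]:
    """Order CJK languages: app hints, then OS hints, then the defaults."""
    ordered: List[str] = []
    for locale in [*app_locales, *os_locales]:
        lang = cjk_lang(locale)
        if lang is not None and lang not in ordered:
            ordered.append(lang)
    for lang in CJK_DEFAULT_ORDER:
        if lang not in ordered:
            ordered.append(lang)
    return ordered


# en-US browser on a Japanese OS: the OS locale supplies the CJK hint.
print(cjk_pref_order(["en-US"], ["ja-JP"]))
# zh-TW browser on a Japanese OS: both hints are kept, app first.
print(cjk_pref_order(["zh-TW"], ["ja-JP"]))
```

Whether app or OS locales should come first is exactly the open question in this thread; swapping the two lists in the loop gives the opposite priority.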
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #25)
> > Yes. Basically, if the user cannot understand the language of OS, it's too difficult to use such environment.
> 
> You're making the argument that the user will select OS in the locale they
> understand (say, "ja"), but then will use the browser in locale they
> presumably don't ("en-US"), and that's the intended behavior?

I meant that in most cases, users are familiar with OS's language. So, I'd like to say that ignoring OS locale does not make sense.

> > Actually, I have a lot of VM images which are Traditional Chinese, Simplified Chinese, Korean and Russian for testing specific issue on each localized Windows. However, I don't understand most UI of the OSes. When I need to configure the OS settings, I checked same OS's Japanese or English version.
> 
> I'm sorry to point it out, but as you can see, you cannot claim to represent
> the users. Your way of using the software is very specific to your domain of
> expertise.
> I do not believe that Firefox behavior should be optimized for your use case.

I just tried to explain why I think that users must understand the OS's language. I'm not describing any particular cases of our users.

> > So, using OS locale is the safest hint for users. Actually, you got a lot of regression reports after changing to prefer application UI locale.
> 
> That's biased.
> 
> For X years you serve a product that ignores the browser locale, and follows
> the OS locale. It's pretty obvious that whoever wanted to use "ja" Firefox
> on "en-US" Windows had to give up because it didn't work.

This isn't directly related to this bug, but such users can assign another language's fonts to the font settings for the OS locale's language. Although, as you probably think, it's not a good workaround.

> So you end up self-selecting a population of users who stick to "ja" Windows
> and then use any browser locale they want because it doesn't matter.
> 
> Obviously, if we change the behavior and start expecting users who want "ja"
> Firefox to use "ja" Firefox, instead of "en-US" Firefox with "ja" Windows,
> those people will be vocal, while the first group will not.
> 
> I do not believe that this constitutes a reliable data point that we should
> base our judgement on.

Note that I do NOT disagree that the app locale is a good hint. I claim that ignoring the OS locale does not make sense.

> > No, this has "regression" keyword and actually you broke Nightly's rendering result.
> > We might improve font switching with lang attribute in the pages. But it's different issue.
> 
> I do not believe that you are correct. This bug has been reported informing
> us that when HTML element has lang="ja",we ignore it.
> We ignore it now, and we ignored it before my patch.

Yes, in my understanding, the lang attribute issue isn't a recent regression. Your patch broke the default font selection for Nightly users who run en-US Nightly on an OS in another locale, and that is why other hints like the "lang" attribute are now needed.

I follow the reporter on Twitter. He was investigating the cause of the regression and arrived at the lang attribute issue, but I believe that he intended to point out the recent regression. So, to avoid spamming this bug with the lang attribute discussion, we should file another bug which does NOT deal with the recent regression.

> As :pike pointed out, the only difference is that *you* didn't see it
> because the bug was masked by the fact that you use Windows in "ja" and
> Firefox picked that.
> Reverting the patch does not fix the bug described here.

Yes, but this was reported as a regression bug, not as the long-standing lang attribute bug.

> > So, they are really caused by the change of default language at choosing fonts for web contents whose locale is unclear.
> 
> I provided a patch in bug 1348299 to use both, OS locale and browser locale.
> 
> > Although, I'm not familiar with current actual web sites. So, this issue could become less than when I worked around here.
> 
> I sure hope it does, because lying to the browser forces us to do dirty
> hacks that will never end up well for anyone :)
> 
> Anyway,  I provided patches for all behaviors except of the
> `toLocaleFormat`. I hope to get it working for you guys again and continue
> working on:
> 
> a) recognizing lang attribute
> b) using app locale together with os locale to guess which of the CJK
> locales range did we identify in the website

Let's see the result, first. But as I said above, we should file another bug for discussing the long standing bug. The reporter and some watchers of this bug may not want to read long discussion after the fix of bug 1348299 if it fixes the recent regression.
> Actual results:
> 
> draw text with Serif(Yu Mincho)
> 
> Expected results:
> 
> draw text with Sans-serif(Yu Gothic)

Ah, I remember what I was thinking about this issue. It is not related to the locale issue. The lang attribute works fine, because a Japanese font is selected by the lang attribute.

The problem is that the current Nightly build chooses en-US as the document's locale. The font family is then chosen from en-US's default font setting, so the result is equivalent to this:

> <html lang="ja" style="font-family: serif;">
> ...
> </html>

because the default generic font family for en-US is serif.  Therefore, a serif font is used for Japanese characters too. (The preferred generic font family for Japanese is sans-serif.)

So, there is no bug around the lang attribute. If we *should* change the generic font family based on the lang attribute, we would need a new pseudo generic font family like "moz-default" at the CSS level. However, I'm not sure whether switching the generic font gives a good rendering result, because serif and sans-serif fonts may be mixed within a paragraph.
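
The two-step selection described in this comment can be modeled with a short sketch. This is a simplified Python illustration, not Gecko code: the function name and the table layout are assumptions, and the font names are taken from the reporter's settings in the description ("Proportional" is modeled as the `default` key).

```python
# Simplified model of the explanation above; NOT actual Gecko code.
# The table mirrors the reporter's font settings from the description.
import copy

FONT_PREFS = {
    "ja": {
        "default": "sans-serif",  # Proportional = Sans-serif
        "serif": "Yu Mincho",
        "sans-serif": "Yu Gothic",
    },
    "x-western": {
        "default": "serif",       # Proportional = Serif
        "serif": "Times New Roman",
        "sans-serif": "Arial",
    },
}


def resolve_font(document_group: str, lang_attr_group: str,
                 prefs: dict = FONT_PREFS) -> str:
    """Pick the font for text whose lang attribute maps to lang_attr_group."""
    # Step 1: the default generic family comes from the document's group.
    generic = prefs[document_group]["default"]
    # Step 2: the concrete font comes from the lang attribute's group.
    return prefs[lang_attr_group][generic]


# Document treated as en-US (x-western), element has lang="ja".
print(resolve_font("x-western", "ja"))  # Yu Mincho: the reported result

# Document treated as Japanese, element has lang="ja".
print(resolve_font("ja", "ja"))         # Yu Gothic: the expected result

# Comment 1: switching the Latin proportional setting to sans-serif.
latin_sans = copy.deepcopy(FONT_PREFS)
latin_sans["x-western"]["default"] = "sans-serif"
print(resolve_font("x-western", "ja", latin_sans))  # Yu Gothic
```

In this model the lang attribute is honored in step 2, which matches the claim that it "works fine"; the serif result comes entirely from step 1 using the document's group.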
I think all regression introduced by me are fixed now.

This is the last bug that is open, and we have a choice of morphing this bug into what :masayuki described in comment 27 (font follow lang attribute), or closing it.

Preference?
change priority to 3 since regressions are fixed
Priority: P1 → P3
> I think all regression introduced by me are fixed now.

Should we remove the regression keyword then?
Flags: needinfo?(gandalf)
yep, and I'm de-assigning myself.
Assignee: gandalf → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(gandalf)
Keywords: regression
Thanks!