Closed Bug 977540 Opened 8 years ago Closed 3 years ago

Don't apply detectors to foreign domains

Categories

(Core :: Internationalization, defect, P4)

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Regressed 1 open bug)

Attachments

(2 files)

Charset menu usage by locale for...
Firefox 25: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393
Firefox 26: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394

It's not worthwhile to pay attention to the gray rows. Those locales have so few reported sessions that even a couple of sessions with the menu used make the locales sort to the top by session percentage.

Looking at the black rows, the top ones are Traditional Chinese and Thai, which accidentally have wrong fallbacks. The old way of defining the fallback in localizations made it easy to put the intended fallback in a generic file while the Windows-, Mac- and *nix-specific files always overrode the generic file with something unintentional. Firefox 28 fixes Traditional Chinese and Thai.

Other than that, the top locales in black are the ones that had detectors available in the menu (off by default for Simplified Chinese and Korean; on by default for Japanese, Russian and Ukrainian) in Firefox 25 and 26.

Since Korean really has a single legacy encoding in practice, other things being equal, one should expect the situation to be similar for Korean, Hebrew and Greek. Yet, it's not. Once versions that no longer offer a Korean detector make it to the release channel, we'll learn if the difference between Korean on one hand and Hebrew and Greek on the other can be attributed to the detector.

Also note how the Bulgarian localization is doing much better than Russian or Ukrainian localizations. (The Belarusian one has too few sessions to draw conclusions.)

But in any case, now that we have per-TLD baseline guesses from bug 910211, it doesn't really make sense to let the Japanese detector run if the baseline guess isn't Shift_JIS and it doesn't make sense to let the Russian or Ukrainian detector run if the baseline guess isn't windows-1251. (Maybe this could even be stricter: Running the Japanese detector only on .jp, Russian detector only on .ru and Ukrainian detector only on .ua.)
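
To make the proposed scoping concrete, here is a minimal sketch in Python (not Gecko code). It assumes the per-TLD baseline guess is already available as an encoding label; the table `DETECTOR_REQUIRED_BASELINE`, the detector keys `"ja"`, `"ru"`, `"uk"`, and the function `should_run_detector` are all hypothetical names chosen for illustration.

```python
# Hypothetical table: the only baseline guess under which each
# detector would be allowed to run, per the proposal above.
DETECTOR_REQUIRED_BASELINE = {
    "ja": "Shift_JIS",     # Japanese detector
    "ru": "windows-1251",  # Russian detector
    "uk": "windows-1251",  # Ukrainian detector
}


def should_run_detector(detector: str, baseline_guess: str) -> bool:
    """Return True only if the baseline guess matches the detector's legacy."""
    required = DETECTOR_REQUIRED_BASELINE.get(detector)
    if required is None:
        # Unknown detector name: do not run anything.
        return False
    # Encoding labels are compared case-insensitively.
    return required.lower() == baseline_guess.lower()
```

With this gate, a Japanese detector would be skipped on a page whose baseline guess is, say, windows-1252, and the Russian and Ukrainian detectors would be skipped wherever the baseline guess is not windows-1251.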

We should probably make the detectors scoped that way. And if we do, the current UI no longer makes sense. At most a boolean pref for turning all three on or off would make sense.

Once the detectors are scoped not to run on foreign sites, we can assess if they do more harm than good even domestically.
How about "generic" TLDs (e.g. .com)?
(In reply to Masatoshi Kimura [:emk] from comment #1)
> How about "generic" TLDs (e.g. .com)?

Whether a localization-dependent detector runs on unlabeled .com/.org/.net needs to be decided. Hence, "*Maybe* this could be stricter" in comment 0.
Priority: -- → P4
(In reply to Masatoshi Kimura [:emk] from comment #1)
> How about "generic" TLDs (e.g. .com)?

My patch will let detection run on .com/net/org.
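
As a rough illustration of that behavior, the Python sketch below combines the stricter per-TLD variant floated in comment 0 with an exemption for .com/.net/.org. It is an assumption-laden sketch, not the landed patch: the sets `GENERIC_TLDS` and `DETECTOR_HOME_TLDS` and the helpers `tld_of` and `detector_allowed_on_host` are hypothetical, and the actual patch may treat other TLDs differently.

```python
# Generic TLDs on which detection is still allowed to run (from this comment).
GENERIC_TLDS = {"com", "net", "org"}

# Hypothetical "home" TLDs per detector, following the stricter
# variant mentioned in comment 0 (.jp, .ru, .ua).
DETECTOR_HOME_TLDS = {
    "ja": {"jp"},
    "ru": {"ru"},
    "uk": {"ua"},
}


def tld_of(host: str) -> str:
    """Return the last DNS label of a host name, lowercased."""
    return host.rstrip(".").rsplit(".", 1)[-1].lower()


def detector_allowed_on_host(detector: str, host: str) -> bool:
    """Allow the detector on generic TLDs and on its own home TLDs only."""
    tld = tld_of(host)
    if tld in GENERIC_TLDS:
        return True
    return tld in DETECTOR_HOME_TLDS.get(detector, set())
```

Under these assumptions the Japanese detector would still run on `example.com` and `example.jp` but not on `example.ru`.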
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #9030558 - Flags: review?(VYV03354) → review+
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2df400cb88c
Avoid running Japanese, Russian and Ukrainian detectors on domains associated with different encoding legacies. r=emk.
https://hg.mozilla.org/mozilla-central/rev/a2df400cb88c
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Attached file enter.html
Regressions: 1543077
Regressions: 1585935