Closed Bug 833152 Opened 7 years ago Closed 3 years ago

[Browser] Support specifying character encoding of the webpage in Firefox OS

Categories

(Firefox OS Graveyard :: General, enhancement)

All
Gonk (Firefox OS)
enhancement
Not set

Tracking

(b2g18-)

RESOLVED WONTFIX
Tracking Status
b2g18 - ---

People

(Reporter: gkw, Unassigned)

Details

(4 keywords, Whiteboard: [UCID:Browser11, FT:Browser, KOI:P2][Please see comment 26 when triaging])

Build Identifier: 20130116230203
Git commit info: 3c84fa2b74b3dbe591f0d4fbedbac9aac...

On Desktop and Firefox for Android, there is an option to specify character encoding of the webpage, so we should consider having that option on Firefox OS too.
Would like to have, but not on 1.x branch.
Do we have user feedback indicating that this is needed? In which locales?

Does IE on WP8 have this? What about Safari on iOS?

Chrome on Android does not appear to have this.
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> Do we have user feedback indicating that this is needed? In which locales?

Well, I figured that since we have it at least on Firefox for Android, it might be worth considering for Firefox OS too. This is less about competitive reasons.
The problem is that this is a crappy feature to have. If we can get away with not having it, that would be much better. Do we have telemetry data suggesting it is used by Firefox for Android users?
We don't have telemetry for this on Android. For desktop, there's telemetry starting with Firefox 22.
Filed bug 847870 about getting telemetry on Android. I think we should not do anything here before having data from other platforms.
Depends on: 847870
Duplicate of this bug: 879234
Adding productwanted - we need a decision on whether we want to do this or not.
Keywords: productwanted
Summary: Support specifying character encoding of the webpage in Firefox OS → [Browser] Support specifying character encoding of the webpage in Firefox OS
That seems a bit premature. We should get the outcome of whether it's used on Android first.
I sometimes come across web pages that don't specify the correct encoding, and our autodetection fails to find the "right" one. Such pages are not modern in design, but we need to support such legacy web pages.
(In reply to Anne (:annevk) from comment #9)
> That seems a bit premature. We should get the outcome of whether it's used
> on Android first.

The encoding override wasn't used at all in 99.99% of Firefox for Android 22 sessions on the release channel.

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) (offline: 8/13-8/18 JST) from comment #10)
> I sometimes come across web pages that don't specify the correct encoding,
> and our autodetection fails to find the "right" one. Such pages are not
> modern in design, but we need to support such legacy web pages.

When users do use the override, the reasons on Android break down as follows:
Overriding a previous override (i.e. the user already made an override and is unhappy with the result): 30%
The page had an encoding label (presumably a bad label in the user's opinion): 54%
The page was unlabeled: 16%

The case you mention (page is unlabeled, is not a file URL and autodetection has done its thing but wrongly in the user's opinion) is very rare on Android: 1% of the times the user activates the override match that case.
(In reply to Henri Sivonen (:hsivonen) from comment #11)
> (In reply to Anne (:annevk) from comment #9)
> > That seems a bit premature. We should get the outcome of whether it's used
> > on Android first.
> 
> The encoding override wasn't used at all in 99.99% of Firefox for Android 22
> sessions on the release channel.

How about per UI Locale? Since Firefox for mobile has low ADI in East Asia, I think this data is accurate worldwide, but not for Japan. Modern mobile sites usually use UTF-8, so the lack of an encoding menu is not a problem for modern mobile sites, such as those built with jQuery Mobile. But i-mode sites (old mobile sites in Japan) use Shift-JIS or other encodings, so this menu is sometimes needed.

But we should discuss with our East Asian partners (China Unicom, Hutchison, KDDI and KT) whether this is necessary. If they say "need", we should add this.
If they say "need" without data I don't think we're any further in improving the status quo for our users.
(In reply to Makoto Kato (:m_kato) from comment #12)
> How about per UI Locale?

Even though locale data was added to the telemetry ping in bug 668842, there's no UI for querying the data by locale (bug 847919). I sent e-mail to the telemetry team to ask if they can query the data by locale.

(In reply to Makoto Kato (:m_kato) from comment #12)
> But we should discuss with our East Asian partners (China Unicom, Hutchison,
> KDDI and KT) whether this is necessary. If they say "need", we should add
> this.

I think we should not ask. It's the sort of question that's likely to get a "yes, it's needed" answer if the person answering doesn't have data to be confident that it's not needed.

Also note that mainland China, Taiwan and Korea each have a single dominant legacy encoding unlike Japan, so generalizing from Japan to East Asia doesn't make sense without data.
Duplicate of this bug: 889605
> Also note that mainland China, Taiwan and Korea each have a single dominant
> legacy encoding unlike Japan, so generalizing from Japan to East Asia
> doesn't make sense without data.

Note that mainland China's legacy encoding is GB/GBK, while that of Taiwan/Hong Kong is Big5.

I'm not sure how many legacy sites using these encodings are still available, but they are usually historical and updated much less often.

Not having these options usually renders the content almost completely unreadable and will definitely cause a switch to another browser / viewing device should there be a need for the content to be accessed, so that's something to note.
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #16)
> Not having these options usually renders the content almost completely
> unreadable

And that is the difference from Western languages, where usually only a few accented characters are garbled.
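To make the contrast concrete, here is a minimal Python sketch (not Gecko code) that measures how many characters of a page still read correctly when its bytes are decoded with the wrong encoding. The helper name `readable_fraction`, the sample strings, and the choice of windows-1252 (`cp1252`) as the wrongly assumed encoding are all assumptions made for illustration.

```python
def readable_fraction(text: str, actual: str, assumed: str) -> float:
    """Fraction of characters that still read correctly when bytes written
    in `actual` are decoded as `assumed`.

    Checking one character at a time is only valid because the assumed
    encoding used below (cp1252) is single-byte.
    """
    survived = sum(
        1
        for ch in text
        if ch.encode(actual).decode(assumed, errors="replace") == ch
    )
    return survived / len(text)


# Assumed sample strings; any similar text shows the same pattern.
western = readable_fraction("Crème brûlée à la carte", "utf-8", "cp1252")
cjk = readable_fraction("繁體中文網頁", "big5", "cp1252")

print(f"Western sample still readable: {western:.0%}")
print(f"Big5 sample still readable:    {cjk:.0%}")
```

In the Western sample only the accented letters break and the ASCII letters survive, so the text stays guessable. In the Big5 sample every character is a two-byte sequence, so none of them survives.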
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #16)
> > Also note that mainland China, Taiwan and Korea each have a single dominant
> > legacy encoding unlike Japan, so generalizing from Japan to East Asia
> > doesn't make sense without data.
> 
> Note that mainland China's legacy encoding is GB/GBK, while that of
> Taiwan/Hong Kong is Big5.

Like I said, each one of mainland China, Taiwan and Korea has a single dominant legacy encoding.

> I'm not sure how many legacy sites using these encodings are still
> available, but they are usually historical and updated much less often.

How many sites use legacy encodings is not the right question. If a site uses a legacy encoding and declares it, it works across all Firefox localizations just fine.

If a site uses the dominant legacy encoding for its locale and does *not* declare it, it still works fine in Firefox builds for the same locale. If you use zh-CN Firefox to browse an unlabeled GBK site, you're fine. If you use zh-TW Firefox to browse an unlabeled Big5 site, you're fine. If you use ko Firefox to browse an unlabeled EUC-KR site, you're fine.

If this is not the case for B2G, that's a bug different from this one!

The problems arise with out-of-locale unlabeled content. If you use zh-TW Firefox to browse an unlabeled GBK site, you're in trouble.
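The in-locale and out-of-locale cases can be sketched in Python as follows. This is an illustration under stated assumptions, not Gecko's implementation: the table holds only the three locale/encoding pairs named above plus a windows-1252 default, and the names `FALLBACK_BY_LOCALE`, `fallback_encoding` and `decode_page` are hypothetical.

```python
from typing import Optional

# Hypothetical table: only the pairs named in this comment.
# This is NOT Gecko's actual fallback list.
FALLBACK_BY_LOCALE = {
    "zh-CN": "gbk",
    "zh-TW": "big5",
    "ko": "euc_kr",
}
DEFAULT_FALLBACK = "cp1252"  # windows-1252


def fallback_encoding(locale: str) -> str:
    """Return the fallback encoding assumed for a browser locale."""
    return FALLBACK_BY_LOCALE.get(locale, DEFAULT_FALLBACK)


def decode_page(raw: bytes, locale: str, declared: Optional[str] = None) -> str:
    """Decode page bytes: a declared label wins, else the locale fallback."""
    encoding = declared or fallback_encoding(locale)
    return raw.decode(encoding, errors="replace")


text = "简体中文"
raw = text.encode("gbk")  # a GBK page

in_locale = decode_page(raw, "zh-CN")                # unlabeled, zh-CN build
out_of_locale = decode_page(raw, "zh-TW")            # unlabeled, zh-TW build
labeled = decode_page(raw, "zh-TW", declared="gbk")  # label declared

print(in_locale == text, out_of_locale == text, labeled == text)
```

Only the unlabeled, out-of-locale case produces unreadable text. The labeled page decodes correctly regardless of the browser locale.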

> Not having these options usually renders the content almost completely
> unreadable

Sure. However, it's bad if sites feel they can kick this problem to the user instead of fixing their sites. Other authoring failures that render sites completely unreadable are treated as something we contact site authors about—not as something exposed in technical terms to end users so that they can take a guess.
(In reply to Makoto Kato (:m_kato) from comment #12)
> How about per UI Locale?

https://bug906032.bugzilla.mozilla.org/attachment.cgi?id=796536

The level of non-use with Fennec in CJK locales is > 99.9% but < 99.99% whereas for non-CJK locales the level of non-use is > 99.99%. (Note that there's a Fennec table in the document after the Firefox table.)
(In reply to Henri Sivonen (:hsivonen) from comment #18)
> The problems arise with out-of-locale unlabeled content. If you use zh-TW
> Firefox to browse an unlabeled GBK site, you're in trouble.

See bug 910211 for an idea for alleviating this problem.
Whiteboard: [UCID: Browser11, FT:Browser, KOI:P2]
Whiteboard: [UCID: Browser11, FT:Browser, KOI:P2] → [UCID:Browser11, FT:Browser, KOI:P2]
It would be totally premature to fix this before fixing bug 933785. If bug 933785 were fixed, bug 910192 would kick in and non-windows-1252 locales would get locale-appropriate fallback encodings without the user having to bother with a menu like this.
Depends on: 933785
Blocks: 950786
Keywords: feature
Why is this blocking bug 950786? There's nothing delightful about the character encoding menu. Things are sad if the user has to deal with author errors. Let's instead pursue bug 933785 and bug 910211 to make the encoding menu less needed.

Again, it would be *totally* premature to throw this problem to the user to deal with before bug 933785 is fixed.
(In reply to Henri Sivonen (:hsivonen) from comment #22)
> Why is this blocking bug 950786? There's nothing delightful about the
> character encoding menu. Things are sad if the user has to deal with author
> errors.

I'm not advocating for this to be fixed before bug 933785 and bug 910211 (feel free to help persuade fixing those first!), and I know that we are not responsible for author issues, especially wrt. legacy pages out there on the web. However, the user does not know whose issue it is, and can potentially put the blame on Firefox OS instead of the author.

It's the experience that may make having this delightful, although this can be a subjective feeling - that's why I placed the dependency. I'm only speaking from my CJK experience with not-that-many legacy pages that cannot be viewed without changing the encoding.

Please feel free to disagree and change as you see fit, especially if you feel (from your CJK experience, or statistical results showing low usage, or otherwise) that this is not important to us, independent of bug 933785 and bug 910211.
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #23)
> Please feel free to disagree and change as you see fit, especially if you
> feel (from your CJK experience, or statistical results showing low usage, or
> otherwise) that this is not important to us, independent of bug 933785 and
> bug 910211.

My point is that we should *expect* our CJK behavior to be bad (worse than desktop/Android) unless bug 933785 is fixed. After that, I expect bug 910211 would improve the CJK experience beyond our current desktop/Android situation.

Adding UI on CJK grounds without fixing bug 933785 is the completely wrong way to go about improving the CJK experience, because it's bad to ask the user to fix stuff when we could easily improve things by making stuff work without user interaction.
s/magic/papercut/

this definitely falls into the category of things that hurt daily use for some population of users, and not the "delight" category of OS-level features.
Blocks: fxos-papercuts
No longer blocks: 950786
(In reply to Dietrich Ayala (:dietrich) from comment #25)
> this definitely falls into the category of things that hurt daily use for
> some population of users

Comment 19 indicates that this is *extremely* far from a "daily use" issue according to Fennec and desktop telemetry (when the fallback encoding code that we have works as designed).

Our B2G problem (bug 933785) is that we always use a fallback encoding that's appropriate for English/Portuguese/Spanish etc. even if the user's locale is Polish/Serbian/CJK etc. It's totally backwards to address the problem by letting the user fix this on a page-by-page basis instead of us applying a locale-appropriate fallback in the first place.

Concretely, to make sure we apply the locale-appropriate fallback, we need to make sure that when https://mxr.mozilla.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#61 runs in the content process, the return value is the language code of the Gaia locale.
Whiteboard: [UCID:Browser11, FT:Browser, KOI:P2] → [UCID:Browser11, FT:Browser, KOI:P2][Please see comment 26 when triaging]
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX