Open Bug 1702914 Opened 4 years ago Updated 3 years ago

Proton need a menu of 'Character Encoding'

Categories

(Firefox :: Toolbars and Customization, enhancement)

Firefox 89
enhancement

ASSIGNED

People

(Reporter: nolamiller1203+bugzilla, Assigned: hsivonen, NeedInfo)

Attachments

(6 files)

Attached image 無題.png ("Untitled")

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0

Steps to reproduce:

Set browser.proton.enabled = true.
Open the hamburger menu in the top-right corner of Firefox.

Please visit the site below, which contains a list of Japanese strings encoded in UTF-8.
Without a 'Character Encoding' menu, we see broken characters and have no way to fix them in Proton. Please consider adding it.

https://charenc.herokuapp.com/a

Actual results:

There is no 'Character Encoding' menu.

Expected results:

There is a 'Character Encoding' menu.

The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Internationalization
Product: Firefox → Core

Core::Intl or Firefox?

Flags: needinfo?(hsivonen)

This is a Firefox decision.

The Text Encoding menu is available from the menu bar (less discoverable on Windows and Linux compared to Mac) and under toolbar customization.

rtestard, if bug 1687635 was fixed, what would be the chances of that single menu item being acceptable in the Proton hamburger menu?

Status: UNCONFIRMED → NEW
Component: Internationalization → Toolbars and Customization
Ever confirmed: true
Flags: needinfo?(hsivonen) → needinfo?(rtestard)
Product: Core → Firefox

Now that I think about it more, there's no reason why bug 1687635 needs to be fixed for the remaining instances of the Text Encoding menu before an item "Override Text Encoding" that performs the same action as View > Text Encoding > Automatic is added to the Proton hamburger menu.

rtestard, what are your thoughts on adding such a single item in the "More Tools" submenu of the Proton hamburger menu?

(Could alternatively be on the top level and not in the More Tools submenu.)

The most logical place would be an item below "Zoom".

The patch adds a menu item that does the right thing, but the patch doesn't yet disable the item when it should be disabled.

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

I put it under "Zoom", because that's also a menu item for the user to take countermeasures against bad page authoring, whereas the "More Tools" submenu has developer tools and this isn't one.

(In reply to Henri Sivonen (:hsivonen) from comment #3)

> This is a Firefox decision.
>
> The Text Encoding menu is available from the menu bar (less discoverable on Windows and Linux compared to Mac) and under toolbar customization.
>
> rtestard, if bug 1687635 was fixed, what would be the chances of that single menu item being acceptable in the Proton hamburger menu?

The share of users suffering from this issue matters and reality is that we cannot add entries to menus for scenarios that are not relevant to most users since it will add noise to menus which makes it harder to find the selection you're after. Is there a way to have a breakdown of MAU using the character encoding feature currently per country?
I also feel like the current UX is suited to people who know about character encoding issues (fair to assume that this is a minority) and these people are likely the ones who would discover the feature under customize pane. If this issue significantly impacts users in specific countries then it seems like we need a UX that is more accessible or ideally no UX at all?

Flags: needinfo?(rtestard)

(In reply to Romain Testard [:RT] from comment #10)

> The share of users suffering from this issue matters and reality is that we cannot add entries to menus for scenarios that are not relevant to most users since it will add noise to menus which makes it harder to find the selection you're after.

At least with this patch, the menu item knows to disable itself even more often than the Text Encoding menu in the menu bar is disabled. I do see how even a disabled item clutters up the menu, but at least there's no risk of users experimenting with it as an attractive nuisance when it's disabled.

> Is there a way to have a breakdown of MAU using the character encoding feature currently per country?

To my knowledge, we don't have a way to estimate relative to MAU. I'll send you an estimate of how many of the daily active users in Japan use the feature (from any of its entry points in Firefox 86) per day. (I haven't cleared that number for publication, so I'm not putting it here.) The feature is used in Japan at the highest rate, but Taiwan and Hong Kong aren't far behind.

> I also feel like the current UX is suited to people who know about character encoding issues (fair to assume that this is a minority) and these people are likely the ones who would discover the feature under customize pane.

It's not clear that people discover that part of the UI. As far as I'm aware, we don't have any data on the relative discoverability of the entry points. (But it's pretty obvious that the menu bar entry point is more discoverable on Mac than it is on Windows and Linux.)

Note that the patch makes the feature much more usable for non-expert users: The patch adds a single menu item instead of adding a submenu, so the user doesn't need to know how to choose from multiple items.

> If this issue significantly impacts users in specific countries then it seems like we need a UX that is more accessible or ideally no UX at all?

I agree that no UX for this problem would be ideal, but we're not there. In particular, I don't know how to design a no-UX solution that wouldn't create an adverse feedback loop in how Web authors behave, making the Web Platform more brittle for everyone. Also, I don't want to make the feature so prominent that it would suggest to Web authors that it's OK for them to expect the user to use the feature. That is, I don't want the feature whose purpose is to remedy authoring errors to cause more authoring errors.

The impact is totally different for users of the Latin script vs. users of non-Latin scripts. For users of the Latin script, the failure mode is relatively benign. For users of non-Latin scripts, the failure mode is total unreadability of the text on the page. So the use case is rare, but in the non-Latin case, when the use case shows up, its severity level is "page totally unreadable". Among places where a non-Latin script is the primary script, those where that script has a Web-supported legacy encoding are affected much more often than those where it doesn't.

That is to say, the feature is of interest primarily for Japanese, Chinese, Thai, Korean, Cyrillic, Greek, and Hebrew contexts, in that order (and somewhat less for Arabic). Still, people use the en-US locale all over the place and reading out-of-locale content is a supported activity, so I'd rather push the menu item into the More Tools submenu than make its presence depend on the localization.

I moved the menu item to the More Tools submenu. I put it above the dev tool items, because the code that fills in the dev tool items expects them to go last, and it doesn't make sense to polish this patch further at this point if Product rejects this. However, if Product OKs the menu item but only as the last item, I can then make the effort to make it go last.

The screenshot shows the menu item grayed out as it would appear on most Web pages (where Firefox already knows that overriding the encoding doesn't make sense).

This isn't WONTFIXed at this point but this isn't going into the first release of Photon, either.

> Is there a way to have a breakdown of MAU using the character encoding feature currently per country?

This isn't exactly MAU, but:

I looked at the major, non-noisy regions that have the highest per-subsession Text Encoding menu usage, but measured on a per-month basis instead. That is, I looked at one in how many distinct telemetry submitters made a selection from the Text Encoding menu (via any of app menu, menu bar, or toolbar button; the telemetry does not distinguish between them) in Firefox 86 on the release channel over the four weeks during which Firefox 86 was the latest release.

Japan: 1/622 distinct submitters made a selection from the Text Encoding menu (any entry point) in Firefox 86 on the release channel during the four weeks when 86 was the latest release, 28% of global
Traditional Chinese cohort (Taiwan + Hong Kong + Macao): 1/1126
Mainland China: 1/2278
South Korea: 1/2597
Thailand: 1/3175
Cyrillic cohort (Russia + Ukraine + Bulgaria + Belarus): 1/3236
Israel: 1/4792
Greece: 1/6400
Singapore: 1/8865
...
Germany: 1/27523

The share of global (the share of distinct submitters who used the menu at least once over the four weeks) isn't interesting apart from Japan standing out. Otherwise, it's more reflective of the total number of users than of the menu usage.

As for bug reporting activity: any time the feature becomes less available or works less well, a bug report connected to Japanese Web content shows up rather quickly. There are also occasional complaints in this general space from the Cyrillic context. (As an exception, the issue about Fenix not having the menu was originally reported about Vietnamese on a predominantly English page.) That is, apart from Japanese, the bug reporting rate doesn't map directly to the usage rate in the field.

I make these interpretations of this data:

  1. In addition to the severity of the failure mode, the usage rate in the Latin-script context (Germany here as a reference) is so different that Web browsing experience from that context gives no usage-experience-based intuition for judging the necessity of the feature in the Japanese context. (I only read Latin-script languages, so I lack this usage-experience-based intuition myself and, before seeing these numbers, had previously estimated from Japanese-related bug reports that this feature remains relevant enough to keep in the Japanese context.)
  2. While I don't have telemetry for other troubleshooting menu items, like Task Manager or Troubleshooting Mode…, to compare with, one in every 622 telemetry submitters in Japan over four weeks using this feature seems like a rather high user relevance (in Japan) for a problem recovery feature. Also, it seems significant that 28% of the distinct telemetry submitters who make a selection from the menu (any entry point) over four weeks were in Japan.
  3. If we were to contemplate locale-dependent visibility of the menu item for the Japanese locale, it would be weird to withhold it from other non-Latin locales.
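To put point 1 in numbers, the "1/N" figures listed above can be turned into rates and compared with the Germany reference. This is a small Python sketch; only the figures come from the list, and the helper names are made up for illustration. With these inputs, Japan's rate comes out at roughly 44 times Germany's.

```python
# "One in N" distinct submitters who used the Text Encoding menu (any entry
# point) in Firefox 86, release channel, four-week window; figures from the list above.
ONE_IN_N = {
    "Japan": 622,
    "Traditional Chinese cohort": 1126,
    "Mainland China": 2278,
    "South Korea": 2597,
    "Thailand": 3175,
    "Cyrillic cohort": 3236,
    "Israel": 4792,
    "Greece": 6400,
    "Singapore": 8865,
    "Germany": 27523,
}


def rate(one_in_n: int) -> float:
    """Fraction of distinct submitters who used the menu at least once."""
    return 1 / one_in_n


def times_germany(region: str) -> float:
    """How many times higher the region's usage rate is than Germany's."""
    return rate(ONE_IN_N[region]) / rate(ONE_IN_N["Germany"])


for region, n in ONE_IN_N.items():
    print(f"{region}: {rate(n):.4%} ({times_germany(region):.1f}x Germany)")
```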

Based on off-bug discussion, I think it's worthwhile to explain what problems an encoding override solves, what other browsers do, and, if we have a single menu item that performs something automatic, why we wouldn't run that automation without the user having to trigger it.

For unlabeled legacy encodings, Firefox and Chrome both autodetect without user action. Safari guesses from the UI locale or, if the UI locale is Japanese, runs a Japanese-specific detector, AFAICT. Unlabeled legacy encodings are not the problem being solved by the Firefox menu anymore, even though they are in Safari.

For unlabeled UTF-8, Chrome, Fenix and mobile Safari provide no recourse; non-Latin pages are completely unreadable. (See a blog post for why this differs from the legacy encoding case.) Desktop Firefox and desktop Safari provide a menu to make the pages readable.

For incorrectly-labeled content, Chrome, Fenix and mobile Safari provide no recourse; non-Latin pages are completely unreadable. Desktop Firefox and desktop Safari provide a menu to make the pages readable. Especially with older content and especially in the case of universities, what the content is and how the server is configured can be disconnected from each other, but that old content may still be valuable to read on occasion. If we were to fix this case by running the detector without user action, we'd be ignoring an explicit declaration and replacing it with an occasionally incorrect guess. (These days, in terms of number of times Firefox users make a selection from the menu, the problem of incorrect labeling is larger than the problem of unlabeled UTF-8.)
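The failure mode for incorrectly labeled content can be reproduced with Python's standard codecs. A minimal sketch, assuming a Shift_JIS page served with a UTF-8 label: decoding under the label produces U+FFFD replacement characters, while decoding as Shift_JIS (what an override to Japanese achieves) recovers the text. This shows only the byte-level effect, not how Firefox implements the override.

```python
# Japanese text as a legacy (Shift_JIS) page would store it on the server.
original = "日本語のページ"
page_bytes = original.encode("shift_jis")

# Trusting an incorrect "charset=utf-8" label: byte sequences that are not
# valid UTF-8 become U+FFFD REPLACEMENT CHARACTER, so the text is unreadable.
as_labeled = page_bytes.decode("utf-8", errors="replace")

# Overriding the label with the encoding the bytes are actually in.
as_overridden = page_bytes.decode("shift_jis")

print(as_labeled)
print(as_overridden)
```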

Summary: Proton need a menu of 'Character Encoding' → Photon need a menu of 'Character Encoding'
Summary: Photon need a menu of 'Character Encoding' → Proton need a menu of 'Character Encoding'

> If we were to contemplate locale-dependent visibility of the menu item for the Japanese locale, it would be weird to withhold it from other non-Latin locales.

It turns out we've done that kind of thing before in bug 596173. (CCing the reporter of that bug.)

The localizable pref that was introduced in that bug is still around but doesn't appear to have a practical effect. The code suggests that it affects the Text Encoding toolbar button somehow on profile migration. I tried creating a fresh profile with the Japanese localization of Firefox, and the toolbar button was hidden behind customization just like with U.S. English.

(The toolbar button was introduced in bug 865916.)

I'd be actively pushing against a solution that resembles the one used in bug 596173. If it's a setting, it should be somehow stored centrally in mozilla-central.

Localization repositories should not store prefs, especially not hacking the system and camouflaging that setting as a localizable string. This has led to way too many errors and issues over time, and we're still paying the price for some of those choices.

:flod - would you be open to storing localization preferences in the localization resource directory if they weren't stored as localization files, but instead as regular config/data files like TOML or JSON?

I have no preference here, just wondering if your position is more of a:

  • Don't store l10n metadata in l10n repository
  • Don't abuse l10n resource files for configuration data

or maybe both?

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #17)

> I have no preference here, just wondering if your position is more of a:
>
>   • Don't store l10n metadata in l10n repository
>   • Don't abuse l10n resource files for configuration data

Both.

About l10n repositories: in the past, direct hg access was a common way to localize. That resulted in prefs getting set unnecessarily or incorrectly, no centralized way to check for issues, and a huge pain when it comes to removing them (searchplugins, default handlers, intl settings).

Nowadays, most communities don't have access to l10n repositories, which means I'd become the gateway for any change. That doesn't scale, and at that point it's better to have them stored centrally and in a single place, where we have full control and visibility over those changes.

l10n files: if settings are exposed in l10n files as "strings", they're also exposed in localization tools (Pontoon these days). And they get translated, often by localizers who don't fully understand what they're doing (that's the case for intl.properties, where typically boolean or empty values get random translations). To the point that I had to create tools to monitor those settings.

Having extra keys is even worse, because it forces us to create one-off exceptions in our entire toolchain. That's the case for the Chinese home page, which has been removed at least 3 times during my tenure in the l10n-drivers team.

Gotcha. Thanks.

For posterity: I agree that l10n file formats should not be used for config settings. I'm less certain about the location.

I see m-c configs as a great way to store sensitive data that we want to vet and maintain (say, default search engines per locale), but less so for data that is heavily per-locale but less business-sensitive, for example "what locale should we fall back on when a string is missing in this locale".

In particular, within the l10n-repo approach there are two options:

  • Store some config settings in l10n repo and bundle it into Firefox binary at release
  • Store some config settings in l10n repo and bundle it into langpacks

Both can potentially have a place and from the i18n architecture perspective I'd like to see our platform supporting all three models:

  • Data lives in m-c and is packaged into Firefox release
  • Data lives in l10n repos and is packaged into Firefox release
  • Data lives in l10n repos and is packaged into langpacks

That said, what we do right now (which is option 1) is sufficient, and the benefit of introducing the latter two options is not really well understood yet, so I'm not interested in driving any change. I just don't think it's as clear-cut.

(In reply to Henri Sivonen (:hsivonen) from comment #15)

> If we were to contemplate locale-dependent visibility of the menu item for the Japanese locale, it would be weird to withhold it from other non-Latin locales.
>
> It turns out we've done that kind of thing before in bug 596173. (CCing the reporter of that bug.)
>
> The localizable pref that was introduced in that bug is still around but doesn't appear to have a practical effect. The code suggests that it affects the Text Encoding toolbar button somehow on profile migration. I tried creating a fresh profile with the Japanese localization of Firefox, and the toolbar button was hidden behind customization just like with U.S. English.
>
> (The toolbar button was introduced in bug 865916.)

Yeah, I implemented it. I'm not sure the approach is suitable for the current design, though. I guess that the function is still required for some locales even these days, as mentioned in comment 14.

I searched Twitter briefly. Even in the last couple of years, Japanese users have still seen some web pages loaded with the wrong decoder. And some people still use Shift_JIS or EUC-JP for writing their own pages... Also, some people switch to Firefox when they encounter a page broken by a text encoding issue in Chrome. This may be why Firefox users still use the UI a lot and report problems quickly. Finally, some people say that incorrect text encoding detection happens only in Firefox. It seems that they don't know how to specify the text encoding, e.g., <meta charset=> appears after a <title> that contains Japanese text.

I also googled and found two scenarios. One is that users see incorrect detection on the Internet Archive's Wayback Machine. The other is that a web server starts sending charset=utf-8 even though it serves legacy-encoded pages. (I'm not sure these cases still occur with today's latest builds.)

(In reply to Francesco Lodolo [:flod] from comment #16)

> I'd be actively pushing against a solution that resembles the one used in bug 596173. If it's a setting, it should be somehow stored centrally in mozilla-central.
>
> Localization repositories should not store prefs, especially not hacking the system and camouflaging that setting as a localizable string. This has led to way too many errors and issues over time, and we're still paying the price for some of those choices.

To be clear, I wasn't advocating for doing it but noting what has happened in the past. (I think the Photon UI refresh in 57 is the only UI refresh cycle when we haven't had this discussion. It happened in the Firefox 4 cycle and in the Australis cycle.)

I see at least these reasons against a localizable pref:

  1. Last time we had it for this same thing, the next UI refresh moved back to having a single global design.
  2. Last time for this specific thing resulted in translations of the word "false" despite the localization notes explaining not to do that.
  3. Some localizations that are still among the highest in menu usage (assuming the obvious correspondence between localizations and regions; I looked at telemetry by region) failed to adjust the pref back then.
  4. Previously, when we've put configuration in this broader topic area into the localization files, it has resulted in configurations that were too much work to adjust to cross-browser-consistent values one by one in the localizations, so it was easier to program the logic in Gecko conditional on locale and keep that code under i18n peer review. (A boolean has lower risk, but still.)

(In reply to Masayuki Nakano from comment #20)

> Also, some people switch to Firefox when they encounter a page broken by a text encoding issue in Chrome.

I suspected things along these lines. Thanks.

> Finally, some people say that incorrect text encoding detection happens only in Firefox.

I'd like to see examples of this so that I could fix these. The main known issue is bug 673087, which isn't on the release channel yet.

> It seems that they don't know how to specify the text encoding, e.g., <meta charset=> appears after a <title> that contains Japanese text.

That shouldn't result in unreadable text (at worst a reload) if the charset value is right (and there isn't a different one on the HTTP layer).
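Why the position of `<meta charset>` matters can be shown with a greatly simplified sketch: a regex stands in for the HTML Standard's prescan (which actually tokenizes attributes and skips comments), and only the first 1024 bytes are examined, the limit the Standard encourages. A long Japanese `<title>` can push the declaration past that window. Per the reply above, Firefox then ends up at worst reloading the page with the right encoding; the sketch only shows when that situation arises, and the names in it are made up, not Gecko's.

```python
import re
from typing import Optional

# The HTML Standard encourages prescanning only the first 1024 bytes.
PRESCAN_LIMIT = 1024

# Greatly simplified: a regex, not the Standard's tokenizing prescan algorithm.
META_CHARSET = re.compile(
    rb"""<meta[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def prescan_charset(document: bytes) -> Optional[str]:
    """Return the <meta> charset label if it appears within the prescan window."""
    match = META_CHARSET.search(document[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


title = "日本語のタイトル".encode("shift_jis")  # 16 bytes
early = b'<meta charset="Shift_JIS"><title>' + title + b"</title>"
late = b"<title>" + title * 100 + b'</title><meta charset="Shift_JIS">'

print(prescan_charset(early))  # shift_jis
print(prescan_charset(late))   # None: the declaration is past the window
```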

(In reply to Masayuki Nakano from comment #21)

> The other is that a web server starts sending charset=utf-8 even though it serves legacy-encoded pages. (I'm not sure these cases still occur with today's latest builds.)

Firefox never second-guesses an explicit server-declared encoding without user interaction via the menu, so this is very much a relevant use case. Based on telemetry, this looks like the most relevant scenario.
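The reason a mislabeled page needs user action can be sketched as a precedence order, loosely following the HTML Standard's encoding sniffing steps: a BOM wins, then an explicit user override, then the HTTP charset, then `<meta>`, and only then a detector's guess. This is a simplified, hypothetical model (the class and function names are invented, and `detector_guess` merely stands in for chardetng), not Gecko's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncodingSignals:
    detector_guess: str                  # stand-in for a detector such as chardetng
    bom: Optional[str] = None            # encoding implied by a byte order mark
    user_override: Optional[str] = None  # set only when the user acts via the menu
    http_charset: Optional[str] = None   # charset parameter of the HTTP Content-Type
    meta_charset: Optional[str] = None   # <meta charset> found while prescanning


def choose_encoding(signals: EncodingSignals) -> str:
    """Return the first available signal in (simplified) precedence order."""
    for candidate in (
        signals.bom,
        signals.user_override,
        signals.http_charset,
        signals.meta_charset,
    ):
        if candidate is not None:
            return candidate
    # Only unlabeled content falls through to the detector's guess.
    return signals.detector_guess


# A page like the university one reported in this bug: the server says
# UTF-8, but the bytes are Shift_JIS.
page = EncodingSignals(detector_guess="Shift_JIS", http_charset="UTF-8")
print(choose_encoding(page))  # UTF-8, so the text is unreadable

# Roughly what "Automatic" asks for: use the detector's result as the override.
page.user_override = page.detector_guess
print(choose_encoding(page))  # Shift_JIS
```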

I tweeted asking people to tell me the scenarios in which they need the "Text Encoding" menu and/or the URLs of pages where Gecko fails to detect the right encoding. I got 3 replies for the latter. I filed them and added them to "See Also".

I got an invalid case report.
http://www-imai.is.s.u-tokyo.ac.jp/~yato/

The HTTP response has charset=utf-8, but the page uses a legacy encoding (Shift_JIS). This page is readable only in Firefox, by setting the text encoding to "Japanese".

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(Away: 4/29-5/5) from comment #24)

> I got an invalid case report.
> http://www-imai.is.s.u-tokyo.ac.jp/~yato/

This is a great example of a use case: a university server setting charset=utf-8 server-wide without regard to what content already exists under the user directories.

> The HTTP response has charset=utf-8, but the page uses a legacy encoding (Shift_JIS). This page is readable only in Firefox, by setting the text encoding to "Japanese".

"Automatic" works as well.

An example of a page with broken characters:

URL: http://wikitools.web.fc2.com/

Attachment #9214135 - Attachment description: Bug 1702914 - Add an Override Text Encoding item to the Proton app menu. → WIP: Bug 1702914 - Make Repair Text Encoding available via the Proton app menu.

(In reply to Henri Sivonen (:hsivonen) from comment #13)

> This isn't WONTFIXed at this point but this isn't going into the first release of Photon, either.

(Mixed up Proton and Photon there.)

I updated the patch now that bug 1687635 has landed in case there's bandwidth to consider this now that MR1 has shipped.

Telemetry showing distinct submitters using the feature at least once over four weeks from 70 to 88

I expect usage to fall in 89 for three reasons:

  1. The discoverability issue here.
  2. Bug 1702246 should reduce the need to use the menu in Traditional Chinese and perhaps Japanese contexts and maybe a tiny bit in Simplified Chinese and Korean contexts.
  3. Bug 673087 will reduce the need to use the menu globally.

As for how we will be able to tell what's attributable to the change in discoverability and what's attributable to the change in the need to use the feature: it should be possible to estimate the general shape of the change with bug 673087 excluded by querying only the non-UTF-8 situations, and to estimate the general shape of the change with bug 1702246 excluded by querying non-CJK regions.

RT, now that MR1 has been shipped, what's the process of getting this reconsidered?

I've attached a patch that adds a menu item (not a submenu) to the bottom of the More Tools panel. See the screenshots in the two previous comments. I believe this is minimally obtrusive and minimally confusing (the item is disabled on modern sites) but allows users who still need the feature, and whom we've taught for a decade to look for it in the app menu, to discover it.

I suggest we take this patch.

Failing that, if the item is still considered too confusing to users who don't need it, as an alternative, I suggest taking the patch with the tweak of hiding the menu item (and the separator above it) if the Firefox UI is set to a Latin-script language unless an about:config pref is set.
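The fallback proposal above amounts to a small visibility rule. A minimal sketch, where the set of Latin-script UI locales and the `override_pref_set` flag are hypothetical placeholders (this bug names neither a pref nor a locale list):

```python
from typing import FrozenSet

# Hypothetical, deliberately incomplete set of Latin-script UI locales;
# the bug does not define one.
LATIN_SCRIPT_UI_LOCALES: FrozenSet[str] = frozenset(
    {"en-US", "en-GB", "de", "fr", "es-ES", "it", "pl", "vi"}
)


def show_repair_item(ui_locale: str, override_pref_set: bool) -> bool:
    """Whether to show the menu item and the separator above it.

    `override_pref_set` stands for a hypothetical about:config pref.
    """
    if override_pref_set:
        return True
    return ui_locale not in LATIN_SCRIPT_UI_LOCALES


for locale in ("ja", "zh-TW", "ru", "en-US", "de"):
    print(locale, show_repair_item(locale, override_pref_set=False))
```

Keeping such a list in code rather than in the localization files would fit the objections to localizable prefs raised earlier in this bug.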

Flags: needinfo?(rtestard)
Attachment #9214135 - Attachment description: WIP: Bug 1702914 - Make Repair Text Encoding available via the Proton app menu. → Bug 1702914 - Make Repair Text Encoding available via the Proton app menu.

I looked at one in how many distinct telemetry submitters, excluding Mac, in a given region made a selection from any entry point to the Text Encoding menu at least once over the last four weeks of a given release on the release channel. Both 88 and 89 were six weeks on the release channel, so this left two weeks of update ramp-up for both.

I excluded Mac, because it has the menubar always visible. I excluded Mac both in the total number of distinct submitters and in the menu usage. The other exclusions were in the menu usage only.

The columns are:

  • Geo
  • (A) Change from 88 to 89, excluding Mac due to the always-visible menubar
  • (B) Change from 88 to 89, excluding Mac, excluding overriding unlabeled UTF-8, and excluding the .jp/.in/.lk TLDs to try to rule out an effect from bug 673087
  • (C) Change from 88 to 89, excluding Mac, excluding overriding unlabeled UTF-8, excluding the .jp/.in/.lk TLDs, and excluding attempts to override guesses made by chardetng to try to rule out effects from both bug 673087 and bug 1702246

Geo   (A)    (B)    (C)
CN    -33%   -35%   -36%
GR    -23%   -19%   -18%
HK    -32%   -36%   -37%
IL    +10%    -5%   -11%
JP    -22%   -22%   -21%
KR    -34%   -23%   -25%
RU    -24%   -25%   -26%
TH    -32%   -34%   -33%
TW    -33%   -27%   -27%
UA    -14%   -15%   -13%

This compares changes in proportions computed independently for 88 and 89. That is, the "distinct telemetry submitter" criterion is applied separately to 88 and 89.
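The computation behind the table can be sketched as follows: for each release, the proportion is (distinct non-Mac submitters who used the menu at least once in the four-week window) / (distinct non-Mac submitters), and each cell is the relative change between the two proportions. The counts below are made-up placeholders, not the telemetry, and the function names are hypothetical.

```python
def proportion(menu_users: int, submitters: int) -> float:
    """Share of distinct submitters who used the menu at least once."""
    return menu_users / submitters


def relative_change(p88: float, p89: float) -> float:
    """Relative change from 88 to 89, e.g. -0.22 for a -22% table cell."""
    return p89 / p88 - 1


# Made-up counts, for illustration only.
p88 = proportion(menu_users=1_000, submitters=600_000)
p89 = proportion(menu_users=780, submitters=600_000)
print(f"{relative_change(p88, p89):+.0%}")  # -22%
```

Because the denominator is recomputed per release, growth or shrinkage in the submitter population between releases doesn't by itself register as a change in the usage rate.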

Clearly, the usage rate in Israel is low enough that small changes cause big relative changes.

Otherwise, the most plausible explanation, especially for the regions with the most data points: JP, CN, and RU, is that the proportion of distinct telemetry submitters using the feature dropped due to the app menu entry point going away and some users not discovering the remaining entry points. The number in Japan for 88 excluding Mac is about the same as for 86 (including Mac) in comment 14, so the release-to-release result for Japan seems to have been otherwise stable of late.

It looks like one in five users in Japan who pre-Proton would have known how to deal with an encoding problem failed to figure out how to continue to do so in Proton despite a Japanese-language SUMO article on the topic being available.

Notably, the rest of the CJK cohort did even worse. (There are Simplified and Traditional Chinese translations of the SUMO article but no Korean translation.)

I think we should take the patch to put the Repair Text Encoding item in the More Tools submenu of the app menu.

And I should emphasize that since these are actual invocations of the menu items when enabled, rather than just opening the submenu, they are unlikely to be random UI exploration: on a typical site, the menu items are disabled because the menu is inapplicable. For random UI exploration to explain the engagement with the app menu, users would have had to happen to explore on a site where the menu items were actually invocable, which is not the norm.
