Bugzilla

Assignee

Comment 1

•

4 years ago

A look at the relevant telemetry: https://hsivonen.fi/encoding-telemetry/

Assignee

Updated

•

4 years ago

Depends on: 1704056

Assignee

Updated

•

4 years ago

Blocks: 1701828

Assignee

Comment 2

•

4 years ago

Additional non-user-facing benefit:

The back end code for supporting the non-Automatic menu items got in the way of implementing bug 673087. Based on that experience, chances are that fixing bug 1701828 would be much easier if we first implemented the change proposed here.

Assignee

Comment 3

•

4 years ago

Attached file Bug 1687635 part 1 - Add new localizable strings for Override Text Encoding. (obsolete) — Details

Updated

•

4 years ago

Attachment #9217362 - Attachment description: Bug 1687635 part 1 - Add new localizable strings. → Bug 1687635 part 1 - Add new localizable strings for Override Text Encoding.

Assignee

Comment 4

•

4 years ago

Attached file Bug 1687635 part 1 - Replace Text Encoding submenu with Repair Text Encoding item. — Details

The changeset deliberately does not clean up the resulting dead code
to make reverting easier if needed.

Assignee

Comment 5

•

4 years ago

Attached file Bug 1687635 part 2 - Disable Repair Text Encoding when known not to have effect. — Details

https://treeherder.mozilla.org/#/jobs?repo=try&revision=97dfadd18550439cef83e0c7155a5a5fc6870b31

Assignee

Comment 6

•

4 years ago

Assignee

Updated

•

4 years ago

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e333f4b6299d4e97be4e924bd1099bd035e25cd

Assignee

Comment 7

•

4 years ago

Assignee

Updated

•

4 years ago

Attachment #9217363 - Flags: ui-review?(mwalkington)

Assignee

Comment 8

•

4 years ago

I intend to remove the resulting dead code as a follow-up.

Updated

•

4 years ago

Attachment #9217363 - Attachment description: Bug 1687635 part 2 - Change UI code for Override Text Encoding. → Bug 1687635 part 1 - Replace Text Encoding submenu with Override Text Encoding item.

Updated

•

4 years ago

Attachment #9217373 - Attachment description: Bug 1687635 part 3 - Disable Override Text Encoding when known not to have effect. → Bug 1687635 part 2 - Disable Override Text Encoding when known not to have effect.

Updated

•

4 years ago

Attachment #9217362 - Attachment is obsolete: true

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffb54adc93a676462bd4e618ee94a10ab9f7f6d4

Assignee

Comment 9

•

4 years ago

Assignee

Comment 10

•

4 years ago

It was pointed out to me that needinfo might be better than ui-review? these days.

Flags: needinfo?(mwalkington)

Assignee

Comment 11

•

4 years ago

The ui-review question being: Is the menu item label "Override Text Encoding" OK enough to land this?

Magnus Melin [:mkmelin]

Updated

•

4 years ago

Blocks: 1704749

Comment 12

•

4 years ago

Hi Henri, can you help me understand where this label appears (perhaps via screenshot and trigger instructions)?

Changes to menus added via the Customize pane were not in scope for MR1 and Flod advised not to make any changes to this menu.

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Assignee

Comment 13

•

4 years ago

Attached image Screenshots: Old on the left, new on the right — Details

Hi Henri, can you help me understand where this label appears (perhaps via screenshot and trigger instructions)?

The string "Override Text Encoding" appears in two places:

1. In the menubar View menu as a menu item in the place currently occupied by the Text Encoding submenu. (I.e. the submenu is replaced with a single item called Override Text Encoding whose function is the same as the current item Automatic in the current submenu.)
2. As the label of the toolbar button, which is currently labeled Text Encoding in toolbar customization or in the overflow menu. (Instead of the button opening a menu, with this patch clicking the button performs the operation that is currently performed by the item Automatic in the menu currently opened by the button.)

The string "Guess text encoding from page content" appears as the tooltip for the toolbar button.

The accelerator key, c, for the Override Text Encoding menu item in the View menu intentionally remains the same as the accelerator key for the current submenu.

I generated try builds for experimentation:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd2f329ffee9e3cf540c45bc0fa419e7125ef133

Note that the menu item and button are disabled on pages where they'd do nothing. Here's an example of a page where they are enabled:
https://hsivonen.com/test/charset/unlabeled-utf8/ja.htm

I'm attaching an image of screenshots with the current situation on the left and the situation with the patch on the right.

In the View menu, the item Automatic of the Text Encoding submenu becomes an item Override Text Encoding in the View menu directly and the submenu and the other submenu items go away.

The item Automatic in the menu opened by the toolbar button becomes the action of the toolbar button itself, so that the toolbar button no longer opens a menu. When the action is not available, the toolbar button itself is disabled instead of the current state where the toolbar button is always enabled but its menu items are disabled.

Flags: needinfo?(hsivonen) → needinfo?(mwalkington)

Assignee

Comment 14

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #13)

https://hsivonen.com/test/charset/unlabeled-utf8/ja.htm

While that page enabled the feature, these are better examples in terms of what scenarios they represent:
https://hsivonen.com/test/charset/mislabeled-as-utf8/ja-Shift_JIS.html
https://hsivonen.com/test/charset/mislabeled-as-legacy/ja-EUC-JP.htm
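For readers unfamiliar with the two scenarios, here is a minimal sketch of what mislabeling does to the decoded text. It uses Python's built-in codecs rather than anything from Firefox. The sample string is an assumption, and so is the choice of windows-1252 as the wrong legacy label; the test pages may be labeled differently.

```python
# Illustrative only: reproduces the two mislabeling scenarios with Python codecs.
# The sample text and the "wrong" labels are assumptions, not taken from the test pages.
text = "日本語のテキスト"

# Scenario 1: Shift_JIS bytes that are labeled (and therefore decoded) as UTF-8.
sjis_bytes = text.encode("shift_jis")
shown_as_utf8 = sjis_bytes.decode("utf-8", errors="replace")

# Scenario 2: EUC-JP bytes that are labeled as a legacy encoding
# (windows-1252 is assumed here as the wrong label).
eucjp_bytes = text.encode("euc_jp")
shown_as_legacy = eucjp_bytes.decode("cp1252", errors="replace")

print("mislabeled as UTF-8: ", shown_as_utf8)    # contains U+FFFD replacement characters
print("mislabeled as legacy:", shown_as_legacy)  # Latin-script mojibake

# Decoding with the encoding the bytes were actually written in restores the text.
print("repaired:", sjis_bytes.decode("shift_jis"), eucjp_bytes.decode("euc_jp"))
```

In both cases the bytes are intact; only the label is wrong, which is why re-decoding with a better guess can make the page readable again.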

Comment 15

•

4 years ago

Thanks, Henri. I also spoke with Flod to get a better understanding of text encoding in general.

What is the feature overriding text encoding with? i.e., the page is not displaying correctly. I select "Override Text Encoding" to fix the issue, which overrides the current text (or characters?) with _____.

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Assignee

Comment 16

•

4 years ago

•

Edited

What the user wants to do is to override the text encoding that they visually believe to be wrong with the correct one. What the menu item does is it overrides the text encoding with a guess made from the page content.
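To make "a guess made from the page content" concrete, here is a deliberately crude sketch: try a fixed list of candidate encodings and accept the first one that decodes the bytes without error. This is not the detector Firefox uses; the function name, the candidate list, and its order are all hypothetical, and a strict-decode check like this can easily pick a wrong encoding on real pages.

```python
# Hypothetical helper, for illustration only: not Firefox's detection algorithm.
def guess_encoding(data, candidates=("utf-8", "euc_jp", "shift_jis")):
    """Return the first candidate encoding that decodes `data` strictly, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


sample = "日本語"  # assumed sample text
for true_encoding in ("utf-8", "euc_jp", "shift_jis"):
    guessed = guess_encoding(sample.encode(true_encoding))
    print(true_encoding, "->", guessed)
```

The sketch also shows why the label should not over-promise: a guess based on content is right most of the time, but nothing in the bytes guarantees it.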

Whether the verb should be what the user is trying to do (Correct, Fix) or less promising of success (Override, Guess) depends on whether we want to claim success for a mechanism that is successful with a high probability (extremely high for the main use cases) but that doesn't guarantee success.

I'm shy to use a verb that over-promises, because:

1. There are going to be failures but very rarely for the main use case of applying the feature to non-Latin-script content. (I still believe that this change will be a net usability improvement; users aren't really that good at choosing right on the first try from the long manual list.)
2. Overpromising invites the question "If you know how to correct it, why don't you just do it by default?" (But see previous point.)
3. Experience indicates some users who use this feature believe that they know encoding stuff better than Firefox does and are upset if Firefox takes away their options when Firefox knows that none of the options that were available in an older version would work. However, in this case (see first point), there are going to be (rare) cases where this patch takes away options and the user actually could know better than Firefox. I don't want to flame users in that case.

Flags: needinfo?(hsivonen) → needinfo?(mwalkington)

Comment 17

•

4 years ago

It sounds like "Override text encoding" is technically accurate and safe. It would be nice if the label could communicate the benefit of the feature without over-promising. Did you consider something like: "Troubleshoot text encoding"?

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Assignee

Comment 18

•

4 years ago

(In reply to Meridel [:meridel] from comment #17)

It sounds like "Override text encoding" is technically accurate and safe. It would be nice if the label could communicate the benefit of the feature without over-promising. Did you consider something like: "Troubleshoot text encoding"?

I hadn't considered "Troubleshoot". Now that I consider it, my first thought is that the UI items that I've seen before that have said "Troubleshoot" are more ceremonious than this one: They either launch a separate mode or run something automated that takes long enough to have a progress bar. In this case, the menu item / toolbar button just does its thing instantaneously without any further UI.

If "Troubleshoot" sets the expectation of some further UI appearing and then no further UI appears, might that be confusing?

Flags: needinfo?(hsivonen) → needinfo?(mwalkington)

Comment 19

•

4 years ago

I am less concerned about "Troubleshooting" setting false expectations of a separate or elongated process, and more that users would expect something called 'troubleshoot' to diagnose a problem. Would it be accurate to say this feature diagnoses a problem? By correcting the text encoding do they then understand the source of the issue?

Documenting label options:

1. Override text encoding: Technically accurate but does not communicate the benefit of the feature
2. Troubleshoot text encoding: Gets at the benefit of the feature but could set expectations of diagnosis?
3. Fix text encoding: Communicates benefit but could over-promise. Our goal in product copy isn't to be 100% technically accurate, though, so this label may be the best option for most contexts.
4. Fix garbled text: Communicates benefit but could over-promise. Plainer language than "text encoding" but is it accurate enough?
5. Fix default text encoding: More specific than option 3 and 4, but longer

FYI, if a request like this is time-sensitive and I become a blocker for shipping, please don't hesitate to reach out to me on Slack (@Meridel). Betsy and I make up the content design team and are spread thin across projects, so these kinds of string requests can get lost in the daily shuffle!

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Assignee

Comment 20

•

4 years ago

(In reply to Meridel [:meridel] from comment #19)

I am less concerned about "Troubleshooting" setting false expectations of a separate or elongated process, and more that users would expect something called 'troubleshoot' to diagnose a problem.

To get impressions from more people on various options, I asked about this in the DOM Core team meeting. The result was that "Troubleshoot" suggested a bigger thing than other options.

Would it be accurate to say this feature diagnoses a problem?

I guess it's accurate enough that the feature makes a partial diagnosis as part of its internal operation, but what's exposed to the user is the page became readable (likely) or the page remained unreadable (unlikely).

By correcting the text encoding do they then understand the source of the issue?

I expect users who use this feature to be aware of the conceptual source of the problem before invoking the UI even if they couldn't identify the exact pair of wrong and right encoding in a particular case.

Documenting label options:

Override text encoding: Technically accurate but does not communicate the benefit of the feature

Troubleshoot text encoding: Gets at the benefit of the feature but could set expectations of diagnosis?

"Troubleshoot" also could set expectations of a bigger operation than what's going to happen.

Fix text encoding: Communicates benefit but could over-promise. Our goal in product copy isn't to be 100% technically accurate, though, so this label may be the best option for most contexts.

Apart from potentially over-promising, this raises the question "If you know how to fix it, why didn't you just do it already?" (This concern was raised in the DOM Core meeting.)

Fix garbled text: Communicates benefit but could over-promise. Plainer language than "text encoding" but is it accurate enough?

I'd like to keep "Text Encoding" in the string in order to keep the term that people who've previously used the submenu have already seen and to enable the same accelerator key letter to be found in the string.

The technically accurate term would be "Character Encoding", and we used to use that. However, we previously changed it from "Character Encoding" to "Text Encoding" in order to have both the more colloquial "Text" and the technical "Encoding" there and to align with Apple's terminology. Safari's corresponding submenu is labeled "Text Encoding".

Fix default text encoding: More specific than option 3 and 4, but longer

I think "default" suggests something that is technically somewhat wrong, considering that the default is understood to be a now-obsolete browser setting and not "whatever was initially determined for this page this time".

However, if we're OK with a longer string, others suggested "Force Automatic Text Encoding" and observed that text editors have a "Reopen with Encoding" feature. In the browser context and with the automation here, the closest to the latter would be "Reload with Automatic Text Encoding".

It's unclear if the nuance contemplated here reaches the users of this feature, since if we assume that users in the regions where the feature is used the most use localized versions of Firefox, users will see the Japanese, Traditional Chinese, or Simplified Chinese translation (and, to a lesser extent, other non-Latin-script translations).

Flags: needinfo?(hsivonen) → needinfo?(mwalkington)

Assignee

Comment 21

•

4 years ago

I expect users who use this feature to be aware of the conceptual source of the problem before invoking the UI even if they couldn't identify the exact pair of wrong and right encoding in a particular case.

That is, for users of the feature, I expect the exact verb not to be the critical factor and I expect them to be looking for the "Text Encoding" part.

Comment 22

•

4 years ago

Thanks, Henri. I appreciate you asking the DOM team and sharing context. Using "automatic" is a bit confusing because I assume what the user sees before trying to fix the issue is 'automatically' there.

New and remaining options:

Force text encoding (leave it at this?)
Manual text encoding (implies that whatever showed up by default or automatically is not correct and a one-off, manual fix is needed)
Repair text encoding (this is pretty pedantic but "repair" might not promise as much as "fix" since "repair" is the act of repairing something — the outcome is not promised — while "fix" is the action itself)
Unscramble text encoding (in articles on this topic I am seeing people use the word 'unscramble' to describe fixing text encoding)
Override text encoding

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Assignee

Comment 23

•

4 years ago

•

Edited

(In reply to Meridel [:meridel] from comment #22)

Force text encoding (leave it at this?)

I think this is less informative than Override. This one also doesn't explain why and this seems less suggestive of changing it than Override.

Manual text encoding (implies that whatever showed up by default or automatically is not correct and a one-off, manual fix is needed)

I think this is somewhat weird in the sense that what this patch is doing is taking away the possibility of manually choosing which one and leaving only the option to trigger the automated guessing.

Repair text encoding (this is pretty pedantic but "repair" might not promise as much as "fix" since "repair" is the act of repairing something — the outcome is not promised — while "fix" is the action itself)

I agree that in English this seems better than Fix. The nuance between Repair and Fix might be lost in some translations.

Unscramble text encoding (in articles on this topic I am seeing people use the word 'unscramble' to describe fixing text encoding)

I think this could work, although what's being unscrambled is, pedantically, the text and not the text encoding.

Override text encoding

How about I change the menu item and the button label to "Repair Text Encoding", keep the button tooltip as "Guess text encoding from page content", and land?

Flags: needinfo?(hsivonen) → needinfo?(mwalkington)

Comment 24

•

4 years ago

Yes, I am happy with that solution, with a couple minor tweaks:

Label: Repair text encoding (can we make this sentence case? Or are you treating this as a proper feature name? For Proton/MR1, we are moving to sentence case in all core UI)

Tooltip: Guess correct text encoding from page content (added "correct" to be more specific and based on your comment above that "What the user wants to do is to override the text encoding that they visually believe to be wrong with the correct one.")

If you are good with these tweaks, we should then get Flod review.

Flags: needinfo?(mwalkington) → needinfo?(hsivonen)

Updated

•

4 years ago

Attachment #9217363 - Attachment description: Bug 1687635 part 1 - Replace Text Encoding submenu with Override Text Encoding item. → Bug 1687635 part 1 - Replace Text Encoding submenu with Repair Text Encoding item.

Updated

•

4 years ago

Attachment #9217373 - Attachment description: Bug 1687635 part 2 - Disable Override Text Encoding when known not to have effect. → Bug 1687635 part 2 - Disable Repair Text Encoding when known not to have effect.

Assignee

Comment 25

•

4 years ago

(In reply to Meridel [:meridel] from comment #24)

Yes, I am happy with that solution, with a couple minor tweaks:

Label: Repair text encoding (can we make this sentence case? Or are you treating this as a proper feature name? For Proton/MR1, we are moving to sentence case in all core UI)

Tooltip: Guess correct text encoding from page content (added "correct" to be more specific and based on your comment above that "What the user wants to do is to override the text encoding that they visually believe to be wrong with the correct one.")

Thanks!

If you are good with these tweaks, we should then get Flod review.

flod, I'm needinfoing you in case Phabricator doesn't surface the review request in this case. (Meridel clarified off-bug that the menu item should use the case "Repair Text Encoding" while the button label should use the case "Repair text encoding".)

Flags: needinfo?(hsivonen) → needinfo?(francesco.lodolo)

Assignee

Updated

•

4 years ago

Attachment #9217363 - Flags: ui-review?(mwalkington)

Francesco Lodolo [:flod]

Comment 26

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #25)

flod, I'm needinfoing you in case Phabricator doesn't surface the review request in this case. (Meridel clarified off-bug that the menu item should use the case "Repair Text Encoding" while the button label should use the case "Repair text encoding".)

Done. Note that blocking reviewer is a safer choice for a patch that has already been reviewed ;-)

Flags: needinfo?(francesco.lodolo)

Pulsebot

Comment 27

•

4 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9f57ec83cdc6
part 1 - Replace Text Encoding submenu with Repair Text Encoding item. r=Gijs,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/b0b7678a6781
part 2 - Disable Repair Text Encoding when known not to have effect. r=emk

Iulian Moraru

Comment 28

•

4 years ago

•

Edited

Backed out for causing bc failures on browser_967000_button_charEncoding.js.

Push with failures

Failure log

Backout link

This also failed on test verify: https://treeherder.mozilla.org/jobs?repo=autoland&revision=b0b7678a67810f52147536f5ca65070ec476b36d&searchStr=tv&selectedTaskRun=DFdR8g3GR96vvDjglEyu1Q.0

Flags: needinfo?(hsivonen)

Assignee

Comment 29

•

4 years ago

(In reply to Francesco Lodolo [:flod] from comment #26)

(In reply to Henri Sivonen (:hsivonen) from comment #25)

flod, I'm needinfoing you in case Phabricator doesn't surface the review request in this case. (Meridel clarified off-bug that the menu item should use the case "Repair Text Encoding" while the button label should use the case "Repair text encoding".)

Done. Note that blocking reviewer is a safer choice for a patch that has already been reviewed ;-)

Thanks.

(In reply to Iulian Moraru from comment #28)

Backed out for causing bc failures on browser_967000_button_charEncoding.js.

Push with failures

Failure log

Backout link

This also failed on test verify: https://treeherder.mozilla.org/jobs?repo=autoland&revision=b0b7678a67810f52147536f5ca65070ec476b36d&searchStr=tv&selectedTaskRun=DFdR8g3GR96vvDjglEyu1Q.0

Well, this is strange. The test passes locally individually and didn't show up on try:
https://treeherder.mozilla.org/jobs?repo=try&revision=5568003c9b92038e1f98f3bab81bf4d0383c341f&selectedTaskRun=BYNrt36STWajNuwQQ8SN3w.0

Flags: needinfo?(hsivonen)

Assignee

Comment 30

•

4 years ago

Ah, it's a TSAN failure. Probably the test needs to wait. It already has an earlier need to wait.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=633d34e0fc8d9200b6fbb7172edb51d9aa99eaab
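"The test needs to wait" usually means polling for a state instead of asserting it immediately. The following is a generic sketch of that pattern in Python, not the Mochitest API and not the change made to the test; `wait_for_condition` and its parameters are hypothetical names.

```python
import time


# Hypothetical helper illustrating the "poll until ready" pattern.
def wait_for_condition(condition, interval=0.01, max_tries=50):
    """Call `condition` until it returns True; give up after `max_tries` attempts."""
    for _ in range(max_tries):
        if condition():
            return True
        time.sleep(interval)  # give asynchronous work a chance to finish
    return False


# Usage: a stand-in for "the button has reached its expected state".
state = {"polls": 0}

def button_is_disabled():
    state["polls"] += 1
    return state["polls"] >= 3  # becomes true on the third poll

print(wait_for_condition(button_is_disabled))
```

Polling only hides a race if the awaited state is eventually reached; if something later flips the state back, the assertion can still fail intermittently.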

Assignee

Comment 31

•

4 years ago

Pulsebot

Comment 32

•

4 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/701981112733
part 1 - Replace Text Encoding submenu with Repair Text Encoding item. r=Gijs,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/bd95df8be7ca
part 2 - Disable Repair Text Encoding when known not to have effect. r=emk

Dorel Luca [:dluca]

Comment 33

•

4 years ago

Backed out 2 changesets (bug 1687635) for Browser-chrome failures in browser/components/customizableui/test/browser_967000_button_charEncoding.js. CLOSED TREE

Log:
https://treeherder.mozilla.org/logviewer?job_id=340926820&repo=autoland&lineNumber=2975

https://treeherder.mozilla.org/logviewer?job_id=340926842&repo=autoland&lineNumber=2642

Backout:
https://hg.mozilla.org/integration/autoland/rev/08357f132f6f8d08eb3e5601c82e2b4dae6ea8ec

Flags: needinfo?(hsivonen)

Assignee

Comment 34

•

4 years ago

The failures are in a part of the test that (before the attempt to work around this failure) was not changed by the patch on the test side and was at least not supposed to be changed by the patch on the browser UI side. Also, the failures are clearly intermittent, because they didn't happen on the landing itself but on a later run.

Gijs, do you have ideas of what goes wrong and what the action should be?

Flags: needinfo?(hsivonen) → needinfo?(gijskruitbosch+bugs)

:Gijs (he/him)

Comment 35

•

4 years ago

•

Edited

(In reply to Henri Sivonen (:hsivonen) from comment #34)

The failures are in a part of the test that (before the attempt to work around this failure) was not changed by the patch on the test side and was at least not supposed to be changed by the patch on the browser UI side. Also, the failures are clearly intermittent, because they didn't happen on the landing itself but on a later run.

Well, the TV jobs were orange on the initial push. :-)

Gijs, do you have ideas of what goes wrong and what the action should be?

The test indicates that the button should be disabled initially but isn't, so the question is... what should be disabling the button, and is that not happening, or is something else untoward happening that then re-enables it, before we check? Adding logging to your patch and pushing to try (if you can't reproduce locally with --verify) should help elucidate. I left some more notes on phabricator.

Flags: needinfo?(gijskruitbosch+bugs)

Assignee

Comment 36

•

4 years ago

Try run with logging:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bd6bf67de73c118932aa3f898d5d255d7f9f4f0b

Assignee

Comment 37

•

4 years ago

Without the waiting:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2355ab40b87be0f2e679a4d516652676e2124964

Assignee

Comment 38

•

4 years ago

I think the cause is this async section:
https://searchfox.org/mozilla-central/rev/2b372b94ce057097a6ef8eb725f209faa9d1dc4d/browser/components/customizableui/CustomizeMode.jsm#555

And I think the test was already bogus on this point, but this changed something about the timing in an unlucky way.

Assignee

Comment 39

•

4 years ago

Let's see if this fixes it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ca18418a8189bf955c4f7a57e6a203ee4091eb7

Assignee

Comment 40

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Let's see if this fixes it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ca18418a8189bf955c4f7a57e6a203ee4091eb7

It doesn't.

I'm very tempted to remove the assertion from the test, since I can see that things are OK in manual testing.

Assignee

Comment 41

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #40)

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Let's see if this fixes it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ca18418a8189bf955c4f7a57e6a203ee4091eb7

It doesn't.

One more logging round:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06461d4b66935179846c5e382982401411148c6d

I'm very tempted to remove the assertion from the test, since I can see that things are OK in manual testing.

It doesn't seem useful to put a lot of time into chasing this. If there's a real user-facing bug, it would only show up on a rare proportion of times when the user closes the customization view, which is a rare event itself.

Assignee

Comment 42

•

4 years ago

Trying with the assertion removed:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18702a42703984720d1d6c687e1e0358b1a8f7f3

One more:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a089f6be2f62ac06dd10521fd4050194ee93f95

Assignee

Comment 43

•

4 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c7c56847c48a4b180e9a738e2a041f88bea66da

Assignee

Comment 44

•

4 years ago

Assignee

Comment 45

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #44)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c7c56847c48a4b180e9a738e2a041f88bea66da

The run that passes looks suspicious in terms of expected test logging being missing.

Assignee

Comment 46

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #41)

It doesn't seem useful to put a lot of time into chasing this. If there's a real user-facing bug, it would only show up on a rare proportion of times when the user closes the customization view, which is a rare event itself.

I uploaded a version of the patch that removes the assertion to Phabricator.

Gijs, is it OK to land this without the problematic assertion (once the soft freeze is over)?

As noted above, chasing this particular assertion looks like a really bad use of developer time in terms of the possible user impact: Manual testing suggests this is not a problem. And even if it was, it would be pretty harmless in itself (button enabled when it shouldn't be and not doing anything harmful if pressed), would take place only upon exiting customization (rare event), and would be resolved upon the next page load anyway.

Flags: needinfo?(gijskruitbosch+bugs)

:Gijs (he/him)

Comment 47

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #45)

(In reply to Henri Sivonen (:hsivonen) from comment #44)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c7c56847c48a4b180e9a738e2a041f88bea66da

The run that passes looks suspicious in terms of expected test logging being missing.

Logging from tests gets buffered on infra, and elided from the complete log output if the test passes, unless you call SimpleTest.requestCompleteLog. This is because otherwise log files for test runs would be even more huge than they are at the moment. This is also why the log output includes stuff like "Buffered messages logged at <time>", so that you can reconstruct the chronology wrt other output that is not buffered.
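As a rough model of the behavior described above, the sketch below buffers messages and drops them when a test passes, unless the complete log was requested. It is a toy simulation in Python, not the Mochitest harness; the class and method names are hypothetical, and only `SimpleTest.requestCompleteLog` comes from the comment above.

```python
# Toy model of buffered test logging; names are hypothetical, not harness code.
class BufferedTestLog:
    def __init__(self):
        self.buffer = []
        self.complete_log_requested = False

    def request_complete_log(self):
        # Plays the role that SimpleTest.requestCompleteLog has in a real test.
        self.complete_log_requested = True

    def info(self, message):
        self.buffer.append(message)

    def finish(self, passed):
        """Return the messages that would appear in the final log."""
        emitted = [] if (passed and not self.complete_log_requested) else list(self.buffer)
        self.buffer.clear()
        return emitted


log = BufferedTestLog()
log.info("button state checked")
print(log.finish(passed=True))   # the passing run's messages are elided

log.info("button state checked")
print(log.finish(passed=False))  # the failing run's buffered messages are kept
```

This is why a passing try run can look as if added logging never ran: the messages were produced but never written to the final log.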

(In reply to Henri Sivonen (:hsivonen) from comment #46)

As noted above, chasing this particular assertion looks like a really bad use of developer time in terms of the possible user impact: Manual testing suggests this is not a problem. And even if it was, it would be pretty harmless in itself (button enabled when it shouldn't be and not doing anything harmful if pressed), would take place only upon exiting customization (rare event),

I don't really follow - the test doesn't use customize mode. Is the effect from one of the preceding tests?

and would be resolved upon the next page load anyway.

I mean, I guess this is fine, though it's a bit disconcerting. The point of the assertion is to check the initial state of the button and that it gets updated. If we don't check the initial state of the button, the test would continue passing just the same if the button was always enabled, though I guess we have another test for that. Looking at that, though, it would appear that post-patch, browser_987640_charEncoding.js is a strict superset of what browser_967000_button_charEncoding.js tests, so perhaps we should just remove the latter - no point testing the same thing twice. Assuming I'm not missing something, r=me to land with the smaller test completely removed, and with the isTopLevel check added that I mentioned on phab, if my assumptions there are correct.

Flags: needinfo?(gijskruitbosch+bugs)

Try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d29ddc5f1a5ee495bb002ec2d90f70d5e8d39ba1

Assignee

Comment 48

•

4 years ago

(In reply to :Gijs (he/him) from comment #47)

(In reply to Henri Sivonen (:hsivonen) from comment #45)

(In reply to Henri Sivonen (:hsivonen) from comment #44)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c7c56847c48a4b180e9a738e2a041f88bea66da

The run that passes looks suspicious in terms of expected test logging being missing.

Logging from tests gets buffered on infra, and elided from the complete log output if the test passes, unless you call SimpleTest.requestCompleteLog.

While it's embarrassing that I managed to forget that, I guess it's in some ways good that I don't have this problem often enough to have a routine.

I don't really follow - the test doesn't use customize mode. Is the effect from one of the preceding tests?

IIRC, adding logging to the customization exit code path logged something when I ran the test individually. Anyway, in the interest of not burning time on that test, I didn't re-check this now.

Assuming I'm not missing something, r=me to land with the smaller test completely removed,

Removed. Thanks.

and with the isTopLevel check added that I mentioned on phab, if my assumptions there are correct.

I haven't changed anything on this point; see phab. Is this OK to land as-is (once the soft freeze ends)?

Flags: needinfo?(gijskruitbosch+bugs)

Narcis Beleuzu [:NarcisB]

Assignee

Updated

•

4 years ago

Blocks: 1713627

:Gijs (he/him)

Comment 49

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #48)

and with the isTopLevel check added that I mentioned on phab, if my assumptions there are correct.

I haven't changed anything on this point; see phab. Is this OK to land as-is (once the soft freeze ends)?

Sure. Thanks for clarifying.

Flags: needinfo?(gijskruitbosch+bugs)

Pulsebot

Comment 50

•

4 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bdf15c3d3b0a
part 1 - Replace Text Encoding submenu with Repair Text Encoding item. r=Gijs,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/9992f4bb88c4
part 2 - Disable Repair Text Encoding when known not to have effect. r=emk

Comment 51

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bdf15c3d3b0a
https://hg.mozilla.org/mozilla-central/rev/9992f4bb88c4

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox91: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

Assignee

Comment 52

•

4 years ago

(In reply to :Gijs (he/him) from comment #49)

(In reply to Henri Sivonen (:hsivonen) from comment #48)

and with the isTopLevel check added that I mentioned on phab, if my assumptions there are correct.

I haven't changed anything on this point; see phab. Is this OK to land as-is (once the soft freeze ends)?

Sure. Thanks for clarifying.

Thanks.

Retitling the bug for better searchability.

Summary: Replace the Text Encoding menu with a single item Override Text Encoding → Replace the Text Encoding menu with a single override item Repair Text Encoding

Pascal Chevrel:pascalc

Comment 53

•

4 years ago

Henri, should that change be mentioned in our release notes?

Flags: needinfo?(hsivonen)

Assignee

Comment 54

•

4 years ago

(In reply to Pascal Chevrel:pascalc from comment #53)

Henri, should that change be mentioned in our release notes?

I think this change doesn't need to be relnoted, but, in retrospect, it might have been good to relnote the removal of the app menu entry point that happened in 89: https://support.mozilla.org/en-US/kb/text-encoding-no-longer-available-firefox-menu

Flags: needinfo?(hsivonen)

heejoon.lee+github

Comment 55

•

4 years ago

Oh my god... auto fix is not working for me. please let me change encoding freely!

Flags: needinfo?(heejoon.lee+github)

Assignee

Comment 56

•

4 years ago

(In reply to heejoon.lee+github from comment #55)

Oh my god... auto fix is not working for me. please let me change encoding freely!

Please file a new bug and include the URL of a page that shows the problem.

Flags: needinfo?(heejoon.lee+github)

Alice Wyman

Comment 57

•

4 years ago

I tried using https://hsivonen.com/test/charset/unlabeled-legacy/ja-Shift_JIS.htm as a test page in Firefox 91 but Repair Text Encoding is greyed-out (a bug?). The same page shows a Text Encoding sub-menu of choices in Firefox 90.

Comment 58

•

4 years ago

(In reply to Alice Wyman from comment #57)

I tried using https://hsivonen.com/test/charset/unlabeled-legacy/ja-Shift_JIS.htm as a test page in Firefox 91 but Repair Text Encoding is greyed-out (a bug?).

This is not a bug. The detector already runs for unlabeled pages, so there's no point in running it again. There is no user-visible problem with that page thanks to the detector having run.

The same page shows a Text Encoding sub-menu of choices in Firefox 90.

That's because there were menu items that did something (but the wrong thing), so the items weren't disabled.

Comment 59

•

4 years ago

This should not have happened, and is just a mistake:

It prevents you from knowing which character set was used to display the message (although TBH this was already too vague with the existing UI)
It prevents you from setting a character set you believe you need (especially relevant for messages with content in multiple character sets, where there is no single valid setting).

About the benefits:

User doesn't need to know which item to choose.

That was true before. User can simply ignore this menu. And if one wants an emphasized "repair it for me", that can be achieved without removing anything.

Removal of one level of submenus, which might allow the new single item to remain in the hamburger menu.

You can put a single item on the hamburger menu and keep the existing proper menu.

Harder to get the user to self-XSS, because the attack needs to fool the detector instead of stating a particular encoding to the user.

I doubt XSS dependent on users setting their own message charsets is a real concern. Are there any numbers to back this up?

Allows for even more conditions where the item can be disabled (since there will be more cases where we know the item won't change anything), which saves users' time by not allowing the feature to be activated when useless.

This shows your implicit assumption that viewing and setting the character set is useless. automatic/repair may sometimes be useless, but the menu is never useless. Please revert this.

Assignee

Comment 60

•

4 years ago

(In reply to aliledudiable from comment #59)

This should not have happened, and is just a mistake:

Your comment on bug 1704749 suggests that your comment here is about the corresponding change in Thunderbird and not about this change in Firefox.

The limitation that this change would not address the issue of a single byte stream mixing multiple encodings was known before making this change, and I judged it to be enough of an edge case, within a problem space that is itself becoming ever rarer, not to address in Firefox.

It prevents you from knowing which character set was used to display the message (although TBH this was already too vague with the existing UI)

Ctrl-i (cmd-i on Mac) shows the encoding (and without the vagueness of the old menu).

This shows your implicit assumption that viewing and setting the character set is useless. automatic/repair may sometimes be useless, but the menu is never useless. Please revert this.

Clearly, retaining the Repair Text Encoding item (previously called "Automatic") as opposed to removing the whole feature demonstrates an assumption of utility.

Having a whole menu around for a niche problem is the sort of thing that risks the entire feature (i.e. even the bit that now remains) getting removed, which would be worse than the current state in the (rare) cases where there is a need to repair the encoding. (Case in point: the removal from the hamburger menu.)

Comment 61

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #60)

(In reply to aliledudiable from comment #59)

This should not have happened, and is just a mistake:

Your comment on bug 1704749 suggests that your comment here is about the corresponding change in Thunderbird and not about this change in Firefox.

Yes, that is true. I'm sorry, I commented on the wrong bug, as I was directed here as the bug in which the submenu was removed from TB, and neglected to double-check the categorization. I was annoyed by the change and spoke too soon. I'll try to figure out which bug page I should actually comment on.

Having a whole menu around for a niche problem is the sort of thing that risks the entire feature (i.e. even the bit that now remains) getting removed, which would be worse than the current state in the (rare) cases where there is a need to repair the encoding. (Case in point: the removal from the hamburger menu.)

The term "repair" is really confusing, btw. It's not as though the web page, or the message, is actually altered or "repaired" in any way. I assume you mean "perform careful auto-detection"? Anyway, that feature is currently broken in TB 91 - the release version - and the menu is gone, to the users' chagrin. FYI.

Assignee

Comment 62

•

4 years ago

(In reply to aliledudiable from comment #61)

The term "repair" is really confusing, btw. It's not as though the web page, or the message, is actually altered or "repaired" in any way.

You can see how we arrived at "Repair" in the earlier comments of this bug.

I assume you mean "perform careful auto-detection"?

Yes.

Anyway, that feature is currently broken in TB 91 - the release version - and the menu is gone, to the users' chagrin. FYI.

That's bug 1713786.

Comment hidden (admin-reviewed)

Comment 64

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #62)
Thanks for the clarification. Still,

Having a whole menu around for a niche problem

Well,

It is not a whole menu; it is a menu item (which expands a submenu).
It is not a "niche problem". First, and most importantly, that is because regardless of frequency of occurrence, it is a fundamental problem: If you cannot change the character set you may simply not be able to read the message. It is not merely about ugly formatting. Second - it is not as rare as you think, for people who do not speak English or other basically-Latin-alphabet languages. Granted, its frequency is decreasing over time, but we are still far from its being negligibly small.
The charset override is a bit like, say, a spare tire for your car. We might call it a "niche" if we were only to consider the frequency of its use. But - it really needs to be in there.

is the sort of thing that risks the entire feature (i.e. even the bit that now remains) getting removed

Why does such a risk exist? Is there some impetus to remove critical features? Have we gotten any user complaints about this feature?

Comment 65

•

3 years ago

I am also quite upset with this change. It is not a niche problem when there are two different ways that I have found myself running into it. To solve this issue, I have had to edit the HTML to specify the right encoding in order for Firefox to display the page properly, which is a terrible user experience compared to simply choosing the right one. Of course I could be a strange person and not representative, but I think these scenarios are worth considering. Everything I say here is based on my experience using Firefox and visiting lots of sites in a handful of languages.

One is that a fair number of websites don’t display properly without changing the encoding. I am of course not talking about big players like Google or Amazon. They would not make mistakes like this, but they are not the whole web, and visiting small websites by people who don’t quite know their way around web technologies is a big part of browsing the web, too. Just because something is not an issue for Google does not mean that it is not an issue for the website of some small Japanese software project. Perhaps even more daunting, though, are the various ISO/IEC 8859 encodings. Many of them are so incredibly similar in layout, differing only in a few characters, that it can be really hard to detect them accurately, and Firefox gets them wrong a decent amount in my own lived experience. I agree that it’s good for websites to use Unicode, but that’s not my decision as a user, and having the experience be miserable because someone else decided not to use Unicode is not an experience that I would like to have.

The argument I’ve seen put forward above is that it doesn’t let you choose an encoding which “wouldn’t work anyway,” so nothing is lost. But in practice, just because the byte stream is not well-formed as a specific encoding does not mean that that is not the correct encoding for the content. I run into this a decent amount. Sometimes textual content will have been uploaded to a website in a different encoding from what the website itself uses, and so the page will be in a mix of encodings. The user (me) would presumably want to choose the encoding used by the actual content, even if it means garbling some stuff at the top of the page. This is not a niche case as you make it out to be.

Finally, I want to address the other way that I have run into this issue, which I agree is somewhat niche compared to what I mentioned above, but I still think is valuable to address. This is that when viewing content in the Wayback Machine, Firefox gets the encoding wrong for me about as often as it gets it right. Now, I understand that having websites from the 1990s and early 2000s work properly is less of an issue than the issues I raised above, but even so, I think an issue which makes the text of a page completely unreadable is a lot more pressing than an old page simply not being formatted properly.

Assignee

Comment 66

•

3 years ago

First, and most importantly, that is because regardless of frequency of occurrence, it is a fundamental problem: If you cannot change the character set you may simply not be able to read the message.

I understand and agree with this statement; it's a key reason why Firefox hasn't gone the route that Safari on iOS/iPad OS has taken. ("Message" suggests that you are writing about Thunderbird, though. This bug is about Firefox.)

Second - it is not as rare as you think

I don't know how rare it is for Thunderbird, but Thunderbird is off-topic for this bug. In the case of Firefox, I have no particular reason to suspect telemetry from back when there was a submenu.

Perhaps even more daunting, though, are the various ISO/IEC 8859 encodings. Many of them are so incredibly similar in layout, differing only in a few characters, that it can be really hard to detect them accurately, and Firefox gets them wrong a decent amount in my own lived experience.

This is true, but the failure mode is significantly more benign than the failure mode with non-Latin encodings. Specifically, in the Latin case, you can still figure out what the text says. Telemetry suggested that readers of non-windows-1252 Latin text didn't bother using the menu that often.
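The difference between the two failure modes can be illustrated with a small sketch. It uses Python's built-in codecs, which are not the decoders Firefox uses, and the sample strings are assumptions chosen purely for illustration: Hungarian text stored as ISO-8859-2 and Russian text stored as windows-1251, each mis-decoded as windows-1252.

```python
# Illustrative only: Python's codecs stand in for the browser's decoders,
# and the sample strings are assumptions picked for this sketch.

def misdecode(text: str, actual: str, assumed: str) -> str:
    """Encode text in its actual encoding, then decode it as the assumed one."""
    return text.encode(actual).decode(assumed, errors="replace")


def fraction_intact(original: str, shown: str) -> float:
    """Share of character positions that survive the wrong decoding."""
    same = sum(1 for a, b in zip(original, shown) if a == b)
    return same / len(original)


hungarian = "árvíztűrő tükörfúrógép"
russian = "Привет"

latin_shown = misdecode(hungarian, "iso8859_2", "cp1252")
cyrillic_shown = misdecode(russian, "cp1251", "cp1252")

print(latin_shown)     # only ű and ő come out wrong
print(cyrillic_shown)  # every letter is replaced
print(fraction_intact(hungarian, latin_shown))
print(fraction_intact(russian, cyrillic_shown))
```

In the Latin-to-Latin case most letters share the same byte values in both encodings, so the text stays largely legible; in the Cyrillic case none do.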

The argument I’ve seen put forward above is that it doesn’t let you choose an encoding which “wouldn’t work anyway,” so nothing is lost.

The "wouldn't work anyway" point refers to cases like the server side interpreting UTF-8 bytes according to a single-byte encoding and then re-encoding that as UTF-8.
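A minimal sketch of that kind of server-side corruption, using Python's built-in codecs. The choice of windows-1252 as the wrongly applied single-byte encoding, the sample word, and the list of candidate decoders are all assumptions made for illustration.

```python
# Illustrative only: cp1252 is an assumed choice for the single-byte encoding
# that the hypothetical server wrongly applied.

original = "Привет"

# Server-side corruption: UTF-8 bytes read as windows-1252, then re-encoded as UTF-8.
corrupted_bytes = original.encode("utf-8").decode("cp1252").encode("utf-8")

# The browser receives valid UTF-8, so decoding it "correctly" shows mojibake.
shown = corrupted_bytes.decode("utf-8")
print(shown)

# Picking a different decoder for the received bytes does not bring the text back.
candidates = ["utf-8", "cp1252", "cp1251", "koi8_r", "iso8859_5"]
recovered_by_menu = [
    name for name in candidates
    if corrupted_bytes.decode(name, errors="replace") == original
]
print(recovered_by_menu)

# Undoing the damage takes a text-level round trip, not a different decoder.
repaired = shown.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The bytes that reach the browser are well-formed UTF-8 for the wrong characters, so an encoding override has nothing to correct; only reversing the server's transformation recovers the text.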

Now, I understand that having websites from the 1990s and early 2000s work properly is less of an issue than the issues I raised above, but even so, I think an issue which makes the text of a page completely unreadable is a lot more pressing than an old page simply not being formatted properly.

If you have cases where Firefox cannot detect a non-Latin-script encoding that was previously selectable from the submenu, I'm interested in treating concrete real-world cases as (possibly low-priority) bugs (distinct from this one).

Comment 67

•

3 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #66)

I don't know how rare it is for Thunderbird, but Thunderbird is off-topic for this bug.

I believe you're mistaken: this is a core bug, not a Firefox-specific bug. If you want to make Thunderbird off-topic, then please roll back this change in the core, and arrange it so that only Firefox will be affected.

In the case of Firefox, I have no particular reason to suspect telemetry from back when there was a submenu.

Even though browser use is not my main concern personally - can you elaborate regarding that telemetry? Specifically, are you sure you are not interpreting the telemetry in a way that favors the removal of the menu, while it may also be interpreted differently? For example, suppose the Internet had 2 sites, a.com and b.org. Most people only visit a.com, where they don't need the menu; but on b.org you do need the menu; and some people, sometimes, visit b.org. Should the menu then be removed? You could argue "It's almost never used!" - but maybe b.org is an important site for the group of people who use it. And maybe that group of users are actually mostly of one ethno/lingual community, which are under-represented among Mozilla users. That may cast things in a different light.

This is true, but the failure mode is significantly more benign than the failure mode with non-Latin encodings. Specifically, in the Latin case, you can still figure out what the text says. Telemetry suggested that readers of non-windows-1252 Latin text didn't bother using the menu that often.

So, you're saying that readers of non-windows-1252 Latin text did use the menu occasionally.

If you have cases where Firefox cannot detect a non-Latin-script encoding that was previously selectable from the submenu, I'm interested in treating concrete real-world cases as (possibly low-priority) bugs (distinct from this one).

You are employing "dogmatic ignorance", and that is inappropriate. Let's turn this on its head: If you can show that there are no such issues, then removing the menu might be justifiable.

Assignee

Comment 68

•

3 years ago

(In reply to aliledudiable from comment #67)

(In reply to Henri Sivonen (:hsivonen) from comment #66)

I don't know how rare it is for Thunderbird, but Thunderbird is off-topic for this bug.

I believe you're mistaken: this is a core bug, not a Firefox-specific bug. If you want to make Thunderbird off-topic, then please roll back this change in the core, and arrange it so that only Firefox will be affected.

While this bug is filed in Core, this really is a Firefox change. Bug 1704749 is about Thunderbird, and the Core+Firefox change here didn't prevent Thunderbird from keeping the menu: The way the selection from the menu was used internally was quite different.

In the case of Firefox, I have no particular reason to suspect telemetry from back when there was a submenu.

Even though browser use is not my main concern personally - can you elaborate regarding that telemetry?

See bug 1702914 comment 14 and https://hsivonen.fi/encoding-telemetry/ .

This is true, but the failure mode is significantly more benign than the failure mode with non-Latin encodings. Specifically, in the Latin case, you can still figure out what the text says. Telemetry suggested that readers of non-windows-1252 Latin text didn't bother using the menu that often.

So, you're saying that readers of non-windows-1252 Latin text did use the menu occasionally.

Yes, but it was closer to windows-1252 than to non-Latin-script-using places. Unfortunately, bug 1702914 comment 14 doesn't include non-windows-1252 Latin-script places. They sorted between Singapore and Germany.

If you have cases where Firefox cannot detect a non-Latin-script encoding that was previously selectable from the submenu, I'm interested in treating concrete real-world cases as (possibly low-priority) bugs (distinct from this one).

You are employing "dogmatic ignorance", and that is inappropriate.

No, I'm saying that

I can't address remaining non-Latin problems without seeing concrete examples that need addressing,
given concrete examples, I might address (some) non-Latin cases,
I generally don't plan to address remaining Latin cases.

I already know that cases where one Latin encoding gets detected as another Latin encoding exist and fixing one example generally breaks another. As an exception, I have fixed a couple of Latin-related misdetections around non-letter symbols.

So:

Reverting the Firefox+Core change is extremely unlikely.
Thunderbird discussion doesn't belong in this bug.
Non-Latin legacy encoding misdetections are unactionable without concrete examples. I might fix issues (mixed-encoding documents excluded) with concrete examples (filed as distinct bugs).
Latin-to-Latin misdetections are unlikely to be fixed even if concrete examples were provided, because fixing one thing likely just breaks something else.

Comment 69

•

3 years ago

This is true, but the failure mode is significantly more benign than the failure mode with non-Latin encodings. Specifically, in the Latin case, you can still figure out what the text says. Telemetry suggested that readers of non-windows-1252 Latin text didn't bother using the menu that often.

This really does depend on the language and the proportion of the characters in the text which are in ASCII. French would probably be quite legible, whereas Vietnamese would not. But what bothers me about this position is that this is not considered a big enough problem unless the text is completely unreadable. This is perhaps justified if the page is only a few lines and the user will be out of there in no time, but ignores cases where there might actually be a need to read a page of decent length. And I am not a fan of the telemetry argument either. For the most part, I am not surprised because the intersection of users who read non-Unicode content in such languages and are also technologically knowledgeable about text encodings is surely only a certain subset of users. Of course not everyone knows what text encodings are. But the point here is that when a user does run into such problems, when the menu was there (especially in its earlier days before it started greying out all of the possibilities that it thought were “impossible”), it was very easy for those users to fix a problem which is now much harder to fix. I know what I need to do to display a page, and now Firefox won’t let me do it, and the justification is “users rarely do it”. It’s incredibly frustrating.

Latin-to-Latin misdetections are unlikely to be fixed even if concrete examples were provided, because fixing one thing likely just breaks something else.

So you admit that the detection does not work and is impossible to fix. This is the aspect of the position which you are taking which bothers me the most. You’re saying that if we run across a site which does not render properly, instead of just using the encoding menu which you have now removed, we would have to open a bug. Oh, but it’s a Latin encoding? Never mind, because you won’t fix it anyway! You are moving a responsibility which users used to be able to take into their own hands to Firefox itself, and then just deciding that it does not matter, because detecting encodings is too hard, and saying “You can still kind of sort of read this half-mojibake due to the common ASCII subset, anyway.”

Let me just leave off by saying that I have not seen a single user who has reacted positively to this change. Every Firefox user I have talked to, if they had noticed the change at all, has disliked this change and considered it akin to being handheld and belittled by a browser which thinks it knows better than you. Please don’t go down the route of “Unnecessary settings are bad, and things which are only occasionally necessary are unnecessary.”

Kevin Brosnan [Ex-Mozilla]

Comment 70

•

3 years ago

I should add that a Russian friend of mine has been getting around this issue by using an extension called FoxReplacer, and setting it to find-and-replace common mojibake sequences. So again, I want you to consider not just how often people use the menu, but what solving (more like duct-taping) the problem looks like when the menu is not available.

Updated

•

3 years ago

Restrict Comments: true

Kevin Brosnan [Ex-Mozilla]

Assignee

Comment 72

•

3 years ago

French would probably be quite legible, whereas Vietnamese would not.

For the exact same structural reason Vietnamese detection is remarkably accurate. (And UTF-8 was adopted for Vietnamese much earlier than for the Web in general.)

The Latin-to-Latin issue affects Hungarian, Lithuanian, Faroese, and Kurdish the most.

And I am not a fan of the telemetry argument either.

Keeping the old functionality around wouldn't have been free across a major change in this area (bug 1701828), so it's quite reasonable to look at usage levels.

by a browser which thinks it knows better than you

I understand that that appearance really annoys users. However, having investigated bug reports in this area, it is quite often the case the browser does know better: There are cases of server-side data corruption that are encoding problems but not the kind of encoding problems that the old menu would have addressed. (FoxReplacer as a remedy hints strongly at this kind of case.)

Updated

•

3 years ago

Restrict Comments: false

Comment 73

•

3 years ago

Now that comments are enabled again, I would like to express some more concerns.

I feel that telemetry is being given too much weight here. Yes, if you go by telemetry, you are probably unlikely to make a change which will be devastating for most of the userbase, but you can end up ignoring groups of users and their needs quite easily. I also find some of the interpretations of telemetry questionable. For example, since users often used the encoding menu more than once, that implies that their first guess was not correct. This has been given as justification for why “the browser knows better,” but this says nothing about whether or not they eventually found the correct encoding. And of course, with only the “repair” option available, there’s still no way to know. The word “often” has been thrown around a lot here, and it rubs me the wrong way. For example: “I understand that that appearance really annoys users. However, having investigated bug reports in this area, it is quite often the case the browser does know better.” Even if the browser knows better than the user more than half the time, that is not a good argument for removing this. The good argument would be “The user practically never knows better,” and I can speak from experience that that is not the case. In fact, I don’t think anyone here can claim that that is the case when certain Latin-script languages such as Lithuanian are hard for Chardet-NG to handle, and the accuracy is not that good. I understand that this accuracy is not likely to improve, which is exactly what makes this kind of thing important. Lithuania is unlikely to make a dent in telemetry, after all. And that dent will be even smaller when you consider users savvy enough to know to try to fix the encoding.

Reading the blog post about Chardet-NG, the motivation was stated repeatedly to have parity with Chromium. If a page works in Chromium but not in Firefox, that’s bad. While I understand that that is a useful goal for not driving users away to use the competition, it seems to me like that same thinking is being used in the other direction here: If a page is broken in Firefox, that is perfectly acceptable if it is broken in Chromium. Since Chromium removed the encoding menu ages ago, Firefox removing it won’t make Firefox any worse relative to Chromium. I think that this is a bad way to think about it.

LpSamuelm

Comment 74

•

3 years ago

Please do reconsider reverting this removal of user control. Henri Sivonen (:hsivonen) mentioned that he would like to see a real-world case; let me offer one! I do a lot of research on old archived Japanese websites – both as a hobby and for work (I work as a Japanese media translator) – and they're typically encoded in Shift-JIS, EUC-JP, or ISO-2022-JP without a meta tag. The automatic detection frequently fails to find the correct one (see, for example, this page), which means that there is a wealth of perfectly good HTTP(S)+HTML pages online that I simply can no longer access.

This change is an Anglocentric one that entirely breaks some pages in other languages (currently, Shift JIS and EUC-JP are used by ~7.2% of all .jp pages on the web). As :noname422 mentions, if you go by telemetry, only the majority (likely the USA and other Anglospheric countries) is considered. That, however, is no reason to disregard geographic, linguistic, and even ethnic minorities – which is what this change does. I truly wish that this behavior would be reverted, or at least that the old behavior would be offered alongside the new.

Assignee

Comment 75

•

3 years ago

•

Edited

(In reply to LpSamuelm from comment #74)

a real-world case; let me offer one!

Thank you.

I do a lot of research on old archived Japanese websites

I hope it's non-controversial to observe that we can't generalize broad user base needs from this.

The automatic detection frequently fails to find the correct one (see, for example, this page),

This is detected as Shift_JIS, because it mixes Shift_JIS and ISO-2022-JP in a single document. As noted previously, it is known that the present design doesn't address the issue of mixed-encoding documents.
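Why no single encoding choice can fix such a document can be shown with a short sketch. The two text fragments are made up for illustration, and Python's `shift_jis` and `iso2022_jp` codecs are only stand-ins for the browser's decoders, which may differ in detail.

```python
# Illustrative only: the two fragments are made up, and Python's codecs are
# stand-ins for the browser's decoders.

sjis_part = "こんにちは"  # stored as Shift_JIS
jis_part = "日本語"       # stored as ISO-2022-JP

# One byte stream that mixes both encodings.
document = sjis_part.encode("shift_jis") + jis_part.encode("iso2022_jp")

as_sjis = document.decode("shift_jis", errors="replace")
as_jis = document.decode("iso2022_jp", errors="replace")

# Each decoder recovers only the fragment written in its own encoding.
print(sjis_part in as_sjis, jis_part in as_sjis)
print(jis_part in as_jis, sjis_part in as_jis)
```

Whichever encoding is applied to the whole stream, one of the two fragments is garbled, so a per-document override (manual or detected) can only pick which part stays readable.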

What does "frequently" mean in your case? What kind of n in "every n weeks"?

This change is an Anglocentric one that entirely breaks some pages in other languages (currently, Shift JIS and EUC-JP are used by ~7.2% of all .jp pages on the web).

The usage in Japan was the main motivator for not removing a manual override altogether. It was also the main motivator for me to (so far unsuccessfully) advocate for making the Repair Text Encoding item available also in the app menu. Note how having more elaborate UI for encoding override resulted in the complete removal of the feature from the app menu. (See bug 1702914.)

The statistic you cite is not relevant for discovering how often users who read Japanese encounter a problem that the current design doesn't address.

When the encoding is correctly declared, there's no need to select anything from the menu. (As I understand the methodology of the site you cite, this is what it counts.)
When Shift_JIS, EUC-JP, or ISO-2022-JP is undeclared and the page is in one of those encodings, the page is readable and there's no need to select anything from the menu.
For undeclared UTF-8 or misdeclared UTF-8, Shift_JIS, or EUC-JP, the remaining menu item works.

Cases that don't work:

Content is actually ISO-2022-JP but misdeclared as UTF-8. (Was already unoverridable before the change here.)
More than one encoding in a single document. (Your example.)
Documents using the ISO-2022-JP-2 extensions to ISO-2022-JP. (Assumed very rare. Plausible with email archives containing emails sent by very old versions of Apple Mail.)
Documents using very obscure extensions of Shift_JIS and EUC-JP. (Somewhat obscure extensions are tolerated by the detector.)

As :noname422 mentions, if you go by telemetry, only the majority (likely the USA and other Anglospheric countries) is considered.

The links from comment 68 show analysis broken down by GeoIP categorization of telemetry submissions. This is not a case of only looking at telemetry on the global level and letting telemetry from "USA and other Anglospheric countries" hide the signal from elsewhere.

Comment 76

•

3 years ago

@HenriSivonen:

The usage in Japan was the main motivator for not removing a manual override altogether.

No, it wasn't; I believe you're rewriting history here. Users were not even asking for this to be removed; and if a request had been made, the motivators against it are:

This feature is useful to numerous groups of users in many cases, who otherwise have no recourse;
The removal doesn't achieve anything except for "one less menu item". ... and even for achieving that, you could make it optional.

@LpSamuelm: I am getting the sense that all arguments we could make will be brushed off. This removes the menu in Thunderbird? They can bring it back, so who cares. Messes up old Japanese texts? It's a niche case, so who cares. Anglo-centric change? No way, keeping it so far shows we're not anglo-centric. etc.

Comment 77

•

3 years ago

As noted previously, it is known that the present design doesn't address the issue of mixed-encoding documents.

Mixed encoding websites are fundamentally broken. On this we can agree. Where I think we would disagree is what to do about it. I would say that in such a scenario, the user needs to decide what text on the page they want to actually work. Chances are that the useful content is in one encoding, and the surrounding website blindly serving it is in another. While you cannot get entirely coherent text from any encoding selection in this scenario, you can get the actually useful content to work, which is what matters. Saying that the website should fix their stuff is not a good answer. If a scenario like this happens, it is nearly always because the page is so old and unmaintained that they have never noticed, or because the page is maintained, but the content being served is user-uploaded and so the website has nothing to do with it.

The links from comment 68 show analysis broken down by GeoIP categorization of telemetry submissions. This is not a case of only looking at telemetry on the global level and letting telemetry from "USA and other Anglospheric countries" hide the signal from elsewhere.

I really dislike this argument because to me it seems like a simplistic way to view the needs of users of various languages. If you just look at each country in the world, see that something works for most people in most countries, and each country has its own language, then you have all of the languages covered. That is just simply not how it works.

Let me give you an example that is dear to me. I enjoy participating in the Esperanto community. Esperanto speakers are not especially geographically concentrated, and not enormous in terms of speaker count (I think most have moved on from that dream and care more about the sense of community at this point) but even by the more lowball estimates, the Esperanto-speaking community is similar in size to the Irish-speaking community. However, in telemetry based on looking at countries, we are nowhere near as visible as a concentrated community like Irish speakers. And this change has already had an impact. There is a lot of old, classic Esperanto content on the web that is in an obscure ISO 8859 variant for Maltese and Esperanto. This is just simply not going to detect that encoding. We are a community that in your telemetry simply does not matter.

This whole thread has been you explaining how everyone’s needs are accounted for to people whose needs were trivially accounted for by the presence of a simple menu, but now have no recourse because that menu was considered unnecessary bloat despite its very simple nature. Woohoo, you saved one parameter to the encoding function. And you tell us that you understand that users hate the implication that the browser knows better, but that the browser really does know better. To that I ask, does the browser know better for Esperanto? Does it know better when user content does not match the containing website in encoding? What I think you are really putting forward is that the browser knows best... provided that there is no one else to blame for the problem (as is not the case with mixed encodings), or provided that the community in question is big and concentrated (as is not the case with Esperanto). If all that matters is getting that 99% coverage to sell copies of your software, then this approach is good enough, from a cynical perspective. But I believe that free and open source software has always had a spirit of being flexible in terms of letting users solve their own problems, and I believe that Firefox is losing that spirit.

Assignee

Comment 78

•

3 years ago

(In reply to noname422 from comment #77)

As noted previously, it is known that the present design doesn't address the issue of mixed-encoding documents.

Mixed encoding websites are fundamentally broken. On this we can agree. Where I think we would disagree is what to do about it. I would say that in such a scenario, the user needs to decide what text on the page they want to actually work. Chances are that the useful content is in one encoding, and the surrounding website blindly serving it is in another. While you cannot get entirely coherent text from any encoding selection in this scenario, you can get the actually useful content to work, which is what matters. Saying that the website should fix their stuff is not a good answer. If a scenario like this happens, it is nearly always because the page is so old and unmaintained that they have never noticed, or because the page is maintained, but the content being served is user-uploaded and so the website has nothing to do with it.

I agree. However, 1) optimizing for the mixed-encoding case risks losing recourse even for the undeclared single-encoding case: UI that is disproportionately elaborate for such a rare use case may not survive when it is evaluated for retention during a major UI refresh (see bug 1702914), and 2) as noted above, keeping the back end functionality around got in the way of parser changes on the topic of encoding declarations. (And, no, the time it took to develop the add-on doesn't prove otherwise; the add-on doesn't provide the same protections against XSS as the built-in functionality did.)

The links from comment 68 show analysis broken down by GeoIP categorization of telemetry submissions. This is not a case of only looking at telemetry on the global level and letting telemetry from "USA and other Anglospheric countries" hide the signal from elsewhere.

I really dislike this argument because to me it seems like a simplistic way to view the needs of users of various languages. If you just look at each country in the world, see that something works for most people in most countries, and each country has its own language, then you have all of the languages covered. That is just simply not how it works. Let me give you an example that is dear to me. I enjoy participating in the Esperanto community. Esperanto speakers are not especially geographically concentrated, and not enormous in terms of speaker count (I think most have moved on from that dream and care more about the sense of community at this point) but even by the more lowball estimates, the Esperanto-speaking community is similar in size to the Irish-speaking community. However, in telemetry based on looking at countries, we are nowhere near as visible as a concentrated community like Irish speakers. And this change has already had an impact. There is a lot of old, classic Esperanto content on the web that is in an obscure ISO 8859 variant for Maltese and Esperanto. This is just simply not going to detect that encoding. We are a community that in your telemetry simply does not matter. This whole thread has been you explaining how everyone’s needs are accounted for to people whose needs were trivially accounted for by the presence of a simple menu, but now have no recourse because that menu was considered unnecessary bloat despite its very simple nature. Woohoo, you saved one parameter to the encoding function. And you tell us that you understand that users hate the implication that the browser knows better, but that the browser really does know better. To that I ask, does the browser know better for Esperanto? Does it know better when user content does not match the containing website in encoding? What I think you are really putting forward is that the browser knows best... 
provided that there is no one else to blame for the problem (as is not the case with mixed encodings), or provided that the community in question is big and concentrated (as is not the case with Esperanto). If all that matters is getting that 99% coverage to sell copies of your software, then this approach is good enough, from a cynical perspective. But I believe that free and open source software has always had a spirit of being flexible in terms of letting users solve their own problems, and I believe that Firefox is losing that spirit.

It's really not productive that you still haven't stated a concrete case (i.e. real page with real URL made by someone else) where the change here has been a problem for you personally in practice relative to the state of things before this change. ISO-8859-3 for Esperanto cannot be it, because ISO-8859-3 wasn't in the menu prior to the change here.
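
For readers unfamiliar with the encoding under discussion, here is a minimal sketch of what happens when ISO-8859-3 bytes are decoded with a different single-byte encoding. The sample text, the helper name `decode_as`, and the choice of windows-1252 as the wrong guess are assumptions made for illustration; nothing here reflects how Firefox's detector or the former menu is implemented.

```python
# Illustrative only: SAMPLE and decode_as are hypothetical, not taken from the bug.
SAMPLE = "eĥoŝanĝo ĉiuĵaŭde"  # Esperanto text using the circumflexed letters and ŭ


def decode_as(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


# Bytes as an old page might serve them, without declaring the encoding.
raw = SAMPLE.encode("iso8859_3")

as_latin3 = decode_as(raw, "iso8859_3")  # correct interpretation
as_cp1252 = decode_as(raw, "cp1252")     # assumed wrong guess

print(as_latin3)
print(as_cp1252)
```

The byte sequence is identical in both cases; only the interpretation differs, which is why neither guess can be verified from the bytes alone without some statistical or manual judgment.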

Comment 79

•

3 years ago

@HenriSivonen:

However, 1) optimizing for the mixed-encoding case

Nobody suggested anything like "optimizing for the mixed-encoding case". The question is whether it is appropriate to prevent users from manually applying a character set when they need to do so.

Rachel Martin

Comment 80

•

3 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #68)

... the Core+Firefox change here didn't prevent Thunderbird from keeping the menu: The way the selection from the menu was used internally was quite different.

That is not accurate. Making nsIDocShell.charset read-only here put an end to the charset menu other than for informational purposes. Or are we missing something?