Closed Bug 1438779 Opened 6 years ago Closed 5 years ago

Disable the Text Encoding toolbar button if the document is decoded as UTF-8 and has no UTF-8 errors

Categories

(Firefox :: Toolbars and Customization, enhancement, P5)

59 Branch
enhancement

Tracking

()

RESOLVED WONTFIX

People

(Reporter: alice0775, Unassigned)

References

Details

(Keywords: ux-consistency)

+++ This bug was initially created as a clone of Bug #980904 +++


I think the Text Encoding toolbar button needs to be disabled in the same way.

Steps to reproduce:
1. Put the Text Encoding toolbar button on toolbar (not overflow menu area)
2. Open a UTF-8 encoded web page such as https://developer.mozilla.org/en-US/Add-ons/WebExtensions


Actual Results:
The button is enabled

Expected Results:
The button is disabled
Keywords: ux-consistency
All the contents of the panel/menu that opens from the button are disabled, which still accomplishes what bug 980904 intended. I would be surprised if many people put this button on the main toolbar, so I don't think fixing this is important. I'd take a patch as long as we didn't have to do a lot more work on every pageload / tab switch.
Priority: -- → P5
Depends on: 1474211
Please enable the View > Text Encoding menu in Firefox and add all charsets of interest, especially UTF-8, which is NOT 16-bit Unicode. Test case: page "http://webcache.googleusercontent.com/search?q=cache:n1ZQwgYsi9wJ:www.zehnet.de/2005/02/12/unicode-utf-8-tutorial/+&cd=14&hl=en&ct=clnk&gl=us&client=firefox-b-1-ab" shows UTF-8 examples that render incorrectly, but the Encoding menu is disabled, so the page is almost useless. Let users view a page in different encodings until such time as no other charset but UTF-8 is ever used!
(In reply to David Spector from comment #2)
> Test case: page
> "http://webcache.googleusercontent.com/search?q=cache:n1ZQwgYsi9wJ:www.
> zehnet.de/2005/02/12/unicode-utf-8-tutorial/
> +&cd=14&hl=en&ct=clnk&gl=us&client=firefox-b-1-ab" shows UTF-8 examples that
> render incorrectly

They render correctly for me on Nightly. Can you provide more context? What problem specifically do you see? Which Firefox version?
Flags: needinfo?(t20)
I looked at the page again and can find no examples that render incorrectly. I apologize for the invalid report. I'm using version 63.0.

I wonder if someone can explain the rationale behind disabling View > Text Encoding. Why and under what conditions is it disabled? As a developer, I would like to be able to see a page, especially on my development websites, in two or three different common encodings. Sometimes that could help me with debugging multibyte programming, which is not well supported in PHP. There is not even an about:config option to enable it!
Flags: needinfo?(t20)
(In reply to David Spector from comment #4)
> I wonder if someone can explain the rationale behind disabling View > Text
> Encoding. Why and under what conditions is it disabled?

The reasons to disable it are:
 1) Disable it in situations where enticing the user to use the menu could be part of an XSS attack.
 2) Avoid a situation where the user changes the browser state such that the browser submits data in the wrong encoding to the site such that the wrongly-encoded data ends up in a database or similar.
 3) Avoid offering the user the opportunity to waste time doing something that's not going to help them. (Before the current behavior of the menu, we gathered telemetry that indicated that people didn't have a good idea of when they should use the menu and what to choose from it. Specifically, telemetry showed that when people used the menu, very often they were using the menu a second time on the same page, i.e. the first attempt didn't help.)

The menu is disabled if any of the following is true:
 * The document isn't text/plain or text/html (it's instead some flavor of XML)
 * The document is in one of the UTF-16 encodings.
 * The document is in the replacement encoding (i.e. it was declared as one of the removed XSS-dangerous encodings, such as ISO-2022-KR).
 * The document was created via document.open()
 * The document is a Firefox built-in document.
 * The document had the UTF-8 byte order mark.
 * The document was declared as UTF-8 (by whatever means) and had no UTF-8 errors.
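
The list above can be read as a single predicate. The following is only an illustrative sketch, not Firefox's actual implementation: the `doc` object and every field name on it (`contentType`, `encoding`, `createdByDocumentOpen`, `isBuiltIn`, `hadUtf8Bom`, `declaredAsUtf8`, `hadUtf8Errors`) are hypothetical and were invented for this example.

```javascript
// Hypothetical model of the rules listed above. The field names are invented
// for illustration and do not correspond to real Firefox internals.
function isEncodingMenuDisabled(doc) {
  const overridableTypes = ["text/plain", "text/html"];
  const utf16Encodings = ["UTF-16LE", "UTF-16BE"];
  return Boolean(
    !overridableTypes.includes(doc.contentType) || // e.g. some flavor of XML
    utf16Encodings.includes(doc.encoding) ||
    doc.encoding === "replacement" || // declared as a removed, XSS-dangerous encoding
    doc.createdByDocumentOpen ||
    doc.isBuiltIn ||
    doc.hadUtf8Bom ||
    (doc.declaredAsUtf8 && !doc.hadUtf8Errors)
  );
}

// A page declared as UTF-8 with no decoding errors: the menu is disabled.
const cleanUtf8Page = {
  contentType: "text/html",
  encoding: "UTF-8",
  createdByDocumentOpen: false,
  isBuiltIn: false,
  hadUtf8Bom: false,
  declaredAsUtf8: true,
  hadUtf8Errors: false,
};

// A legacy-encoded HTML page: none of the conditions apply.
const legacyPage = { ...cleanUtf8Page, encoding: "windows-1252", declaredAsUtf8: false };

console.log(isEncodingMenuDisabled(cleanUtf8Page)); // true
console.log(isEncodingMenuDisabled(legacyPage));    // false
```

Under this model, a page declared as UTF-8 that does contain UTF-8 errors would still leave the menu enabled, which matches the last bullet.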

> As a developer, I
> would like to be able to see a page, especially on my development websites,
> in two or three different common encodings. Sometimes that could help me
> with debugging multibyte programming, which is not well supported in PHP.
> There is not even an about:config option to enable it!

Do you mean you'd like to see a page encoded in a multibyte encoding decoded according to a single-byte encoding so that you can see *some* character per byte? While I can see how that could help with PHP development with UTF-8, it's in conflict with achieving item #2 from the first list by the means of the last bullet point of the second list.
Thank you, Henri, for this interesting answer. This information deserves to be better known, so that the disabled menu item doesn't just look like a new bug in Firefox. At least remove the menu item entirely from the menu, and explain the behavior fully in the documentation. Casual Firefox users should not have to read bug reports or C code to understand Firefox (I am concerned with all users, whether casual or developers).

As to the specifics in your lists, I must admit that I didn't follow some of the reasons. In particular, point 2 would be better motivated and understood if it were accompanied by an example. I tried to imagine an example, but was unsuccessful. How would changing the encoding lead the user to being infected by a virus, or receiving a malicious message?

As a counter-example, I can imagine a situation where a Web programmer or author used a tool to post some non-UTF-8 text to a website. Because the website itself specified it as UTF-8, and because there were no UTF-8 coding errors, Firefox would disable the menu option. Yet the text was actually in some other encoding that could have been made visible as free of "garbage" characters to the user if the menu option had been available.

Also, I saw no convincing reason in your reply to prevent an about:config or "developer's toolbox" option to enable the menu. If there is no reason not to include, I ask that it be included, so at least a developer or other expert can quickly view in another encoding. I would not ask for this if UTF-8 were standard all over the world, but as you know, it is not yet standard, either de jure or de facto. For several more years, we can expect users to be manipulating code in many encodings, so we should make absolutely sure that they can read text in their language without interference from Firefox, Thunderbird, or other tools.
(In reply to David Spector from comment #6)
> Thank you, Henri, for this interesting answer. This information deserves to
> be better known, so that the disabled menu item doesn't just look like a new
> bug in Firefox. At least remove the menu item entirely from the menu,

The ideal state is to get rid of the menu completely as Chrome and Edge have already done. However, while the menu is available sometimes, the correct way to present it at other times is to disable it instead of hiding it completely. This is in line with GUI conventions all the way back to Macintosh in the 1980s.

> As to the specifics in your lists, I must admit that I didn't follow some of
> the reasons. In particular, point 2 would be better motivated and understood
> if it were accompanied by an example. I tried to imagine an example, but was
> unsuccessful.

The page is in encoding A and has a form. The server-side form submission handler expects to receive data in encoding A. The user overrides the encoding to encoding B and submits the form. The server interprets data in encoding B as though it was in encoding A and persists garbage in a back end system.
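
A minimal sketch of that scenario, assuming for illustration that encoding A is windows-1252 and that the override makes the browser submit UTF-8 (encoding B). The function name `simulateMisdecodedSubmission` is made up for this example; `TextEncoder` and `TextDecoder` are the standard Encoding API.

```javascript
// The server expects encoding A (windows-1252 in this sketch), but after a
// manual override the browser submits the form data in encoding B (UTF-8).
function simulateMisdecodedSubmission(text) {
  const submittedBytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return new TextDecoder("windows-1252").decode(submittedBytes); // server's wrong assumption
}

const stored = simulateMisdecodedSubmission("café");
console.log(stored); // "cafÃ©"
```

The two bytes that UTF-8 uses for "é" (0xC3 0xA9) are read as two separate windows-1252 characters, and that garbage is what ends up persisted.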

> How would changing the encoding lead the user to being
> infected by a virus, or receiving a malicious message?

Try the menu on https://hsivonen.com/test/moz/never-show-user-supplied-content-as-utf-16.htm in Firefox 20 or earlier.

> As a counter-example, I can imagine a situation where a Web programmer or
> author used a tool to post some non-UTF-8 text to a website. Because the
> website itself specified it as UTF-8, and because there were no UTF-8 coding
> errors, Firefox would disable the menu option. Yet the text was actually in
> some other encoding that could have been made visible as free of "garbage"
> characters to the user if the menu option had been available.

Concern #2 in my previous comment is the flip side of this. The menu protects against the user introducing bad data instead (if the page appears modern in the sense of being declared as UTF-8 and not having UTF-8 errors).

But more importantly, apart from ISO-2022-JP, which is mainly relevant to email rather than the Web, and ASCII-only (in which case the encoding used doesn't matter as long as it's ASCII compatible) it is very improbable for content in a non-UTF-8 encoding not to have any errors when decoded as UTF-8.
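
This claim can be checked for any given byte sequence with a strict UTF-8 decode. The sketch below is only an illustration; `isValidUtf8` is a hypothetical helper name, and the sample bytes are chosen for the example.

```javascript
// Returns true if the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false; // fatal mode throws a TypeError on malformed input
  }
}

const asUtf8 = new TextEncoder().encode("café");                // 63 61 66 C3 A9
const asWindows1252 = new Uint8Array([0x63, 0x61, 0x66, 0xe9]); // "café" in windows-1252
const asciiOnly = new Uint8Array([0x63, 0x61, 0x66, 0x65]);     // "cafe"

console.log(isValidUtf8(asUtf8));        // true
console.log(isValidUtf8(asWindows1252)); // false
console.log(isValidUtf8(asciiOnly));     // true
```

A single non-ASCII byte from a legacy single-byte encoding is usually enough to make the sequence malformed as UTF-8, while ASCII-only content passes under either interpretation.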

> Also, I saw no convincing reason in your reply to prevent an about:config or
> "developer's toolbox" option to enable the menu. If there is no reason not
> to include, I ask that it be included, so at least a developer or other
> expert can quickly view in another encoding.

All about:config options have an implementation cost, testing cost and the opportunity for users to shoot themselves in the foot. You are the first person to ask for this option for developer-motivated reasons, so the need for such an option might not be as broad as it might seem from your perspective.

> I would not ask for this if
> UTF-8 were standard all over the world, but as you know, it is not yet
> standard, either de jure or de facto.

The HTML spec requires authors to use UTF-8: https://html.spec.whatwg.org/#charset
As for de facto, W3Techs (not a great source but the only one available on this topic) puts UTF-8 at over 90%: https://w3techs.com/technologies/history_overview/character_encoding/ms/y
Some of your points have knowledge behind them that I do not have, so I will bow to your superior knowledge.

However, from the end user's perspective, viewing garbage characters in their browser with no menu item to fix the problem is unacceptable. It is the sort of thing I would expect from Mi***, not from Mozilla.

In your comment 5, point 2, you claimed that enabling the Text Encoding menu item would somehow corrupt databases. Even with your example in comment 7, I just don't buy this. If there is a 1% chance that the characters being rendered belong to a non-UTF-8 character set, there should be a way for the end user to correct the problem quickly, using the same browser that is doing the rendering.

And as for protecting the user from themselves, I see no reason why the user should go to the bother of choosing an incorrect encoding to view their web pages, since the result would be (with 90% probability according to you) seeing garbage on the screen.

Leave the assumptions about users and the protection of users to marketing-driven companies like Mi***. Even if Firefox is the only browser to allow the user to fix rendering problems resulting from encoding mistakes (no matter who is to blame for those mistakes), it should.

> But more importantly, apart from ISO-2022-JP, which is mainly relevant to email rather than the Web, and ASCII-only (in which
> case the encoding used doesn't matter as long as it's ASCII compatible) it is very improbable for content in a non-UTF-8
> encoding not to have any errors when decoded as UTF-8.

Not true. Many pages display incorrectly when UTF-8 shows no decode errors. The problem is when a web server says all
pages are UTF-8, individual web pages cannot override that. So curly quotes, bullets, and other symbols do not appear
as intended. Even though there is no decode error! I rely on that menu to display pages correctly.

It is not possible for a web browser to decide if there is an incorrect encoding, as long as the byte stream does not
contain invalid UTF-8 sequences. So the menu MUST be present. It's why it was present in the first place.

Please reconsider the decision to disenfranchise (the opposite of empower) users.

(In reply to Wayne Pollock from comment #9)

> Many pages display incorrectly when UTF-8 shows no decode errors.

Please provide an example.

Flags: needinfo?(pollock)

I heartily second Wayne's comment! A Text Encoding menu doesn't have to be visible to end users, but at least it should be available in the Web Developer Tools. It isn't needed often, but when it is needed, it really is needed, as I and others have tried to explain. If you haven't run into the need for such a menu, you are lucky indeed. However, there will always be specific language and website situations (illegal though they may be) that will call for it, to make content readable.

(In reply to :Gijs (he/him) from comment #10)

> (In reply to Wayne Pollock from comment #9)
> > Many pages display incorrectly when UTF-8 shows no decode errors.
>
> Please provide an example.

On this page, this forum poster has posted some text that requires Shift-JIS encoding to be readable. Instead, mojibake is currently displayed with UTF-8 encoding.

https://www.woodus.com/forums/topic/30472-dqx-useful-tool-app/?do=findComment&comment=399776

The resulting characters will be Japanese characters. However, when trying to switch to Shift-JIS, I found that Firefox's "View"->"Text Encoding" menu had been disabled. Even though this behavior was intentionally implemented by somebody, it is a bug in my opinion. I am running Firefox Developer Edition, also. I would expect the developer edition to empower users to take control of the display of web pages. I remember using the "Text Encoding" menu to positive effect in the past, and I see no good reason I shouldn't be allowed to do so now.

Instead of using Firefox as the excellent tool it used to be, I now have to copy the text to an external program and change the character encoding there.

As Wayne Pollock so-excellently wrote, please reconsider the decision to disenfranchise (the opposite of empower) users.

Passing ni for a response to comment #12.

Flags: needinfo?(pollock) → needinfo?(hsivonen)

Are you referring to, for example, this paragraph?

Once you choose your character, there are two tabs in the app: The first is called メニュー and should be selected by default.

The second tab is called プレミアムコンシェルジュ and this one has all the cool stuff (IMO anyway, you can do stuff like check the bazaar or sell things).

If you are using Firefox 58 and switch the encoding to Shift_JIS, the paragraph will be changed to:

Once you choose your character, there are two tabs in the app: The first is called テ」ニ陳。テ」ニ停ケテ」ニ陳・テ」ニ陳シ and should be selected by default.

The second tab is called テ」ニ停氾」ニ陳ャテ」ニ椎クテ」窶堋「テ」ニ陳テ」窶堋ウテ」ニ陳ウテ」窶堋キテ」窶堋ァテ」ニ陳ォテ」窶堋クテ」ニ陳・ and this one has all the cool stuff (IMO anyway, you can do stuff like check the bazaar or sell things).

It is just changing one mojibake to another mojibake. The correct text is not shown. This example demonstrates that the Text Encoding menu is useless.
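
Why the override cannot help can be sketched as follows, under the assumption that the forum text was damaged by reading UTF-8 bytes as windows-1252 and then storing the result as UTF-8. That cause is an assumption made for this illustration, and the helper names `doubleEncode`, `overrideDecoder` and `repair` are hypothetical.

```javascript
// Build a reverse windows-1252 table by decoding every byte value once.
const win1252 = new TextDecoder("windows-1252");
const charToByte = new Map();
for (let b = 0; b < 256; b++) {
  charToByte.set(win1252.decode(new Uint8Array([b])), b);
}

// Assumed damage: UTF-8 bytes were misread as windows-1252 and stored as text.
function doubleEncode(text) {
  return win1252.decode(new TextEncoder().encode(text));
}

// What a decoder override does: reinterpret the page's (valid) UTF-8 bytes
// in another encoding.
function overrideDecoder(storedText, label) {
  return new TextDecoder(label).decode(new TextEncoder().encode(storedText));
}

// What actually repairs it: map each character back to its windows-1252 byte,
// then decode those bytes as UTF-8.
function repair(storedText) {
  const bytes = Uint8Array.from(storedText, (ch) => charToByte.get(ch));
  return new TextDecoder("utf-8").decode(bytes);
}

const stored = doubleEncode("メニュー");
console.log(overrideDecoder(stored, "shift_jis")); // a different mojibake
console.log(repair(stored));                       // "メニュー"
```

The bytes the server sends are well-formed UTF-8 for the wrong characters, so choosing another decoder only reinterprets the already damaged text; undoing the damage requires reversing the earlier misinterpretation.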

Flags: needinfo?(hsivonen)

BTW run this script in the Web Console to decode the garbled text:
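(function enumTextNode(node, callback) {
  // Walk the DOM and call `callback` on every text node (nodeType 3).
  if (!node.childNodes) return;
  [].forEach.call(node.childNodes, node => {
    if (node.nodeType == 3) callback(node);
    else enumTextNode(node, callback);
  });
})(document, node => node.nodeValue = node.nodeValue
  // Map characters from the windows-1252 0x80-0x9F range back to their byte values.
  .replace(/[^\0-\x7f]/g, c => {
    let i = "€\x81‚ƒ„…†‡ˆ‰Š‹Œ\x8dŽ\x8f\x90‘’“”•–—˜™š›œ\x9džŸ".indexOf(c);
    return i < 0 ? c : String.fromCharCode(i + 128);
  })
  // Re-insert 0x81 where 0xE3 is followed by only one continuation byte (presumably the 0x81 was dropped).
  .replace(/(\xe3)([\x80-\xbf])(?![\x80-\xbf])/g, (...m) => m[1] + "\x81" + m[2])
  // Treat each remaining non-ASCII run as raw bytes and decode it as UTF-8.
  .replace(/[^\0-\x7f]+/g, s => {
    const bytes = new Uint8Array([].map.call(s, c => c.charCodeAt(0)));
    return new TextDecoder().decode(bytes);
  }));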

I wasn't able to check whether changing the encoding worked. I shouldn't have to use a very old version (Firefox 58) to enable a feature that already existed in the project in the past. Instead, I'd have preferred to be able to check in Firefox 70 whether viewing the page in Shift-JIS worked for that mojibake. To me, this reinforces the idea that the text encoding menu should not be disabled.

Aside, thanks for the useful script! That's awesome of you.

I was almost certain that changing the text encoding would not work, but I tested it just in case. If the Text Encoding menu is disabled, you can safely assume that fiddling with the text encoding would be a waste of time. Unfortunately, your inconvenience does not outweigh the security concern (changing the text encoding may cause something bad).

It seems to me that, in general (not just for this mojibake case), even if we could prove that disabling the char encoding functionality could NEVER prevent the user from resolving a mojibake, we STILL want the functionality enabled. Sometimes the user has to be free to try out remedies that absolutely cannot work to avoid a state of great frustration. Why force them to blame Firefox for disallowing their experimentation?

Also in general, I'm opposed to treating users like children and trying to make the interface as simple as possible, unless you make the simplicity an option. Most users are intelligent enough to search for ways to experiment when they encounter a problem.

The truth is that char encoding has always been a big problem in software engineering, with English-speaking developers giving themselves a big advantage of a 7-bit character set (some early computers like the LINC used six bits and Baudot used five bits) without even thinking about the misuse of privilege that was involved in supporting only their own alphabet. While I am thankful that utf-8, a single solution, has gained widespread acceptance, it is still the case that legacy and local custom in various countries will continue to require fiddling around by the user to get text to look right, well into the future. We must always support fiddling, while not confusing those who don't need it. Disabling a feature is unfriendly at best and is sometimes unpleasantly surprising.

> Sometimes the user has to be free to try out remedies that absolutely cannot work to avoid a state of great frustration.

I understand that there is a set of users who believe that they know better than browser developers, need to perform a failed experiment to be convinced otherwise, and are unhappy if they are prevented from performing the experiment. Still, I'm marking this WONTFIX.

> Most users are intelligent enough to search for ways to experiment when they encounter a problem.

I'm not going to post sample exploits beyond the one I already referred to in comment 7: https://hsivonen.com/test/moz/never-show-user-supplied-content-as-utf-16.htm An attack would, to a user, look similar to the scenario in comment 12. As demonstrated by https://hsivonen.com/test/moz/never-show-user-supplied-content-as-utf-16.htm (you need to use Firefox 20 or earlier to be able to override the encoding), letting the user experiment isn't always harmless.

(Debating how well Firefox's current menu disabling behavior matches actually dangerous scenarios with the intent of disabling it less often, so that people could conduct failed experiments to convince themselves that the menu doesn't help them, isn't really productive. If you find cases that argue for disabling it more often, please file those as security bugs.)

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX

I will make a different point, since I suppose there has been no user request for choosing a different text encoding when the current encoding has been determined to be UTF-8.

I would like to see a wide choice of text encodings added to some appropriate section of the Developer Tools. This could help developers quickly solve legacy issues in the text encoding of their own web pages, such as debugging situations in which different encodings are accidentally used on the same page.

I am certainly willing to drop this matter entirely if it can be demonstrated that other text encodings are no longer used for new pages anywhere on the web.

(In reply to David Spector from comment #20)

> I would like to see a wide choice of text encodings added to some appropriate section of the Developer Tools.

In the most recent Nightly, there is no longer a wide manual choice but just one item. Apart from the usability improvement, it will enable a simplification of the HTML parser internals. Having a wide manual choice (and still maintaining the same security properties as before) would prevent the simplification of the internals.

The complication of the parser internals already got in the way of a recent change (bug 673087) and, unless cleaned up as planned, would likely get in the way of another planned change (bug 1701828).

> This could help developers quickly solve legacy issues in the text encoding of their own web pages, such as debugging situations in which different encodings are accidentally used on the same page.

Usually this scenario of mixed encodings on one page is pretty apparent. While I can see how the old menu could have helped developers who don't immediately recognize what the exact pair of encodings is by just looking at the symptoms, the complexity of (securely) supporting this use case is too high considering the benefit of the niche use case.

I suggest piping the problematic page from curl to https://crates.io/crates/recode_rs and trying different input encodings (the -f command line flag; the encodings are listed in the spec).
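
For readers who would rather stay in JavaScript, the same trial-and-error approach can be sketched with `TextDecoder`. This is not recode_rs and makes no claim about how that tool behaves; the function name `tryEncodings` and the candidate list are assumptions made for this example, and the labels come from the Encoding Standard.

```javascript
// Decode the same bytes under several candidate encodings so the results can
// be compared by eye.
function tryEncodings(bytes, labels) {
  const results = {};
  for (const label of labels) {
    try {
      results[label] = new TextDecoder(label, { fatal: true }).decode(bytes);
    } catch (e) {
      results[label] = null; // malformed under this encoding, or label not supported
    }
  }
  return results;
}

// Example input: "café" as windows-1252 bytes. In practice the bytes could
// come from fetch(url).then(r => r.arrayBuffer()).
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(tryEncodings(bytes, ["utf-8", "windows-1252"]));
```

A `null` entry means the bytes are not well-formed in that encoding; the remaining candidates still have to be judged by reading the decoded text.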

I am persuaded by these points. However, the proposed solution is not well-known, meaning that it doesn't show up on the first page of a Google search "online" "tool" "character" "encodings", probably since Rust itself is not well-known. Other tools do show up, though, so the functionality I was expecting at least in Developer Tools is available online. I withdraw my objections.
