Open Bug 1438779 Opened 2 years ago Updated 6 months ago

Disable the Text Encoding toolbar button if the document is decoded as UTF-8 and has no UTF-8 errors

Categories

(Firefox :: Toolbars and Customization, enhancement, P5)

59 Branch
enhancement

Tracking

()

People

(Reporter: alice0775, Unassigned, NeedInfo)

References

(Depends on 1 open bug)

Details

(Keywords: ux-consistency)

+++ This bug was initially created as a clone of Bug #980904 +++


I think the Text Encoding toolbar button needs to be disabled in the same way.

Steps to reproduce:
1. Put the Text Encoding toolbar button on toolbar (not overflow menu area)
2. Open a UTF-8 encoded web page such as https://developer.mozilla.org/en-US/Add-ons/WebExtensions


Actual Results:
The button is enabled

Expected Results:
The button is disabled
Keywords: ux-consistency
All the contents of the panel/menu that opens from the button are disabled, which still accomplishes what bug 980904 intended. I would be surprised if many people put this button on the main toolbar, so I don't think fixing this is important. I'd take a patch as long as we didn't have to do a lot more work on every pageload / tab switch.
Priority: -- → P5
Depends on: 1474211
Please enable the View > Text Encoding menu in Firefox and add all charsets of interest, especially UTF-8, which is NOT 16-bit Unicode. Test case: page "http://webcache.googleusercontent.com/search?q=cache:n1ZQwgYsi9wJ:www.zehnet.de/2005/02/12/unicode-utf-8-tutorial/+&cd=14&hl=en&ct=clnk&gl=us&client=firefox-b-1-ab" shows UTF-8 examples that render incorrectly, but the Encoding menu is disabled, so the page is almost useless. Let users view a page in different encodings until such time as no other charset but UTF-8 is ever used!
(In reply to David Spector from comment #2)
> Test case: page
> "http://webcache.googleusercontent.com/search?q=cache:n1ZQwgYsi9wJ:www.
> zehnet.de/2005/02/12/unicode-utf-8-tutorial/
> +&cd=14&hl=en&ct=clnk&gl=us&client=firefox-b-1-ab" shows UTF-8 examples that
> render incorrectly

They render correctly for me on Nightly. Can you provide more context? What problem specifically do you see? Which Firefox version?
Flags: needinfo?(t20)
I looked at the page again and can find no examples that render incorrectly. I apologize for the invalid report. I'm using version 63.0.

I wonder if someone can explain the rationale behind disabling View > Text Encoding. Why and under what conditions is it disabled? As a developer, I would like to be able to see a page, especially on my development websites, in two or three different common encodings. Sometimes that could help me with debugging multibyte programming, which is not well supported in PHP. There is not even an about:config option to enable it!
Flags: needinfo?(t20)
(In reply to David Spector from comment #4)
> I wonder if someone can explain the rationale behind disabling View > Text
> Encoding. Why and under what conditions is it disabled?

The reasons to disable it are:
 1) Disable it in situations where enticing the user to use the menu could be part of an XSS attack.
 2) Avoid a situation where the user changes the browser state so that the browser submits data to the site in the wrong encoding and the wrongly-encoded data ends up in a database or similar.
 3) Avoid offering the user the opportunity to waste time doing something that's not going to help them. (Before the current behavior of the menu, we gathered telemetry that indicated that people didn't have a good idea of when they should use the menu and what to choose from it. Specifically, telemetry showed that when people used the menu, very often they were using the menu a second time on the same page, i.e. the first attempt didn't help.)

The menu is disabled if any of the following is true:
 * The document isn't text/plain or text/html (it's instead some flavor of XML)
 * The document is in one of the UTF-16 encodings.
 * The document is in the replacement encoding (i.e. it was declared as one of the removed XSS-dangerous encodings, such as ISO-2022-KR).
 * The document was created via document.open()
 * The document is a Firefox built-in document.
 * The document had the UTF-8 byte order mark.
 * The document was declared as UTF-8 (by whatever means) and had no UTF-8 errors.
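The conditions in the list above can be read as a single predicate. The following is only an illustrative sketch in Python, not Firefox's actual implementation: the `DocumentInfo` type, its field names, and `encoding_menu_disabled` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DocumentInfo:
    """Hypothetical summary of a loaded document (not a Firefox API)."""
    content_type: str
    encoding: str
    created_via_document_open: bool = False
    is_builtin: bool = False
    had_utf8_bom: bool = False
    declared_utf8: bool = False
    had_utf8_errors: bool = False


def encoding_menu_disabled(doc: DocumentInfo) -> bool:
    """Return True if any of the listed conditions holds."""
    return (
        # Not text/plain or text/html (for example, some flavor of XML).
        doc.content_type not in ("text/plain", "text/html")
        # One of the UTF-16 encodings.
        or doc.encoding in ("UTF-16LE", "UTF-16BE")
        # The replacement encoding (declared as a removed, XSS-dangerous encoding).
        or doc.encoding == "replacement"
        or doc.created_via_document_open
        or doc.is_builtin
        or doc.had_utf8_bom
        # Declared as UTF-8 (by whatever means) and decoded without errors.
        or (doc.declared_utf8 and not doc.had_utf8_errors)
    )
```

Under this sketch, a text/html page declared as UTF-8 with no decoding errors has the menu disabled, while the same page with at least one UTF-8 error keeps it enabled.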

> As a developer, I
> would like to be able to see a page, especially on my development websites,
> in two or three different common encodings. Sometimes that could help me
> with debugging multibyte programming, which is not well supported in PHP.
> There is not even an about:config option to enable it!

Do you mean you'd like to see a page encoded in a multibyte encoding decoded according to a single-byte encoding so that you can see *some* character per byte? While I can see how that could help with PHP development with UTF-8, it's in conflict with achieving item #2 from the first list by the means of the last bullet point of the second list.
Thank you, Henri, for this interesting answer. This information deserves to be better known, so that the disabled menu item doesn't just look like a new bug in Firefox. At least remove the menu item entirely from the menu, and explain the behavior fully in the documentation. Casual Firefox users should not have to read bug reports or C code to understand Firefox (I am concerned with all users, whether casual or developers).

As to the specifics in your lists, I must admit that I didn't follow some of the reasons. In particular, point 2 would be better motivated and understood if it were accompanied by an example. I tried to imagine an example, but was unsuccessful. How would changing the encoding lead the user to being infected by a virus, or receiving a malicious message?

As a counter-example, I can imagine a situation where a Web programmer or author used a tool to post some non-UTF-8 text to a website. Because the website itself specified it as UTF-8, and because there were no UTF-8 coding errors, Firefox would disable the menu option. Yet the text was actually in some other encoding that could have been made visible as free of "garbage" characters to the user if the menu option had been available.

Also, I saw no convincing reason in your reply to prevent an about:config or "developer's toolbox" option to enable the menu. If there is no reason not to include, I ask that it be included, so at least a developer or other expert can quickly view in another encoding. I would not ask for this if UTF-8 were standard all over the world, but as you know, it is not yet standard, either de jure or de facto. For several more years, we can expect users to be manipulating code in many encodings, so we should make absolutely sure that they can read text in their language without interference from Firefox, Thunderbird, or other tools.
(In reply to David Spector from comment #6)
> Thank you, Henri, for this interesting answer. This information deserves to
> be better known, so that the disabled menu item doesn't just look like a new
> bug in Firefox. At least remove the menu item entirely from the menu,

The ideal state is to get rid of the menu completely as Chrome and Edge have already done. However, while the menu is available sometimes, the correct way to present it at other times is to disable it instead of hiding it completely. This is in line with GUI conventions all the way back to Macintosh in the 1980s.

> As to the specifics in your lists, I must admit that I didn't follow some of
> the reasons. In particular, point 2 would be better motivated and understood
> if it were accompanied by an example. I tried to imagine an example, but was
> unsuccessful.

The page is in encoding A and has a form. The server-side form submission handler expects to receive data in encoding A. The user overrides the encoding to encoding B and submits the form. The server interprets data in encoding B as though it was in encoding A and persists garbage in a back end system.
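The scenario can be reproduced outside a browser. This is a minimal Python sketch that assumes encoding A is UTF-8 and encoding B is windows-1252 (Python codec name `cp1252`); the function name `simulate_submission` is made up for illustration and does not model any real server.

```python
def simulate_submission(text: str, page_encoding: str, override_encoding: str) -> str:
    """Encode form text the way the overridden browser would,
    then decode it the way the server expects."""
    # The browser serializes the form using the user-selected override (encoding B).
    submitted_bytes = text.encode(override_encoding)
    # The server still assumes the page's original encoding (encoding A).
    return submitted_bytes.decode(page_encoding, errors="replace")


# Page is UTF-8, user overrides to windows-1252: the server stores a
# replacement character instead of the accented letter.
stored = simulate_submission("café", page_encoding="utf-8", override_encoding="cp1252")
print(repr(stored))
```

Swapping the two encodings gives the other familiar kind of corruption, where each non-ASCII character turns into two unrelated characters.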

> How would changing the encoding lead the user to being
> infected by a virus, or receiving a malicious message?

Try the menu on https://hsivonen.com/test/moz/never-show-user-supplied-content-as-utf-16.htm in Firefox 20 or earlier.

> As a counter-example, I can imagine a situation where a Web programmer or
> author used a tool to post some non-UTF-8 text to a website. Because the
> website itself specified it as UTF-8, and because there were no UTF-8 coding
> errors, Firefox would disable the menu option. Yet the text was actually in
> some other encoding that could have been made visible as free of "garbage"
> characters to the user if the menu option had been available.

Concern #2 in my previous comment is the flip side of this. The menu protects against the user introducing bad data instead (if the page appears modern in the sense of being declared as UTF-8 and not having UTF-8 errors).

But more importantly, apart from ISO-2022-JP, which is mainly relevant to email rather than the Web, and ASCII-only (in which case the encoding used doesn't matter as long as it's ASCII compatible) it is very improbable for content in a non-UTF-8 encoding not to have any errors when decoded as UTF-8.
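A small Python sketch of this point follows. The helper name `decodes_cleanly_as_utf8` and the sample strings are chosen only for illustration; the check is a strict UTF-8 decode of the raw bytes.

```python
def decodes_cleanly_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 with no decoding errors."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "ASCII-only text in windows-1252": "plain ASCII text".encode("cp1252"),
    "accented text in windows-1252": "naïve café".encode("cp1252"),
    "curly quotes in windows-1252": "\u201cquoted\u201d".encode("cp1252"),
    "Japanese in Shift_JIS": "日本語".encode("shift_jis"),
    "Japanese in ISO-2022-JP": "日本語".encode("iso2022_jp"),
}

for label, data in samples.items():
    print(f"{label}: valid UTF-8 = {decodes_cleanly_as_utf8(data)}")
```

In these samples only the ASCII-only text and the ISO-2022-JP text pass as valid UTF-8, because ISO-2022-JP uses 7-bit bytes exclusively; the other legacy-encoded samples contain byte sequences that UTF-8 does not allow.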

> Also, I saw no convincing reason in your reply to prevent an about:config or
> "developer's toolbox" option to enable the menu. If there is no reason not
> to include, I ask that it be included, so at least a developer or other
> expert can quickly view in another encoding.

All about:config options have an implementation cost, testing cost and the opportunity for users to shoot themselves in the foot. You are the first person to ask for this option for developer-motivated reasons, so the need for such an option might not be as broad as it might seem from your perspective.

> I would not ask for this if
> UTF-8 were standard all over the world, but as you know, it is not yet
> standard, either de jure or de facto.

The HTML spec requires authors to use UTF-8: https://html.spec.whatwg.org/#charset
As for de facto, W3Techs (not a great source but the only one available on this topic) puts UTF-8 at over 90%: https://w3techs.com/technologies/history_overview/character_encoding/ms/y
Some of your points have knowledge behind them that I do not have, so I will bow to your superior knowledge.

However, from the end user's perspective, viewing garbage characters in their browser with no menu item to fix the problem is unacceptable. It is the sort of thing I would expect from Mi***, not from Mozilla.

In your comment 5, point 2, you claimed that enabling the Text Encoding menu item would somehow corrupt databases. Even with your example in comment 7, I just don't buy this. If there is a 1% chance that the characters being rendered belong to a non-UTF-8 character set, there should be a way for the end user to correct the problem quickly, using the same browser that is doing the rendering.

And as for protecting the user from themselves, I see no reason why the user should go to the bother of choosing an incorrect encoding to view their web pages, since the result would be (with 90% probability according to you) seeing garbage on the screen.

Leave the assumptions about users and the protection of users to marketing-driven companies like Mi***. Even if Firefox is the only browser to allow the user to fix rendering problems resulting from encoding mistakes (no matter who is to blame for those mistakes), it should.

> But more importantly, apart from ISO-2022-JP, which is mainly relevant to email rather than the Web, and ASCII-only (in which
> case the encoding used doesn't matter as long as it's ASCII compatible) it is very improbable for content in a non-UTF-8
> encoding not to have any errors when decoded as UTF-8.

Not true. Many pages display incorrectly when UTF-8 shows no decode errors. The problem is when a web server says all
pages are UTF-8, individual web pages cannot override that. So curly quotes, bullets, and other symbols do not appear
as intended. Even though there is no decode error! I rely on that menu to display pages correctly.

It is not possible for a web browser to decide if there is an incorrect encoding, as long as the byte stream does not
contain invalid UTF-8 sequences. So the menu MUST be present. It's why it was present in the first place.

Please reconsider the decision to disenfranchise (the opposite of empower) users.

(In reply to Wayne Pollock from comment #9)

> Many pages display incorrectly when UTF-8 shows no decode errors.

Please provide an example.

Flags: needinfo?(pollock)

I heartily second Wayne's comment! A Text Encoding menu doesn't have to be visible to end users, but at least it should be available in the Web Developer Tools. It isn't needed often, but when it is needed, it really is needed, as I and others have tried to explain. If you haven't run into the need for such a menu, you are lucky indeed. However, there will always be specific language and website situations (illegal though they may be) that will call for it, to make content readable.
