Disable View>Character Coding menu when it won't have effect / is unnecessary (e.g. XML)

RESOLVED FIXED in mozilla21

Status

()

Core
Internationalization
RESOLVED FIXED
14 years ago
4 years ago

People

(Reporter: Sekundes, Assigned: hsivonen)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs, {intl, sec-low})

Trunk
mozilla21
intl, sec-low
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(4 attachments, 11 obsolete attachments)

2.71 KB, patch
smaug
: review+
Details | Diff | Splinter Review
24.98 KB, patch
Details | Diff | Splinter Review
49.70 KB, patch
Details | Diff | Splinter Review
4.11 KB, patch
Gavin
: review+
Details | Diff | Splinter Review
(Reporter)

Description

14 years ago
User-Agent:       
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7a) Gecko/20040208 Firebird/0.8.0+

non-unicoded xml page, without encoding attributes in the xml PI, would be
treated as UTF-8 pages. It cannot be changed via View->Character Coding.

Reproducible: Always
Steps to Reproduce:
> non-unicoded xml page, without encoding attributes in the xml PI, would be
> treated as UTF-8 pages.

That's correct.  That's what they are, per the spec.

Assignee: general → smontagu
Component: Browser-General → Internationalization
QA Contact: general → amyy
(Reporter)

Comment 2

14 years ago
However, the charset couldn't be changed.
It makes one the Firefox extension, RSS reader, couldn't read the feed correctly
if loading these kind of xml I've mentioned above.
(Reporter)

Updated

14 years ago
Depends on: 240962

Updated

14 years ago
Keywords: intl
OS: Windows 2000 → All
Hardware: PC → All

Comment 3

14 years ago
I'm strongly against us even trying to deal with this sort of crap. One of the
key features of XML documents are that they are self-describing with respect to
the character encoding used.

See section "4.3.3 Character Encoding in Entities" in XML 1.1:

"Although an XML processor is required to read only entities in the UTF-8 and
UTF-16 encodings, it is recognized that other encodings are used around the
world, and it may be desired for XML processors to read entities that use them.
In the absence of external character encoding information (such as MIME
headers), parsed entities which are stored in an encoding other than UTF-8 or
UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration)
containing an encoding declaration."

Whatever the attachment in the URL field is supposed to be, it's *not* valid
XML, since it's not a valid UTF-8 or UTF-16 stream.

This is INVALID, unless we want to start going down the tagsoup road with XML
(and we don't).
This is invalid, yes, but more than that, we should actively _disable_ the
character set menu when using it will do nothing.

Comment 5

14 years ago
As Hixie says, the problem here is that the Character Coding menu should be
disabled for XML/XHTML documents. Changing summary accordingly and confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: couldn't modify the charset(XML) → Disable View>Character Coding menu for XML/XHTML documents

Updated

14 years ago
Assignee: smontagu → guifeatures
Component: Internationalization → XP Apps: GUI Features
QA Contact: amyy
(In reply to comment #3)
> I'm strongly against us even trying to deal with this sort of crap.

Agreed. In fact, we should really be rejecting the document as invalid and never
even displaying it (which is bug 174351)
Component: XP Apps: GUI Features → Internationalization

Updated

14 years ago
Depends on: 174351

Comment 7

14 years ago
Created attachment 147553 [details] [diff] [review]
Disable menu when document type is Image or XML

This patch disables the menu when the document is an 'XML document'
(specifically, one implementing nsIDOMXMLDocument) or an image. Is there a
better way to determine the 'XML-ness' of a document?

Disabling the menu item also prevents the user from accessing the popup menu; I
know that hixie considers this a bug (although separate from this one), though
I'm not sure. Regardless, the document type is still available in Page Info.

Updated

14 years ago
Attachment #147553 - Flags: review?(neil.parkwaycc.co.uk)
I deliberately didn't mark 174351 as dependee. This bug can and should be fixed
independently. Even if we display misencoded XML documents when we shouldn't, we
can at least do our best to make them unreadable by disabling the Character
Coding menu ;-)
No longer depends on: 174351

Comment 9

14 years ago
fwiw, IE6's behaviour is to add (and select/check) a submenu item 
called 'Unicode'  [not the same as the UTF-8 entry, oddly], and to also disable 
all the submenu items, including those in it's 'more encodings' submenu (but 
leave the menu item that introduce that submenu enabled). It looks a bit weird, 
but it does allow the user to see the current encoding.

We could decide to do the same, or rather, to disable all the immediate 
children of the character encoding menuitem - the 'current' item is always in 
the first submenu, so there's no reason to allow the user to display the 'more' 
sub-submenu. That might, in fact, be easier to do, as the submenu is built on-
the-fly, as I recall. I may take a look at that later.

I said above that 'the document type is still available in Page Info'. I was 
thinking of the Content-Type, not the character encoding; I don't think that 
that is in the Page Info dialog, after all.
Assignee: guifeatures → smontagu
QA Contact: amyy
I think that disabled submenus should still drop down (for the same reason that 
disabled toolbar buttons should still show tooltips). But that's a separate bug.

Comment 11

14 years ago
Comment on attachment 147553 [details] [diff] [review]
Disable menu when document type is Image or XML

>-        <menu id="charsetMenu" observes="isImage"/>
>+        <menu id="charsetMenu" observes="isImageOrXML"/>
I'm not sure it's worth creating a new broadcaster for one menuitem...

>+    const nsIDOMXMLDocument = Components.interfaces.nsIDOMXMLDocument;
>+      // Determine if the document is XML or not.
>+      var xmlDocument;
>+      try {
>+        xmlDocument = content.document.QueryInterface(nsIDOMXMLDocument);
>+      }
>+      catch(e) { };
>+
>+      if (xmlDocument)
You can actually compute this more simply using the following:
if (content.document instanceof XMLDocument)
although your superreviewer might prefer the more verbose
const nsIDOMXMLDocument = Components.interfaces.nsIDOMXMLDocument;
if (content.document instanceof nsIDOMXMLDocument)
Attachment #147553 - Flags: review?(neil.parkwaycc.co.uk) → review-
(Assignee)

Comment 12

13 years ago
What happened with driving the patch?
QA Contact: amyy → i18n
Duplicate of this bug: 800775
(Assignee)

Comment 14

5 years ago
I think we should disable the Character Encoding menu if any of these is true:
 * The document is not an instance of nsHTMLDocument
 * The document is an nsHTMLDocument and mIsRegularHTML is false.
 * The document has or had a wyciwyg channel.
 * The URL scheme of the document is about.
 * The URL scheme of the document is chrome.
 * The URL scheme of the document is res.
 * The URL scheme of the document is jar.
 * The charset source of the document is kCharsetFromByteOrderMark.

Comment 15

5 years ago
It is also needed to to disable Character Encoding menu on network error page.

Steps to reproduce.
1.Go to http://~.mozilla.org/
  (It will show Server not found error)
2.Change Character Encoding menu to UTF-16
3.Go to other page written by UTF-8.

Updated

5 years ago
Assignee: smontagu → nobody
Depends on: 580070
Frank, what happened?
(Assignee)

Comment 17

5 years ago
FWIW, I think it would make more sense to implement comment 14 and comment 15 by adding a new IDL method/property on nsIDocument and checking for the conditions in C++ than by trying to check for all the conditions via whatever APIs already happen to be exposed to JS.

Comment 18

5 years ago
(In reply to :Ms2ger from comment #16)
> Frank, what happened?

Bug 800775 was filed, and it turned out to be a duplicated of this one.
In that bug, Simon indicated that he wasn't working on this bug, so I removed him from the assignee field, in case anyone else wants to work on it.

On irx, bz noted that the good way to fix this bug would be put isHTML() in the IDL for nsIDocument (like Henri is suggesting in comment 17) but that it won't work until more of bug 580070 is fixed, so I marked this bug as depending on that.
Until that happens, bz suggested that we fix this bug by checking:
document.createElement("span").nodeName == "SPAN"

Therefore, a patch would look something like:

+# We only support changing character encoding in HTML documents, so we disable
+# the menu otherwise. Until we have an IDL with isHTML(), we need to use this
+# createElement hack, which is more expensive, so we are not using an observer.
+    for (let charsetMenu of document.querySelectorAll("[id$=charsetMenu]")) {
+      charsetMenu.parentNode.addEventListener("popupshowing", function() {
+        let isHTML = content.document && content.document.createElement("span").nodeName == "SPAN";
+        charsetMenu.disabled = !isHTML || XULBrowserWindow.isImage.getAttribute("disabled") == "true";
+      });
+    }

(See my bug 800775 patch here: https://bug800775.bugzilla.mozilla.org/attachment.cgi?id=671033 )
(Assignee)

Comment 19

5 years ago
Why does this depend on Paris Bindings? Can’t be add a new property mayEnableCharacterEncodingMenu to nsIDocShell regardless of Paris Binding status on documents?

The implementation of GetMayEnableCharacterEncodingMenu on the docshell would ask the document if the menu may be enabled and return false if not. Then it would call GetMayEnableCharacterEncodingMenu on each child shell and return false if any child shell says false. Otherwise, it would return true.

Comment 20

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #19)
> Why does this depend on Paris Bindings? Can’t be add a new property
> mayEnableCharacterEncodingMenu to nsIDocShell regardless of Paris Binding
> status on documents?

I don't remember. bz, do you recall what you noted the issue was?
The issue was only for adding APIs directly on document (specifically the isHTML API).  If we want to add an API on docshell (or windowutils) to query things, that's fine independent of bindings issues.
(Assignee)

Comment 22

5 years ago
OK. Let’s add the API that the chrome JS calls to the docshell.
(Assignee)

Comment 23

5 years ago
I’ll write something for the C++ side and then hand the bug over to someone who knows front end code.
Assignee: nobody → hsivonen
No longer depends on: 580070
(Assignee)

Comment 24

5 years ago
Created attachment 690770 [details] [diff] [review]
Gecko-side infrastructure

This adds a property mayEnableCharacterEncodingMenu to nsIDocShell. It returns false pretty eagerly: It returns false when any document in the docshell tree rooted at this docshell is not in readyState == "complete". Thus, the UI should disable the menu whenever loading activity in the docshell tree starts and query mayEnableCharacterEncodingMenu whenever loading activity in the docshell tree ends.
Attachment #147553 - Attachment is obsolete: true
(Assignee)

Comment 25

5 years ago
OK. This is now ready to be taken by someone who knows front end code. fryn?
Assignee: hsivonen → nobody
(Assignee)

Comment 26

5 years ago
Marking sec-low because of the UTF-16 aspect of this bug. See http://hsivonen.iki.fi/test/moz/never-show-user-supplied-content-as-utf-16.htm (Maybe that issues meets the definition of sec-moderate, but UTF-16 is rare and the exploit requires a site that uses UTF-16 and presents user-supplied text, such as comments.)
Keywords: sec-low
(Assignee)

Comment 27

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #24)
> Thus, the UI should disable the menu whenever loading activity in the
> docshell tree starts and query mayEnableCharacterEncodingMenu whenever
> loading activity in the docshell tree ends.

Or the disabledness could be computed upon opening the View menu if doing it like that works for a11y and Ubuntu global menu.
Why character encoding menu is disabled when loading? It will severely degrade the usability.

Updated

5 years ago
Assignee: nobody → fryn
Status: NEW → ASSIGNED
(Assignee)

Comment 29

5 years ago
Created attachment 691242 [details] [diff] [review]
Enable/disable the menu depending on what the docshell says

I was able to write the front end patch myself.

This seems to work as designed. However, the menu ends up being disabled on *lots* of sites, because a document.open()ed doc or about:blank in an iframe causes the menu to be disabled. I’m not yet sure if this is OK or if the criteria for disabling the menu needs to be relaxed.

To make an informed decision, it would be necessary to test a bunch of legacy pages that require the user to access the menu to see if that kind of pages have the kind of ads that cause the docshell to say the menu should be disabled with the current Gecko-side patch.
Assignee: fryn → hsivonen
(Assignee)

Comment 30

5 years ago
(In reply to Masatoshi Kimura [:emk] from comment #28)
> Why character encoding menu is disabled when loading? It will severely
> degrade the usability.

Maybe that’s over-eager disabling. My thought was that changing the charset will cause a reload and, therefore, disrupts long-poll iframes. Maybe that criterion should be relaxed. Especially now that the disabled state is computed when opening the menu, so there’s no need to track the state.
(Assignee)

Comment 31

5 years ago
Hmm. Maybe the best approach would be
 1) Making nsHTMLDocument reject the docshell-forced charset if HTTP says UTF-16.
 2) Making the HTML parser let the BOM take precedence over forced charset.
 3) Make the menu enabling/disabling not consider descendant docshells/documents.
(Assignee)

Comment 32

5 years ago
Created attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable
Attachment #690770 - Attachment is obsolete: true
Attachment #691242 - Attachment is obsolete: true
(Assignee)

Comment 33

5 years ago
Created attachment 691302 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell
(Assignee)

Comment 34

5 years ago
Created attachment 691303 [details] [diff] [review]
Part 3: Connect the docshell property to the UI

This new set of patches is less eager to disable the menu. The menu is enabled/disabled based on the top-level doc only. Child docs refuse to honor the override when they should not honor it even if the menu is enabled for the top-level doc.
(Assignee)

Comment 35

5 years ago
Oh, and these make UTF-16 useless as a menuitem, so we should zap that in another bug.
Comment on attachment 691302 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell

>+    bool schemeIs = false;
>+    uri->SchemeIs("about", &schemeIs);
>+    if (schemeIs) {
>+      return true;
>+    }
>+    uri->SchemeIs("chrome", &schemeIs);
>+    if (schemeIs) {
>+      return true;
>+    }
>+    uri->SchemeIs("res", &schemeIs);
>+    if (schemeIs) {
>+      return true;
>+    }
>+    uri->SchemeIs("jar", &schemeIs);
>+    if (schemeIs) {
>+      return true;
>+    }
It should test protocolFlags rather than hard-coding the scheme.
(Assignee)

Comment 37

5 years ago
Depends on bug 805374 for the UTF-16 menu items removals.
Depends on: 805374
(Assignee)

Comment 38

5 years ago
(In reply to Masatoshi Kimura [:emk] from comment #36)
> It should test protocolFlags rather than hard-coding the scheme.

Which flag would be the right one to check?
(In reply to Henri Sivonen (:hsivonen) from comment #38)
> (In reply to Masatoshi Kimura [:emk] from comment #36)
> > It should test protocolFlags rather than hard-coding the scheme.
> 
> Which flag would be the right one to check?

Boris may know.
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 40

5 years ago
Comment on attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable

kCharsetFromPreviousLoading was unused.
Attachment #691300 - Flags: review?(bugs)
(Assignee)

Comment 41

5 years ago
Created attachment 691344 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, with corrected docs
Attachment #691302 - Attachment is obsolete: true
Attachment #691344 - Flags: review?(bzbarsky)
(Assignee)

Comment 42

5 years ago
Created attachment 691347 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, with corrected docs
Attachment #691303 - Attachment is obsolete: true
Attachment #691347 - Flags: review?(dao)
> Which flag would be the right one to check?

That depends on what you're really trying to check.

Why exactly should one not be able to override jar: or about: things?  Extensions can install arbitrary such entities, with arbitrary encodings....
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 44

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #43)
> > Which flag would be the right one to check?
> 
> That depends on what you're really trying to check.
> 
> Why exactly should one not be able to override jar: or about: things? 

The idea is to catch HTML documents that are not from the Web but from Firefox itself and, therefore, don't need facilities meant for legacy HTML content.

> Extensions can install arbitrary such entities, with arbitrary encodings....


If an extension is so badly authored that the user needs to reach for the charset menu, shouldn't the extension have been rejected by AMO review?
> The idea is to catch HTML documents that are not from the Web but from Firefox itself

URI_IS_UI_RESOURCE might be what you want.  That's set for chrome:, moz-icon:, res:.  Note that jar: delegates protocol flag checks to the nested URI.

> shouldn't the extension have been rejected by AMO review?

I don't know how easy it is to catch that sort of thing during AMO review...
Users may view inside local (or remote) jar files with jar: scheme just like local files with file: scheme. So It would not be a good idea to blacklist jar: scheme unconditionally.
(Assignee)

Comment 47

5 years ago
Created attachment 691752 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, using  URI_IS_UI_RESOURCE

Now using URI_IS_UI_RESOURCE.
Attachment #691344 - Attachment is obsolete: true
Attachment #691344 - Flags: review?(bzbarsky)
Attachment #691752 - Flags: review?(bzbarsky)
(Assignee)

Comment 48

5 years ago
If this is considered to require a test case, I could use a pointer to a suitable mochitest browser chrome test (that loads pages and opens menus) to learn from.
Comment on attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable

Could you explain this. What has this to do with XML/XHTML?
Comment on attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable

I mean, perhaps kCharsetFromPreviousLoading is not used, but what about
kCharsetFromUserForced or kCharsetFromPreviousLoading?

So, please explain and re-ask review.
Attachment #691300 - Flags: review?(bugs)
(Assignee)

Comment 51

5 years ago
Comment on attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable

(In reply to Olli Pettay [:smaug] from comment #49)
> Comment on attachment 691300 [details] [diff] [review]
> Part 1: Make the BOM unoverridable
> 
> Could you explain this. What has this to do with XML/XHTML?

It has to do with the comment 14 scope creep relative to the bug summary: Also disabling the charset menu for HTML when the charset source is the BOM (since the BOM is unique per encoding that has a BOM, so overriding the encoding when there is a BOM results in a bogus decoding for sure and the bogosity has the additional effect of knocking the document into the quirks mode).

Since the enabling/disabling the menu will be done based on the parent doc, this parser change is needed so that child docs with BOM reject the override. Docs with BOM reject the manual override already in Chrome and IE6 through 9. (The changes in IE10 were not motivated by Web compat, so we don’t need to worry about IE10 changes being a Web compat indicator. MS intends to revert the IE10 changes in this area.)

(In reply to Olli Pettay [:smaug] from comment #50)
> I mean, perhaps kCharsetFromPreviousLoading is not used, but what about
> kCharsetFromUserForced or kCharsetFromPreviousLoading?
> 
> So, please explain and re-ask review.

The rest of cardinality change of the BOM constant relative to the forced constants is handled in Part 2, so the cardinality change ends up being OK.
Attachment #691300 - Flags: review?(bugs)
(Assignee)

Comment 52

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #51)
> Docs with BOM reject the manual override already in Chrome 

And in Safari. Most likely a WebKit-level thing.
(Assignee)

Updated

5 years ago
Summary: Disable View>Character Coding menu for XML/XHTML documents → Disable View>Character Coding menu when it won't have effect / is unnecessary (e.g. XML)
Comment on attachment 691300 [details] [diff] [review]
Part 1: Make the BOM unoverridable

still trying to understand...
In any case you should remove the if-else
and have just
if (!wyciwygChannel) {
  return NS_OK;
}
...handle wyciwyg...
Attachment #691300 - Flags: review?(bugs) → review-
Comment on attachment 691752 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, using  URI_IS_UI_RESOURCE

So I've definitely had to force UTF-16 on pages that were incorrectly coming out in ISO-8859-1 or UTF-8.  Why do we want to remove that ability?
(In reply to Boris Zbarsky (:bz) from comment #54)
> Comment on attachment 691752 [details] [diff] [review]
> Part 2: Make built-in URLs and UTF-16 content reject overrides, expose
> information about whether the menu should be enabled on the docshell, using 
> URI_IS_UI_RESOURCE
> 
> So I've definitely had to force UTF-16 on pages that were incorrectly coming
> out in ISO-8859-1 or UTF-8.  Why do we want to remove that ability?

Didn't those pages have the BOM? If they have, you don't have to manually force UTF-16 anymore.
> Didn't those pages have the BOM? 

At least some of them didn't, because Firefox misdetected them when I told it UTF-16 without specifying LE or BE...  :(
But OK, I can accept that this is likely a rare case, so at least the cases with BOM will be fine...  Does disallowing forcing of UTF-16 particularly simplify things somewhere else?
(In reply to Boris Zbarsky (:bz) from comment #56)
> > Didn't those pages have the BOM? 
> 
> At least some of them didn't, because Firefox misdetected them when I told
> it UTF-16 without specifying LE or BE...  :(

UTF-16 decoder shouldn't misdetect if the first character is ASCII. Are those pages plain-text?
Those pages were "HTML", but had some garbage stuck in there...
Comment on attachment 691752 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, using  URI_IS_UI_RESOURCE

>+++ b/content/base/src/nsDocument.cpp
> nsHTMLDocument::TryParentCharset(nsIDocShell*  aDocShell,
>+    if (WillIgnoreCharsetOverride() || IsAsciiCompatible(parentCharset)) {

!IsAsciiCompatible, right?

Can we add a test that would catch that?

>@@ -731,47 +746,47 @@ nsHTMLDocument::StartDocumentLoad(const 
>+    // sources. Each try will return early if the source is higher or equal
>+    // to the source as its name describes. Some try call might change charset
>+    // source to multiple values, like TryHintCharset and TryParentCharset.

How about:

  Each Try* function will return early if the source is already at least as
  large as any of the sources it might look at.  Some of these functions
  (like TryHintCharset and TryParentCharset) can set charsetSource to various
  values depending on where the charset they end up finding originally comes from.

>+      // Otherwise, try the channel's charset (e.g., charset from HTTP
>+      // "Content-Type" header) first. This way, we get to reject overrides
>+      // if the channel said UTF-16.

It's not clear to me what the mechanism for that rejecting is.  How does that work, and what part of it relies on calling TryChannelCharset?

>+++ b/docshell/base/nsIDocShell.idl

Please rev the IID.

Sorry for the lag here, but I'd like to understand the TryChannelCharset bit before marking review...
Attachment #691752 - Flags: review?(bzbarsky)
(Assignee)

Comment 61

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #54)
> So I've definitely had to force UTF-16 on pages that were incorrectly coming
> out in ISO-8859-1 or UTF-8.  Why do we want to remove that ability?

Giving the BOM the highest precedence takes care of the case where the content is actually UTF-16 with a BOM but the HTTP layer has the wrong encoding label. So there is no need for the encoding menu in that case anymore.

The case where the content is somehow (either via a HTTP or a BOM) declared as UTF-16, letting the user change the encoding is an XSS risk on any site that accepts user-provided content. See http://hsivonen.iki.fi/test/moz/never-show-user-supplied-content-as-utf-16.htm . To defend against this attack, I think we should disable the character encoding menu when there is a UTF-16 BOM or when HTTP declare some flavor of UTF-16.

That leaves the case where the content is actually BOMless UTF-16 and HTTP does not declare UTF-16. The case where the content is all Basic Latin and HTTP is silent is handled in the HTML parser. That still leaves the case where the content is not all Basic Latin and the case where a HTTP gives an incorrect label. I disabled the menu in these cases also, because if we allowed the user to choose UTF-16 manually in these cases, then we'd need to distinguish that case from the case described in the previous paragraph in order to allow the user to switch back. (It would be pretty bad if the user could change non-UTF-16 pages to be interpreted as UTF-16 without any way to undo.) I went for not allowing manual override in this case because:
 * Simplicity.
 * Smaller probability that the resulting complexity results in a hole that allows the attack described in the previous paragraph to proceed anyway.
 * Being able to simply remove UTF-16 labels from the menu when the set of labels gets reused in places where UTF-16 labels don't make sense (bug 802033).

But it would be possible to distinguish between a page that is currently loaded as UTF-16 thanks to the users and pages that are currently loaded as UTF-16 thanks to the server and enable the menu for undo in the former case (which would still complicate fixing bug 802033).

(In reply to Boris Zbarsky (:bz) from comment #59)
> Those pages were "HTML", but had some garbage stuck in there...

Were those pages important in a way that warrants supporting such severe bogosity?
> Were those pages important in a way that warrants supporting such severe bogosity?

Probably not, no.  Thank you for explaining why it would be a pain to do so.
(Assignee)

Comment 63

5 years ago
Created attachment 700403 [details] [diff] [review]
Part 1: Make the BOM unoverridable, with nicer control flow

(In reply to Olli Pettay [:smaug] from comment #53)
> Comment on attachment 691300 [details] [diff] [review]
> Part 1: Make the BOM unoverridable
> 
> still trying to understand...

Whether the menu is enabled or disabled depends on the top-level browsing context. Previously, when I made it depend on descendent contacts as well, it got disabled to eagerly, because it was disabled due to document.open()ed ad iframes.

In the case where HTTP does not say UTF-16 (i.e. HTTP is silent or has a bogus value) but the stream has a BOM, this patch is needed for the BOM to take precedence. Failure to let the BOM take precedence in this case would mean a failure to protect against the attack demoed in http://hsivonen.iki.fi/test/moz/never-show-user-supplied-content-as-utf-16.htm

As for the case where there is a UTF-8 BOM, it makes sense to let the BOM take precedence over the menu for at least four reasons:

 1) Overriding the encoding to something other than UTF-8 would be wrong for sure as long as the document contains content encoded in a single encoding (as opposed to the kind of bogus pages that concatenate bytes in different encodings and, therefore, can never be decoded “right”), since the bytes of the UTF-8 BOM are not legitimate first bytes for HTML encoded in any other encoding that Gecko supports. (Thankfully, Gecko doesn't support CESU-8.) Therefore, even if the parent document needs to be decoded according to the menu, it doesn't make sense to let the menu selection cause virtually certain misdecoding of frames that start with the UTF-8 BOM.

 2) When the document starts with the UTF-8 BOM, decoding the page according to other ASCII-compatible encoding has more drastic effects than decoding a page encoded using an ASCII-compatible encoding according to another ASCII-compatible encoding. In the latter case, only non-ASCII characters show up wrong, which might be a big deal when the content of the site is in a language that doesn't use the Latin script but might not be a huge deal for some Latin-script cases. However, in the case where the document starts with the UTF-8 BOM, decoding as anything but UTF-8 drops the document into the quirks mode in Gecko (since we don't implement the non-standard oddity that IE10 implements—i.e. discarding bytes that match a UTF-8 BOM regardless of the decoder).

 3) IE9 and older at least all the way to IE6 as well as WebKit-based browsers make their menu have no effect when a BOM is present, so it's extremely unlikely for overridability of the BOM to be available on the Web compat grounds.

 4) If a framed document accepts user input, changing the encoding of the document might cause data corruption by resulting in non-UTF-8 data to be submitted to a server. Rails used to use ☃ and now uses ✓ in a hidden field both to detect data corruption and to make legacy IE properly use UTF-8. Even though an established mechanism for detecting data corruption exists, I think it would be good to offer clueful developers away to avoid data corruption by providing a way to opt out of the legacy encoding craziness. So for developers who make iframeable widgets that potentially get included on clueless sites, I think it would be cool to offer a way for the widget to opt out of the effects of the encoding menu by using the UTF-8 BOM. (As for opt out mechanism being a way for authors to goof in a way that users can’t correct: We already provide such a way to goof: XML. It’s just that HTML currently lacks such an opt-out in Gecko.)

> In any case you should remove the if-else
> and have just
> if (!wyciwygChannel) {
>   return NS_OK;
> }
> ...handle wyciwyg...

Done.
Attachment #691300 - Attachment is obsolete: true
Attachment #700403 - Flags: review?(bugs)
(Assignee)

Comment 64

5 years ago
 5) Considering the hacks that mailnews uses to force UTF-8, I’d prefer mailnews to get rid of those hacks and start their streams with the UTF-8 BOM instead.
Attachment #700403 - Flags: review?(bugs) → review+
(Assignee)

Comment 65

5 years ago
Created attachment 701744 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, properly this time

(In reply to Boris Zbarsky (:bz) from comment #60)
> Comment on attachment 691752 [details] [diff] [review]
> Part 2: Make built-in URLs and UTF-16 content reject overrides, expose
> information about whether the menu should be enabled on the docshell, using 
> URI_IS_UI_RESOURCE
> 
> >+++ b/content/base/src/nsDocument.cpp
> > nsHTMLDocument::TryParentCharset(nsIDocShell*  aDocShell,
> >+    if (WillIgnoreCharsetOverride() || IsAsciiCompatible(parentCharset)) {
> 
> !IsAsciiCompatible, right?

Right. Thanks.

> Can we add a test that would catch that?

I thought we already had a test for that. I’ll investigate.

For the stuff introduced here, I suppose test cases should actually make selections in the Character Encoding menu. I have not written tests like that before. Do we have better docs and samples than https://developer.mozilla.org/en-US/docs/Browser_chrome_tests ?

> >@@ -731,47 +746,47 @@ nsHTMLDocument::StartDocumentLoad(const 
> >+    // sources. Each try will return early if the source is higher or equal
> >+    // to the source as its name describes. Some try call might change charset
> >+    // source to multiple values, like TryHintCharset and TryParentCharset.
> 
> How about:
> 
>   Each Try* function will return early if the source is already at least as
>   large as any of the sources it might look at.  Some of these functions
>   (like TryHintCharset and TryParentCharset) can set charsetSource to various
>   values depending on where the charset they end up finding originally comes
> from.

Used this text.

> >+      // Otherwise, try the channel's charset (e.g., charset from HTTP
> >+      // "Content-Type" header) first. This way, we get to reject overrides
> >+      // if the channel said UTF-16.
> 
> It's not clear to me what the mechanism for that rejecting is.  How does
> that work, and what part of it relies on calling TryChannelCharset?

Oops. The mechanism I thought was there was not there and my testing had been too narrow to reveal this. (Another bug in my code made it look like I had this right in the cases I tested.) Now the relevant methods check the charset that's passed into the method in the in/out param.

I also wrote a broader set of (manual so far) tests and ran them: http://hsivonen.iki.fi/test/moz/charset-menu/

> >+++ b/docshell/base/nsIDocShell.idl
> 
> Please rev the IID.

Done.

> Sorry for the lag here, but I'd like to understand the TryChannelCharset bit
> before marking review...

The general idea is to go read the charset off the channel first so that if it isn't ASCII-compatible (i.e. is a UTF-16 charset), it can be given precedence over the user-chosen charset.
Attachment #691752 - Attachment is obsolete: true
Henri, should I be reviewing that last attachment, or not yet?
(Assignee)

Comment 67

5 years ago
Not yet. One try orange to investigate.
(In reply to Henri Sivonen (:hsivonen) from comment #65)
> For the stuff introduced here, I suppose test cases should actually make
> selections in the Character Encoding menu. I have not written tests like
> that before. Do we have better docs and samples than
> https://developer.mozilla.org/en-US/docs/Browser_chrome_tests ?

I don't think you want to touch the menus themselves, given the gross RDF crap involved there. For testing, calling the various functions called via MultiplexHandler (http://hg.mozilla.org/mozilla-central/annotate/b72d2af170aa/browser/base/content/browser.js#l5431) directly is probably sufficient. browser_bug134911.js is an example of a test that currently tests the character set menu functionality.
(Assignee)

Comment 69

5 years ago
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #68)
> (In reply to Henri Sivonen (:hsivonen) from comment #65)
> > For the stuff introduced here, I suppose test cases should actually make
> > selections in the Character Encoding menu. I have not written tests like
> > that before. Do we have better docs and samples than
> > https://developer.mozilla.org/en-US/docs/Browser_chrome_tests ?
> 
> I don't think you want to touch the menus themselves, given the gross RDF
> crap involved there. For testing, calling the various functions called via
> MultiplexHandler
> (http://hg.mozilla.org/mozilla-central/annotate/b72d2af170aa/browser/base/
> content/browser.js#l5431) directly is probably sufficient.
> browser_bug134911.js is an example of a test that currently tests the
> character set menu functionality.

Thanks. 

As it happens, browser_bug134911.js is the the one test case that is failing for me on try. It isn't a matter of the Part 3 patch interfering with the test code. The problem persists if I push Part 1 and Part 2 to try without Part 3.

The test fails, because the document ends up being decoded as UTF-8. It should initially be decoded as Windows-1251 and then, after forcing the character encoding, as Shift_JIS.

When I copy to that test to my server for manual testing, it works as a expected with my patch is applied: it initially decodes as Windows-1251 and when choosing Shift_JIS from the menu, it is decoded as Shift_JIS. See http://hsivonen.iki.fi/p/test-form_sjis.htm

Any ideas what might be going on?
(Assignee)

Comment 70

5 years ago
OK, the test document is loaded from a chrome: URL. That’s a FAIL.
Two things:
 1) IIRC, our chrome channels always report UTF-8 as the channel charset.
 2) Part 2 here makes the encoding unoverridable manually for chrome URLs.

So the test case doesn’t actually test a real-world scenario at all. I wouldn’t be surprised if we had other tests under mochitest-browser-chrome that are unrealistic in subtle ways due to loading from a chrome URL rather than an http URL.
(Assignee)

Comment 71

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #65)
> > >+++ b/content/base/src/nsDocument.cpp
> > > nsHTMLDocument::TryParentCharset(nsIDocShell*  aDocShell,
> > >+    if (WillIgnoreCharsetOverride() || IsAsciiCompatible(parentCharset)) {
> > 
> > !IsAsciiCompatible, right?
> 
> Right. Thanks.
> 
> > Can we add a test that would catch that?
> 
> I thought we already had a test for that. I’ll investigate.

So we have a reftest for this, but it doesn’t actually catch this.
We should fix the chrome test.  It can load the test file from http; we have other chrome tests doing that.
(In reply to Henri Sivonen (:hsivonen) from comment #70)
> So the test case doesn’t actually test a real-world scenario at all. I
> wouldn’t be surprised if we had other tests under mochitest-browser-chrome
> that are unrealistic in subtle ways due to loading from a chrome URL rather
> than an http URL.

In general tests shouldn't be doing this, and AFAIK most don't.
(Assignee)

Comment 74

5 years ago
Created attachment 702708 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, with fixed pre-existing test

(In reply to Boris Zbarsky (:bz) from comment #72)
> We should fix the chrome test. 

Done.
Attachment #701744 - Attachment is obsolete: true
Attachment #702708 - Flags: review?(bzbarsky)
(Assignee)

Comment 75

5 years ago
Created attachment 702710 [details] [diff] [review]
Part 4: Test cases to show that I didn't break child browsing contexts in unintentional ways

(In reply to Henri Sivonen (:hsivonen) from comment #71)
> So we have a reftest for this, but it doesn’t actually catch this.

Adding a chrome mochitest for this and a bunch of other cases to make sure that child browsing contexts behave as intended.

Per comment 68, these tests don't test whether the menu is disabled/enabled correctly depending on the top-level doc.
Attachment #702710 - Flags: review?(bzbarsky)
Comment on attachment 702710 [details] [diff] [review]
Part 4: Test cases to show that I didn't break child browsing contexts in unintentional ways

r=me, though I mostly skimmed this....
Attachment #702710 - Flags: review?(bzbarsky) → review+
Comment on attachment 702708 [details] [diff] [review]
Part 2: Make built-in URLs and UTF-16 content reject overrides, expose information about whether the menu should be enabled on the docshell, with fixed pre-existing test

r=me
Attachment #702708 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 78

5 years ago
Comment on attachment 691347 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, with corrected docs

Need to null check the app menu stuff on Mac.
Attachment #691347 - Attachment is obsolete: true
Attachment #691347 - Flags: review?(dao)
(Assignee)

Comment 79

5 years ago
Landing the Gecko parts to avoid bitrot and making follow-up work more difficult for others.

https://hg.mozilla.org/integration/mozilla-inbound/rev/b814f5638b20
https://hg.mozilla.org/integration/mozilla-inbound/rev/010f52e1cf84
https://hg.mozilla.org/integration/mozilla-inbound/rev/2aa7bb583a6d
Whiteboard: [leave open]
(Assignee)

Comment 80

5 years ago
Created attachment 703895 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, with null check for platform differences
https://hg.mozilla.org/mozilla-central/rev/b814f5638b20
https://hg.mozilla.org/mozilla-central/rev/010f52e1cf84
https://hg.mozilla.org/mozilla-central/rev/2aa7bb583a6d
Flags: in-testsuite+
(Assignee)

Comment 82

5 years ago
Created attachment 704457 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, null check EVERYTHING

Turns out gBrowser can be null when the Error Console is in the front on Mac.
Attachment #703895 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #704457 - Flags: review?(gavin.sharp)
(Assignee)

Updated

5 years ago
Blocks: 832910
Comment on attachment 704457 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, null check EVERYTHING

Not sure I like the UI aspect of the menu just being disabled (rather than hidden). It will make people wonder why it's enabled on some page and not others, and why it's even there if it mostly always shows up as disabled (which I believe would be the "common case" for most users, right?).
(Assignee)

Comment 84

5 years ago
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #83)
> Not sure I like the UI aspect of the menu just being disabled (rather than
> hidden).

I'm sure the HIG-compliant way to do this is disabling rather than hiding. We don't hide Cut/Copy/Paste in the Edit menu when they are no available.

> It will make people wonder why it's enabled on some page and not
> others,

Well, the standard explanation for disabled UI is that it would do nothing in the case at hand, so it is disabled.

> and why it's even there if it mostly always shows up as disabled
> (which I believe would be the "common case" for most users, right?).

No, it will be enabled on most Web pages. Unfortunately.

Comment 85

5 years ago
Comment on attachment 704457 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, null check EVERYTHING

Review of attachment 704457 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Henri Sivonen (:hsivonen) from comment #84)
> (In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment
> #83)
> > Not sure I like the UI aspect of the menu just being disabled (rather than
> > hidden).
> 
> I'm sure the HIG-compliant way to do this is disabling rather than hiding.
> We don't hide Cut/Copy/Paste in the Edit menu when they are no available.

I concur with Henri. This is also the position of UX (as passed down from Faaborg and Limi).
The general rule is that, when type of the context is the same, menu items should have their disabled state toggled rather than their visible state.

In the page context menu, we toggle visibility only when right-clicking different _types_ of objects, e.g. document vs. link or image.
Attachment #704457 - Flags: ui-review+
Comment on attachment 704457 [details] [diff] [review]
Part 3: Connect the docshell property to the UI, null check EVERYTHING

I was going to suggest a more elegant approach using <observes/> or observes="", but these menu items already observe isImage and so that would just end up more complicated.
Attachment #704457 - Flags: review?(gavin.sharp) → review+
(Assignee)

Comment 87

5 years ago
Thanks. Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a03b43bb56a1
(Assignee)

Updated

5 years ago
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/a03b43bb56a1
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Updated

5 years ago
Blocks: 842338
This may have caused Bug 852909.
Depends on: 852909

Updated

5 years ago
Depends on: 868407
You need to log in before you can comment on or make changes to this bug.