An extension showing the failure under Thunderbird trunk is here: https://github.com/bwinton/argtest
The extension adds two buttons to the Go Menu which open two different html pages using window.openDialog. The first page shows us that window.arguments is populated just fine. The second page has a single extra character in the comment at the top, and has window.arguments be undefined.
I have found that moving the tags up to and including <meta charset="utf-8" /> above the comment seems to work as well. I would be interested in hearing a less-hacky workaround.
Since our current boilerplate license is longer than the allowed limit, this is more of a problem than it might otherwise be.
Finally, I've put this under "HTML: parser" because changing the location or length of a comment changes the interpretation of the file, but I have no idea whether this is the right component or not.
If the <meta> comes later than the prescan size (which iirc is 512 or 1024 bytes depending on day of week and spec editor's mood), then the file is reloaded with the new charser when the <meta> is hit _if_ the charset in the <meta> is different from what the document had before. The global window code presumably sees this as a new load and clears .arguments (or rather, drops the inner window the arguments live on).
It's not clear to me that we want to support .arguments on HTML in the first place (and in fact, given the pitfall above, perhaps we should never allow it to work there). Or maybe we want to preserve .arguments across charset reloads
As far as "non-hacky" fixes go, you could use XHTML, where the <meta charset> thing is just not an issue. Past that, if you're going to serve HTML which is not in the default charset over a protocol that doesn't allow specifying charset information, then you have to live with the charset sniffing rules of HTML. Perhaps chrome:// _should_ be able to specify charset information?
I'ld like to use arguments in HTML, so I'ld prefer it if you preserved them across charset reloads, or let me specify the charset in chrome:// so that I can avoid the reload.
XHTML seemed to have some problems with jQuery (or jQuery-tmpl perhaps) the last time I tried it, so I'ld rather not go that direction.
What is the default charset for html5? Perhaps I can just code in that, and avoid the reload myself…
> What is the default charset for html5?
Depends on your preferences, iirc. And those depend on your locale. See http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding step 8, since none of the previous 7 steps apply here.
Benjamin, are we ok with adding a way to specify charset in chrome manifests? Or just forcing all chrome to UTF-8 by default? Or both?
How about "charset=utf-8" in the features? (Yeah, opening the same page with two different charsets doesn't make a lot of sense, but it's a place we could put that info.)
Also, it looks like I might be able to put the UTF-8 BOM at the start of my file, to force the charset in step 4 of the page you linked. What do you think of that option?
UTF8 by default would be my preference.
FWIW, I think the right fix here is to preserve .arguments across charset reloads. This is something which could affect add-ons (especially those who already use .arguments and just get tested against 2.0) and while allowing charset specification for chrome URIs is nice and all, I think not fixing it by preserving .arguments is going to cause pain to some add-on developers, especially since the fix for this issue is less than obvious.
> How about "charset=utf-8" in the features?
If we have to, but see below.
> Also, it looks like I might be able to put the UTF-8 BOM at the start
Ah, yes. That should work too.
> UTF8 by default would be my preference.
Note that it's not quite "by default". If we set it on the channel, that would mean that everything loaded from chrome:// is always UTF-8. <meta> would be ignored, etc. Now I, personally, am just fine with that. Want me to just write a patch for this?
> FWIW, I think the right fix here is to preserve .arguments across charset
The thing with charset reloads is that they can happen _after_ some script has already run on the page. They're really a fallback hack to handle really badly authored web pages, not something we can make behave sanely. And that script could pop arguments or modify them, for example. So "preserving" arguments across the reload will just make things work sometimes but not others depending on what happened before the charset reload; there's no sane way to make it work in all cases.
I'd prefer to just force all chrome:// to always be UTF-8. The only worry is compat, at this stage.
Yes, just set it on the channel.
(In reply to comment #1)
> If the <meta> comes later than the prescan size (which iirc is 512 or 1024
> bytes depending on day of week and spec editor's mood)
To be fair, it's not about Hixie's mood. Hixie said 512. WebKit said 1024. Dreamhost-hosted content (their Apache is broken) and chardet test cases worked better with 1024, so I went with 1024 instead of 512.
I think it doesn't make sense to change this for chrome:, and I think making chrome: channels report UTF-8 with kCharsetFromChannel would make sense.
Created attachment 499429 [details] [diff] [review]
Make chrome channels always default to UTF-8.
I'm going to land this after 2.0 branches. If someone thinks this should be in 2.0, please speak up and explain why.
I tried pushing this, but we have tests that assume they can load non-UTF from chrome. We need to fix those first. I'll file bugs blocking this one.