Last Comment Bug 617339 - window.arguments is undefined when opening an HTML file with a long-ish comment at the start.
: window.arguments is undefined when opening an HTML file with a long-ish comme...
Product: Core
Classification: Components
Component: HTML: Parser (show other bugs)
: Trunk
: x86 Mac OS X
: P2 normal (vote)
: mozilla15
Assigned To: Boris Zbarsky [:bz]
Depends on: 644206 644209
  Show dependency treegraph
Reported: 2010-12-07 09:04 PST by Blake Winton (:bwinton) (:☕️)
Modified: 2012-05-25 08:30 PDT (History)
8 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Make chrome channels always default to UTF-8. (867 bytes, patch)
2010-12-22 16:50 PST, Boris Zbarsky [:bz]
benjamin: review+
Details | Diff | Splinter Review

Description Blake Winton (:bwinton) (:☕️) 2010-12-07 09:04:44 PST
An extension showing the failure under Thunderbird trunk is here:

The extension adds two buttons to the Go Menu which open two different html pages using window.openDialog.  The first page shows us that window.arguments is populated just fine.  The second page has a single extra character in the comment at the top, and has window.arguments be undefined.

I have found that moving the tags up to and including <meta charset="utf-8" /> above the comment seems to work as well.  I would be interested in hearing a less-hacky workaround.

Since our current boilerplate license is longer than the allowed limit, this is more of a problem than it might otherwise be.

Finally, I've put this under "HTML: parser" because changing the location or length of a comment changes the interpretation of the file, but I have no idea whether this is the right component or not.

Comment 1 Boris Zbarsky [:bz] 2010-12-07 09:21:02 PST
If the <meta> comes later than the prescan size (which iirc is 512 or 1024 bytes depending on day of week and spec editor's mood), then the file is reloaded with the new charser when the <meta> is hit _if_ the charset in the <meta> is different from what the document had before.  The global window code presumably sees this as a new load and clears .arguments (or rather, drops the inner window the arguments live on).

It's not clear to me that we want to support .arguments on HTML in the first place (and in fact, given the pitfall above, perhaps we should never allow it to work there).  Or maybe we want to preserve .arguments across charset reloads

As far as "non-hacky" fixes go, you could use XHTML, where the <meta charset> thing is just not an issue.  Past that, if you're going to serve HTML which is not in the default charset over a protocol that doesn't allow specifying charset information, then you have to live with the charset sniffing rules of HTML.  Perhaps chrome:// _should_ be able to specify charset information?
Comment 2 Blake Winton (:bwinton) (:☕️) 2010-12-07 09:47:31 PST
I'ld like to use arguments in HTML, so I'ld prefer it if you preserved them across charset reloads, or let me specify the charset in chrome:// so that I can avoid the reload.

XHTML seemed to have some problems with jQuery (or jQuery-tmpl perhaps) the last time I tried it, so I'ld rather not go that direction.

What is the default charset for html5?  Perhaps I can just code in that, and avoid the reload myself…

Comment 3 Boris Zbarsky [:bz] 2010-12-07 09:54:41 PST
> What is the default charset for html5?

Depends on your preferences, iirc.  And those depend on your locale.  See step 8, since none of the previous 7 steps apply here.

Benjamin, are we ok with adding a way to specify charset in chrome manifests?  Or just forcing all chrome to UTF-8 by default?  Or both?
Comment 4 Blake Winton (:bwinton) (:☕️) 2010-12-07 10:17:18 PST
How about "charset=utf-8" in the features?  (Yeah, opening the same page with two different charsets doesn't make a lot of sense, but it's a place we could put that info.)

Also, it looks like I might be able to put the UTF-8 BOM at the start of my file, to force the charset in step 4 of the page you linked.  What do you think of that option?

Comment 5 Benjamin Smedberg [:bsmedberg] 2010-12-07 10:21:29 PST
UTF8 by default would be my preference.
Comment 6 :Ehsan Akhgari 2010-12-07 11:57:56 PST
FWIW, I think the right fix here is to preserve .arguments across charset reloads.  This is something which could affect add-ons (especially those who already use .arguments and just get tested against 2.0) and while allowing charset specification for chrome URIs is nice and all, I think not fixing it by preserving .arguments is going to cause pain to some add-on developers, especially since the fix for this issue is less than obvious.
Comment 7 Boris Zbarsky [:bz] 2010-12-07 13:22:46 PST
> How about "charset=utf-8" in the features?

If we have to, but see below.

> Also, it looks like I might be able to put the UTF-8 BOM at the start

Ah, yes.  That should work too.

> UTF8 by default would be my preference.

Note that it's not quite "by default".  If we set it on the channel, that would mean that everything loaded from chrome:// is always UTF-8.  <meta> would be ignored, etc.  Now I, personally, am just fine with that.  Want me to just write a patch for this?

> FWIW, I think the right fix here is to preserve .arguments across charset
> reloads

The thing with charset reloads is that they can happen _after_ some script has already run on the page.  They're really a fallback hack to handle really badly authored web pages, not something we can make behave sanely.  And that script could pop arguments or modify them, for example.  So "preserving" arguments across the reload will just make things work sometimes but not others depending on what happened before the charset reload; there's no sane way to make it work in all cases.

I'd prefer to just force all chrome:// to always be UTF-8.  The only worry is compat, at this stage.
Comment 8 Benjamin Smedberg [:bsmedberg] 2010-12-07 20:17:59 PST
Yes, just set it on the channel.
Comment 9 Henri Sivonen (:hsivonen) 2010-12-08 02:08:27 PST
(In reply to comment #1)
> If the <meta> comes later than the prescan size (which iirc is 512 or 1024
> bytes depending on day of week and spec editor's mood)

To be fair, it's not about Hixie's mood. Hixie said 512. WebKit said 1024. Dreamhost-hosted content (their Apache is broken) and chardet test cases worked better with 1024, so I went with 1024 instead of 512.

I think it doesn't make sense to change this for chrome:, and I think making chrome: channels report UTF-8 with kCharsetFromChannel would make sense.
Comment 10 Boris Zbarsky [:bz] 2010-12-22 16:50:28 PST
Created attachment 499429 [details] [diff] [review]
Make chrome channels always default to UTF-8.
Comment 11 Boris Zbarsky [:bz] 2011-01-13 13:22:57 PST
I'm going to land this after 2.0 branches.  If someone thinks this should be in 2.0, please speak up and explain why.
Comment 12 Boris Zbarsky [:bz] 2011-03-23 09:58:53 PDT
I tried pushing this, but we have tests that assume they can load non-UTF from chrome.  We need to fix those first.  I'll file bugs blocking this one.
Comment 14 Ed Morley [:emorley] 2012-05-25 08:30:23 PDT

Note You need to log in before you can comment on or make changes to this bug.