Closed Bug 848840 Opened 11 years ago Closed 11 years ago

nsIUnicharStreamLoaderObserver::onDetermineCharset() should be able to read the first 1024 bytes

Categories

(Core :: Networking, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: lduros, Assigned: bzbarsky)

Details

Attachments

(1 file)

The specs have changed: http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#charset

A meta charset may appear within the 1024 first bytes as opposed to the 512 bytes that are currently being read. This causes an issue with extensions (LibreJS in particular) using the callback when the meta charset appears after 512 bytes, as in the following page:
http://www.tapochek.net/viewforum.php?f=183
Assignee: nobody → bzbarsky
Product: Firefox → Core
Component: General → Networking
Whiteboard: [need review]
I'd prefer that hsivonen review this. Note that what the original comment referred to is still 512 per http://mimesniff.spec.whatwg.org/
Attachment #722392 - Flags: review?(annevk) → review?(hsivonen)
Comment on attachment 722392 [details] [diff] [review]
Sniff the first 1024 bytes, not the first 512, for charsets, since <meta> can be anywhere in the first 1024 bytes.

I agree that 1024 bytes should be used for <meta> sniffing. But why is this class used for <meta> sniffing? It seems bad to make the CSS loader wait for more bytes before taking action in order to cater to an extension that isn't using the HTML parser for HTML. What's the extension doing? Why is the extension not using the HTML support in XHR for whatever it is that it is doing?
Or does the CSS loader wait for all bytes to arrive before doing anything anyway?
The original comment is just wrong, since there's no mimesniffing involved here.

> But why is this class used for <meta> sniffing?

Extensions do it.

> It seems bad to make the CSS loader wait for more bytes before taking action

How so?  The only "action" being taken here is providing the encoding to decode as.  Really, the CSS loader would be fine with having _all_ the data before doing that.

> in order to cater to an extension that isn't using the HTML parser for HTML.

If the extension is just trying to show the text of the response, using an HTML parser seems ... wrong.  But it does need to be able to convert the bytes to Unicode.

> Why is the extension not using the HTML support in XHR

That I can't tell you.

> Or does the CSS loader wait for all bytes to arrive before doing anything anyway?

It does, as do all consumers of the code being patched here.  The whole point of this class is to convert an HTTP request into a unichar stream, wait for it to be done, then hand the stream off to someone to read.
Comment on attachment 722392 [details] [diff] [review]
Sniff the first 1024 bytes, not the first 512, for charsets, since <meta> can be anywhere in the first 1024 bytes.

(In reply to Boris Zbarsky (:bz) from comment #5)
> > Or does the CSS loader wait for all bytes to arrive before doing anything anyway?
> 
> It does, as do all consumers of the code being patched here.  The whole
> point of this class is to convert an HTTP request into a unichar stream,
> wait for it to be done, then hand the stream off to someone to read.

Seems strange to have a sniffing limit at all then, but OK. Note that the HTML parser honors even laters metas (in the docshell case--not in the XHR case) by reloading.
Attachment #722392 - Flags: review?(hsivonen) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/4c271568b51d
Flags: in-testsuite?
Whiteboard: [need review]
Target Milestone: --- → mozilla22
https://hg.mozilla.org/mozilla-central/rev/4c271568b51d
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: