long content before <meta http-equiv ... > = very strange behaviour




3 years ago
3 years ago


(Reporter: matthias.hamel, Unassigned, NeedInfo)


40 Branch




(1 attachment)

2.42 KB, application/octet-stream


3 years ago
Created attachment 8665856 [details]

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0
Build ID: 20150826023504

Steps to reproduce:

Serve a simple HTTP response like this one:
HTTP/1.1 200 OK
Date: Fri, 25 Sep 2015 09:03:03 GMT
Server: Apache/2.4.10 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html

<!DOCTYPE html>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<body>Hi ! 922 </body>

The start of <meta ...> must be placed after byte 1120 (byte 0 = start of the HTTP headers).
You can pad the preamble with whatever characters you want (0x00, 0x20, ...).
You can place the padding either before <!DOCTYPE or after it.
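
The padding described above could be generated with something like the following. This is only a sketch: `build_body`, `PRESCAN_LIMIT`, and the padding length of 1100 are illustrative names and values, not taken from the attachment, and the offset here is counted from the start of the body rather than from the start of the HTTP headers.

```python
# Hypothetical reproduction helper; names and sizes are assumptions.
PRESCAN_LIMIT = 1024  # bytes looked at by the initial charset prescan

META = b'<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n'


def build_body(padding_len: int) -> bytes:
    """Return an HTML body with padding_len spaces between <!DOCTYPE> and <meta>."""
    return (
        b"<!DOCTYPE html>\n"
        + b" " * padding_len  # 0x20 padding; the report says 0x00 works as well
        + b"\n"
        + META
        + b"<body>Hi ! 922 </body>\n"
    )


if __name__ == "__main__":
    body = build_body(1100)
    meta_offset = body.index(b"<meta")
    # True means the <meta> starts beyond the prescan window.
    print(meta_offset, meta_offset >= PRESCAN_LIMIT)
```

Serving this body from any HTTP server with `Content-Type: text/html` and no charset parameter should be enough to observe the second request.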

Actual results:

First side effect: Firefox immediately launches a new request for the same resource (see the attached network capture).
Secondary side effect, not fully identified: if the page contains links to other files (JS, CSS, ...), the caching policy is violated.

Expected results:

Firefox should not launch a secondary request.
Firefox should respect the caching policy.


3 years ago
OS: Unspecified → All
Hardware: Unspecified → All
Component: Untriaged → DOM
Product: Firefox → Core
@Reporter - have you attempted this in the latest released version of Firefox (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0)?
Flags: needinfo?(matthias.hamel)
Per spec, the initial prescan for a <meta> specifying a charset only considers the first 1024 bytes of the file.  See <https://html.spec.whatwg.org/#determining-the-character-encoding:prescan-a-byte-stream-to-determine-its-encoding>.

In this case, the initial prescan doesn't find the <meta>, because it's too far into the file, so the parse starts with the default encoding, which I expect is "ISO-8859-1" in your case.
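
As a rough way to check whether a given document would fall into this case, something like the following could be used. It is a simplified approximation, not the spec's prescan algorithm: it only looks for a `<meta ...>` tag mentioning `charset` and tests whether the tag ends within the first 1024 bytes. The function name and regular expression are assumptions made for illustration.

```python
import re

PRESCAN_LIMIT = 1024  # size of the prescan window discussed above

# Simplified pattern: any <meta ...> tag whose attributes mention "charset".
META_CHARSET = re.compile(rb"<meta\b[^>]*charset[^>]*>", re.IGNORECASE)


def charset_meta_within_limit(data: bytes, limit: int = PRESCAN_LIMIT) -> bool:
    """Return True if the first charset <meta> tag ends within the first `limit` bytes."""
    match = META_CHARSET.search(data)
    return match is not None and match.end() <= limit


if __name__ == "__main__":
    early = b'<!DOCTYPE html>\n<meta charset="utf-8">\n<body>Hi</body>'
    late = b"<!DOCTYPE html>\n" + b" " * 1100 + b'<meta charset="utf-8">\n<body>Hi</body>'
    print(charset_meta_within_limit(early))  # declaration inside the window
    print(charset_meta_within_limit(late))   # declaration pushed past the window
```

A document for which this returns False is the kind that would start parsing with the default encoding and then hit the reload path described in this comment.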

During the parse we see the <meta>, which triggers https://html.spec.whatwg.org/#parsing-main-inhead:change-the-encoding . That goes to https://html.spec.whatwg.org/#change-the-encoding , which in step 6 restarts the navigation, this time forcing the new character encoding.

We do fulfill this second request from cache if we can, but if the document is not fully in cache yet for whatever reason when this reload starts, that won't be possible.
Last Resolved: 3 years ago
Resolution: --- → INVALID
Oh, and the point is that, per spec, a valid HTML document has to have the <meta> specifying the charset within the first 1024 bytes.  See https://html.spec.whatwg.org/#charset , third bullet point.  The rest of the stuff discussed in comment 2 is basically error recovery for invalid documents.