Closed Bug 1208383 Opened 9 years ago Closed 9 years ago

long content before <meta http-equiv ... > = very strange behaviour

Categories

(Core :: DOM: Core & HTML, defect)

Version: 40 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED INVALID

People

(Reporter: matthias.hamel, Unassigned, NeedInfo)

Attachments (1 file): zero.cap, 2.42 KB, application/octet-stream
Attached file zero.cap
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0
Build ID: 20150826023504

Steps to reproduce:

Generate a simple response like this one:
HTTP/1.1 200 OK
Date: Fri, 25 Sep 2015 09:03:03 GMT
Server: Apache/2.4.10 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>Hi ! 922 </body>
</html>


with the start of the <meta ...> tag placed after byte 1120 (counting from 0 at the start of the HTTP header).
You can pad the preamble with whatever characters you want (0x00, 0x20, ...).
The padding can go either before <!DOCTYPE or after it.
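The setup above can be rebuilt for experiments with a short script. The following Python sketch is not from the reporter, and the helper names build_body, build_response and meta_span are hypothetical. It prepends a padding preamble before <!DOCTYPE, frames the body as a single HTTP/1.1 chunk, and reports where the <meta> tag starts and ends. The HTTP headers and chunk framing are not part of the byte stream that the browser's charset prescan examines, so the body offset is the one to compare against any byte limit.

```python
# Hypothetical helpers that rebuild the test response from this bug report
# with a padding preamble, so the position of <meta> can be measured.
HEADERS = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 25 Sep 2015 09:03:03 GMT\r\n"
    "Server: Apache/2.4.10 (Debian)\r\n"
    "Vary: Accept-Encoding\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
).encode("ascii")

BODY_TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head>\n"
    '<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n'
    "</head>\n"
    "<body>Hi ! 922 </body>\n"
    "</html>\n"
)


def build_body(padding_len: int, pad_byte: bytes = b" ") -> bytes:
    """Prepend padding_len copies of pad_byte (e.g. 0x20 or 0x00) before <!DOCTYPE."""
    return pad_byte * padding_len + BODY_TEMPLATE.encode("ascii")


def build_response(body: bytes) -> bytes:
    """Frame the body as one chunk (hex size line) plus the terminating zero chunk."""
    chunk = f"{len(body):x}\r\n".encode("ascii") + body + b"\r\n"
    return HEADERS + chunk + b"0\r\n\r\n"


def meta_span(data: bytes) -> tuple:
    """Return (start, end) byte offsets of the first <meta ...> tag; end is exclusive."""
    start = data.index(b"<meta")
    end = data.index(b">", start) + 1
    return start, end


if __name__ == "__main__":
    for padding in (0, 900, 1000):
        body = build_body(padding)
        print(padding, "body offsets:", meta_span(body),
              "response offsets:", meta_span(build_response(body)))
```

With this template, meta_span(build_body(n)) is (30 + n, 99 + n), so a padding of 926 bytes or more pushes the end of the tag past the first 1024 bytes of the body.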


Actual results:

First side effect: Firefox immediately launches a new request for the same resource (see the enclosed network capture).
Secondary side effects, not fully identified: if the page contains links to other files (JS, CSS, ...), the caching policy is violated.


Expected results:

Firefox should not launch a secondary request.
Firefox should respect the caching policy.
OS: Unspecified → All
Hardware: Unspecified → All
Component: Untriaged → DOM
Product: Firefox → Core
@Reporter - have you tried this in the latest released version of Firefox (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0)?
Flags: needinfo?(matthias.hamel)
Per spec, the initial prescan for a <meta> specifying a charset only considers the first 1024 bytes of the file.  See <https://html.spec.whatwg.org/#determining-the-character-encoding:prescan-a-byte-stream-to-determine-its-encoding>.
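To make the 1024-byte limit concrete, here is a deliberately simplified Python sketch of such a prescan. It is neither the WHATWG algorithm nor Gecko's code: it only looks for a complete <meta ...> tag containing charset= within the first 1024 bytes, and it ignores the spec's handling of comments, other tags, attribute parsing and encoding-label mapping. The names PRESCAN_LIMIT and simplified_prescan are illustrative.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # the initial prescan only looks at the first 1024 bytes

# Simplified pattern: a complete <meta ...> tag (closing '>' included) that
# contains charset=LABEL, either as a charset attribute or inside content="...".
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)[^>]*>",
    re.IGNORECASE,
)


def simplified_prescan(data: bytes) -> Optional[str]:
    """Return the lowercased charset label if a complete tag fits in the window."""
    window = data[:PRESCAN_LIMIT]
    match = META_CHARSET.search(window)
    return match.group(1).decode("ascii").lower() if match else None
```

A tag that starts inside the window but whose closing > falls after byte 1024 is treated as not found, just like a tag that starts further into the file.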

In this case, the initial prescan doesn't find the <meta>, because it's too far into the file, so the parse starts with the default encoding, which I expect is "ISO-8859-1" in your case.

During the parse, we see the <meta>, which triggers https://html.spec.whatwg.org/#parsing-main-inhead:change-the-encoding. That goes to https://html.spec.whatwg.org/#change-the-encoding, whose step 6 restarts the navigation, this time forcing the new character encoding.

We do fulfill this second request from cache if we can, but if the document is not fully in cache yet for whatever reason when this reload starts, that won't be possible.
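The sequence described in the comment above can be summarized as a toy model. This Python sketch is an illustration under stated assumptions, not Gecko's implementation: load_document and its cache_entry_complete flag are hypothetical, the charset matching is a simplified regular expression, and the model ignores the other early exits in the spec's change-the-encoding steps. The fallback defaults to windows-1252, which is what the ISO-8859-1 label maps to under the WHATWG Encoding Standard.

```python
import re
from typing import Optional, Tuple

PRESCAN_LIMIT = 1024
# Simplified: a complete <meta ...> tag containing charset=LABEL.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)[^>]*>",
    re.IGNORECASE,
)


def find_charset(data: bytes) -> Optional[str]:
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def load_document(data: bytes, cache_entry_complete: bool,
                  fallback: str = "windows-1252") -> Tuple[str, int]:
    """Toy model: return (encoding of the final parse, network requests made)."""
    requests = 1  # the original navigation
    # 1. Initial prescan over the first 1024 bytes, else the fallback default.
    encoding = find_charset(data[:PRESCAN_LIMIT]) or fallback
    # 2. The tree builder later sees the <meta> wherever it sits in <head>.
    declared = find_charset(data)
    if declared is not None and declared != encoding:
        # 3. "Change the encoding": restart the navigation forcing `declared`.
        #    Hypothetical cache rule: the reload is served from cache only
        #    if the cache entry was already complete when the reload started.
        if not cache_entry_complete:
            requests += 1
        encoding = declared
    return encoding, requests
```

For a document whose only charset <meta> sits past the first 1024 bytes, the model reports one extra network request whenever the cache entry is incomplete at reload time, consistent with the second request the reporter observed.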
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Oh, and the point is that, per spec, a valid HTML document has to have the <meta> specifying the charset within the first 1024 bytes. See https://html.spec.whatwg.org/#charset, third bullet point. The rest of the behavior discussed in comment 2 is basically error recovery for invalid documents.
Component: DOM → DOM: Core & HTML