Closed
Bug 638318
Opened 13 years ago
Closed 13 years ago
Page starting with lots of NUL chars is incorrectly sniffed as UTF-16BE (with HTML5 parser enabled)
Categories
(Core :: DOM: HTML Parser, defect, P1)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | .x+ |
People
(Reporter: streetwolf52, Assigned: hsivonen)
References
()
Details
(Keywords: regression, Whiteboard: [has patch][needs approval])
Attachments
(3 files, 2 obsolete files)
1.84 KB,
patch
|
bzbarsky
:
review+
beltzner
:
approval2.0+
|
Details | Diff | Splinter Review |
2.60 KB,
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
9.32 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre) Gecko/20110302 Firefox/4.0b13pre Build Identifier: 20110302123843 With the HTML5 Parser enabled the page renders in what looks like Asian characters. With the Parser disabled it renders correctly. Reproducible: Always Steps to Reproduce: 1.Go to http://www.rinconrojo.net/podium/index.php?showforum=4 2. 3. Actual Results: All Asian characters appear. Expected Results: A normal looking page. Here's another page with the same problem: http://www.creators.com/featurepages/11_comics_speed-bump.html?name=bmp
Reporter | ||
Updated•13 years ago
|
Version: unspecified → Trunk
Reporter | ||
Updated•13 years ago
|
blocking2.0: --- → ?
Comment 1•13 years ago
|
||
Something is causing the parser to trip the wrong encoding. Using the encoding menu, setting the option back to iso-8859-1 fixes the issue.
Thank you Dexter for filing the bug. It's been happening since quite some time. All I thought was a problem with the page. Until today I received email feeds from the reported forum and I opened it with IE but still couldn't render well in FF4.
Reporter | ||
Comment 3•13 years ago
|
||
(In reply to comment #1) > Something is causing the parser to trip the wrong encoding. Using the encoding > menu, setting the option back to iso-8859-1 fixes the issue. It is set on iso-8859-1 with the Parser enabled. Maybe I am not understanding your comment.
Comment 4•13 years ago
|
||
The page starts with a bunch of null bytes (12280 of them, in fact) before any content. So the <meta> tag is ignored, since it's too late in the document. So yes, there is very much a problem with the page!
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 5•13 years ago
|
||
Without the Parser enabled it look's like the encoding used is UTF-16BE. With the Parser enabled it's using iso-8859-1.
Comment 6•13 years ago
|
||
Henri, SniffBOMlessUTF16BasicLatin looks like it'll treat all null bytes as leaving both elements of byteNonZero as false. But then the code at the bottom of the function assumes that one of them must be true....
Blocks: 631751
Comment 7•13 years ago
|
||
> With the Parser enabled it's using iso-8859-1.
No. Without the parser enabled (or with the bug fixed), this page will use the user's default encoding, which is locale-dependent. So it'll still be broken for many users; it really needs to be fixed in its own right.
Comment 8•13 years ago
|
||
Changing platform to x86; since WOW64 in user agent indicates 32bit build of Firefox (platform refers to browser build, not OS).
Hardware: x86_64 → x86
Reporter | ||
Comment 9•13 years ago
|
||
(In reply to comment #8) > Changing platform to x86; since WOW64 in user agent indicates 32bit build of > Firefox (platform refers to browser build, not OS). I missed this. Thanks for correcting.
Comment 10•13 years ago
|
||
I'm not quote sure how to write a test for this, since the detected charset will be locale-dependent... Can I just assume that it'll be ISO-8859-1 anywhere where we run tests?
Attachment #516491 -
Flags: review?(hsivonen)
Updated•13 years ago
|
Assignee: nobody → bzbarsky
OS: Windows 7 → All
Priority: -- → P1
Hardware: x86 → All
Whiteboard: [need review]
Updated•13 years ago
|
Summary: Page renders incorrectly with HTML5 Parser enabled. → Page starting with lots of NUL chars is incorrectly sniffed as UTF-16BE (with HTML5 parser enabled)
Comment 11•13 years ago
|
||
when will this land so it doesnt become a hardblocker? (hopefully it doesnt become one)
Comment 12•13 years ago
|
||
It needs review first.
Assignee | ||
Comment 13•13 years ago
|
||
Comment on attachment 516491 [details] [diff] [review] Fix r=me.
Attachment #516491 -
Flags: review?(hsivonen) → review+
Assignee | ||
Updated•13 years ago
|
Whiteboard: [need review] → [has patch][needs approval]
Assignee | ||
Updated•13 years ago
|
Attachment #516491 -
Flags: approval2.0?
Assignee | ||
Comment 14•13 years ago
|
||
Hmm. Actually, won't this patch still sniff as UTF-16 if there's, say 500 zero bytes? A safer follow-up patch coming up.
Assignee | ||
Comment 15•13 years ago
|
||
Comment on attachment 516491 [details] [diff] [review] Fix Clearing review. Will post another patch after testing it.
Attachment #516491 -
Flags: review-
Attachment #516491 -
Flags: review+
Attachment #516491 -
Flags: approval2.0?
Assignee | ||
Updated•13 years ago
|
Whiteboard: [has patch][needs approval]
Assignee | ||
Comment 16•13 years ago
|
||
Mochitests coming up in another patch separately.
Assignee: bzbarsky → hsivonen
Attachment #516491 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #516517 -
Flags: review?(bzbarsky)
Assignee | ||
Updated•13 years ago
|
Keywords: regression
Whiteboard: [has patch][needs review]
Assignee | ||
Comment 17•13 years ago
|
||
(In reply to comment #14) > Hmm. Actually, won't this patch still sniff as UTF-16 if there's, say 500 zero > bytes? A safer follow-up patch coming up. Rather, the problem I'm trying to fix by making this "safer" is where there are runs of zeros and non-zeros are all even or odd. It seems safer to trigger the UTF-16 stuff only if we the an alternation of zero and non-zero all the way.
Assignee | ||
Comment 18•13 years ago
|
||
The file in the binary patch part has a long run of zeros at the start but one non-zero byte among them within the first 1024 bytes. Then after the 1024-byte limit, there's ASCII text.
Attachment #516520 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 19•13 years ago
|
||
About blockingness: I think this bug is worse from the Web compat point of view than bug 631751 (whose fix caused this regression). If this fix doesn't make it to Firefox 4.0, I think we should take the fix for 4.0.1 or, failing that, back out bug 631751 for 4.0.1.
Assignee | ||
Comment 20•13 years ago
|
||
Here's a patch that reverses the fix for bug 631751 in case we want to do that instead.
Comment 21•13 years ago
|
||
Comment on attachment 516517 [details] [diff] [review] Safer fix OK, yeah. This would be the other way to deal. r=me
Attachment #516517 -
Flags: review?(bzbarsky) → review+
Comment 22•13 years ago
|
||
Comment on attachment 516520 [details] [diff] [review] Mochitest r=me. Please request approval for this? For drivers: I think this is very safe, in the sense that it turns off some sniffing that we added recently in cases when there are lots of null bytes at the beginning of the file, because the sniffing is not reliable in that situation.
Attachment #516520 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 23•13 years ago
|
||
Comment on attachment 516517 [details] [diff] [review] Safer fix Setting the approval request per bz's comment.
Attachment #516517 -
Flags: approval2.0?
Assignee | ||
Updated•13 years ago
|
Whiteboard: [has patch][needs review] → [has patch][needs approval]
Comment 24•13 years ago
|
||
Comment on attachment 516517 [details] [diff] [review] Safer fix a=beltzner
Attachment #516517 -
Flags: approval2.0? → approval2.0+
Assignee | ||
Comment 26•13 years ago
|
||
I'll land.
Comment 27•13 years ago
|
||
Landed: http://hg.mozilla.org/mozilla-central/rev/ed9c526f93b5
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 28•13 years ago
|
||
Test landed: http://hg.mozilla.org/mozilla-central/rev/770e2709cff5
Flags: in-testsuite+
Comment 29•13 years ago
|
||
Comment 30•13 years ago
|
||
I think the first fix was better. Or maybe the second fix needs to distinguish between 0x0000 characters before and after having found a non-zero byte. The attached test case renders differently in current trunk than in Fx 4.0 b12 and IE 9.
Assignee | ||
Comment 31•13 years ago
|
||
(In reply to comment #29) > Created attachment 516644 [details] > Test case, UTF 16 BE with leading zeros Has this case been seen in the wild?
Comment 32•13 years ago
|
||
Not that I know of. New bug, target Fx 5?
Assignee | ||
Comment 33•13 years ago
|
||
(In reply to comment #32) > Not that I know of. New bug, target Fx 5? Let's not go further down the slippery slope here. Supporting imaginable bogosity is not a goal. The point is supporting the real-world bogosity seen in bug 631751 to the extent it works in both IE9 and Firefox 3.6--and only to that extent. Note that Chrome and Safari don't support that bogosity anyway.
Comment 34•13 years ago
|
||
> The attached test case renders differently
That was rather the point of Henri's approach, yes.
Updated•13 years ago
|
Attachment #516644 -
Attachment is obsolete: true
You need to log in
before you can comment on or make changes to this bug.
Description
•