Closed Bug 638318 Opened 9 years ago Closed 9 years ago

Page starting with lots of NUL chars is incorrectly sniffed as UTF-16BE (with HTML5 parser enabled)

Categories

(Core :: DOM: HTML Parser, defect, P1)


Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- .x+

People

(Reporter: streetwolf52, Assigned: hsivonen)

References


Details

(Keywords: regression, Whiteboard: [has patch][needs approval])

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre) Gecko/20110302 Firefox/4.0b13pre
Build Identifier: 20110302123843

With the HTML5 Parser enabled, the page renders as what look like Asian characters.  With the Parser disabled, it renders correctly.

Reproducible: Always

Steps to Reproduce:
1. Go to http://www.rinconrojo.net/podium/index.php?showforum=4
Actual Results:  
The page appears entirely as Asian characters.

Expected Results:  
A normal looking page.

Here's another page with the same problem: http://www.creators.com/featurepages/11_comics_speed-bump.html?name=bmp
Version: unspecified → Trunk
blocking2.0: --- → ?
Something is causing the parser to trip the wrong encoding. Using the encoding menu, setting the option back to iso-8859-1 fixes the issue.
Thank you, Dexter, for filing the bug. This has been happening for quite some time.
I thought it was just a problem with the page until today, when I received email
feeds from the reported forum and opened it with IE, but it still wouldn't render
well in FF4.
(In reply to comment #1)
> Something is causing the parser to trip the wrong encoding. Using the encoding
> menu, setting the option back to iso-8859-1 fixes the issue.

It is set on iso-8859-1 with the Parser enabled.  Maybe I am not understanding your comment.
The page starts with a bunch of null bytes (12280 of them, in fact) before any content.  So the <meta> tag is ignored: it falls past the 1024-byte prescan limit and is too late in the document.

So yes, there is very much a problem with the page!
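For context, the parser only examines a limited prefix of the stream when looking for a <meta> charset declaration. Here is a minimal Python sketch of that idea (an illustration of the 1024-byte prescan limit mentioned later in this bug, not Gecko's actual prescan code; the helper name is made up):

```python
import re

PRESCAN_LIMIT = 1024  # only this many leading bytes are examined

def meta_charset_in_prescan(data):
    # Hypothetical helper: look for a <meta> charset declaration, but
    # only within the first PRESCAN_LIMIT bytes of the stream.
    head = data[:PRESCAN_LIMIT]
    m = re.search(rb'charset=["\']?([A-Za-z0-9_-]+)', head)
    return m.group(1).decode("ascii") if m else None
```

A page that opens with 12280 NUL bytes pushes its <meta> tag far past that window, so the declaration is never seen and sniffing takes over.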
Status: UNCONFIRMED → NEW
Ever confirmed: true
Without the Parser enabled it looks like the encoding used is UTF-16BE.

With the Parser enabled it's using iso-8859-1.
Henri, SniffBOMlessUTF16BasicLatin looks like it'll treat all null bytes as leaving both elements of byteNonZero as false.  But then the code at the bottom of the function assumes that one of them must be true....
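For illustration, here is a minimal Python sketch of the heuristic as described above (a hypothetical reconstruction, not the actual Gecko C++ source; the names only mirror the ones mentioned in the comment):

```python
def sniff_bomless_utf16_basic_latin(data):
    # Track whether a non-zero byte has been seen at even offsets and
    # at odd offsets.
    byte_non_zero = [False, False]
    for i, b in enumerate(data):
        if b != 0:
            byte_non_zero[i % 2] = True
            if byte_non_zero[0] and byte_non_zero[1]:
                return None  # non-zero bytes in both lanes: not UTF-16
    # Bug: for all-NUL input neither flag was set, yet the code below
    # assumes exactly one of them must be True.
    if not byte_non_zero[0]:
        return "UTF-16BE"  # all-NUL input falls through to here
    return "UTF-16LE"
```

An all-NUL prefix like the reported page's 12280 null bytes sets neither flag, so the final branch misreports UTF-16BE.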
Blocks: 631751
> With the Parser enabled it's using iso-8859-1.

No.  Without the parser enabled (or with the bug fixed), this page will use the user's default encoding, which is locale-dependent.  So it'll still be broken for many users; it really needs to be fixed in its own right.
Changing platform to x86, since WOW64 in the user agent indicates a 32-bit build of Firefox (the platform field refers to the browser build, not the OS).
Hardware: x86_64 → x86
(In reply to comment #8)
> Changing platform to x86, since WOW64 in the user agent indicates a 32-bit
> build of Firefox (the platform field refers to the browser build, not the OS).

I missed this.  Thanks for correcting.
Attached patch Fix (obsolete) — Splinter Review
I'm not quite sure how to write a test for this, since the detected charset will be locale-dependent...  Can I just assume that it'll be ISO-8859-1 anywhere we run tests?
Attachment #516491 - Flags: review?(hsivonen)
Assignee: nobody → bzbarsky
OS: Windows 7 → All
Priority: -- → P1
Hardware: x86 → All
Whiteboard: [need review]
Summary: Page renders incorrectly with HTML5 Parser enabled. → Page starting with lots of NUL chars is incorrectly sniffed as UTF-16BE (with HTML5 parser enabled)
When will this land, so it doesn't become a hardblocker? (Hopefully it doesn't become one.)
It needs review first.
Comment on attachment 516491 [details] [diff] [review]
Fix

r=me.
Attachment #516491 - Flags: review?(hsivonen) → review+
Whiteboard: [need review] → [has patch][needs approval]
Hmm. Actually, won't this patch still sniff as UTF-16 if there are, say, 500 zero bytes? A safer follow-up patch coming up.
Comment on attachment 516491 [details] [diff] [review]
Fix

Clearing review. Will post another patch after testing it.
Attachment #516491 - Flags: review-
Attachment #516491 - Flags: review+
Attachment #516491 - Flags: approval2.0?
Whiteboard: [has patch][needs approval]
Attached patch Safer fixSplinter Review
Mochitests coming up in another patch separately.
Assignee: bzbarsky → hsivonen
Attachment #516491 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #516517 - Flags: review?(bzbarsky)
Keywords: regression
Whiteboard: [has patch][needs review]
(In reply to comment #14)
> Hmm. Actually, won't this patch still sniff as UTF-16 if there's, say 500 zero
> bytes? A safer follow-up patch coming up.

Rather, the problem I'm trying to fix by making this "safer" is the case where there are runs of zeros and the non-zero bytes are all at even or all at odd offsets. It seems safer to trigger the UTF-16 sniffing only if we see an alternation of zero and non-zero bytes all the way through.
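As an illustration of that stricter rule, here is a hedged Python sketch (not the shipped C++ patch; the function name is made up): sniff UTF-16 only when zero and non-zero bytes strictly alternate across the whole buffer.

```python
def sniff_utf16_strict(data):
    if len(data) < 2:
        return None
    # UTF-16BE Basic Latin: zero high byte at every even offset and a
    # non-zero low byte at every odd offset -- the pattern must hold
    # throughout, so long NUL runs no longer qualify.
    if all(b == 0 for b in data[0::2]) and all(b != 0 for b in data[1::2]):
        return "UTF-16BE"
    # UTF-16LE Basic Latin: the mirror image.
    if all(b != 0 for b in data[0::2]) and all(b == 0 for b in data[1::2]):
        return "UTF-16LE"
    return None
```

Under this rule, a page opening with thousands of NUL bytes is simply not sniffed as UTF-16, while genuine BOMless UTF-16 Basic Latin still is.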
Attached patch MochitestSplinter Review
The file in the binary patch part has a long run of zeros at the start but one non-zero byte among them within the first 1024 bytes. Then after the 1024-byte limit, there's ASCII text.
Attachment #516520 - Flags: review?(bzbarsky)
About blockingness: I think this bug is worse from the Web compat point of view than bug 631751 (whose fix caused this regression). If this fix doesn't make it to Firefox 4.0, I think we should take the fix for 4.0.1 or, failing that, back out bug 631751 for 4.0.1.
Here's a patch that reverses the fix for bug 631751 in case we want to do that instead.
Comment on attachment 516517 [details] [diff] [review]
Safer fix

OK, yeah.  This would be the other way to deal.  r=me
Attachment #516517 - Flags: review?(bzbarsky) → review+
Comment on attachment 516520 [details] [diff] [review]
Mochitest

r=me.

Please request approval for this?

For drivers: I think this is very safe, in the sense that it turns off some sniffing that we added recently in cases when there are lots of null bytes at the beginning of the file, because the sniffing is not reliable in that situation.
Attachment #516520 - Flags: review?(bzbarsky) → review+
Comment on attachment 516517 [details] [diff] [review]
Safer fix

Setting the approval request per bz's comment.
Attachment #516517 - Flags: approval2.0?
Whiteboard: [has patch][needs review] → [has patch][needs approval]
Comment on attachment 516517 [details] [diff] [review]
Safer fix

a=beltzner
Attachment #516517 - Flags: approval2.0? → approval2.0+
Can someone land this now, right now?
blocking2.0: ? → .x+
Landed:

http://hg.mozilla.org/mozilla-central/rev/ed9c526f93b5
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I think the first fix was better. Or maybe the second fix needs to distinguish between 0x0000 characters before and after having found a non-zero byte.

The attached test case renders differently in current trunk than in Fx 4.0 b12 and IE 9.
(In reply to comment #29)
> Created attachment 516644 [details]
> Test case, UTF 16 BE with leading zeros

Has this case been seen in the wild?
Not that I know of. New bug, target Fx 5?
(In reply to comment #32)
> Not that I know of. New bug, target Fx 5?

Let's not go further down the slippery slope here.

Supporting imaginable bogosity is not a goal. The point is supporting the real-world bogosity seen in bug 631751 to the extent it works in both IE9 and Firefox 3.6--and only to that extent. Note that Chrome and Safari don't support that bogosity anyway.
> The attached test case renders differently

That was rather the point of Henri's approach, yes.
Attachment #516644 - Attachment is obsolete: true