Closed Bug 566280 Opened 14 years ago Closed 14 years ago

[HTML5] Plain text prefixed by U+0000 displays only U+FFFD

Tracking


Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

blocking2.0

---

final+

People

(Reporter: hsivonen, Assigned: hsivonen)


Details

(Keywords: regression)

Attachments

(2 files, 3 obsolete files)

Test case 14 years ago Henri Sivonen (:hsivonen) 13 bytes, text/html
Fix bad copypasta 14 years ago Henri Sivonen (:hsivonen) 2.39 KB, patch
Fix bad copypasta, make the reftest reference work on the tinderbox 14 years ago Henri Sivonen (:hsivonen) 2.39 KB, patch
Fix bad copypasta, make the reftest reference work on the tinderbox, make an older reftest not fail 14 years ago Henri Sivonen (:hsivonen) 5.15 KB, patch, sicking: review+
Fix bad copypasta, make the reftest reference work on the tinderbox, make an older reftest not fail, fix WHATWG copyright year 14 years ago Henri Sivonen (:hsivonen) 5.53 KB, patch

Henri Sivonen (:hsivonen)

Assignee

Description

14 years ago

Attached file: Test case

Steps to reproduce:
 1) Load the attachment.

Expected results:
�hello world

Actual results:
�
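The expected result follows from the input-stream preprocessing rule linked in comment 8: each U+0000 becomes U+FFFD and the rest of the text is kept. A minimal Python sketch of that rule, assuming the test case is a zero byte followed by `hello world` (as the expected result suggests) and that the input is already decoded to a string; `preprocess_input` is a hypothetical helper, not Gecko code.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def preprocess_input(text: str) -> str:
    """Replace every U+0000 with U+FFFD and keep everything else.

    Hypothetical helper illustrating the preprocessing rule, not Gecko code.
    """
    return text.replace("\x00", REPLACEMENT_CHARACTER)


# Assumed test case: plain text that starts with a zero byte.
result = preprocess_input("\x00hello world")
# U+FFFD stands in for the NUL and the rest of the text survives.
assert result == "\ufffdhello world"
```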

Boris Zbarsky [:bzbarsky]

Updated

14 years ago

blocking2.0: --- → ?

Keywords: regression

Johnny Stenback (:jst)

Comment 1

14 years ago

Blocking.

blocking2.0: ? → final+

Johnny Stenback (:jst)

Updated

14 years ago

Assignee: nobody → hsivonen

Henri Sivonen (:hsivonen)

Assignee

Comment 2

14 years ago

Attached patch: Fix bad copypasta (obsolete)

Henri Sivonen (:hsivonen)

Assignee

Comment 3

14 years ago

Attached patch: Fix bad copypasta, make the reftest reference work on the tinderbox (obsolete)

Attachment #447722 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 4

14 years ago

zwol, HTML5 invalidates http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/layout/reftests/bugs/228856-2.html?force=1 since the HTML5 parsing algorithm turns U+0000 into U+FFFD before it reaches the CSS parser. The test has accidentally passed due to this bug.

What should be done to 228856-2.html when landing this fix?

Jonathan Kew [:jfkthame]

Comment 5

14 years ago

Is there a specification that explicitly calls for U+0000 to be replaced by U+FFFD? That seems odd to me; if anything, I'd have expected to see a hexbox rather than a Unicode REPLACEMENT CHARACTER. U+FFFD would normally indicate an encoding error (e.g. an invalid UTF-8 sequence or unpaired UTF-16 surrogate, or an invalid code in a legacy codepage that cannot be transcoded to Unicode), not merely a correctly-encoded character that we can't display.
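The distinction drawn here can be seen with Python's standard codecs: a zero byte is a correctly encoded character and decodes to U+0000, while U+FFFD appears only when a decoder running with `errors="replace"` meets malformed or unmappable input. This only illustrates general decoder behavior; it says nothing about Gecko's own decoders.

```python
# A zero byte is valid UTF-8 and decodes to U+0000, not to U+FFFD.
valid = b"\x00hello".decode("utf-8")

# 0xFF can never occur in UTF-8, so the decoder substitutes U+FFFD.
malformed_utf8 = b"\xffhello".decode("utf-8", errors="replace")

# 0x81 is unassigned in windows-1252, so it cannot be transcoded either.
unmapped_legacy = b"\x81hello".decode("cp1252", errors="replace")

assert valid == "\x00hello"
assert malformed_utf8 == "\ufffdhello"
assert unmapped_legacy == "\ufffdhello"
```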

Zack Weinberg (:zwol)

Comment 6

14 years ago

The test is really about what U+0000 does to the CSS parser, so you should definitely pull the contents of the <style> tag out to a separate sheet.

I'm not sure what to do with the divs, though.  Does <div something="..&#0;.."> still generate an attribute with a literal NUL in its value?  If so, we could probably just delete the subtests with literal NULs in the input, and rely on the &#0;s.  If not, we need to convert this to a mochitest that uses JS to examine the parsed style sheet, which is a thing I can do if you don't know how.

Henri Sivonen (:hsivonen)

Assignee

Comment 7

14 years ago

(In reply to comment #6)
> The test is really about what U+0000 does to the CSS parser, so you should
> definitely pull the contents of the <style> tag out to a separate sheet.

OK.

> I'm not sure what to do with the divs, though.  Does <div something="..&#0;..">
> still generate an attribute with a literal NUL in its value?

&#0; generates U+FFFD per HTML5.

> If so, we could
> probably just delete the subtests with literal NULs in the input, and rely on
> the &#0;s.  If not, we need to convert this to a mochitest that uses JS to
> examine the parsed style sheet, which is a thing I can do if you don't know
> how.

I don't, so it would be nice if you'd do it to make sure the test still tests what you intended.

Henri Sivonen (:hsivonen)

Assignee

Comment 8

14 years ago

(In reply to comment #5)
> Is there a specification that explicitly calls for U+0000 to be replaced by
> U+FFFD?

Yes, the HTML5 spec.

The zero byte:
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream

The numeric reference:
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references
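The numeric-reference rule behind "&#0; generates U+FFFD" can be sketched in Python. `resolve_numeric_charref` is a hypothetical helper covering only the cases relevant here; the full rule also remaps 0x80 to 0x9F through Windows-1252 and reports parse errors, which the sketch leaves out. Python's `html.unescape`, which follows the HTML5 character-reference rules, serves as a cross-check.

```python
import html

REPLACEMENT_CHARACTER = "\ufffd"


def resolve_numeric_charref(code_point: int) -> str:
    """Map the number in a numeric character reference to a string.

    Simplified sketch: zero, surrogates and values above U+10FFFF become
    U+FFFD. The Windows-1252 remapping of 0x80-0x9F is omitted.
    """
    if code_point == 0:
        return REPLACEMENT_CHARACTER
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        return REPLACEMENT_CHARACTER
    return chr(code_point)


# &#0; and &#x0; both reference code point zero.
assert resolve_numeric_charref(0) == "\ufffd"
assert resolve_numeric_charref(0x41) == "A"

# html.unescape agrees with the sketch on these inputs.
for ref, code_point in [("&#0;", 0), ("&#x0;", 0), ("&#xD800;", 0xD800), ("&#65;", 65)]:
    assert html.unescape(ref) == resolve_numeric_charref(code_point)
```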

Zack Weinberg (:zwol)

Comment 9

14 years ago

(In reply to comment #7)
> > we need to convert this to a mochitest that uses JS to
> > examine the parsed style sheet, which is a thing I can do if you don't know
> > how.
> 
> I don't, so it would be nice if you'd do it to make sure the test still tests
> what you intended.

I will try to find time for this next week.  Note that Monday is a holiday in the USA.

(In reply to comment #8)
> (In reply to comment #5)
> > Is there a specification that explicitly calls for U+0000 to be replaced by
> > U+FFFD?
> 
> Yes, the HTML5 spec.

CSS presently doesn't define the behavior of U+0000 either as a literal character or as a \-escape.  It is tempting to propose that CSS change to match HTML5 - it's not like there's any cost to doing so, and we'd gain predictability.  dbaron, fantasai, what do you think?
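To make the proposal concrete, here is a hypothetical Python sketch of CSS tokenizer helpers that treat U+0000 the way HTML5 does, both as a literal character and as a `\0` escape. The function names are illustrative, and since CSS defined neither case at the time, this shows the proposed rule rather than any spec text.

```python
from typing import Tuple

REPLACEMENT_CHARACTER = "\ufffd"
HEX_DIGITS = "0123456789abcdefABCDEF"
# Whitespace that may end a hex escape (CRLF counting as one is not modeled).
ESCAPE_TERMINATORS = " \t\n\r\f"


def preprocess_css(text: str) -> str:
    """Proposed rule for a literal NUL: replace each U+0000 with U+FFFD."""
    return text.replace("\x00", REPLACEMENT_CHARACTER)


def decode_hex_escape(text: str, start: int) -> Tuple[str, int]:
    """Decode the CSS hex escape whose backslash is at text[start].

    Returns (decoded character, index just past the escape). Assumes at
    least one hex digit follows the backslash. Under the proposed rule,
    zero, surrogates and values above U+10FFFF all yield U+FFFD.
    """
    i = start + 1
    digits = ""
    # A hex escape has one to six hex digits.
    while i < len(text) and len(digits) < 6 and text[i] in HEX_DIGITS:
        digits += text[i]
        i += 1
    # One whitespace character right after the digits belongs to the escape.
    if i < len(text) and text[i] in ESCAPE_TERMINATORS:
        i += 1
    code_point = int(digits, 16)
    if code_point == 0 or 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        return REPLACEMENT_CHARACTER, i
    return chr(code_point), i


# Both spellings of NUL end up as U+FFFD.
assert preprocess_css("a\x00b") == "a\ufffdb"
assert decode_hex_escape("\\0 b", 0) == ("\ufffd", 3)
```

The gain is the predictability mentioned above: a NUL reaching CSS through HTML and one written directly in a style sheet would produce the same character. CSS Syntax Module Level 3 later specified essentially this behavior.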

Henri Sivonen (:hsivonen)

Assignee

Comment 10

14 years ago

Attached patch: Fix bad copypasta, make the reftest reference work on the tinderbox, make an older reftest not fail

Splitting out the first <style> into a <link rel=stylesheet> was enough to make 228856-2.html not fail.
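Why the split is enough: text inside an inline `<style>` element is tokenized by the HTML parser, whose U+0000 to U+FFFD replacement runs before the CSS parser sees it, while a sheet loaded through `<link rel=stylesheet>` is handed to the CSS parser without passing through the HTML parser. A simplified Python model of the two paths; the function names are hypothetical, and decoding and CSS parsing themselves are not modeled.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def html_preprocess(text: str) -> str:
    """HTML5 parser rule discussed in this bug: U+0000 becomes U+FFFD."""
    return text.replace("\x00", REPLACEMENT_CHARACTER)


def css_seen_from_inline_style(css_text: str) -> str:
    """Inline <style> content goes through the HTML parser first."""
    return html_preprocess(css_text)


def css_seen_from_linked_sheet(css_text: str) -> str:
    """A linked sheet bypasses the HTML parser (sketch assumption)."""
    return css_text


sheet = "p { content: 'a\x00b' }"
assert css_seen_from_inline_style(sheet) == "p { content: 'a\ufffdb' }"
assert css_seen_from_linked_sheet(sheet) == sheet
```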

The binary patch also adds a reftest for this bug. The test file is identical to its reference, except that it contains a zero byte where the reference has &#xFFFD;.
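A test/reference pair like this can be generated from a single template so that the zero byte is the only difference between the two files. A minimal Python sketch; the file names, the template text and the `reftest.list` entry are hypothetical placeholders, not the contents of the actual patch.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical template: "{}" marks the spot that differs between the files.
TEMPLATE = "{}hello world\n"


def write_reftest_pair(directory: Path, name: str) -> Tuple[Path, Path]:
    """Write <name>.html with a raw zero byte and <name>-ref.html with &#xFFFD;."""
    test_path = directory / f"{name}.html"
    ref_path = directory / f"{name}-ref.html"
    # The test file needs a literal 0x00 byte, so write bytes explicitly.
    test_path.write_bytes(TEMPLATE.format("\x00").encode("utf-8"))
    # The reference spells the expected rendering as a character reference.
    ref_path.write_bytes(TEMPLATE.format("&#xFFFD;").encode("utf-8"))
    return test_path, ref_path


def manifest_line(name: str) -> str:
    """reftest.list entry asserting that the test renders like its reference."""
    return f"== {name}.html {name}-ref.html"
```

Keeping one template for both files matters: if the pair differed anywhere else, a passing reftest would no longer show that the zero byte renders exactly as U+FFFD.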

Attachment #447960 - Attachment is obsolete: true

Attachment #448355 - Flags: review?(jonas)

Henri Sivonen (:hsivonen)

Assignee

Updated

14 years ago

Blocks: 568228

Henri Sivonen (:hsivonen)

Assignee

Comment 11

14 years ago

Attached patch: Fix bad copypasta, make the reftest reference work on the tinderbox, make an older reftest not fail, fix WHATWG copyright year (obsolete)

Forgot to update a copyright year.

Attachment #448355 - Attachment is obsolete: true

Attachment #448371 - Flags: review?(jonas)

Attachment #448355 - Flags: review?(jonas)

Henri Sivonen (:hsivonen)

Assignee

Comment 12

14 years ago

Comment on attachment 448355
Fix bad copypasta, make the reftest reference work on the tinderbox, make an older reftest not fail

(In reply to comment #11)
> Forgot to update a copyright year.

Sorry. Wrong bug.

Attachment #448355 - Attachment is obsolete: false

Attachment #448355 - Flags: review?(jonas)

Henri Sivonen (:hsivonen)

Assignee

Updated

14 years ago

Attachment #448371 - Attachment is obsolete: true

Attachment #448371 - Flags: review?(jonas)

Zack Weinberg (:zwol)

Comment 13

14 years ago

Henri, when exactly were these rules for U+0000 and &#0; added to HTML5?  If there was public discussion of this change, a pointer to that would also be useful.

Henri Sivonen (:hsivonen)

Assignee

Comment 14

14 years ago

(In reply to comment #13)
> Henri, when exactly were these rules for U+0000 and &#0; added to HTML5?

http://html5.org/tools/web-apps-tracker?from=13&to=14

> If
> there was public discussion of this change, a pointer to that would also be
> useful.

I can't find a public discussion of this change. I can find some emails where I whined about U+0000 getting dropped without a parse error, but I don't see email from me or Hixie about mapping it to U+FFFD.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

14 years ago

Attachment #448355 - Flags: review?(jonas) → review+

Henri Sivonen (:hsivonen)

Assignee

Comment 17

14 years ago

http://hg.mozilla.org/mozilla-central/rev/14bb99ed59c8

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED
