Closed Bug 572215 Opened 15 years ago Closed 15 years ago

[HTML5] ASCII unprintable characters (0x00-0x1F) rendered as questionmark-in-diamond instead of hexbox

Status:

RESOLVED INVALID

People

(Reporter: netrolller.3d, Unassigned)

Details

(Keywords: regression)

Gábor Stefanik

Reporter

Description

15 years ago

With the HTML5 parser enabled, ASCII unprintable characters no longer render as hexboxes, but rather as questionmark-in-diamond glyphs (the Missing Glyph symbol used before hexboxes were implemented). With HTML5 disabled, hexboxes are correctly rendered. Compare the following URL: data:text/html,This should be a hexbox:  (which is correct with the old parser but not with HTML5) with this: data:text/html,This should be a hexbox: 𐀀 (correct with both parsers). (However, the following URL: data:text/html,This should be a hexbox:  is also misrendered with HTML5.)

Boris Zbarsky [:bzbarsky]

Comment 1

15 years ago

The HTML5 spec requires this behavior, at least for U+0000.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → INVALID

Gábor Stefanik

Reporter

Comment 2

15 years ago

(In reply to comment #1)
> The HTML5 spec requires this behavior, at least for U+0000.

Where does the spec say "When the user agent comes across an unprintable ASCII character, it must not reveal any information about exactly what character it is"? Does HTML5 really specify the exact glyph to be used for unprintable characters? That doesn't make much sense to me...

Gábor Stefanik

Reporter

Comment 3

15 years ago

Also, how do you explain the discrepancy between: data:text/html,This should be a hexbox:  and data:text/html,This should be a hexbox: 𐀀 ?

Boris Zbarsky [:bzbarsky]

Comment 4

15 years ago

> Where does the spec say "When the user agent comes across an unprintable
> ASCII character, it must not reveal any information about exactly what
> character it is"?

Several different places, but the one relevant for  is http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#consume-a-character-reference and specifically the part that says:

  If that number is one of the numbers in the first column of the following table, then this is a parse error. Find the row with that number in the first column, and return a character token for the Unicode character given in the second column of that row.

0x00 is in the first row of the table, and the corresponding character is U+FFFD.

Your other example () falls into the list of things that are considered a parse error, but should make it through to the DOM intact, and it does for me with both the HTML5 parser and the old one (neither shows the hexbox for me). Similarly, both parsers show a hexbox for .
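
For readers following along, the rule described in comment 4 can be sketched roughly as below. This is only an illustration under stated assumptions, not the parser's actual code: the function name `resolve_numeric_reference` and the `REPLACEMENTS` table are hypothetical, the table holds just two rows (0x00 from the comment, plus one Windows-1252 row as an example), and the control-character range is deliberately partial.

```python
# Hypothetical sketch of numeric character reference handling.
# Names and table contents are illustrative, not taken from any real parser.

# Partial replacement table: a number in the first column is a parse error
# and is swapped for the character in the second column.
REPLACEMENTS = {
    0x00: "\uFFFD",  # the row discussed in comment 4
    0x80: "\u20AC",  # one example Windows-1252 row (euro sign)
}


def resolve_numeric_reference(code_point):
    """Return (character, is_parse_error) for a numeric reference."""
    if code_point in REPLACEMENTS:
        # Parse error, and the character is replaced.
        return REPLACEMENTS[code_point], True
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        # Surrogates and out-of-range values also become U+FFFD.
        return "\uFFFD", True
    if 0x01 <= code_point <= 0x08 or 0x0E <= code_point <= 0x1F:
        # Assumed partial range of control characters: flagged as a
        # parse error, but passed through to the DOM intact.
        return chr(code_point), True
    return chr(code_point), False


if __name__ == "__main__":
    for cp in (0x00, 0x01, 0x10000):
        char, error = resolve_numeric_reference(cp)
        print(f"U+{cp:04X} -> U+{ord(char):04X}, parse error: {error}")
```

Under these assumptions, a reference to 0x00 yields U+FFFD, while a reference to another control character such as 0x01 is reported as a parse error but keeps its code point, which is the distinction drawn in comment 4.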

Boris Zbarsky [:bzbarsky]

Comment 5

15 years ago

> Also, how do you explain the discrepancy between:

I don't see such a discrepancy here...

Boris Zbarsky [:bzbarsky]

Comment 6

15 years ago

And just so we're clear... which exact build are you using?

Gábor Stefanik

Reporter

Comment 7

15 years ago

It is the same in the latest trunk and in 3.6.3.  displays a hexbox here with HTML4 but a U+FFFD with HTML5. Same for  (and ). OTOH for 𐀀 (𐀀), both parsers display a hexbox.

BTW, the relevant part of the spec seems to deal only with character references. However, the difference I reported in comment 0 also exists for literal ASCII NULs in the HTML source, which doesn't seem to be covered.

Boris Zbarsky [:bzbarsky]

Comment 8

15 years ago

>  displays a hexbox here with HTML4 but an U+FFFD with HTML5.

It does? Does the character in the DOM end up as U+FFFD? Because it sure doesn't here, on latest trunk on Mac (and I see a hexbox). I'll spin up a Windows build.

> However, the difference I reported in comment 0 also exists for
> literal ASCII NULs in the HTML source

Sure. See http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream

Boris Zbarsky [:bzbarsky]

Comment 9

15 years ago

>  displays a hexbox here with HTML4 but an U+FFFD with HTML5.

This shows up as a hexbox for me with the HTML5 parser on Mac OS 10.5, Linux (F12), and Windows 7.

Categories

(Core :: DOM: HTML Parser, defect)
