Closed Bug 228099 Opened 18 years ago Closed 18 years ago

Parse U+000A, "&#10;" and "&#xA;" correctly in attributes


(Core :: DOM: HTML Parser, defect)

(Reporter: iann_bugzilla, Unassigned)

In title="" and possibly other places, U+000A, "&#10;" and "&#xA;" all get parsed as
newlines. U+000A should become a space, and "&#10;" or "&#xA;" should become a
newline.
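A minimal sketch of the behaviour requested above follows. It assumes a CDATA attribute whose raw source text is still available before character references are decoded. `normalize_attr_value` is a hypothetical helper and not Gecko's parser code. Comment 0 only mentions U+000A. Folding raw CR and tab into spaces as well is an assumption, based on the tab case in the test case and the later comments. What matters is the order of operations: literal whitespace is replaced first, and references are decoded afterwards, so an encoded newline survives.

```python
import html


def normalize_attr_value(raw: str) -> str:
    """Hypothetical sketch (not Gecko code) of the rule proposed in this bug.

    `raw` is the attribute value exactly as written in the markup, with
    character references such as "&#10;" still undecoded.
    """
    # Step 1: each literal line break (CRLF, lone CR or LF) or tab in the
    # source becomes one space. CR and tab handling is an assumption.
    text = raw.replace("\r\n", " ")
    for ch in ("\r", "\n", "\t"):
        text = text.replace(ch, " ")
    # Step 2: only now decode references, so "&#10;" and "&#xA;" produce a
    # real U+000A that step 1 can no longer turn into a space.
    return html.unescape(text)


print(repr(normalize_attr_value("line one\nline two&#10;line three")))
# 'line one line two\nline three'
```

Reversing the two steps would turn every encoded newline into a space. That would reproduce the behaviour this bug complains about.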
Attached file Simple test case
I think this demonstrates this problem and also a similar problem with tabs.
For previous discussions see bug 67127
OS: Windows XP → All
Hardware: PC → All
Attachment #137213 - Attachment is patch: false
Attachment #137213 - Attachment mime type: text/plain → text/html
> U+000A should become a space

Why?  It's not at all a space...
See Hixie's comment in bug 67127#143
Summary: Parse U+000A, "&#10;" and "&#xA;" correctly → Parse U+000A, "&#10;" and "&#xA;" correctly in attributes
For future reference, "bug 67127 comment 143" will get properly linkified...
that comment just says "we should do something weird".  It doesn't say WHY.

Clear links to the "why" explanation belonged in comment 0.
Excuse me, but isn't this a dupe of bug 47078, which has already been blocking
bug 67127 before this one was even filed?

For the explanation, see bug 47078, or bug 67127 comment 37, bug 67127 comment
51, bug 67127 comment 62...
This is partly covered by bug 47078 but also covers what to do with the
character entity "&#10;".
Well, as filed this bug (and bug 47078) would change the values of "value"
attributes, which is a good way to break half the web.  So someone better
provide me with good justification not to wontfix them both out of hand as not
feasible given the content that's out there.  Tooltips (bug 67127 comment 62)
are not justification; we could treat the "title" attribute differently from
others, and that would fix tooltips.  The other cited reasons are pretty vacuous
in light of all the much more serious violations of the HTML spec that have to
happen in the parser to handle real-world content....

And before someone suggests standards vs quirks mode, I am highly opposed to
adding yet another ill-tested "standards" codepath in a parser that's
non-standards-compliant by design.
Blocks: 47078
No response; marking wontfix.
Closed: 18 years ago
Resolution: --- → WONTFIX
No longer blocks: 67127
Please reopen.

In XHTML for sure (see XML rec) and very likely in HTML as well (unfortunately I
do not have access to the SGML spec, but see the comments in #67127) the values
of attributes defined as "CDATA" should not be stripped of beginning/trailing
newlines when they are encoded as entities, like in e.g.

<input type="hidden" value="&#10;&#10;test 3&#10;here&#10;&#10;" name="h2">

What actually happens is that the newline between "test 3" and "here" is kept,
but all others are discarded. When tested with IE5 or IE6, all newlines are kept.

Whitespace, when encoded in CDATA attribute values, should not be stripped (but
raw whitespace, including raw newlines, should be stripped of beginning/end of
values, and IE does not do that). And the first reporter is right if he says
that raw nl/cr/tab anywhere in a CDATA attribute should be replaced by space
(that's what both XML and HTML4.01 state).

This is not, for once, tooltip-related (and thus not a dupe of #67127), and
causes real problems with ECMAScript apps that deal with textareas and
preformatted text passed in <input type="hidden"> elements. I had to work
around it with <pre>.
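The XML side of the claim above can be checked quickly with Python's `xml.etree.ElementTree`, which is backed by the expat parser. That parser applies XML 1.0 attribute-value normalization. For an undeclared attribute, which is treated as CDATA, each raw newline or tab becomes a space, and `&#10;` is kept as a newline, including at the start and end of the value. The `title` attribute containing raw whitespace is a hypothetical addition for contrast and is not part of the original example. None of this describes the tag-soup HTML parser that this bug is filed against.

```python
import xml.etree.ElementTree as ET

# The example markup from the comment above, made well-formed XML with a
# self-closing tag. The title attribute is a hypothetical addition that
# contains a raw LF and a raw tab.
doc = ('<input type="hidden" value="&#10;&#10;test 3&#10;here&#10;&#10;" '
       'name="h2" title="a\nb\tc"/>')

elem = ET.fromstring(doc)

# Encoded newlines survive normalization, leading and trailing ones included.
print(repr(elem.get("value")))  # '\n\ntest 3\nhere\n\n'
# Each raw whitespace character in the source is replaced by one space.
print(repr(elem.get("title")))  # 'a b c'
```

Compare this with the Gecko result reported above, where only the newline between "test 3" and "here" was kept.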
> In XHTML for sure (see XML rec) 

That has nothing to do with this bug.  This is a bug on the tag-soup parser; the
XML parser follows the XML rec.

> the values of attributes defined as "CDATA" should not be stripped of
> beginning/trailing newlines when they are encoded as entities

That's not what this bug was about.
I have just filed bug 322270 to address only the issue of U+000A characters in attribute values, explicitly recognizing the need for an exception for the "value" attribute. Comments (or criticisms) are very welcome.