Closed Bug 228099 Opened 17 years ago Closed 17 years ago
Parse U+000A, "&#10;" and "&#xA;" correctly in attributes
In title="" and possibly other places, U+000A, "&#10;" and "&#xA;" all get parsed as newlines. U+000A should become a space, and "&#10;" or "&#xA;" should become a newline.
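A minimal Python sketch of the conversion this bug asks for. It assumes only numeric character references need handling (named entities such as `&amp;` are ignored) and, following XML's attribute-value normalization, treats raw tabs and carriage returns like raw line feeds. The function names are hypothetical; this is not Gecko's parser code.

```python
import re

# Decimal (&#10;) and hexadecimal (&#xA;) numeric character references.
# Named entities such as &amp; are out of scope for this sketch.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumption: raw tab and CR are flattened to a space just like raw LF,
# as in XML attribute-value normalization.
_RAW_WS_TO_SPACE = str.maketrans({"\n": " ", "\t": " ", "\r": " "})


def _flatten_raw(text: str) -> str:
    """Turn raw line breaks and tabs in literal text into spaces."""
    # Treat CRLF as a single line break before flattening.
    return text.replace("\r\n", "\n").translate(_RAW_WS_TO_SPACE)


def normalize_attribute_value(raw: str) -> str:
    """Hypothetical helper: a raw U+000A becomes a space, while the
    references &#10; and &#xA; become a real U+000A newline."""
    out = []
    pos = 0
    for match in _NUMERIC_REF.finditer(raw):
        out.append(_flatten_raw(raw[pos:match.start()]))
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        # Characters produced by a reference are kept verbatim.
        out.append(chr(code_point))
        pos = match.end()
    out.append(_flatten_raw(raw[pos:]))
    return "".join(out)


print(repr(normalize_attribute_value("line one\nline two&#10;line three&#xA;end")))
# prints 'line one line two\nline three\nend'
```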
I think this demonstrates this problem and also a similar problem with tabs. For previous discussions see bug 67127
> U+000A should become a space

Why? It's not at all a space...
See Hixie's comment in bug 67127#143
Summary: Parse U+000A, "&#10;" and "&#xA;" correctly → Parse U+000A, "&#10;" and "&#xA;" correctly in attributes
For future reference, "bug 67127 comment 143" will get properly linkified... That comment just says "we should do something weird". It doesn't say WHY. Clear links to the "why" explanation belonged in comment 0.
Excuse me, but isn't this a dupe of bug 47078, which was already blocking bug 67127 before this one was even filed? For the explanation, see bug 47078, or bug 67127 comment 37, bug 67127 comment 51, bug 67127 comment 62...
This is partly covered by bug 47078, but it also covers what to do with the character entity "&#10;".
Well, as filed, this bug (and bug 47078) would change the values of "value" attributes, which is a good way to break half the web. So someone had better provide me with good justification not to wontfix them both out of hand as not feasible given the content that's out there. Tooltips (bug 67127 comment 62) are not justification; we could treat the "title" attribute differently from others, and that would fix tooltips. The other cited reasons are pretty vacuous in light of all the much more serious violations of the HTML spec that have to happen in the parser to handle real-world content... And before someone suggests standards vs. quirks mode, I am highly opposed to adding yet another ill-tested "standards" codepath in a parser that's non-standards-compliant by design.
No response; marking wontfix.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Please reopen. In XHTML for sure (see XML rec), and very likely in HTML as well (unfortunately I do not have access to the SGML spec, but see the comments in bug 67127), the values of attributes defined as "CDATA" should not be stripped of beginning/trailing newlines when they are encoded as entities, as in e.g. <input type="hidden" value="&#10;test 3&#10;here&#10;" name="h2">. What actually happens is that the newline between "test 3" and "here" is kept, but all others are discarded. When tested with IE5 or IE6, all newlines are kept.

Whitespace encoded in CDATA attribute values should not be stripped (but raw whitespace, including raw newlines, should be stripped from the beginning/end of values, and IE does not do that). And the first reporter is right when he says that raw NL/CR/tab anywhere in a CDATA attribute should be replaced by a space (that's what both XML and HTML 4.01 state).

This is not, for once, tooltip-related (and thus not a dupe of bug 67127), and it causes real problems with ECMAScript apps that deal with textareas and preformatted text passed in <input type="hidden"> elements. I had to work around it with <pre>.
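The behavior argued for in the comment above (raw whitespace flattened to spaces and trimmed at the ends, whitespace written as character references kept intact) can be sketched by remembering, for each decoded character, whether it came from a reference. This is a hypothetical Python helper limited to numeric references; it illustrates the commenter's reading, not what any browser or spec mandates.

```python
import re

# Decimal and hexadecimal numeric character references only (assumption).
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
_RAW_WHITESPACE = {" ", "\t", "\r", "\n"}


def decode_with_origin(raw: str) -> list[tuple[str, bool]]:
    """Split a raw attribute value into (character, came_from_reference) pairs."""
    chars: list[tuple[str, bool]] = []
    pos = 0
    for match in _NUMERIC_REF.finditer(raw):
        chars.extend((c, False) for c in raw[pos:match.start()])
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        chars.append((chr(code_point), True))
        pos = match.end()
    chars.extend((c, False) for c in raw[pos:])
    return chars


def cdata_value(raw: str) -> str:
    """Hypothetical helper: flatten and trim raw whitespace, keep encoded whitespace."""
    chars = decode_with_origin(raw.replace("\r\n", "\n"))
    # Raw NL/CR/tab (and space) anywhere become a plain raw space.
    chars = [(" ", False) if (not ref and c in _RAW_WHITESPACE) else (c, ref)
             for c, ref in chars]
    # Trim only raw spaces at the ends; characters from references survive.
    start, end = 0, len(chars)
    while start < end and chars[start] == (" ", False):
        start += 1
    while end > start and chars[end - 1] == (" ", False):
        end -= 1
    return "".join(c for c, _ in chars[start:end])


print(repr(cdata_value("  &#10;test 3&#10;here&#10;  ")))
# prints '\ntest 3\nhere\n'
```

Tracking origin per character is what lets the trim step tell a literal leading line break from an encoded one; a plain decoded string loses that distinction.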
> In XHTML for sure (see XML rec)

That has nothing to do with this bug. This is a bug on the tag-soup parser; the XML parser follows the XML rec.

> the values of attributes defined as "CDATA" should not be stripped of
> beginning/trailing newlines when they are encoded as entities

That's not what this bug was about.
I have just submitted bug 322270 to address just the issue of U+000A's in attribute values, but with the need for an exception for the "value" attribute explicitly recognized. Comments (or criticisms) are very welcome.
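One way to read the compromise discussed here (normalize raw line feeds, but leave the "value" attribute alone so existing forms keep working) is a per-attribute exemption. The sketch assumes the exception simply skips the conversion for "value"; the set `RAW_NEWLINE_EXEMPT` and the function name are hypothetical.

```python
# Hypothetical set of attributes whose raw line feeds are left untouched,
# since changing submitted form values could break existing pages.
RAW_NEWLINE_EXEMPT = frozenset({"value"})


def apply_newline_policy(attr_name: str, raw: str) -> str:
    """Replace raw line feeds with spaces unless the attribute is exempt."""
    # HTML attribute names are case-insensitive.
    if attr_name.lower() in RAW_NEWLINE_EXEMPT:
        return raw
    return raw.replace("\r\n", "\n").replace("\n", " ")


print(repr(apply_newline_policy("title", "first\nsecond")))  # 'first second'
print(repr(apply_newline_policy("VALUE", "first\nsecond")))  # 'first\nsecond'
```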