Bug 169521 - newline in XML attributes should be serialized as &#xA;

Status: NEW
Product: Core
Classification: Components
Component: XML
Version: Trunk
Hardware: All All
Importance: -- normal with 2 votes
Target Milestone: ---
Assigned To: Heikki Toivonen
QA Contact: Andrew Overholt [:overholt]
Depends on:
Reported: 2002-09-18 12:53 PDT by Heribert Schuetz
Modified: 2014-09-29 02:55 PDT
CC List: 2 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Description Heribert Schuetz 2002-09-18 12:53:17 PDT
An attribute value that would be written as "asdf\nqwer" in C or JavaScript
should be serialized as "asdf&#xA;qwer" in XML, so that an XML parser that
conforms to section 3.3.3 of the XML 1.0 specification will normalize it to
"asdf\nqwer" again. Currently the XML serializer writes out the newline character
without escaping, so a conforming parser normalizes the attribute value to "asdf qwer".
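The normalization described above can be observed with any conforming XML parser. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` (backed by expat) as a stand-in for the parser; it is not the Gecko parser:

```python
import xml.etree.ElementTree as ET

# Attribute written with a literal (unescaped) newline, as the serializer currently does.
literal = '<a x="asdf\nqwer"/>'
# Attribute written with a character reference, as proposed in this bug.
escaped = '<a x="asdf&#xA;qwer"/>'

# Section 3.3.3: a literal whitespace character in a CDATA attribute value is
# replaced by a space, while a character produced by a character reference is kept.
print(repr(ET.fromstring(literal).get("x")))  # 'asdf qwer'
print(repr(ET.fromstring(escaped).get("x")))  # 'asdf\nqwer'
```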

The same applies to tab and carriage return.

I think it should suffice to set kAttrEntities[10] to "&#xA;" in
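The suggested fix amounts to a lookup table indexed by character code, where each slot is either empty (copy the character unchanged) or holds the replacement text. The Python sketch below imitates that idea and also covers tab and carriage return, per the note above. `ATTR_ENTITIES` and `escape_attr` are hypothetical names; the actual layout of `kAttrEntities` in the Gecko source may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical analog of a kAttrEntities-style table: one slot per ASCII code,
# None meaning "write the character unchanged".
ATTR_ENTITIES = [None] * 128
ATTR_ENTITIES[ord("\t")] = "&#x9;"
ATTR_ENTITIES[ord("\n")] = "&#xA;"  # index 10, the entry proposed in this bug
ATTR_ENTITIES[ord("\r")] = "&#xD;"
ATTR_ENTITIES[ord("&")] = "&amp;"
ATTR_ENTITIES[ord("<")] = "&lt;"
ATTR_ENTITIES[ord(">")] = "&gt;"
ATTR_ENTITIES[ord('"')] = "&quot;"


def escape_attr(value: str) -> str:
    """Escape a string for use inside a double-quoted XML attribute."""
    out = []
    for ch in value:
        code = ord(ch)
        # Characters outside the table are copied as-is.
        entity = ATTR_ENTITIES[code] if code < 128 else None
        out.append(entity if entity is not None else ch)
    return "".join(out)


value = "asdf\nqwer\tzxcv\r1"
doc = '<a x="' + escape_attr(value) + '"/>'
print(doc)  # <a x="asdf&#xA;qwer&#x9;zxcv&#xD;1"/>
# A conforming parser now returns the original value unchanged.
assert ET.fromstring(doc).get("x") == value
```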
Comment 1 Heribert Schuetz 2002-09-18 13:17:11 PDT
Unfortunately, the workaround of converting the newline to "&#xA;" in the
DOM before serializing does not work. Here the serializer is smart and replaces
the ampersand with "&amp;".
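This is what any correct serializer has to do: the DOM holds the five characters `&#xA;` as ordinary text, so the ampersand must be escaped. A sketch using Python's `xml.dom.minidom` as a stand-in for the browser's serializer shows the same effect:

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

doc = xml.dom.minidom.parseString("<a/>")
# Workaround attempt: store the character reference text itself in the DOM.
doc.documentElement.setAttribute("x", "asdf&#xA;qwer")

out = doc.documentElement.toxml()
print(out)  # <a x="asdf&amp;#xA;qwer"/>

# Re-parsing yields the literal text "&#xA;", not a newline.
print(repr(ET.fromstring(out).get("x")))  # 'asdf&#xA;qwer'
```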
Comment 2 ondra zara 2014-09-29 02:13:34 PDT
Nice, a 12-year-old bug! And still unresolved.

Added a testcase to show what is happening:

It turns out that a literal newline in an attribute value is parsed correctly *as long as the parser operates in text/html mode*. For application/xml documents, the literal newline is normalized to a space, as described in the original report (and the spec).
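The difference between the two parsing modes can be reproduced outside the browser. In this sketch, Python's `html.parser` stands in for an HTML parser and `xml.etree.ElementTree` for an XML parser; neither is Gecko's, and `AttrCollector` is a hypothetical helper written only for this illustration:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The same markup, with a literal newline inside the attribute value.
markup = '<a x="asdf\nqwer"></a>'


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        self.attrs.update(attrs)


html_parser = AttrCollector()
html_parser.feed(markup)
html_parser.close()

html_value = html_parser.attrs["x"]         # HTML parsing keeps the newline
xml_value = ET.fromstring(markup).get("x")  # XML parsing normalizes it to a space
print(repr(html_value), repr(xml_value))    # 'asdf\nqwer' 'asdf qwer'
```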

So clearly we need to explain to the XMLSerializer that it should escape newlines, because the result is not text/html, but rather application/xml.
Comment 3 Simon Pieters 2014-09-29 02:55:49 PDT
This is indeed a bug. U+000A and U+0009 should be escaped in XML. (I think it's fine to not escape U+000D, since HTML serializer doesn't escape it and nobody complains about that.)
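A serializer following the policy suggested here would escape U+000A and U+0009 but write U+000D literally. The hypothetical `escape_attr_value` below sketches that choice; the round trip through a conforming XML parser (expat, via `xml.etree.ElementTree`) shows the trade-off, namely that a literal carriage return still comes back as a space:

```python
import xml.etree.ElementTree as ET

# Hypothetical policy from this comment: escape LF and TAB, leave CR alone.
REPLACEMENTS = {
    "&": "&amp;",
    "<": "&lt;",
    '"': "&quot;",
    "\n": "&#xA;",
    "\t": "&#x9;",
}


def escape_attr_value(value: str) -> str:
    """Escape an attribute value; U+000D is intentionally written unescaped."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in value)


value = "a\nb\tc\rd"
serialized = '<e v="' + escape_attr_value(value) + '"/>'
# LF and TAB survive as character references; the literal CR is normalized
# to a space by the parser (end-of-line handling, then section 3.3.3).
parsed = ET.fromstring(serialized).get("v")
print(repr(parsed))  # 'a\nb\tc d'
```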
