Closed Bug 547667 Opened 10 years ago Closed 3 years ago

XMLSerializer auto url-encodes href attributes of xhtml links

Categories

(Core :: DOM: Serializers, defect, major)

x86
macOS
defect
Not set
major


RESOLVED DUPLICATE of bug 1371010

People

(Reporter: thomas.comiotto, Unassigned)

Details

(Keywords: regression)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de-DE; rv:1.9.2) Gecko/20100115 Firefox/3.6
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de-DE; rv:1.9.2) Gecko/20100115 Firefox/3.6

XMLSerializer auto url-encodes href attributes of xhtml links. 

This breaks the serialization of XSLT stylesheets, where attributes can contain expressions, e.g. <link href="{href}"/>

Also see bug 541937 (https://bugzilla.mozilla.org/show_bug.cgi?id=541937) for further details and a discussion of how to fix this.

Reproducible: Always

Steps to Reproduce:
1. Run the attached testcase: Serialization of 
<link xmlns="http://www.w3.org/1999/xhtml" href="{$href}"></link>
Actual Results:  
<link xmlns="http://www.w3.org/1999/xhtml" href="%7B$href%7D"></link>


Expected Results:  
<link xmlns="http://www.w3.org/1999/xhtml" href="{$href}"></link>

Do it the right way.
Attached file XMLSerializer Testcase
The href attribute of a link element contains a URI: http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#dtdentry_xhtml1-strict.dtd_link

A URI should conform to RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt). That RFC says that "{" and "}" should be escaped (see pages 9 and 10 of the RFC).

So in my view your issue is invalid, from the point of view of the serializer.
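
To make the disagreement concrete, here is a minimal Python sketch of what percent-encoding the braces does to the template from the testcase. It is not Gecko's code; it uses the standard library's `urllib.parse.quote`, and keeping "$" literal via `safe="$"` is an assumption made only to mirror the output reported in this bug.

```python
from urllib.parse import quote, unquote

# XSLT attribute value template taken from the testcase in this bug.
template_href = "{$href}"

# Percent-encode the way a URI-aware serializer might. Keeping "$" literal
# is an assumption chosen to match the "Actual Results" reported above.
encoded = quote(template_href, safe="$")
print(encoded)  # %7B$href%7D

# Decoding recovers the text, but an XSLT processor reading the serialized
# file no longer sees the braces that mark an attribute value template.
print(unquote(encoded) == template_href)
```

The encoded form is a valid URI reference, but it is no longer an attribute value template, which is why the stylesheet breaks after a round trip.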
Component: General → Serializers
Product: Firefox → Core
QA Contact: general → dom-to-text
Are you saying that, in your view, every XML serializer that complies with RFC 2396 will break when serializing XSLT stylesheets that contain XHTML link elements with href attribute substitution?

That might be correct according to RFC 2396, but it doesn't make sense to me...
How do other implementations handle this? I never had these problems with org.apache.xml.serialize, for instance.
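
As one data point for the question of how other implementations behave, the following sketch serializes the same element with Python's standard `xml.etree.ElementTree`. This is only an illustration with a different library, not org.apache.xml.serialize and not Gecko; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Build <link href="{$href}"> in the XHTML namespace, as in the testcase.
link = ET.Element("{%s}link" % XHTML_NS, {"href": "{$href}"})

# ElementTree escapes XML-significant characters in attribute values
# (such as & and "), but it does not treat href as a URI.
serialized = ET.tostring(link, encoding="unicode")
print(serialized)

# The attribute value template survives serialization unchanged.
print('href="{$href}"' in serialized)
```

A generic XML serializer of this kind has no notion of which attributes hold URIs, so the braces pass through untouched.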
The issue is related to this change : bug 500937.

So for pure text/xml content, we have to call the XML serializer, not the XHTML serializer (which should be called only for application/xhtml+xml content). We probably have to revert the changes from bug 500937 and find another solution.
Blocks: 500937
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #3)
> You want to say that (in your view) every XML Serializer in compliance with RFC
> 2396 will break serializing xslt's that have xhtml link elements with href
> attribute substitution? 
Hmm, I think you should use <xsl:attribute ...><xsl:value-of ...> for attributes that explicitly forbid "{".
Sure. The problem is that my extension needs to serialize user-defined XSLT stylesheets. Because it is more elegant and compact, users write href="${var}", and none of the tools I am aware of complain.

Enforcing RFC 2396 might break reuse of their stylesheets in my extension (a WYSIWYG XML editor), and might also break every other extension that lets users edit XSLT stylesheets or depends on doctype-agnostic XML serialization.
Keywords: regression
According to the current DOM Parsing and Serialization spec (http://domparsing.spec.whatwg.org), URL-encoding attribute values during serialization is definitely forbidden; all attribute values must be serialized as-is:

    The XML serialization of the attributes of an element element is the result of the following algorithm:

    Let result be the empty string.

    For each attribute attr in element attributes, in order, append the following strings to result:
        " " (U+0020 SPACE);
        attr's name;

        escaping / throwing
        "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
        attr's value;

        escaping / throwing
        """ (U+0022 QUOTATION MARK). 
    Return result. 
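
The quoted steps can be sketched in a few lines of Python. This is a literal transcription of the algorithm as quoted above, not the browser implementation; the function name `serialize_attributes` is hypothetical, and XML escaping of characters such as `&` or `"` is deliberately left out because the quoted text only flags it as "escaping / throwing".

```python
from typing import Iterable, Tuple


def serialize_attributes(attributes: Iterable[Tuple[str, str]]) -> str:
    """Follow the quoted steps: space, name, '="', value, '"' per attribute."""
    result = ""  # "Let result be the empty string."
    for name, value in attributes:
        # The value is appended as-is; no URL-encoding is applied.
        # Assumption: XML escaping is omitted in this sketch.
        result += " " + name + '="' + value + '"'
    return result  # "Return result."


print(serialize_attributes([
    ("xmlns", "http://www.w3.org/1999/xhtml"),
    ("href", "{$href}"),
]))
```

Nothing in these steps inspects the attribute name or the element type, so there is no point at which an href could be singled out for percent-encoding.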


The spec is reasonable: (1) How can XMLSerializer know which attributes are supposed to be URLs in an arbitrary XML document? (2) Even if XMLSerializer does know, it should not try to "correct" the URL. For example, we are building an XML/HTML editor and use XMLSerializer to save files; XMLSerializer should keep possibly "wrong" URLs as-is and let users correct them themselves. (3) Internationalized Resource Identifiers should not be URL-encoded either.

In fact, Firefox's HTML/XML serialization implementation is not consistent: for HTML documents, elem.innerHTML/outerHTML do NOT URL-encode URLs, but for XHTML documents they do; however, <object>'s data attribute is NEVER encoded.

Chrome conforms to the spec.

So I urge you to fix this issue.
Duping forward because I have attached patches to the new bug.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1371010