Bug 226815 (closed). Opened 21 years ago; closed 21 years ago.

xsl:output method='text' should not escape using entities

Component: Core :: XSLT
Type: defect
Platform: x86, Windows 2000
Priority: Not set
Severity: major
Status: VERIFIED INVALID
Reporter: manos_lists
Assignee: peterv
Attachments: 5 files

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007

The text output method should be supported properly, without escaping, by
transformiix or the Mozilla glue/serialization code, according to [1]:

[[[
The text output method outputs the result tree by outputting the string-value of
every text node in the result tree in document order without any escaping.
]]] 

Additionally, no HTML elements like <html><head></head><body> or <pre
id="transformiixResult"> should be generated to wrap the result since the output
method is text.

[1] http://www.w3.org/TR/xslt#section-Text-Output-Method
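
The rule quoted from [1] can be sketched in a few lines of JavaScript. This is an illustration, not transformiix code: it assumes a DOM-like tree whose nodes expose `nodeType`, `nodeValue` and `childNodes`, and the name `serializeAsText` is hypothetical.

```javascript
// Hypothetical sketch of the XSLT 1.0 text output method: concatenate the
// string-value of every text node in document order, with no escaping.
// Assumes DOM-like nodes exposing nodeType, nodeValue and childNodes.
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

function serializeAsText(node) {
  if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
    return node.nodeValue; // emitted verbatim: '<' stays '<'
  }
  // Elements, documents and fragments contribute their descendants' text.
  // Attributes are not in childNodes, so they are skipped; comments and
  // processing instructions have no children and contribute nothing.
  let out = "";
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    out += serializeAsText(children[i]);
  }
  return out;
}

// A tiny hand-built tree standing in for a result tree.
const tree = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: "if (a < b " },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: "&& c)" }] },
  ],
};
console.log(serializeAsText(tree)); // prints if (a < b && c)
```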

Reproducible: Always

Steps to Reproduce:
1. In an XSLT stylesheet, set the output method to 'text' and output characters
 such as '<' or '&', which XML escapes using the predefined entities.

Actual Results:  
In the result, the characters are still escaped as entities. Additionally, the
result is wrapped in an XML root element, which is itself wrapped inside HTML
elements.

Expected Results:  
Instead, the (rendered/serialized) result should include only the unescaped text
generated by the stylesheet, without wrapping it in HTML tags.
I think this is a major bug for XSLT, as the text output method cannot be used
with the expected results.

Attachment #136348 - Attachment mime type: text/xsl → text/xml
Tested on 1.4 and trunk; INVALID.
The generated result does indeed look like attachment 136349; the result in
attachment 136350 is the HTML serialisation of the generated document.
This wrapping document is consistent with what Mozilla does for text documents:
just select all in attachment 136349 and view the selection source.
If you use the DOM Inspector on the result of the transformation, you will
see that the text node in the document actually does contain '<', as required.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID

Hi Axel,

The DOM Inspector does show '<' instead of '&lt;', but if I press Ctrl+A and
choose "View Selection Source" from the context menu, I see entities. Also, if I
use the JS interface, the result contains entities as well, so I guess the DOM
Inspector just renders entities the same way it renders markup in general...

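var xsl = document.implementation.createDocument('', '', null);
var xml = document.implementation.createDocument('', '', null);
xsl.load("file:///C:/Sources/QATests/test.xsl");
xml.load("file:///C:/Sources/QATests/test.xml");
var p = new XSLTProcessor();
p.importStylesheet(xsl);
var f = p.transformToFragment(xml, document);
// With output method="text" the fragment holds a single text node,
// whose nodeValue is the raw, unescaped result string.
f.firstChild.nodeValue
returns: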

		<documentElement attribute="value">
		<element attribute="value">
		<element>
		<element attribute="value">
		<element>
		<element attribute="value">
		<element>
		<documentElement>

I have no idea what you're doing. (Note that I skipped the async loading stuff.)

Manos: it's the "View Selection Source" code that replaces '<' with '&lt;'. Do
the same thing with the "what the result should be like" attachment. The
DOM Inspector shows the nodeValue exactly as it is in the DOM.

transformiix displays a text result the same way that Mozilla displays text
files, which is a reasonable thing to do IMHO.

What we should possibly do is set the MIME type of the result document to
'text/plain', since that is what is done for text documents.

Right, the MIME type should depend on the xsl:output method.
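
Making the result's MIME type follow xsl:output could look roughly like the sketch below. The defaults are the ones XSLT 1.0 section 16 gives for each method (text/xml, text/html, text/plain), and an explicit media-type attribute overrides them; `mediaTypeFor` is a hypothetical name, and this is not how Mozilla actually chooses the type.

```javascript
// Hypothetical helper: pick a MIME type for a transformation result from
// the xsl:output method and an optional media-type attribute.
// Defaults follow XSLT 1.0 section 16; an explicit media-type wins.
const DEFAULT_MEDIA_TYPES = new Map([
  ["xml", "text/xml"],
  ["html", "text/html"],
  ["text", "text/plain"],
]);

function mediaTypeFor(method, mediaTypeAttr) {
  if (mediaTypeAttr) {
    return mediaTypeAttr;
  }
  // Missing or unknown methods fall back to the XML default in this sketch.
  // (XSLT 1.0 also defaults to html when the result root is <html>, which
  // can only be decided after the transformation has run.)
  return DEFAULT_MEDIA_TYPES.get(method) || "text/xml";
}

console.log(mediaTypeFor("text"));             // prints text/plain
console.log(mediaTypeFor("html"));             // prints text/html
console.log(mediaTypeFor("text", "text/csv")); // prints text/csv
```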

I am insisting that not everything is quite right, though. Axel's example uses
nodeValue, which will not return entities anyway, AFAIK. I created an attachment
that prints both the nodeValue and a string serialization of the transformation
result (using an XMLSerializer): the latter is full of entities.

Well, doh: XMLSerializer needs to translate to entities, otherwise the resulting
string wouldn't be valid XML! I have no idea why you would expect this not
to happen; this works as advertised. What exactly are you trying to do?
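
The escaping step can be illustrated with a small JavaScript function. It is a sketch of the usual rule XML serializers apply to text-node content (replace `&`, `<` and `>` with entity references), not Mozilla's XMLSerializer source; `escapeXmlText` is a hypothetical name.

```javascript
// Hypothetical sketch of text-node escaping as done by an XML serializer.
// '&' is replaced first so that the entities produced by the later
// replacements are not escaped a second time.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// A text node whose value looks like markup still serializes as character data.
console.log(escapeXmlText('<element attribute="value">'));
// prints &lt;element attribute="value"&gt;
```
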
> XMLSerializer needs to translate to entities, otherwise 
> the resulting string wouldn't contain valid XML!

If the transformation result we feed in the serializer was produced by a
stylesheet, with the output method set to text, it's not supposed to be valid
XML in the first place.

In this case, however, it actually is valid XML; but the serializer, in its
attempt to "make sure" this is going to be well-formed XML, ruins the actual
markup, which is not the smartest thing to do, that's all. IMHO the XML
serializer should either refuse to process the transformation result (which is
not XML in the first place, per the xsl:output method) or not try to be smart in
escaping it.

I'm not saying the implementation does not work as advertised! The reason I use
XMLSerializer is that I don't know the output method of the XSLT in advance...

VERIFIED INVALID.
There is *no* way to make an XML serializer take a DOM node and not encode the
occurring '<'. You either do that or insert a CDATA section, which is just as far
away from what you want as the entity stuff.
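
The CDATA alternative can be sketched as follows. A text value containing `]]>` has to be split across two sections, because that sequence would otherwise end the section early. `wrapInCdata` is a hypothetical helper, not a browser API.

```javascript
// Hypothetical sketch: wrap arbitrary text in CDATA instead of escaping it.
// "]]>" cannot appear inside a CDATA section, so each occurrence is split:
// "]]" stays in the current section, which is then closed, and ">" opens
// the next section.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(wrapInCdata("a < b")); // prints <![CDATA[a < b]]>
console.log(wrapInCdata("x]]>y")); // prints <![CDATA[x]]]]><![CDATA[>y]]>
```

Either way the '<' survives only as serialized XML syntax, which is the point made above.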
DOM does not have text documents, period. So, as we need a DOM to display stuff, 
there is no way to get around this.
As finding out the output method is trivial (at least for text, that has to be
explicitly set), there is no real problem here in the first place.
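
Reading the output method from a loaded stylesheet document might look like the sketch below. It uses the standard DOM `getElementsByTagNameNS` and `getAttribute` methods, only inspects the stylesheet's own `xsl:output` elements (imported or included stylesheets are ignored), and `outputMethodOf` is a hypothetical name. Since text output always requires an explicit method="text", a `null` return means the result is XML or HTML.

```javascript
// Hypothetical helper: return the method declared by xsl:output in a
// stylesheet document, or null when none is declared (XSLT 1.0 then
// defaults to xml, or to html if the result's root element is <html>).
// Imported or included stylesheets are not examined in this sketch.
const XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

function outputMethodOf(stylesheetDoc) {
  const outputs = stylesheetDoc.getElementsByTagNameNS(XSLT_NS, "output");
  let method = null;
  // XSLT 1.0 merges multiple xsl:output elements; this sketch simply lets
  // the last declared method win.
  for (let i = 0; i < outputs.length; i++) {
    const value = outputs[i].getAttribute("method");
    if (value) {
      method = value.trim();
    }
  }
  return method;
}

// In a browser, once the stylesheet has finished loading:
// if (outputMethodOf(xsl) === "text") { /* read the text node, don't serialize */ }
```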

Status: RESOLVED → VERIFIED

The XMLSerializer takes a DOM node as input and outputs it as serialized XML; it
does not take a string, so I don't understand why you'd expect it not to
translate to entities.
Furthermore, as Axel points out, the DOM doesn't have "text documents", so what
you want will never work in the DOM. There's a reason why we convert the result
of the transformation for output="text" into a text node with a pre element (with
id transformiixResult) as its parent: it's so you can detect this situation
through the DOM and do the reasonable thing, i.e. access the result through
.nodeValue. I don't think there's anything to solve here; you just need to adapt
your code to the fact that you're using a non-serializing processor.
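
The detection described above could be written roughly as follows. This is a sketch under assumptions: it treats a document containing an element with id `transformiixResult` as the text-output wrapper, and, for `transformToFragment`, treats a fragment holding a single text node as text output (as in Axel's example). `textResultOf` is a hypothetical name, and the fragment check cannot tell text output apart from an XML result that happens to be a lone text node.

```javascript
// Hypothetical helper: recover the raw string of an output="text"
// transformation from the DOM result, or return null if the result does
// not look like text output. Uses only standard DOM properties.
const TEXT_NODE = 3;
const DOCUMENT_NODE = 9;
const DOCUMENT_FRAGMENT_NODE = 11;

function textResultOf(result) {
  if (result.nodeType === DOCUMENT_NODE) {
    // Document case: the text sits inside the element with id transformiixResult.
    const pre = result.getElementById("transformiixResult");
    return pre ? pre.textContent : null;
  }
  if (result.nodeType === DOCUMENT_FRAGMENT_NODE) {
    // Fragment case: a lone text node, as in f.firstChild.nodeValue.
    const first = result.firstChild;
    if (first && first === result.lastChild && first.nodeType === TEXT_NODE) {
      return first.nodeValue;
    }
  }
  return null;
}
```

When the output method is not known in advance, a check like this would run before falling back to XMLSerializer.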

We don't support section 16 (Output) of the XSLT spec in Mozilla; we don't have
to, and we can't anyway. We do try to support what we can, or adapt it to a
non-serializing model. Giving you access to the resulting string through the
nodeValue of a text node is one of those adaptations. The alternative is
removing support for output="text".

Manos: you still haven't provided an example where text files and text output
differ. What are you trying to do that fails?