Open Bug 640400 Opened 13 years ago Updated 2 years ago

<pre> generated by XSLT seems to be handled the same as <xhtml:pre>

Categories

(Core :: XSLT, defect)

x86
Windows 7
defect

People

(Reporter: julian.reschke, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0

It appears that HTML generated through XSLT is actually treated like XHTML with respect to the handling of initial blank lines in <pre>.

Reproducible: Always

Steps to Reproduce:
1. Visit http://greenbytes.de/tech/webdav/rfc5988.xml#rfc.section.5.5
2. Compare to http://greenbytes.de/tech/webdav/rfc5988.html#rfc.section.5.5

Actual Results:  
The examples show up with a leading blank line.

Expected Results:  
The examples shouldn't have a leading blank line (compare to static HTML output).
I can reproduce this bug on:
 Mozilla/5.0 (Windows NT 6.1; rv:2.0b13pre) Gecko/20110309 Firefox/4.0b13pre
When I raised this I assumed it was new behavior, but it seems 3.6.* behaves the same. (So no regression).
Is this the same issue as in Bug 640390? If so, please mark it as a duplicate.
(In reply to comment #3)
> Is this the same issue as in Bug 640390? If so, please mark it as a duplicate.

No, it's not.
This is not a regression from any HTML5 work: This happens even in 3.5.17.

This is clearly a bug, though, since Firefox fails to emulate the "html" output mode and behaves differently from Chrome, Opera and IE.

Julian, filing XSLT bugs in Core: XSLT has a higher probability of the right people seeing the bugs compared to Firefox: General.
Status: UNCONFIRMED → NEW
Component: General → XSLT
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → xslt
Version: unspecified → Trunk
Chrome, Opera, and IE all generate a string from XSLT and parse it, as I recall.

Gecko generates a DOM directly, so there's never an HTML parser involved here.

As far as I can tell, both behaviors are perfectly correct per the XSLT specs.  They do give different results in some cases, especially when the string is parsed with a non-XML parser or when the DOM that the XSLT creates can't be represented in serialized form.

You should be able to test this theory by creating HTML that has things like <table><form><tr></tr></form></table> in your XSLT and seeing what the result looks like in the various browsers.

We don't particularly want to switch to the "generate a string and parse" approach here, last I checked.
(In reply to comment #6)
> Chrome, Opera, and IE all generate a string from XSLT and parse it, as I
> recall.
> ...

I'm not sure this is correct for IE...

> We don't particularly want to switch to the "generate a string and parse"
> approach here, last I checked.

Understood and agreed.

Nevertheless, in edge cases like this one, it does make a difference what the xsl:output method of the XSLT was. I assume when parsing HTML vs XHTML, the different behavior for <pre> happens at parse time, right? In that case it seems that the XSLT engine in Mozilla could special-case the creation of the <pre> node in the DOM based on the output method. (Just thinking out loud.)
So let me summarize to make sure I've understood things correctly:

The HTML parser (both old and new) drops the newline after the start <pre> tag (not element). Thus it never showed up in the DOM and never got rendered.

With XSLT generated pages, no parsing happens and so no newlines are dropped. Thus a newline in the beginning of a generated pre-element will appear in the DOM and will be rendered.

Does this sound correct?
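
To make the two paths concrete, here is a minimal Python sketch. It is not Gecko code: `build_dom_directly` and `serialize_then_parse_as_html` are hypothetical names, the sample content is made up, and the "HTML parser" is reduced to the single rule discussed here (ignore one newline right after the <pre> start tag).

```python
from xml.dom import minidom

# Hypothetical stylesheet output for a <pre>, starting with a newline.
PRE_CONTENT = "\nGET / HTTP/1.1\n"


def build_dom_directly(content: str) -> str:
    """Create the <pre> node without any parser and return its text."""
    doc = minidom.Document()
    pre = doc.createElement("pre")
    pre.appendChild(doc.createTextNode(content))
    doc.appendChild(pre)
    return pre.firstChild.data  # untouched: no parser ever saw it


def serialize_then_parse_as_html(content: str) -> str:
    """Emulate 'generate a string, then HTML-parse it' for one <pre>.

    Assumption: only the leading-newline rule is modeled, nothing else.
    """
    markup = "<pre>" + content + "</pre>"
    body = markup[len("<pre>"):-len("</pre>")]
    # A newline immediately after the <pre> start tag is ignored.
    return body[1:] if body.startswith("\n") else body


direct = build_dom_directly(PRE_CONTENT)
reparsed = serialize_then_parse_as_html(PRE_CONTENT)
print(repr(direct))
print(repr(reparsed))
```

The directly built node keeps the leading newline; the emulated string-and-parse path loses it, which matches the difference seen between the .xml and .html test pages.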


Like Boris says, there are lots and lots of cases when serializing and reparsing a document will yield a different result. So far the only case where we tweak the output to compensate for this difference is that we add a tbody-element around the tr-elements inside a table-element. We only do this for HTML output mode. We originally did this since back in the day the table rendering code would not work properly without such an element.

However, any time we do this it creates weird edge cases. What should happen if a page passes an HTML document through an identity-like transform using the XSLTProcessor DOM API? Should that drop any newlines appearing at the start of the contents of any pre-elements, even though the document had potentially already passed through an HTML parser and thus had the relevant newline removed? What if you do this repeatedly? Should that remove a newline every time?

Yes, this problem does exist with the tbody-insertion mentioned above. Though back in the day it fixed a bigger problem than it created. Possibly it's something we could remove now. And at least it doesn't produce progressively more mutated DOMs if you pass the same document through an identity transform multiple times.
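
A rough Python sketch of that difference, using made-up helpers rather than anything from the actual XSLT code: `drop_leading_newline` stands in for a hypothetical <pre> compensation, and `ensure_tbody` stands in for the existing tbody insertion. Running the first one repeatedly keeps changing the data, while the second one settles after the first pass.

```python
from xml.dom import minidom


def drop_leading_newline(text: str) -> str:
    """Hypothetical <pre> fix-up: remove one leading newline, if present."""
    return text[1:] if text.startswith("\n") else text


def ensure_tbody(table) -> None:
    """Wrap <tr> elements that are direct children of <table> in a <tbody>."""
    rows = [n for n in table.childNodes
            if n.nodeType == n.ELEMENT_NODE and n.tagName == "tr"]
    if not rows:
        return  # nothing to wrap, e.g. on a second pass
    tbody = table.ownerDocument.createElement("tbody")
    table.insertBefore(tbody, rows[0])
    for row in rows:
        tbody.appendChild(row)  # minidom moves the node to its new parent


# Repeated "identity transforms" with the newline fix-up keep eating content.
text = "\n\ncode"
once = drop_leading_newline(text)
twice = drop_leading_newline(once)

# The tbody fix-up gives the same tree after one pass or two.
doc = minidom.parseString("<table><tr/><tr/></table>")
table = doc.documentElement
ensure_tbody(table)
after_one = table.toxml()
ensure_tbody(table)
after_two = table.toxml()
```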


Though really, IMHO serializers should add a newline after an opening <pre> tag so as to avoid round-tripping issues. If they did, then this bug would be moot, as Safari et al. would produce the same result as Firefox, since the only newline they would drop during parsing would be the one added during "output" (i.e. serializing).
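
The round-tripping argument can be sketched in Python as follows. All three functions are hypothetical and model only a single <pre> element; `parse_pre` applies nothing but the drop-one-leading-newline rule from this discussion.

```python
import html


def serialize_pre_naive(text: str) -> str:
    """Serialize without any padding after the start tag."""
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


def serialize_pre_padded(text: str) -> str:
    """Serialize with an extra newline right after the <pre> start tag."""
    return "<pre>\n" + html.escape(text, quote=False) + "</pre>"


def parse_pre(markup: str) -> str:
    """Recover the text of one <pre>, dropping one leading newline."""
    if not (markup.startswith("<pre>") and markup.endswith("</pre>")):
        raise ValueError("expected a single <pre> element")
    body = markup[len("<pre>"):-len("</pre>")]
    if body.startswith("\n"):
        body = body[1:]
    return html.unescape(body)


samples = ["code", "\ncode", "\n\na < b\n"]
naive_ok = [parse_pre(serialize_pre_naive(s)) == s for s in samples]
padded_ok = [parse_pre(serialize_pre_padded(s)) == s for s in samples]
```

With the padded serializer, the only newline the parser can drop is the one the serializer added, so content that itself begins with a newline survives the round trip.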
(In reply to comment #8)
> So let me summarize to make sure I've understood things correctly:
> 
> The HTML parser (both old and new) drops the newline after the start <pre> tag
> (not element). Thus it never showed up in the DOM and never got rendered.
> 
> With XSLT generated pages, no parsing happens and so no newlines are dropped.
> Thus a newline in the beginning of a generated pre-element will appear in the
> DOM and will be rendered.
> 
> Does this sound correct?

I haven't looked at the code; just observing the behavior. What you say sounds plausible.

> Like Boris says, there are lots and lots of cases when serializing and
> reparsing a document will yield a different result. So far the only case where
> we tweak the output to compensate for this difference is that we add a
> tbody-element around the tr-elements inside a table-element. We only do this
> for HTML output mode. We originally did this since back in the day the table
> rendering code would not work properly without such an element.

Interesting; wasn't aware of that.
 
> However any time we do this it creates weird edge cases. What should happen if
> a page passes an HTML document through an identity-like transform using the
> XSLTProcessor DOM API? Should that drop any newlines appearing at the start of
> the contents of any pre-elements, even though the document had potentially
> already passed through a HTML parser and thus had the relevant newline removed?
> What if you do this repeatedly? Should that remove a newline every time?

That's a good point; on the other hand, if a hack was added maybe it could detect that situation as well.

> Yes, this problem does exist with the tbody-insertion mentioned above. Though
> back in the day it fixed a bigger problem than it created. Possibly it's
> something we could remove now. And at least it doesn't produce progressively
> more mutated DOMs if you pass the same document through an identity transform
> multiple times.
> 
> 
> Though really, IMHO serializers should add a newline after an opening <pre> tag
> so as to avoid round-tripping issues. If they did, then this bug would be moot,
> as Safari et al. would produce the same result as Firefox, since the only
> newline they would drop during parsing would be the one added during "output"
> (i.e. serializing).

In retrospect, that probably would have been good. But I don't see XSLT *1* changing at this point.

If somebody defines a new HTML5 output method for XSLT, this probably should be added.
Severity: normal → S3