Closed Bug 153281 Opened 22 years ago Closed 3 years ago

No way to reach original, untransformed XML document from DOM

Categories

(Core :: DOM: Core & HTML, enhancement, P5)

RESOLVED WONTFIX

People

(Reporter: WeirdAl, Unassigned)

Details

(Whiteboard: XSLT)

A friend and I are able to reach an XML document transformed by XSLT in its
post-transformation form.  However, we are having a great deal of trouble
reaching the pre-transformation form of the document through the DOM.

For instance, document.styleSheets.length == 0.  This despite the fact that the
second line in the XML document is an <?xml-stylesheet ?> PI.

View Source gives us the source document, untransformed.  DOM Inspector sees
only the transformed document.

I have considered filing this bug as an RFE to add an "originalDocument"
property of the Document node, as another Document node reflecting the
untransformed document.  I do not believe this is the correct solution, given
that the count of stylesheets is incorrect.

Testcase:  Open the above URI.  Check document.styleSheets.length.

Actual results:  document.styleSheets.length == 0.

Expected results:  document.styleSheets.length == 1, or some reference to an
untransformed XML document existing with that document having a
styleSheets.length == 1.
document refers to the result document, which is what most use cases of js need.
Exposing the original document is an RFE, and I'd say this is rather a Mozilla DOM
extension than anything else.
I am not sure what kind of nsIDOMStyleSheet implementation we should have for
XSLT, btw.
What would we do on dynamic stylesheet removal? Heikki had issues with not
transforming the source document, so should we do that?
Hrm. I guess nobody has tried to add an XSLT stylesheet PI via js yet.

If we do it, I'd expect the originalDocument to be hooked up to the window 
object. I doubt we should expose it for documents that have been transformed
by js. Use cases like the p3p viewer leave me puzzled.

Not sure if we really wanna do that, putting more folks on CC.
Heikki for the stylesheet adding and removing stuff, vidur as he's co-author of 
the DOM Style spec.
(I keep QA and owner as is, as that'll end up on peterv anyway, the way the world
turns)

Lots of issues.
Severity: normal → enhancement
Component: XSLT → DOM Mozilla Extensions
OS: Windows 98 → All
Hardware: PC → All
Whiteboard: XSLT
I really think you should implement that. If there is no standard way to access the
original doc, then create one. I can see tons of cases where this would be
needed. For example, I add an XSLT stylesheet for the case that somebody opens
that document in a browser, but the primary use is to get the information out of
the original doc via DOM and JS for some fancy webapp. Or I have several uses
for the same document (maybe in different browsers or in different apps), and in
one case, I need the transformed doc and in one case, I need the untransformed one.
FYI, I've filed bug 153799 for a crash bug caused when I click on the URI above.
I personally like the idea of grabbing the original, unmodified document's DOM, 
at least from a perspective of data (not rendering) and the DOM.  Sometimes we 
might be able to garner data from the original document which the XSLT 
stylesheet has, deliberately or otherwise, obscured.  Such data might be more 
efficiently accessible through the original document's DOM.
Alas, this bug's URL is dead.
Thanks to bz for advice on where to look.  New URL field reflecting the file in LXR.

I still want to see this implemented, but I just realized that we may pay a
heavy price for this one extension.  As I see it, we have three options.

The first is to make the original document available as DOM document node.  If
we go that route, that will result in a sizable increase in memory usage for a
feature.  Despite the enthusiasm of comment 3, this makes me think twice.

The second is to instead store the text of the original document, and have the
application developer call for a DOMParser on, say,
window.originalDocumentSource.  Though this would use much less memory, bz says
this might be much harder to implement.

The third is WONTFIX.  I really do not want to see that happen, but if the first
and second options are unsavory to DOM module owners, then that's all she wrote.

I'm willing to do the legwork on this one.  I'd like module owners to advise me
on which route they prefer.
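
For illustration, the second option above might look roughly like this from the page author's side. This is only a sketch: window.originalDocumentSource is the hypothetical property proposed in that option and was never implemented, and parseOriginalSource is a made-up helper name.

```javascript
// Parse a string holding the untransformed XML into a DOM document.
// Throws if the parser hands back an error document instead.
function parseOriginalSource(sourceText) {
  const doc = new DOMParser().parseFromString(sourceText, "application/xml");
  // On a parse failure, browsers return a document containing <parsererror>.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("original source is not well-formed XML");
  }
  return doc;
}

// Hypothetical usage; window.originalDocumentSource does not exist in any browser:
// const originalDoc = parseOriginalSource(window.originalDocumentSource);
```

The memory trade-off is the one described above: only the text is kept, and the second DOM is built when (and if) a script asks for it.
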
I must admit I don't see a huge need for this. If you need access to some data
that is in the source document, bring it over to the result document while
creating it. I.e. let the XSLT copy over all data you need.

For the use case in comment 2, where you primarily use a document as data but
want to be able to display it if it's opened in a browser, just put a PI at the
top of it. Then we'll apply the stylesheet if the document is displayed, but we
won't if you load it from js.
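
A minimal sketch of the "load it from js" case, assuming a modern browser with fetch and DOMParser. A document parsed by script is not run through XSLT, so the xml-stylesheet PI is still there as an ordinary node. The function names and any URL passed in are assumptions for illustration only.

```javascript
// Return the xml-stylesheet processing instructions that are direct
// children of an XML document (for example one parsed from fetched text).
function findStylesheetPIs(doc) {
  const PROCESSING_INSTRUCTION_NODE = 7; // same value as Node.PROCESSING_INSTRUCTION_NODE
  return Array.from(doc.childNodes).filter(
    (node) =>
      node.nodeType === PROCESSING_INSTRUCTION_NODE &&
      node.target === "xml-stylesheet"
  );
}

// Browser-only helper: fetch the XML as text and parse it without
// applying any stylesheet.
async function loadUntransformed(url) {
  const response = await fetch(url);
  const text = await response.text();
  return new DOMParser().parseFromString(text, "application/xml");
}
```

Calling findStylesheetPIs on the document returned by loadUntransformed gives the PI nodes; each node's data property holds the pseudo-attributes such as href and type.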

Though I'm a little confused about whether comments 0 to 2 are asking for more. It
seems like they're suggesting that dynamic modifications to the source document
would trigger automatic retransforms to create new result documents. Please take
that to a new bug, although it does sound a little like bug 18722.


Anyhow, back to the RFE in the summary

As a convenience (or possibly purity) feature I can see uses for this. But as
the previous comment stated, keeping around a DOM that is very likely not to be
used is a waste of memory. One way to possibly implement this without using more
memory, though, would be to drop the DOM but still keep a handle to the cached
data stream. Then when someone requests the original DOM, reparse it using the
cached data. The problem is what to do if the data is removed from the cache: we
could download it again, but that won't work for POST documents and is in
general pretty evil.

Another solution is to have some way in the XSLT to state that you want access
to the original document. For example something like <xsl:output
moz:keep-source-document="true" .../>. I think this is the best solution from a
Mozilla perspective, but it's nonstandard with little chance of getting support
from any other browser.
jst's opinion, per #developers, is as follows:

(1) This is not a high-priority bug.  But if you want to work on it, go for it
and good luck.
(2) Does any other software product implement something like this?  If so, we
should mimic their behavior as closely as is practical. ("where it makes sense to")

As I said before, I'm willing to work on an implementation, but don't expect me
to submit a patch for review anytime soon.  I'm taking this bug and cc'ing
peterv, but this bug may still end up a WONTFIX.
Assignee: peterv → ajvincent
It seems IE is using

window.document.XMLDocument : XMLDocument

and

window.document.XSLDocument : XMLDocument

based on the code for "Internet Explorer Tools for Validating XML and Viewing
XSLT Output", found at
http://www.microsoft.com/downloads/details.aspx?FamilyId=D23C1D2C-1571-4D61-BDA8-ADF9F6849DF9&displaylang=en
According to the DOM inspector, the original XML (in text form) can be accessed
from the textContent property in the view-source: URI conversion.

For example, the following code:

document.location = "view-source:test.xml"
alert(document.documentElement.textContent)

should pop up a text box with the original XML content.  Feeding this through
DOMParser's parseFromString() should produce the XML DOM.  Is this something that
could be wrapped into a convenience function which would only be generated on demand?
Unfortunately that won't work reliably. First off, it relies on caching. If the
url does not happen to be cached for some reason (too big, accessed through
https, etc) the page will be redownloaded and you might end up with a different
document.

Second, it won't work for pages that are generated from POST since those aren't
identified solely by their URL.

And I don't think that anyone is interested in using a js-function that just
works 'sometimes'.

As I've stated before, if you're interested in the original document just copy
it over to the result document. It'll just take a single xslt instruction:

<xsl:copy-of select="/"/>
Blocks: 305121
*** Bug 305121 has been marked as a duplicate of this bug. ***
After four years, I'm moving this bug back to DOM default owner.
Assignee: ajvincent → general
QA Contact: keith → ian
Assignee: general → nobody
QA Contact: ian → general
Solution I took based on
http://www.w3schools.com/xsl/tryit.asp?filename=cdcatalog
is to do it from an HTML page, using JavaScript to:
1. Load the XML (pre-transformation), save it into a global variable
2. Load the XSL
3. Put both into an XSLTProcessor to get the post-transformation form

Now you can do what you want with the post-transformation form (eg: print it), and because of step 1, you still have the pre-transformation form in that global variable.
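
The three steps above can be sketched roughly as follows, assuming a modern browser with fetch, DOMParser and XSLTProcessor. The variable name originalXml, the function names and the URLs are illustrative assumptions, not taken from the page linked above.

```javascript
// Step 1 result: the untransformed XML, kept in a global for later use.
let originalXml = null;

// Fetch a URL and parse the response body as XML.
async function loadXml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("Failed to load " + url + ": " + response.status);
  }
  const text = await response.text();
  const doc = new DOMParser().parseFromString(text, "application/xml");
  // On a parse failure, browsers return a document containing <parsererror>.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("Could not parse " + url + " as XML");
  }
  return doc;
}

async function transformKeepingSource(xmlUrl, xslUrl) {
  originalXml = await loadXml(xmlUrl);      // 1. load the XML and keep it
  const stylesheet = await loadXml(xslUrl); // 2. load the XSL
  const processor = new XSLTProcessor();    // 3. transform
  processor.importStylesheet(stylesheet);
  // The fragment is owned by the current page's document.
  return processor.transformToFragment(originalXml, document);
}

// Example call from the hosting HTML page (placeholder file names):
// transformKeepingSource("catalog.xml", "catalog.xsl")
//   .then((fragment) => document.body.appendChild(fragment));
```

Because the hosting page is plain HTML and the XML is loaded by script, the browser never applies the stylesheet on its own, so originalXml stays available in its pre-transformation form.
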
Component: DOM: Mozilla Extensions → DOM
(In reply to Jonas Sicking (:sicking) from comment #11)
> As i've stated before. If you're interested in the original document just
> copy
> it over to the result document. It'll just take a single xslt instruction:
> 
> <xsl:copy-of select="/"/>

FWIW, I tried to follow this suggestion.  I added 
<div id="sourcexml" style="display:none;">
	<xsl:copy-of select="/" />
</div>
to my xslt, and (according to xsltproc, and when inspecting with Google Chrome) I do get a copy of the original xml in the resulting XHTML document (albeit with xmlns="", because, like most use cases I expect, I was using a default namespace for the XML).  Firefox 22, however, does not show even the parent "<div>" in the Inspector.  If I assign the original XML a namespace, then Firefox will accept it (with that namespace) into the XHTML (according to Inspector), but then (AFAICT) I would need to modify all my stylesheets to include the namespace prefix in all node references of any subsequent transform.  Hardly "a single xslt instruction".  I stopped there.

The goal was simply to get a page transformed from the original XML, then have elements in the resulting page be modified with javascript using the page's original source XML, a transformToFragment and a replaceChild, without the performance penalty of having to go back and re-fetch the page XML with an XMLHttpRequest.open.

I ended up with the ugly hack of pre-loading by putting the XMLHttpRequest.open in an "onload" function and adding:
    ExpiresByType application/xhtml+xml "access plus 5 seconds"
to the Apache configuration and moving on...
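
The transformToFragment plus replaceChild update described above might look something like this once the source XML is in hand. It is a sketch only: replaceWithTransform is a made-up name, and processor is assumed to be an XSLTProcessor that already has the relevant stylesheet imported.

```javascript
// Re-run a transform against the kept source document and swap the result
// in place of an element that is already on the page.
function replaceWithTransform(processor, sourceDoc, oldElement) {
  const fragment = processor.transformToFragment(sourceDoc, oldElement.ownerDocument);
  // Some browsers return null rather than throwing when the transform fails.
  if (!fragment) {
    throw new Error("transform produced no output");
  }
  oldElement.parentNode.replaceChild(fragment, oldElement);
}
```

How sourceDoc is obtained is the open question of this bug; in the workaround above it comes from a second request that is expected to hit the cache.
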
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven't been updated in more than 3 years and have no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Component: DOM → DOM: Core & HTML

We're not going to add new features to our XSLT support.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX