Closed Bug 336551 Opened 18 years ago Closed 12 years ago

E4X: Implement ability to process new XML("<?xml...?> ...")

Categories

(Core :: JavaScript Engine, enhancement)

x86
All
enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: BijuMailList, Unassigned)

References

()

Details

(Keywords: helpwanted)

E4X: Implement ability to process new XML("<?xml...?> ...")

see: bug 321564 comment 9 

I did not see any enhancement request filed
This kinda makes E4X not-very-useful for working with XMLHttpRequest responses, unless the app author does a bunch of string replaces on the input.  Considering that the response will already be parsed twice (once by XMLHttpRequest itself), and now we need to introduce yet another string copy, trying to use E4X is getting to be very expensive memory-wise unless you're dealing with very small xml snippets.
OS: Windows XP → All
Bug 270553 is the way to make E4X work with XMLHttpRequest, not this. IMHO.
(In reply to comment #3)
> Bug 270553 is the way to make E4X work with XMLHttpRequest, not this. IMHO.

It's not a matter of opinion.  Sometimes, you need a DOM parsed from the XML returned by an XHR, other times you don't.  If you don't need a DOM, but you wish to use E4X (ok, it's another DOM in many ways) to manipulate the data, then you want this bug fixed, not the E4X <=> DOM bug.

IOW both bidirectional DOM/E4X mapping and just text/xml deserialization into E4X are valid use-cases.  Fixing the latter by patching this bug is easier.  Both bugs should be fixed.

/be
*** Bug 351371 has been marked as a duplicate of this bug. ***
Jeff, you want this one?  It'll require avoiding default namespace setting via <parent xmlns="..."> </parent> wrapping, I think.

/be
If I am tracking this properly, then you can still use that wrapping tactic, you would just need to exclude the xml declaration from the wrap.
I assume that this bug is filed on Spidermonkey. I've just filed another one on Rhino, which has the same issue.
https://bugzilla.mozilla.org/show_bug.cgi?id=395853
Blocks: e4x
I don't think this is a trivial bug nor simply an enhancement.
Also in my experience the usability of E4X is mainly compromised when constructing XML for XHR, not when working with the response. For a response you do something like this:

var response = http_request.responseText; // bug 270553
response = response.replace('<?xml version="1.0"?>', ""); // bug 336551
var e4x = new XML(response);

It's unsexy but get your DOM allowing you to easily read the response. Constructing the document is different.
Why even use E4X when the main requirement is string concatenation (be it naive or abstracted) and you have to string concatenate with the PI anyway.

brendan's example (think XML):
var html = <html/>;
html.head.title = "My Page Title";
html.body.@bgcolor = "#e4e4e4";
html.body.form.@name = "myform";
html.body.form.@action = "someurl.jss";
html.body.form.@method = "post";
html.body.form.@onclick = "return somejs();";
html.body.form.input[0] = "";
html.body.form.input[0].@name = "test";

Strings!

Dear Mozillians, I'm requesting blocking 1.9. If Mozilla is to be the locomotive of "Web 2.0" this should be addressed soon. More and more extensions use XHR, E4X lubricates the process and this bug makes it unsexy.

I would request blocking 1.8.x if I dared. (I'd love to re-write my extension to use E4X for requests!)

Thank you for your time!
Flags: blocking1.9?
 
> Dear Mozillians, I'm requesting blocking 1.9. If Mozilla is to be the
> locomotive of "Web 2.0" this should be addressed soon. More and more extensions
> use XHR, E4X lubricates the process and this bug makes it unsexy.
> 
> I would request blocking 1.8.x if I dared. (I'd love to re-write my extension
> to use E4X for requests!)
> 
> Thank you for your time!
> 

We'd love to get this fixed but are way past new-feature time for FF3.  There's always a next release!


Flags: blocking1.9? → blocking1.9-
It's next release time now.
This is a very silly bug that pretty much anybody trying to use E4x will trip over and scratch and then shake their head over.

Workaround is easy:
contents = contents.replace(/<\?xml[^>]*\?>/, ""); // Bug 336551 trips over <?xml ... >
but a) you have to know first and b) it's annoying to do this everytime you want to process XML (or to have to create your own wrapper around |new XML()| - in every project you use E4X in).
Simple bug. Can we fix it now?
Flags: blocking1.9.1?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Be careful when fixing this; it may introduce security issues due to the ability to <script src=""> a remote XHTML document which is parsed as E4X. There are more details at:
http://code.google.com/p/doctype/wiki/ArticleE4XSecurity

If supporting XML prolog syntax, keeping <!DOCTYPE... as an error would be prudent as that will continue to prevent many remote XHTML documents parsing as E4X.

Alternatively, fix this bug hand in hand with one of the other proposed E4X security measures:
- Require remote-domain script to serve with the correct Javascript MIME type, otherwise disable E4X.
- Do not support E4X anonymous inline XML initialization at all for remote-domain script.
also see Bug 321685
sounds like might be late in fx 3.5 cycle to do this considering security things that should be looked at.  should it be moved to 3.next?
This is one of *the* two E4X blocking bugs.
E4X is a really cool feature, but this bug is just embarrassing.

Re security, I think Chris, when mentioning the security problem in comment 13, also mentioned good stopgaps for it.
Updating with some recent research on E4X security....

Demo: https://cevans-app.appspot.com/static/e4xtheft.html

This actually steals significant data cross-domain, based on a plausible XML syntax for a webmail XML feed. Here's the XML from within which we steal a bit:

<inbox><mail id="1234"><from>evil@hacker.com</from><subject>{ x = '</subject><body>PWN...</body></mail><mail id="1235"><from>bank@bank.com</from><subject>Super sensitive!</subject><body>New pin: 9976</body></mail><mail id="1236"><subject>' }</subject><body>...ed!!</body></mail></inbox>

As can be seen, the attacker has injected E4X content by simply e-mailing the victim with "interesting" subjects :) This causes the following E4X inline Javascript to evaluate:

{ x = '</subject><body>PWN...</body></mail><mail id="1235"><from>bank@bank.com</from><subject>Super sensitive!</subject><body>New pin: 9976</body></mail><mail id="1236"><subject>' }

Which as you can see pulls some of the XML into an inline assignment of a string to the "x" variable. It works because the repeating XML element structure does not result in mismatches.

A couple of contraints for this to work:
- Newlines will break this attack.
- As per above, an XML prolog will prevent this attack due to an inline XML parse error.
- Any XML emitter that escapes single quotes in XML values will break this attack; but in practice most people don't do this because it is not required to prevent XML injection or any other security threat.
Depends on: 375250
now is a good time to figure out if we are planning to do something for this on the next release.
After seeing concerns in comment #17, I feel it is OK to implement.
   new XML("<?xml ... ?><x><y/></x>");
But not 
   xml = <?xml ... ?><x><y/></x>;
    
or in other words, it is OK to be in the parameter passed to XML() function. but not to be as inline literal. And that solves the important use cases.
Flags: wanted1.9.2?
Flags: blocking1.9.2?
(In reply to comment #17)
> Updating with some recent research on E4X security....
> 
> Demo: https://cevans-app.appspot.com/static/e4xtheft.html

This is defeated in Firefox 3.5:

Error: XML cannot be the whole program
Source File: https://cevans-secure.appspot.com/static/e4xtheft.xml
Line: 2, Column: 31
Source Code:
y>...ed!!</body></mail></inbox> 

See bug 375250, resolved FIXED, on which this bug depends.

/be
If someone attaches a patch, we'll take it.  But, we're not going apply resources to this.  Per bkap.
URL:
Flags: wanted1.9.2?
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Keywords: helpwanted
From private correspondence with Michael Daumling of Adobe, I learned he extended E4X in Flash to allow the top-level production parsed from the string passed to new XML to be an XMLList's children, implicitly bracketed by <> and </>. He must have set default namespace by a private API (which is fine) instead of following ECMA-357's notorious implicit <parent xmlns="..."> ... </parent> wrapping.

Anyone who wants to patch this should try to match what E4X in Flash/Flex does.

/be
On a related note as it might help others:

I have some Javascript in a JS XPCOM component that was doing:

response = response.replace(/^<\?xml\s+version\s*=\s*(["'])[^\1]+\1[^?]*\?>/, ""); // bug 336551

That used to work, but I now get a JavaScript error:

line 701: regular expression too complex

That regex comes from https://developer.mozilla.org/en/e4x

I'm not sure if other people (like add-on developers) are running into this (see http://forums.mozillazine.org/viewtopic.php?f=23&t=1971533)

This might be related to bug #564953

My apologies for not having a regression range.
Thanks for the additional info, Seth. If the old RegExp still does not work on Mozilla/Firefox trunk, please report it as a new bug, but I think what you're seeing may be a remnant from PCRE, which has since been removed.

In the meantime, marking this bug as WNF. Please file a new bug for contributed E4X enhancements, per comment 21.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
REOPEN. We don't close bugs as WONTFIX just because you don't want to commit resources, but leave them open for contributors.

This bug is critical for E4X, as pretty much every user of E4X runs into this and has to use the above posted workaround. (I agree this bug is not about the workaround breaking, but the fact was worth noting it here.)
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I think we should WONTFIX it because every calorie directed towards E4X is a calorie misdirected.  I don't think it's "critical for E4X" because it has never been supported, and there are (perhaps unfortunately) users of E4X.

You could always seek to change the standard to permit this, of course, if you could find enough interest in the world to touch it.
(In reply to comment #27)

Meh, same argument for calories per languishing resolution. Was just trying to do a little cleanup while I was in here responding. ;-)
> there are (perhaps unfortunately) users of E4X.

And *all* of them run into this bug, and use the workaround. (And that workaround reportedly broke now.)
E4X is being phased out now, so it's definitely won't fix.
Status: REOPENED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.