Rhino E4X impl. does not accept XML prolog <?xml ?> and other meta declarations

UNCONFIRMED
Unassigned

Status

Rhino
E4X
UNCONFIRMED
11 years ago
6 years ago

People

(Reporter: Jiri Kopsa, Unassigned)

(Reporter)

Description

11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Build Identifier: 1_6R7

Creating an instance of an XML object from a string that contains an <?xml ... ?> prolog results in an exception. The same problem applies to other XML meta declarations.

Reproducible: Always

Steps to Reproduce:
context.newObject(scope,"XML", new Object[]{xmlString});
Actual Results:  
org.mozilla.javascript.EcmaError: TypeError: Cannot parse XML: The processing instruction target matching "[xX][mM][lL]" is not allowed.
        at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3350)
        at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3340)
        at org.mozilla.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3356)
        at org.mozilla.javascript.xmlimpl.XMLLibImpl.parse(XMLLibImpl.java:411)
        at org.mozilla.javascript.xmlimpl.XMLLibImpl.ecmaToXml(XMLLibImpl.java:433)
        at org.mozilla.javascript.xmlimpl.XMLObjectImpl.ecmaToXml(XMLObjectImpl.java:779)
        at org.mozilla.javascript.xmlimpl.XML.jsConstructor(XML.java:250)
        at org.mozilla.javascript.xmlimpl.XMLObjectImpl.execIdCall(XMLObjectImpl.java:570)
        at org.mozilla.javascript.IdFunctionObject.call(IdFunctionObject.java:127)
        at org.mozilla.javascript.BaseFunction.construct(BaseFunction.java:313)
        at org.mozilla.javascript.Context.newObject(Context.java:1514)
(Reporter)

Comment 1

11 years ago
Also note that a similar bug was filed on the "core JavaScript engine" product/component. Not sure whether one or two bugs (Rhino & SpiderMonkey) are needed.
(Reporter)

Comment 2

11 years ago
Updating the summary; it somehow got messed up. Sorry.
Summary: E4 → Rhino E4X impl. does not accept XML prolog <?xml ?> and other meta declarations

Comment 3

11 years ago
This is conformant with the E4X specification. If you look at the grammar for an XML initializer, you'll see it doesn't allow for XML prologues or entities or anything of that sort.
(Reporter)

Comment 4

11 years ago
Could I get some pointers to the relevant parts of the spec, please? I don't have huge visibility into it, but I noticed that it does mention XML processing instructions in a few places (e.g. 13.4.3.3 XML.ignoreProcessingInstructions, or the XMLPI entity in the grammar itself).

Also notice related bug #336551 on the SpiderMonkey engine, which is mentioned on the Mozilla E4X home page:
http://developer.mozilla.org/en/docs/E4X

Comment 5

11 years ago
Section 11.1.4 describes an XML literal like this:

An XML initialiser is an expression describing the initialization of an XML object, written in a form of a literal. It may specify an XML element, an XML comment, an XML PI, or a CDATA section using ordinary XML syntax. For XML elements, it provides the name, XML attributes and XML properties of an XML object.

Sections 8.3 and 11.1.4 combine to describe the grammar for XML content, both when used as a literal and when passed to the XML constructor. Specifically:

XMLMarkup ::
  XMLComment
  XMLCDATA
  XMLPI
XMLTagCharacters ::
  SourceCharacters but no embedded XMLTagPunctuator
    or left-curly { or quote ' or double-quote " or forward-slash / or XMLWhitespaceCharacter
XMLWhitespaceCharacter ::
  <SP>
  <TAB>
  <CR>
  <LF>
XMLWhitespace ::
  XMLWhitespaceCharacter
  XMLWhitespace XMLWhitespaceCharacter
XMLText ::
  SourceCharacters but no embedded left-curly { or less-than <
XMLName ::
  XMLNameStart
  XMLName XMLNamePart
XMLNameStart ::
  UnicodeLetter
  underscore _
  colon :
XMLNamePart ::
  UnicodeLetter
  UnicodeDigit
  period .
  hyphen -
  underscore _
  colon :
XMLComment ::
  <!-- XMLCommentCharacters_opt -->
XMLCommentCharacters ::
  SourceCharacters but no embedded sequence --
XMLCDATA ::
  <![CDATA[ XMLCDATACharacters_opt ]]>
XMLCDATACharacters ::
  SourceCharacters but no embedded sequence ]]>
XMLPI ::
  <? XMLPICharacters_opt ?>
XMLPICharacters ::
  SourceCharacters but no embedded sequence ?>
XMLAttributeValue ::
  " XMLDoubleStringCharacters_opt "
  ' XMLSingleStringCharacters_opt '
XMLDoubleStringCharacters ::
  SourceCharacters but no embedded double-quote "
XMLSingleStringCharacters ::
  SourceCharacters but no embedded single-quote '
SourceCharacters ::
  SourceCharacter SourceCharacters_opt
XMLTagPunctuator :: one of
  =
  >
  />

XMLInitialiser :
  XMLMarkup
  XMLElement
XMLElement :
  < XMLTagContent XMLWhitespace_opt />
  < XMLTagContent XMLWhitespace_opt > XMLElementContent_opt
      </ XMLTagName XMLWhitespace_opt >
XMLTagContent :
  XMLTagName XMLAttributes_opt
XMLTagName :
  { Expression }
  XMLName
XMLAttributes :
  XMLWhitespace { Expression }
  XMLAttribute_opt XMLAttributes
XMLAttribute :
  XMLWhitespace XMLName XMLWhitespace_opt = XMLWhitespace_opt { Expression }
  XMLWhitespace XMLName XMLWhitespace_opt = XMLWhitespace_opt XMLAttributeValue
XMLElementContent :
  { Expression } XMLElementContent_opt
  XMLMarkup XMLElementContent_opt
  XMLText XMLElementContent_opt
  XMLElement XMLElementContent_opt

There's also a notion of constructing an XML node from a "W3C Information Item", which I believe is a W3C DOM node, but the spec doesn't really cover it. When using the XML constructor on a string, as described in section 10.3.1, the argument is wrapped in <parent xmlns="defaultNamespace"> </parent>, where defaultNamespace is the default namespace URI. Since <?xml?> is allowed only at the start of a document, the wrapped declaration is no longer in a legal position, which is why the Xerces parser generates the exception you note.

If it's any comfort, I hit this exact same problem 2 months ago when we upgraded to the latest Xerces and it started throwing that exception. That's when I wandered down this trail to the spec to conclude that E4X doesn't actually handle manipulating full XML documents.
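
The wrapping effect described above can be reproduced outside Rhino. The following is only a sketch: it uses the JAXP parser bundled with the JDK rather than Rhino's own code, and the class name ParentWrapDemo and the helpers wrap and parses are hypothetical names made up for illustration. It assumes that wrapping is a plain string concatenation, as the comment describes for section 10.3.1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ParentWrapDemo {
    // Hypothetical helper: imitates the <parent> wrapping step described above.
    static String wrap(String xml, String defaultNamespace) {
        return "<parent xmlns=\"" + defaultNamespace + "\">" + xml + "</parent>";
    }

    // Returns true if the JDK's XML parser accepts the text, false if it
    // rejects it as not well-formed.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("unexpected parser setup failure", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?><a/>";
        System.out.println("standalone document parses: " + parses(doc));
        System.out.println("wrapped document parses:    " + parses(wrap(doc, "")));
    }
}
```

The same text is accepted on its own and rejected once it sits inside the wrapper element, because the declaration is then no longer the first thing in the document.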
(Reporter)

Comment 6

11 years ago
Thanks Adam for your comprehensive response. I can see that the pseudocode in 10.3.1. suggests that the processing instructions may cause problems in the XML(String) constructor.

Still, I'm a bit confused. Isn't the creation of an XML object from a String a primary use case in the REST/AJAX application scenario? Are developers really expected to use the workaround suggested on the Mozilla E4X home page (i.e. replace text)? If so, it sounds like a bug in the E4X spec (a feature gap). I wonder what your (and the community's) thoughts are on this topic.

It seems the suggested feature is assumed or requested by others:
https://bugzilla.mozilla.org/show_bug.cgi?id=336551

Does anybody know what the roadmap is for this "bug" and the related one on SpiderMonkey?

Comment 7

11 years ago
Oh, I completely agree that it's a bug or feature gap in the specification. I have no idea what they were thinking that the spec doesn't support full XML documents; I think that's lame and goofy.

Comment 8

11 years ago
By the way, the particular error you (and I) are seeing occurs because the Rhino E4X support does exactly what the spec suggests: it wraps the document in a <parent> element. Xerces then complains that an XML PI is not allowed there, since that is what the <?xml?> prologue looks like once it is no longer at the start of the document.

Comment 9

10 years ago
Not to nitpick, but <?xml ...?> is actually part of the text declaration of an external entity; it's either a TextDecl or an XMLDecl, and is not part of the entity's replacement text (the document itself is an entity). As such, it belongs to the physical structure of the text serialization of an XML infoset structure, and it doesn't belong to its logical structure.

Since E4X is manipulating the logical structure, it strikes me as being actually correct that it doesn't deal with the <?xml ...?> declaration.
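
The physical/logical distinction can be seen with an ordinary DOM parser. This is a sketch using the JDK's built-in JAXP/DOM APIs, not E4X, and the class name DeclarationVsPiDemo and its helpers are hypothetical. It parses a document that has both an XML declaration and a genuine processing instruction, then lists the top-level nodes of the resulting tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DeclarationVsPiDemo {
    // Parses the given text with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("sample XML failed to parse", e);
        }
    }

    // Describes each direct child of the document node as "kind:name".
    static List<String> topLevelNodes(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            String kind = n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ? "pi"
                    : n.getNodeType() == Node.ELEMENT_NODE ? "element" : "other";
            names.add(kind + ":" + n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml version=\"1.0\" standalone=\"yes\"?><?style sheet?><root/>");
        // The declaration does not show up as a node; the real PI does.
        System.out.println(topLevelNodes(doc));
        // Declaration values surface as document properties instead.
        System.out.println(doc.getXmlStandalone());
    }
}
```

In this sketch the declaration never becomes a node in the tree, while the `<?style sheet?>` processing instruction does; the declaration's values are only reachable as properties of the document object.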

Comment 10

10 years ago
While that's certainly true, it seems not a stretch to say that the XML literal is the text serialization of an XML infoset structure, and as such, the E4X parser ought to be able to handle things like <?xml ?>. You'll note that no one is expecting to be able to actually examine the <?xml ?> prologue through the primitives that E4X provides.

Comment 11

6 years ago
Just ran into this bug in Alfresco (which uses the Rhino JS API to perform server-side JavaScript execution), with the following XML processing instruction:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

I think a valid workaround would include ignoring the XML processing instructions before wrapping the text in a <parent> tag, or searching for them (since they will be at the start of the file) and moving them to the start of the new XML document.
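
Until something like that exists in the engine, the first option can be approximated on the caller's side by removing the declaration before the string reaches the XML constructor. This is a minimal sketch, not part of Rhino: the class XmlDeclStripper and the method stripXmlDeclaration are hypothetical names, and the regular expression only targets a leading <?xml ...?> declaration. It does not handle DOCTYPE declarations or a byte-order mark.

```java
import java.util.regex.Pattern;

class XmlDeclStripper {
    // Matches a leading "<?xml ...?>" declaration plus surrounding whitespace.
    // The \s after "xml" keeps targets such as "xml-stylesheet" untouched.
    private static final Pattern XML_DECL =
            Pattern.compile("\\A\\s*<\\?xml\\s.*?\\?>\\s*", Pattern.DOTALL);

    // Returns the input without its leading XML declaration, if it has one.
    static String stripXmlDeclaration(String xml) {
        return XML_DECL.matcher(xml).replaceFirst("");
    }

    public static void main(String[] args) {
        String input =
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<a>text</a>";
        System.out.println(stripXmlDeclaration(input));
    }
}
```

The cleaned string could then be passed as the xmlString argument in the reproduction step from the description. Note that the encoding and standalone values are discarded by this approach.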