Closed Bug 246441 (e4x) Opened 20 years ago Closed 11 years ago

Implement E4X in SpiderMonkey

Categories

(Core :: JavaScript Engine, enhancement, P1)

enhancement

Tracking

()

RESOLVED WONTFIX
mozilla1.8beta2

People

(Reporter: brendan, Unassigned)

References

()

Details

(Keywords: js1.5)

Attachments

(12 files, 17 obsolete files)

424.34 KB, patch
Details | Diff | Splinter Review
116.03 KB, application/x-gzip
Details
2.22 KB, patch
shaver
: superreview+
brendan
: approval1.8a5+
Details | Diff | Splinter Review
58.28 KB, patch
Details | Diff | Splinter Review
65 bytes, application/x-javascript
Details
19.60 KB, patch
Details | Diff | Splinter Review
6.56 KB, patch
Details | Diff | Splinter Review
142 bytes, text/plain
Details
14.79 KB, text/plain
Details
3.61 KB, patch
Details | Diff | Splinter Review
2.17 KB, patch
shaver
: review+
Details | Diff | Splinter Review
1.46 KB, patch
Details | Diff | Splinter Review
We should support ECMAScript for XML (E4X), which is a draft ECMA standard.

/be
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.8alpha2
I'll try to pry it loose tomorrow at the ECMA TG1 meeting.

/be
http://lambda.weblogs.com/discuss/msgReader$6380 has some good feedback on the
E4X spec, much of which I agree with.  Maybe we can remedy some of them in the
JS24X stuff?
Depends on: 40757
But @a::b, *, etc. work.

I'm compiling a big E4X errata sheet; I'll mail that to the right folks next
week.

/be
Attached patch checkpoint (obsolete) — — Splinter Review
Attachment #156645 - Attachment is obsolete: true
Target Milestone: mozilla1.8alpha2 → mozilla1.8alpha4
Alias: e4x
Attachment #156860 - Attachment is obsolete: true
Attached file Errata so far (obsolete) —
Possibly some of these claimed errata are erroneous -- there sure are a lot. 
The spec seems like a somewhat buggy translation of slightly buggy Java code,
which is unfortunate, but possibly better than a spec based on no
implementation!

/be
Another erata:

In E4X specs the section 11.5.1 The Abstract Equality Comparison Algorithm states:

   ...
   3. If Type(x) is the same as Type(y)
   ...
        c. If Type(x) is Object and x.[[Class]] == "Namespace", return the
results of the comparison x.uri == y.uri
    
That should be changed to 
        c. If Type(x) is Object and x.[[Class]] == "Namespace" and y.[[Class]]
== "Namespace", return the results of the comparison x.uri == y.uri

A similar treatment should be applied to QName part.


Note also that the algorithm can be simplified if [[Equals]] would be defined to
return triple true|false|null where null indicates that the object does not know
how to do the equality. Then the method can be defined for Namespace and QName
to return null if the argument is not Namespace/QName or do the proper job and
similarly for XML.

Then 1-2-3-4 in the algorithm can be replaced by:

1. If x implements [[Equals]],
  a. Call x.[[Equals]](y)
  b. If result of a is not null, return it
2. If y implements [[Equals]],
  a. Call y.[[Equals]](x)
  b. If result of a is not null, return it

Sorry for sounding harsher than I intended in comment 7.  Consider that JS was
more than slightly buggy in the old days, and ECMA standardization of it
involved at least two other implementations.  I think E4X will benefit from
other attempts to implement the spec, and I hope the errata here are helpful.

/be
A note on a pretty minor glitch in spec, that I'd like to share here (I'll roll
it into the complete errata file, which needs to be HTML-ized and checked for
errors):

11.1.4 The sub-grammar here allows tag content of the form <T A={e1}{e2}>,
       but ExcapeAttributeValue is called only on the result of evaluating e1,
       not e2, and unless e2 evaluates to the empty string, the resulting
       string concatenation will not be well-formed XML because only the e1
       result is escaped and quoted.                                           
            
                         
       The spec clearly intends that one can have exactly one {expr} in lieu
       of a properly quoted and escaped attribute value, but the cover grammar
       does not require this, and it's up to the XML parser to try to catch
       such errors.
                                                                                
       This simplifies the E4X spec, but it deserves at least an informative
       comment, if not a tighter cover grammar.  Implementations where the XML
       parsing is built into the JS parser can be simplified based on the
       restriction, and we don't want to court interop bugs where an {e2}
       that evaluates to the empty string is allowed in one implementation but
       not in another.

I remember bugging John about the cover grammar issue just before ECMA-357 was
finalized, but I didn't have time to find this quirk, or ambiguity.  I should
have seen it then; better late than never.

/be
Attached file compressed checkpoint patch, landing today (obsolete) —
Need to fix a few more bugs, test again, and land.  Current plan is to avoid a
branch by landing on the trunk, configured off.

/be
Attachment #160708 - Attachment is obsolete: true
Checked into cvs.mozilla.org trunk, configured off (via jsconfig.h).  See the
TODO comment near the top of jsxml.c.

/be
Attached file latest errata file (obsolete) —
I included the erratum Igor gave in comment 8.	Feel free to add more to this
bug, and I'll roll them up.

/be
Attachment #160709 - Attachment is obsolete: true
Here is another errata:

ECMA 357, 11.1 Primary Expressions
...
Semantics
The production PrimaryExpression : PropertyIdentifier is evaluated as follows:

1. Let n be the result of evaluating PropertyIdentifier
2. Let name = ToXMLName(n)
3. While (true)
  a. If there are no more objects on the scope chain,
    i. Return undefined
  ^^^^^^^^^^^^^^^^^^^^^^^^
  b. Let o be the next object on the scope chain.
     NOTE: on the first iteration, o will be the first object on the scope chain
  c. If Type(o)  {XML, XMLList}
    i. Let hasProp be the result of calling the [[HasProperty]] method of o,
passing name as the property
    ii. If hasProp == true
      1. Return a value of type Reference whose base object is o and whose
property name is name

The algorithm should not return undefined for unbound name. Rather in this case
it should return a value of type Reference whose base object is the first XML or
XMLList object found on the scope chain or null otherwise and whose property
name is <i>name</i> so the algorithm should read:

1. Let n be the result of evaluating PropertyIdentifier
2. Let name = ToXMLName(n)
3. Let firstXML = null
4. While (true)
  a. If there are no more objects on the scope chain,
    i. Return a value of type Reference whose base object is firstXML and whose
property name is name. 
      NOTE: firstXML will be null if no XML or XMLList value was found on the
scope chain.
  b. Let o be the next object on the scope chain.
     NOTE: on the first iteration, o will be the first object on the scope chain
  c. If Type(o)  {XML, XMLList}
    i. Let hasProp be the result of calling the [[HasProperty]] method of o,
passing name as the property
    ii. If hasProp == true
      1. Return a value of type Reference whose base object is o and whose
property name is name
    iii. If firstXML == null
      1. Set firstXML to o 

Without the change while script consisting of the single "x" generates
ReferenceError, its E4X modifications, *::x or @x results in undefined.

In addition the following would not work if xmlList contains an element without
child subtag:

xmlList.(child.@name == 'something') 

as child would evaluates to undefined resulting to ReferenceError on child.@name.
Igor, thanks for writing that up -- it was in the errata file at attachment
161180 [details] already at the top, in cryptic form: "??? @a::b => undefined if not found
in scope chain !?!?"  I noted this while implementing but forgot to go back and
explain that note.

/be
Erratum in 13.5.4.18 XMLList.prototype.propertyIsEnumerable ( P ):

The first step in the semantic algorithms should be changed from:

1. if ToNumber(P) is greater than or equal to 0 and ToNumber(P) is less than
x.[[Length]], return true

to

1. Let i = ToUint32(P)
2. If ToString(i) == P
  a. If i is greater than or equal to 0 and i is less than x.[[Length]], return true
  b. Otherwise return false
  
  
Without the change x.propertyIsEnumerable(0.1) or even
x.propertyIsEnumerable(-0.0) (depending on the meaning of "or equal to 0") can
return true. 
Possible errata in XMLList.prototype.child

13.5.4.5 defines XMLList.prototype.child(propertyName) to return empty list if
the list itself is empty as well without any checks for valid propertyName. 

That means that x.child(null) or x.child(undefined) returns empty list if x is
empty and throw TypeError when x is not empty during the eventual call to
toXMLName(null|undefined).

Compare that to 13.4.4.4 XML.prototype.attribute(attributeName) which always
throws TypeError when attributeName is null or undefined.

Should XMLList.prototype.child always require valid name even when the list is
empty?
Blocks: 265341
It seems > does not need to be escaped in element values (text), and the test
in e4x/Expressions/11.1.4.js wants it not to be escaped -- this is a
ToXMLString erratum.  Also, an element's attribute and namespace uri values are
not escaped, but should be.

More when I get back from the ECMA TG1 meeting.

/be
Attachment #161086 - Attachment is obsolete: true
Attached patch checkpoint, more tests passed, closing in (obsolete) — — Splinter Review
Attachment #163500 - Attachment is obsolete: true
Plus some lurking bugs, and some bogus tests in the suite.  Testsuite errata
doc soon, along with updated spec errata.

Reduced context from -pu8 to fit within bugzilla's 500K patch size limit.

/be
Attachment #164502 - Attachment is obsolete: true
Attached patch checkpoint, just about done — — Splinter Review
Impurity lurking, plus testsuite bugs (I think) that give false failures in six
tests (could be real bugs in the SpiderMonkey e4x lurking still, causing one or
two of these):

# Retest List, smdebug, generated Fri Nov 12 16:13:00 2004.
# Original test base was: e4x.
# 129 of 129 test(s) were completed, 6 failures reported.
e4x/Expressions/11.6.1.js
e4x/Types/9.1.1.9.js
e4x/XML/13.4.3.js
e4x/XML/13.4.4.38.js
e4x/XML/13.4.4.39.js
e4x/Regress/regress-264369.js

/be
Attachment #165651 - Attachment is obsolete: true
I checked in something close to attachment 166178 [details], rs=shaver.  It passes the
js/tests suite, excluding e4x, configured without E4X support (the default for now).

With JS_HAS_XML_SUPPORT defined to 1 for JS_VERSION == 150 in jsconfig.h, it
passes all e4x tests but the six noted in comment 21, and seems pure undef
Purify on Windows (although I had some "ghost" leaks that were once real, then
fixed, but persisted even after the fixes, which demonstrably eliminated all
such leaks using the DEBUG_notme code in jsxml.c).

I'm leaving this bug open to work on further testing, correctness, footprint and
performance wins necessary to justify turning on E4X by default.  One idea: use
NSPR's dlopen/dlsym portable veneer to make the E4X support runtime-conditional,
and put the bulk of the code in a separate DLL/DSO.  Before resorting to that,
I'm going to try to recover space throughout SpiderMonkey, by reducing the use
of macros (optimizations done in the '90s on RISC targets, mostly, and possibly
counterproductive on superscalar x86 machines now).

/be 
Oh, and I put off thread safety for "later".

The plan is to avoid penalizing the all single-threaded use-cases and embeddings
by building on the zero-cost locking implemented in bug 54743.  In the case of
E4X, this isn't trivial, because just using a per-XML-object zero-cost lock
doesn't avoid nested locks and AB-BA deadlock hazards inherent in XMLLists,
[[Insert]], [[Append]], and [[Replace]].  Just for starters!

/be
Feedback in bug 270552 and bug 270553.

- N.
Updated errata file next.

/be
Attached file latest errata file (obsolete) —
I really should clean this up and HTML-ize it, perhaps tomorrow.

/be
Comment on attachment 166406 [details] [diff] [review]
fix a spec bug that I faithfully transcribed into a code bug

I'd like to fix this for 1.8a5, just for the early adopters, but it's not a big
deal either way -- remember that E4X is configured off by default, and not
hooked up to the <script> tag by a MIME type parameter yet.

John, check my erratum claim here.  It sure seems like the
[[Replace]]-calls-[[Insert]] case, replacing at or beyond the length with a a
list value, will create a hole and an off-by-1 x.[[Length]].

/be
Attachment #166406 - Flags: superreview?(shaver)
Attachment #166406 - Flags: review?(john.schneider)
Comment on attachment 166406 [details] [diff] [review]
fix a spec bug that I faithfully transcribed into a code bug

Optimizing, since this is all #if'd off for 1.8a5.

/be
Attachment #166406 - Flags: approval1.8a5?
Comment on attachment 166406 [details] [diff] [review]
fix a spec bug that I faithfully transcribed into a code bug

a=asa for 1.8a5 checkin.
Attachment #166406 - Flags: approval1.8a5? → approval1.8a5+
Comment on attachment 166406 [details] [diff] [review]
fix a spec bug that I faithfully transcribed into a code bug

Thanks, but there's a better patch that probably does what the spec should do:
avoid incrementing x.[[Length]] until we know the value being inserted at the
end is not an XMLList.

/be
Attachment #166406 - Attachment is obsolete: true
Attachment #166406 - Flags: superreview?(shaver)
Attachment #166406 - Flags: review?(john.schneider)
Attachment #166406 - Flags: approval1.8a5+
Attached patch better approach — — Splinter Review
I'll check this into 1.8a5.  Reviews welcome, after the fact! ;-)

/be
Comment on attachment 166536 [details] [diff] [review]
better approach

Restoring Asa's stamp.

/be
Attachment #166536 - Flags: superreview?(shaver)
Attachment #166536 - Flags: review?(john.schneider)
Attachment #166536 - Flags: approval1.8a5+
Who should review this patch?

/be
Attached file latest errata file (obsolete) —
I didn't get the time to HTML-ize this yet, but it grew a bit.	Diff against
the last one:

94a95,106
> 9.2.1.2 step 2(c)(ii) sets _y.[[Parent]] = r_ where _r_ is the result of
>	  [[ResolveValue]] called on _x.[[TargetObject]] in 2(a)(i).  This can
>	  result in text parenting text:
>
>	  var MYXML = new XML();
>	  MYXML.appendChild(new XML("<TEAM>Giants</TEAM>"));
>
>	  (testcase from Werner Sharp <wsharp@macromedia.com>).
>
>	  To match insertChildAfter, insertChildBefore, prependChild, and
>	  setChildren, we should silently do nothing in this case.
>

/be
Attachment #166407 - Attachment is obsolete: true
Attached patch updated e4x testsuite patch (obsolete) — — Splinter Review
- Vacuously ported e4x/shell.js's NL function to SpiderMonkey.
									       

- Fixed bugs (deviations from the ECMA-357 spec) in e4x/Expressions/11.6.1.js
  sections 3 and 6.
									       

- Fixed backward use of ignoreWhitespace (or backward sense of equality test,
  but I went with minimal change here) in e4x/Types/9.1.1.9.js section 9.
									       

- e4x/XML/13.4.3.js repeated used XML.defaultSettings() where it wanted
  XML.setSettings() -- this suggests that the verb-less "defaultSettings"
  name is confusing, because hackers love to verb nouns, and "default" is
  often used as a verb, so XML.defaultSettings connotes "set defaults", not
  "get defaults".
									       

- e4x/XML/13.4.4.38.js sets XML.prettyPrinting = false but then expects
  XMLLists to use NL() as the list element separator, contrary to 10.2.2
  Step 2(a).  Same goes for 13.4.4.39.js.
									       

- Added missing newline at end of file to a bunch of tests.
									       

- Fixed misspelled "arbitrary" in e4x/Regress/regress-257679.js.

I'd like to check this in soon, but not before someone reviews the substantive
changes and agrees (and files any needed Rhino E4X bugs).  Igor, are you game?

/be
Attachment #166821 - Attachment is obsolete: true
Attached patch patch I just checked in — — Splinter Review
to finish off several PutProperty bugs, a few other odd bugs, and fixes based
on the recently discovered errata.

/be
Breaking out the E4X <=> DOM hookup to the bug that Nigel kindly filed.  This
bug can track that, and accumulate errata and testsuite fixes, for a while.

When the DOM hookup is done, I'd like to close both this bug and bug 270553, and
let any further problems (which should arise at much lower rate) be tracked by
individual bugs.

For regression testing purposes, it would have been ideal to make separate bugs
for each erratum or SpiderMonkey bug fixed by the last few patches attached
here, but that would have slowed things down to a crawl.  The consolidated
errata file can help tell us where we need to add more tests, instead of a list
of micro-bugs to treat as "regressions".

/be
Depends on: 270553
(In reply to comment #36)
> Created an attachment (id=166969)
> updated e4x testsuite patch
> 
> I'd like to check this in soon, but not before someone reviews the substantive
> changes and agrees (and files any needed Rhino E4X bugs).  

The changes triggered test failures in Rhino but AFAICS these are real bugs.
Note also that e4x/XMLList/13.5.4.16.js contains:

x = new XML();
TEST(2, undefined, x.parent());

Should it be changed to null?
> /be

Brief feedback.

Errata: Section 8 of E4X incorrectly pluralizes "Punctuators"
in the second paragraph.

Bugs: js -x could error more gracefully and earlier when presented
with this test data:

var a = <>;
// put anything lexable here

Seen with Linux/gcc 3.3.3-7, latest trunk.

- N.
Igor wrote:

> x = new XML();
> TEST(2, undefined, x.parent());
> 
> Should it be changed to null?

Yes, to match the spec.  I changed SpiderMonkey's E4X to match the testsuite the
other week, and noted the confusion in the errata file:

13.5.4.16 Steps 1 and 3 specify undefined return, not null as is done for
          XML.prototype.parent().  The testsuite wants null for "no parent"
          from XMLList.prototype.parent().

It's handy to be able to distinguish "not there" from "null" in cases such as
this, so I think the spec has it right.

I'll post an updated testsuite patch with this fix, and a SpiderMonkey E4X patch
to match, within the next day or two.

/be
Attached file Sample showing <> error lateness (obsolete) —
Now that it's not the wee hours, here's a more coherent report.
Error from this attachment reports Line 5 not 2:

5: SyntaxError: missing } in XML expression:
5:   flag = false;
5: ..............^

- N.
Now that it's not the wee hours, here's a more coherent report.
Error from this attachment reports Line 5 not 2:

5: SyntaxError: missing } in XML expression:
5:   flag = false;
5: ..............^

- N.
Attachment #167237 - Attachment is obsolete: true
Attachment #167237 - Attachment mime type: application/octet-stream → application/x-javascript
Comment on attachment 167238 [details]
Sample showing <> error lateness

The original Netscape HTML parser went to great lengths to cope with unclosed
tags and improperly terminated entities.  Of course here we don't want to
"correct" errors by auto-closing tags, but how much code footprint do we want
to throw at diagnosing these issues?

Food for thought.

/be
Attachment #167238 - Attachment mime type: application/octet-stream → application/x-javascript
Attached patch latest e4x testsuite patch — — Splinter Review
Fixes tests to-do with XML.prototype.parent() and XMLList.prototype.parent().

SpiderMonkey patch next.

/be
Attachment #166969 - Attachment is obsolete: true
I'm checking this in today, along with the testsuite patch.

/be
Attached file toXMLString() comment serialisation problem (obsolete) —
Trivial case showing non-comment serialization in toXMLString()

- N.
Trivial test case showing entity reference loss with to[XML]String().

- N.
Re: comment 47: comments are ignored by default (XML.ignoreComments defaults to
true -- set it to false to get the desired results).

Re: comment 48: that's a bug, I'll fix it (feel free to report such lesser
followup bugs separately from this bug).  Thanks,

/be
Fixed the bug reported in comment 48.  This leads to results that might be
considered surprising:
 
toString()
---------
&nbsp;
 
toXMLString()
---------
<tag>&amp;nbsp;</tag>

But that's per spec, and applies to any unknown entity (nbsp is not defined by
the XML spec).

/be
Just out of interest, why wouldn't:
   var xml = <tag>&nbsp;</tag>;
...cause a well-formedness error?
Hixie: cuz my integrated XML parser isn't doing that kind of well-formedness
check?  D'oh.  Easily fixed, thanks....

/be
So E4X is written without specifying XML apart from 1.0 (vs. 1.1, e.g. in what
makes an identifier an XMLName).  But it doesn't say anything about entity refs
that refer to entities other than the built-in XML 1.0 ones (amp, lt, gt, apos,
quot).

This means that an XML parser used in conjunction with a JS engine to implement
E4X could process <!ENTITY ...> declarations and expand entity references, but
XML.prototype.toXMLString would not convert these expansions back into
references. That means no round-tripping via toXMLString, which seems like a bug
to me.  Probably ECMA TG1 considered this already.

Anyway, 10.3.2.1 step 8 is quite clear, and SpiderMonkey's E4X should throw a
TypeError for any unexpanded entity ref.  I'll fix that now.

/be
E4X's XML language doesn't support <!ENTITY> at all, right? So to end up in this
position you'd need something like:

   document.createEntityReference('nbsp').toXML().toXMLString();

...right? That would presumably just return a string consisting of U+00A0,
assuming "document" is an HTML document or an XML document with the appropriate
entity defined. Do the specs define this well enough?
Hixie:

>    document.createEntityReference('nbsp').toXML().toXMLString();

There is no toXML in the DOM specs, but if we added such a thing, we'd have to
extend ECMA-357 as well.  10.3.2.1 MapInfoItemToXML step 8 says:

8. If i is an unexpanded entity reference information item
    a. Throw a TypeError exception

/be
> There is no toXML in the DOM specs

My bad, I misunderstood ECMA-357 section 10.3. Take this example instead:

   new XML(document.createEntityReference('nbsp')).toXMLString();


> 10.3.2.1 MapInfoItemToXML step 8 says:
> 
> 8. If i is an unexpanded entity reference information item
>     a. Throw a TypeError exception

By "unexpanded" I assume they mean any entity reference, as opposed to one that
has been flattened. I guess that works. So basically, entities in general aren't
handled by E4X. (That's a good thing. I wish they'd dropped entities other than
the numeric ones and the five predefined ones from XML in the first place.)
Attachment #167900 - Attachment is obsolete: true
Attached file latest errata file (obsolete) —
This should include everyone's errata -- if not, please mail me.

/be
Attachment #161180 - Attachment is obsolete: true
Attachment #166822 - Attachment is obsolete: true
Regarding erratas:

It would be nice if E4X clarifies whether XML.[[Get]] or Object.[[Get]] should
be used during prototype lookup if the head of the prototype chain is Object.
Consider:

function Foo() { }
Foo.prototype = XML.prototype;

var bar = new Foo();

var test = bar.toXMLString

The question is whether test should refer to the toXMLString method defined in
XML.prototype or should it be empty XMLList?

Here is ECMAScript for Object.[[Get]] from ECMA-262 v3, section 8.6.2.1:

--------------------------------------
8.6.2.1 [[Get]] (P)

When the [[Get]] method of O is called with property name P, the following steps
are taken:

1 . If O doesn't have a property with name P, go to step 4.
2 . Get the value of the property.
3 . Return
4 . If the [[Prototype]] of O is null, return undefined.
5 . Call the [[Get]] method of [[Prototype]] with property name P.
6 . Return Result(5).
--------------------------------------

Now E4X specs gives in ECMA-357, section 9.1.1.1, first paragraph:

--------------------------------------
9.1.1.1 [[Get]] (P)
Overview

The XML type overrides the internal [[Get]] method defined by the Object type.
...
--------------------------------------


Given the traditional definition of "override" this tells that the prototype
lookup should use XML.[[Get]] and in the example "test" should be empty XMLList.

On the other hand the 3-rd paragraph of 9.1.1.1 [[Get]] (P) states:

--------------------------------------
NOTE
Unlike the internal Object [[Get]] method, the internal XML [[Get]] method is
never used for retrieving methods associated with XML objects. E4X modifies the
ECMAScript method lookup semantics for XML objects as described in section 11.2.2.
--------------------------------------

which indicates that Object.[[Get]] should always call Object.[[Get]] and "test"
should alias XML.prototype.toXMLString method. Note that 11.2.2 does not
consider the case of prototype chain and simple refers to Object.[[Get]] definition.

So what should be the right thing? I would prefer the second interpretation
since as far as I can see it is the only way to get reference to methods defined
in XML.prototype to use them in continuation with Function.protype.apply, but it
is just my preference.



NCRs are implemented now, and all entity references other than the fab five
defined in XML 1.0 are illegal.

/be
Attached file latest errata file —
Diff with previous errata file on left:

113a114,124
> 9.2.1.2 Step 2(e)(i, ii),
> 9.2.1.2 Step 7(e)(i),
> 9.2.1.3 Step 2(b)(ii)(1)(a)
>	  All uses of a.[[Name]] for an attribute a in these sections that pass

>	  that QName object to [[Delete]] must pass an AttributeName cloned
from>	      a.[[Name]].  The [[Name]] internal property is always a QName
instance>	  and never an AttributeName or AnyName instance.  But
[[Delete]] will
>	  not operate on x.[[Attributes]] when given a QName by these sections,

>	  so a child could be wrongly deleted instead of the attribute of the
>	  same name.
>

/be
Attachment #168427 - Attachment is obsolete: true
That was ugly -- I wish bugzilla would wrap long text better, or not at all if
it appears to be "cited" (diff > at start of line looks like mail citation line
mark) -- fudging the indentation so the diff is readable, to foster discussion:

113a114,124
> 9.2.1.2 Step 2(e)(i, ii),
> 9.2.1.2 Step 7(e)(i),
> 9.2.1.3 Step 2(b)(ii)(1)(a)
>     All uses of a.[[Name]] for an attribute a in these sections that pass
>     that QName object to [[Delete]] must pass an AttributeName cloned from
>     a.[[Name]].  The [[Name]] internal property is always a QName instance
>     and never an AttributeName or AnyName instance.  But [[Delete]] will
>     not operate on x.[[Attributes]] when given a QName by these sections,
>     so a child could be wrongly deleted instead of the attribute of the
>     same name.
>

The problem in these steps is that an attribute's [[Name]] is used as if it were
of (internal) type AttributeName, but it's specified in 9.1.1 that [[Name]] must
be null or a QName object.

/be
Oh, and I just enabled E4X at compile time, with shaver prodding me to action. 
The footprint hit is ~80k, but I have some wins coming, and this is a case where
we gain a feature that may be worth the cost, even if we can't save elsewhere to
try to reduce the size of the hit.

/be
dmose pointed out the followup bug patched here.  E4X breaks the invariant,
preserved until now for all JS objects, native or host: for all method m in an
object o, o.m(arguments) is equivalent to var f = o.m; f.apply(o, arguments). 
With E4X, you cannot extract a function-valued method from an XML instance.

In SpiderMonkey's E4X, you can extract a method using the function:: namespace,
but that extension has not made its way back to ECMA, and it doesn't, in any
case, help prevent the broken symmetry from rippling through other layers of C
or C++ code built on the JS API.  In this case, XPConnect.

Still seems like a lose to break the invariant for XML, but I haven't got a way
to unify the conflicting goals:

1.  For E4X, that every get of a property id that is not an index (in the case
of an XMLList; reserved and an error for XML) returns an XMLList, which may be
empty, to return all matching attributes or children.

2.  Preserving the JS method extraction invariant mentioned above, and related
ones (call as well as apply, manual invocation if the |this| parameter binding
to o in the examples above doesn't matter, or can be faked via o.m2 = f;
o.m2(arguments)).

Since 1 does not apply uniformly to all XML properties (get of an index id on
an XML object is illegal, reserved for future use; get on an XMLList returns
the indexed child, not an XMLList containing the indexed child), I would simply
add another special case to 1, preserving 2.  But it's a bit late for that --
dmose is using E4X for CalDAV hacking already.

/be
I checked in a slightly better version of attachment 169767 [details] [diff] [review], so dmose can (I
hope) get past the problem he's seeing.

/be
(In reply to comment #63)
...
> In SpiderMonkey's E4X, you can extract a method using the function::namespace,
but that extension has not made its way back to ECMA, 

You can extract method in the standard E4X ( well, at least the way I read E4X
specs, see comment 58) via:

function getMethod(object, name)
{
    var Proxy = function() { }
    Proxy.prototype = object;
    var proxy = new Proxy();
    return proxy[name];
}

and then a.function::method can be done via getMethod(a, "method")

> 
> 1.  For E4X, that every get of a property id that is not an index (in the case
> of an XMLList; reserved and an error for XML) returns an XMLList, which may be
> empty, to return all matching attributes or children.
> 
> 2.  Preserving the JS method extraction invariant mentioned above, and related
> ones (call as well as apply, manual invocation if the |this| parameter binding
> to o in the examples above doesn't matter, or can be faked via o.m2 = f;
> o.m2(arguments)).

In Rhino the above is satisfied via allowing to apply "()" to XMLList which
remember sufficient history to be able to support var f = xmlobject.m;
f.apply(xmlobject, arguments) and related operations.


> 
> Since 1 does not apply uniformly to all XML properties (get of an index id on
> an XML object is illegal, reserved for future use; get on an XMLList returns
> the indexed child, not an XMLList containing the indexed child), I would simply
> add another special case to 1, preserving 2.  But it's a bit late for that --
> dmose is using E4X for CalDAV hacking already.
> 
> /be
> and then a.function::method can be done via getMethod(a, "method")

Clever!  But it still doesn't handle name collisions between prototype methods
and (user-defined) XML children.

> In Rhino the above is satisfied via allowing to apply "()" to XMLList which
> remember sufficient history to be able to support var f = xmlobject.m;
> f.apply(xmlobject, arguments) and related operations.

Again the problem is that someone may write XML of the form:

  x = <parent><name>...</name><child id="1">...</child>...</parent>;

and then you cannot call x.child() or x.name().

E4X really wanted to put XML.prototype.* methods delegated from XML instances
into a separate namespace, but instead specified so that they can't be extracted
easily or in general.  It sounded at the last TG1 meeting as though everyone
agreed with the idea of putting the methods in a separate namespace, which would
be automatically selected when calling.

Then at least the following issues arise:

(1) What URI for the namespace?  It would not be useful to use a URI that could
identify an XML schema, since no schema can describe the grammar for native
functions.  It would be better if it were a reserved "anti-URI", which could not
collide with any XML schema URI.

(2) Should there be a shorthand, _a la_ SpiderMonkey's function::child use of
the function reserved identifier, so that users don't have to repeatedly create
a function namespace (var fn = new Namespace("<anti-URI goes here>"); var f =
x.fn::child; etc.)?  Obviously I think so, although "function" isn't that short
(but at least it's reserved, and clear).

/be
(In reply to comment #66)
> 
> Again the problem is that someone may write XML of the form:
> 
>   x = <parent><name>...</name><child id="1">...</child>...</parent>;
> 
> and then you cannot call x.child() or x.name().

But why? 

var f = x.child; f.apply(x); 
can be made to work since '()' can check for [[TargetProperty]]/[[TargetObject]]
of child to see if it come from xml and then check for Function.prototype as a
source of additional properties since x.child() is defined.

In fact it does not work in Rhino this way due to fixable bugs.


> 
> E4X really wanted to put XML.prototype.* methods delegated from XML instances
> into a separate namespace, but instead specified so that they can't be extracted
> easily or in general.  It sounded at the last TG1 meeting as though everyone
> agreed with the idea of putting the methods in a separate namespace, which would
> be automatically selected when calling.
> 
> Then at least the following issues arise:
> 
> (1) What URI for the namespace?  It would not be useful to use a URI that could
> identify an XML schema, since no schema can describe the grammar for native
> functions.  It would be better if it were a reserved "anti-URI", which could not
> collide with any XML schema URI.
> 
> (2) Should there be a shorthand, _a la_ SpiderMonkey's function::child use of
> the function reserved identifier, so that users don't have to repeatedly create
> a function namespace (var fn = new Namespace("<anti-URI goes here>"); var f =
> x.fn::child; etc.)?  Obviously I think so, although "function" isn't that short
> (but at least it's reserved, and clear).
> 
> /be

(In reply to comment #66)
> Then at least the following issues arise:
> 
> (1) What URI for the namespace?  It would not be useful to use a URI that could
> identify an XML schema, since no schema can describe the grammar for native
> functions.  It would be better if it were a reserved "anti-URI", which could not
> collide with any XML schema URI.
> 
> (2) Should there be a shorthand, _a la_ SpiderMonkey's function::child use of
> the function reserved identifier, so that users don't have to repeatedly create
> a function namespace (var fn = new Namespace("<anti-URI goes here>"); var f =
> x.fn::child; etc.)?  Obviously I think so, although "function" isn't that short
> (but at least it's reserved, and clear).
> 
> /be

But why not to provide a global function, say methods(object), that would return
the method-only view of the object? Then one can write methods(x).child or even
methods(x)['child'] which is much more clear IMO then x[QName(fn, 'child')].

> can be made to work since '()' can check for [[TargetProperty]]/[[TargetObject]]
> of child to see if it come from xml and then check for Function.prototype as a
> source of additional properties since x.child() is defined.

The E4X spec says that the XMLList returned in that example should contain one
element for reach immediate child of the parent tag, and not include a function
object in the array.  We'd have to say where it goes, and it would make the list
inhomogeneous, breaking operators such as the filtering predicate.  It's yet
more asymmetry.

Second reason not to do this is that it adds random logic to the XML method
calling (really, getting -- GetMethod is factored from CallMethod in
SpiderMonkey, and could be in E4X spec) path, which is already complicated enough.

The Macromedia folks commented on how the magic-ness of E4X objects required
extra random logic to be added to generic and type-specific layers, and that was
true in SpiderMonkey and (IIRC) in Rhino too.  Simpler is better, and a
namespace isolates all functions and preserves homogeneity of XMLList [[Get]]
results.

> But why not to provide a global function, say methods(object), that would
> return the method-only view of the object? Then one can write
> methods(x).child or even methods(x)['child'] which is much more clear IMO
> then x[QName(fn, 'child')].

First, there's no need to write x[QName(fn, 'child')] -- x.fn::child works and
is the natural way to express it in E4X or ECMA Edition 4.

Second, a global function would have to create or find a safely reusable
methods-only view object.  That's ineffecient.

Third, a global function pollutes the global namespace.

Given namespaces in E4X and Edition 4, why would you not use them to handle this
asymmetry imposed by E4X's current design, which was based on the reasonable
goal mentioned in comment 63 (numbered 1 there): to have homogeneous lists of
child or attribute results from [[Get]]?

Using a global method is a "JS 1" type of hack (I should know, I perpetrated JS
1 and its hacks), given namespaces.  We should not spend much more space arguing
in this bug, so if you still feel strongly, write something up and attach it. 
I'll take it to the TG1 group.

/be
(In reply to comment #69)
> > can be made to work since '()' can check for [[TargetProperty]]/[[TargetObject]]
> > of child to see if it come from xml and then check for Function.prototype as a
> > source of additional properties since x.child() is defined.
> 
> The E4X spec says that the XMLList returned in that example should contain one
> element for reach immediate child of the parent tag, and not include a function
> object in the array.  We'd have to say where it goes, and it would make the list
> inhomogeneous, breaking operators such as the filtering predicate.  It's yet
> more asymmetry.
> 
> Second reason not to do this is that it adds random logic to the XML method
> calling (really, getting -- GetMethod is factored from CallMethod in
> SpiderMonkey, and could be in E4X spec) path, which is already complicated enough.
> 
> The Macromedia folks commented on how the magic-ness of E4X objects required
> extra random logic to be added to generic and type-specific layers, and that was
> true in SpiderMonkey and (IIRC) in Rhino too.  Simpler is better, and a
> namespace isolates all functions and preserves homogeneity of XMLList [[Get]]
> results.
> 
> > But why not to provide a global function, say methods(object), that would
> > return the method-only view of the object? Then one can write
> > methods(x).child or even methods(x)['child'] which is much more clear IMO
> > then x[QName(fn, 'child')].
> 
> First, there's no need to write x[QName(fn, 'child')] -- x.fn::child works and
> is the natural way to express it in E4X or ECMA Edition 4.

I was refering to the case when property is not known at run time like in:

function callit(object, method_name)
{
    return object[method_name]();
}

then to prepare it for E4X I first thought it would have to be modified as:

function callit(object, method_name)
{
  if (typeof object == "xml") {
    return object[new QName(fn, method)]();
  }
  return object[method]();
}

which would be more verbouse then

function callit(object, method_name)
{
  return methods(object)[method]();
}

but then I forgot that :: can be followed by [] so callit would look like:

function callit(object, method_name)
{
  return object.function::[method]();
}

which is nice and I guess now function:: should go to Rhino as well.
(In reply to comment #63)
> In SpiderMonkey's E4X, you can extract a method using the function:: namespace,

It sems that CVS tip does not treat function:: specially if it starts expression
like in:

with (<xml>...</xml>) {
  function::name();
}

Should FunctionStmt be XML-ised as well?
1)  Propagate the TCF_HAS_DEFXMLNS flag from the parser to the constant folder,
so that the latter doesn't fold XML literals whose default namespace can't be
known till runtime.

2)  The xml_mark object-op must call the native mark object-op (js_Mark),
because XML objects are like native objects: they have scopes (almost always, a
share in the XML.prototype object's scope), and they have slots (almost always,
just the first four).

/be
Attachment #169878 - Flags: review?(shaver)
This is a jsparse.c patch generated on top of the patch at attachment 169878 [details] [diff] [review].
Thanks, Igor!

/be
Comment on attachment 169878 [details] [diff] [review]
further fixes for dmose's JS component re-registration case

This patch and the next one after it are checked into the cvs trunk now.

/be
Attachment #169878 - Attachment description: furher fixes for dmose's JS component re-registration case → further fixes for dmose's JS component re-registration case
Another erratum:

13.3.5.4 [[GetNamespace]]
         10.3.2.1 Step 6(c) NOTE does not say how a [prefix] containing "no
         value" (see http://www.w3.org/TR/xml-infoset 2.2.3) maps to
         x.[[Name]].[[Prefix]], but it is reasonable to assume it maps to the
         *undefined* prefix value mentioned various places in ECMA-357 as a
         distinct value from any string value, to mean "no prefix".
 
         10.3.2.1 Step 6(h)(i)(1) specifies:
  
         1. If the [local name] property of a is "xmlns"
             a. Map ns.prefix to the empty string
 
         Now 13.3.5.4 [[GetNamespace]] Step 3 includes this note:
 
         NOTE: implementations that preserve prefixes in qualified names may
         additionally constrain ns, such that ns.prefix == q.[[Prefix]]
 
         But "" != undefined, or *undefined* represented by null or any other
         non-void value that is distinct from any string when compared with
         == (note: it would be better if the spec used ===, since 0 == "" and
         0 == "0").
 
         Therefore, without further special-casing in [[GetNamespace]],
         10.2.1 "ToXMLString Applied to the XML Type" step 11, given XML
         of the form:
 
         <t xmlns="http://foo.com"/>
 
         will call [[GetNamespace]] on x.[[Name]] = (prefix=undefined,
         localName="t") with argument { (prefix="", uri="http://foo.com") }.
         According to the NOTE in 13.3.5.4 Step 3, the singleton namespace
         won't match the QName because "" != undefined, so [[GetNamespace]]
         will create a new Namespace (prefix=undefined, uri="http://foo.com").
 
         Then 10.2.1 Step 12 will generate an implementation-defined
         namespace prefix distinct from any prefix in { "", undefined }, sets
         namespace.prefix to that generated string, and unions namespace into
         namespaceDeclarations.  The effect on the example ToXMLString input
         is the string:
 
         '<pfx:t xmlns:pfx="http://foo.com" xmlns:pfx2="http://foo.com"/>'
 
         (pfx2 and pfx may be the same generated string, if the prefix auto-
         generation algorithm deterministically computes a variation on a part
         of the namespace's uri.)
 
Need more toXMLString tests!

/be
Here's an even subtler suboptimality in the spec:

10.2.1 "ToXMLString applied to the String type" Step 12
        Erratum: if both namespace.prefix and x.[[Name]].[[Prefix]] are
        undefined, we know that x was named using the default namespace
        (proof: see [[GetNamespace]] and the Namespace constructor called
        with two arguments).  So we ought not generate a new prefix here,
        when we can declare namespace as the default namespace in x.
 
        The fix is to change Step 12 to something like this:
 
        12. If (namespace.prefix == undefined),
            a.  If (x.[[Name]].[[Prefix]] == undefined),
                i.  Let namespace.prefix be the empty string
            b.  Else
                ii. Let namespace.prefix be an arbitrary implementation
                    defined namespace prefix, such that there is no ns2 in
                    (AncestorNamespaces U namespaceDeclarations) with
                    namespace.prefix == ns2.prefix
            c.  [what was 12(b) goes here]
 
        This helps descendants inherit the namespace instead of redundantly
        redeclaring it with generated prefixes in each descendant.
 

If I'm missing a good reason not to serialize default namespaces via default
namespace declarations (xmlns=...) via toXMLString, let me know.  With the code
version of this fix in SpiderMonkey's E4X, the testsuite still passes fine (it
fails to test toXMLString on a deep tree that uses only the default namespace).
 This case, from dmose, is even better:

default xml namespace = "urn:ietf:params:xml:ns:caldav";
var queryXml =
  <calendar-query>
    <calendar-query-result/>
      <filter>
        <icalcomp-filter name="VCALENDAR">
          <icalcomp-filter name="VEVENT">
          </icalcomp-filter>
        </icalcomp-filter>
      </filter>
  </calendar-query>;
 
print(queryXml.toXMLString());

Without the fix, you get:

<caldav:calendar-query xmlns:caldav="urn:ietf:params:xml:ns:caldav">
  <caldav:calendar-query-result xmlns:caldav="urn:ietf:params:xml:ns:caldav"/>
  <caldav:filter xmlns:caldav="urn:ietf:params:xml:ns:caldav">
    <caldav:icalcomp-filter xmlns:caldav="urn:ietf:params:xml:ns:caldav"
name="VCALENDAR">
      <caldav:icalcomp-filter xmlns:caldav="urn:ietf:params:xml:ns:caldav"
name="VEVENT"></caldav:icalcomp-filter>
    </caldav:icalcomp-filter>
  </caldav:filter>
</caldav:calendar-query>

Painful!  With the fix, you get:

<calendar-query xmlns="urn:ietf:params:xml:ns:caldav">
  <calendar-query-result/>
  <filter>
    <icalcomp-filter name="VCALENDAR">
      <icalcomp-filter name="VEVENT"></icalcomp-filter>
    </icalcomp-filter>
  </filter>
</calendar-query>

Nice!

/be
The proposed fix in comment 76 needs fixing:

        12. If (namespace.prefix == undefined),
            a.  If (x.[[Name]].[[Prefix]] == undefined),
                i.  Let namespace.prefix be the empty string
                ii. If there exists a member ns2 of namespaceDeclarations that
                    has the empty string as its prefix
                    1.  Remove ns2 from namespaceDeclarations
            b.  Else
                ii. Let namespace.prefix be an arbitrary implementation
                    defined namespace prefix, such that there is no ns2 in
                    (AncestorNamespaces U namespaceDeclarations) with
                    namespace.prefix == ns2.prefix
            c.  [what was 12(b) goes here]

This came up via bug 277779, which wants XML.prototype.setNamespace (at least;
what about setName as well?) to auto-magically add to the declared namespaces,
but not to duplicate default namespace declarations.

/be
Blocks: 277887
Errata for E4X 13.1.2.1 isXMLName(value)

The specs states: 

2. If q.localName does not match the production NCName, return false

but it does not define the meaning of "match". This is important since NCName, 
http://w3.org/TR/xml-names11/#NT-NCName, includes among other thinks characters
beyond 64K limit, http://www.w3.org/TR/xml11/#NT-NameStartChar :

NameStartChar ::=  ... [#x10000-#xEFFFF]

Since JS strings are just 2byte chunks, the specs should define the meaning of
"match". IMO a clean way to deal with this is to require to interpret JS strings
as UTF-16 encoding and to define what to do if the JS string is not a valid UTF-16. 

Comment on attachment 166536 [details] [diff] [review]
better approach

rs=shaver
Attachment #166536 - Flags: superreview?(shaver) → superreview+
Latest thinking from ECMA TG1, or the subset working on E4X as it goes to ISO:
the [[InScopeNamespaces]] internal property should be mapped from the XML
Infoset's [in-scope namespaces], excluding the xml: prefix's namespace.  Not
mapped from the [namespace attributes] property, as ECMA-357 10.3.2.1 step
6(g-h) have it.

TG1, or at least the subset consisting of Jeff Dyer of Macromedia and me, also
wants to constrain [[InScopeNamespaces]] to include only *used* namespaces.

This changes a bunch of the code I wrote in SpiderMonkey, and it probably
changes the Rhino E4X implementation.  It means that you can always compute the
declared namespaces (returned by namespaceDeclarations, and serialized in the
output of toXMLString) for an element are always the set difference of that
element's in-scope namespaces minus the union of its ancestors' in-scope
namespaces.  So there's no need for the |declared| flag in JSXMLNamespace.

More in a bit.

/be
Need to finish this off for 1.8:

- Revise namespace inheritance so [[InScopeNamespaces]] matches [in-scope
namespaces] from XML-infoset (this bug).

- Implement XML(document) for two-way DOM <-> E4X data binding (bug 270553).

- Enable E4X by default for XUL contexts (bug not filed yet).

/be
Target Milestone: mozilla1.8alpha4 → mozilla1.8beta2
Depends on: 289630
More ECMA-357 errata:

I claim ECMA-357 has a spec bug in 10.2.1 "ToXMLString Applied to the XML Type":

17. For each an in attrAndNamespaces
 a. Let s be the result of concatenating s and the space <SP> character
 b. If Type(an) is XML and an.[[Class]] == "attribute"
  i. Let ans be a copy of the result of calling [[GetNamspace]] on a.[[Name]]
     with argument AncestorNamespaces

The last line should read

     with argument AncestorNamespaces U namespaceDeclarations

as namespaceDeclarations (the xmlns= and xmlns:prefix= attributes in this
element) are in-scope for this element's attributes.

I also claim 13.4.4.35 XML.prototype.setName and 13.4.4.36
XML.prototype.setNamespace are incomplete.  They say absolutely nothing about
updating [[InScopeNamespaces]].  Other parts of the spec manually update this
set, but these methods assume that it magically keeps itself up-to-date with
respect to x.[[Name]].

Warning: you cannot use AddInScopeNamespace easily from 13.4.4.{35,36} to fix
this erratum.

/be
errata nit: spelling "initializer" in 12.1 example on page 58, while rest of
document uses "initialiser"
Section 10.1.2 ToString Applied to the XMLList Type of the specification says
the following:

Given an XMLList object l, ToString performs the following steps:
!!!1. If l.hasSimpleContent() == true
  a. Let s be the empty string
  b. For i = 0 to l.[[Length]]-1,
    i. If x[i].[[Class]] &#8713; {"comment", "processing-instruction"}
      1. Let s be the result of concatenating s and ToString(l[i])
  c. Return s
!!!2. Else
  a. Return ToXMLString(x)

Where does the 'x' in 1.b.i and 2.a come from? I guess that should be
  i. If l[i].[[Class]] &#8713; {"comment", "processing-instruction"}
and
  a. Return ToXMLString(l)
Blocks: 290393
Blocks: branching1.8
Blocks: 300861
No longer blocks: 300861
Flags: testcase?
we've got what we're going to get, I believe, so pulling this off of the branch
blocking list.
No longer blocks: branching1.8
Attachment #166536 - Flags: review?(john.schneider)
Flags: testcase? → testcase-
I see bug 321547, bug 321549 in E4X implementation
Depends on: 336921
QA Contact: pschwartau → general
WHy is this bug still open rather than marked as fixed?
Not because of bug 349263, that can wait. It must be due to bug 270553.

/be
Since we've tracked errata here so far...

The specification currently says that, during parsing of an XML literal or a string passed to XML(), if XML.ignoreWhitespace is true whitespace-only text nodes are removed from the resultant XML object; it does not say that leading and trailing whitespace in XML text nodes should be trimmed.  Per discussion in bug 369394, this is a mistake in the specification -- when XML.ignoreWhitespace is true, leading and trailing whitespace should be stripped from XML text nodes.
Someone else should lead the charge here; I've got bigger fish to fry.

/be
Assignee: brendan → general
Status: ASSIGNED → NEW
Depends on: 410192
Depends on: 410263
Depends on: 410366
Depends on: 355422, 355674, 381197, 382339, 384435
Depends on: 410740
Blocks: 435398
Depends on: 452729
Depends on: 490758
E4X will be removed again from Spidermonkey (bug 788293)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
No longer depends on: 349263
You need to log in before you can comment on or make changes to this bug.