Closed Bug 478491 Opened 15 years ago Closed 15 years ago

appendChild extremly slow

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: jodyer, Assigned: daumling)

References

Details

Attachments

(2 files)

Steps to reproduce:
1. complie the following class as main-class
package 
{
    import flash.display.Sprite;
    import flash.utils.getTimer;

    public class Main extends Sprite 
    {
        public function Main():void 
        {
			var methods:Array = ["usingAppendChildAndE4X", "addingToXMLList", "appendChildAndString", "concatenatingStringsFromE4X", "simpleStringConcatenation"];
            var nodeCount:int = 1000;
            
            var referenceResult:String = null;
            
            var i:int;
            var l:int = methods.length;
            
            for (i = 0; i < l; i++) {
                var method:Function = this[methods[i]];
                
                var startTime:Number = getTimer();
                var result:XML = method(nodeCount);
                var endTime:Number = getTimer();
                
                trace(methods[i] + ": " + (endTime - startTime));
                
                if (referenceResult != null) {
                    if (referenceResult != result.toXMLString()) {
                        trace("ERROR!");
                    } else {
                        trace("-- OK!");
                    }
                } else {
                    referenceResult = result.toXMLString();
                }
            }
        }
        
        public function usingAppendChildAndE4X(nodeCount:int):XML
        {
            var xml:XML = <root />;
            
            for (var i:int = 0; i < nodeCount; i++) {
                xml.appendChild(<node id={i} />);
            }
            
            return xml;
        }
		
		public function appendChildAndString(nodeCount:int):XML
        {
            var xml:XML = <root />;
            
            for (var i:int = 0; i < nodeCount; i++) {
                xml.appendChild(XML("<node id=\"" + i + "\" />"));
            }
            
            return xml;
        }
        
        public function addingToXMLList(nodeCount:int):XML
        {
            var xml:XML = <root />;
			var nodes:XMLList = new XMLList();
            
            for (var i:int = 0; i < nodeCount; i++) {
                nodes += <node id={i} />;
            }
            
			xml.setChildren(nodes);
			
            return xml;
        }
		
		public function concatenatingStringsFromE4X(nodeCount:int):XML
        {
            var  nodes:Array = new Array();
            
            for (var i:int = 0; i < nodeCount; i++) {
				nodes.push((<node id={i} />).toXMLString());
            }
			
            return XML("<root>" + nodes.join("") + "</root>");
        }
        
        public function simpleStringConcatenation(nodeCount:int):XML
        {
            var str:String = "<root>";
            
            for (var i:int = 0; i < nodeCount; i++) {
                str += "<node id=\"" + i + "\" />";
            }
            
            str += "</root>";
            
            return XML(str);
        }
    }
}
 
 Actual Results:
 both methods using appendChild are between 50 and 200 times slower than the simpleStringConcatenation method.
 
 Expected Results:
 no such big performance differences
 
 Workaround (if any):
 concatenating strings, which is very ugly
Transferred Comments:

Trevor Baker - Tue Jan 27 16:33:44 CST 2009
send to internal review for prioritization

Jeff Dyer - Thu Jan 29 15:27:11 CST 2009
Moving to Future.

Jd


This bug transferred from: http://bugs.adobe.com/jira/browse/ASC-3543
Blocks: AS3_Builtins
the appendChild implementation follows ECMA-357 strictly by operating on the XMLList returned by "*". This is extremely(!) slow - the list has to be created on every append operation, and the operation itself has to go though the entire XMLList semantics. It would be much faster if appendChild() would operate on the node directly. Other methods, like childIndex() already bypass the ECMA-357 implementation this way.

Can someone, please, let me know if this change would be OK, and assign the bug to me?
Werner Sharp is the expert on this, IIRC -- adding him for comments.
Sounds fine to me as long as all the sanity tests pass with the new implementation.
OK - I'll take that bug then...
Status: NEW → ASSIGNED
this is crying out for benchmark tests.
Dan can you check this patch against the Dell Axim please, just make sure that we can execute in a timely fashion for hybrid, interp and jit.
Attachment #382286 - Flags: review?(dschaffe)
Flags: in-testsuite?
Flags: flashplayer-triage+
Flags: flashplayer-qrb?
tested on Dell Axim with build 1993, (I mistakenly first tried on an older build and 1st test ran > 60s)
- set desktop=false in dir.asc_args
- export AVM=ceremoteshell.exe, ceremotedeployer.exe avmplus_arm.exe
$ ./runtests.py language/e4x/
Tamarin tests started: 2009-06-09 10:04:37.938739
Executing 5 tests against vm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/R
elease/ceremoteshell.exe
Executing tests at 2009-06-09 10:04:38.097739
avm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/Release/ceremoteshell.exe

iterations: 1


test                                                   avm  metric

language/e4x/addingToXMLList.as                        891    time
language/e4x/appendChildAndString.as                  5704    time
language/e4x/concatenatingStringsFromE4X.as            711    time
language/e4x/simpleStringConcatenation.as               57    time
language/e4x/usingAppendChildAndE4X.as                5666    time

dschaffe@DSCHAFFER-XP0 /c/hg/tamarin-redux/test/performance
$ ./runtests.py --vmargs=-Dinterp language/e4x/
Tamarin tests started: 2009-06-09 10:06:08.013982
Executing 5 tests against vm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/R
elease/ceremoteshell.exe
Executing tests at 2009-06-09 10:06:08.193982
avm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/Release/ceremoteshell.exe
-Dinterp
iterations: 1


test                                                   avm  metric

language/e4x/addingToXMLList.as                       1036    time
language/e4x/appendChildAndString.as                  5977    time
language/e4x/concatenatingStringsFromE4X.as            965    time
language/e4x/simpleStringConcatenation.as               62    time
language/e4x/usingAppendChildAndE4X.as                5880    time

dschaffe@DSCHAFFER-XP0 /c/hg/tamarin-redux/test/performance
$ ./runtests.py --vmargs=-Ojit language/e4x/
Tamarin tests started: 2009-06-09 10:07:10.059894
Executing 5 tests against vm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/R
elease/ceremoteshell.exe
Executing tests at 2009-06-09 10:07:10.225894
avm: c:/hg/tamarin-redux/utils/wmremote/ceremoteshell/Release/ceremoteshell.exe
-Ojit
iterations: 1


test                                                   avm  metric

language/e4x/addingToXMLList.as                        916    time
language/e4x/appendChildAndString.as                  5737    time
language/e4x/concatenatingStringsFromE4X.as            647    time
language/e4x/simpleStringConcatenation.as               79    time
language/e4x/usingAppendChildAndE4X.as                5676    time
Attachment #382286 - Flags: review?(dschaffe) → review+
Assignee: nobody → daumling
Flags: flashplayer-qrb? → flashplayer-qrb+
Target Milestone: --- → flash10.x
e4x test cases pushed 2117:7da91870a749

language/e4x/addingToXMLList.as
language/e4x/appendChildAndString.as
language/e4x/concatenatingStringsFromE4X.as
language/e4x/simpleStringConcatenation.as
language/e4x/usingAppendChildAndE4X.as
I have looked at the code. The center of the code is XMLListNode::setUintProperty(). This is a very complex beast, closely following E4X, and it should IMHO not be touched. I tried a few options:

1) Abstract above method to make it work both with an XMLList and an XMLObject. Very difficult - I ended up with oodles of parameters.

2) Copy/paste the code to XMLObject::setUintPropertyHelper() and remove XMLList specific code. Would work, but would be very difficult to maintain.

The implementation as it is now is correct - the problem is that you need to construct a complete XMLList filled with XMLObjects for every appendChild() operation.

Caching that XMLList with the XMLObject, is ugly, wastes space, and requires a lot of code that checks whether to throw away that list.

I'd like to suggest that I rewrite the XMLList to using a modified AtomArray just like the children array of an E4XElementNode. This array contains the original E4X nodes, and access to the elements wraps nodes on demand -> lazy XMLObject creation. The creation of that array would be much faster. E4X operations on the list should be about the same - most operations seem to work with the E4XNodes inside the XMLObjects anyway.

This would also require the split of some of the XMLObject methods like hasSimpleContents(); the worker code should work with an E4XNode and be accessible to XMLListObject.

Maybe it is also possible to cache a single XMLList with the XMLObject, at least for accesses that would store a targetProperty into the list. That targetProperty reference could be used to determine whether to discard the list on subsequent property accesses.

I believe that the code as it is now could be preserved best while speeding up XMLList creations considerably.

Comments? Suggestions?
More thoughts: Often, there is already an XMLObject that you want to store in a list. Should not be a problem - E4XNodes are stored in an AtomArray via a call to AvmCore::genericObjectToAtom(), which tags an E4XNode with a kDoubleType tag. This makes it easy to distinguish between an E4XNode (which needs to be wrapped on access), and an already wrapped XMLObject.

Again: Comments? Suggestions?
Maybe Jeff has suggestions.
I have changed the implementation as I suggested. All tests pass, and the appendChild performance tests are 10 times faster - actually, all XMLList operations should be very much faster now. After testing the patch with the try buildbot on Monday, I will submit the patch.

Please advise about the reviewer - Werner Sharp?
I'd love to have Werner look it over if he has the time this week. I'm somewhat familiar with the code as well so I can review as well. Please upload the patch.

Note: have you tested it in Flash/AIR or just in avmshell? (We'll definitely have to let some Flex apps beat on it before landing.)
The patch attempts to not change any algorithm; this is a very careful replacement of XMLList elements that used to be XMLObjects with E4XNodes, and XMLObjects if supplied or asked for (in that case, an E4XNode is replaced with its wrapper XMLObject).

1) Move code that works with E4XNodes only from XMLObject to E4XNode. This makes it possible to call this code from within XMLListObject without the need of an XMLObject.

2) Add XMLListObject::_getNodeAt() that retrieves E4XNodes only.

3) Change XMLListObject::_getAt() to create and store an XMLObject if the object at the index is an E4XNode (lazy creation).

4) Change XMLListObject::_append(E4XNode*) to _appendNode() for clarity and symmetry.

5) Change the use of XMLListObject::_getAt() to _getNodeAt() wheever possible.

6) Move code that updates targetObject and targetProperty from _appendNode() to a separate method fixTargetObject(). This method is called before any read access to targetObject and/or targetProperty, and it updates the m_targetObject and m_targetProperty with the last E4XNode added. For the life of me, I could not find out why this code exists, but I'll leave it in to not break any existing code. The only difference is that the code is only executed on demand and not on any E4XNode that is appended to the list.

Speed improvements (I had to increase the loop count to 3000 for all tests but one, and 5000 for simpleStringConcatenation.as):

addingToXMLList.as              190.8  153.2    33.6%
appendChildAndString.as        5799.2  512.6  1056.6%
concatenatingStringsFromE4X.as   62.6   65.6    31.9%
simpleStringConcatenation.as     31.3   25.7   106.7%
usingAppendChildAndE4X.as      5841.6  522.0  1019.6%

As requested, asking Werner for r and Steven for sr.
Attachment #391059 - Flags: superreview?(stejohns)
Attachment #391059 - Flags: review?(wsharp)
Attachment #391059 - Flags: review?(wsharp) → review+
Comment on attachment 391059 [details] [diff] [review]
XMLList contains E4XNodes and/or XMLobjects

I don't see any obvious red flags but this is deeper than my knowledge of this code, so I'll defer to Werner here. 

Before landing this, we definitely need to test it in Flex apps.

You probably also need to coordinate these changes with the changes in https://bugzilla.mozilla.org/show_bug.cgi?id=502589 to remove recursion, as it's touching the XML code as well.
Attachment #391059 - Flags: superreview?(stejohns) → superreview+
Happy to see, that this could arouse some interest.

Michael Daumling, the results of your performance test look awesome, 10x better performance ... wow.

Anyway, for me as the AS3 programmer, not having any insight in the mechanics behind XML with E4X in the VM, it looks still strange that using methods that were made for XML generation is slower (by factor 20), than a simple string concatenation + parsing. (i first suspected XML(myString) to be lazy, but including a simple usage of the xml in the tests disproved this)
Its as if using typed variables in AS3 were to make code slower, not faster.

One more aspect of the whole thing might be to look at runtime as a function of the node-count. Running my tests with multiple nodeCounts (1000, 2000, 4000), i get the following results:

usingAppendChildAndE4X(1000 nodes): 535, time per node: 0.535
usingAppendChildAndE4X(2000 nodes): 2161, time per node: 1.0805
usingAppendChildAndE4X(4000 nodes): 8604, time per node: 2.151
addingToXMLList(1000 nodes): 67, time per node: 0.067
addingToXMLList(2000 nodes): 161, time per node: 0.0805
addingToXMLList(4000 nodes): 616, time per node: 0.154
appendChildAndString(1000 nodes): 524, time per node: 0.524
appendChildAndString(2000 nodes): 2092, time per node: 1.046
appendChildAndString(4000 nodes): 8445, time per node: 2.11125
concatenatingStringsFromE4X(1000 nodes): 17, time per node: 0.017
concatenatingStringsFromE4X(2000 nodes): 30, time per node: 0.015
concatenatingStringsFromE4X(4000 nodes): 85, time per node: 0.02125
simpleStringConcatenation(1000 nodes): 9, time per node: 0.009
simpleStringConcatenation(2000 nodes): 10, time per node: 0.005
simpleStringConcatenation(4000 nodes): 19, time per node: 0.00475

(of course these tests are very imprecise, but one can see the crux)
Scary is here that the time per node in the current implementation (WIN 10,0,22,87) is not constant, but increases with the node-count. This will lead to quadratic time consumption with n, making generation of huge XMLs impossible.

Could you run such a test against the new optimized implementation?
Perhaps one should verify that the test-results are not an effect of garbage collection or other mechanisms in the VM.
This has passed the torture tests I've put to it in Flash/AIR... I think we can land it in Redux.
Patch #2 re-merged and pushed as changeset 2404:bc6ad0ad5d06.

I believe that the test files (patch #1) are already added. Is that correct?

Setting the bug to RESOLVED/FIXED.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
janosch, the E4X standard is a huge mess. Much of the functionality requires the use of an XMLList as an intermediate carrier, which is the major contributor of the performance differences. All that can be done without a huge rewrite is to optimize XMLList.
Patch #1 was pushed as 2117:7da91870a749
Status: RESOLVED → VERIFIED
Tests pushed.
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: