Closed Bug 360964 Opened 18 years ago Closed 18 years ago

Native array push is really slow

Categories

(Rhino Graveyard :: Core, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: james, Unassigned)

Details

Attachments

(1 file, 4 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20060601 Firefox/2.0 (Ubuntu-edgy) Build Identifier: Rhino 1.6 R4 The code below shows a hundred fold speed difference between the push function and setting array elements directly. Furthermore, when running push some calls do execute quickly and some don't. This looks like some kind of simple optimisation problem in the push function, I've had a look but can't spot quite what it is.... Reproducible: Always Steps to Reproduce: var iMax = 100, jMax = 1000; var sys = java.lang.System; var t0 = sys.currentTimeMillis (); var x = []; for (var i = 0; i < iMax; i++) { for (var j = 0; j < jMax; j++) { x.push ({}); } sys.out.print ('.'); } var t1 = java.lang.System.currentTimeMillis (); java.lang.System.out.println (t1 - t0); var x = [], n = 0; for (var i = 0; i < iMax; i++) { for (var j = 0; j < jMax; j++) { x [n++] = {}; } sys.out.print ('.'); } var t2 = java.lang.System.currentTimeMillis (); java.lang.System.out.println (t2 - t1); Actual Results: The first line of dots are slow and appear erraticaly. The second line appear very quickly. ....................................................................................................36170.0 ....................................................................................................352.0 Expected Results: Both lines should appear quickly and show short (< 1 second) times.
The problem here seems to be the lookup of the Array.push function property. The slot allocation mechanism in ScriptableObject scales really badly for dense arrays, because it completely fills the slot array without leaving any gaps. Now if a function property or other non-index property is looked up, ScriptableObject.getSlotPosition() searches from the property's hashcode value until it finds either the slot in question or a null slot. If the first n thousand slots are occupied by array elements, this causes a lot of useless looping. I created an experimental patch that warps the hash/slot index to leave a free slot every 500 elements. With this, push() is roughly half as fast as direct element access for very large arrays. Maybe the right solution would be to always store numeric Array properties in the dense array in NativeArray (how common are sparse arrays, really?), or separate the array elements from other properties by some other means.
Is there a good reason not to just use a Java HashMap for this?
I don't know for sure, but I guess the reasons were performance considerations and keeping order for numeric array elements. I think the easiest way to fix this would be to use chaining instead of linear probing (see <http://en.wikipedia.org/wiki/Hash_table#Open_addressing_versus_chaining>). I'll see if I can come up with a patch.
"I don't know for sure, but I guess the reasons were performance considerations and keeping order for numeric array elements." FWIW, this doesn't prevent numeric array elements getting out of order... eg:var a = {}; a [0] = 0; a [2] = 2; a.b = 'b'; a.n = 'n'; a [1] = 1; for (var i in a) java.lang.System.out.println (''+i'); 0 n 2 1 b
Here's a patch that implements the ScriptableObject slot table using chaining rather than linear probing for hash collisions. I've been using it for a few days locally and also run the test suite on it without any noticable problems. From my testing, performance is quite a bit better for large objects/arrays, and pretty much unchanged (maybe a hint slower, but almost unmeasurable) for small ones.
Contains a few fixes and improvements over the previous patch: * Actually check integer index in slot lookup (!!!) This caused numeric slots to get lost in sparse arrays. * Make Slot.next transient. This should preserve serialization compatibility, and should be ok since the slot table is newly built upon deserialization (haven't actually tested). * Unify the two local getSlot() methods, no real need for the static one. * Only use REMOVED object for lastAccess cache, we never actually use it in the slot table. * Various fixes in replacing setterSlot: Check for slot identity, set wasDeleted on old slot, break the loop when done.
Attachment #246156 - Attachment is obsolete: true
Attachment #246454 - Attachment is obsolete: true
Hannes, as you seem to be familiar with ScriptableObject's hashing code, I'd like to ask whether you see any reason why couldn't we just simply use a HashMap in the ScriptableObject implementation? In fact, we could go one step further and introduce a sort of a "MapFactory" somewhere (probably pluggable into the Context object), so people could plug in whatever map implementations they want, i.e. - the Trove <http://trove4j.sourceforge.net/> THashMap, to restore open-addressing implementation if they're unsatisfied with HashMap's strategy - TreeMap if they'd want ordering guarantees in "for" - LinkedHashMap if they'd want weaker ordering guarantees
(In reply to comment #8) > as you seem to be familiar with ScriptableObject's hashing code, I'd like to > ask whether you see any reason why couldn't we just simply use a HashMap in the > ScriptableObject implementation? That would normally seem like the way to go. But from a gut level, I'd hesitate to make such a big change at a spot that is so central to Rhino. For example, this might affect those people using the small-footprint version of Rhino in an environment where memory or cpu performance are scarse. So the goal of my patch was to fix the problem but change as little as possible otherwise. > In fact, we could go one step further and introduce a sort of a "MapFactory" > somewhere (probably pluggable into the Context object), so people could plug in > whatever map implementations they want That would definitely be interesting to experiment with, and to compare it with the current implementation. I don't think it would be useful for me to have as a permanent feature, but maybe it would be for others.
(In reply to comment #9) > (In reply to comment #8) > > as you seem to be familiar with ScriptableObject's hashing code, I'd like to > > ask whether you see any reason why couldn't we just simply use a HashMap in the > > ScriptableObject implementation? > > That would normally seem like the way to go. But from a gut level, I'd hesitate > to make such a big change at a spot that is so central to Rhino. For example, > this might affect those people using the small-footprint version of Rhino in an > environment where memory or cpu performance are scarse. So the goal of my patch > was to fix the problem but change as little as possible otherwise. > > In fact, we could go one step further and introduce a sort of a "MapFactory" > > That would definitely be interesting to experiment with, and to compare it with > the current implementation. I don't think it would be useful for me to have as > a permanent feature, but maybe it would be for others. To satisfy both of these efficiently you could create a factory for the default Scriptable implementation. Then you can have a hand-crafted version if you want, or one which uses java libraries and things like map factories internally. I'd definitely like to see this kind of flexibility in Rhino... James
The patch for bug 224334 that has been committed breaks my patch. Since the problem with linear probing in ScriptableObject.java persists, I'll work on a new, updated patch when I have time.
This is a rewrite of my previous patch adapted to recent changes in ScriptableObject. Running the test suite over it with 5 second timeout, I got the same results as with CVS HEAD, with performance slightly improved.
Attachment #248648 - Attachment is obsolete: true
David: adding you to CC list, maybe you are interested in looking into this.
I get a compile error building with the 2007-02-06 07:13 PDT patch: [javac] Compiling 112 source files to /home/norris/rhino/mozilla/js/rhino/build/classes [javac] /home/norris/rhino/mozilla/js/rhino/src/org/mozilla/javascript/ScriptableObject.java:2293: cannot find symbol [javac] symbol : variable tableSize [javac] location: class org.mozilla.javascript.ScriptableObject [javac] int insertPos = getSlotIndex(tableSize, slot.indexOrHash); [javac] ^ [javac] 1 error
The definition of int tableSize was removed in revision 1.120 of ScriptableObject. This new patch puts it in again. From a quick test I did, everything looks ok.
Attachment #254164 - Attachment is obsolete: true
Patch committed on both 1.6R6 branch and trunk. The idea of switching to HashMap is good, but that's more of a task for next release rather than this release.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Created Bug 383592 for the future enhancement of using HashMap in ScriptableObject.
Target Milestone: --- → 1.6R6
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: