Closed Bug 417131 Opened 16 years ago Closed 16 years ago

JS Enumeration Allocation Consternation

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: sayrer, Assigned: igor)

References

Details

(Keywords: perf, testcase, verified1.9.1)

Attachments

(3 files, 25 obsolete files)

keep a stack of leftover iterator structs to be reused as we go; free in gc 16 years ago Brian Crowder 6.53 KB, patch		Details \| Diff \| Splinter Review
less wasteful 16 years ago Brian Crowder 10.00 KB, patch	brendan : review-	Details \| Diff \| Splinter Review
WIP, good for testing 16 years ago Brendan Eich [:brendan] 48.04 KB, patch		Details \| Diff \| Splinter Review
change JSON iterator call 16 years ago Robert Sayre 953 bytes, patch		Details \| Diff \| Splinter Review
lightly tested, handles cross-over from fastest to faster optimization 16 years ago Brendan Eich [:brendan] 82.26 KB, patch		Details \| Diff \| Splinter Review
passes js tests in the shell 16 years ago Brendan Eich [:brendan] 90.96 KB, patch		Details \| Diff \| Splinter Review
brute force alterna-patch 16 years ago Brendan Eich [:brendan] 27.09 KB, patch		Details \| Diff \| Splinter Review
hg qdiff against patch for bug 443039 16 years ago Brendan Eich [:brendan] 14.07 KB, patch		Details \| Diff \| Splinter Review
with ifdef'ed instrumentation 16 years ago Brendan Eich [:brendan] 14.77 KB, patch		Details \| Diff \| Splinter Review
just cache and fusing state and ids 16 years ago Igor Bukanov 13.39 KB, patch		Details \| Diff \| Splinter Review
strees test for the cache 16 years ago Igor Bukanov 1.38 KB, text/plain		Details
exclusive cache and the global list 16 years ago Igor Bukanov 16.52 KB, patch		Details \| Diff \| Splinter Review
optimizing empty enumerator 16 years ago Igor Bukanov 18.52 KB, patch		Details \| Diff \| Splinter Review
optimizing empty enumerator v2 16 years ago Igor Bukanov 19.52 KB, patch		Details \| Diff \| Splinter Review
optimizing empty enumerator v3 16 years ago Igor Bukanov 20.33 KB, patch		Details \| Diff \| Splinter Review
optimizing empty enumerator v4 16 years ago Igor Bukanov 20.15 KB, patch		Details \| Diff \| Splinter Review
sunspider results, thread-safe jsshell, GCC 4.3, Fedora 9, Petium-M 1.1GHz 16 years ago Igor Bukanov 3.58 KB, text/plain		Details
sunspider results, thread-safe jsshell, 64-bit Linux (sm-valgrind01) 16 years ago Igor Bukanov 3.55 KB, text/plain		Details
v5 16 years ago Igor Bukanov 17.20 KB, patch		Details \| Diff \| Splinter Review
native enumerator as GC thing 16 years ago Igor Bukanov 30.91 KB, patch		Details \| Diff \| Splinter Review
native enumerator as GC thing v1b 16 years ago Igor Bukanov 30.76 KB, patch		Details \| Diff \| Splinter Review
native enumerator as GC thing v1c 16 years ago Igor Bukanov 20.22 KB, patch		Details \| Diff \| Splinter Review
native enumerator as GC thing v2 16 years ago Igor Bukanov 19.45 KB, patch		Details \| Diff \| Splinter Review
native enumerator as GC thing v3 16 years ago Igor Bukanov 19.59 KB, patch		Details \| Diff \| Splinter Review
v2-v3 delta 16 years ago Igor Bukanov 6.77 KB, patch	brendan : review+	Details \| Diff \| Splinter Review
v4 16 years ago Igor Bukanov 19.65 KB, patch	igor : review+	Details \| Diff \| Splinter Review
v3-v4 delta 16 years ago Igor Bukanov 6.75 KB, patch		Details \| Diff \| Splinter Review
v5 16 years ago Igor Bukanov 19.66 KB, patch	igor : review+	Details \| Diff \| Splinter Review

Robert Sayre

Reporter

Description

•

16 years ago

string-fasta debian shootout test shows that enumeration is slow because we malloc to make enumerators used in for-in loops. It would be nice to get these on the stack for native stuff.

Brian Crowder

Updated

•

16 years ago

Assignee: general → crowder

Robert Sayre

Reporter

Comment 1

•

16 years ago

Shark:
http://people.mozilla.com/~sayrer/2008/02/string-fasta.zip

Brian Crowder

Comment 2

•

16 years ago

Attached patch keep a stack of leftover iterator structs to be reused as we go; free in gc (obsolete) — Details — Splinter Review

Just finishing up testing on this now.

Brian Crowder

Comment 3

•

16 years ago

Comment on attachment 303302 [details] [diff] [review]
keep a stack of leftover iterator structs to be reused as we go; free in gc

Victory in the JS test-suite, I think this is a decent fix with a light touch.

Attachment #303302 - Flags: review?(brendan)

Ryan VanderMeulen [:RyanVM]

Updated

•

16 years ago

Version: unspecified → Trunk

Brian Crowder

Comment 4

•

16 years ago

This seems to yields a small, but measurable perf-bump across the board.

Flags: blocking1.9?

Priority: -- → P2

Target Milestone: --- → mozilla1.9beta4

Brian Crowder

Comment 5

•

16 years ago

Comment on attachment 303302 [details] [diff] [review]
keep a stack of leftover iterator structs to be reused as we go; free in gc

Duhrr, this doesn't need the extra next pointer, I'll just overload the original one.  Patch next.

Attachment #303302 - Attachment is obsolete: true

Attachment #303302 - Flags: review?(brendan)

Brian Crowder

Comment 6

•

16 years ago

Attached patch less wasteful (obsolete) — Details — Splinter Review

Attachment #303595 - Flags: review?(brendan)

Brian Crowder

Comment 7

•

16 years ago

Actually, ditching my blocking request here.  Might be worth a small speed-up, but I don't think worth blocking on.

Flags: blocking1.9?

Brendan Eich [:brendan]

Comment 8

•

16 years ago

Comment on attachment 303595 [details] [diff] [review]
less wasteful

The jsarena.[ch] stuff is unrelated to this bug. 

This doesn't avoid allocating the JSIdArray, however -- and it can hoard memory bounded by nesting depth of for-in constructs. If we want to use only interpreter stack space, we should look for a way to bypass the id-array.

/be

Attachment #303595 - Flags: review?(brendan) → review-

Brendan Eich [:brendan]

Comment 9

•

16 years ago

Attached patch WIP, good for testing (obsolete) — Details — Splinter Review

More to do...

/be

Assignee: crowder → brendan

Attachment #303595 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Robert Sayre

Reporter

Comment 10

•

16 years ago

Comment on attachment 303836 [details] [diff] [review]
WIP, good for testing

This is failing JSON tests. I think I'm using it right.

*** CHECK FAILED: {"child has two members":{"this":"one"}} == {"child has two members":{"this":"one","2":"and this one"}}
JS frame :: xpcshell-simple/head.js :: do_throw :: line 99
JS frame :: xpcshell-simple/head.js :: do_check_eq :: line 114
JS frame :: xpcshell-simple/json_test/unit/test_encode.js :: testStringEncode :: line 118
JS frame :: xpcshell-simple/json_test/unit/test_encode.js :: run_test :: line 232
JS frame :: xpcshell-simple/tail.js :: _execute_test :: line 43
JS frame :: xpcshell-simple/execute_test.js :: <TOP_LEVEL> :: line 38

Robert Sayre

Reporter

Comment 11

•

16 years ago

Attached patch change JSON iterator call (obsolete) — Details — Splinter Review

Brendan Eich [:brendan]

Comment 12

•

16 years ago

sayrer: thanks, I will supply a full patch for the next WIP, which should work -- may need your help testing, but won't break overtly.

/be

Brendan Eich [:brendan]

Updated

•

16 years ago

OS: Mac OS X → All

Hardware: PC → All

Target Milestone: mozilla1.9beta4 → mozilla1.9

Brendan Eich [:brendan]

Updated

•

16 years ago

Priority: P2 → P1

Target Milestone: mozilla1.9 → mozilla1.9beta5

Brendan Eich [:brendan]

Comment 13

•

16 years ago

Attached patch lightly tested, handles cross-over from fastest to faster optimization (obsolete) — Details — Splinter Review

This includes patched nsJSON.{cpp,h}, so patch -p0 < fasta.patch from the mozilla directory. Testing help welcome, more tests coming.

/be

Attachment #303836 - Attachment is obsolete: true

Attachment #304269 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 14

•

16 years ago

Attached patch passes js tests in the shell — Details — Splinter Review

Some important fixes here. Comments and tests coming soon.

/be

Attachment #307865 - Attachment is obsolete: true

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 15

•

16 years ago

(In reply to comment #14)
> Some important fixes here. Comments and tests coming soon.

Suspense level: [XXXXXXXXXXXXX  ]

Please save us!

Brendan Eich [:brendan]

Comment 16

•

16 years ago

Sorry, optimizing is easy but recovering correctness in the case of deletion of not-yet-enumerated properties after a loop starts, and addition of same, is not. This is for 3.next.

/be

Target Milestone: mozilla1.9beta5 → ---

Damon Sicore (:damons)

Comment 17

•

16 years ago

Getting this on the 1.9.1 triage queue.

Flags: blocking1.9.1?

Robert Sayre

Reporter

Updated

•

16 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Brendan Eich [:brendan]

Comment 18

•

16 years ago

Attached patch brute force alterna-patch (obsolete) — Details — Splinter Review

** TOTAL **:           1.039x as fast    2904.9ms +/- 0.3%   2795.8ms +/- 0.1%     significant

Shape-based JSNativeEnumerator (formerly JSNativeIteratorState) caching in the runtime. Comments welcome.

The first patch (attachment 307967 [details] [diff] [review]) tries to optimize the short-ish property map case by avoiding allocating anything but another stack slot. The downside is that it has to cope with the reverse-linked property tree path, and to compensate for adds and deletes (js_EnumeratingBarrier -- not yet written and debugged).

This patch has the virtue of being simple, and saving non-trivial micro-benchmark runtime. It could be considered contrived, though. How often does real-world code re-enumerate the same shape?

/be

Attachment #326858 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 19

•

16 years ago

(In reply to comment #18)

> This patch has the virtue of being simple, and saving non-trivial
> micro-benchmark runtime. It could be considered contrived, though. How often
> does real-world code re-enumerate the same shape?

I'll instrument the hit rate and report it from the GC, so we can see how it trends over real browsing sessions using gmail, etc.

/be

Brendan Eich [:brendan]

Updated

•

16 years ago

Depends on: 443039

Brendan Eich [:brendan]

Comment 20

•

16 years ago

Attached patch hg qdiff against patch for bug 443039 (obsolete) — Details — Splinter Review

Still need to instrument hit rate.

/be

Attachment #327692 - Flags: review?(igor)

Brendan Eich [:brendan]

Updated

•

16 years ago

Attachment #326858 - Attachment is obsolete: true

Attachment #326858 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 21

•

16 years ago

After loading gmail at a message, going back to Inbox, with six other tabs all loading bugzilla pages:

nativeEnumCache hit rate 79.8891
nativeEnumCache hit rate 79.9094
nativeEnumCache hit rate 80.0898
nativeEnumCache hit rate 80.1295
--WEBSHELL 0x25eb42e0 == 18
--WEBSHELL 0x25ea44b0 == 17

After SunSpider in-browser run:

nativeEnumCache hit rate 81.6874

Patch with instrumentation next.

/be

Brendan Eich [:brendan]

Comment 22

•

16 years ago

Attached patch with ifdef'ed instrumentation (obsolete) — Details — Splinter Review

hg qdiff again -- this is pushed on top of the JSOP_ITER patch. I can attach the `hg diff -r qparent -U 8 -p` result but it will show both patches of course.

I let the hit rate accumulate -- resetting probe and miss counters on each gc seems too forgetful, often gives 100% hit rate (nan% on shutdown ;-). Suggestions welcome, but this seems winning. Latest final rate on shutdown after SunSpider and gmail:

nativeEnumCache hit rate 98.4947%

/be

Attachment #327692 - Attachment is obsolete: true

Attachment #327711 - Flags: review?(igor)

Attachment #327692 - Flags: review?(igor)

Igor Bukanov

Assignee

Comment 23

•

16 years ago

The pat(In reply to comment #22)
> Created an attachment (id=327711) [details]
> with ifdef'ed instrumentation

The patch in JSENUMERATE_INIT removes JSNativeEnumerator from the cache and then cache it back in JSENUMERATE_DESTROY which requires to take the lock again. AFAICS that can be avoided if JSENUMERATE_INIT would keep the state in the cache and not put it to doubly-linked list. The state is pushed to the doubly-linked list only if it is evicted from the cache while the enumerator is still running.

Then JSENUMERATE_DESTROY can check that the state is not on the list and simply set next_index to some sentinel value indicating that the state struct can be reused. A possible race with another thread that evicts the state from the cache and puts it to the list is harmless. It would just mean that the deallocation of the struct happens during the GC when it finds a struct with a sentinel value of next_index on the list.

Igor Bukanov

Assignee

Comment 24

•

16 years ago

Another thing that can be improved is to allocate the id array together with the state object improving cache locality.

Brendan Eich [:brendan]

Comment 25

•

16 years ago

(In reply to comment #23)
> The patch in JSENUMERATE_INIT removes JSNativeEnumerator from the cache and
> then cache it back in JSENUMERATE_DESTROY which requires to take the lock
> again.

The JSENUMERATE_DESTROY case already takes the look to remove the state from the linked list, I didn't add extra lock-taking. The critical path is only slightly longer.

> AFAICS that can be avoided if JSENUMERATE_INIT would keep the state in
> the cache and not put it to doubly-linked list. The state is pushed to the
> doubly-linked list only if it is evicted from the cache while the enumerator is
> still running.

That is an interesting idea. It would require marking from the cache, though, instead of purging the cache as we do now. How then would the cache entries be collected under memory pressure?

> Then JSENUMERATE_DESTROY can check that the state is not on the list and simply
> set next_index to some sentinel value indicating that the state struct can be
> reused. A possible race with another thread that evicts the state from the
> cache and puts it to the list is harmless. It would just mean that the
> deallocation of the struct happens during the GC when it finds a struct with a
> sentinel value of next_index on the list.

There are several other things to try, as you note:

* Fuse allocations more.

* Avoid an object (use a PRIVATE jsval as the iterator, which untags to the state pointer by itself -- this requires another stack slot for the current object being enumerated but I had a patch that supplied jsval *vp instead of JSObject *obj to js_CallIteratorNext, etc. -- wasn't hard);

* Try sharing cache hits as you propose here.

But I'm hoping we can do these in follow up bugs. I'm happy to give this bug and patch away too, if you are interested. Or r+ it and I'll file the followup bugs, which are up for grabs. Thoughts?

/be

Igor Bukanov

Assignee

Comment 26

•

16 years ago

(In reply to comment #25)
> > AFAICS that can be avoided if JSENUMERATE_INIT would keep the state in
> > the cache and not put it to doubly-linked list. The state is pushed to the
> > doubly-linked list only if it is evicted from the cache while the enumerator is
> > still running.
> 
> That is an interesting idea. It would require marking from the cache, though,
> instead of purging the cache as we do now. How then would the cache entries be
> collected under memory pressure?

The entry can be collected as long as the enumerator is done as indicated by a sentinel value in next_index. Otherwise the entry must be traced. In general the idea is to view the cache as an extension to the current doubly-linked list. First the enumerator goes to the cache and then to the list after eviction from the cache.

> 
> But I'm hoping we can do these in follow up bugs.

Well, fusing the state and id array would make the changes from the current patch about JS_DestroyIdArray -> js_DestroyIdArray renames unnecessary (and may even require to undo them to minimize code bloat). So it is better to do it here within this bug. 

On the other hand that lock-free JSENUMERATE_DESTROY can be built upon the patch in separated bugs.

Brendan Eich [:brendan]

Comment 27

•

16 years ago

To hide the empty id array singleton I would be willing to swallow the js_ vs. JS_DestroyIdArray layering -- we do that for similar reasons elsewhere. Hope this is not controversial.

I'll invite whoever is up to it to steal this bug, but do a fusing pass when I get time (may be next week) and keep this fresh in my mq.

/be

Igor Bukanov

Assignee

Comment 28

•

16 years ago

Attached patch just cache and fusing state and ids (obsolete) — Details — Splinter Review

The patch is a result of late night of hacking to check if fusing state and ids would minimize the patch and code. AFAICS this is indeed the case.

I have not done any testing with the patch.

Brendan Eich [:brendan]

Comment 29

•

16 years ago

The empty_id_array singleton was a win in past testing, but I forget whether with SunSpider or my usual gmail+bugzilla session. Worth testing.

Steal this bug, please! Or we can ping-pong patches, but want to converge.

/be

Igor Bukanov

Assignee

Comment 30

•

16 years ago

(In reply to comment #29)
> 
> Steal this bug, please! Or we can ping-pong patches, but want to converge.

OK, I will test the fused patch tomorrow and ask for a review. All those lock-less optimizations can go to a separated bug.

Assignee: brendan → igor

Status: ASSIGNED → NEW

Igor Bukanov

Assignee

Comment 31

•

16 years ago

Attached file strees test for the cache — Details

The test should provide a good coverage for the cache code including accessing the cache from different threads.

Igor Bukanov

Assignee

Comment 32

•

16 years ago

Attached patch exclusive cache and the global list (obsolete) — Details — Splinter Review

The new version puts the enumerator states to the global list only when it is evicted from the cache and implements that idea of lock-free JSENUMERATION_DESTROY. For the states in the cache it also takes the GC lock only once.

The patch passes shell and mochi tests.

Igor Bukanov

Assignee

Updated

•

16 years ago

Attachment #327997 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 33

•

16 years ago

(In reply to comment #29)
> The empty_id_array singleton was a win in past testing, but I forget whether
> with SunSpider or my usual gmail+bugzilla session. Worth testing.

I should have remembered: empty_id_array is an obvious win for all for-in loops that use enumeration, where the standard prototype(s) -- Object.prototype at least, Array.prototype before it for arrays -- have not been extended with enumerable ad-hoc properties.

/be

Igor Bukanov

Assignee

Comment 34

•

16 years ago

(In reply to comment #33)
> I should have remembered: empty_id_array is an obvious win for all for-in loops
> that use enumeration, where the standard prototype(s) -- Object.prototype at
> least, Array.prototype before it for arrays -- have not been extended with
> enumerable ad-hoc properties.

Year, but with fused state/id array this is no longer an issue: no extra memory will be allocated for id-less enumerations.

Brendan Eich [:brendan]

Comment 35

•

16 years ago

Can you eliminate the state allocation altogether for objects with zero enumerable properties? More stack encoding of native-empty state pointer, with another slot for current obj, and friend API changes to pass jsval *vp -- this means nsJSON.cpp changes too, I will look for the patch I hacked a few months ago that did this -- let me know.

/be

Igor Bukanov

Assignee

Comment 36

•

16 years ago

(In reply to comment #35)
> Can you eliminate the state allocation altogether for objects with zero
> enumerable properties? 

That would not work since Object.prototype and Array.prototype would have different shape values for the scope. So a different empty state would be required for each of them. Assuming good hashing, this is exactly what happens with the cache. 

A shared state would be possible only if state->shape is moved outside the state into the hashtable itself. But this would increase the size of the cache by the factor of 2 and probably hurt CPU caching.

Igor Bukanov

Assignee

Comment 37

•

16 years ago

(In reply to comment #36)
> (In reply to comment #35)
> > Can you eliminate the state allocation altogether for objects with zero
> > enumerable properties? 
> 
> That would not work since Object.prototype and Array.prototype would have
> different shape values for the scope. So a different empty state would be
> required for each of them. Assuming good hashing, this is exactly what happens
> with the cache. 

Still the empty state can be made better: one do not need to to put it to the doubly linked list. In fact prevp/next/next_index can be eliminated for it completely. So the cost for each empty scope would be a hash pointer to 2-word struct.

Brendan Eich [:brendan]

Comment 38

•

16 years ago

(In reply to comment #36)
> (In reply to comment #35)
> > Can you eliminate the state allocation altogether for objects with zero
> > enumerable properties? 
> 
> That would not work since Object.prototype and Array.prototype would have
> different shape values for the scope.

I was proposing a more radical change: push a pseudo-boolean meaning "empty state" instead of any state pointer. This builds on the idea of avoiding making an object to wrap the state for the native enumerator case.

/be

Brendan Eich [:brendan]

Comment 39

•

16 years ago

No need for pseudo-boolean novelty, just use JSVAL_VOID:

vp[0] - current object along prototype chain
vp[1] - JSVAL_VOID if zero enumerable properties,
        else PRIVATE_TO_JSVAL(state),
        else OBJECT_TO_JSVAL(iterobj).

This works with something like the current js_ValueToIterator signature. With more changes, we could simply stop enumerating on zero enumerable properties if at the last object in the proto chain, or skip that object if not at last.

/be

Brendan Eich [:brendan]

Comment 40

•

16 years ago

Ok, so the patch "passes js tests in the shell" (attachment 307967 [details] [diff] [review]) does have the js_CallIteratorNext(cx, vp, rval) signature change I've been talking up.

/be

Igor Bukanov

Assignee

Comment 41

•

16 years ago

Attached patch optimizing empty enumerator (obsolete) — Details — Splinter Review

The new version returns JSVAL_NULL as a state value for empty enumerations. In addition for such enumeration the cache entry points to a heap-allocated uint32 value holding the shape. It avoids the need to allocate the full JSNativeEnumerator struct. To distinguish between normal and empty cache entries the pointer for normal entry is tagged with 1 in the lowest bit. 

I checked the patch just against the stress test.

Attachment #327997 - Attachment is obsolete: true

Attachment #327997 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 42

•

16 years ago

(In reply to comment #40)
> Ok, so the patch "passes js tests in the shell" (attachment 307967 [details] [diff] [review]) does have
> the js_CallIteratorNext(cx, vp, rval) signature change I've been talking up.
> 
> /be
> 

I filed the bug 443599 to implement this iterator object elimination as this is orthogonal to optimizing JSNativeEnumerator allocation.

Igor Bukanov

Assignee

Comment 43

•

16 years ago

Attached patch optimizing empty enumerator v2 (obsolete) — Details — Splinter Review

The new patch fixes various issues and passes the tests. But before asking for a review I want to check if caching shapes of just objects with no properties to enumerate would alone provide most of the benchmark speedups. 

The observation is that allocation of the state object and looping through its properties to collect ids may not be that expensive if the script later accesses the values of such properties as typically happens with for-in loops.

Attachment #328064 - Attachment is obsolete: true

Igor Bukanov

Assignee

Updated

•

16 years ago

Attachment #327898 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 44

•

16 years ago

Attached patch optimizing empty enumerator v3 (obsolete) — Details — Splinter Review

The new version does not push to the cache enumerators with more then 10 entries to release their state arrays ASAP. The number 10 comes from benchmarks.

Attachment #328126 - Attachment is obsolete: true

Attachment #328181 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 45

•

16 years ago

(In reply to comment #42)
> I filed the bug 443599 to implement this iterator object elimination as this is
> orthogonal to optimizing JSNativeEnumerator allocation.

Agreed, I wanted to get the idea out -- great to have that filed separately. Reviewing next.

/be

Brendan Eich [:brendan]

Comment 46

•

16 years ago

Comment on attachment 328181 [details] [diff] [review]
optimizing empty enumerator v3

What kind of SunSpider speedup do you get, and what's the string-fasta individual-test speedup?

>+ * The structure is not allocated when the number of enumeratable properties

Typo/misspelling: s/enumeratable/enumerable/

>+ * in the scope is zero. js_Enumerate represents such enumerators with
>+ * JSVAL_ZERO. To avoid repeated scanning in JSENUMERATE_INIT of scopes for
>+ * Object.prototype and similar objects with no properties to enumerate
>+ * JSRuntime.nativeEnumCache is still filled with a pointer to a heap-
>+ * allocated uint32 containing the shape of such scope. To distinguish cached
>+ * JSNativeEnumerator* entries from such uint32* pointers the former is tagged
>+ * with 1 in the lowest bit.

No need for heap-allocated word -- shapes fit in 24 bits. See jsinterp.h:

#define SHAPE_OVERFLOW_BIT      JS_BIT(32 - PCVCAP_TAGBITS)

/be

Igor Bukanov

Assignee

Comment 47

•

16 years ago

Attached patch optimizing empty enumerator v4 (obsolete) — Details — Splinter Review

The new version stores the shapes of scope without properties to enumerate directly in the cache.

Attachment #328181 - Attachment is obsolete: true

Attachment #328219 - Flags: review?(brendan)

Attachment #328181 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 48

•

16 years ago

Attached file sunspider results, thread-safe jsshell, GCC 4.3, Fedora 9, Petium-M 1.1GHz (obsolete) — Details

The results contain too much noise to draw any conclusion.

Igor Bukanov

Assignee

Comment 49

•

16 years ago

Attached file sunspider results, thread-safe jsshell, 64-bit Linux (sm-valgrind01) (obsolete) — Details

Here string-fasta shows 7.9% improvment but I guess the penalty from the increased code size offsets that improvement.

Igor Bukanov

Assignee

Updated

•

16 years ago

Blocks: 443746

Igor Bukanov

Assignee

Updated

•

16 years ago

No longer blocks: 443746

Depends on: 443746

Igor Bukanov

Assignee

Comment 50

•

16 years ago

Comment on attachment 328219 [details] [diff] [review]
optimizing empty enumerator v4

I will update the patch after considering bug 443746.

Attachment #328219 - Attachment is obsolete: true

Attachment #328219 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 51

•

16 years ago

(In reply to comment #48)
> Created an attachment (id=328220) [details]
> sunspider results, thread-safe jsshell, GCC 4.3, Fedora 9, Petium-M 1.1GHz
> 
> The results contain too much noise to draw any conclusion.

With the patch tested here, could you reproduce my results with thread-unsafe js shell from comment 18?

/be

Igor Bukanov

Assignee

Comment 52

•

16 years ago

(In reply to comment #51)
> (In reply to comment #48)
> > Created an attachment (id=328220) [details] [details]
> > sunspider results, thread-safe jsshell, GCC 4.3, Fedora 9, Petium-M 1.1GHz
> > 
> > The results contain too much noise to draw any conclusion.
> 
> With the patch tested here, could you reproduce my results with thread-unsafe
> js shell from comment 18?

That patch does not apply to the trunk. I guess this is due to changes that went into bug 443039. Also note that string-fasta is not realistic benchmark for for-in loops. It contains the following code:

      for (var c in table) {
        if (r < table[c]) {
          line[i] = c;
          break;
        }
      }

Here the nature of algorithm is such that the loop almost always exits via the break. So Object.prototype is almost never touched and any explicit optimizations for zero-length enumerators would only slow things down.

Moreover, the loop benefits from the id caching substantially as it breaks often rather early. So id array caching avoids repeated scanning of properties in JSENUMERATE_INIT that would not be enumerated during the loop.

I would claim that such for-in loops are not typical. for-in loop in string-tagcloud is more representative as it does not break. But there, due to the complex loop body, improvements from faster enumeration setup are not observable.

Igor Bukanov

Assignee

Comment 53

•

16 years ago

(In reply to comment #51)
> With the patch tested here, could you reproduce my results with thread-unsafe
> js shell from comment 18?

One more thing: on sm-valgrind01 string-fasta is 17% faster with thread-unsafe build than with thread-safe one. In fact the benchmark shows the biggest improvement from turning JS_THREADSAFE off. Given that the objects under the benchmark contains 15 and 4 properties (some of which are not enumerated due to the early break) the cost of extra locking in JSENUMERATE_INIT/DESTROY must be contributing to that substantially.

Brendan Eich [:brendan]

Comment 54

•

16 years ago

SunSpider benchmarks (some from the Great Language Shootout, others not) are to be sure far from representative of anything but themselves.

The locking cost is non-trivial, and we could use a JSThinLock instead of the GC lock. It's hard to use compare-and-swap since several fields are being updated at once, but there may be a way.

/be

Brendan Eich [:brendan]

Comment 55

•

16 years ago

(In reply to comment #54)
> The locking cost is non-trivial, and we could use a JSThinLock instead of the
> GC lock. It's hard to use compare-and-swap since several fields are being
> updated at once, but there may be a way.

I meant "hard to use compare-and-swap directly" of course, since thin locks are built on CAS.

/be

Igor Bukanov

Assignee

Comment 56

•

16 years ago

Attached patch v5 (obsolete) — Details — Splinter Review

Here is a patch update to reflect changes from the bug 443746. This version avoids taking the GC lock when checking the cache for scopes with no enumerable properties. It does not benefit string-fasta but it avoids the lock when enumerating Object.prototype and friends.

The patch passes the tests but before asking for a review I would like to try an idea of freeing JSNativeEnumerator only from the GC. This way the code can read JSNativeEnumerator.shape without taking the lock, it should be possible to update JSNativeEnumerator.cursor from -1 to length using compare-and-swap and only singly-linked list for the GC safety would be necessary.

Igor Bukanov

Assignee

Comment 57

•

16 years ago

Attached patch native enumerator as GC thing (obsolete) — Details — Splinter Review

The new patch implements that idea of deallocating JSNativeEnumerator only from the GC. It allowed to implement mostly-lock-free cache using js_CompareAndSwap to check for running enumerators that are in the cache. The lock is only taken when adding the newly created enumerator to the global list for the GC and updating the cache with the new value. When the enumerator is empty, lock is never taken.

Since only the GC frees the enumerators, it is necessary to update gcBytes counters to account for increased memory pressure. Hence the reason for js_AddAsGCBytes/js_RemoveAsGCBytes functions.

The patch also adds the missing js_CompareAndSwap implementation on 64-bit Linux using compiler's atomic built-ins. 

With the patch on sm-valgrind01 (64 bit Linux) string-fasta is 11% faster with thread-safe built and 6% faster with thread-unsafe built.

Attachment #328273 - Attachment is obsolete: true

Attachment #328371 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 58

•

16 years ago

Attached patch native enumerator as GC thing v1b (obsolete) — Details — Splinter Review

The new version of the patch fixes stalled comments.

Attachment #328220 - Attachment is obsolete: true

Attachment #328224 - Attachment is obsolete: true

Attachment #328371 - Attachment is obsolete: true

Attachment #328374 - Flags: review?(brendan)

Attachment #328371 - Flags: review?(brendan)

Brendan Eich [:brendan]

Updated

•

16 years ago

Attachment #327711 - Attachment is obsolete: true

Attachment #327711 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 59

•

16 years ago

Sorry for reviewing out of order -- possible to get an updated patch after the NativeCompareAndSwap stuff lands?

/be

Igor Bukanov

Assignee

Comment 60

•

16 years ago

Attached patch native enumerator as GC thing v1c (obsolete) — Details — Splinter Review

Here is an updated patch to sync with trunk changes.

Attachment #328374 - Attachment is obsolete: true

Attachment #328374 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 61

•

16 years ago

Attached patch native enumerator as GC thing v2 (obsolete) — Details — Splinter Review

The new patch just syncs the previous version with the trunk. The patch was ready for a review but I forgot to ask for one.

Attachment #330808 - Attachment is obsolete: true

Attachment #332987 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 62

•

16 years ago

Comment on attachment 332987 [details] [diff] [review]
native enumerator as GC thing v2

>  * Increase runtme->gcBytes by sz bytes to account for an allocation outside

"runtime" misspelled

> #error "NSPR_LOCK should be on when the platform lacks native comprae-and-swap."

"compare" misspelled

Fold JS_ATOMIC_INCREMENT calls into ENUM_CACHE_METER macro

Where is newenumerator set to true?

/be

Igor Bukanov

Assignee

Comment 63

•

16 years ago

Attached patch native enumerator as GC thing v3 (obsolete) — Details — Splinter Review

The new version of the patch fixes the nits and that missing newenumerator = true embarrassing bug.

Attachment #333873 - Flags: review?(brendan)

Igor Bukanov

Assignee

Updated

•

16 years ago

Attachment #332987 - Attachment is obsolete: true

Attachment #332987 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 64

•

16 years ago

Attached patch v2-v3 delta (obsolete) — Details — Splinter Review

Brendan Eich [:brendan]

Comment 65

•

16 years ago

Comment on attachment 333874 [details] [diff] [review]
v2-v3 delta

>-# define ENUM_CACHE_METER(x)    x
>+# define ENUM_CACHE_METER_INCR(name)    JS_ATOMIC_INCREMENT(&cx->runtime->name)

The _INCR suffix is unnecessary (METER means measure) and we don't use it in analogous situations -- at most UNMETER is added if a decrementing form is needed. Shorter and similar naming wins?

>@@ -4330,42 +4328,43 @@ js_Enumerate(JSContext *cx, JSObject *ob
>                     !(sprop->flags & SPROP_IS_ALIAS) &&
>                     (!SCOPE_HAD_MIDDLE_DELETE(scope) ||
>                      SCOPE_HAS_PROPERTY(scope, sprop))) {
>                     JS_ASSERT(ids < ne->ids + length);
>                     *ids++ = sprop->id;
>                 }
>             }
>             JS_ASSERT(ids == ne->ids + length);
>+            newenumerator = JS_TRUE;
>         } while (0);
>         JS_UNLOCK_SCOPE(cx, scope);
> 
[snip]
>         if (!ne) {
>             JS_ASSERT(length == 0);
>+            JS_ASSERT(!newenumerator);
>             *statep = JSVAL_ZERO;
>         } else {
>             JS_ASSERT(length != 0);
>             JS_ASSERT(ne->cursor == length);
>+            if (newenumerator) {
>+                JS_LOCK_GC(cx->runtime);
>+                if (!js_AddAsGCBytes(cx, NativeEnumeratorSize(length))) {

How about commoning this NativeEnumeratorSize(length) with the one used by JS_malloc above when newenumerator is going to be set true, and storing the result in a allocated size variable -- then you can test if (allocated) instead of if (newenumerator)? The alllowercase name caught my eye but the CSE seems worthwhile.

r=me with these addressed.

/be

Attachment #333874 - Flags: review+

Igor Bukanov

Assignee

Comment 66

•

16 years ago

Attached patch v4 (obsolete) — Details — Splinter Review

The new version of the patch addresses the nits from the comment 65.

Attachment #333873 - Attachment is obsolete: true

Attachment #333874 - Attachment is obsolete: true

Attachment #334430 - Flags: review+

Attachment #333873 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 67

•

16 years ago

Attached patch v3-v4 delta (obsolete) — Details — Splinter Review

Igor Bukanov

Assignee

Comment 68

•

16 years ago

For the sheriff:

To check in - attachment 334430 [details] [diff] [review]

the check in message: bug 417131 - caching enumerators to speedup for-in loops. r=brendan

Brendan Eich [:brendan]

Comment 69

•

16 years ago

Comment on attachment 334430 [details] [diff] [review]
v4

>+            cachep = &cx->runtime->
>+                     nativeEnumCache[NATIVE_ENUM_CACHE_HASH(shape)];
>+            oldcache = *cachep;
>+            if (oldcache & (jsuword) 1) {
>+                if ((uint32) (oldcache >> 1) == shape) {
>+                    /*scope has a shape with no enumerable properties. */

Uber-nit: Missing space before "scope has a shape".

>+                    break;
>+                }
>+            } else if (oldcache != (jsuword) 0) {
>+                /*
>+                 * We can safely read ne->shape without taking the GC lock as
>+                 * ne is only deleted when running the GC and ne->shape is

s/only deleted/deleted only/

and perhaps add "outside of this request" before "when running the GC".

Sorry for missing these earlier.

/be

Brendan Eich [:brendan]

Comment 70

•

16 years ago

Comment on attachment 334430 [details] [diff] [review]
v4

>+    while (!!(ne = *nep)) {

Missed this too -- house style prefers nesting assignment in a loop condition only, and then comparing != NULL explicitly. The !! idiom is rarely used, not in loop conditions at any rate. It stands out due to ! being exclamation point, and lack of common use of it makes it harder to read and understand for most readers. If we used it all over, maybe... but I'd rather not.

/be

Igor Bukanov

Assignee

Comment 71

•

16 years ago

(In reply to comment #70)
> (From update of attachment 334430 [details] [diff] [review])
> >+    while (!!(ne = *nep)) {
> 
> Missed this too -- house style prefers nesting assignment in a loop condition
> only, and then comparing != NULL explicitly.

Using !! was from the desire to avoid using NULL in new C++ code. Due to operator overloading and templates it can bring more harm than clarity.

Brendan Eich [:brendan]

Comment 72

•

16 years ago

C++ and NULL, sigh. Does 0x have null built-in?

We should be careful about operators and templates. I take it the other while ((p=q) != NULL) instances in the code do not cause problems. I guess I would rather cross that bridge if (not when) we come to it, than start incrementally injecting !! while leaving a majority case of != NULL around.

/be

Igor Bukanov

Assignee

Comment 73

•

16 years ago

Attached patch v5 — Details — Splinter Review

Fixing the latest nits, here is the delta:

@@ -4272,13 +4272,13 @@ js_Enumerate(JSContext *cx, JSObject *ob
             oldcache = *cachep;
             if (oldcache & (jsuword) 1) {
                 if ((uint32) (oldcache >> 1) == shape) {
-                    /*scope has a shape with no enumerable properties. */
+                    /* scope has a shape with no enumerable properties. */
                     break;
                 }
             } else if (oldcache != (jsuword) 0) {
                 /*
                  * We can safely read ne->shape without taking the GC lock as
-                 * ne is only deleted when running the GC and ne->shape is
+                 * ne is deleted only when running the GC and ne->shape is
                  * read-only after initialization.
                  */
                 ne = (JSNativeEnumerator *) *cachep;
@@ -4415,7 +4415,7 @@ js_TraceNativeEnumerators(JSTracer *trc)
     }

     nep = &rt->nativeEnumerators;
-    while (!!(ne = *nep)) {
+    while ((ne = *nep) != NULL) {
         JS_ASSERT(ne->length != 0);
         if (ne->cursor != 0) {
             /* Trace ids of the running enumerator. */
-

Attachment #334430 - Attachment is obsolete: true

Attachment #334431 - Attachment is obsolete: true

Attachment #334501 - Flags: review+

Igor Bukanov

Assignee

Comment 74

•

16 years ago

For the sheriff:

To check in - attachment 334501 [details] [diff] [review]

the check in message: bug 417131 - caching enumerators to speedup for-in loops.
r=brendan

Benjamin Smedberg

Comment 75

•

16 years ago

Pushed to mozilla-central: https://hg.mozilla.org/mozilla-central/index.cgi/rev/96203b8e20df

Ginn Chen

Comment 76

•

16 years ago

Is it bug ready to close?

The change of js/src/jslock.cpp caused a problem on Solaris x86.

-#ifndef NSPR_LOCK
-
 /* Implement NativeCompareAndSwap. */
[snip]
 #error "JS_HAS_NATIVE_COMPARE_AND_SWAP should be 0 if your platform lacks a compare-and-swap instruction."
 
 #endif /* arch-tests */
 
+#if JS_HAS_NATIVE_COMPARE_AND_SWAP
+

"ifndef NSPR_LOCK" moved down caused build error if JS_HAS_NATIVE_COMPARE_AND_SWAP is 0.
We should either remove the #error, or add #if  JS_HAS_NATIVE_COMPARE_AND_SWAP.

Ginn Chen

Updated

•

16 years ago

Depends on: 451504

Ginn Chen

Comment 77

•

16 years ago

Filed bug 451504

No longer depends on: 451504

Ginn Chen

Updated

•

16 years ago

Depends on: 451504

Igor Bukanov

Assignee

Updated

•

16 years ago

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Bob Clary [:bc] (inactive)

Updated

•

16 years ago

Flags: in-testsuite?

Flags: in-litmus-

Bob Clary [:bc] (inactive)

Updated

•

16 years ago

Keywords: testcase

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

16 years ago

Keywords: fixed1.9.1

Bob Clary [:bc] (inactive)

Comment 78

•

16 years ago

added test http://hg.mozilla.org/mozilla-central/rev/46fb3c7d6faa and cvs.

Flags: in-testsuite? → in-testsuite+

Bob Clary [:bc] (inactive)

Comment 79

•

16 years ago

v 1.9.1, 1.9.2

Status: RESOLVED → VERIFIED

Keywords: fixed1.9.1 → verified1.9.1

You need to log in before you can comment on or make changes to this bug.