Closed Bug 480297 — Opened 15 years ago, Closed 9 years ago

string-fasta spends 22% of time in ObjectToIterator_tn, 13% of which in js_NewObject

Category: Core :: JavaScript Engine, defect, P2
Status: RESOLVED INVALID
Target Milestone: mozilla2.0
Reporter: shaver; Assigned: gal
Keywords: meta, perf
Attachments: 1 file, 2 obsolete files
(Above for unthreaded shell.)

We cache the enumerator state on the basis of shape, but don't cache the iterator object.  I tried to hang the iterobj off JSNativeEnumerator, but the cache logic and iterator-object-lifecycle-management[*] have defeated me for now.  Filing this bug so I don't forget that it's such a fruitful area for improvement.

[*] This code used to be a lot less complex and still correctly handle the vastly-dominant "for (i in objectWithout__iterator__)" case, right?  Am I misremembering?  Can we just put that old iteration model back with separate bytecodes (JSOP_ITERSIMPLE and friends, perhaps), add a JSOP_HASITERATOR and double-emit the loop body if needed?  Seems like the normal case is carrying a heavy runtime load, to say nothing of systematic complexity, to support the iterator/generator case. :(
shaver: the compiler cannot tell what kind of object is being iterated. What code were you remembering?

We could make the protocol use an arbitrary jsval instead of an object, and store the native enumerator pointer, private-tagged, on the stack. Who needs an object? Small matter of coding to avoid calling .next() on it, etc., of course!

/be
I mean that we could emit

    JSOP_HASITERATOR (66)
    <loop emitted using generic iteration protocol>
    JSOP_GOTO (99)
66: <loop emitted using old pre-__iterator__ specialization>
99: JSOP_WHATEVER_IS_NEXT

at the cost of some bytecode duplication, but gaining easier specialization on trace I think!
For what it's worth, as long as I've seen the code, we've used an iterator object to keep state around (__iterator__ just lets arbitrary objects override the default object's method, which used to be encoded in the interpreter).

Does the cost of creating new objects dominate us when we're off trace (in the interpreter only)? It seems like we could trace through the default iterator and spit out a bunch of Really Fast machine code for that case and win without any crazy jsemit.cpp machinations.
All of this time is called from trace, by my profile.  I doubt that allocating objects is much faster when we're interpreting, though.

We can easily guard on has-custom-iterator due to the power of shapes; I was looking for a cheaper fix that didn't require writing trace generation for handling of mid-iteration delete, etc.
We could slow the interpreter down and bump our code footprint by emitting more bytecode (we could run out of bytecode too ;-).

Really, tracing FTW. We could unroll short-property-list loops completely, make others use a memoized (guarded) constant list (a la the FrameInfo memoization). No need for more bytecode.

/be
This needs an owner if we want this for 1.9.2. I ran some new numbers. fasta is now gated, in the ballpark of >80%, on uncached property accesses and iterator allocations. I will try to find the bug with the other benchmark data.
Severity: normal → major
Flags: blocking1.9.2?
Priority: -- → P1
This probably wants to be split into two or three bugs:

- iterators allocate too much
- iterators but also other places (i.e. defaultValue()) do uncached property accesses
- iterators generate a ton of strings when iterating over arrays
(In reply to comment #7)
> This probably wants to be split into two or three bugs:
> 
> - iterators allocate too much

File it and mark it blocking this bug.

> - iterators but also other places (i.e. defaultValue()) do uncached property
> accesses

This is not an issue for the tracer if we use imacros well. Do we? Else how can non-JIT interpreter perf matter unless there's an abort problem we need to fix that is keeping us in the interpreter instead of on trace?

> - iterators generate a ton of strings when iterating over arrays

Bug 505818.

/be
Depends on: 505818
Keywords: meta
OS: Mac OS X → All
Hardware: x86 → All
Flags: blocking1.9.2? → blocking1.9.2+
Priority: P1 → P2
trace-test.js allocating 500MB doesn't seem crazy. Could be better, compared to other open source JS repls? Test and file new bugs.

/be
(In reply to comment #9)
> trace-test.js allocating 500MB doesn't seem crazy. Could be better, compared to
> other open source JS repls? Test and file new bugs.

Argh, wrong bug (meant for bug 507425).

/be
Should not be creating an iterator object at all for native enumeration.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.9.2
Version: 1.9.1 Branch → Trunk
Fails this test:

o = {p:1,q:2,r:3,s:4,t:5};
a = [];
for(i in o)
    a[a.length] = i;
print(a.length);
assertEq(a.join(""), "pqrst");

a.join("") returns "pqrs". So close!

Interp perf boost on fasta is ~25%.

/be
Mad props to dvander for several pieces of crucial advice.

/be
Attachment #396339 - Attachment is obsolete: true
Attachment #403674 - Flags: review?(jorendorff)
Attachment #403674 - Attachment is obsolete: true
Attachment #403708 - Flags: review?(jorendorff)
Attachment #403674 - Flags: review?(jorendorff)
Comment on attachment 403708 [details] [diff] [review]
fix jsiter.h FRIEND APIs and impls, improve comments

Igor, I would appreciate your review too. I attempted to encapsulate the native enum cache and extend it to encompass the "ned" open descriptor table notion.

Hazards include having an "open" ne with 0 cursor reachable via the cache. The rule is that callers who open and then call ne->next() must check ne->done() and close if done. This is a bit delicate, but for an internal API it seems ok and avoids undue complexity/error-checking. Better ideas welcome.

/be
Attachment #403708 - Flags: review?(igor)
JSNativeEnumTable could be viewed as a cache extended by an open descriptor table. The cache has get and putBack methods. The table has open, index, and close as its main methods. js_ValueToIterator will call open only if !tableFull() to avoid that internal error added in the latest patch.

The small OPEN_LIMIT value of 20, inspired by stdio/Unix kernel code of old, is more than enough for nesting/interleaved-same-thread-context for-in loops, IMHO.

/be
(In reply to comment #15)
> 
> Hazards include having an "open" ne with 0 cursor reachable via the cache. The
> rule is that open callers who then ne->next() must check ne->done() and close
> if done. This is a bit delicate but for an internal API it seems ok and avoids
> undue complexity/error-checking. Better ideas welcome.

Does this stay sound in the case of a worker thread that suspends the enumeration via yield in the generator and then gets scheduled on another native thread, where it will resume and call ne->done() from a different thread?
The case I worry about is code like:

function gen(obj)
{
    for (var i in obj)
        yield [i, obj[i]];
}

var iter = gen(some_object);
var delay = 100;
setTimeout(function f() { iter.next(); setTimeout(f, delay); }, delay);

that can be repeatedly executed from different native threads.
2 more issues with the patch:

1. The patch does not contain code to mark enumerator states in JSNativeEnumTable.

2. Another issue with the patch is that the native enumerator in the table would never be closed if the interpreter or trace exits with oom or infinite-loop-termination. In these cases the current code delegates closing of the open enumerator to the GC via unreachable iterator object that the patch eliminates.

So what about caching not the enumerator but rather the iterator object itself with special support for returning empty enumerators?
Re: comment 17 and comment 18 -- don't we have this issue already with threadData based nativeEnumCache? If a worker migrates any of its shared-nothing (unshared, I mean) objects, it could leave native enumerators in the old thread's cache with non-zero cursors, then finish enumerating, dropping those cursors to zero. What prevents this in current (unpatched) code?

I will look at caching objects along with NEs, but it is a lose for perf to have to create objects just to enumerate. I had an earlier version of this bug's patch that made JSNativeEnumerator a GC-thing. I'm thinking this is the way to go, but again welcome feedback.

/be
(In reply to comment #20)
> Re: comment 17 and comment 18 -- don't we have this issue already with
> threadData based nativeEnumCache? If a worker migrates any of its
> shared-nothing (unshared, I mean) objects, it could leave native enumerators in
> the old thread's cache with non-zero cursors, then finish enumerating, dropping
> those cursors to zero. What prevents this in current (unpatched) code?

Yep, I missed that the enumerator can be destroyed on another thread it migrated to, while leaving a stale entry in the cache on the original thread. This is indeed a bug.
 
> I will look at caching objects along with NEs, but it is a lose for perf to
> have to create objects just to enumerate.

Those objects can be reused for enumerating over different property sets. That is, when the enumeration is done, they can be set aside. Typically we are talking about just one or two nested for-in loops, so a pool of two objects per thread should be enough to cover typical usage. Moreover, those objects do not need to be the current iterator objects (which the patch eliminates). They truly can be just a holder for a JSNativeEnumerator and a cursor, with null parent/proto slots.

In addition, these objects would eliminate the complexity of the current GC marking code since js_Enumerate would return not a private thing but a normal GC object.
This is still blocking1.9.2. We'll probably have to minus it, but it should really get reviewed and landed.
no reviews in 4 weeks? we can do better than that.
Flags: blocking1.9.2+ → blocking1.9.2-
Sorry, Igor gave negative feedback, I should have obsoleted the patch (or he could minus -- no sweat).

/be
I have a few superficial comments.

The js_DefineFunction change is independent of the rest, right? It would be a
mercy to have just that in a separate changeset, for future bisecting.

What happens when we hit JSOP_YIELD if there is a ned on the stack? Does the enumerator just stay open as long as the generator-iterator is alive? Does it matter? js_ValueToIterator could check and avoid using the cache if we're in a generator.

In JSNativeEnumTable::get:
>+    if (oldcache & (jsuword) 1) {
>+        if (uint32(oldcache >> 1) == shape) {

Macroize the bit-twiddling here?

If the property cache is disabled due to shapeGen overflowing, the native enumerator cache needs to be disabled too.

> TraceRecorder::record_JSOP_NEXTITER()
> {
>     jsval& iterobj_val = stackval(-2);

"iterobj" is a misnomer now in places like this.

>+... lir->ins2i(LIR_lsh,
>+               ...,
>+               (sizeof(JSNativeEnumerator*) == 4) ? 2 : 3)...

With the patch, this idiom is used 3 times. Template/macro?
Attachment #403708 - Flags: review?(jorendorff)
Attachment #403708 - Flags: review?(igor)
who can take a look at this these days?
Target Milestone: mozilla1.9.2 → mozilla1.9.3
I was gonna update the patch -- it has rotted but it's still in my q.

/be
Assignee: brendan → mrbkap
mrbkap: the native enum table encapsulation was on Waldo's plate; IIRC he filed a bug. The "ned" stuff (mandatory open/close, small descriptor table) should be dropped. All we need is a two-deep (for doubly-nested for-in loops -- could go to four just to be sure ;-) cx-based cache of non-escaping native enumerator JSObjects.

/be
The per-runtime enumerator caching requires locking and is super expensive (e.g. on fasta). That should be fixed as well.
I would like to suggest the following change:

We introduce a new path that enumerates direct object properties for native objects if there are no class or objectop hooks.

bool
js_EnumerateNativeObjectProperties(JSContext *cx, JSObject *obj,
                                   JSIDVector *vec, JSIDHashSet *hs)
{
    JS_ASSERT(!(obj->getClass()->flags & JSCLASS_NEW_ENUMERATE));
    JS_ASSERT(obj->getClass()->enumerate == JS_EnumerateStub);

    JSScope *scope = OBJ_SCOPE(obj);
    for (JSScopeProperty *sprop = scope->lastProperty(); sprop; sprop = sprop->parent) {
        if (sprop->enumerable() && !sprop->isAlias()) {
            jsid id = sprop->id;
            JSIDHashSet::AddPtr p = hs->lookupForAdd(id);
            if (!p) {
                if (!hs->add(p, id))
                    return false;
                if (!vec->append(id))
                    return false;
            }
        }
    }
    return true;
}

The path is given a hash table and a vector. The hash table is used to deal with shadowing, the vector is what we actually enumerate on.

We change the iteration protocol as follows:

- On INIT walk up the proto chain and use this path to collect properties and snapshot the shape of the object. If an XML object or a non-native object is found along the proto chain, use the old iteration protocol.
- If we get a complete list of all properties, iteration will only draw from this list, and not check whether the property still exists, as long as shapes match.
- Cache the list of ids we generated (all shapes along the proto chain must match).
- With this we should be able to delete the native iterator caching code. It caches per-object, so for any given for-in loop, we have to generate several native iterators. This will almost never be hit any more, and is a fair amount of complexity.

Brendan and Blake have dibs on this, so I will stay away, except for contributing my unsolicited opinion :)
I think I just fixed most of this. Stealing. I will re-word the bug and find a new owner for the rest.
Assignee: mrbkap → gal
bug 558058 fixed a large chunk of this bug.
Target Milestone: mozilla1.9.3 → mozilla2.0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID