Closed Bug 480297 — Opened 15 years ago, Closed 9 years ago

string-fasta spends 22% of time in ObjectToIterator_tn, 13% of which in js_NewObject

Category: Core :: JavaScript Engine, defect, P2
Status: RESOLVED INVALID
Target Milestone: mozilla2.0
Reporter: shaver; Assigned: gal
Keywords: meta, perf
Attachments: 1 file, 2 obsolete files
(Above for unthreaded shell.)

We cache the enumerator state on the basis of shape, but don't cache the iterator object.  I tried to hang the iterobj off JSNativeEnumerator, but the cache logic and iterator-object-lifecycle-management[*] have defeated me for now.  Filing this bug so I don't forget that it's such a fruitful area for improvement.

[*] This code used to be a lot less complex and still correctly handle the vastly-dominant "for (i in objectWithout__iterator__)" case, right?  Am I misremembering?  Can we just put that old iteration model back with separate bytecodes (JSOP_ITERSIMPLE and friends, perhaps), add a JSOP_HASITERATOR and double-emit the loop body if needed?  Seems like the normal case is carrying a heavy runtime load, to say nothing of systematic complexity, to support the iterator/generator case. :(
shaver: the compiler cannot tell what kind of object is being iterated. What code were you remembering?

We could make the protocol use an arbitrary jsval instead of an object, and store the native enumerator pointer, private-tagged, on the stack. Who needs an object? Small matter of coding to avoid calling .next() on it, etc., of course!

/be
I mean that we could emit

    JSOP_HASITERATOR (66)
    <loop emitted using generic iteration protocol>
    JSOP_GOTO (99)
66: <loop emitted using old pre-__iterator__ specialization>
99: JSOP_WHATEVER_IS_NEXT

at the cost of some bytecode duplication, but gaining easier specialization on trace I think!
For what it's worth, as long as I've seen the code, we've used an iterator object to keep state around (__iterator__ just lets arbitrary objects override the default object's method, which used to be encoded in the interpreter).

Does the cost of creating new objects dominate us when we're off trace (in the interpreter only)? It seems like we could trace through the default iterator and spit out a bunch of Really Fast machine code for that case and win without any crazy jsemit.cpp machinations.
All of this time is called from trace, by my profile.  I doubt that allocating objects is much faster when we're interpreting, though.

We can easily guard on has-custom-iterator due to the power of shapes; I was looking for a cheaper fix that didn't require writing trace generation for handling of mid-iteration delete, etc.
We could slow the interpreter down and bump our code footprint by emitting more bytecode (we could run out of bytecode too ;-).

Really, tracing FTW. We could unroll short-property-list loops completely, make others use a memoized (guarded) constant list (a la the FrameInfo memoization). No need for more bytecode.

/be
This needs an owner if we want this for 1.9.2. I ran some new numbers. fasta is now gated, in the ballpark of >80%, on uncached property accesses and iterator allocations. I will try to find the bug with the other benchmark data.
Severity: normal → major
Flags: blocking1.9.2?
Priority: -- → P1
This probably wants to be split into two or three bugs:

- iterators allocate too much
- iterators but also other places (i.e. defaultValue()) do uncached property accesses
- iterators generate a ton of strings when iterating over arrays
(In reply to comment #7)
> This probably wants to be split into two or three bugs:
> 
> - iterators allocate too much

File it and mark it blocking this bug.

> - iterators but also other places (i.e. defaultValue()) do uncached property
> accesses

This is not an issue for the tracer if we use imacros well. Do we? Else how can non-JIT interpreter perf matter unless there's an abort problem we need to fix that is keeping us in the interpreter instead of on trace?

> - iterators generate a ton of strings when iterating over arrays

Bug 505818.

/be
Depends on: 505818
Keywords: meta
OS: Mac OS X → All
Hardware: x86 → All
Flags: blocking1.9.2? → blocking1.9.2+
Priority: P1 → P2
trace-test.js allocating 500MB doesn't seem crazy. Could be better, compared to other open source JS repls? Test and file new bugs.

/be
(In reply to comment #9)
> trace-test.js allocating 500MB doesn't seem crazy. Could be better, compared to
> other open source JS repls? Test and file new bugs.

Argh, wrong bug (meant for bug 507425).

/be
Should not be creating an iterator object at all for native enumeration.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.9.2
Version: 1.9.1 Branch → Trunk
Fails this test:

o = {p:1,q:2,r:3,s:4,t:5};
a = [];
for(i in o)
    a[a.length] = i;
print(a.length);
assertEq(a.join(""), "pqrst");

a.join("") returns "pqrs". So close!

Interp perf boost on fasta is ~25%.

/be
Mad props to dvander for several pieces of crucial advice.

/be
Attachment #396339 - Attachment is obsolete: true
Attachment #403674 - Flags: review?(jorendorff)
Attachment #403674 - Attachment is obsolete: true
Attachment #403708 - Flags: review?(jorendorff)
Attachment #403674 - Flags: review?(jorendorff)
Comment on attachment 403708 [details] [diff] [review]
fix jsiter.h FRIEND APIs and impls, improve comments

Igor, I would appreciate your review too. I attempted to encapsulate the native enum cache and extend it to encompass the "ned" open descriptor table notion.

Hazards include having an "open" ne with 0 cursor reachable via the cache. The rule is that callers who open and then call ne->next() must check ne->done() and close if done. This is a bit delicate, but for an internal API it seems ok and avoids undue complexity/error-checking. Better ideas welcome.

/be
Attachment #403708 - Flags: review?(igor)
JSNativeEnumTable could be viewed as a cache extended by an open descriptor table. The cache has get and putBack methods. The table has open, index, and close as its main methods. js_ValueToIterator will call open only if !tableFull() to avoid that internal error added in the latest patch.

The small OPEN_LIMIT value of 20, inspired by stdio/Unix kernel code of old, is more than enough for nesting/interleaved-same-thread-context for-in loops, IMHO.

/be
(In reply to comment #15)
> 
> Hazards include having an "open" ne with 0 cursor reachable via the cache. The
> rule is that open callers who then ne->next() must check ne->done() and close
> if done. This is a bit delicate but for an internal API it seems ok and avoids
> undue complexity/error-checking. Better ideas welcome.

Does this stay sound in the case of a worker thread that suspends the enumeration via yield in the generator and then gets scheduled on another native thread, where it will resume and call ne->done() from a different thread?
The case I worry about is code like:

function gen(obj)
{
    for (var i in obj)
        yield [i, obj[i]];
}

var iter = gen(some_object);
var delay = 100;
setTimeout(function f() { iter.next(); setTimeout(f, delay); }, delay);

that can be repeatedly executed from different native threads.
2 more issues with the patch:

1. The patch does not contain code to mark enumerator states in JSNativeEnumTable.

2. Another issue with the patch is that the native enumerator in the table would never be closed if the interpreter or trace exits with oom or infinite-loop-termination. In these cases the current code delegates closing of the open enumerator to the GC via unreachable iterator object that the patch eliminates.

So what about caching not the enumerator but rather the iterator object itself with special support for returning empty enumerators?
Re: comment 17 and comment 18 -- don't we have this issue already with threadData based nativeEnumCache? If a worker migrates any of its shared-nothing (unshared, I mean) objects, it could leave native enumerators in the old thread's cache with non-zero cursors, then finish enumerating, dropping those cursors to zero. What prevents this in current (unpatched) code?

I will look at caching objects along with NEs, but it is a lose for perf to have to create objects just to enumerate. I had an earlier version of this bug's patch that made JSNativeEnumerator a GC-thing. I'm thinking this is the way to go, but again welcome feedback.

/be
(In reply to comment #20)
> Re: comment 17 and comment 18 -- don't we have this issue already with
> threadData based nativeEnumCache? If a worker migrates any of its
> shared-nothing (unshared, I mean) objects, it could leave native enumerators in
> the old thread's cache with non-zero cursors, then finish enumerating, dropping
> those cursors to zero. What prevents this in current (unpatched) code?

Yep, I missed that the enumerator can be destroyed on another thread it migrated to, while leaving a stale entry in the cache on the original thread. This is indeed a bug.
 
> I will look at caching objects along with NEs, but it is a lose for perf to
> have to create objects just to enumerate.

Those objects can be reused for enumerating over different property sets. That is, when the enumeration is done, they can be set aside. Typically we are talking about just one or two nested for-in loops, so a pool of two objects per thread should be enough to cover typical usage. Moreover, those objects do not need to be the current iterator objects (which the patch eliminates). They truly can be just a holder for a JSNativeEnumerator and a cursor, with null parent/proto slots.

In addition, these objects would eliminate the complexity of the current GC marking code since js_Enumerate would return not a private thing but a normal GC object.
This is still blocking1.9.2. We'll probably have to minus it, but it should really get reviewed and landed.
no reviews in 4 weeks? we can do better than that.
Flags: blocking1.9.2+ → blocking1.9.2-
Sorry, Igor gave negative feedback, I should have obsoleted the patch (or he could minus -- no sweat).

/be
I have a few superficial comments.

The js_DefineFunction change is independent of the rest, right? It would be a
mercy to have just that in a separate changeset, for future bisecting.

What happens when we hit JSOP_YIELD if there is a ned on the stack? Does the enumerator just stay open as long as the generator-iterator is alive? Does it matter? js_ValueToIterator could check and avoid using the cache if we're in a generator.

In JSNativeEnumTable::get:
>+    if (oldcache & (jsuword) 1) {
>+        if (uint32(oldcache >> 1) == shape) {

Macroize the bit-twiddling here?

If the property cache is disabled due to shapeGen overflowing, the native enumerator cache needs to be disabled too.

> TraceRecorder::record_JSOP_NEXTITER()
> {
>     jsval& iterobj_val = stackval(-2);

"iterobj" is a misnomer now in places like this.

>+... lir->ins2i(LIR_lsh,
>+               ...,
>+               (sizeof(JSNativeEnumerator*) == 4) ? 2 : 3)...

With the patch, this idiom is used 3 times. Template/macro?
Attachment #403708 - Flags: review?(jorendorff)
Attachment #403708 - Flags: review?(igor)
who can take a look at this these days?
Target Milestone: mozilla1.9.2 → mozilla1.9.3
I was gonna update the patch -- it has rotted but it's still in my q.

/be
Assignee: brendan → mrbkap
mrbkap: the native enum table encapsulation was on Waldo's plate; IIRC he filed a bug. The "ned" stuff (mandatory open/close, small descriptor table) should be dropped. All we need is a two-deep (for doubly-nested for-in loops -- could go to four just to be sure ;-) cx-based cache of non-escaping native enumerator JSObjects.

/be
The per-runtime enumerator caching requires locking and is super expensive (e.g. on fasta). That should be fixed as well.
I would like to suggest the following change:

We introduce a new path that enumerates direct object properties for native objects if there are no class or objectop hooks.

bool
js_EnumerateNativeObjectProperties(JSContext *cx, JSObject *obj,
                                   JSIDVector *vec, JSIDHashSet *hs)
{
    JS_ASSERT(!(obj->getClass()->flags & JSCLASS_NEW_ENUMERATE));
    JS_ASSERT(obj->getClass()->enumerate == JS_EnumerateStub);

    JSScope *scope = OBJ_SCOPE(obj);
    for (JSScopeProperty *sprop = scope->lastProperty(); sprop; sprop = sprop->parent) {
        if (sprop->enumerable() && !sprop->isAlias()) {
            jsid id = sprop->id;
            JSIDHashSet::AddPtr p = hs->lookupForAdd(id);
            if (!p) {
                if (!hs->add(p, id))
                    return false;
                if (!vec->append(id))
                    return false;
            }
        }
    }
    return true;
}

The path is given a hash table and a vector. The hash table is used to deal with shadowing, the vector is what we actually enumerate on.

We change the iteration protocol as follows:

- On INIT walk up the proto chain and use this path to collect properties and snapshot the shape of the object. If an XML object or a non-native object is found along the proto chain, use the old iteration protocol.
- If we get a complete list of all properties, iteration will only draw from this list, and not check whether the property still exists, as long as shapes match.
- Cache the list of ids we generated (all shapes along the proto chain must match).
- With this we should be able to delete the native iterator caching code. It caches per-object, so for any given for-in loop, we have to generate several native iterators. This will almost never be hit any more, and is a fair amount of complexity.

Brendan and Blake have dibs on this, so I will stay away, except for contributing my unsolicited opinion :)
I think I just fixed most of this. Stealing. I will re-word the bug and find a new owner for the rest.
Assignee: mrbkap → gal
bug 558058 fixed a large chunk of this bug.
Target Milestone: mozilla1.9.3 → mozilla2.0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID