Closed Bug 589398 Opened 14 years ago Closed 14 years ago

JM: Use a PIC for JSOP_NEW's prototype lookup

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: dvander, Assigned: dvander)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 11 obsolete files)

126.56 KB, patch
dvander
: review+
Details | Diff | Splinter Review
For JSOP_NEW we call a NewObject stub that does a property lookup and Object allocation. The property lookup can be PIC'd. Disclaimer: I have not yet investigated how much this is costing us.

Our call ICs have a fast path that guards on the callee object identity. The prototype lookup can be baked in as part of the fast path. This would be very easy to add after bug 587698 lands. Even easier (no patching needed) would be to just pass the IC struct to NewObject, and store the shape & sprop in the IC struct.

There are many other approaches. Re-using the existing GETPROP machinery is probably overkill and would be hard to integrate without hoisting it above the entire call path. That would be slightly slower in the fastest case (minimally, you need to guard on the callee like the call path does), but conceptually much cleaner.
On SS this is all in access-binary-trees and costing 4-5ms iirc.  Much more in V8.
Attached patch fix (obsolete) — Splinter Review
Thanks - I don't get that much but this approach could probably be better. This eliminates the getProperty() for SS and v8. 1% on SS and v8 each.
dvander said that I could drive this into the tree. Will check how it does on binary trees, where we see this matter in profiling.
Assignee: general → cdleary
Status: NEW → ASSIGNED
blocking2.0: --- → ?
blocking2.0: ? → -
status2.0: --- → wanted
un-stealing with permission. JSC places the constructor in the callee - I think this is a good approach, since then we don't have to sync |thisv| before the call.

the hard part will be rejiggering generatePrologue() to be callable without a Compiler instance, since we don't want to re-JIT methods but functions must support both constructing and non-constructing calls.
Assignee: cdleary → dvander
Attached patch WIP v0, interp changes only (obsolete) — Splinter Review
Moves |this| construction into the callee. Unfortunately there's a bunch of special cases - the empty script, InvokeConstructor on non-natives, etc, so there is a fair amount of duplication.

tracer changes coming next.
Attachment #468077 - Attachment is obsolete: true
Thinking aloud, it would be nice if the call object creation could go in JSOP_BEGIN. script->nfixed initialization, too.
I guess that's hard because of:

    /*
     * Compute |this|. Currently, this must happen after the frame is pushed
     * and fp->scopeChain is correct because the thisObject hook may call
     * JS_GetScopeChain.
     */
So nevermind; both callobj and nfixed initialization will have to be included in the method prologue.
As soon as brain transplants land, bug 592992 wants to hoist the thisObject hook to ExternalInvoke (since it can only happen from extrnal api calls), which should remove the constraint.
Awesome! I'll leave it as a follow-up, then. I think it will simplify things.
Attached patch WIP v2 (obsolete) — Splinter Review
More factoring to split out prolog generation, fixes a bunch of OOM/refcount bugs in PICs in the process.
Attachment #479653 - Attachment is obsolete: true
Attachment #479677 - Attachment is obsolete: true
Comment on attachment 479718 [details] [diff] [review]
WIP v2

It turns out the prologue-bifurcation idea is quite difficult to make fast. Right now we have the caller checking for primitive |this|, and reading |thisv| out of the old frame. It can't do that anymore with an arity mismatch; |this| is written after the frame-fixup. We can't guarantee that |thisv| is written to both the canonical and actual argv without mysterious and undesirable changes to the tracer.

Really, we want the callee to perform |thisv| fixups, but if we did that with only header-splitting, every return would need an extra bit check in fp->flags and then some gross stuff.

So, new approach. We'll give JSScript the ability to have two JITScripts, one for normal calls and another for constructing. This would slow down how IC stubs access their members however, by adding a few loads and a branch. Part of this patch queue is going to pass IC pointers directly to stubs.
Attachment #479718 - Attachment is obsolete: true
Comment on attachment 480003 [details] [diff] [review]
part 1: formalize compiler's cx->fp() requirement

I know its pre-existing, but should Compiler::Compile be Compiler::compile?  r+ regardless.
Attachment #480003 - Flags: review?(lw) → review+
method JIT changes aren't done yet, but this part is nicely separable
Attachment #480011 - Flags: review?(lw)
Comment on attachment 480004 [details] [diff] [review]
part 2: pass IC pointers directly to stubs

I noticed that there is

callIC.addrLabel1 = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1);
callIC.addrLabel2 = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1);

Looks like maybe those should use passPICAddress (unless there is some other constraint).
Attachment #480004 - Flags: review?(dmandelin) → review+
Attached patch part 3: interp+tracer changes (obsolete) — Splinter Review
fixes IRL comments about InvokeConstructor and CreateThis
Attachment #480011 - Attachment is obsolete: true
Attachment #480200 - Flags: review?(lw)
Attachment #480011 - Flags: review?(lw)
Attached patch part 3: interp+tracer changes (obsolete) — Splinter Review
with some extra gunk removed
Attachment #480200 - Attachment is obsolete: true
Attachment #480202 - Flags: review?(lw)
Attachment #480200 - Flags: review?(lw)
Attached patch part 4: bifurcation, PIC (obsolete) — Splinter Review
This patch passes tests except for debug mode; part 5 will fix that.

Maybe a 1% change SunSpider, biggest % win is in access-binary-trees, but that's about 0.4ms on my machine.

v8 is better. With "-m -j" it's about a 5.5% win (132ms in over-optimistic AWFY units), and 9.6% win with "-m" alone (284ms).
Attachment #480280 - Flags: review?(dmandelin)
Attached patch part 5: fix debug mode (obsolete) — Splinter Review
Lets debug mode work with new JITScript bifurcation.
Attachment #480338 - Flags: review?(sstangl)
Comment on attachment 480280 [details] [diff] [review]
part 4: bifurcation, PIC

r+ with these small changes:

1. Use script->maybeNativeMap() at the point where nmap is needed in jsinterp, just so we have less mutable state to go wrong.

2. Consider some extra abstraction around fp->isConstructing(), especially in the compiler, such as creating a member for constructor mode. It always kind of bothered me that the compiler reaches into the current frame directly to figure out what to do. I'm not sure that change is a good idea, though.

3. Add comments to loadReturnValue, fixPrimitiveReturn, and constructThis. In particular, for this kind of function I really like to have an explanation of what the inputs and outputs are of the generated code. 

4. There is a comment "Any value value" in jsscript.h.
Attachment #480280 - Flags: review?(dmandelin) → review+
(In reply to comment #21)
> Created attachment 480280 [details] [diff] [review]
> part 4: bifurcation, PIC

This patch changes the "register version" of FrameState::storeTo() by adding a 64-bit specific patch. However, the #define used is "JS_PUNBOX", when the 64-bit value format is "JS_PUNBOX64".

Additionally, it is only beneficial to use loadValueAsComponents() when both values have to be loaded -- otherwise, loading a single part is faster. So the condition must be

> fe->type.inMemory() && fe->data.inMemory()
(In reply to comment #14)
> Created attachment 480003 [details] [diff] [review]
> part 1: formalize compiler's cx->fp() requirement

This introduces Compiler::Compile() and Compiler::performCompilation(), which mean the same thing. Since the purpose of ::Compile() is to be called by mjit::TryCompile(), would it make sense to rename Compiler::Compile() to Compiler::TryCompile(), and Compiler::performCompilation() to Compiler::Compile()?
Attachment #480338 - Flags: review?(sstangl) → review+
Comment on attachment 480202 [details] [diff] [review]
part 3: interp+tracer changes

Nice work!

>-    Value &thisv = fp->functionThis();
>-    JS_ASSERT_IF(flags & JSINVOKE_CONSTRUCT, !thisv.isPrimitive());
>-    if (thisv.isObject() && !(flags & JSINVOKE_CONSTRUCT)) {
>+    if (!(flags & JSINVOKE_CONSTRUCT)) {
>+        Value &thisv = fp->functionThis();

Cool
 
>@@ -1156,8 +1161,9 @@ InvokeConstructor(JSContext *cx, const C
>+    // Scripts create their own |this|.
>+    if (!fun || fun->isNative()) {
>+        JSObject *obj = js_CreateThis(cx, callee);

Perhaps:
/* Scripts create their own |this| in JSOP_BEGIN */
?
Also, so that the reader doesn't have to spend time wondering if there are non-isNative() non-isInterpreted() functions, could you s/fun->isNative()/!fun->isInterpreted()/ ?

>+        // The interpreter fixes rval for us.
>+        JS_ASSERT(!fun->isInterpreted());

/* */

>+        regs.fp->formalArgs()[-1].setObject(*obj2);

Perhaps regs.fp->functionThis().setObject(*obj2);

>+// Specialized call for constructing |this| with a known function callee,
>+// and a known prototype.

/* */
Attachment #480202 - Flags: review?(lw) → review+
Review follow-ups:

(In reply to comment #16)
> Should Compiler::Compile be Compiler::compile?

Yeah, I'll clean this up. Sean had good complaints here too.

(In reply to comment #18)
> Looks like maybe those should use passPICAddress

passPICAddress only works on PICs; since there's only two callIC sites it just got expanded.

(In reply to comment #23)

> 1. Use script->maybeNativeMap() at the point where nmap is needed in jsinterp,
just so we have less mutable state to go wrong.

Actually caching this state could break after recompilation, so thanks, will fix.

> 2. Consider some extra abstraction around fp->isConstructing ... compiler reaches into the current frame directly

Okay, this makes sense given that we cache other things too. Though, it doesn't peek into the current frame, but the given frame.

Will do on the rest.

(In reply to comment #24)
(In reply to comment #25)
(In reply to comment #26)
Thanks, will fix these.
Attached patch finalSplinter Review
Some follow-up fixes: Bump XDR, fix no-IC builds, remove bogus asserts, don't explicitly mark fp->thisValue (stack space scanner does so automatically & conservatively).
Attachment #480003 - Attachment is obsolete: true
Attachment #480004 - Attachment is obsolete: true
Attachment #480202 - Attachment is obsolete: true
Attachment #480280 - Attachment is obsolete: true
Attachment #480338 - Attachment is obsolete: true
Attachment #480800 - Flags: review+
JSOP_BEGIN is always emitted at the start of an interpreted function's script, so why not just hardcode its guts (interp, mjit, tracer) on entry to function code? That seems strictly faster in terms of dispatch -- native C++ optimized and PGO'ed instead of a bytecode to interpret or JIT-compile.

/be
On v8, regress-x86 shows this as a 258ms (9%) win for -m, and 88ms (3.7%) win for -m -j.

SunSpider appears to have gotten a 5ms (1%) regression. It's in:
 * access-fannkuch (2.7ms)
 * math-cordic (2ms)

This ate up a 1.7ms win on access-binary-trees.

This is curious as neither calls |new| in any performance meaningful way. I will investigate and file a follow-up tomorrow.
You might want to compare browser before and after.  In the shell, fannkuch especially is subject to random but repeatable gain or loss in speed; I think it is very sensitive to the layout of the four arrays it accesses vis a vis the cache, and since the shell is deterministic (unlike the browser) values will be laid out the same way every time it runs.  See bugs 584917 (compares browser vs. shell) and 580492 (uninvestigated, probably the same thing).
Depends on: 601982
Urgh, what a pill crab. This is all too familiar. The "bug" is that two members were removed from JSScript. If I add padding in, the perf hit goes away (and it's sometimes quite large, 10% on my MBP).

Luke wants a follow-up bug to investigate these weird effects. For now, I'll add two members to JSScript in the fix for bug 601982.
(In reply to comment #35)
> Urgh, what a pill crab. This is all too familiar. The "bug" is that two members
> were removed from JSScript. If I add padding in, the perf hit goes away (and
> it's sometimes quite large, 10% on my MBP).
> 
> Luke wants a follow-up bug to investigate these weird effects. For now, I'll
> add two members to JSScript in the fix for bug 601982.

Three cheers for measuring instruction counts as well as time!
(In reply to comment #32)

Moving the guts of JSOP_BEGIN out means putting it in both JSOP_NEW and the prologue of Interpret(). The new opcode cleanly focuses the semantics. The dispatch overhead shouldn't matter since we're interpreting so rarely.
(In reply to comment #37)
> (In reply to comment #32)
> 
> Moving the guts of JSOP_BEGIN out means putting it in both JSOP_NEW and the
> prologue of Interpret().

C'mon, that's what inline helpers are for.

> The new opcode cleanly focuses the semantics. The
> dispatch overhead shouldn't matter since we're interpreting so rarely.

I figured, but this is probably a mistake, or just more work to refactor later when eliminating the interpreter. Why not use a (static or class-based) inline helper now and be done?

/be
http://hg.mozilla.org/mozilla-central/rev/32b049250e03
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 606639
Depends on: 621420
You need to log in before you can comment on or make changes to this bug.