Closed Bug 592976 Opened 9 years ago Closed 9 years ago

JM: Optimize formal arguments access


(Core :: JavaScript Engine, defect)

Not set





(Reporter: dvander, Assigned: dvander)



(Whiteboard: fixed-in-tracemonkey)


(3 files, 1 obsolete file)

We can treat accesses to local arguments (almost) like local variables. "Almost" depends on the outcome of |f.arguments| breakage. If we can't nop sets to these ninja'd arguments objects, we have to eagerly write the results of SETARG/FORARG to memory. However, we can still copy propagate.
Attached file perf test case
Right now, args are roughly 10% slower than local variables.
Assignee: general → dvander
OS: Linux → All
Hardware: x86_64 → All
Attached patch treat args as locals (obsolete) — Splinter Review
This makes args as fast as locals in the attached test case. For the most part it's a very small win, maybe 2ms on SunSpider, but we lose 1ms reproducibly on date-format-xparb. No change in v8.

Posting the patch for posterity, I'll look at it again once all of the necessary bugs have landed.
Comment on attachment 475314 [details] [diff] [review]
treat args as locals

Getting review now in case this wants to land in a week ;)
Attachment #475314 - Flags: review?(dmandelin)
Comment on attachment 475314 [details] [diff] [review]
treat args as locals

Looks good to me. A few questions/requests:


+FrameState::getArg(uint32 slot)
+    JS_ASSERT(slot < nargs);
+    FrameEntry *fe = &entries[slot];

Is this future-proof? I.e., are we requiring that entries[0] always be the first arg? Might be good to put a thin abstraction in place here.

2. isClosedVar and isClosedArg are a little funny as names. I'm not sure what a better name would be--it seems the sense is "can this var be accessed otherwise than through its local name". isNonLocal sounds about right but I'm not sure the terminology is familiar there. isAliased kind of works, but is not accurate about the closed-over part. Maybe maybeImplicitlyAccessed or the like?

In addition, or alternatively, it would be good to comment those functions on exactly what they do mean.

3. Does trace-tests exercise the main paths through incarg et al? If not, please add some test cases for those.

On that note, it might be a good idea to add some tests for the major kinds of implicit accesses we are trying to detect and deoptimize for.

Other than the test coverage, I think all these could be addressed as followup bugs if that is more convenient than updating this patch.
Attachment #475314 - Flags: review?(dmandelin) → review+
Depends on: 601109
No longer depends on: 601109
This patch bitrotted enough to warrant a rewrite. The new queue addresses most of Dave's comments, and the tracer incarg tests caught bugs along the way so I think we've got good coverage.

Since the frame depth will now include arguments and (callee, this), this patch removes frameDepth() to audit its callers and use the appropriate semantics for each.
Attachment #475314 - Attachment is obsolete: true
Attachment #488541 - Flags: review?(lw)
Changes in this patch:
 * Track callee in FrameState, always as an object.
 * Track |this| in FrameState, sometimes as an object.
 * Track arguments in FrameState.
 * Rewrite arginc & friends to be much simpler and saner.

On the microbenchmark, before -
Arg perf: 145
Var perf: 114

After -
Arg perf: 112
Var perf: 115

SunSpider shows maybe a 2ms win, v8 shows a 2.5% win, both -m only. v8 mostly seeps up in earley-boyer, deltablue, and richards. These do lots of constructing, and register allocation is much better with lots of writes to "this" in a row.
Attachment #488543 - Flags: review?(lw)
Attachment #488543 - Attachment description: part 2: change some invariants → part 2: track args, callee, this
Out-of-order comment on part 2: could you remove RematInfo::isConstant() since its not used?  (Otherwise, there is the latent problem that isConstant() implies inMemory())
Comment on attachment 488541 [details] [diff] [review]
part 1: remove FrameState::frameDepth()

>         OOL_STUBCALL_SLOTS(JS_FUNC_TO_DATA_PTR(void *, stubs::UncachedCall),
>-                           frame.frameDepth() + frameDepthAdjust);
>+                           frame.localSlots() + frameDepthAdjust);

Since 'slots' seems to usually mean frame.frameSlots(), perhaps the macro name could be changed to OOL_STUBCALL_LOCAL_SLOTS ?

>-                callIC.frameSize.initStatic(frame.frameDepth(), speculatedArgc - 1);
>+                callIC.frameSize.initStatic(frame.localSlots(), speculatedArgc - 1);

For regularity, could you also change the corresponding member of FrameSize from staticFrameDepth to staticLocalSlots?
Attachment #488541 - Flags: review?(lw) → review+
Comment on attachment 488543 [details] [diff] [review]
part 2: track args, callee, this

Looks good, only nits:

>+            frame.learnThisIsObject();


>+        // Before: 
>+        // After:  V
>+        frame.pushArg(slot);
>+        // Before: V
>+        // After:  1
>+        frame.push(Int32Value(amt));
>+        // Note, SUB will perform integer conversion for us.
>+        // Before: V
>+        // After:  N+1
>+        jsop_binary(JSOP_SUB, stubs::Sub);

I think the second after and first before should be "V 1" ?

Also, it might be a bit shorter to just have a single between each push/pop that just says what the current stack state is...

>-    args = entries;
>+    callee_ = entries;
>+    this_ = entries + 1;
>+    args = entries + 2;
>     locals = args + nargs;
>     spBase = locals + script->nfixed;
>     sp = spBase;

Not related to the patch, but its a bit disturbing when trying to see what's local vs. memvar to see mixed used of trailing _.  Perhaps they could be all _?

>+    if (!eval) {
>+        if (script->nslots) {
>+            closedVars = (JSPackedBool *)cursor;
>+            cursor += sizeof(JSPackedBool) * script->nslots;
>+        }
>+        if (!usesArguments && nargs) {
>+            closedArgs = (JSPackedBool *)cursor;
>+            cursor += sizeof(JSPackedBool) * nargs;
>+        }
>     }

Perhaps set to null in else path to make potential bugs unexploitable?

>+    uint32 indexOf(int32 depth) { return uint32((sp + depth) - entries); }
>+    uint32 indexOfFe(FrameEntry *fe) const { return uint32(fe - entries); }
>+    uint32 feLimit() { return script->nslots + nargs + 2; }

Perhaps add asserts that fe/depth are within feLimit() ?
Attachment #488543 - Flags: review?(lw) → review+
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.