Closed Bug 592965 Opened 14 years ago Closed 11 years ago

JM: Inline PutCallObject

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: dvander, Assigned: dvander)

References

Details

Attachments

(1 file, 3 obsolete files)

PutCallObject is fairly simple, but expensive. It copies all args and locals and does some dynamic tests. We statically known exactly which slots must be written, so we can specialize it in the return path of methods.

This doesn't do anything for SS, maybe 2% for v8-v4, but a lot for v8-v5. Before:

Richards: 7298
DeltaBlue: 2806
Crypto: 3177
RayTrace: 1811
EarleyBoyer: 2915
----
Score: 3215

After:

Richards: 7417
DeltaBlue: 2694
Crypto: 3760
RayTrace: 1769
EarleyBoyer: 4061
----
Score: 3519
This patch is also needed for further argument optimizations, on top of Luke's work.
Attachment #471437 - Flags: review?(dmandelin)
Attached patch part 3: inline PutCallObject (obsolete) — Splinter Review
not ready for review yet because deltablue and SS get slightly slower (SS by ~5ms).
Comment on attachment 471437 [details] [diff] [review]
part 2: mark which arguments escape

>+    if (!(cg->flags & TCF_FUN_CALLS_EVAL) && pn->pn_defn &&

Nit: break after every && (||) so as to avoid obscuring the clauses. This is common style when writing multiline conjunctions and disjunctions.

Nit: "ClosedOverName"? Perhaps not, thought I would moot it.

>       case TOK_ARGSBODY:
>-        ok = js_EmitTree(cx, cg, pn->last());
>+      {
>+        JSParseNode *pnlast = pn->last();
>+        for (JSParseNode *pn2 = pn->pn_head; pn2 != pnlast; pn2 = pn2->pn_next) {
>+            if (!BindNameToSlot(cx, cg, pn2))
>+                return JS_FALSE;
>+            if (JOF_OPTYPE(pn2->pn_op) == JOF_QARG &&
>+                !(cg->flags & TCF_FUN_USES_ARGUMENTS) &&

The middle condition is loop-invariant, so hoist it and make the outer block around this case's statements the |then| block of an if, ending before the ok = js_EmitTree(...) call.

>+++ b/js/src/jsinterp.cpp
>@@ -2625,6 +2625,7 @@ END_CASE(JSOP_TRACE)
> /* ADD_EMPTY_CASE is not used here as JSOP_LINENO_LENGTH == 3. */
> BEGIN_CASE(JSOP_LINENO)
> BEGIN_CASE(JSOP_DEFUPVAR)
>+BEGIN_CASE(JSOP_DEFUPARG)
> END_CASE(JSOP_DEFUPVAR)

Saying it with bytecode helps something (I don't see what, in this patch, though) but it occurs to me that script meta-data processed by native code might be a lot faster than prolog ops dispatched by some interpreter or compiled by some JIT.

But maybe I'm missing the motivation for emitting JSOP_DEFUP{VAR,ARG} prolog ops. I remember JSOP_DEFUPVAR but didn't think about the script meta-data alternative then.

/be
Comment on attachment 471437 [details] [diff] [review]
part 2: mark which arguments escape

(In reply to comment #4)

> The middle condition is loop-invariant

Whoops - just realized this condition shouldn't even be there.

> Saying it with bytecode helps something (I don't see what

It's a prerequisite to optimizations that rely on knowing which arguments can't be mutated by implicit calls. That it's expressed in bytecode as a nop (nothing emitted in JIT) is just a hack. We could hang this information off the script.
Attachment #471437 - Flags: review?(dmandelin)
Comment on attachment 471436 [details] [diff] [review]
part 1: mark scripts as using arguments

Okay, patch queue fail - up too late. Was getting the wrong baseline. This is a slight perf win (eliminates every PutCallObject call) but nowhere near as impossibly dramatic.

I'm going to file a separate bug for the DEFUPARG analysis, and move both that and DEFUPVAR into the script rather than a prolog op. That will block this and other arguments-related optimizations.
Attachment #471436 - Attachment is obsolete: true
Attachment #471436 - Flags: review?(dmandelin)
Attached patch fixSplinter Review
Improves v8-v5 earley-boyer by about 100 points on my machine, but it's probably not worth taking this. It only seems to gain about 5 nanoseconds per PutCallObject removal on a microbenchmark. Tabling for a rainy day.
Attachment #471437 - Attachment is obsolete: true
Attachment #471438 - Attachment is obsolete: true
is this still not worth taking?  100 points is currently approx 14% of the difference between us and the complete v8 score on awfy (729 ms diff)

it's also currently raining in Plattsburgh, NY
Seems to be an easy performance win, and getting a higher score will probably impress some kids.

We already had snow in Germany.
Just to clarify, in response to comment 8, 100 points on the V8 score is completely different than 100 ms on AWFY.  With that being said, we are currently far behind V8 on the Earley-Boyer test, so any improvement there (particularly if it is low-hanging fruit) may be worth the effort.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: