Closed Bug 1049840 Opened 10 years ago Closed 10 years ago

Add generic GetProp stub to baseline for polymorphic sites

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: jschulte, Assigned: jschulte)

Details

Attachments

(2 files, 6 obsolete files)

Both a todo in BaselineIC.cpp and jandem in bug 965793, comment 4 suggested to replace the optimized stub chain for getprop with a generic stub, if MAX_OPTIMIZED_STUBS is reached.
Attached patch v1.patch (obsolete) — Splinter Review
Attachment #8468727 - Flags: feedback?(kvijayan)
So, on the attached micro-benchmark with ./js --baseline-eager --no-ion -b (average of 5 runs) I get:
pre-patch: 27,7562 ms
with patch: 21,1738 ms
-> 24 % improvement

On Octane-Typescript (again baseline-eager and no-ion, average of 10 runs) I get:
pre-patch: 4828.2
with patch: 4867,5
-> 0.8 % improvement
Comment on attachment 8468727 [details] [diff] [review]
v1.patch

Review of attachment 8468727 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at this patch, it seems that the "generic" getprop stub is almost identical to the fallback stub, except that the fallback stub does a bunch of extra checks to see if it can attach a new stub.

I am curious about the difference between this approach, and one where we just rely on the fallback stub.  Instead of clearing all the optimized stubs and attaching a new generic stub, we can just remove all the optimized stubs and set a "generified" flag on the fallback stub that tells it not to attach new optimized stubs.

The main reason I'm concerned about this is the cost overhead of a new VMWrapper for DoGetPropGeneric.  Each of the VMWrapper trampolines for every new native function being called with a vmCall uses up a ton of memory.  It was leading to slow start-up times on B2G.  We want to try to minimize these as much as possible.

Can you try the approach suggested here and see if it gives roughly equivalent results?  If we can avoid a new VMWrapper, that'd be a good thing.
Attachment #8468727 - Flags: feedback?(kvijayan)
Attached patch v2.patch (obsolete) — Splinter Review
Looks good, thanks. With same parameter like above I get:
microbenchmark
pre-patch: 26.8576 ms
with patch: 18.966 ms
-> 29 % improvement

typescript
pre-patch: 4806.7
with patch: 4884.3
-> 1.6 % improvement

One Question: Why do we monitor the result inside DoGetPropFallback. Isn't the typemonitor chain triggered anyway through masm(ICGetProp_Fallback::compiler::generateCode at least looks like this)?
Attachment #8468727 - Attachment is obsolete: true
Attachment #8470444 - Flags: feedback?(kvijayan)
Comment on attachment 8470444 [details] [diff] [review]
v2.patch

Review of attachment 8470444 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome!

About the Monitor in the fallback C++ code: Fallback stubs aren't monitored by the TypeMonitor IC chain.  They return directly to the caller, so fallback code has to manually update the relevant typeset from the C++ code, just as the interpreter does.
Attachment #8470444 - Flags: feedback?(kvijayan) → feedback+
Attached patch v3.patch (obsolete) — Splinter Review
Well, this didn't really work out, the Monitor-Call is apparently to costly.
I get the following results:
microbenchmark:
pre-patch: 26.0322 ms
with patch: 20.5875 ms
-> 21 % improvement
but...
octane-tyescript:
pre-patch: 4738.2
with patch: 4493.6
-> 5 % regression

Doesn't look like this is really an optimization, what do you think?
Attachment #8470444 - Attachment is obsolete: true
Flags: needinfo?(kvijayan)
Well, I think your original patch is the way to go then.  Thanks for trying out my suggestion anyway.
Flags: needinfo?(kvijayan)
Attached patch v4.patch (obsolete) — Splinter Review
Ok, if you think, that's worth the memory cost, let's use my first patch. Nevertheless I would also be fine with not doing anything about this, as the wins aren't too big, so it's up to you.

I also did some further testing on setting MAX_OPTIMIZED_STUBS to 16, as jandem proposed in bug 965793, comment 4 , and the results are promising:
only-baseline typescript:
stock: 4717.8
patch-8: 4758,7
patch-16: 4798

full octane with both jits:
stock: 18040.6
patch-8: 18036.8
patch-16: 18076.8
(all results are the average of 10 runs)

So I'm also attaching a v4 with MAX_OPTIMIZED_STUBS = 16. Your decision, if we pick one at all, and if so, which one :)
Attachment #8474233 - Attachment is obsolete: true
Attachment #8474754 - Flags: review?(kvijayan)
Attachment #8468727 - Attachment is obsolete: false
Attachment #8468727 - Flags: review?(kvijayan)
Comment on attachment 8474754 [details] [diff] [review]
v4.patch

Review of attachment 8474754 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  I want an additional ASSERT at the top of DoGetPropFallback, right before the call to |ComputeGetPropResult|.

We should never get to the fallback stub when the generic GetProp stub is attached.  So we should JS_ASSERT that if we're at the fallback stub, that no GetProp_Generic stub is attached:

  JS_ASSERT(!stub->hasStub(ICStub::GetProp_Generic));


r=me with comments addressed.

::: js/src/jit/BaselineIC.cpp
@@ +6534,5 @@
>          return false;
>  
>      if (stub->numOptimizedStubs() >= ICGetProp_Fallback::MAX_OPTIMIZED_STUBS) {
> +        // Discard all stubs in this IC and replace with generic getprop stub.
> +        for(ICStubIterator iter = stub->beginChain(); !iter.atEnd(); iter++) {

Nit: SpiderMonkey style is to not use braces on one-line conditionals.
Attachment #8474754 - Flags: review?(kvijayan) → review+
Attached patch v5.patch (obsolete) — Splinter Review
Thanks for the quick review. Could you please do a try-run for this, as I don't have the necessary commit access level?
Attachment #8468727 - Attachment is obsolete: true
Attachment #8474754 - Attachment is obsolete: true
Attachment #8468727 - Flags: review?(kvijayan)
Attachment #8474816 - Flags: review+
Flags: needinfo?(kvijayan)
Oh yes, seems like I have to keep the stack synced for the decompiler. Will work in this, once I'm back on my dev-machine instead of my smartphone :)
Ah!  Yeah.. just take a look at the PopValues(N) arguments passed to VMFunction trampolines in other places in BaselineIC.cpp.
Attached patch v6.patch (obsolete) — Splinter Review
PopValues looks like it's only useful for tailcalls. This approach seems to work. Again, if r+, a try-run would be great.
Attachment #8474816 - Attachment is obsolete: true
Attachment #8481947 - Flags: review?(kvijayan)
Comment on attachment 8481947 [details] [diff] [review]
v6.patch

Review of attachment 8481947 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Good call on the Stow/Unstow thing.  Doing try run now: https://tbpl.mozilla.org/?tree=Try&rev=a75c03a88589
Attachment #8481947 - Flags: review?(kvijayan) → review+
Attached patch v7.patchSplinter Review
Try-run in comment 15.
Attachment #8481947 - Attachment is obsolete: true
Attachment #8482878 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6aae588b39a7
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: