Closed Bug 903802 Opened 11 years ago Closed 11 years ago

Benchmark regressions: 7.2% Sunspider, 5% Octane and 3.6% Kraken

Categories

(Core :: JavaScript Engine, defect)

Not set
normal

Tracking


RESOLVED FIXED
mozilla27
Tracking Status
firefox25 --- unaffected
firefox26 + fixed
firefox27 --- fixed

People

(Reporter: nbp, Assigned: bhackett1024)

References

Details

(Keywords: perf, regression, Whiteboard: [c=benchmark p= s= u=])

Attachments

(3 files)

This regression is visible on AWFY [1, 2], and is not visible elsewhere.  It is possible that this is not only an FxOS issue but also a browser issue, as the tests are run in the browser application of the phone.

This bug is probably not filed in the correct component, but until we find the patch responsible for these regressions, I am filing it against the JS Engine.  Note that such a regression is treated like an orange on tbpl and will cause the patch to be backed out [3] unless it is fixed in time.

Note: Octane did not run on AWFY between 17:44 and 21:30, because Bug 903394 caused it to OOM/crash.

[1] http://arewefastyet.com/#machine=14
[2] http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=6a5a7b55c22a954f12f4c7e8e178706daa806fe7&tochange=29c341748a6b6358110446ea3c0466dfa799fb6c
[3] https://groups.google.com/forum/#!topic/mozilla.dev.tech.js-engine.internals/xq6uDKWU8yk
The most likely culprit is bug 895758, I suspect.

Is there something (presumably around bareword var access) that the JS engine deoptimizes when there is a proxy on the proto chain of the global?  That would certainly explain why shell tests are not showing a change...

If the issue is in fact bug 895758, I suspect this will in fact need to be fixed in the JS engine.
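For reference, the hot pattern these benchmarks exercise can be sketched as a bareword global variable in a tight loop (a stand-alone sketch; the names here are made up):

```javascript
// Sketch of the pattern the benchmarks stress: a bareword global read and
// written in a hot loop. When the engine knows the slot's type, each access
// can compile to a plain slot load/store; if type information for the
// global's properties is lost, every access must box/unbox the value instead.
var counter = 0;            // own data property of the global in a browser

function hot() {
  for (var i = 0; i < 1000000; i++)
    counter = counter + 1;  // bareword access to the global slot
}

hot();
// counter is now 1000000
```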
(In reply to Vacation until Aug 19.  Do not ask for review. from comment #1)
> Is there something (presumably around bareword var access) that the JS
> engine deoptimizes when there is a proxy on the proto chain of the global? 

We don't optimize proxy accesses. A proxy resolve method is hard to optimize from the JS point of view unless we can guarantee that it is pure and that it will not change.  This is the main reason why proxies are not optimized at all, except for the corner case of DOM proxies, where we made an *exception* in IonMonkey.

I guess using a proxy for the global will break the lookup of named variables, and we will have to walk the scope chain and then add a property on the wrapped object.  I know TI has some special rules for the __proto__ chain of the global object, and that changing them might break many optimizations, but I don't remember the details.

I am not sure if this answers your question.
It doesn't.

The global is not a proxy, of course: SpiderMonkey flat-out doesn't support that.  But there is now a Proxy on the proto chain of the global, as I said in comment 1.

This is a hard requirement: the behavior that we need here requires either a global as proxy or a proxy on its proto chain.  If the issue is that TI can't handle the latter, we need to fix it.  Note that this is a somewhat well-behaved proxy, so if TI just needs to make certain assumptions it might still be able to make them.
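To illustrate the "well-behaved proxy on the proto chain" shape (a sketch with made-up names, not the actual implementation):

```javascript
// An ordinary object with a proxy spliced into its proto chain, standing in
// for the global + proto-chain-proxy arrangement. Own-property *reads* on
// the child never invoke the proxy's traps, which is why TI could in
// principle still make assumptions about them.
let trapHits = 0;
const proxyOnChain = new Proxy({}, {
  get(target, name, receiver) {
    trapHits++;
    return Reflect.get(target, name, receiver);
  },
});

const obj = Object.create(proxyOnChain);
obj.x = 1;    // first assignment walks the chain, then creates an own property
obj.x;        // found on `obj` itself; the get trap does not run
obj.missing;  // not found on `obj`; the lookup reaches the proxy's get trap
```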

Also note that there used to be an object with a resolve hook where we now have a proxy, which seems like it would be just as bad.
OK, I can reproduce a 7% or so regression in browser (on Mac) from bug 895758.

I'll attach a file with links to test runs from the corresponding builds and a breakdown of the changes, but the upshot is:

3d:morph:           *1.25x as slow*  4.0ms +/- 8.4%   5.0ms +/- 6.7%      significant
bitops:bitwise-and: *2.37x as slow*  1.9ms +/- 11.9%  4.5ms +/- 8.4%      significant

and some across-the-board slowdowns in the "crypto" and "string" tests.

I guess starting with bitops-bitwise-and and figuring out what's up might be a good approach.
Blocks: 895758
Attached file The data
Looks like the entire bitwise-and benchmark is this:

  bitwiseAndValue = 4294967296;
  for (var i = 0; i < 600000; i++)
      bitwiseAndValue = bitwiseAndValue & i;

or so.

Here's what jit inspector has to say about the generated code in the old build:

---------------------------------------------------------------------------
Block #3 -> #4 -> #5 :: 59998904 hits
[InterruptCheck]
movabsq    $0x109dcd088, %r11
cmpl       $0x0, 0x0(%r11)
jne        ((628))
[OsiPoint]
[LoadSlotT]
movl       0x650(%rax), %edx
[CompareAndBranch]
cmpl       %ecx, %edx
jge        ((642))

Block #4 -> #3 :: 59998903 hits
[MoveGroup]
[LoadSlotT]
movl       0x660(%rax), %ebx
[BitOpI]
andl       %edx, %ebx
[StoreSlotT]
movl       %ebx, 0x660(%rax)
[AddI]
addl       $0x1, %edx
[StoreSlotT]
movl       %edx, 0x650(%rax)
[MoveGroup]
[Goto]
jmp        ((684))
##link     ((684)) jumps to ((594))
---------------------------------------------------------------------------

and here's what we get in the new build:

---------------------------------------------------------------------------
Block #3 -> #4 -> #5 :: 59998903 hits
[InterruptCheck]
movabsq    $0x109cd1088, %r11
cmpl       $0x0, 0x0(%r11)
jne        ((815))
[OsiPoint]
[LoadSlotV]
movq       0x650(%rax), %rcx
[Unbox]
movq       %rcx, %r11
shrq       $47, %r11
cmpl       $0x1fff1, %r11d
jne        ((842))
movl       %ecx, %ebx
[CompareAndBranch]
cmpl       %edx, %ebx
jge        ((852))

Block #4 -> #3 :: 59998902 hits
[MoveGroup]
[LoadSlotV]
movq       0x660(%rax), %rcx
[Unbox]
movq       %rcx, %r11
shrq       $47, %r11
cmpl       $0x1fff1, %r11d
jne        ((893))
movl       %ecx, %ebp
[BitOpI]
andl       %ebx, %ebp
[StoreSlotT]
jmp        ((902))
movq       0x660(%rax), %r11
shrq       $47, %r11
cmpl       $0x1fff5, %r11d
jb         ((926))
push       %rdx
leaq       0x660(%rax), %rdx
call       ((939))
pop        %rdx
##link     ((926)) jumps to ((940))
jmp        ((945))
.balign 8
##link     ((945)) jumps to ((952))
##link     ((902)) jumps to ((952))
movl       %ebp, 0x660(%rax)
movl       $0xfff88000, 0x664(%rax)
[AddI]
addl       $0x1, %ebx
[StoreSlotT]
jmp        ((976))
movq       0x650(%rax), %r11
shrq       $47, %r11
cmpl       $0x1fff5, %r11d
jb         ((1000))
push       %rdx
leaq       0x650(%rax), %rdx
call       ((1013))
pop        %rdx
##link     ((1000)) jumps to ((1014))
jmp        ((1019))
.balign 8
##link     ((1019)) jumps to ((1024))
##link     ((976)) jumps to ((1024))
movl       %ebx, 0x650(%rax)
movl       $0xfff88000, 0x654(%rax)
[MoveGroup]
[Goto]
jmp        ((1045))
##link     ((1045)) jumps to ((781))
---------------------------------------------------------------------------

We're clearly failing to optimize the own property on the global the way we used to: the old code uses typed, unboxed slot accesses (LoadSlotT/StoreSlotT), while the new code loads the boxed Value (LoadSlotV), performs an Unbox with a tag check on every read, and writes the type tag back on every store...
Flags: needinfo?(jdemooij)
OS: Gonk (Firefox OS) → All
Hardware: ARM → All
Looks like our browser test automation finally has enough of a baseline to call this a regression too: it's showing about a 10% regression on V8 and about 3% on Kraken, which is consistent with the numbers in this bug.
I will investigate this today; if it's something complicated we should consider backing out bug 895758 in the meantime though.
So when an object has a proxy on its proto chain, we deoptimize property accesses even if the object itself is not a proxy. Brian, can you think of an easy fix here?
Flags: needinfo?(bhackett1024)
Summary: AWFY FxOS regressions: 7.2% Sunspider, 5% Octane and 3.6% Kraken on Aug 9 between 18:40 - 20:02 → Benchmark regressions: 7.2% Sunspider, 5% Octane and 3.6% Kraken
Attached patch patchSplinter Review
The problem here is that proxies have unknown properties, and that gets propagated to whatever other objects have the proxy on their prototype chain.  Type information for an object is only required to describe its native properties, either own properties or those inherited via lookups from prototypes.

Given this, it would be OK if the proxy's type had empty rather than unknown properties.  Making that change could be problematic, as proxies used as prototypes sometimes have their prototypes mutated dynamically (or at least this used to be the case), which will mark the types as unknown anyway and do an expensive crawl of the type sets in the compartment.

An alternative is to loosen the definition of what the type set for a property represents, so that it only reflects properties inherited via cacheable lookups from prototypes.  We never cache lookups along proto chains containing a non-native object (and I don't see any reason why we ever would), so with this definition objects inheriting from proxies could have known types for properties appearing before the proxy, even if the proxy itself has unknown properties.  The attached patch makes this change and fixes the unknown property information when accessing global variables.
Assignee: general → bhackett1024
Attachment #790431 - Flags: review?(jdemooij)
Flags: needinfo?(jdemooij)
Flags: needinfo?(bhackett1024)
Keywords: perf
Keywords: regression
Comment on attachment 790431 [details] [diff] [review]
patch

Review of attachment 790431 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #790431 - Flags: review?(jdemooij) → review+
(In reply to Brian Hackett (:bhackett) from comment #12)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/2766d0ee65c5

AWFY acknowledged the change, and so far it seems to be even faster than it used to be.
https://hg.mozilla.org/mozilla-central/rev/2766d0ee65c5
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Brian, thank you for fixing this!

> We never cache lookups along proto chains containing a non-native object (and I don't
> see any reason why we ever would)

Ignoring DOM proxies for a second, since I assume those are already special-cased, the proto chain for the Window object looks like this:

global (Window) -> Window.prototype -> global scope polluter (proxy) -> EventTarget.prototype -> Object.prototype

The proxy in question there is the one bug 895758 added.  It has the nice property that it does not shadow things that are on its proto chain (just like some DOM proxies), so for example window.addEventListener will find EventTarget.prototype.addEventListener.  It might be good to cache that lookup in the future if it's not too hard to do.
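A sketch of what "does not shadow things that are on its proto chain" means in trap terms (a hypothetical handler and made-up names, not the real WindowNamedPropertiesHandler):

```javascript
// A forwarding, non-shadowing proxy: it only answers for names it actually
// supplies and lets every other lookup continue up the chain, the way the
// scope polluter lets bareword addEventListener reach
// EventTarget.prototype.addEventListener.
const base = {
  addEventListener() { return "from base"; },
};

const namedProps = { someNamedFrame: "#frame" };  // hypothetical named items

// The proxy's target inherits from `base`, so forwarding with Reflect
// naturally continues lookups past the proxy.
const polluter = new Proxy(Object.create(base), {
  has(target, name) {
    return (name in namedProps) || Reflect.has(target, name);
  },
  get(target, name, receiver) {
    if (name in namedProps) return namedProps[name];
    return Reflect.get(target, name, receiver);
  },
});

const globalLike = Object.create(polluter);
globalLike.addEventListener();  // "from base": resolved past the proxy
globalLike.someNamedFrame;      // "#frame": supplied by the proxy itself
```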

Also, there should be no more dynamic mutation of proto chains (apart from whatever pages manually do with settable __proto__) for DOM proxy objects, since all of those are now on WebIDL bindings and get the right prototypes created right off the bat... again except for the global scope polluter proxy, which gets spliced (once) into an existing proto chain.  Unfortunately, I can't say whether the same is true for various other proxies (and in particular for the various cross-compartment wrappers we have).

Do we need any followup bugs about any of that?
Flags: needinfo?(bhackett1024)
> so for example window.addEventListener will find EventTarget.prototype.addEventListener. 

More precisely, bareword addEventListener will.  window.addEventListener will go through the OuterWindow proxy, which is a different thing altogether.  Given that a bareword lookup of something on EventTarget.prototype is likely to be rare, it's probably not worth worrying about.
Backed out for causing frequent Windows XP timeouts in test_transitions_per_property.html in bug 906378.
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5fd139db781
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla26 → ---
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #17)
> Backed out for causing frequent Windows XP timeouts in
> test_transitions_per_property.html in bug 906378.

(In particular, see bug 906378 comment 176, and bug 906378 comment 180 through 185)
Keywords: perf
Blocks: 709452
Attached patch alternateSplinter Review
Here is an alternate patch that is more targeted and does not seem to cause the M5 timeouts associated with the previous patch:

https://tbpl.mozilla.org/?tree=Try&rev=d777ffea9d36

This just does for the scope polluter proxy what we already do for the outer window proxy, which is to give the object a singleton type and leave its type information empty.
Attachment #806938 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Whiteboard: [c= p= s= u=]
Comment on attachment 806938 [details] [diff] [review]
alternate

Review of attachment 806938 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/WindowNamedPropertiesHandler.cpp
@@ +168,5 @@
>    gsp = js::NewProxyObject(aCx, WindowNamedPropertiesHandler::getInstance(),
>                             JS::NullHandleValue, protoProto,
> +                           js::GetGlobalForObjectCrossCompartment(aProto),
> +                           js::ProxyNotCallable,
> +                           /* singleton = */ true);

Can you add a short comment here explaining why we need a singleton type for the GSP?
Attachment #806938 - Flags: review?(jdemooij) → review+
Comment on attachment 806938 [details] [diff] [review]
alternate

[Approval Request Comment]
Bug caused by (feature/regressing bug #): scope polluter proxies
User impact if declined: large regression on benchmarks and JS heavy websites
Testing completed (on m-c, etc.): on inbound
Risk to taking this patch (and alternatives if risky): low
Attachment #806938 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/9bd2a167dfa1
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Attachment #806938 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
is there data on the success of this solving the benchmark regressions? That or instructions how to find/generate them.  Thanks
Keywords: verifyme
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #26)
> is there data on the success of this solving the benchmark regressions? That
> or instructions how to find/generate them.  Thanks

You can see this regression and its fix in the graphs on http://arewefastyet.com/#machine=14 by dragging to zoom in to August 9 and August 15th on each graph.
(In reply to Brian Hackett (:bhackett) from comment #22)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9bd2a167dfa1

After investigating the regression of Bug 915167 in depth, I am coming to the conclusion that the second patch did not completely fix the regression on AWFY FxOS for the following SunSpider benchmarks:

- validate-input
- format-tofte
- tagcloud
- format-xparb
- cube

Brian, do you have any idea what could have been wrong in the second patch?

(In reply to Matt Brubeck (:mbrubeck) from comment #27)
> (In reply to [:tracy] Tracy Walker - QA Mentor from comment #26)
> > is there data on the success of this solving the benchmark regressions? That
> > or instructions how to find/generate them.  Thanks
> 
> You can see this regression and its fix in the graphs on
> http://arewefastyet.com/#machine=14 by dragging to zoom in to August 9 and
> August 15th on each graph.

For the first patch yes, but it got backed out and the results are not as conclusive for the second one.
Flags: needinfo?(bhackett1024)
Whiteboard: [c= p= s= u=] → [c=benchmark p= s= u=]
(In reply to Nicolas B. Pierron [:nbp] from comment #28)
> (In reply to Matt Brubeck (:mbrubeck) from comment #27)
> > (In reply to [:tracy] Tracy Walker - QA Mentor from comment #26)
> > > is there data on the success of this solving the benchmark regressions? That
> > > or instructions how to find/generate them.  Thanks
> > 
> > You can see this regression and its fix in the graphs on
> > http://arewefastyet.com/#machine=14 by dragging to zoom in to August 9 and
> > August 15th on each graph.
> 
> For the first patch yes, but it got backed out and the results are not as
> conclusive for the second one.

In this case, is there anything qa can help with here?
(In reply to Ioana Budnar, QA [:ioana] from comment #29)
> (In reply to Nicolas B. Pierron [:nbp] from comment #28)
> > For the first patch yes, but it got backed out and the results are not as
> > conclusive for the second one.
> 
> In this case, is there anything qa can help with here?

I do not think so; Brian is the best person to answer this question.  I have no idea what the details of the original slowdown were, and I did not analyze the second patch to find out whether any cases were missed.
Removing keyword per the above comments and no further feedback.
Keywords: verifyme
(In reply to Nicolas B. Pierron [:nbp] from comment #28)
> (In reply to Brian Hackett (:bhackett) from comment #22)
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/9bd2a167dfa1
> 
> After investigating the regression of Bug 915167 in depth, I am coming to
> the conclusion that the second patch did not completely fix the regression
> on AWFY FxOS for the following SunSpider benchmarks:
> 
> - validate-input
> - format-tofte
> - tagcloud
> - format-xparb
> - cube
> 
> Brian, do you have any idea what could have been wrong in the second patch?

I don't think there's anything wrong with the patch in rev 9bd2a167dfa1.  All of these tests are as good as they've ever been on FxOS.  Some seem to have improved for no discernible reason, others improved a lot with bug 932982, bizarrely.  Maybe 9bd2a167dfa1 interacted with the GC in some mysterious way.
Flags: needinfo?(bhackett1024)