36.8% Regression on Massive on Dec 12, 2014

RESOLVED FIXED in Firefox 37

Status

()

RESOLVED FIXED
4 years ago
4 months ago

People

(Reporter: h4writer, Assigned: h4writer)

Tracking

({regression})

unspecified
mozilla38
x86_64
Mac OS X
regression
Points:
---

Firefox Tracking Flags

(firefox34 unaffected, firefox35 unaffected, firefox36 unaffected, firefox37+ fixed, firefox38 verified)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
http://arewefastyet.com/#machine=30&view=single&suite=massive&start=1417421034&end=1418948816
Note: don't look at the axis. It is currently mislabeled. On that graph higher is better.

http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f1f48ccb2d4e&tochange=48f4e72f94cb

I didn't find a specific changeset that could have caused this.
(Assignee)

Updated

4 years ago
Flags: needinfo?(luke)

Comment 1

4 years ago
Alon, I know you've been watching these numbers; have you seen a similar drop?  Hannes, is the harness scraping the final top-level number that stock Massive outputs in a normal opt browser build?  (As a feature request, it'd be pretty helpful to track the individual section scores (Main Thread Responsiveness, Throughput, Preparation, ...) as it would both help pin down what got worse.)

I don't really have time to bisect this (it requires building a browser), perhaps, if Alon also sees the regression, we can request QA bisection.  Because 36% is pretty significant.
Flags: needinfo?(luke) → needinfo?(azakai)
(Assignee)

Comment 2

4 years ago
(In reply to Luke Wagner [:luke] from comment #1)
> Alon, I know you've been watching these numbers; have you seen a similar
> drop?  Hannes, is the harness scraping the final top-level number that stock
> Massive outputs in a normal opt browser build?  (As a feature request, it'd
> be pretty helpful to track the individual section scores (Main Thread
> Responsiveness, Throughput, Preparation, ...) as it would both help pin down
> what got worse.)

No need for feature request ;).
http://www.arewefastyet.com/#machine=30&view=breakdown&suite=massive

I think most are here labeled correct, except for one subtest.

Comment 3

4 years ago
Ooh nice.  And, to confirm, these are scraped from a full opt browser running stock Massive?

Second question: is "up" still always "good"?  I see several benchmarks that go up significantly (massive-box2d-throughput, massive-box2d-variance, massive-lua-binarytrees) and one that goes down a bunch (massive-lua-scimark).
(Assignee)

Comment 4

4 years ago
(In reply to Luke Wagner [:luke] from comment #3)
> Ooh nice.  And, to confirm, these are scraped from a full opt browser
> running stock Massive?

Indeed.

Browser is from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/b2g-inbound-macosx64/
with:
- user_pref("dom.max_script_run_time", 0);
- env["JSGC_DISABLE_POISONING"] = "1";

The website is served through a http proxy which caches the original site and scrapes the results.

> Second question: is "up" still always "good"?  I see several benchmarks that
> go up significantly (massive-box2d-throughput, massive-box2d-variance,
> massive-lua-binarytrees) and one that goes down a bunch
> (massive-lua-scimark).

In the breakdown view the thumb of rule is "lower is better", except for massive-lua-scimark where "higher is better".
(Sorry but that benchmark is so annoying with giving results mixed between score and ms and I have no way to handle that on AWFY currently.)

Comment 5

4 years ago
So running Massive locally, I see throughput numbers that pretty much match the pre-Dec-12 numbers.  The magnitude of the regression makes me thing there is a problem in the testing harness.  Could you perhaps manually run the nightlies manually on the testing machine, verify the "successfully compiled asm.js code" console message, etc?

Comment 6

4 years ago
Alon: new question: do you think we could change lua-scimark to be lower-is-better?
(Assignee)

Comment 7

4 years ago
(In reply to Luke Wagner [:luke] from comment #5)
> So running Massive locally, I see throughput numbers that pretty much match
> the pre-Dec-12 numbers.  The magnitude of the regression makes me thing
> there is a problem in the testing harness.  Could you perhaps manually run
> the nightlies manually on the testing machine, verify the "successfully
> compiled asm.js code" console message, etc?

Sure

(In reply to Luke Wagner [:luke] from comment #6)
> Alon: new question: do you think we could change lua-scimark to be
> lower-is-better?

And if possible also provide the final score in "lower is better".
That would make it much easier for AWFY and to read results.
Flags: needinfo?(hv1989)

Comment 8

4 years ago
Flipping the final score would confuse people using Massive already, but we could add it as an option I suppose. Flipping scimark specifically seems less risky, but the inverse would be hard to interpret. However, in both cases - doesn't awfy already have the capability to flip axes, since it does that for Octane?
Flags: needinfo?(azakai)
(Assignee)

Comment 9

4 years ago
(In reply to Alon Zakai (:azakai) from comment #8)
> Flipping the final score would confuse people using Massive already, but we
> could add it as an option I suppose. Flipping scimark specifically seems
> less risky, but the inverse would be hard to interpret. However, in both
> cases - doesn't awfy already have the capability to flip axes, since it does
> that for Octane?

AWFY can flip between "score" and "ms", but only on a benchmark basis. So it gives final score and subtest score in the same direction. If we could have total Massive score as "ms" (even if it is only an option) and flip that scimark benchmark, it would be fixed for AWFY. And I would show everything as "ms" (lower is better)
Flags: needinfo?(hv1989)
Ok, now I understand.

I pushed a 1.1 update to Massive now, which uses 'higher is better' for all the individual results (and keeps the final score as 'higher is better'), so it is all consistent now. Let me know if you need any other changes.
(Assignee)

Comment 11

4 years ago
(In reply to Alon Zakai (:azakai) from comment #10)
> Ok, now I understand.
> 
> I pushed a 1.1 update to Massive now, which uses 'higher is better' for all
> the individual results (and keeps the final score as 'higher is better'), so
> it is all consistent now. Let me know if you need any other changes.

TYVM!!!


Manual bisecting gives:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=43f41a82aa63&tochange=54e57eca5929
Flags: needinfo?(sunfish)
(Assignee)

Updated

4 years ago
OS: Linux → Mac OS X
(Assignee)

Comment 12

4 years ago
[Tracking Requested - why for this release]:
Bug 1065339 which caused this regression on Massive landed in FF37
Blocks: 1065339
status-firefox34: --- → unaffected
status-firefox35: --- → unaffected
status-firefox36: --- → unaffected
status-firefox37: --- → affected
tracking-firefox37: --- → ?
(Assignee)

Comment 13

4 years ago
Given http://www.arewefastyet.com/#machine=30&view=breakdown&suite=massive

massive-box2d-throughput: 951% regression
massive-box2d-throughput-f32: 1798% regression
massive-box2d-variance: 1350% regression
massive-lua-binarytrees: 29% regression
massive-lua-scimark: 50.3% regression

I checked and they all still get odinmonkey compiled!
I'll investigate this when I get time, but if someone wants to do something quick in the meantime, it'd be useful to see what affect --no-avx has.
(Assignee)

Comment 15

4 years ago
Created attachment 8548462 [details] [diff] [review]
Disable AVX for now

My initial investigation didn't gave me a specific issue that should get fixed. 

So pending further investigation this patch disables the AVX feature. This is an easy patch to uplift since FF37 is affected.
Assignee: nobody → hv1989
Attachment #8548462 - Flags: review?(jdemooij)
(Assignee)

Comment 16

4 years ago
@Sunfish: When reading more into the Legacy encodings and AVX encodings I found two interesting things:

- I think you mentioned that switching between legacy encoding and AVX shouldn't have a penalty. Though intel is reporting it will: 
https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf 

- V8 is blacklisting AVX for MacOSX 10.9 and lower. Apparently not for 10.10. Note: AWFY is running Mac OSX 10.10.
https://www.mail-archive.com/v8-dev@googlegroups.com/msg120690.html

Comment 17

4 years ago
Wow, nice detective work Hannes; that certainly sounds like it could be the explanation of what went wrong here!   The V8 patch mentions ISR transitions and indeed the Intel pdf you linked to specifically calls out ISRs on page 2.
Comment on attachment 8548462 [details] [diff] [review]
Disable AVX for now

Review of attachment 8548462 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks a lot for investigating this!

::: js/src/shell/js.cpp
@@ +5749,5 @@
>                               "to test JIT codegen (no-op on platforms other than x86 and x64).")
>          || !op.addBoolOption('\0', "no-sse4", "Pretend CPU does not support SSE4 instructions"
>                               "to test JIT codegen (no-op on platforms other than x86 and x64).")
> +        || !op.addBoolOption('\0', "enable-avx", "AVX is disabled by default. Enable AVX. "
> +                             "(no-op on platforms other than x86 and x64).")

Removing the flag might break the fuzzers. Maybe keep it as a no-op or give gkw a heads-up?
Attachment #8548462 - Flags: review?(jdemooij) → review+
(In reply to Hannes Verschore [:h4writer] from comment #16)
> @Sunfish: When reading more into the Legacy encodings and AVX encodings I
> found two interesting things:
> 
> - I think you mentioned that switching between legacy encoding and AVX
> shouldn't have a penalty. Though intel is reporting it will: 
> https://software.intel.com/sites/default/files/m/d/4/1/d/8/
> 11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf 

The penalty is documented to happen when transitioning between 256-bit AVX instructions and legacy SSE instructions. We aren't using any 256-bit AVX instructions right now, so in theory we shouldn't get the penalty. However, there's enough uncertainty here that we should probably switch our default here to use all AVX in AVX mode, just so that we can stop wondering about it.

> - V8 is blacklisting AVX for MacOSX 10.9 and lower. Apparently not for
> 10.10. Note: AWFY is running Mac OSX 10.10.
> https://www.mail-archive.com/v8-dev@googlegroups.com/msg120690.html

This is useful information. However, if AWFY is running 10.10, then this doesn't explain the regression, right?
(Assignee)

Comment 20

4 years ago
(In reply to Dan Gohman [:sunfish] from comment #19)
> > - V8 is blacklisting AVX for MacOSX 10.9 and lower. Apparently not for
> > 10.10. Note: AWFY is running Mac OSX 10.10.
> > https://www.mail-archive.com/v8-dev@googlegroups.com/msg120690.html
> 
> This is useful information. However, if AWFY is running 10.10, then this
> doesn't explain the regression, right?

Yeah indeed. I've been wondering about that too. Though I'm running 10.10, I'm compiling with XCode for 10.9. I have no idea if that could cause this.

Another idea I have is that this slowdown isn't specific to Mac OS X, but also present on Linux/Windows, only we don't hit the case. The variance of the runs is very high. The avg on the affected system is 30ms (stable), but has a high variance. Some runs give 2ms, other 100ms. Also we don't see this regression on shell builds in Mac running Mac OS X.

(In reply to Dan Gohman [:sunfish] from comment #19)
> The penalty is documented to happen when transitioning between 256-bit AVX
> instructions and legacy SSE instructions. We aren't using any 256-bit AVX
> instructions right now, so in theory we shouldn't get the penalty. However,
> there's enough uncertainty here that we should probably switch our default
> here to use all AVX in AVX mode, just so that we can stop wondering about it.

Hmm. Okay. So the upper part of the 128bit register should always be zero? Could it be that we change it by accident with a function that also touches the upper bits?

So it is all still guess work.
https://hg.mozilla.org/mozilla-central/rev/5e6e959f0043
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38

Comment 23

4 years ago
Hannes, can you verify the regressions were fixed?  I tried looking on awfy, but I didn't see anything but I also don't know if I'm looking at the right thing.
(Assignee)

Comment 24

4 years ago
(In reply to Luke Wagner [:luke] from comment #23)
> Hannes, can you verify the regressions were fixed?  I tried looking on awfy,
> but I didn't see anything but I also don't know if I'm looking at the right
> thing.

Oh yeah I will verify. The browser benchmarks run every morning GMT. So tomorrow I'll verify and mark for uplift if everything is well.
(Assignee)

Comment 25

4 years ago
(In reply to Hannes Verschore [:h4writer] from comment #24)
> (In reply to Luke Wagner [:luke] from comment #23)
> > Hannes, can you verify the regressions were fixed?  I tried looking on awfy,
> > but I didn't see anything but I also don't know if I'm looking at the right
> > thing.
> 
> Oh yeah I will verify. The browser benchmarks run every morning GMT. So
> tomorrow I'll verify and mark for uplift if everything is well.

Confirmed fixed! Back to from 3880 to 7300.
Flags: needinfo?(sunfish)
(Assignee)

Updated

4 years ago
Duplicate of this bug: 1118237
(Assignee)

Updated

4 years ago
Duplicate of this bug: 1118238
(Assignee)

Comment 28

4 years ago
Comment on attachment 8548462 [details] [diff] [review]
Disable AVX for now

Approval Request Comment

[Feature/regressing bug #]: bug 1065339

[User impact if declined]: Huge performance loss on Massive benchmark on Mac OS X

[Describe test coverage new/current, TBPL]: 1 day TBPL. AreWeFastYet confirms this fixes the 89% regression

[Risks and why]: 
Low risk. This disables a feature. We used to run in the before configuration. So that should be tested quite extensive.

[String/UUID change made/needed]:
/
Attachment #8548462 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 29

4 years ago
(In reply to Alon Zakai (:azakai) from comment #10)
> I pushed a 1.1 update to Massive now, which uses 'higher is better' for all
> the individual results (and keeps the final score as 'higher is better'), so
> it is all consistent now. Let me know if you need any other changes.

Starting of tomorrow the AWFY will use v1.1. Thanks for this.
(Had to wait to confirm regression was fixed.)
status-firefox38: --- → verified
tracking-firefox37: ? → +
Comment on attachment 8548462 [details] [diff] [review]
Disable AVX for now

Aurora+
Attachment #8548462 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
There's a discrepancy between this bug and dup bug 1118238. Bug 1118238 has Firefox 36 marked as affected and this bug has it marked as unaffected. Can you confirm that 36 is unaffected?
Flags: needinfo?(hv1989)
(Assignee)

Comment 32

4 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #31)
> There's a discrepancy between this bug and dup bug 1118238. Bug 1118238 has
> Firefox 36 marked as affected and this bug has it marked as unaffected. Can
> you confirm that 36 is unaffected?

Good catch. My fault. Firefox 36 is indeed unaffected. Bug 1065339 has only landed in Firefox 37.
Flags: needinfo?(hv1989)
Dan, do you think it's worth trying to re-enable AVX at some point?
Flags: needinfo?(sunfish)
Keywords: regression
Possibly. We did see a few speedups [0]. And by now, stable AVX support in mainstream OS's ought to be more common, so in theory we can be more conservative on older OS's rather than trying to identify specific configurations that support it, making it simpler to decide when it's safe to enable. That said, I don't think it's urgent.

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1065339#c54
Flags: needinfo?(sunfish)
You need to log in before you can comment on or make changes to this bug.