Closed Bug 1111715 Opened 7 years ago Closed 7 years ago

3.47% OSX 10.6|win8 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(1 file)

JITs are now able to emit AVX instructions, but they need to check that AVX is supported by the processor. This was incorrectly made before, and this commit fixes AVX detection.

Two possible explanations for this regression come to mind:
- AVX was detected beforehands, while it shouldn't have been; however AVX instructions were correctly generated and used. Joel, are there any chances that the XSAVE / XRESTORE feature is disabled on the VMs / machines running the Talos suite tests? (might be an option in the VM config or in the machines' bios, sorry I can't really help more) Is this regression showing up only on MacOS X?
- Feature detection is OS-dependent, and we shouldn't check for the XSAVE bit on some platforms. Hard to confirm.
Flags: needinfo?(jmaher)
these are run on real hardware:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

I did see this in windows 7 also, I haven't seen it on osx, but there is a chance it did increase there- we coalesced a lot of jobs around this time so our osx numbers are light in that area.  Here are the graphs for osx:
http://graphs.mozilla.org/graph.html#tests=%5B%5B261,63,21%5D,%5B261,63,24%5D%5D&sel=1418152438766.4546,1418345562402.818,193103448.27586204,400000000&displayrange=7&datatype=running

I really don't see a bump, if there is a bump it is really really small :)
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #2)
> these are run on real hardware:
> https://wiki.mozilla.org/Buildbot/Talos/
> Misc#Hardware_Profile_of_machines_used_in_automation
> 
> I did see this in windows 7 also, I haven't seen it on osx

I am confused: the bug title says OS X, and the platform Bugzilla field says OS X as well?

Any chance you could give a look at the boot configuration data-store of the Windows machines and see if xsavedisable is set to 0? Apparently bcdedit is available from the Win command line, but I don't have more clues. More information at [0].

[0] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542202%28v=vs.85%29.aspx
Flags: needinfo?(jmaher)
ah, sorry- I was looking at a different regression, here are the graphs of concern:
http://graphs.mozilla.org/graph.html#tests=%5B%5B72,63,21%5D,%5B72,131,31%5D,%5B72,131,25%5D,%5B72,131,35%5D%5D&sel=1418275916042.371,1418446487756.1814&displayrange=7&datatype=running

Win8 shows a drop along with osx 10.6.  Linux64 and Win7 show a larger range of values (could be considered a drop, but the highs remain the same).

let me get a loaner win8 machine a try it out.
Flags: needinfo?(jmaher)
Depends on: 1112120
ok, looking on a win8 loaner slave, xsavedisable is not set; in fact this is the output from bcdedit:
Windows Boot Manager
--------------------
identifier              {bootmgr}
device                  partition=C:
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
integrityservices       Enable
default                 {current}
resumeobject            {59308483-772a-11e4-8cc1-0025909b6796}
displayorder            {current}
toolsdisplayorder       {memdiag}
timeout                 30

Windows Boot Loader
-------------------
identifier              {current}
device                  partition=C:
path                    \windows\system32\winload.exe
description             Windows 8
locale                  en-US
inherit                 {bootloadersettings}
recoverysequence        {1d3e8abd-772b-11e4-b316-0025909b6797}
integrityservices       Enable
recoveryenabled         Yes
allowedinmemorysettings 0x15000075
osdevice                partition=C:
systemroot              \windows
resumeobject            {59308483-772a-11e4-8cc1-0025909b6796}
nx                      OptIn
bootmenupolicy          Standard


There is no xsavedisable and when I did:
bcdedit /set xsavedisable 0

then bcdedit showed xsavedisable in the output.  Likewise setting it to a value of '1' yielded a value of '1' in the default output.  

I wonder if this is '0' by default, it is hard to tell.
Would there be a way to run the Dromaeo CSS tests in two configurations, please?

- one with |bcdedit /set xsavedisable 0| *before* running the tests.
- one with |bcdedit /set xsavedisable 1| *before* running the tests.

So we can be sure whether it's due to this parameter or not, and what's the initial default value.
Flags: needinfo?(jmaher)
I ran dromaeo css on win8 with 3 scenarios:
* default
* xsavedisable 0
* xsavedisable 1

results are up here:
https://docs.google.com/a/mozilla.com/spreadsheets/d/153G3zZYWh-2aXBn_wNeqKn9WqGQj0TNUa23AEimuwbI/edit?usp=sharing

I really don't see any difference in the results.
Flags: needinfo?(jmaher)
Wow, this is weird. The only last idea i have is that as Dan added more tests that could prevent AVX, and these changes have been checked in in the meanwhile, this might have an impact on tests here.

What i would try:
- hg up -r 9e9daf256c68
- run the tests as described in comment 6 again
- hg up -r 19edf4b9d338
- run tests again

Unfortunately, I don't have any other ideas.
Great, I did some more work on this- I pushed some builds to try, and the results are summarized in the same spreadsheet as above:
https://docs.google.com/a/mozilla.com/spreadsheets/d/153G3zZYWh-2aXBn_wNeqKn9WqGQj0TNUa23AEimuwbI/edit?usp=sharing

you can see that 19edf4b9d338 has a higher value than 9e9daf256c68.  For dromaeo_css, lower is better, so rev 19edf4b9d338 is an improvement.
for reference, here is the build for 19edf4b9d338:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=159e3d1f2154

and here is the build for 9e9daf256c68:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=291bfbad546c

it is possible that the build configurations to try don't match 100% what we build for nightly.
Attached patch check.patchSplinter Review
It's unclear to me what's happening here. Maybe there were some other improvements between the two revisions that I've referred to in comment 8. Other than that, the second revision is *at least* as restrictive as the first revision, so results should be equivalent.

2 possible cases:
- we're seeing AVX instructions being emitted, then this bug can be closed (the regression would have been introduced by these instructions *not* being emitted *anymore*. If it's not this, it's not related to this change.).
- AVX instructions aren't emitted, while they were before rev 9e9daf256c68, and there's something weird going on. First, how did it work in the first place? Secondly, what did effectively deactivate AVX emitting?

Here is a patch that printf's if AVX instructions are enabled / present or not. It will spam stdout with this information, so that will badly affect performance, but that's not what I want to test here. Right before 9e9daf256c68, I'd expect to see "enabled and present". Afterwards, that's what we want to see.

Joel, any chance you could please test
- before 9e9daf256c68 with this patch
- after 9e9daf256c68 with this patch
- before 19edf4b9d338 with this patch
- after 19edf4b9d338 with this patch
Flags: needinfo?(jmaher)
patched builds:
at 9e9daf256c68: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c8ab3a5c29cc
before 9e9daf256c68: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a88f9c7479e4

at 19edf4b9d338: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5c0f5d80cc07
before 19edf4b9d338: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e97b71310403

in a few hours we will see the results run on buildbot infrastructure, we can then look at the raw logs and decipher what is useful
Flags: needinfo?(jmaher)
Summary: 3.47% OSX 10.6 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68 → 3.47% OSX 10.6|win8 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68
on all 4 pushes, I see this:
B1111715 avx is not present and enabled

I suspect that isn't what we were expecting to see.
(In reply to Joel Maher (:jmaher) from comment #13)
> on all 4 pushes, I see this:
> B1111715 avx is not present and enabled
> 
> I suspect that isn't what we were expecting to see.

If that is so, then this isn't related to the changesets hereby declared guilty.  Is there any way the bisection got wrong? Or did the regression disappear a few days after it was introduced, somehow?  (I've seen it was the case in other bugs you've commented to)
looking at the graphs here, I see the issue is still there, no better no worse.  It could be possible the wrong changeset was identified.  Obviously we did our due diligence for this patch(set).  Lets see how things look during the uplift today.
Has the fix in bug 1118235 had any effect on benchmarks, Joel? If so, it might be interesting to mention it in the bug there...
Flags: needinfo?(jmaher)
I don't see big changes here, although there are only a few data points since landing that
Flags: needinfo?(jmaher)
this has since shipped, any issues with me closing it?
Flags: needinfo?(benj)
The changeset that was identified as the cause of this regression enabled AVX instructions. However, bug 1118235 disabled these AVX instructions in our JITs. As a result, if the regression was caused by enabling AVX instructions, disabling them should have made the regression disappear. Now we should have more data points to look at after landing of bug 1118235, so did it make a difference? In any cases, I can't think of other things to do here.
Flags: needinfo?(benj) → needinfo?(jmaher)
there was a slight improvement back in January- it sounds like we fixed as much of this as we could!  Thanks for doing what you could back in January!
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jmaher)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.