Closed
Bug 1111715
Opened 9 years ago
Closed 9 years ago
3.47% OSX 10.6|win8 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jmaher, Unassigned)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(1 file)
1.00 KB,
patch
|
Details | Diff | Splinter Review |
here is a graph: http://graphs.mozilla.org/graph.html#tests=%5B%5B72,63,21%5D%5D&sel=1418318091000,1418490891000&displayrange=7&datatype=running I did some retriggers on tbpl: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&fromchange=84bb72174c24&tochange=0f7050eddd7d&jobname=Rev4%20MacOSX%20Snow%20Leopard%2010.6%20mozilla-inbound%20talos%20dromaeojs this is the changeset where we went from ~4300+ down to ~4200 https://hg.mozilla.org/integration/mozilla-inbound/rev/9e9daf256c68
Comment 1•9 years ago
|
||
JITs are now able to emit AVX instructions, but they need to check that AVX is supported by the processor. This was incorrectly made before, and this commit fixes AVX detection. Two possible explanations for this regression come to mind: - AVX was detected beforehands, while it shouldn't have been; however AVX instructions were correctly generated and used. Joel, are there any chances that the XSAVE / XRESTORE feature is disabled on the VMs / machines running the Talos suite tests? (might be an option in the VM config or in the machines' bios, sorry I can't really help more) Is this regression showing up only on MacOS X? - Feature detection is OS-dependent, and we shouldn't check for the XSAVE bit on some platforms. Hard to confirm.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 2•9 years ago
|
||
these are run on real hardware: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation I did see this in windows 7 also, I haven't seen it on osx, but there is a chance it did increase there- we coalesced a lot of jobs around this time so our osx numbers are light in that area. Here are the graphs for osx: http://graphs.mozilla.org/graph.html#tests=%5B%5B261,63,21%5D,%5B261,63,24%5D%5D&sel=1418152438766.4546,1418345562402.818,193103448.27586204,400000000&displayrange=7&datatype=running I really don't see a bump, if there is a bump it is really really small :)
Flags: needinfo?(jmaher)
Comment 3•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #2) > these are run on real hardware: > https://wiki.mozilla.org/Buildbot/Talos/ > Misc#Hardware_Profile_of_machines_used_in_automation > > I did see this in windows 7 also, I haven't seen it on osx I am confused: the bug title says OS X, and the platform Bugzilla field says OS X as well? Any chance you could give a look at the boot configuration data-store of the Windows machines and see if xsavedisable is set to 0? Apparently bcdedit is available from the Win command line, but I don't have more clues. More information at [0]. [0] http://msdn.microsoft.com/en-us/library/windows/hardware/ff542202%28v=vs.85%29.aspx
Flags: needinfo?(jmaher)
Reporter | ||
Comment 4•9 years ago
|
||
ah, sorry- I was looking at a different regression, here are the graphs of concern: http://graphs.mozilla.org/graph.html#tests=%5B%5B72,63,21%5D,%5B72,131,31%5D,%5B72,131,25%5D,%5B72,131,35%5D%5D&sel=1418275916042.371,1418446487756.1814&displayrange=7&datatype=running Win8 shows a drop along with osx 10.6. Linux64 and Win7 show a larger range of values (could be considered a drop, but the highs remain the same). let me get a loaner win8 machine a try it out.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 5•9 years ago
|
||
ok, looking on a win8 loaner slave, xsavedisable is not set; in fact this is the output from bcdedit: Windows Boot Manager -------------------- identifier {bootmgr} device partition=C: description Windows Boot Manager locale en-US inherit {globalsettings} integrityservices Enable default {current} resumeobject {59308483-772a-11e4-8cc1-0025909b6796} displayorder {current} toolsdisplayorder {memdiag} timeout 30 Windows Boot Loader ------------------- identifier {current} device partition=C: path \windows\system32\winload.exe description Windows 8 locale en-US inherit {bootloadersettings} recoverysequence {1d3e8abd-772b-11e4-b316-0025909b6797} integrityservices Enable recoveryenabled Yes allowedinmemorysettings 0x15000075 osdevice partition=C: systemroot \windows resumeobject {59308483-772a-11e4-8cc1-0025909b6796} nx OptIn bootmenupolicy Standard There is no xsavedisable and when I did: bcdedit /set xsavedisable 0 then bcdedit showed xsavedisable in the output. Likewise setting it to a value of '1' yielded a value of '1' in the default output. I wonder if this is '0' by default, it is hard to tell.
Comment 6•9 years ago
|
||
Would there be a way to run the Dromaeo CSS tests in two configurations, please? - one with |bcdedit /set xsavedisable 0| *before* running the tests. - one with |bcdedit /set xsavedisable 1| *before* running the tests. So we can be sure whether it's due to this parameter or not, and what's the initial default value.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 7•9 years ago
|
||
I ran dromaeo css on win8 with 3 scenarios: * default * xsavedisable 0 * xsavedisable 1 results are up here: https://docs.google.com/a/mozilla.com/spreadsheets/d/153G3zZYWh-2aXBn_wNeqKn9WqGQj0TNUa23AEimuwbI/edit?usp=sharing I really don't see any difference in the results.
Flags: needinfo?(jmaher)
Comment 8•9 years ago
|
||
Wow, this is weird. The only last idea i have is that as Dan added more tests that could prevent AVX, and these changes have been checked in in the meanwhile, this might have an impact on tests here. What i would try: - hg up -r 9e9daf256c68 - run the tests as described in comment 6 again - hg up -r 19edf4b9d338 - run tests again Unfortunately, I don't have any other ideas.
Reporter | ||
Comment 9•9 years ago
|
||
Great, I did some more work on this- I pushed some builds to try, and the results are summarized in the same spreadsheet as above: https://docs.google.com/a/mozilla.com/spreadsheets/d/153G3zZYWh-2aXBn_wNeqKn9WqGQj0TNUa23AEimuwbI/edit?usp=sharing you can see that 19edf4b9d338 has a higher value than 9e9daf256c68. For dromaeo_css, lower is better, so rev 19edf4b9d338 is an improvement.
Reporter | ||
Comment 10•9 years ago
|
||
for reference, here is the build for 19edf4b9d338: https://treeherder.mozilla.org/#/jobs?repo=try&revision=159e3d1f2154 and here is the build for 9e9daf256c68: https://treeherder.mozilla.org/#/jobs?repo=try&revision=291bfbad546c it is possible that the build configurations to try don't match 100% what we build for nightly.
Comment 11•9 years ago
|
||
It's unclear to me what's happening here. Maybe there were some other improvements between the two revisions that I've referred to in comment 8. Other than that, the second revision is *at least* as restrictive as the first revision, so results should be equivalent. 2 possible cases: - we're seeing AVX instructions being emitted, then this bug can be closed (the regression would have been introduced by these instructions *not* being emitted *anymore*. If it's not this, it's not related to this change.). - AVX instructions aren't emitted, while they were before rev 9e9daf256c68, and there's something weird going on. First, how did it work in the first place? Secondly, what did effectively deactivate AVX emitting? Here is a patch that printf's if AVX instructions are enabled / present or not. It will spam stdout with this information, so that will badly affect performance, but that's not what I want to test here. Right before 9e9daf256c68, I'd expect to see "enabled and present". Afterwards, that's what we want to see. Joel, any chance you could please test - before 9e9daf256c68 with this patch - after 9e9daf256c68 with this patch - before 19edf4b9d338 with this patch - after 19edf4b9d338 with this patch
Updated•9 years ago
|
Flags: needinfo?(jmaher)
Reporter | ||
Comment 12•9 years ago
|
||
patched builds: at 9e9daf256c68: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c8ab3a5c29cc before 9e9daf256c68: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a88f9c7479e4 at 19edf4b9d338: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5c0f5d80cc07 before 19edf4b9d338: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e97b71310403 in a few hours we will see the results run on buildbot infrastructure, we can then look at the raw logs and decipher what is useful
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(jmaher)
Summary: 3.47% OSX 10.6 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68 → 3.47% OSX 10.6|win8 Dromaeo CSS regression on inbound (v.37) Dec 12 from push 9e9daf256c68
Reporter | ||
Comment 13•9 years ago
|
||
on all 4 pushes, I see this: B1111715 avx is not present and enabled I suspect that isn't what we were expecting to see.
Comment 14•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #13) > on all 4 pushes, I see this: > B1111715 avx is not present and enabled > > I suspect that isn't what we were expecting to see. If that is so, then this isn't related to the changesets hereby declared guilty. Is there any way the bisection got wrong? Or did the regression disappear a few days after it was introduced, somehow? (I've seen it was the case in other bugs you've commented to)
Reporter | ||
Comment 15•9 years ago
|
||
looking at the graphs here, I see the issue is still there, no better no worse. It could be possible the wrong changeset was identified. Obviously we did our due diligence for this patch(set). Lets see how things look during the uplift today.
Comment 16•9 years ago
|
||
Has the fix in bug 1118235 had any effect on benchmarks, Joel? If so, it might be interesting to mention it in the bug there...
Flags: needinfo?(jmaher)
Reporter | ||
Comment 17•9 years ago
|
||
I don't see big changes here, although there are only a few data points since landing that
Flags: needinfo?(jmaher)
Reporter | ||
Comment 18•9 years ago
|
||
this has since shipped, any issues with me closing it?
Flags: needinfo?(benj)
Comment 19•9 years ago
|
||
The changeset that was identified as the cause of this regression enabled AVX instructions. However, bug 1118235 disabled these AVX instructions in our JITs. As a result, if the regression was caused by enabling AVX instructions, disabling them should have made the regression disappear. Now we should have more data points to look at after landing of bug 1118235, so did it make a difference? In any cases, I can't think of other things to do here.
Flags: needinfo?(benj) → needinfo?(jmaher)
Reporter | ||
Comment 20•9 years ago
|
||
there was a slight improvement back in January- it sounds like we fixed as much of this as we could! Thanks for doing what you could back in January!
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(jmaher)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•