Closed
Bug 1262203
Opened 10 years ago
Closed 10 years ago
7.54% tp5o responsiveness (windowsxp) regression on push dc32b86990d5 (Wed Mar 30 2016)
Categories
(Core :: JavaScript: GC, defect)
Core
JavaScript: GC
Tracking
()
RESOLVED
FIXED
mozilla48
| Tracking | Status | |
|---|---|---|
| firefox48 | --- | fixed |
People
(Reporter: jmaher, Assigned: jonco)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(2 files)
|
1.12 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
|
5.55 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
Talos has detected a Firefox performance regression from push dc32b86990d5:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=6c7e6cf636af
As author of one of the patches included in that push, we need your help to address this regression.
This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=689
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5
Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t tp5o-e10s[Windows XP] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data
(we suggest --rebuild 5 to be more confident in the results)
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a tp5o --e10s
Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Friday, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
| Reporter | ||
Comment 1•10 years ago
|
||
:terrence, I see that :jonco is on pto for another week- he authored all the patches in the push, and you had reviewed them all. Could you help weigh in here on why we are seeing the responsiveness regress on winxp e10s only?
I did many retriggers of this push vs the previous push on the tree and have a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=cbf86e5e944f&newProject=mozilla-inbound&newRevision=6c7e6cf636af&framework=1&filter=e10s
no other issues showing up that I can see.
If you need me to bisect it down on try, I would be happy to- I thought I would ask in case something obvious stands out.
Component: Untriaged → JavaScript: GC
Flags: needinfo?(terrence)
Product: Firefox → Core
Comment 2•10 years ago
|
||
I'm honestly pretty surprised to see any of this regressing anything.
6c7e6cf636af - Only adds code to DEBUG builds, so probably not implicated.
dbbe7c7e305d - Again only changes some assertions around, so also not implicated.
457e7d5b1136 - Traces one new edge in some shapes, but only for very large objects. This is a core GC invariant that we were violating and getting lucky on. This is required for compacting Shapes. The |table| member should be in the same cache line as |this|, so I doubt it's the regression.
24b56ce6d8db - Changes the order of the members in a frequently used GC data structure. This not the most frequently touched field and it should have improved the alignment characteristics of the more accessed fields. That said, I see a couple things we could be doing better here.
Flags: needinfo?(terrence)
| Reporter | ||
Comment 3•10 years ago
|
||
thanks Terrence- want me to push to try with/without 24b56ce6d8db and see if this changes at all? do you have a few things you want to try out to optimize 24b56ce6d8db slightly?
Comment 4•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #3)
> thanks Terrence- want me to push to try with/without 24b56ce6d8db and see if
> this changes at all? do you have a few things you want to try out to
> optimize 24b56ce6d8db slightly?
Yes, please!
It's sometimes still possible to see what might have caused a 20% regression, but on modern machines it's basically impossible to tell if something is going to cause a 5% regression. Maybe the compiler got confused and causes the pipeline to stall now; maybe we went slightly outside the bounds of intel's optimal caching strategy; maybe that slow divide is the culprit, but maybe it doesn't matter because we're already waiting on a prior load; maybe we changed the instruction's address and now our icache is shared with something that's also hot and we keep getting evicted. I've even heard of one case where just moving a block of code below its usage "won" a 15% speedup. There is just really no telling. At least not without vTune and a free afternoon.
Unless there's major algorithmic change in the patch *and* the regression is larger than 20%, it's probably going to be nearly impossible to track down the exact cause and even if you do, it's almost certainly going to be some aspect of the platform that is out of your control anyway.
Comment 5•10 years ago
|
||
Reordering the object initialization writes to be in order won us 2% on Early Boyer (which create ~2B objects). Maybe if this test is dominated by nursery GC's that tenure most things?
I seriously doubt this will do anything, but it's good hygiene in any case.
Attachment #8738245 -
Flags: review?(sphink)
Comment 6•10 years ago
|
||
Comment on attachment 8738245 [details] [diff] [review]
do_relocation_writes_in_order-v0.diff
Review of attachment 8738245 [details] [diff] [review]:
-----------------------------------------------------------------
Hopefully something is doing this automatically -- if not the compiler, then the hardware. But anything could happen, and you're right, this is better hygiene.
Attachment #8738245 -
Flags: review?(sphink) → review+
| Reporter | ||
Comment 7•10 years ago
|
||
3 pushes to try:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=3065885fc201&tochange=d6fdb966d939
1) baseline (tip of inbound)
2) baseline + backout of patch
3) baseline + above patch
I will probably get this data sorted out tomorrow morning.
Flags: needinfo?(jmaher)
| Reporter | ||
Comment 8•10 years ago
|
||
all 3 try pushes are not showing any improvement- they fall on the same range of the regressed responsiveness number.
I realized I didn't backout the right patch, so here are two backouts (1 each of the non debug only changes):
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=e52220ff7dc4&tochange=2097ad115305
a few hours and we should have results.
| Reporter | ||
Comment 9•10 years ago
|
||
backing out 457e7d5b1136 seems to fix this regression. this was the patch which most likely wasn't the suspect.
Flags: needinfo?(jmaher)
Comment 10•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #9)
> backing out 457e7d5b1136 seems to fix this regression. this was the patch
> which most likely wasn't the suspect.
Well, that's pretty unfortunate, as that's also the only patch fixing a real bug. :-(
Comment 11•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b548578a1980f2da63e4ca9c4b41b55f17be53de
Bug 1262203 - Do GC relocation writes in order; r=sfink
Comment 12•10 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
| Reporter | ||
Comment 13•10 years ago
|
||
:terrence, given the fact that the other patch from revision 457e7d5b1136 was the problem, do you know of anything else we can do here?
Flags: needinfo?(terrence)
Comment 14•10 years ago
|
||
I don't see anything obvious, maybe Jon will have some ideas.
Status: RESOLVED → REOPENED
Flags: needinfo?(terrence) → needinfo?(jcoppeard)
Resolution: FIXED → ---
| Assignee | ||
Updated•10 years ago
|
| Assignee | ||
Updated•10 years ago
|
Assignee: nobody → jcoppeard
| Assignee | ||
Comment 15•10 years ago
|
||
Here's a patch to skip tracing any shape table if we've arrived at a base shape by tracing a shape. I think this should do the trick and cut out most shape table tracing. Currently waiting for talos runs to confirm this. Terrence, hopefully this isn't too horrible!
Attachment #8740068 -
Flags: review?(terrence)
Comment 16•10 years ago
|
||
Comment on attachment 8740068 [details] [diff] [review]
bug1262203-skip-shape-table-tracing
Review of attachment 8740068 [details] [diff] [review]:
-----------------------------------------------------------------
Yeah, that's pretty reasonable.
Attachment #8740068 -
Flags: review?(terrence) → review+
Comment 17•10 years ago
|
||
Comment 18•10 years ago
|
||
| bugherder | ||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 19•10 years ago
|
||
verified this is fixed.
Updated•9 years ago
|
Version: unspecified → Trunk
You need to log in
before you can comment on or make changes to this bug.
Description
•