Closed
Bug 1369662
Opened 8 years ago
Closed 7 years ago
17.99% tabpaint (osx-10-10) regression on push e398680c483332f16c829adbddb29528c566c180 (Thu Jun 1 2017)
Categories
(Core :: DOM: Core & HTML, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
People
(Reporter: igoldan, Assigned: mconley)
References
Details
(Keywords: perf, regression, talos-regression)
Attachments
(1 file)
Talos has detected a Firefox performance regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=e398680c483332f16c829adbddb29528c566c180
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
18% tabpaint summary osx-10-10 opt e10s 53.74 -> 63.41
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6999
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•8 years ago
|
||
This regression appeared after a bigger patch, with mozilla-inbound changesets from 675c893a028d and up to e398680c4833, here: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7abafd0fead900aec67b0de342f71ec856f561cf&tochange=e398680c483332f16c829adbddb29528c566c180
The ones related to this issue seem to be Bug 558184, Bug 1363323 and Bug 1369140.
Could you please state which of the bugs affected this?
Flags: needinfo?(peter.vanderwoude)
Flags: needinfo?(ehsan)
Comment 2•8 years ago
|
||
pushed to try:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=3f053f6a785f9ac9f768e44cd37fd56600fb695d&tochange=44a141d40ff31168a2091f108fdedda0d3c6953f
of course I didn't annotate the commits well, so looking at:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&filter-searchStr=osx%20talos%20other&tochange=e398680c483332f16c829adbddb29528c566c180&fromchange=0f04ee5c2e1c916ae454c911bcead18a04927460
the pushes are in this order:
hg update 980375a7b029: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f053f6a785f9ac9f768e44cd37fd56600fb695d
hg update 9255719d469c: https://treeherder.mozilla.org/#/jobs?repo=try&revision=acce7fc1bad446bb03aa409e715173e32a496f6c
hg update 87c1327b918d: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc0b5284d8fd94d95cce1d491b36849c7d99dea6
hg update e398680c4833: https://treeherder.mozilla.org/#/jobs?repo=try&revision=44a141d40ff31168a2091f108fdedda0d3c6953f
Comment 3•8 years ago
|
||
mconley and I talked about this on IRC. Bug 1369140 _may_ be the culprit because it removed a layout flush which used to occur during tab opening, however I'd have expected that to make things faster not slower... However it's possible (and maybe even likely) that it has moved the timing of something in a way that is making this test complain.
Mike offered help looking into this.
Flags: needinfo?(ehsan) → needinfo?(mconley)
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
Bug 558184
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2ec31aacaf0569cbe1c08fc787963188c8947590&newProject=try&newRevision=6095abfded6d84b0337b42a233b4128336c4d26e&framework=1&filter=tabpaint&showOnlyImportant=0
55.96 -> 55.51
-0.80%
Bug 1363323
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6095abfded6d84b0337b42a233b4128336c4d26e&newProject=try&newRevision=673603c2c36e107062f484ee5a90b05594d2084d&framework=1&filter=tabpaint&showOnlyImportant=0
55.51 -> 54.51
-2.53%
Flags: needinfo?(peter.vanderwoude)
Comment 6•8 years ago
|
||
and a 3rd attempt:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=2c979be5a3b6acf76582e8c34cdfd340e9dc091f&tochange=212f61c16dfeaa783821abf2cca075ea6b257ed7
this time without --artifact
Assignee | ||
Comment 7•8 years ago
|
||
One thing to note is that the regression is almost 16ms, which is roughly 1 frame.
Current theory:
TL;DR: I think ehsan's patch made us run faster, which ironically makes us report worse results.
ehsan's patch makes focus changes a lot cheaper because we get to avoid flushing layout.
When the tabpaint test opens a new tab from the parent, the following things happen in this order:
1. The remote <xul:browser> is created
2. The browser is told to load the target URI
3. The tab for that browser is switched to, which involves some focus switching I believe - both before and after the TabParent is told to activate the child.
4. The target URI, after the load event fires, waits for the next MozAfterPaint event, which it uses to determine the delta between the tab was requested to open, and the time that the tab content was presented.
When step (3) happens, a race starts: we send a message to the TabChild on the main thread asking it to activate itself (and paint), and we also send a message along the ProcessHangMonitor thread asking it to interrupt JavaScript if any is running and _force_ paint.
The winner of that race will cause the MozAfterPaint event in (4) to run.
Here's where I think ehsan's patch changed things:
Because focus became a lot cheaper, I think the time between steps 2 through 3 occur faster now. Fast enough, in fact, that I suspect that the content process is busy enough loading the target page that the force paint message (which interrupts JS) is more likely to "win" the race.
I suspect that when we've interrupted JS, this is _before_ the target page has had the opportunity to set up the MozAfterPaint listener. This means that we force paint, MozAfterPaint fires, but nobody hears it. Then, the page loads as per normal, and then on the next tick we paint _again_, and this time MozAfterPaint hears it.
Whereas before ehsan's, I _think_ it's possible that the slow focus work gave the content process enought ime to set up the MozAfterPaint handler before the tab switch / force paint occurred.
That's my theory, anyhow. I've got a try build cooking at [1] to try to prove it.
Keeping needinfo so that I know to look at this try build when results are in.
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=946624498935b274aaab1940b4b04ad68dcb45ec
Comment 8•8 years ago
|
||
87c1327b918d is where the regression started which corresponds to :ehsan's patch from bug 1369140. It sounds like what :mconley is up to will be the right thing.
Comment 9•8 years ago
|
||
(In reply to Mike Conley (:mconley) from comment #7)
> Current theory:
Fascinating!
> Keeping needinfo so that I know to look at this try build when results are
> in.
That didn't build!
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #9)
> That didn't build!
How embarrassing! Re-pushed.
Updated•8 years ago
|
Component: Untriaged → DOM
Assignee | ||
Comment 11•7 years ago
|
||
Quick update here - I've got a potential fix here, but I'm trying to cross all t's, dot all i's (and ensuring this isn't just blind luck) before posting it for review.
I've got some Talos profiling builds cooking for confirmation.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8875384 -
Flags: review?(jmaher)
Assignee | ||
Comment 13•7 years ago
|
||
Got some math wrong - new patch coming up.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 16•7 years ago
|
||
mozreview-review |
Comment on attachment 8875384 [details]
Bug 1369662 - Make tabpaint talos test rely on MozAfterPaint paint timestamps as opposed to event processing timestamps.
https://reviewboard.mozilla.org/r/146818/#review150936
don't forget to sign the extension and add it to the patch.
Attachment #8875384 -
Flags: review?(jmaher) → review+
Comment hidden (mozreview-request) |
Assignee | ||
Comment 18•7 years ago
|
||
Garrr. Even with this test fix, ehsan's patch is apparently causing a regression. I'm now less confident that there's not a real regression here in our time to paint and upload the tab.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=55a453c8c048&newProject=try&newRevision=5100eb0dc0aa&framework=1&showOnlyImportant=0
What I need - desperately - is to get a profile from one of these Talos machines. Unfortunately, this has proven extremely difficult. Hopefully the information in bug 1370907 will allow me to figure out how to do it.
So stay tuned, more to do here...
Assignee | ||
Comment 19•7 years ago
|
||
Before anyone points out that the comparison link I posted shows that ehsan's patch causes a win, I'm comparing mozilla-central with a backout - so the _backout_ is what's showing the win. :/
Assignee | ||
Comment 20•7 years ago
|
||
I was _finally_ able to tease a (unfortunately unsymbolicated) profile out of our Talos automation.
With the patch backed out (no regression): https://perfht.ml/2rPSgnT
With the patch in (regression): https://perfht.ml/2rPQliX
Assignee | ||
Comment 21•7 years ago
|
||
Even though these profiles are unsymbolicated, I think I've found something useful.
Here is a zoomed in view of the profiles:
No regression (ehsan's patch backed out): https://perfht.ml/2r6AjEj
Regression (mozilla-central): https://perfht.ml/2r6e55C
This is the timeline view of these two profiles, and the thing I want to draw attention to is in the "No regression" profile. Notice that just before the 11.230 timestamp, there's a very short (0.005ms) rasterization. That's extremely short, but I think it does cause a MozAfterPaint event to fire.
That MozAfterPaint event appears to have a bounding rect with 0 height and 0 width. This actually sounds similar to a problem that ehsan was facing in a separate bug - bug 1343728. We have this mysterious early paint occurring, and by all accounts appears to be empty. But it's enough to fire a MozAfterPaint event, which the test detects.
This little extra paint is missing in the "regression" profile.
So what I think I'm getting at is that this test has been likely flawed from the start, and has not properly measured meaningful paint of content. I've pushed patches to try that wait for the first MozAfterPaint event that has a rect with some dimensions to it, both with and without ehsan's patch:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b80e1f83576e&newProject=try&newRevision=3534252d4864&framework=1&showOnlyImportant=0
The comparison shows _no difference_ taking that into account.
So my recommendation here is:
1) Update my patch in this bug to include ignoring MozAfterPaint events that have boundingClientRects with no height or width
2) File a follow-up bug to investigate and understand why that event was being fired, and why it seemed to only be an issue in this bug on OS X.
3) Leave bug 1369140 in the tree.
Flags: needinfo?(mconley)
Comment 22•7 years ago
|
||
do you think for all tests/platforms (at least in talos) we should do #1 (ignore MozAfterPaint events w/no boundingClientRects)
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #22)
> do you think for all tests/platforms (at least in talos) we should do #1
> (ignore MozAfterPaint events w/no boundingClientRects)
It might be a good idea, yes.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 26•7 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/132095a40bde
Make tabpaint talos test rely on MozAfterPaint paint timestamps as opposed to event processing timestamps. r=jmaher
Updated•7 years ago
|
Priority: -- → P1
Reporter | ||
Comment 27•7 years ago
|
||
This dimmed down the osx regression to half, and brought regressions on the other platforms.
== Change summary for alert #7141 (as of June 08 2017 17:31 UTC) ==
Regressions:
29% tabpaint summary linux64 pgo e10s 51.05 -> 66.09
27% tabpaint summary linux64 opt e10s 54.44 -> 69.11
25% tabpaint summary windows10-64 opt e10s56.45 -> 70.55
10% tabpaint summary osx-10-10 opt e10s 62.28 -> 68.73
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7141
Comment 28•7 years ago
|
||
well it actually added 10% onto the existing 18%. This is probably expected as we are measuring mozAfterPaint for something that is painting, not a blank event.
Comment 29•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 30•7 years ago
|
||
in summary:
* original 18% regresion on osx landed
* mconley changed the tabpaint test to measure proper mozAfterPaint events
* when that landed all platforms regressed by 25-30% except osx which regressed 10% (+ original 18% = 28%)
* now all platforms measure the right thing and they all have a similar change in results.
Updated•7 years ago
|
status-firefox54:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Updated•7 years ago
|
Assignee: nobody → mconley
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•