Closed
Bug 1331988
Opened 8 years ago
Closed 8 years ago
10.74% tabpaint (linux64) regression on push 8d8e6116ed6d (Tue Jan 17 2017)
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox50 | --- | unaffected |
firefox51 | --- | unaffected |
firefox52 | --- | unaffected |
firefox53 | --- | fixed |
firefox54 | --- | fixed |
People
(Reporter: jmaher, Assigned: kats)
References
Details
(Keywords: perf, regression, talos-regression)
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
dvander
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
Talos has detected a Firefox performance regression from push 8d8e6116ed6d. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
11% tabpaint summary linux64 opt 74.05 -> 82.01
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=4831
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Updated•8 years ago
|
Component: Untriaged → DOM: Content Processes
Product: Firefox → Core
Reporter | ||
Comment 1•8 years ago
|
||
this seems to be the tabpaint-from-parent:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=autoland&originalRevision=5b6e3ddaa0894dccf93c381eae4d83b74f29818e&newProject=autoland&newRevision=8d8e6116ed6ddbd99a528fcd8911e559336b938f&originalSignature=8215880edabb4c696cbb8067130ae5b226570742&newSignature=8215880edabb4c696cbb8067130ae5b226570742&framework=1
despite the alert not indicating other platforms, it looks like win8 and osx are both seeing changes that are regressions:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bautoland,8215880edabb4c696cbb8067130ae5b226570742,1,1%5D&series=%5Bautoland,3f8bffdf0837ec44f65347af994abe3b0512e014,1,1%5D&series=%5Bautoland,9271d84f3b0efdd755e9d36244506e0dce3a9483,1,1%5D
:kats, as the patch author, can you take a look at this and comment as to what we can do to explain this or work towards fixing it?
Flags: needinfo?(bugmail)
Assignee | ||
Comment 2•8 years ago
|
||
I'll bisect the patchset to see whether it came from the first half or the second half. If it's the first half I don't think there's much we can do. The second we might be able to improve.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
Assignee | ||
Comment 3•8 years ago
|
||
Assignee | ||
Comment 4•8 years ago
|
||
I ended up doing a series of try pushes:
Parts 1-2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43f11336e0c95924a49d06efb0dd6f9505d2c75f
Parts 1-3: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fdb790d9e3f2b2e337309706af66594164d8969
Parts 1-4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f86817b4dfbb94786d15ee81c7dbd5bbb36a3f1
Parts 1-5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ac04fd8179e2483b139b3f5c7b08f3306fe1b14
From the windows numbers (I didn't bother manually adding the linux jobs) it seems clear that the regression comes from part 4. What part 4 is doing is sending the InitRenderingState IPC message from TabParent to TabChild nearer to the tab creation, which in theory could delay the paint a little bit. However it's an async IPC message so it really shouldn't block the TabParent by anything noticeable. It's more likely that the processing of the InitRenderingState on the child side is slowing things down, because it makes a sync IPC to the compositor in order to get the compositor options.
If my theory is correct we should be able to mitigate this by getting the compositor options into the parent side as part of existing communications with the compositor, and send it to the child as part of InitRenderingState. That way the child wouldn't need to do the sync IPC. I'm writing a patch to see if that works.
Updated•8 years ago
|
status-firefox50:
--- → unaffected
status-firefox51:
--- → unaffected
status-firefox52:
--- → unaffected
Comment 5•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> I ended up doing a series of try pushes:
>
> Parts 1-2:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=43f11336e0c95924a49d06efb0dd6f9505d2c75f
> Parts 1-3:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=9fdb790d9e3f2b2e337309706af66594164d8969
> Parts 1-4:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=4f86817b4dfbb94786d15ee81c7dbd5bbb36a3f1
> Parts 1-5:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=2ac04fd8179e2483b139b3f5c7b08f3306fe1b14
>
> From the windows numbers (I didn't bother manually adding the linux jobs) it
> seems clear that the regression comes from part 4. What part 4 is doing is
> sending the InitRenderingState IPC message from TabParent to TabChild nearer
> to the tab creation, which in theory could delay the paint a little bit.
> However it's an async IPC message so it really shouldn't block the TabParent
> by anything noticeable. It's more likely that the processing of the
> InitRenderingState on the child side is slowing things down, because it
> makes a sync IPC to the compositor in order to get the compositor options.
>
> If my theory is correct we should be able to mitigate this by getting the
> compositor options into the parent side as part of existing communications
> with the compositor, and send it to the child as part of InitRenderingState.
> That way the child wouldn't need to do the sync IPC. I'm writing a patch to
> see if that works.
Does any of this work affecting the startup time of a content process? We don't
have a Talos test to measure that yet so I'm just curious if we should worry about
that on a theoretic level. It is very important for e10s-multi not to regress that
number any further. As far as I understand these patches are more about about changing
the order of things when a new tab is opened (with possibly a new content process startup)
and does not really add any significant extra work. Just want to double check it...
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #5)
>
> Does any of this work affecting the startup time of a content process? We
> don't
> have a Talos test to measure that yet so I'm just curious if we should worry
> about
> that on a theoretic level.
It depends on what you mean by startup time. The original patches shift a little bit of code that was happening when the tab was first "shown" to earlier, when the tab is being created. It's possible that if a content process is starting up with a tab that's not yet "shown" the bit of shifted code will slow it down slightly. Most of the slowdown would come, again, from the sync IPC call which I'm trying to mitigate by piggybacking it on other pre-existing IPC messages.
Assignee | ||
Comment 7•8 years ago
|
||
Also here's a try push with my first attempt at fixing this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1a022ed5d533915662861c1292b1078eaf94cbc - it seems to show some improvement but maybe not fully restored? I'll do more retriggers but I have a pretty full day today with interviews and meetings so I'm not sure how much progress I'll make before tomorrow.
Assignee | ||
Comment 8•8 years ago
|
||
Update: I'm having a very hard time actually getting talos numbers from try to see if my patch helps or not. I ran into two bugs already (see dependencies) and the backlog of jobs on the linux machines is very high and so it takes a while to get anywhere.
Assignee | ||
Comment 9•8 years ago
|
||
I reproduced the regression on my local Linux machine and chased it down. It looks like what is happening is that because we run InitRendering sooner, TabChild::mRemoteFrame is St earlier than it would be otherwise. And between the "early" and "late" points, there is a call to TabChild::GetDPI. That call does an early return if mRemoteFrame is not set, but does a sync IPC call if it *is* set. This sync IPC accounts for most of the extra time in my local setup. We can probably mitigate this by sending the DPI value as part of InitRendering instead of requiring the sync IPC call.
Assignee | ||
Comment 10•8 years ago
|
||
Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=12264cc38e6c47fcc0244582c9c187e6a45f57ba confirms the above by fixing the regression. However I had second thoughts about the approach I used. I think it might be better to have GetDPI and GetDefaultScale continue to return -1 until RecvShow is called. That will maintain the same behaviour as before while also fixing the regression. It also avoids duplicating the sending of the dpi/default scale values over, since they are sent over as part of the ShowInfo struct in RecvShow already.
Assignee | ||
Comment 11•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcf2993f28aadc7afad9834150e2024c75f3dc45 shows the talos regression fixed on win8. there's too much of a linux talos backlog but i did a try push for correctness at https://treeherder.mozilla.org/#/jobs?repo=try&revision=376b8e7f0d4adc7efcf30ba2d72f2dffea55498d which is also green.
Comment hidden (mozreview-request) |
Comment 13•8 years ago
|
||
mozreview-review |
Comment on attachment 8829500 [details]
Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC.
https://reviewboard.mozilla.org/r/106586/#review107612
Attachment #8829500 -
Flags: review?(dvander) → review+
Comment 14•8 years ago
|
||
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d37cbfeaec72
Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC. r=dvander
Comment 15•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Reporter | ||
Comment 16•8 years ago
|
||
Thanks for fixing this :kats, I have verified this on the graphs.
should we uplift this to Aurora as the regression was merged there yesterday morning?
Assignee | ||
Comment 17•8 years ago
|
||
Comment on attachment 8829500 [details]
Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC.
Approval Request Comment
[Feature/Bug causing the regression]: bug 1331509
[User impact if declined]: slightly slower time to paint when a new tab is created from the parent process
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: joel verified the talos regression is fixed, see previous comment
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not particularly
[Why is the change risky/not risky?]: the change is fairly small and the code paths involved are well understood
[String changes made/needed]: none
Attachment #8829500 -
Flags: approval-mozilla-aurora?
Comment 18•8 years ago
|
||
Comment on attachment 8829500 [details]
Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC.
Fix a perf regression related to tabpaint. Aurora53+.
Attachment #8829500 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 19•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•