Closed
Bug 1186585
Opened 9 years ago
Closed 8 years ago
19.66% - 44.84% tps talos regression with e10s enabled and APZ disabled
Categories
(Testing :: Talos, defect, P1)
Testing
Talos
Tracking
(e10s+)
RESOLVED
FIXED
mozilla48
Tracking | Status | |
---|---|---|
e10s | + | --- |
People
(Reporter: mconley, Assigned: mconley)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(3 files)
425.16 KB,
image/png
|
Details | |
20.80 KB,
patch
|
Details | Diff | Splinter Review | |
12.61 KB,
application/zip
|
Details |
Our first tps reports came in: Linux: :( tps 22.2 +/- 1% (6#) [ +38.9%] 30.8 +/- 2% (6#) Linux64: :( tps 21.7 +/- 1% (6#) [ +31.7%] 28.6 +/- 2% (6#) Win7: tps 25.5 +/- 1% (6#) [ +0.9%] 25.7 +/- 2% (6#) WinXP: :( tps 23.1 +/- 1% (6#) [ +48.5%] 34.2 +/- 2% (6#) Win8: :) tps 25.7 +/- 1% (6#) [ -3.7%] 24.7 +/- 1% (6#) OSX10.10: :( tps 35.2 +/- 6% (6#) [ +28.2%] 45.1 +/- 3% (6#) Very interesting to see little to no regression on Windows 7, and what seems like a win on Windows 8...
Assignee | ||
Updated•9 years ago
|
Updated•9 years ago
|
Updated•9 years ago
|
Assignee: nobody → gwright
Priority: -- → P1
Comment 2•9 years ago
|
||
It's actually even worse now. Every platform now shows a regression with e10s, and the worst are showing a that e10s is around 100% slower than non-e10s. There's also a massive spike starting Sep 8th. http://graphs.mozilla.org/graph.html#tests=[[329,1,43],[329,1,35],[329,1,49],[329,1,31],[329,1,47],[329,1,45],[329,1,37],[329,1,25]]&sel=1440709325345,1448485325345&displayrange=90&datatype=geo
Flags: needinfo?(gwright)
Comment 3•9 years ago
|
||
pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5fe9ed3edd6811a662d40d05e37b0d66e9520d82&tochange=b23b2fa33a9dcda59dbbca1d157eca3c32c5b862
Comment 4•9 years ago
|
||
Nothing stick out to me in that range.
Assignee | ||
Comment 5•9 years ago
|
||
Here's the perfherder graph for mozilla-inbound around that day: https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=[mozilla-inbound,21552af6220f8727499e86b152ff30c82c79612f,1]&series=[mozilla-inbound,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&zoom=1441675624533.995,1441738517107.6924,9.565204122792125,107.39129107931386&selected=[mozilla-inbound,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,19618,13769245] The uptick occurs after 321303b7d428 on f1b54a9c9f9b, which is thankfully not a merge changeset. Seems to point squarely at bug 1107372.
Assignee | ||
Comment 6•9 years ago
|
||
gcp, any idea why this might have caused a massive regression here? And jmaher, do we know why this didn't set our alarms a-blazin'?
Flags: needinfo?(jmaher)
Flags: needinfo?(gpascutto)
Comment 7•9 years ago
|
||
So it looks like it's actually due to https://bugzilla.mozilla.org/show_bug.cgi?id=1188543, based on https://bugzilla.mozilla.org/show_bug.cgi?id=1188543#c23
Assignee | ||
Comment 8•9 years ago
|
||
Ah, that clears it up.
Flags: needinfo?(jmaher)
Flags: needinfo?(gpascutto)
Comment 9•9 years ago
|
||
I don't think the timing fits here? Also the merge range doesn't include a talos update that I can see.
Comment 10•9 years ago
|
||
The merge range included bug 1107372, which touched talos.json here: https://hg.mozilla.org/integration/mozilla-inbound/rev/f1b54a9c9f9b.
Comment 11•9 years ago
|
||
thanks :gwright, that is the correct bug that was filed and discussed for this specific test.
Blocks: e10s-rc
Comment 13•8 years ago
|
||
So I had a chance to profile this locally on OSX. One big difference is that with e10s the compositor spends way more time in IPDL::PLayerTransaction::RecvUpdateNoSwap (mostly in mozilla::glUploadImageDataToTexture and then moving bits for the GPU) and we're waiting for the compositor to finish. Do we update more in the e10s case or does it take more time compared to the non-e10s version? BTW just in case I turned APZ off, it helps a little but does not explain the majority of the regression. I also notices a difference in the e10s vs non-e10s test: waitForTabSwitchDone spins the event loop in the e10s case only, so for the else branch I've added a setTimeout(callback, 0) but that did not change much either. It's also quite shocking for me that in runTestHelper for the e10s case we spend 3.9% in calling win.gBrowser.selectedTab = tab; That is 1,5 times more than in the non-e10s case btw... The selectedTab setter sets a bunch of other xul properties fires events and we update the current browser as well and all that seem to add up.
Comment 14•8 years ago
|
||
Btw locally I experienced even worse numbers, and I'm afraid that's not only locally true :( https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,52da298b8b3be1e885e2a2abe278da8f22bfedaf,1]&series=[mozilla-inbound,fefdb0dbb74637ceb4d270d4cba12ef60d950e32,1]&series=[mozilla-inbound,4358212419708a19de74b37fdea38ffedc2cbf73,1]&series=[mozilla-inbound,4e248445ac2e8e68b87387487fbe5498a9c7a23d,1]
Comment 15•8 years ago
|
||
Another weird thing is that the content process spends quite a lot of time in PuppetWidget paint (~26%) which I don't see anywhere in the non-e10s case, and there are some IME related blocking IPC messages too (~3). e10s: https://cleopatra.io/#report=79a1c5ae646fd4de50f7419711bff7d1c567eddc&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A209099,%22end%22%3A222827%7D%5D&selection=0,5481,5482,5483,5484,5108,5485,8,9,10,11,12,13,14,15,16,17,20,21,22,23,24,25,26,9739 non-e10s: https://cleopatra.io/#report=a95c201cce7851e464edd6730816ad89ba6735ef&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A295491,%22end%22%3A320476%7D%5D&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,171
Assignee | ||
Comment 16•8 years ago
|
||
Couldn't sleep last night, so I poked at this a little bit for fun. A few observations: 1) APZ seems to be responsible for ~40ms of the regression on OS X. Here's me testing with it disabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=ad4d966f2367&newProject=try&newRevision=a2937311d493&framework=1 If I've done my late-night napkin math correctly, that drops what was a 104% regression on OS X down to about a ~20% regression. So with 40ms shaved off, at least against my try pushes here, we have ~30ms to account for. 2) Gabor, looking at the profiles you've created, I concur that IPDL::PLayerTransaction::RecvUpdateNoSwap is very suspicious. We should file a bug for that and get some graphics folks to try to explain what's going on there. Looking at your profiles, we spend only ~5ms inside RecvUpdateNoSwap for the non-e10s case, and ~18ms inside there for the e10s case. Rounding down a little, let's say that accounts for ~10ms of the remaining ~30ms, which means we need to account for ~20ms. I noticed that for e10s, getting profiles resulted in rather strange output from our try machines, so I did some fiddling, and came up with something maybe a bit better - it's not perfect, and not ready to land, but hopefully I can make it easier to get these soonish: non-e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/001e2b1a13be1ee222e2010eebcb8545433364952bd29db4327ed9f9612d9e93fe901f2754b267c3543b0f44e0ccbb9c5aeb2068e4d14e3d76410b016153875d#report=f3cfea5c1430df2caf21610e1ca8337751377346 e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/e5fed6ae1c3d3a0bc349b42a20bd63b99ab49efd6c8cb88804ab616f44968873879701c3afc9958c7f19061ff8871601864e2ef0fd8cba2f92b705a2b4a60217#report=9f968dd9ecd3552d49e3fcb7dbbeed03b790abab Actions from this: 1) I think we should file a bug for APZ causing an additional ~40ms of regression on OS X. 2) I think we should file a bug to get input from the graphics team on why IPDL::PLayerTransaction::RecvUpdateNoSwap is taking so long with e10s. 3) I think we should drill into the new profiles I generated (and the ones for Windows when they come in: https://treeherder.mozilla.org/#/jobs?repo=try&revision=641e6026a4bf&selectedJob=17273089) How does this sound, gabor?
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 17•8 years ago
|
||
Another thing that's hurting our TPS score - it looks like in at least one case, GC and CC are still occurring in the child, even though we've "started the stopwatch": http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/e5fed6ae1c3d3a0bc349b42a20bd63b99ab49efd6c8cb88804ab616f44968873879701c3afc9958c7f19061ff8871601864e2ef0fd8cba2f92b705a2b4a60217#report=9f968dd9ecd3552d49e3fcb7dbbeed03b790abab&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A84401,%22end%22%3A88536%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A84603,%22end%22%3A84930%7D%5D&selection=0,6747,6748,6749,6750,6258,6751,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,359,360,6804,6805,6805,25,35,36,37,38,39,6806,40,77,38,39,6807,40,77,38,52,53,54,98,99,100,101,102,103,104,105,106,107,111,913
Assignee | ||
Comment 18•8 years ago
|
||
(The other thing that's interesting in that last profile view is that the Layer Transaction is very short - similar to what we'd see with non-e10s. Here, I see RecvUpdateNoSwap takes only 5ms on the Compositor thread.
Comment 19•8 years ago
|
||
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #16) > 1) APZ seems to be responsible for ~40ms of the regression on OS X. For me it was about half of the regression locally, I only focused on the part where APZ is turned off. > 2) Gabor, looking at the profiles you've created, I concur that > IPDL::PLayerTransaction::RecvUpdateNoSwap is very suspicious. We should file > a bug for that and get some graphics folks to try to explain what's going on > there. Looking at your profiles, we spend only ~5ms inside RecvUpdateNoSwap > for the non-e10s case, and ~18ms inside there for the e10s case. Do you know anyone I could needinfo from the graphic team? Do you think it's related to the tab switch machinery in tabbrowser.xul? Jim told me that we have multiple tabs activated during the switch, can it happen that we update multiple tabs at once for some weird reason? > 1) I think we should file a bug for APZ causing an additional ~40ms of > regression on OS X. Bug 1251937 > 2) I think we should file a bug to get input from the graphics team on why > IPDL::PLayerTransaction::RecvUpdateNoSwap is taking so long with e10s. Bug 1251939, could you needinfo someone you think might be able to help out with this part? > 3) I think we should drill into the new profiles I generated (and the ones > for Windows when they come in: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=641e6026a4bf&selectedJob=17273089) > > How does this sound, gabor? Sounds like a plan :)
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 20•8 years ago
|
||
Hey gw280 - you might find this interesting: https://treeherder.allizom.org/perf.html#/e10s_comparesubtest?baseSignature=fe016968d213834efd424ca88680cfa7490b6c09&e10sSignature=5c199ff7bd97284c5f3820ba908f92275620cd8b It looks like loading aljazeera.net regressed by ~483% with e10s. That might be a useful subtest to look at - the signal sure looks strong enough.
Updated•8 years ago
|
Comment 21•8 years ago
|
||
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #20) > It looks like loading aljazeera.net regressed by ~483% with e10s. That might > be a useful subtest to look at - the signal sure looks strong enough. Might be related: Bug 1252822
Assignee | ||
Comment 22•8 years ago
|
||
Numbers with APZ disabled are in: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=a4929411c0aa&newProject=try&newRevision=850e39c8f668&framework=1&filter=TART&showOnlyImportant=0 Conclusions: 1) APZ causes a ~5.9% - ~31% tps regression with e10s enabled. This takes care of a good chunk of the regression across the board. 2) Doing some napkin math, I can re-write the e10s-only regressions (without APZ enabled) as: non-e10s (*) e10s Diff (%) ----------------------------------------------------- Linux64: 49.18 71.23 +44.84% OS X 10: 65.86 95.08 +44.37% Win 7 : 60.18 74.20 +23.30% Win 8 : 49.09 58.74 +19.66% With APZ disabled, the regression comes down quite a bit, but it's still far higher than the 5% threshold we've set on this. There's still more work to do here. So the next steps for this bug: 1) File a new bug for the APZ releated tps regressions and make sure the APZ team sees it. (*) I chose the worst case, since I have two sets of samples here given that disabling APZ has no effect on non-e10s builds.
Assignee | ||
Updated•8 years ago
|
Summary: 28.2% - 48.5% tps talos regression with e10s enabled → 19.66% - 44.84% tps talos regression with e10s enabled and APZ disabled
Comment 23•8 years ago
|
||
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #22) > 1) File a new bug for the APZ releated tps regressions and make sure the APZ > team sees it. > That's already tracked in bug 1233163 and we're working on it.
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #23) > (In reply to Mike Conley (:mconley) - Needinfo me! from comment #22) > > 1) File a new bug for the APZ releated tps regressions and make sure the APZ > > team sees it. > > > > That's already tracked in bug 1233163 and we're working on it. Ah, correct - that's bug 1216924. Thanks!
Comment 25•8 years ago
|
||
I typed this yesterday but it didn't get posted: Some tests are dominated by the 'IPDL::PBrowser::SendGetInputContext' sync message. This might be the best place to start on this bug IMO.
Here's a patch I've been trying out. It stops using all the tab switching logic and just paints the tab's background color if the content isn't ready. I think this would be a good starting point for tab screenshots. My idea would be to paint them inside RefLayerComposite::RenderLayer. You have to turn APZ off for this to work well because I didn't bother with displayport suppression. Also, there's no mechanism for for recording how long a tab switch takes.
Assignee | ||
Comment 27•8 years ago
|
||
Here are some up-to-date profiles: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6b2f0b427ec&selectedJob=18868727
Updated•8 years ago
|
Assignee: gwright → mconley
Assignee | ||
Comment 28•8 years ago
|
||
New stats put e10s way ahead of non-e10s on this benchmark. I think we're done here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Assignee | ||
Comment 29•8 years ago
|
||
bugnotes |
http://people.mozilla.org/~mconley2/bugnotes/bug-1186585.html
You need to log in
before you can comment on or make changes to this bug.
Description
•