Closed Bug 1186585 Opened 9 years ago Closed 8 years ago

19.66% - 44.84% tps talos regression with e10s enabled and APZ disabled

Categories

(Testing :: Talos, defect, P1)

defect

Tracking

(e10s+)

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---

People

(Reporter: mconley, Assigned: mconley)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(3 files)

Our first tps reports came in:

Linux:
:( tps                   22.2 +/-  1% (6#)  [ +38.9%]      30.8 +/-  2% (6#)

Linux64:
:( tps                   21.7 +/-  1% (6#)  [ +31.7%]      28.6 +/-  2% (6#)

Win7:
   tps                   25.5 +/-  1% (6#)  [  +0.9%]      25.7 +/-  2% (6#)

WinXP:
:( tps                   23.1 +/-  1% (6#)  [ +48.5%]      34.2 +/-  2% (6#)

Win8:
:) tps                   25.7 +/-  1% (6#)  [  -3.7%]      24.7 +/-  1% (6#)

OSX10.10:
:( tps                   35.2 +/-  6% (6#)  [ +28.2%]      45.1 +/-  3% (6#)


Very interesting to see little to no regression on Windows 7, and what seems like a win on Windows 8...
Assignee: nobody → gwright
Priority: -- → P1
Needinfoing myself to get more up to date data
Flags: needinfo?(gwright)
It's actually even worse now. Every platform now shows a regression with e10s, and the worst are showing a that e10s is around 100% slower than non-e10s. There's also a massive spike starting Sep 8th.

http://graphs.mozilla.org/graph.html#tests=[[329,1,43],[329,1,35],[329,1,49],[329,1,31],[329,1,47],[329,1,45],[329,1,37],[329,1,25]]&sel=1440709325345,1448485325345&displayrange=90&datatype=geo
Flags: needinfo?(gwright)
Nothing stick out to me in that range.
Blocks: 1107372
gcp, any idea why this might have caused a massive regression here?

And jmaher, do we know why this didn't set our alarms a-blazin'?
Flags: needinfo?(jmaher)
Flags: needinfo?(gpascutto)
Ah, that clears it up.
Flags: needinfo?(jmaher)
Flags: needinfo?(gpascutto)
I don't think the timing fits here? Also the merge range doesn't include a talos update that I can see.
thanks :gwright, that is the correct bug that was filed and discussed for this specific test.
No longer blocks: 1107372
So I had a chance to profile this locally on OSX.

One big difference is that with e10s the compositor spends way more time in IPDL::PLayerTransaction::RecvUpdateNoSwap (mostly in mozilla::glUploadImageDataToTexture and then moving bits for the GPU) and we're waiting for the compositor to finish.

Do we update more in the e10s case or does it take more time compared to the non-e10s version?

BTW just in case I turned APZ off, it helps a little but does not explain the majority of the regression. I also notices a difference in the e10s vs non-e10s test: waitForTabSwitchDone spins the event loop in the e10s case only, so for the else branch I've added a setTimeout(callback, 0) but that did not change much either.

It's also quite shocking for me that in runTestHelper for the e10s case we spend 3.9% in calling win.gBrowser.selectedTab = tab; That is 1,5 times more than in the non-e10s case btw... The selectedTab setter sets a bunch of other xul properties fires events and we update the current browser as well and all that seem to add up.
Couldn't sleep last night, so I poked at this a little bit for fun.

A few observations:

1) APZ seems to be responsible for ~40ms of the regression on OS X. Here's me testing with it disabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=ad4d966f2367&newProject=try&newRevision=a2937311d493&framework=1

If I've done my late-night napkin math correctly, that drops what was a 104% regression on OS X down to about a ~20% regression.

So with 40ms shaved off, at least against my try pushes here, we have ~30ms to account for.

2) Gabor, looking at the profiles you've created, I concur that IPDL::PLayerTransaction::RecvUpdateNoSwap is very suspicious. We should file a bug for that and get some graphics folks to try to explain what's going on there. Looking at your profiles, we spend only ~5ms inside RecvUpdateNoSwap for the non-e10s case, and ~18ms inside there for the e10s case.

Rounding down a little, let's say that accounts for ~10ms of the remaining ~30ms, which means we need to account for ~20ms.


I noticed that for e10s, getting profiles resulted in rather strange output from our try machines, so I did some fiddling, and came up with something maybe a bit better - it's not perfect, and not ready to land, but hopefully I can make it easier to get these soonish:

non-e10s:

http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/001e2b1a13be1ee222e2010eebcb8545433364952bd29db4327ed9f9612d9e93fe901f2754b267c3543b0f44e0ccbb9c5aeb2068e4d14e3d76410b016153875d#report=f3cfea5c1430df2caf21610e1ca8337751377346

e10s:

http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/e5fed6ae1c3d3a0bc349b42a20bd63b99ab49efd6c8cb88804ab616f44968873879701c3afc9958c7f19061ff8871601864e2ef0fd8cba2f92b705a2b4a60217#report=9f968dd9ecd3552d49e3fcb7dbbeed03b790abab

Actions from this:

1) I think we should file a bug for APZ causing an additional ~40ms of regression on OS X.
2) I think we should file a bug to get input from the graphics team on why IPDL::PLayerTransaction::RecvUpdateNoSwap is taking so long with e10s.
3) I think we should drill into the new profiles I generated (and the ones for Windows when they come in: https://treeherder.mozilla.org/#/jobs?repo=try&revision=641e6026a4bf&selectedJob=17273089)

How does this sound, gabor?
Flags: needinfo?(gkrizsanits)
(The other thing that's interesting in that last profile view is that the Layer Transaction is very short - similar to what we'd see with non-e10s. Here, I see RecvUpdateNoSwap takes only 5ms on the Compositor thread.
Depends on: 1251937
Depends on: 1251939
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #16)
> 1) APZ seems to be responsible for ~40ms of the regression on OS X.

For me it was about half of the regression locally, I only focused on the part where APZ is turned off.

> 2) Gabor, looking at the profiles you've created, I concur that
> IPDL::PLayerTransaction::RecvUpdateNoSwap is very suspicious. We should file
> a bug for that and get some graphics folks to try to explain what's going on
> there. Looking at your profiles, we spend only ~5ms inside RecvUpdateNoSwap
> for the non-e10s case, and ~18ms inside there for the e10s case.

Do you know anyone I could needinfo from the graphic team? Do you think it's related
to the tab switch machinery in tabbrowser.xul? Jim told me that we have multiple tabs
activated during the switch, can it happen that we update multiple tabs at once for
some weird reason?

> 1) I think we should file a bug for APZ causing an additional ~40ms of
> regression on OS X.

Bug 1251937

> 2) I think we should file a bug to get input from the graphics team on why
> IPDL::PLayerTransaction::RecvUpdateNoSwap is taking so long with e10s.

Bug 1251939, could you needinfo someone you think might be able to help out with this part?

> 3) I think we should drill into the new profiles I generated (and the ones
> for Windows when they come in:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=641e6026a4bf&selectedJob=17273089)
> 
> How does this sound, gabor?

Sounds like a plan :)
Flags: needinfo?(gkrizsanits)
Hey gw280 - you might find this interesting:

https://treeherder.allizom.org/perf.html#/e10s_comparesubtest?baseSignature=fe016968d213834efd424ca88680cfa7490b6c09&e10sSignature=5c199ff7bd97284c5f3820ba908f92275620cd8b

It looks like loading aljazeera.net regressed by ~483% with e10s. That might be a useful subtest to look at - the signal sure looks strong enough.
Blocks: 1251547
Keywords: perf
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #20)
> It looks like loading aljazeera.net regressed by ~483% with e10s. That might
> be a useful subtest to look at - the signal sure looks strong enough.

Might be related: Bug 1252822
Numbers with APZ disabled are in:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=a4929411c0aa&newProject=try&newRevision=850e39c8f668&framework=1&filter=TART&showOnlyImportant=0

Conclusions:

1) APZ causes a ~5.9% - ~31% tps regression with e10s enabled. This takes care of a good chunk of the regression across the board.

2) Doing some napkin math, I can re-write the e10s-only regressions (without APZ enabled) as:

           non-e10s (*)     e10s           Diff (%)
-----------------------------------------------------
Linux64:    49.18          71.23            +44.84%
OS X 10:    65.86          95.08            +44.37%
Win 7  :    60.18          74.20            +23.30%
Win 8  :    49.09          58.74            +19.66%

With APZ disabled, the regression comes down quite a bit, but it's still far higher than the 5% threshold we've set on this. There's still more work to do here.

So the next steps for this bug:

1) File a new bug for the APZ releated tps regressions and make sure the APZ team sees it.

(*) I chose the worst case, since I have two sets of samples here given that disabling APZ has no effect on non-e10s builds.
Summary: 28.2% - 48.5% tps talos regression with e10s enabled → 19.66% - 44.84% tps talos regression with e10s enabled and APZ disabled
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #22)
> 1) File a new bug for the APZ releated tps regressions and make sure the APZ
> team sees it.
> 

That's already tracked in bug 1233163 and we're working on it.
(In reply to Benoit Girard (:BenWa) from comment #23)
> (In reply to Mike Conley (:mconley) - Needinfo me! from comment #22)
> > 1) File a new bug for the APZ releated tps regressions and make sure the APZ
> > team sees it.
> > 
> 
> That's already tracked in bug 1233163 and we're working on it.

Ah, correct - that's bug 1216924. Thanks!
I typed this yesterday but it didn't get posted:

Some tests are dominated by the 'IPDL::PBrowser::SendGetInputContext' sync message. This might be the best place to start on this bug IMO.
Attached patch sketchSplinter Review
Here's a patch I've been trying out. It stops using all the tab switching logic and just paints the tab's background color if the content isn't ready. I think this would be a good starting point for tab screenshots. My idea would be to paint them inside RefLayerComposite::RenderLayer.

You have to turn APZ off for this to work well because I didn't bother with displayport suppression. Also, there's no mechanism for for recording how long a tab switch takes.
Depends on: 1253961
Depends on: 1261152
Depends on: 1262283
Assignee: gwright → mconley
Depends on: 1264409
New stats put e10s way ahead of non-e10s on this benchmark. I think we're done here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: