See Bug 1186585 Comment 16 (Another weird thing is that the content process spends quite a lot of time in PuppetWidget paint (~26%), which I don't see anywhere in the non-e10s case. Could someone briefly explain why that is?)
Hey BenWa, we've been profiling the tps talos test to try to improve the e10s performance on it. Here are some profiles we gathered:

non-e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/001e2b1a13be1ee222e2010eebcb8545433364952bd29db4327ed9f9612d9e93fe901f2754b267c3543b0f44e0ccbb9c5aeb2068e4d14e3d76410b016153875d#report=f3cfea5c1430df2caf21610e1ca8337751377346
e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/e5fed6ae1c3d3a0bc349b42a20bd63b99ab49efd6c8cb88804ab616f44968873879701c3afc9958c7f19061ff8871601864e2ef0fd8cba2f92b705a2b4a60217#report=9f968dd9ecd3552d49e3fcb7dbbeed03b790abab

One thing we noticed is that PLayerTransaction::RecvUpdateNoSwap on the compositor thread takes quite a bit longer with e10s enabled. Do you know why that would be?
We've discussed this a bit offline. RecvUpdateNoSwap is higher, but I don't think it accounts for the score difference, since RecvUpdateNoSwap is an async call and the compositor is on time. I'd like to find time to go through the profiles with mconley and look some more at what is going on in the main thread, since I think the answer is in these profiles.
I just had a look, and we could get a big score improvement by cutting down the time spent in 'IPDL::PBrowser::SendGetInputCont'. Some subtests are dominated by this cost.
Assignee: nobody → gwright
tracking-e10s: --- → +
Priority: -- → P1
I'd like to come back to this. Here are two recent profiles from my OS X machine with APZ disabled:

non-e10s: https://cleopatra.io/#report=7ec3fe2e552d8c2799eafa494d3317a5fb72e8d2&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A58229,%22end%22%3A59025%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A58337,%22end%22%3A58387%7D%5D&selection=0,7152,7153,7153,7154,7155,7156,7158,1317,1494,1495,1496,1497,7167,7168,7169,7170,7171,7172,7173,7174,7175,7176,7177,7178,7179,7180,7181,7182,7183,7184,7185,7186,1141
e10s: https://cleopatra.io/#report=39a2e259653bc181bf2790068697c8272c18be9e&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A55686,%22end%22%3A55906%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A55723,%22end%22%3A55777%7D%5D&selection=0,5245,5246,5246,5247,5248,5249,5251,26,27,5255,5256,5257,5258,5259,5260,5261

I've zoomed into two regions which I think are illustrative. Notice the last layer transaction before the test completes (the final composite that presents the loaded web content). In the non-e10s case, this takes 2-3ms; in the e10s case, we're spending something like 18ms doing what appears to be a memmove down in mozilla::gl::UploadImageDataToTexture. And until that layer transaction completes, we don't composite, and the test doesn't end. I have a reasonably high degree of certainty that this is our bottleneck.

BenWa: I seem to recall us discussing this one a few weeks back, and you convinced me in person that it wasn't an issue. I'm sorry to say that I've forgotten our discussion. Looking at these profiles, are you still certain we don't care about these long memmoves in the e10s case?
At least part of the problem here, according to gw280 / mstange / jrmuizel, is that we're page faulting due to new texture allocations on tab switch. This is because we've got a unique TexturePool per client, or something along those lines. I'm doing an experiment with a shared TexturePool to see how much that gains us.
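For anyone following along, the experiment above can be sketched roughly like this: one process-wide pool that hands previously-used buffers back out, so a tab switch reuses pages that have already been faulted in instead of paying the fault cost on fresh allocations. This is a minimal illustrative sketch only; the names (`TexturePool`, `Acquire`, `Release`) are hypothetical and not the actual Gecko texture API.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a single shared pool instead of one pool per client.
// Released buffers are cached and handed back out, keeping their pages warm.
class TexturePool {
public:
    static TexturePool& Shared() {
        // One pool for the whole process, shared by all clients.
        static TexturePool sInstance;
        return sInstance;
    }

    std::vector<uint8_t> Acquire(size_t aBytes) {
        // Reuse the first cached buffer that is large enough.
        for (auto it = mFree.begin(); it != mFree.end(); ++it) {
            if (it->capacity() >= aBytes) {
                std::vector<uint8_t> buf = std::move(*it);
                mFree.erase(it);
                buf.resize(aBytes);
                ++mHits;  // warm reuse: no fresh page faults
                return buf;
            }
        }
        ++mMisses;  // cold allocation: new pages, likely faults on first touch
        return std::vector<uint8_t>(aBytes);
    }

    void Release(std::vector<uint8_t> aBuf) {
        // Keep the allocation around for the next client instead of freeing it.
        mFree.push_back(std::move(aBuf));
    }

    size_t Hits() const { return mHits; }
    size_t Misses() const { return mMisses; }

private:
    std::vector<std::vector<uint8_t>> mFree;
    size_t mHits = 0;
    size_t mMisses = 0;
};
```

With per-client pools, every tab switch starts cold; with the shared pool, the second client's acquire is a hit against the first client's released buffer.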
We've looked at the profiles in person, and from Comment 5 it sounds like you have a lead. ni-