Closed Bug 1251939 Opened 8 years ago Closed 3 years ago

IPDL::PLayerTransaction::RecvUpdateNoSwap takes too long with e10s during TPS talos runs

Categories

(Core :: Graphics, defect, P3)

Tracking

RESOLVED WONTFIX
Tracking Status
e10s + ---

People

(Reporter: gkrizsanits, Assigned: gw280)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

See Bug 1186585 Comment 16 

(Another weird thing is that the content process spends quite a lot of time in PuppetWidget paint (~26%), which I don't see anywhere in the non-e10s case. Could someone briefly explain to me why that is the case?)
Blocks: 1186585
Whiteboard: [gfx-noted]
We've discussed this a bit offline. RecvUpdateNoSwap is higher, but I don't think it accounts for the score difference, since RecvUpdateNoSwap is an async call and the compositor is still on time.
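
To make the async point concrete, here is a minimal, self-contained sketch (plain C++ threads and a queue standing in for the IPC channel, not actual IPDL/IPC code; all names are illustrative) of why a slow handler for a fire-and-forget message never blocks the sender:

// Illustrative sketch only: the "content process" enqueues a transaction and
// returns immediately, so a slow receiver-side handler (like a long
// RecvUpdateNoSwap) doesn't stall the sender.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> gQueue;            // stands in for the IPC channel
std::mutex gMutex;
std::condition_variable gCv;

void SendUpdateNoSwapSketch(int aTxnId) {   // "content process" side
  {
    std::lock_guard<std::mutex> lock(gMutex);
    gQueue.push(aTxnId);
  }
  gCv.notify_one();                         // no reply awaited; returns right away
}

void CompositorLoopSketch() {               // "compositor" side
  for (int handled = 0; handled < 3; ++handled) {
    std::unique_lock<std::mutex> lock(gMutex);
    gCv.wait(lock, [] { return !gQueue.empty(); });
    int txn = gQueue.front();
    gQueue.pop();
    lock.unlock();
    // Pretend the handler is slow (~18ms); the sender never notices.
    std::this_thread::sleep_for(std::chrono::milliseconds(18));
    std::printf("handled transaction %d\n", txn);
  }
}

int main() {
  std::thread compositor(CompositorLoopSketch);
  for (int i = 0; i < 3; ++i) {
    SendUpdateNoSwapSketch(i);  // each call returns immediately
  }
  compositor.join();
  return 0;
}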

I'd like to find time to go through the profiles with mconley and look at what is going on in the main thread some more since I think the answer is in these profiles.
I just had a look, and we could get a big score improvement by cutting down the time spent in 'IPDL::PBrowser::SendGetInputCont'. Some subtests are dominated by this cost.
Flags: needinfo?(bgirard)
Assignee: nobody → gwright
Blocks: e10s-perf
tracking-e10s: --- → +
Priority: -- → P1
I'd like to come back to this.

Here are two recent profiles from my OS X machine with APZ disabled:

non-e10s:
https://cleopatra.io/#report=7ec3fe2e552d8c2799eafa494d3317a5fb72e8d2&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A58229,%22end%22%3A59025%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A58337,%22end%22%3A58387%7D%5D&selection=0,7152,7153,7153,7154,7155,7156,7158,1317,1494,1495,1496,1497,7167,7168,7169,7170,7171,7172,7173,7174,7175,7176,7177,7178,7179,7180,7181,7182,7183,7184,7185,7186,1141

e10s:
https://cleopatra.io/#report=39a2e259653bc181bf2790068697c8272c18be9e&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A55686,%22end%22%3A55906%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A55723,%22end%22%3A55777%7D%5D&selection=0,5245,5246,5246,5247,5248,5249,5251,26,27,5255,5256,5257,5258,5259,5260,5261

I've zoomed into two regions which I think are illustrative. Notice the last layer transaction before the test completes (the final composite that presents the loaded web content): in the non-e10s case it takes 2-3ms, while in the e10s case we're spending something like 18ms doing what appears to be a memmove down in mozilla::gl::UploadImageDataToTexture.

And until that layer transaction completes, we don't composite, and the test doesn't end. I have a reasonably high degree of certainty that this is our bottleneck.
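
For context on where such a memmove can come from: texture uploads commonly end up doing one big CPU copy when the source surface's row stride doesn't match the tightly-packed layout GL expects. A hedged sketch of that pattern (illustrative code, not the actual mozilla::gl::UploadImageDataToTexture implementation):

// Hedged sketch: when rows are aSrcStride bytes apart but GL expects them
// tightly packed, every row gets copied into a staging buffer before the
// actual glTexSubImage2D call. For large surfaces this per-row memcpy is the
// kind of bulk copy a profiler reports as memmove.
#include <cstdint>
#include <cstring>
#include <vector>

std::vector<uint8_t> RepackRgba8ForUpload(const uint8_t* aSrc, int aWidth,
                                          int aHeight, int aSrcStride) {
  const size_t packedStride = static_cast<size_t>(aWidth) * 4;  // RGBA8
  std::vector<uint8_t> staging(packedStride * aHeight);
  for (int y = 0; y < aHeight; ++y) {
    std::memcpy(staging.data() + y * packedStride,
                aSrc + static_cast<size_t>(y) * aSrcStride,
                packedStride);
  }
  // The caller would then pass staging.data() to glTexSubImage2D (or similar);
  // the GL call itself is usually cheap next to this copy.
  return staging;
}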

BenWa: I seem to recall us discussing this one a few weeks back, and you convinced me in person that it wasn't an issue. I'm sorry to say that I've forgotten our discussion. Looking at these profiles, are you still certain we don't care about these long memmoves in the e10s case?
Flags: needinfo?(bgirard)
At least part of the problem here, according to gw280 / mstange / jrmuizel, is that we're page faulting due to new texture allocations on tab switch. This is because we've got unique TexturePools per client, or something along those lines.

I'm doing an experiment where we have a shared TexturePool to see how much that gains us.
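
A minimal sketch of what I mean by a shared pool (hypothetical names and shapes, not the real TexturePool API): one process-wide pool that recycles already-committed textures, so a tab switch reuses warm memory instead of each client allocating, and faulting in, fresh textures.

#include <memory>
#include <mutex>
#include <vector>

struct PooledTexture {
  // Real code would hold a texture handle and its dimensions; elided here.
};

class SharedTexturePool {
 public:
  static SharedTexturePool& Get() {           // one pool shared by all clients
    static SharedTexturePool sInstance;
    return sInstance;
  }

  std::unique_ptr<PooledTexture> Acquire() {
    std::lock_guard<std::mutex> lock(mMutex);
    if (!mFree.empty()) {
      auto tex = std::move(mFree.back());     // reuse: no new allocation, no
      mFree.pop_back();                       // fresh page faults
      return tex;
    }
    return std::make_unique<PooledTexture>(); // allocate only when the pool is dry
  }

  void Release(std::unique_ptr<PooledTexture> aTex) {
    std::lock_guard<std::mutex> lock(mMutex);
    mFree.push_back(std::move(aTex));         // keep it warm for the next tab switch
  }

 private:
  std::mutex mMutex;
  std::vector<std::unique_ptr<PooledTexture>> mFree;
};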
We've looked at the profiles in person, and from Comment 5 it sounds like you have a lead. ni-
Flags: needinfo?(bgirard)
Moving to P3 because there has been no activity for at least 24 weeks.
Priority: P1 → P3
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX