Closed Bug 1184277 Opened 9 years ago Closed 9 years ago

Have talos run GC / CC in the content process when the parent does it

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---
firefox42 --- affected

People

(Reporter: mconley, Assigned: mconley)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Bug 1174776 tracks a 8.5%-30% tp5o talos regression. We've got profiles for some of the worst offenders now:

# xinhuanet.com

non-e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/3f1cf539d64869a3159e7cb1934e3b47112b5cfea8cdb5b8a2208f2d44c46ec326601ef87c617547dccd865533b2bc9497031b8bf7a91e797c3d2ce43036430f&pathInZip=profile_tp5o/xinhuanet.com/cycle_0.sps#report=de3f6a10f8926d6a876c537d1dd092d8e548e35d&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A824068,%22end%22%3A825106%7D%5D&selection=0,1

e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/6a7aa316e29e3130cc3de4fa966904b6739babcaa243a5e84c5331c1ec7c435f8c94f5631d5f31174f86e006f44dd4464c0c22bc4ac295db8c61c48859328eea&pathInZip=profile_tp5o/xinhuanet.com/cycle_0.sps


# youtube.com

non-e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/3f1cf539d64869a3159e7cb1934e3b47112b5cfea8cdb5b8a2208f2d44c46ec326601ef87c617547dccd865533b2bc9497031b8bf7a91e797c3d2ce43036430f&pathInZip=profile_tp5o/youtube.com/cycle_0.sps#report=0e810e3b58dac0fe5f297991a08ffc622f4b1c5c&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A918418,%22end%22%3A918849%7D%5D&selection=0,1,593,188,188,678,679,681,682,683,680,681,682,683,684,685,686,687,688,689,729,681,682,683,684,685,686,687,682,683,684,685,686,687,738,739,740,741,682,683,680,681,738,739,740,741,682,743,744,682,683,684,685,686,687,758,759,760,687,730,698,699,700,701,702,703,704

e10s: http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/6a7aa316e29e3130cc3de4fa966904b6739babcaa243a5e84c5331c1ec7c435f8c94f5631d5f31174f86e006f44dd4464c0c22bc4ac295db8c61c48859328eea&pathInZip=profile_tp5o/youtube.com/cycle_0.sps

One thing I noticed is that we spend much, much more time cycle collecting in the content process with e10s than in the main process with non-e10s.


Samples spend in nsCycleCollector::Collect:

xinhuanet.com:
non-e10s: 0
e10s: 466

youtube.com:
non-e10s: 11
e10s: 659


We should find out why.
Blocks: 1174776
The current theory is:
http://hg.mozilla.org/build/talos/file/8a61f8d21fc0/talos/pageloader/chrome/pageloader.js#l120

Here we force GC/CC. In e10s mode this doesn't clean up the content process garbage which would explain why we're seeing a lot more GC/CC during the page load themselves.

We should try having e10s CC behavior match the non e10s behavior and the scores deviation should converge.
Attached patch patchSplinter Review
Yeah that sounds right to me. Mochitest e10s had a similar issue.
Here's a try push that should be GC/CCing on the child:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d8771c91995
Assignee: nobody → bgirard
Yes, a nice improvement there. Is this patch ready for review? If so, I'd suggest jmaher.
It's not. I hacked the GC/CC platform messages to be sync message to test this:
https://hg.mozilla.org/try/rev/1d8771c91995

I'm not sure if we should try to make the GC/CC wait on the child by sending and waiting asynchronously on a response or if we should (not ideal) make the platform messages sync?

I'm not sure what the best way of waiting on the GC/CC is here. I can add callbacks and add a js message but it doesn't feel ideal.
(In reply to Benoit Girard (:BenWa) from comment #7)
> I'm not sure what the best way of waiting on the GC/CC is here. I can add
> callbacks and add a js message but it doesn't feel ideal.

In my still-unlanded patches for bug 1096614 and the bugs it depends on (in particular, bug 1149297) I made a thing using a promise with AsyncShutdown to wait for messages from all of the children. The async shutdown thing handles timeouts.  The messages are sent async using the message manager. It worked at least in local testing, but I never got to the point where I actually tried it on a full mochitest run.
What we probably want to do is have pageloader send down a message to the child when it wants to do a GC / CC, and then go back to the event loop until a message comes back from the child saying "Done".
Any chance someone more capable then I in front-end code want to pick this up and steal credit? I'll get a head start on the next regression cause.
Yeah, I'll do this.
Assignee: bgirard → mconley
Summary: Investigate aggressive cycle collection in content process during tp5o test → Have talos run GC / CC in the content process when the parent doe sit
Summary: Have talos run GC / CC in the content process when the parent doe sit → Have talos run GC / CC in the content process when the parent does it
Bug 1184277 - Make the nextPage callback of Pageloader use Task.async so we can do async stuff between pageloads. r?jmaher

There are various things we'd like to do between pageloads, like asynchronously dump
profiles, or send messages to content (like "do a GC / CC"). This adds the infrastructure
to do these async activities between page loads.
Attachment #8636242 - Flags: review?(jmaher)
Bug 1184277 - Have the content process force GC / CC between pageload tests. r?jmaher
Attachment #8636243 - Flags: review?(jmaher)
Comment on attachment 8636242 [details]
MozReview Request: Bug 1184277 - Make the nextPage callback of Pageloader use Task.async so we can do async stuff between pageloads. r?jmaher

https://reviewboard.mozilla.org/r/13619/#review12291

this is simple and straightforward.  Will this work on Android?  I would like to ensure we compare talos numbers before/after this change (async + gc/cc) to ensure that we understand any shifts we see.
Attachment #8636242 - Flags: review?(jmaher) → review+
Comment on attachment 8636243 [details]
MozReview Request: Bug 1184277 - Have the content process force GC / CC between pageload tests. r?jmaher

https://reviewboard.mozilla.org/r/13621/#review12293

overall this is straightforward, thanks!  I could see a followup bug to move all the code we load in the child process to be in talos-content.js- that way we could have pageload/cc/etc. in the same file and setup once/cleanup once.

::: talos/pageloader/chrome/pageloader.js:456
(Diff revision 1)
> +      report.recordCCTime(contentCCTime);

while we don't really use the CC time in the report won't the double entry here confuse us?  We report on line 452 already and now we are reporting again on line 456.
Attachment #8636243 - Flags: review?(jmaher) → review+
url:        https://hg.mozilla.org/build/talos/rev/124e5287b36186a45c99f0ee5745a89a8fc2d69a
changeset:  124e5287b36186a45c99f0ee5745a89a8fc2d69a
user:       Mike Conley <mconley@mozilla.com>
date:       Fri Jul 17 17:37:01 2015 -0400
description:
Bug 1184277 - Make the nextPage callback of Pageloader use Task.async so we can do async stuff between pageloads. r=jmaher.

There are various things we'd like to do between pageloads, like asynchronously dump
profiles, or send messages to content (like "do a GC / CC"). This adds the infrastructure
to do these async activities between page loads.

url:        https://hg.mozilla.org/build/talos/rev/e97460aa654e5b4064142b747516a4e4849059b7
changeset:  e97460aa654e5b4064142b747516a4e4849059b7
user:       Mike Conley <mconley@mozilla.com>
date:       Mon Jul 20 12:26:19 2015 -0400
description:
Bug 1184277 - Have the content process force GC / CC between pageload tests. r=jmaher
For the record, I removed the content process GC / CC timing stuff before I landed.
Depends on: 1186057
url:        https://hg.mozilla.org/build/talos/rev/deca965654a39a38c9084804adebd78c1e23efb7
changeset:  deca965654a39a38c9084804adebd78c1e23efb7
user:       Mike Conley <mconley@mozilla.com>
date:       Tue Jul 21 15:13:43 2015 -0400
description:
Bug 1184277 - Follow up: add talos-content.js to the main thread IO whitelist. r=jmaher.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.