Closed Bug 1312022 Opened 3 years ago Closed 3 years ago

Keep alive content processes for bc and dt tests for e10s-multi

Categories

(Core :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: gkrizsanits, Assigned: gkrizsanits)

References

Details

(Whiteboard: [e10s-multi:M1])

Attachments

(4 files, 3 obsolete files)

Because of the slow process startup times bc and dt tests time out frequently. Until some serious perf optimization to enable 2 content processes on nightly we'll try to keep alive both content processes for these suits.
Attachment #8803377 - Attachment is patch: true
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #1)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=af9d6d6545741dbea4bc53499b2a38883f33a189

Wrong try flags and tree is closed :( will try it later...
looks promising:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=208987d81c04&selectedJob=29698810

some known issues, and a few interesting crashes that I have to look into...
Seems like the previous patch still had some side effects for the cases where the pref is not set. Trying to be a bit more careful:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f97c638c2872
Attachment #8803376 - Attachment is obsolete: true
Attachment #8805079 - Flags: review?(mrbkap)
Attached patch part3: Leak detector hack. v1 (obsolete) — Splinter Review
Sometimes we get a window/docshell leak report, because the last 2 windows and last docshell is destroyed later then we decide to report the leak... Based on bug 1073352 comment 28 (and the following) it seems like it's not a trivial to fix this, on the other hand I don't think it's super important either. If there was an actual leak, since we use this in debug builds, it will show up anyway, if it's just the problem Andrew describes then I'm not even sure it's worth to fix it. Bottom line is, that I wouldn't block on this. (this is not super frequent, a couple of times per platform, example: https://treeherder.mozilla.org/logviewer.html#?job_id=29804952&repo=try#L18749)

The patch just does not report the leak if it's the last one or two windows/docshell in the content process. Just in case I want to let Andrew to take a look at this temporary hack as well.
Attachment #8805084 - Flags: review?(mrbkap)
Attachment #8805084 - Flags: feedback?(continuation)
Not sure if I should revert this change yet but I have a patch for it.
Comment on attachment 8805084 [details] [diff] [review]
part3: Leak detector hack. v1

Review of attachment 8805084 [details] [diff] [review]:
-----------------------------------------------------------------

I'd really like to see some more analysis of what the leak actually is before approving something like this. I fixed a number of these leaks when getting e10s leak checking working, and they were not super difficult to fix.

If I understand it correctly, this patch allows us to always leak two of each. If somebody lands a test that permanently leaks 1 or 2, then it'll be green, but at some later point an intermittent leak will show up, when we hit the extra leak from your patches here, which will be difficult to track down. How frequent is "sometimes"?

This will ignore leaks of any docshells and windows when there are only 1 or 2 alive, so this could include others, before the last two.

Isn't this ignoring these leaks in the parent process, too? Or is there some check I'm missing?

As a minor nitpick for the patch, it would be less fragile and easier to understand if you matched against "DOMWINDOW == 0 " instead of just "0 ".
Attachment #8805084 - Flags: feedback?(continuation) → review-
> If there was an actual leak, since we use this in debug builds, it will show up anyway,
These kinds of leaks don't show up with the regular shutdown leak checking. The purpose of this particular leak check is that you can have a case where you leak a window forever after closing a tab, but it still gets cleaned up somehow before shutdown.
(In reply to Andrew McCreight [:mccr8] from comment #11)
> Comment on attachment 8805084 [details] [diff] [review]
> part3: Leak detector hack. v1
> 
> Review of attachment 8805084 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd really like to see some more analysis of what the leak actually is
> before approving something like this. I fixed a number of these leaks when
> getting e10s leak checking working, and they were not super difficult to fix.
> 
> If I understand it correctly, this patch allows us to always leak two of
> each. If somebody lands a test that permanently leaks 1 or 2, then it'll be
> green, but at some later point an intermittent leak will show up, when we
> hit the extra leak from your patches here, which will be difficult to track
> down. How frequent is "sometimes"?

After reading your comments I think I misunderstood a bit the purpose of this check
and how this all works. The tracking problem could be easily solved by printing something
else than an error message, but I can see now why is it important to fix this. (Not to mention
that the patch did not do exactly what I intended to do)

So the frequency is low enough to make it hard to reproduce it locally (I couldn't so far)
but high enough to make it impossible to turn on 2 content processes on nightly. But I'll try
to come up with something. Do you have any hints how to hunt down these issues?

(In reply to Andrew McCreight [:mccr8] from comment #12)
> > If there was an actual leak, since we use this in debug builds, it will show up anyway,
> These kinds of leaks don't show up with the regular shutdown leak checking.
> The purpose of this particular leak check is that you can have a case where
> you leak a window forever after closing a tab, but it still gets cleaned up
> somehow before shutdown.

Right, it makes sense now.
Comment on attachment 8805084 [details] [diff] [review]
part3: Leak detector hack. v1

Review of attachment 8805084 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling r?, until I figure out how to fix this leak.
Attachment #8805084 - Flags: review?(mrbkap)
> Do you have any hints how to hunt down these issues?

I think I usually used refcount logging, or DMD's heap scan mode to try to figure out who is holding references to things. If you can't reproduce locally, that will make it more difficult to investigate.

The line after the UNEXPECTED-FAIL prints out information about the leaked windows:
TEST-INFO | toolkit/mozapps/extensions/test/browser/browser_webext_options.js | windows(s) leaked: [pid = 5132] [serial = 207], [pid = 5132] [serial = 205]
You can then search in the log for, say, "[pid = 5132] [serial = 205]" to see exactly where the window was created. That might help...
One thing that might help reproduce the issue is to make 4 or 5 copies of the test that is leaking, and just run them all in a row.
It turns out that it's just a bunch of leaky tests. I have no idea why they did not show up on the radar sooner, but I guess I was just the unlucky one who's patch gave them the final touch that triggered the leaks to be detected. By leaks I mean we finish the test sooner than we should and that _sometimes_ causing some trouble (usually because incorrect usage of nested generator functions). I think I fixed the most common offenders, the rest might be just some rare intermittent, let's see if more stuff needs to get fixed:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=98d8e8d88b3affdd7f54fcb2cb0062835162be32
Attachment #8805084 - Attachment is obsolete: true
Attachment #8805540 - Flags: review?(mrbkap)
Comment on attachment 8805085 [details] [diff] [review]
rem process reusing code. v1

Review of attachment 8805085 [details] [diff] [review]:
-----------------------------------------------------------------

I think there is no reason to leave in this hack with the current setup, let's remove it.
Attachment #8805085 - Flags: review?(mrbkap)
Comment on attachment 8805079 [details] [diff] [review]
part1: pref to keep alive content processes. v2

Do we shut the processes down on parent process shutdown?
Attachment #8805079 - Flags: review?(mrbkap) → review+
Comment on attachment 8805080 [details] [diff] [review]
part2: keep alive content processes for bc/dt if e10s-multi is enabled. v2

Do we need a testing peer's review on this?
Attachment #8805080 - Flags: review?(mrbkap) → review+
Attachment #8805085 - Flags: review?(mrbkap) → review+
Attachment #8805540 - Flags: review?(mrbkap) → review+
(In reply to Blake Kaplan (:mrbkap) from comment #19)
> Comment on attachment 8805079 [details] [diff] [review]
> part1: pref to keep alive content processes. v2
> 
> Do we shut the processes down on parent process shutdown?

Yes, also our tests would report any lingering processes left behind.

(In reply to Blake Kaplan (:mrbkap) from comment #20)
> Comment on attachment 8805080 [details] [diff] [review]
> part2: keep alive content processes for bc/dt if e10s-multi is enabled. v2
> 
> Do we need a testing peer's review on this?

Good point, I'm sure keeping someone in the loop is important...
Ted, we want to enable two content processes on nightly, and since content process startup is still not perfect, and in debug build it's pretty bad even, to avoid timeouts in bc/dt tests we keep the first two content processes alive. Once we optimize the process startup this will be ofc turned off again. The patches here are inactive until dom.ipc.processCount is set to 2 by default, so I'm quite confident landing them, but before we start using this on mc I want to give you a chance to review the approach we're about to take here and raise any concerns you might have with it.
Flags: needinfo?(ted)
I have to update patch 1 a bit to work together with bug 1304140. I talked to smaug on IRC about this, and we agreed that we just never want to keep those special processes alive.

try build as it will land on mc (if I didn't break something in the rebase process...):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c19ed80effa5ea988b7b8d0dadbbcde47ed396bd
Depends on: 1304140
Pushed by gkrizsanits@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/37fbc13ab60a
Pref to keep some content processes alive. r=mrbkap
https://hg.mozilla.org/integration/mozilla-inbound/rev/11d5d68a6a4a
Keep alive content processes on bc and dt. r=mrbkap
https://hg.mozilla.org/integration/mozilla-inbound/rev/423cf7cd8adc
Fix some leaky tests. r=mrbkap
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c1f5793e613
Revert reusing processes between bt tests. r=mrbkap
Thanks for the heads-up (sorry I didn't look at this sooner). This seems like a sensible approach. It actually seems like longer-term we might want to make some variant of this the default behavior--if the user closes the last tab in a process, we should probably keep it around for a while in expectation of them opening a new tab. Is something like that planned?
Flags: needinfo?(ted)
Assignee: nobody → gkrizsanits
You need to log in before you can comment on or make changes to this bug.