Killing a content process makes the parent process unresponsive, making everything worse
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: mstange, Assigned: gsvelto)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
When the system is generally slow, and we decide to kill a content process because it seems hung, we make the situation worse by dumping a crash on the main thread of the parent process, hanging the entire browser.
This was encountered in bug 1623609 comment 4, bug 1626789 comment 7, and also in an unaired Joy of Profiling episode when we were analyzing the following profile of Firefox startup on the 2017 reference device: https://perfht.ml/2L3uUHD
Apparently the rewrite of the crash reporting system will fix this, according to bug 1623609 comment 7:
(In reply to Gabriele Svelto [:gsvelto] from bug 1623609 comment #7)
We tried moving this OMT but it regressed badly so we gave up on that. The slowness here might be due to bug 1622316 for which I have a fix for waiting for review. That being said all this code is moving completely out-of-process in bug 1588530 which I'm currently working on.
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Note that this applies only to nightly. Crash generation for hung/slow content process shutdown is disabled in beta/release.
Reporter | ||
Comment 2•5 years ago
|
||
Oh, that is great to hear. I didn't know that.
![]() |
||
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
I'd like to try and bump the timeout significantly, to 30 or possible even 60 seconds. This should shake out the times when content processes are just being slow at shutting down (or don't get a chance because the scheduler doesn't wake them up in time). Hopefully it would leave only the actual hard deadlocks sending us crash dumps.
But there's the downside that now some processes might really take a while to shut down before being killed. Chris, do you think this might be problematic?
Comment 5•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #4)
I'd like to try and bump the timeout significantly, to 30 or possible even 60 seconds. This should shake out the times when content processes are just being slow at shutting down (or don't get a chance because the scheduler doesn't wake them up in time). Hopefully it would leave only the actual hard deadlocks sending us crash dumps.
But there's the downside that now some processes might really take a while to shut down before being killed. Chris, do you think this might be problematic?
Nika, this is a question for you. ^
Comment 6•5 years ago
|
||
I'm a bit worried about there being a worse experience for our users if a hung content process is kept around for longer than it needs to be, but I don't have a fundamental reason why I'd oppose the change other than that.
Redirecting to jld in case they know more about this.
Assignee | ||
Comment 7•5 years ago
|
||
The number of ShutDownKill signatures I see on nightly is going up a lot. That means content process shutdown is happening more often - something we expect - but not getting any faster. If we don't increase the timeout and we don't improve the shutdown procedure the only other option we have is reverting bug 1543113. That will cause all ShutDownKill signatures to be lumped together again so we'll see just the overall volume.
Comment 8•5 years ago
|
||
I don't have any insight here besides what's already been written. I don't have any problems with increasing the timeout; 5 seconds does seem low, thinking about older hardware with several processes going through shutdown at once and maybe swapping.
Assignee | ||
Comment 9•5 years ago
|
||
Alright, taking this. Let's bump up the timeout and see if there's any improvement.
Assignee | ||
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
Comment 12•5 years ago
|
||
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 13•5 years ago
|
||
bugherder |
Assignee | ||
Updated•5 years ago
|
Description
•