Hit MOZ_CRASH(Shutdown hanging at step profile-change-teardown. Something is blocking the main-thread.) at toolkit/components/terminator/nsTerminator.cpp:220
Categories
(Toolkit :: Startup and Profile System, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | wontfix |
firefox86 | --- | unaffected |
firefox87 | --- | unaffected |
firefox88 | --- | wontfix |
firefox97 | --- | wontfix |
firefox98 | --- | wontfix |
firefox99 | --- | fixed |
People
(Reporter: petr.sumbera, Assigned: petr.sumbera)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0
Steps to reproduce:
I can see this on Solaris when executing firefox --screenshot about:blank
Screenshot saved to: /builds/psumbera/mozilla-central-build/screenshot.png
Hit MOZ_CRASH(Shutdown hanging at step profile-change-teardown. Something is blocking the main-thread.) at /builds/psumbera/mozilla-central-build/toolkit/components/terminator/nsTerminator.cpp:220
#01: mozilla::(anonymous namespace)::RunWatchdog(void*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1b880b69]
#02: _pt_root[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libnspr4.so +0x4a298]
#03: _thrp_setup[/lib/amd64/libc.so.1 +0x271863]
#04: _lwp_start[/lib/amd64/libc.so.1 +0x271b60]
The first bad revision is:
changeset: 569009:541363348e76
user: Jens Stutte <jstutte@mozilla.com>
date: Sat Feb 27 06:38:14 2021 -0800
description:
Bug 1689953: Harmonize shutdown phase definitions across nsTerminator and AppShutdown r=dthayer,chutten
This patch wants to solve several quirks around the shutdown terminator.
- Use the same shutdown phase definitions in AppShutdown and nsTerminator. This touches quite a few files.
- Ensure that the terminator phase shift is handled before any shutdown observer notifications are sent and reduce its heartbeat duration.
- Add missing phases to the shutdown telemetry.
Please note that this changes the unit of "tick" to 100ms rather than 1s.
As a side effect, we also remove the obsolete "shutdown-persist" context.
While the existing test coverage continues to prove the most important functions, we acknowledge the wish for better test coverage with [[ https://bugzilla.mozilla.org/show_bug.cgi?id=1693966 | bug 1693966 ]].
Differential Revision: https://phabricator.services.mozilla.com/D103626
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 1•4 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Toolkit::Startup and Profile System' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 2•4 years ago
|
||
Hi Petr, thanks for reporting!
Does this reproduce always? Under linux I cannot see this happen (though I see other probably unwanted errors, unrelated to bug 1689953).
Usually shutdown hangs are intermittent, but having a 100% reproducing one could give us some insights. The patch you marked as regressor is mostly a diagnostic patch that makes reporting better around shutdown hangs, so I think it is not very likely to be the cause, but just the symptom.
Assignee | ||
Comment 3•4 years ago
|
||
I think It happens always. But I have just scratched workspace and building it with HEARTBEAT_INTERVAL_MS set to 1000.
Assignee | ||
Comment 4•4 years ago
|
||
I can confirm that following:
--- a/toolkit/components/terminator/nsTerminator.cpp Mon Mar 01 16:15:30 2021 +0100
+++ b/toolkit/components/terminator/nsTerminator.cpp Tue Mar 02 10:21:36 2021 +0100
@@ -66,7 +66,7 @@
// forcefully.
#define ADDITIONAL_WAIT_BEFORE_CRASH_MS 3000
-#define HEARTBEAT_INTERVAL_MS 100
+#define HEARTBEAT_INTERVAL_MS 1000
namespace mozilla {
fixed the issue.
Note that debug build on Solaris seems to be generally pretty slow. But I have no comparison to Linux.
Comment 5•4 years ago
|
||
(In reply to Petr Sumbera from comment #4)
I can confirm that following:
--- a/toolkit/components/terminator/nsTerminator.cpp Mon Mar 01 16:15:30 2021 +0100 +++ b/toolkit/components/terminator/nsTerminator.cpp Tue Mar 02 10:21:36 2021 +0100 @@ -66,7 +66,7 @@ // forcefully. #define ADDITIONAL_WAIT_BEFORE_CRASH_MS 3000 -#define HEARTBEAT_INTERVAL_MS 100 +#define HEARTBEAT_INTERVAL_MS 1000 namespace mozilla {
fixed the issue.
Note that debug build on Solaris seems to be generally pretty slow. But I have no comparison to Linux.
Without changing toolkit.asyncshutdown.crash_timeout
preference? Looking at how we calculate ticks, your change above should in average just add 500ms, max. 900ms. Which might be enough to solve it on your specific machine, though.
Assignee | ||
Comment 6•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #5)
Without changing
toolkit.asyncshutdown.crash_timeout
preference?
Yes. No other change.
Comment 7•4 years ago
|
||
You might want to try what happens, if you revert your change but increase the toolkit.asyncshutdown.crash_timeout
preference. I assume that solves the problem, too (which would probably mean, we should not care too much here, as this is a debug build).
Comment 8•4 years ago
|
||
Set release status flags based on info from the regressing bug 1689953
Assignee | ||
Comment 9•4 years ago
|
||
I have tried to change 60 to 600 here:
https://searchfox.org/mozilla-central/source/toolkit/components/asyncshutdown/AsyncShutdown.jsm#87
But it makes no difference. It still crashes. Any idea?
Comment 10•4 years ago
|
||
Please go to about:config and change toolkit.asyncshutdown.crash_timeout
there, the pref might ovrride your default from a saved profile.
Assignee | ||
Comment 11•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #10)
Please go to about:config and change
toolkit.asyncshutdown.crash_timeout
there, the pref might ovrride your default from a saved profile.
Well, I run firefox --screenshot about:blank
at the end of my regular automated Firefox build procedure. And I do start it with new HOME (no existing profile). So there is no profile at the beginning!?
Assignee | ||
Comment 12•4 years ago
|
||
Ok, I tried to set toolkit.asyncshutdown.crash_timeout
in about:config
and finally it helped. But 600000 wasn't enough (as I already tested above). So after changing it to 6000000 it exits without the issue.
But still don't understand what's going here..
Updated•4 years ago
|
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 15•3 years ago
|
||
The main problem seems to be here:
const PRIntervalTime ticksDuration =
PR_MillisecondsToInterval(HEARTBEAT_INTERVAL_MS); // This returns on Solaris 10000
options->crashAfterTicks = crashAfterMS / ticksDuration; // 60000 / 10000 = 6!
All system do sleep between hearbeats just 0.1s
So on Solaris it just waits for 0.6s instead of 60s.
I think it shouldn't use PR_MillisecondsToInterval() at all and instead do just:
options->crashAfterTicks = crashAfterMS / HEARTBEAT_INTERVAL_MS;
Even with such change, old SPARC T4 system needs about 178s to shutdown (when there is no ~/.mozilla directorty). So default timeout of 60s isn't enough there. But I can easily extend it using toolkit.asyncshutdown.crash_timeout
.
Assignee | ||
Comment 16•3 years ago
|
||
Updated•3 years ago
|
Comment 17•3 years ago
|
||
Comment 18•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•