Closed
Bug 787436
Opened 12 years ago
Closed 12 years ago
B2G Updates: Use a watchdog thread to ensure gecko *really* restarts to apply an update
Categories
(Toolkit :: Application Update, defect)
Tracking
()
People
(Reporter: cjones, Assigned: cjones)
References
Details
(Keywords: dogfood, Whiteboard: [LOE:S])
Attachments
(1 file)
We currently have a hacky "self-destruct timer" in place to shut down gecko when some threads hang. It's not very effective through, because XPCOM shutting down timers and/or the main thread hanging can prevent the bomb from going off.
What we need to do is spawn a thread to sit on the timer and then shut down hard if it doesn't go off. Ideally we'd like to get a crash report when we do that, but doing so would be pretty dangerous in this state. For linux, our best approach is likely kill(0, SIGKILL).
I suggest that we add this interface to nsIAppShell, so that widget backends have the flexibility to do their own thing.
Assignee | ||
Comment 1•12 years ago
|
||
This needs to block because the support problems engendered by not having this will be an economic burden, and puts users at risk.
blocking-basecamp: --- → +
Assignee | ||
Comment 2•12 years ago
|
||
The initial plan here is to
- add another flag to nsIAppStartup:quit, eReallyFuckingQuit or something
- in nsAppStartup::Quit(), if eReallyFuckingQuit, and #ifdef GONK, kick off a pthread on entering the function to kill(0, SIGKILL)
This is pretty gross, but the alternatives are similarly gross for different reasons.
Assignee | ||
Comment 3•12 years ago
|
||
Gene, want to grab this? I would recommend taking bug 790527 first, and then implement the watchdog here to ensure we *really* quit/reboot/poweroff.
Comment 4•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> Gene, want to grab this? I would recommend taking bug 790527 first, and
> then implement the watchdog here to ensure we *really* quit/reboot/poweroff.
Yes! I'm trying to fix bug 790527 first and will come back to this one. Thanks for the suggestions. :)
Updated•12 years ago
|
Whiteboard: [LOE:S]
Assignee | ||
Comment 5•12 years ago
|
||
We had a new regression in xpcom thread shutdown. Can't let those affect dogfooding.
Assignee: marshall → jones.chris.g
Keywords: dogfood
Assignee | ||
Comment 6•12 years ago
|
||
I don't really know who reviews code around here, but bent, this is the approach we discussed. /me goes to take a shower.
I was going to add a new flag and then noticed we're helpfully bit-diddling the two quit args into two *bytes*. I don't to make us low on bits in that first byte just on account of gonk, so I changed the interpretation of the ForceQuit value.
Attachment #662413 -
Flags: review?(bent.mozilla)
Comment on attachment 662413 [details] [diff] [review]
Implement a "really really quit" watchdog to monitor normal shutdown and exit Gecko in a hurry if it takes too long
Review of attachment 662413 [details] [diff] [review]:
-----------------------------------------------------------------
This is ok but I'd love to use mfbt/Assertions if possible.
::: toolkit/components/startup/nsAppStartup.cpp
@@ +388,5 @@
> + // If we can't SIGKILL our process group, something is badly
> + // wrong. Trying to deliver a catch-able signal to ourselves can
> + // invoke signal handlers and might cause problems. So try
> + // _exit() and hope we go away.
> + _exit(1);
Can we reuse MOZ_CRASH from mfbt here? We have about a billion different places where we try to crash in our codebase now...
@@ +397,5 @@
> +{
> + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs));
> + if (timeoutSecs < 0 || timeoutSecs > 30) {
> + QuitHard();
> + return nullptr; // not reached
So, instead of doing QuitHard here and again below just reverse the condition and put the sleep inside the if block.
@@ +405,5 @@
> + // harmlessly reaped by the OS.
> + sleep(timeoutSecs);
> +
> + QuitHard();
> + return nullptr; // not reached
How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER?
Attachment #662413 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to ben turner [:bent] from comment #7)
> Comment on attachment 662413 [details] [diff] [review]
> Implement a "really really quit" watchdog to monitor normal shutdown and
> exit Gecko in a hurry if it takes too long
>
> Review of attachment 662413 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This is ok but I'd love to use mfbt/Assertions if possible.
>
> ::: toolkit/components/startup/nsAppStartup.cpp
> @@ +388,5 @@
> > + // If we can't SIGKILL our process group, something is badly
> > + // wrong. Trying to deliver a catch-able signal to ourselves can
> > + // invoke signal handlers and might cause problems. So try
> > + // _exit() and hope we go away.
> > + _exit(1);
>
> Can we reuse MOZ_CRASH from mfbt here? We have about a billion different
> places where we try to crash in our codebase now...
>
Ah, but we don't *want* to crash. See the comment. MOZ_CRASH() in general can invoke signal handlers, which means another risk of not terminating.
> @@ +397,5 @@
> > +{
> > + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs));
> > + if (timeoutSecs < 0 || timeoutSecs > 30) {
> > + QuitHard();
> > + return nullptr; // not reached
>
> So, instead of doing QuitHard here and again below just reverse the
> condition and put the sleep inside the if block.
>
OK.
> @@ +405,5 @@
> > + // harmlessly reaped by the OS.
> > + sleep(timeoutSecs);
> > +
> > + QuitHard();
> > + return nullptr; // not reached
>
> How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER?
I still need to |return nullptr| here to make some compilers shut up. Do you know that mfbt has implementations of MOZ_NOT_REACHED that expand to |noreturn| semantics for all compilers?
Assignee | ||
Comment 9•12 years ago
|
||
Comment 10•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
You need to log in
before you can comment on or make changes to this bug.
Description
•