Closed
Bug 787436
Opened 12 years ago
Closed 12 years ago
B2G Updates: Use a watchdog thread to ensure gecko *really* restarts to apply an update
Categories
(Toolkit :: Application Update, defect)
Tracking
()
People
(Reporter: cjones, Assigned: cjones)
References
Details
(Keywords: dogfood, Whiteboard: [LOE:S])
Attachments
(1 file)
We currently have a hacky "self-destruct timer" in place to shut down gecko when some threads hang. It's not very effective through, because XPCOM shutting down timers and/or the main thread hanging can prevent the bomb from going off. What we need to do is spawn a thread to sit on the timer and then shut down hard if it doesn't go off. Ideally we'd like to get a crash report when we do that, but doing so would be pretty dangerous in this state. For linux, our best approach is likely kill(0, SIGKILL). I suggest that we add this interface to nsIAppShell, so that widget backends have the flexibility to do their own thing.
Assignee | ||
Comment 1•12 years ago
|
||
This needs to block because the support problems engendered by not having this will be an economic burden, and puts users at risk.
blocking-basecamp: --- → +
Assignee | ||
Comment 2•12 years ago
|
||
The initial plan here is to - add another flag to nsIAppStartup:quit, eReallyFuckingQuit or something - in nsAppStartup::Quit(), if eReallyFuckingQuit, and #ifdef GONK, kick off a pthread on entering the function to kill(0, SIGKILL) This is pretty gross, but the alternatives are similarly gross for different reasons.
Assignee | ||
Comment 3•12 years ago
|
||
Gene, want to grab this? I would recommend taking bug 790527 first, and then implement the watchdog here to ensure we *really* quit/reboot/poweroff.
Comment 4•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3) > Gene, want to grab this? I would recommend taking bug 790527 first, and > then implement the watchdog here to ensure we *really* quit/reboot/poweroff. Yes! I'm trying to fix bug 790527 first and will come back to this one. Thanks for the suggestions. :)
Updated•12 years ago
|
Whiteboard: [LOE:S]
Assignee | ||
Comment 5•12 years ago
|
||
We had a new regression in xpcom thread shutdown. Can't let those affect dogfooding.
Assignee: marshall → jones.chris.g
Keywords: dogfood
Assignee | ||
Comment 6•12 years ago
|
||
I don't really know who reviews code around here, but bent, this is the approach we discussed. /me goes to take a shower. I was going to add a new flag and then noticed we're helpfully bit-diddling the two quit args into two *bytes*. I don't to make us low on bits in that first byte just on account of gonk, so I changed the interpretation of the ForceQuit value.
Attachment #662413 -
Flags: review?(bent.mozilla)
Comment on attachment 662413 [details] [diff] [review] Implement a "really really quit" watchdog to monitor normal shutdown and exit Gecko in a hurry if it takes too long Review of attachment 662413 [details] [diff] [review]: ----------------------------------------------------------------- This is ok but I'd love to use mfbt/Assertions if possible. ::: toolkit/components/startup/nsAppStartup.cpp @@ +388,5 @@ > + // If we can't SIGKILL our process group, something is badly > + // wrong. Trying to deliver a catch-able signal to ourselves can > + // invoke signal handlers and might cause problems. So try > + // _exit() and hope we go away. > + _exit(1); Can we reuse MOZ_CRASH from mfbt here? We have about a billion different places where we try to crash in our codebase now... @@ +397,5 @@ > +{ > + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs)); > + if (timeoutSecs < 0 || timeoutSecs > 30) { > + QuitHard(); > + return nullptr; // not reached So, instead of doing QuitHard here and again below just reverse the condition and put the sleep inside the if block. @@ +405,5 @@ > + // harmlessly reaped by the OS. > + sleep(timeoutSecs); > + > + QuitHard(); > + return nullptr; // not reached How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER?
Attachment #662413 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to ben turner [:bent] from comment #7) > Comment on attachment 662413 [details] [diff] [review] > Implement a "really really quit" watchdog to monitor normal shutdown and > exit Gecko in a hurry if it takes too long > > Review of attachment 662413 [details] [diff] [review]: > ----------------------------------------------------------------- > > This is ok but I'd love to use mfbt/Assertions if possible. > > ::: toolkit/components/startup/nsAppStartup.cpp > @@ +388,5 @@ > > + // If we can't SIGKILL our process group, something is badly > > + // wrong. Trying to deliver a catch-able signal to ourselves can > > + // invoke signal handlers and might cause problems. So try > > + // _exit() and hope we go away. > > + _exit(1); > > Can we reuse MOZ_CRASH from mfbt here? We have about a billion different > places where we try to crash in our codebase now... > Ah, but we don't *want* to crash. See the comment. MOZ_CRASH() in general can invoke signal handlers, which means another risk of not terminating. > @@ +397,5 @@ > > +{ > > + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs)); > > + if (timeoutSecs < 0 || timeoutSecs > 30) { > > + QuitHard(); > > + return nullptr; // not reached > > So, instead of doing QuitHard here and again below just reverse the > condition and put the sleep inside the if block. > OK. > @@ +405,5 @@ > > + // harmlessly reaped by the OS. > > + sleep(timeoutSecs); > > + > > + QuitHard(); > > + return nullptr; // not reached > > How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER? I still need to |return nullptr| here to make some compilers shut up. Do you know that mfbt has implementations of MOZ_NOT_REACHED that expand to |noreturn| semantics for all compilers?
Assignee | ||
Comment 9•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a81233955b61
Comment 10•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a81233955b61
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
You need to log in
before you can comment on or make changes to this bug.
Description
•