Closed Bug 787436 Opened 12 years ago Closed 12 years ago

B2G Updates: Use a watchdog thread to ensure gecko really restarts to apply an update

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla18

Project Flags:

blocking-basecamp

People

(Reporter: cjones, Assigned: cjones)

References

Details

(Keywords: dogfood, Whiteboard: [LOE:S])

Attachments

(1 file)

Implement a "really really quit" watchdog to monitor normal shutdown and exit Gecko in a hurry if it takes too long 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 8.12 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Description

•

12 years ago

We currently have a hacky "self-destruct timer" in place to shut down gecko when some threads hang. It's not very effective through, because XPCOM shutting down timers and/or the main thread hanging can prevent the bomb from going off. What we need to do is spawn a thread to sit on the timer and then shut down hard if it doesn't go off. Ideally we'd like to get a crash report when we do that, but doing so would be pretty dangerous in this state. For linux, our best approach is likely kill(0, SIGKILL). I suggest that we add this interface to nsIAppShell, so that widget backends have the flexibility to do their own thing.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 1

•

12 years ago

This needs to block because the support problems engendered by not having this will be an economic burden, and puts users at risk.

blocking-basecamp: --- → +

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 2

•

12 years ago

The initial plan here is to - add another flag to nsIAppStartup:quit, eReallyFuckingQuit or something - in nsAppStartup::Quit(), if eReallyFuckingQuit, and #ifdef GONK, kick off a pthread on entering the function to kill(0, SIGKILL) This is pretty gross, but the alternatives are similarly gross for different reasons.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 3

•

12 years ago

Gene, want to grab this? I would recommend taking bug 790527 first, and then implement the watchdog here to ensure we *really* quit/reboot/poweroff.

Gene Lian (I already quit Mozilla)

Updated

•

12 years ago

Depends on: 790527

Gene Lian (I already quit Mozilla)

Comment 4

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #3) > Gene, want to grab this? I would recommend taking bug 790527 first, and > then implement the watchdog here to ensure we *really* quit/reboot/poweroff. Yes! I'm trying to fix bug 790527 first and will come back to this one. Thanks for the suggestions. :)

Marshall Culpepper [:marshall_law]

Updated

•

12 years ago

Whiteboard: [LOE:S]

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 5

•

12 years ago

We had a new regression in xpcom thread shutdown. Can't let those affect dogfooding.

Assignee: marshall → jones.chris.g

Keywords: dogfood

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 6

•

12 years ago

Attached patch Implement a "really really quit" watchdog to monitor normal shutdown and exit Gecko in a hurry if it takes too long — Details — Splinter Review

I don't really know who reviews code around here, but bent, this is the approach we discussed. /me goes to take a shower. I was going to add a new flag and then noticed we're helpfully bit-diddling the two quit args into two *bytes*. I don't to make us low on bits in that first byte just on account of gonk, so I changed the interpretation of the ForceQuit value.

Attachment #662413 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 7

•

12 years ago

Comment on attachment 662413 [details] [diff] [review] Implement a "really really quit" watchdog to monitor normal shutdown and exit Gecko in a hurry if it takes too long Review of attachment 662413 [details] [diff] [review]: ----------------------------------------------------------------- This is ok but I'd love to use mfbt/Assertions if possible. ::: toolkit/components/startup/nsAppStartup.cpp @@ +388,5 @@ > + // If we can't SIGKILL our process group, something is badly > + // wrong. Trying to deliver a catch-able signal to ourselves can > + // invoke signal handlers and might cause problems. So try > + // _exit() and hope we go away. > + _exit(1); Can we reuse MOZ_CRASH from mfbt here? We have about a billion different places where we try to crash in our codebase now... @@ +397,5 @@ > +{ > + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs)); > + if (timeoutSecs < 0 || timeoutSecs > 30) { > + QuitHard(); > + return nullptr; // not reached So, instead of doing QuitHard here and again below just reverse the condition and put the sleep inside the if block. @@ +405,5 @@ > + // harmlessly reaped by the OS. > + sleep(timeoutSecs); > + > + QuitHard(); > + return nullptr; // not reached How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER?

Attachment #662413 - Flags: review?(bent.mozilla) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 8

•

12 years ago

(In reply to ben turner [:bent] from comment #7) > Comment on attachment 662413 [details] [diff] [review] > Implement a "really really quit" watchdog to monitor normal shutdown and > exit Gecko in a hurry if it takes too long > > Review of attachment 662413 [details] [diff] [review]: > ----------------------------------------------------------------- > > This is ok but I'd love to use mfbt/Assertions if possible. > > ::: toolkit/components/startup/nsAppStartup.cpp > @@ +388,5 @@ > > + // If we can't SIGKILL our process group, something is badly > > + // wrong. Trying to deliver a catch-able signal to ourselves can > > + // invoke signal handlers and might cause problems. So try > > + // _exit() and hope we go away. > > + _exit(1); > > Can we reuse MOZ_CRASH from mfbt here? We have about a billion different > places where we try to crash in our codebase now... > Ah, but we don't *want* to crash. See the comment. MOZ_CRASH() in general can invoke signal handlers, which means another risk of not terminating. > @@ +397,5 @@ > > +{ > > + int32_t timeoutSecs = int32_t(intptr_t(aTimeoutSecs)); > > + if (timeoutSecs < 0 || timeoutSecs > 30) { > > + QuitHard(); > > + return nullptr; // not reached > > So, instead of doing QuitHard here and again below just reverse the > condition and put the sleep inside the if block. > OK. > @@ +405,5 @@ > > + // harmlessly reaped by the OS. > > + sleep(timeoutSecs); > > + > > + QuitHard(); > > + return nullptr; // not reached > > How about MOZ_NOT_REACHED? At least MOZ_NOT_REACHED_MARKER? I still need to |return nullptr| here to make some compilers shut up. Do you know that mfbt has implementations of MOZ_NOT_REACHED that expand to |noreturn| semantics for all compilers?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 9

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a81233955b61

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 10

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/a81233955b61

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla18

You need to log in before you can comment on or make changes to this bug.

Bugzilla

B2G Updates: Use a watchdog thread to ensure gecko really restarts to apply an update

Categories

(Toolkit :: Application Update, defect)

Tracking

()

People

(Reporter: cjones, Assigned: cjones)

References

Details

(Keywords: dogfood, Whiteboard: [LOE:S])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Attachment

General

Description

File Name

Content Type

B2G Updates: Use a watchdog thread to ensure gecko *really* restarts to apply an update

B2G Updates: Use a watchdog thread to ensure gecko really restarts to apply an update