When rebooting/powering off, go through normal gecko shutdown

RESOLVED FIXED in mozilla18

Status

()

defect
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: cjones, Assigned: airpingu)

Tracking

unspecified
mozilla18
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-basecamp:+)

Details

(Whiteboard: [LOE:S])

Attachments

(1 attachment, 6 obsolete attachments)

Currently, the reboot()/poweroff() APIs just directly call the hal interfaces.  These interfaces in turn call the bare linux interface.  That's really bad, because we don't even sync the filesystems before shutting down.

Instead, we should extend the nsIAppStartup:quit interface to take two more shutdown types: REBOOT and POWEROFF.  After going through the quit(), we will finish the process by calling the right hal API.

We should also verify as part of this work that the hal:: implementation is doing the right thing with filesystem shutdown.

Gene, want to take this?

blocking+ because of possibility for data loss.
Assignee: nobody → clian
Whiteboard: [LOE:M]
Whiteboard: [LOE:M] → [LOE:S]
Just taking some notes here. Currently we have the following types:

     0001 | eConsiderQuit  | Attempt to quit if all windows are closed
     0010 | eAttemptQuit   | Try to close all windows, then quit if successful.
     0011 | eForceQuit     | Quit, damnit!
    10000 | eRestart       | Restart the application after quitting.
   100000 | eRestarti386   | Restart in the i386 architecture (OSX).
  1000000 | eRestartx86_64 | Restart in the x86_64 architecture (OSX).

I'm planning to add the following 2 new types by following eAttemptQuit:

 10000010 | ePowerOff      | Try to close all windows, then power off anyway.
100000010 | eReboot        | Try to close all windows, then reboot anyway.
Posted patch WIP Patch (obsolete) — Splinter Review
Hi :bsmedberg,

I'd like to invite you to have a quick review on this patch, where it seems you're the rightest person to review this nsIAppStartup.quit() logic. :)

The main purpose is to use .quit(ePowerOff) or .quit(eReboot) to normally close all the windows(apps) before powering off or rebooting the B2G mobile.

Note that this is still a WIP but everything works well. Just hoping to have more tests and confirm with you the correctness of the .quit() logic.

Please let me know if you have any suggestions or questions. Thanks!
Attachment #661176 - Flags: feedback?(benjamin)

Comment 3

7 years ago
Comment on attachment 661176 [details] [diff] [review]
WIP Patch

I'm going to delegate this to respindola: he's working on bug 662444 (shutdown by _exit instead of doing full shutdown) and I believe this should be designed with that in mind: in particular we really don't need or want to do a full shutdown here, just enough to sync the user profile and perhaps make sure that the filesystem is safe; definitely don't want to do XPCOM shutdown.
Attachment #661176 - Flags: feedback?(benjamin) → feedback?(respindola)
My feedback is that once bug 662444 is fixed, this is probably what we will want to do. All exits in a release build will by default do the lest possible work. The problem is that fully fixing 662444 will take a long time.  We will be moving back _exit one write at a time, and some are going to be hard to fix (might even be produced by extensions).

If this is a r+ or r- probably depends on the timeline you have in mind. If you can live with a simpler shutdown at this point, I would suggest holding this patch until regular shutdown is sufficiently fast.

Comment 5

7 years ago
This is "B2G now" timeline. As long as we can make this not affect Firefox, we should try to get B2G doing the fastest possible shutdown using this and fast _exit.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> This is "B2G now" timeline. As long as we can make this not affect Firefox,
> we should try to get B2G doing the fastest possible shutdown using this and
> fast _exit.

OK, r- from me then. Finding out what is needed for just a b2g shutdown is probably easier. We can always refactor the code once desktop shutdown is fast.
Attachment #661176 - Flags: feedback?(respindola) → feedback-
This is 100% orthogonal to the fast-_exit() work.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #6)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> > This is "B2G now" timeline. As long as we can make this not affect Firefox,
> > we should try to get B2G doing the fastest possible shutdown using this and
> > fast _exit.
> 
> OK, r- from me then. Finding out what is needed for just a b2g shutdown is
> probably easier. We can always refactor the code once desktop shutdown is
> fast.

This is not a very useful f- because it's unclear what information you're requesting.  Can you please be more specific?
Sure. Correct me if I am wrong, but I am getting the impression that

*  appStartup->Quit is being called to produce a regular shutdown (including XPCOM shutdown).
* bsmedberg thinks that we should be doing the fastest shutdown we can.

The only thing I think I can add to the discussion is that the work on _exit(0) is being slow, so it will take a while for a regular shutdown to avoid all unnecessary work and so appStartup->Quit is probably not what you want.
The purpose of this bug is 100% *not* to optimize shutdown time.  It 100% doesn't matter here.

Please don't add to this work a dependency on the _exit(0) optimization.  If the app-exit API will change in support of that orthogonal work (which it really shouldn't, IMHO), then that new API will need to support the two new use cases here (reboot/OS-shutdown).
What is the purpose of this bug? nsIAppStartup.quit doesn't seem to be necessary if all you're doing it making sure the filesystem syncs, and it's not sufficient if you actually need to shut down gecko; it shuts down the event loop but without additional work it doesn't shut down the profile (for example) or make sure that gecko has stopped writing data to disk.

It seems that nsIAppStartup.quit is orthagonal to whatever goal the bug was actually filed for.
Let's call the work that needs to be done to shut down gecko "X".  Currently, shutdown looks like
 1. call nsIAppStartup.quit(how)
 2. do X
 3. implement "how" from step (1): restart or exit

The goal of this bug is to add two new options for (1) and (3): "reboot" and "poweroff".  That work has nothing to do with X, for example making it faster.

Other code uses appStartup.quit() to quit gecko's process.  If that's not the right way, what is?
The normal gecko shutdown sequence is:

* exit the event loop. End up here: http://hg.mozilla.org/mozilla-central/annotate/9d8b3ee45146/toolkit/xre/nsAppRunner.cpp#l3911
* fire all the shutdown notifications in order, https://wiki.mozilla.org/XPCOM_Shutdown
** This technically happens by first calling nsXREDirProvider::DoShutdown and then NS_ShutdownXPCOM from here: http://hg.mozilla.org/mozilla-central/annotate/9d8b3ee45146/toolkit/xre/nsAppRunner.cpp#l1109

Then there's some special-case code for restarting (your #3)

We already shortcut this process when windows is shutting down: http://hg.mozilla.org/mozilla-central/diff/b5c4b792f3f2/widget/windows/nsWindow.cpp

I think B2G should do the same thing: fire these notifications in order and then call into the HAL, bypassing the rest of XPCOM shutdown.
Hi guys! Thanks for your suggestions. I'll try to follow up with all these concerns (sorry I've been taking 2 days off recently).
Posted patch Patch (obsolete) — Splinter Review
Hi :bsmedberg,

Things are simple now. ;) We don't attempt to use the appStartup.quit(). Instead, in order to to synchronize any unsaved data, we fire the following notifications:

NS_NAMED_LITERAL_STRING(context, "shutdown-persist");
obsServ->NotifyObservers(nullptr, "profile-change-net-teardown", context.get());
obsServ->NotifyObservers(nullptr, "profile-change-teardown", context.get());
obsServ->NotifyObservers(nullptr, "profile-before-change", context.get());

before calling into hal::PowerOff()/Reboot(). Could you please take a review on this patch? Changes should be trivial.
Attachment #661176 - Attachment is obsolete: true
Attachment #663366 - Flags: review?(benjamin)
Comment on attachment 663366 [details] [diff] [review]
Patch

do hal::Reboot and hal::PowerOff work synchronously or asynchronously? Are they unimplemented on desktop B2G builds, for example? I suspect that we should either make sure that they cannot return or that we put an _exit immediately after calling them. We really don't want this process to continue running.
Comment on attachment 663366 [details] [diff] [review]
Patch

I'm going to cancel this request for now, please re-request review if the exit behavior is correct.
Attachment #663366 - Flags: review?(benjamin)
Posted patch Patch 1.1 (obsolete) — Splinter Review
Hi :bsmedberg,

Thanks for the review. I'm hoping to make both hal::Reboot() and hal::PowerOff() are synchronous so that we don't need to explicitly call the _exit() to leave the process.

Hi :cjones,

Changes are trivial now if we don't attempt to call appStartup.quit(). What we only need to do is to fire some "profile-change" notifications to save any unsaved data.

This will also trigger http://mxr.mozilla.org/mozilla-central/source/toolkit/components/startup/nsAppStartup.cpp#804 to safely close all the windows, launch the QuitWatchdog, and then exit the appshell.

However, some tiny changes in Hal are made to ensure hal::Reboot() and hal::PowerOff() are synchronous to address comment #16. To do so, the prefix "sync" needs to be added in PHal.ipdl. Right? Please correct if I'm wrong.
Attachment #663366 - Attachment is obsolete: true
Attachment #664028 - Flags: superreview?(jones.chris.g)
Attachment #664028 - Flags: review?(benjamin)
It doesn't look like the current HAL implementations are synchronous. In particular, the B2G-desktop builds use http://mxr.mozilla.org/mozilla-central/source/hal/fallback/FallbackPower.cpp#11 which is a no-op. So either all the HALs need to be noreturn, or the process needs to _exit.
Comment on attachment 664028 [details] [diff] [review]
Patch 1.1

This is only going to work when reboot/shutdown are initiated from the b2g master process.  Please assert that these APIs are only used when XRE_GetProcessType() == GeckoProcessType_Default.

I don't think we should change the hal:: API here; in fact, given the constraint above, it's best to remove IPC support for hal::Shutdown()/Reboot().  Let's go ahead and do that.

This doesn't add a watchdog for the profile sync, but we can do that in a followup bug.  Please file.
Attachment #664028 - Flags: superreview?(jones.chris.g)
Depends on: 793970
Posted patch Patch, V2 (obsolete) — Splinter Review
Hi :cjones and :benjamin,

The new patch is summarized as follows:

1. Strip out the IPC mechanism for hal::Reboot and hal::PowerOff, which means the APIs can only be called from the main process.

2. For the Reboot/PowerOff in FallbackPower.cpp, we use kill(0, SIGKILL) and _exit(1) to leave the process (this is inspired by http://mxr.mozilla.org/mozilla-central/source/toolkit/components/startup/nsAppStartup.cpp#377).

3. Tentatively, add some MOZ_NOT_REACHED() checks to the points which are not visited for sure.
Attachment #664028 - Attachment is obsolete: true
Attachment #664028 - Flags: review?(benjamin)
Attachment #664379 - Flags: superreview?(jones.chris.g)
Attachment #664379 - Flags: review?(benjamin)
Comment on attachment 664379 [details] [diff] [review]
Patch, V2

>diff --git a/dom/power/PowerManagerService.cpp b/dom/power/PowerManagerService.cpp

>+void
>+PowerManagerService::SyncProfile()
>+{
>+  // TODO We need to start a watchdog thread to force gecko to really
>+  // power off or reboot if the profile synchronizing hangs (bug 793970).

Nit: people generally write "FIXME/bug 793970" for these kinds of
comments.

>+  nsCOMPtr<nsIObserverService> obsServ =
>+    mozilla::services::GetObserverService();

|services::GetObserverService()|.  You don't need the "mozilla::".

>diff --git a/hal/fallback/FallbackPower.cpp b/hal/fallback/FallbackPower.cpp

> void
> Reboot()
>-{}
>+{
>+  kill(0, SIGKILL);
>+  _exit(1);
>+  MOZ_NOT_REACHED();

This isn't going to compile on all platforms.  Let's hold off on
adding this until we refactor the watchdog code.  We can have the
reboot/poweroff fallbacks continue to be no-ops.

The MOZ_NOT_REACHED() calls that are in PowerManager after the hal::
calls, are fine for now.

>diff --git a/hal/linux/LinuxPower.cpp b/hal/linux/LinuxPower.cpp

We can remove the changes here here.

sr=me with that.
Attachment #664379 - Flags: superreview?(jones.chris.g) → superreview+

Updated

7 years ago
Attachment #664379 - Flags: review?(benjamin) → review?(benjamin)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #22)
> Comment on attachment 664379 [details] [diff] [review]
> Patch, V2
> The MOZ_NOT_REACHED() calls that are in PowerManager after the hal::
> calls, are fine for now.

Not quite sure about this part. My understanding is to only add MOZ_NOT_REACHED() checks in hal::Reboot()/hal::PowerOff() and remove the ones in PowerManagerService::Reboot()/PowerManagerService::PowerOff() since the hal:: has already done the checks.
Posted patch Patch, V3 (obsolete) — Splinter Review
sr=cjones

Hi Benjamin,

Based on :cjones' comment #22, we're going to design a watchdog with a timer to compulsively power-off/reboot the gecko if it hangs (Bug 793970), so let's have the reboot/poweroff fallbacks continue to be no-ops for now. If you don't mind, may I have your review+ to check this in? Thanks!
Attachment #664379 - Attachment is obsolete: true
Attachment #664379 - Flags: review?(benjamin)
Attachment #664856 - Flags: superreview+
Attachment #664856 - Flags: review?(benjamin)
(In reply to Gene Lian [:gene] from comment #23)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #22)
> > Comment on attachment 664379 [details] [diff] [review]
> > Patch, V2
> > The MOZ_NOT_REACHED() calls that are in PowerManager after the hal::
> > calls, are fine for now.
> 
> Not quite sure about this part. My understanding is to only add
> MOZ_NOT_REACHED() checks in hal::Reboot()/hal::PowerOff() and remove the
> ones in PowerManagerService::Reboot()/PowerManagerService::PowerOff() since
> the hal:: has already done the checks.

The hal:: API is not defined to "noreturn" when calling Reboot()/PowerOff().  We could define it to do that, but I think there might be some value in returning to indicate failure.

In the case of this patch, if we only QuitHard() in PowerManager when the hal:: functions *do* return, then we'll change less code.  So I prefer that.
I don't understand comment 24, can you catch me on IRC and help me understand what the watchdog has to do with the _exit call?
What I would like us to do here is the following, taking Reboot() as an arbitrary example

PowerManager::Reboot()
{
  // sync profile

  hal::Reboot();

  MOZ_NOT_REACHED();
}

on shipping platforms where PowerManager::Reboot() can be called, hal::Reboot() should never return.  If it does, there's an extremely serious bug.

I would prefer to use MOZ_NOT_REACHED() instead of _exit() so that we have the chance of getting a breakpad report if this case happens.
Posted patch Patch, V3.1 (obsolete) — Splinter Review
Hi Benjamin,

I think it's my bad to take the watchdog thing into considerations. Let's focus on the PowerManagerService::Reboot()/PowerOff behaviors within this issue.

Please see comment #27. We're hoping to use MOZ_NOT_REACHED() instead of _exit() to specify the unexpected hal returns.

sr=cjones
Attachment #664856 - Attachment is obsolete: true
Attachment #664856 - Flags: review?(benjamin)
Attachment #665248 - Flags: superreview+
Attachment #665248 - Flags: review?(benjamin)
Comment on attachment 665248 [details] [diff] [review]
Patch, V3.1

I still don't understand why this won't crash B2G desktop builds, where hal::Reboot is a no-op. Can you just add _exit to those methods instead of making them a no-op?
We don't call this reboot() method on desktop.  If we attempt to call it from PowerManager, then it's a logic error and we want to see the assertion failure.

Updated

7 years ago
Attachment #665248 - Flags: review?(benjamin) → review+
Posted patch Patch, V3.2Splinter Review
Rebase. Thanks Cjones and Benjamin for the reviews!

r=bsmedberg,sr=cjones
Attachment #665248 - Attachment is obsolete: true
Attachment #665804 - Flags: superreview+
Attachment #665804 - Flags: review+

Updated

7 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d31d2479d685
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
You need to log in before you can comment on or make changes to this bug.