Closed Bug 731741 Opened 12 years ago Closed 12 years ago

Need a way to decide if we should _exit(0) early or not

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: espindola, Assigned: espindola)

References

Details

(Whiteboard: [snappy:p2])

Attachments

(1 file)

In bug 662444 we want to change the normal shutdown process to call _exit(0) early. While we might end up simplifying the shutdown process for debug builds too, we will likely always want to be able to do some extra sanity checking on them.

This bug is about figuring out the best way to write the if. The simplest is to just write

#ifdef DEBUG
...

but the logs we print on debug builds can be really useful at finding if we are missing something that should happen before the _exit(0).

I propose we write something like

if (mozilla::FastShudown) {
....
}

In a release build we make that a constant false and compilers should be able to optimize it away. On debug builds it becomes a variable initialized to false. We then add a command line option to set it to true.
I am particularly interested in comments about where it is declared and defined and what is the correct value to pass as the second argument to CheckArg.
Assignee: nobody → respindola
Status: NEW → ASSIGNED
Attachment #601948 - Flags: review?(benjamin)
Comment on attachment 601948 [details] [diff] [review]
add a variable we can use to check if we will use _exit(0)

I think fastshutdown should be a variable in all cases: default to true for release build and false for debug builds, but instead of a commandline flag, just get a pref value before shutting down the profile, say here:

http://hg.mozilla.org/mozilla-central/annotate/bfb1b7520ce9/toolkit/xre/nsXREDirProvider.cpp#l816

And use that value.
Attachment #601948 - Flags: review?(benjamin) → review-
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> Comment on attachment 601948 [details] [diff] [review]
> add a variable we can use to check if we will use _exit(0)
> 
> I think fastshutdown should be a variable in all cases: default to true for
> release build and false for debug builds,

Why? This prevents the compiler from optimizing away the code in release builds.

> but instead of a commandline flag,
> just get a pref value before shutting down the profile, say here:
> 
> http://hg.mozilla.org/mozilla-central/annotate/bfb1b7520ce9/toolkit/xre/
> nsXREDirProvider.cpp#l816
> 
> And use that value.

I think we might want to use this variable way earlier than that. For example, if a module has something that only has to be done in a debug build, it is better for its initialization to do

if (!fastShutdown) {
  obs-AddObserver(..., profile-bofore-change, ...);
}

than have its observer be a nop if fastShutdown is true. We might even be able to assert that we have no observers to xpcom-* as an extra sanity check in a fastShutdown case.

Part of the reason I went with a command line is that it can be used to set the variable really early. It also don't make sense for this value to change during runtime, so I don't think a pref is appropriate.
Sorry, missed this comment. I didn't realize you were planning on skipping a bunch of unregistration code in this case: if that's really important, I don't think we want to rely on compilers optimizing the const away properly. So why don't we just make this a build-time flag, defaulting to the same as !DEBUG?
Whiteboard: [snappy]
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> Sorry, missed this comment. I didn't realize you were planning on skipping a
> bunch of unregistration code in this case: if that's really important, I
> don't think we want to rely on compilers optimizing the const away properly.
> So why don't we just make this a build-time flag, defaulting to the same as
> !DEBUG?

So, two independent issues:

*) Making sure this is a value that is available early and stays constant during a run. This is what would let us use it to avoid registration. It suggests a command line option, but we could still use a variable to hold its value on opt and debug builds.

*) Helping the compiler and avoiding mistakes. We never want to do a slow shutdown in release builds. Using a constant allows the compiler to check that we never set it and let it optimize dead code out. It in not required, but free optimizations are always nice.
Whiteboard: [snappy] → [snappy:p2]
You pinged me about this, but I'm not sure what you wanted to ask: I think we should just use a build-time flag and should avoid all runtime options, if we're going to do it like this. By default a debug build would shut down completely and a release build would _exit but you could pass a build-time flag to override this behavior.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> You pinged me about this, but I'm not sure what you wanted to ask: I think
> we should just use a build-time flag and should avoid all runtime options,
> if we're going to do it like this. By default a debug build would shut down
> completely and a release build would _exit but you could pass a build-time
> flag to override this behavior.

OK. I have a small preference for a being able to do a fast shutdown in a debug build without having to rebuild, but we can start with a build time flag.
I'd like a runtime switch in debug builds. (I do a lot of my fuzzing with Tinderbox builds on boxes that don't even have mozilla-central trees.)

Rather than a command-line flag, how about an env var? We're already setting XPCOM_MEM_LEAK_LOG when we want to get useful shutdown stats.
> We never want to do a slow shutdown in release builds.

Are we likely to find that some add-ons or third-party software can't deal with exit(0)?
> Rather than a command-line flag, how about an env var? We're already setting
> XPCOM_MEM_LEAK_LOG when we want to get useful shutdown stats.

If possible I would go with a command line flag. Env vars have some of the same problems as prefs, they can be changed during runtime for example.

Since Benjamin prefers a compile time check, I suggest we start with that and then discuss the best way to add a runtime check in another bug, would that be OK?
(In reply to Jesse Ruderman from comment #9)
> > We never want to do a slow shutdown in release builds.
> 
> Are we likely to find that some add-ons or third-party software can't deal
> with exit(0)?

I think so.
FWIW, most of the environment variables we use are checked at startup and never again. That's probably fine for this use case as well.
over lunch Ehsan pointed out that while doing the checking in a debug builds works for our own code, it might not be good enough for addon developers.

If we don't expect addons to be developed with a debug version of firefox, we probably need to keep the write poisoning in the release builds and enable it (and disable the early exit(0)) if a command line option or env variable is set.
I would also recommend some serious messaging effort to let addon devs know that they need to start changing their addons to write before xpcom-shutdown. Jorge may be able to get that ball rolling.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #13)
> over lunch Ehsan pointed out that while doing the checking in a debug builds
> works for our own code, it might not be good enough for addon developers.

Jesse also needs it for his fuzzing(+leak checking?) on stock builds.
I don't understand comment 13. Are you talking about leak checking? Our standard release builds don't support refcount-style leak checking anyway... it is possible to do a non-debug build with leak checking, but in that case we should disable the early-exiting. So I really still think we can and should keep this as a build-time flag.
(In reply to ben turner [:bent] from comment #14)
> I would also recommend some serious messaging effort to let addon devs know
> that they need to start changing their addons to write before
> xpcom-shutdown. Jorge may be able to get that ball rolling.

Bug 662444 is already flagged for addon-compat, though it doesn't have a milestone set yet. I would appreciate it if someone can give me a clear idea of what changes add-on devs should expect and when this is happening.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #16)
> I don't understand comment 13. Are you talking about leak checking? Our
> standard release builds don't support refcount-style leak checking anyway...
> it is possible to do a non-debug build with leak checking, but in that case
> we should disable the early-exiting. So I really still think we can and
> should keep this as a build-time flag.

No, we're talking about add-ons which rely on their JS objects to be GCed before they write something to disk during shutdown, for example.
Supporting something like that would require changing the value at runtime, since we don't know the set of extensions that are enabled until after XPCOM starts up. I don't think we should try to implement a fallback mechanism to the current behavior.
(In reply to Taras Glek (:taras) from comment #15)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #13)
> > over lunch Ehsan pointed out that while doing the checking in a debug builds
> > works for our own code, it might not be good enough for addon developers.
> 
> Jesse also needs it for his fuzzing(+leak checking?) on stock builds.

I currently only fuzz debug builds.  I want debug builds to support both shutdown paths so I can test both shutdown paths: the one that finds (some) leaks, and the one that most users will experience.

So, my preference is:

* Builds with --enable-logrefcnt (including debug builds) support both shutdown paths, and select based on the value of an environment variable at startup.

* Builds without --enable-logrefcnt (including stock release builds) support fast shutdown only, if we can get away with it.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #19)
> Supporting something like that would require changing the value at runtime,
> since we don't know the set of extensions that are enabled until after XPCOM
> starts up. I don't think we should try to implement a fallback mechanism to
> the current behavior.

I agree that we should not change behavior midway based on what addons are loaded. What I think we should provide is a way for addons developers to test/debug their addons with a regular release of firefox (if that is what they normally use).

The idea is that "./firefox --no-early-exit" (or setting MOZ_NO_EARLY_EXIT if we go with an environment variable) would cause a regular release build to not call _exit(0) early and instead shutdown like a debug build would, poisoning write and asserting that no writes to disk show up after that.
I'm not sure we can rely on add-on developers to do all the right manual testing on their old add-ons.

Another strategy: ship one release with write poisoning, analyze the crash reports we get, and then ship releases with exit(0).
(In reply to Jesse Ruderman from comment #22)
> I'm not sure we can rely on add-on developers to do all the right manual
> testing on their old add-ons.
>
> Another strategy: ship one release with write poisoning, analyze the crash
> reports we get, and then ship releases with exit(0).

I would say this is complementary to being able to disable _exit(0) in a release build, but I agree with you. We should probably ship a release with write poisoning to see what is out there.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #23)
> (In reply to Jesse Ruderman from comment #22)
> > I'm not sure we can rely on add-on developers to do all the right manual
> > testing on their old add-ons.
> >
> > Another strategy: ship one release with write poisoning, analyze the crash
> > reports we get, and then ship releases with exit(0).
> 
> I would say this is complementary to being able to disable _exit(0) in a
> release build, but I agree with you. We should probably ship a release with
> write poisoning to see what is out there.

Note that for that to give us useful data, we need to ship write-poisoning on all platforms, including Windows.
(In reply to Ehsan Akhgari [:ehsan] from comment #24)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #23)
> > (In reply to Jesse Ruderman from comment #22)
> > > I'm not sure we can rely on add-on developers to do all the right manual
> > > testing on their old add-ons.
> > >
> > > Another strategy: ship one release with write poisoning, analyze the crash
> > > reports we get, and then ship releases with exit(0).
> Note that for that to give us useful data, we need to ship write-poisoning
> on all platforms, including Windows.

I think shipping with write poisoning enabled to catch addon crashes is overly defensive. We should just flip exit(0) on and revisit doing it more conservatively iff we start catching a lot of malfunctioning addons.

Also doing addon-write poisoning is much easier than doing it on the platform(unless they are brave enough to use ctypes for io), so if we do choose to poison, doing it in a cross-platform way should be easy.
> I think shipping with write poisoning enabled to catch addon crashes is
> overly defensive. We should just flip exit(0) on and revisit doing it more
> conservatively iff we start catching a lot of malfunctioning addons.
> 
> Also doing addon-write poisoning is much easier than doing it on the
> platform(unless they are brave enough to use ctypes for io), so if we do
> choose to poison, doing it in a cross-platform way should be easy.

Have we dropped support for binary addons?
(In reply to Taras Glek (:taras) from comment #25)
> We should just flip exit(0) on and revisit doing it more
> conservatively iff we start catching a lot of malfunctioning addons.

Wait, we're talking dataloss here. Shouldn't we err on the conservative side?
(In reply to ben turner [:bent] from comment #27)
> (In reply to Taras Glek (:taras) from comment #25)
> > We should just flip exit(0) on and revisit doing it more
> > conservatively iff we start catching a lot of malfunctioning addons.
> 
> Wait, we're talking dataloss here. Shouldn't we err on the conservative side?

I think we should.  Not doing that could easily lead to an entire database be corrupted for example.
(In reply to ben turner [:bent] from comment #27)
> (In reply to Taras Glek (:taras) from comment #25)
> > We should just flip exit(0) on and revisit doing it more
> > conservatively iff we start catching a lot of malfunctioning addons.
> 
> Wait, we're talking dataloss here. Shouldn't we err on the conservative side?

This assumes that an addon does useful IO way late in the process. We already don't shutdown cleanly 10% of the time, so said dataloss is already occurring frequently enough for addons to notice and deal with it.
(In reply to Taras Glek (:taras) from comment #29)
> (In reply to ben turner [:bent] from comment #27)
> > (In reply to Taras Glek (:taras) from comment #25)
> > > We should just flip exit(0) on and revisit doing it more
> > > conservatively iff we start catching a lot of malfunctioning addons.
> > 
> > Wait, we're talking dataloss here. Shouldn't we err on the conservative side?
> 
> This assumes that an addon does useful IO way late in the process. We
> already don't shutdown cleanly 10% of the time, so said dataloss is already
> occurring frequently enough for addons to notice and deal with it.

I don't think this is a strong argument to increase the 10% to 100%.  ;-)
(In reply to Ehsan Akhgari [:ehsan] from comment #30)
> > 
> > This assumes that an addon does useful IO way late in the process. We
> > already don't shutdown cleanly 10% of the time, so said dataloss is already
> > occurring frequently enough for addons to notice and deal with it.
> 
> I don't think this is a strong argument to increase the 10% to 100%.  ;-)

We have concrete evidence that 10% of our shutdowns don't finish. We also have evidence that our shutdown is slow. I will trade solid evidence of improvement over theoretical addon bustage.
(In reply to Ehsan Akhgari [:ehsan] from comment #30) 
> I don't think this is a strong argument to increase the 10% to 100%.  ;-)

I should clarify, that's a 10% chance of corruption every 24hours. So if you start Firefox 10 times, you are pretty much guaranteed to loose your addon data. 
(it could also be possible that 10% of our users never shutdown correctly, but that's even worse)
(In reply to Taras Glek (:taras) from comment #31)
> (In reply to Ehsan Akhgari [:ehsan] from comment #30)
> > > 
> > > This assumes that an addon does useful IO way late in the process. We
> > > already don't shutdown cleanly 10% of the time, so said dataloss is already
> > > occurring frequently enough for addons to notice and deal with it.
> > 
> > I don't think this is a strong argument to increase the 10% to 100%.  ;-)
> 
> We have concrete evidence that 10% of our shutdowns don't finish. We also
> have evidence that our shutdown is slow.

Both true.

> I will trade solid evidence of improvement over theoretical addon bustage.

Firstly you're assuming that it is only add-ons that can hit that case, and the data that we're obtaining from our test runs capture all of the ways we can write data to disk during shutdown even with no add-ons installed.  That assumption is clearly incorrect.

Secondly, as bent noted, we're talking dataloss here.  We've always taken dataloss pretty seriously, well over all performance concerns.  I think we need evidence to show that this is *not* a dataloss issue before we can support this tradeoff.
I talked about this with Taras on IRC.  We're actually talking about releasing this on the Nightly channel, and not the Release channel.  I'm in 100% support of doing that -- I was just confused by the term "release".
I think we have a good idea on what is needed. Further discussions will likely be in bug 770605.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: