731741 - Need a way to decide if we should _exit(0) early or not

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Description

•

13 years ago

In bug 662444 we want to change the normal shutdown process to call _exit(0) early. While we might end up simplifying the shutdown process for debug builds too, we will likely always want to be able to do some extra sanity checking on them. This bug is about figuring out the best way to write the if. The simplest is to just write #ifdef DEBUG ... but the logs we print on debug builds can be really useful at finding if we are missing something that should happen before the _exit(0). I propose we write something like if (mozilla::FastShudown) { .... } In a release build we make that a constant false and compilers should be able to optimize it away. On debug builds it becomes a variable initialized to false. We then add a command line option to set it to true.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 1

•

13 years ago

Attached patch add a variable we can use to check if we will use _exit(0) — Details — Splinter Review

I am particularly interested in comments about where it is declared and defined and what is the correct value to pass as the second argument to CheckArg.

Assignee: nobody → respindola

Status: NEW → ASSIGNED

Attachment #601948 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 2

•

13 years ago

Comment on attachment 601948 [details] [diff] [review] add a variable we can use to check if we will use _exit(0) I think fastshutdown should be a variable in all cases: default to true for release build and false for debug builds, but instead of a commandline flag, just get a pref value before shutting down the profile, say here: http://hg.mozilla.org/mozilla-central/annotate/bfb1b7520ce9/toolkit/xre/nsXREDirProvider.cpp#l816 And use that value.

Attachment #601948 - Flags: review?(benjamin) → review-

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 3

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > Comment on attachment 601948 [details] [diff] [review] > add a variable we can use to check if we will use _exit(0) > > I think fastshutdown should be a variable in all cases: default to true for > release build and false for debug builds, Why? This prevents the compiler from optimizing away the code in release builds. > but instead of a commandline flag, > just get a pref value before shutting down the profile, say here: > > http://hg.mozilla.org/mozilla-central/annotate/bfb1b7520ce9/toolkit/xre/ > nsXREDirProvider.cpp#l816 > > And use that value. I think we might want to use this variable way earlier than that. For example, if a module has something that only has to be done in a debug build, it is better for its initialization to do if (!fastShutdown) { obs-AddObserver(..., profile-bofore-change, ...); } than have its observer be a nop if fastShutdown is true. We might even be able to assert that we have no observers to xpcom-* as an extra sanity check in a fastShutdown case. Part of the reason I went with a command line is that it can be used to set the variable really early. It also don't make sense for this value to change during runtime, so I don't think a pref is appropriate.

Benjamin Smedberg

Comment 4

•

13 years ago

Sorry, missed this comment. I didn't realize you were planning on skipping a bunch of unregistration code in this case: if that's really important, I don't think we want to rely on compilers optimizing the const away properly. So why don't we just make this a build-time flag, defaulting to the same as !DEBUG?

Lawrence Mandel [:lmandel] (use needinfo)

Updated

•

13 years ago

Whiteboard: [snappy]

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 5

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4) > Sorry, missed this comment. I didn't realize you were planning on skipping a > bunch of unregistration code in this case: if that's really important, I > don't think we want to rely on compilers optimizing the const away properly. > So why don't we just make this a build-time flag, defaulting to the same as > !DEBUG? So, two independent issues: *) Making sure this is a value that is available early and stays constant during a run. This is what would let us use it to avoid registration. It suggests a command line option, but we could still use a variable to hold its value on opt and debug builds. *) Helping the compiler and avoiding mistakes. We never want to do a slow shutdown in release builds. Using a constant allows the compiler to check that we never set it and let it optimize dead code out. It in not required, but free optimizations are always nice.

(dormant account)

Updated

•

13 years ago

Whiteboard: [snappy] → [snappy:p2]

Benjamin Smedberg

Comment 6

•

13 years ago

You pinged me about this, but I'm not sure what you wanted to ask: I think we should just use a build-time flag and should avoid all runtime options, if we're going to do it like this. By default a debug build would shut down completely and a release build would _exit but you could pass a build-time flag to override this behavior.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 7

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #6) > You pinged me about this, but I'm not sure what you wanted to ask: I think > we should just use a build-time flag and should avoid all runtime options, > if we're going to do it like this. By default a debug build would shut down > completely and a release build would _exit but you could pass a build-time > flag to override this behavior. OK. I have a small preference for a being able to do a fast shutdown in a debug build without having to rebuild, but we can start with a build time flag.

Jesse Ruderman

Comment 8

•

13 years ago

I'd like a runtime switch in debug builds. (I do a lot of my fuzzing with Tinderbox builds on boxes that don't even have mozilla-central trees.) Rather than a command-line flag, how about an env var? We're already setting XPCOM_MEM_LEAK_LOG when we want to get useful shutdown stats.

Jesse Ruderman

Comment 9

•

13 years ago

> We never want to do a slow shutdown in release builds. Are we likely to find that some add-ons or third-party software can't deal with exit(0)?

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 10

•

13 years ago

> Rather than a command-line flag, how about an env var? We're already setting > XPCOM_MEM_LEAK_LOG when we want to get useful shutdown stats. If possible I would go with a command line flag. Env vars have some of the same problems as prefs, they can be changed during runtime for example. Since Benjamin prefers a compile time check, I suggest we start with that and then discuss the best way to add a runtime check in another bug, would that be OK?

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 11

•

13 years ago

(In reply to Jesse Ruderman from comment #9) > > We never want to do a slow shutdown in release builds. > > Are we likely to find that some add-ons or third-party software can't deal > with exit(0)? I think so.

(not currently active) Ted Mielczarek

Comment 12

•

13 years ago

FWIW, most of the environment variables we use are checked at startup and never again. That's probably fine for this use case as well.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 13

•

13 years ago

over lunch Ehsan pointed out that while doing the checking in a debug builds works for our own code, it might not be good enough for addon developers. If we don't expect addons to be developed with a debug version of firefox, we probably need to keep the write poisoning in the release builds and enable it (and disable the early exit(0)) if a command line option or env variable is set.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 14

•

13 years ago

I would also recommend some serious messaging effort to let addon devs know that they need to start changing their addons to write before xpcom-shutdown. Jorge may be able to get that ball rolling.

(dormant account)

Comment 15

•

13 years ago

(In reply to Rafael Ávila de Espíndola (:espindola) from comment #13) > over lunch Ehsan pointed out that while doing the checking in a debug builds > works for our own code, it might not be good enough for addon developers. Jesse also needs it for his fuzzing(+leak checking?) on stock builds.

Benjamin Smedberg

Comment 16

•

13 years ago

I don't understand comment 13. Are you talking about leak checking? Our standard release builds don't support refcount-style leak checking anyway... it is possible to do a non-debug build with leak checking, but in that case we should disable the early-exiting. So I really still think we can and should keep this as a build-time flag.

Jorge Villalobos [:jorgev] (he/him)

Comment 17

•

13 years ago

(In reply to ben turner [:bent] from comment #14) > I would also recommend some serious messaging effort to let addon devs know > that they need to start changing their addons to write before > xpcom-shutdown. Jorge may be able to get that ball rolling. Bug 662444 is already flagged for addon-compat, though it doesn't have a milestone set yet. I would appreciate it if someone can give me a clear idea of what changes add-on devs should expect and when this is happening.

(no longer active)

Comment 18

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #16) > I don't understand comment 13. Are you talking about leak checking? Our > standard release builds don't support refcount-style leak checking anyway... > it is possible to do a non-debug build with leak checking, but in that case > we should disable the early-exiting. So I really still think we can and > should keep this as a build-time flag. No, we're talking about add-ons which rely on their JS objects to be GCed before they write something to disk during shutdown, for example.

Benjamin Smedberg

Comment 19

•

13 years ago

Supporting something like that would require changing the value at runtime, since we don't know the set of extensions that are enabled until after XPCOM starts up. I don't think we should try to implement a fallback mechanism to the current behavior.

Jesse Ruderman

Comment 20

•

13 years ago

(In reply to Taras Glek (:taras) from comment #15) > (In reply to Rafael Ávila de Espíndola (:espindola) from comment #13) > > over lunch Ehsan pointed out that while doing the checking in a debug builds > > works for our own code, it might not be good enough for addon developers. > > Jesse also needs it for his fuzzing(+leak checking?) on stock builds. I currently only fuzz debug builds. I want debug builds to support both shutdown paths so I can test both shutdown paths: the one that finds (some) leaks, and the one that most users will experience. So, my preference is: * Builds with --enable-logrefcnt (including debug builds) support both shutdown paths, and select based on the value of an environment variable at startup. * Builds without --enable-logrefcnt (including stock release builds) support fast shutdown only, if we can get away with it.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 21

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #19) > Supporting something like that would require changing the value at runtime, > since we don't know the set of extensions that are enabled until after XPCOM > starts up. I don't think we should try to implement a fallback mechanism to > the current behavior. I agree that we should not change behavior midway based on what addons are loaded. What I think we should provide is a way for addons developers to test/debug their addons with a regular release of firefox (if that is what they normally use). The idea is that "./firefox --no-early-exit" (or setting MOZ_NO_EARLY_EXIT if we go with an environment variable) would cause a regular release build to not call _exit(0) early and instead shutdown like a debug build would, poisoning write and asserting that no writes to disk show up after that.

Jesse Ruderman

Comment 22

•

13 years ago

I'm not sure we can rely on add-on developers to do all the right manual testing on their old add-ons. Another strategy: ship one release with write poisoning, analyze the crash reports we get, and then ship releases with exit(0).

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 23

•

13 years ago

(In reply to Jesse Ruderman from comment #22) > I'm not sure we can rely on add-on developers to do all the right manual > testing on their old add-ons. > > Another strategy: ship one release with write poisoning, analyze the crash > reports we get, and then ship releases with exit(0). I would say this is complementary to being able to disable _exit(0) in a release build, but I agree with you. We should probably ship a release with write poisoning to see what is out there.

(no longer active)

Comment 24

•

13 years ago

(In reply to Rafael Ávila de Espíndola (:espindola) from comment #23) > (In reply to Jesse Ruderman from comment #22) > > I'm not sure we can rely on add-on developers to do all the right manual > > testing on their old add-ons. > > > > Another strategy: ship one release with write poisoning, analyze the crash > > reports we get, and then ship releases with exit(0). > > I would say this is complementary to being able to disable _exit(0) in a > release build, but I agree with you. We should probably ship a release with > write poisoning to see what is out there. Note that for that to give us useful data, we need to ship write-poisoning on all platforms, including Windows.

(dormant account)

Comment 25

•

13 years ago

(In reply to Ehsan Akhgari [:ehsan] from comment #24) > (In reply to Rafael Ávila de Espíndola (:espindola) from comment #23) > > (In reply to Jesse Ruderman from comment #22) > > > I'm not sure we can rely on add-on developers to do all the right manual > > > testing on their old add-ons. > > > > > > Another strategy: ship one release with write poisoning, analyze the crash > > > reports we get, and then ship releases with exit(0). > Note that for that to give us useful data, we need to ship write-poisoning > on all platforms, including Windows. I think shipping with write poisoning enabled to catch addon crashes is overly defensive. We should just flip exit(0) on and revisit doing it more conservatively iff we start catching a lot of malfunctioning addons. Also doing addon-write poisoning is much easier than doing it on the platform(unless they are brave enough to use ctypes for io), so if we do choose to poison, doing it in a cross-platform way should be easy.

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 26

•

13 years ago

> I think shipping with write poisoning enabled to catch addon crashes is > overly defensive. We should just flip exit(0) on and revisit doing it more > conservatively iff we start catching a lot of malfunctioning addons. > > Also doing addon-write poisoning is much easier than doing it on the > platform(unless they are brave enough to use ctypes for io), so if we do > choose to poison, doing it in a cross-platform way should be easy. Have we dropped support for binary addons?

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 27

•

13 years ago

(In reply to Taras Glek (:taras) from comment #25) > We should just flip exit(0) on and revisit doing it more > conservatively iff we start catching a lot of malfunctioning addons. Wait, we're talking dataloss here. Shouldn't we err on the conservative side?

(no longer active)

Comment 28

•

13 years ago

(In reply to ben turner [:bent] from comment #27) > (In reply to Taras Glek (:taras) from comment #25) > > We should just flip exit(0) on and revisit doing it more > > conservatively iff we start catching a lot of malfunctioning addons. > > Wait, we're talking dataloss here. Shouldn't we err on the conservative side? I think we should. Not doing that could easily lead to an entire database be corrupted for example.

(dormant account)

Comment 29

•

13 years ago

(In reply to ben turner [:bent] from comment #27) > (In reply to Taras Glek (:taras) from comment #25) > > We should just flip exit(0) on and revisit doing it more > > conservatively iff we start catching a lot of malfunctioning addons. > > Wait, we're talking dataloss here. Shouldn't we err on the conservative side? This assumes that an addon does useful IO way late in the process. We already don't shutdown cleanly 10% of the time, so said dataloss is already occurring frequently enough for addons to notice and deal with it.

(no longer active)

Comment 30

•

13 years ago

(In reply to Taras Glek (:taras) from comment #29) > (In reply to ben turner [:bent] from comment #27) > > (In reply to Taras Glek (:taras) from comment #25) > > > We should just flip exit(0) on and revisit doing it more > > > conservatively iff we start catching a lot of malfunctioning addons. > > > > Wait, we're talking dataloss here. Shouldn't we err on the conservative side? > > This assumes that an addon does useful IO way late in the process. We > already don't shutdown cleanly 10% of the time, so said dataloss is already > occurring frequently enough for addons to notice and deal with it. I don't think this is a strong argument to increase the 10% to 100%. ;-)

(dormant account)

Comment 31

•

13 years ago

(In reply to Ehsan Akhgari [:ehsan] from comment #30) > > > > This assumes that an addon does useful IO way late in the process. We > > already don't shutdown cleanly 10% of the time, so said dataloss is already > > occurring frequently enough for addons to notice and deal with it. > > I don't think this is a strong argument to increase the 10% to 100%. ;-) We have concrete evidence that 10% of our shutdowns don't finish. We also have evidence that our shutdown is slow. I will trade solid evidence of improvement over theoretical addon bustage.

(dormant account)

Comment 32

•

13 years ago

(In reply to Ehsan Akhgari [:ehsan] from comment #30) > I don't think this is a strong argument to increase the 10% to 100%. ;-) I should clarify, that's a 10% chance of corruption every 24hours. So if you start Firefox 10 times, you are pretty much guaranteed to loose your addon data. (it could also be possible that 10% of our users never shutdown correctly, but that's even worse)

(no longer active)

Comment 33

•

13 years ago

(In reply to Taras Glek (:taras) from comment #31) > (In reply to Ehsan Akhgari [:ehsan] from comment #30) > > > > > > This assumes that an addon does useful IO way late in the process. We > > > already don't shutdown cleanly 10% of the time, so said dataloss is already > > > occurring frequently enough for addons to notice and deal with it. > > > > I don't think this is a strong argument to increase the 10% to 100%. ;-) > > We have concrete evidence that 10% of our shutdowns don't finish. We also > have evidence that our shutdown is slow. Both true. > I will trade solid evidence of improvement over theoretical addon bustage. Firstly you're assuming that it is only add-ons that can hit that case, and the data that we're obtaining from our test runs capture all of the ways we can write data to disk during shutdown even with no add-ons installed. That assumption is clearly incorrect. Secondly, as bent noted, we're talking dataloss here. We've always taken dataloss pretty seriously, well over all performance concerns. I think we need evidence to show that this is *not* a dataloss issue before we can support this tradeoff.

(no longer active)

Comment 34

•

13 years ago

I talked about this with Taras on IRC. We're actually talking about releasing this on the Nightly channel, and not the Release channel. I'm in 100% support of doing that -- I was just confused by the term "release".

Rafael Ávila de Espíndola (:espindola) (not reading bugmail)

Assignee

Comment 35

•

13 years ago

I think we have a good idea on what is needed. Further discussions will likely be in bug 770605.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED