Closed
Bug 888347
Opened 11 years ago
Closed 11 years ago
[Worker] Don't initialize nsStreamTransportService during shutdown
Categories
(Core :: DOM: Workers, defect)
Core
DOM: Workers
Tracking
()
RESOLVED
FIXED
mozilla26
People
(Reporter: Yoric, Assigned: khuey)
References
Details
(Whiteboard: [Async])
Attachments
(1 file, 5 obsolete files)
5.53 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
Initializing nsStreamTransportService during shutdown leads to crashes, see e.g. bug 845190. We need to avoid this.
Reporter | ||
Comment 1•11 years ago
|
||
Imported from bug 845190.
Assignee: nobody → dteller
Attachment #769005 -
Flags: review+
Reporter | ||
Comment 2•11 years ago
|
||
Imported from bug 845190.
Attachment #769006 -
Flags: review?(justin.lebar+bug)
Attachment #769006 -
Flags: review?(doug.turner)
Comment 3•11 years ago
|
||
The main issue I see with this is, the observer service gives no guarantee of what order it runs in. So the thing that changes the app status could be the first xpcom-shutdown observer run, or it could be the last one run. Is that OK here? At what point do we need to declare that "shutdown has happened"? It might be worthwhile to create a smaller API that's tailored exactly to your needs here, instead of creating an API which exposes all possible states Gecko might be in, particularly because the semantics of exactly when we declare that we've entered/left a particular state are going to be pretty complicated.
Updated•11 years ago
|
Attachment #769006 -
Flags: review?(justin.lebar+bug)
Comment 4•11 years ago
|
||
Note also that it's tricky to use this API off the main thread. If you can guarantee that the app state is a ratchet -- it only goes towards shutdown, never in the opposite direction -- then off-main-thread people can use it, if they're careful not to depend on the exact state. But this may speak to the benefits of having a boolean instead of an enum.
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #4) > Note also that it's tricky to use this API off the main thread. > > If you can guarantee that the app state is a ratchet -- it only goes towards > shutdown, never in the opposite direction -- then off-main-thread people can > use it, if they're careful not to depend on the exact state. But this may > speak to the benefits of having a boolean instead of an enum. Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|. Now, I can certainly do that in a component specific to netwerk. It's just a bit weird, because it feels like the feature will be useful for more than netwerk.
Comment 6•11 years ago
|
||
> It's just a bit weird, because it > feels like the feature will be useful for more than netwerk. Sure, I agree this is a nice thing to have elsewhere. > Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|. Indeed. But you still have to be really careful about how you use this off the main thread. It's basically safe only as an optimization (like double-checked locking). You can't do if (NS_HasReachedShutdown()) { return; } // Run code which will break if we have reached shutdown off the main thread because the check is stale immediately after you run it. Maybe we should make the API main-thread only?
Reporter | ||
Comment 7•11 years ago
|
||
Well, if it's main-thread only, it's pretty useless for my use case. Also, I am not sure I understand your point about thread-safety. Is it that we could move back from "I have reached 'quit-application'" to "I am back to permanent regime?"
Reporter | ||
Comment 8•11 years ago
|
||
Oh, I now believe I understand your point. I do not want to use this to determine whether we are currently in the process of responding to a notification. I want to use this to determine whether we the notification has been sent at some point in the past. Does that make sense?
Comment 9•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #8) > Does that make sense? Yes, but I still don't believe it's thread-safe. If you're off the main thread, and you do if (!HasShutDown()) { return; } // Do something which will crash if we have shut down Then what happens if right after you check HasShutDown, we start shutting down?
Assignee | ||
Comment 10•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #9) > (In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow > bugzilla until July 1st> from comment #8) > > Does that make sense? > > Yes, but I still don't believe it's thread-safe. > > If you're off the main thread, and you do > > if (!HasShutDown()) { > return; > } > // Do something which will crash if we have shut down > > Then what happens if right after you check HasShutDown, we start shutting > down? In particular, if there's no locks or any other synchronization mechanism involved, nothing stops us from running *the entire shutdown process* after if(!HasShutDown()) but before // Do something which will crash if we have shut down
Reporter | ||
Comment 11•11 years ago
|
||
Good point. However, there is a way to make this useful, as follows: observerService->AddObserver(this, "some-event", true); if (NS_HasSomeEvent()) { // Oops, we missed the notification observerService->NotifyObservers(..., "some-event", ...); }
Reporter | ||
Comment 12•11 years ago
|
||
I believe that this variant will work.
Attachment #769005 -
Attachment is obsolete: true
Attachment #769302 -
Flags: feedback?(doug.turner)
Assignee | ||
Comment 13•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #11) > Good point. > However, there is a way to make this useful, as follows: > > observerService->AddObserver(this, "some-event", true); > if (NS_HasSomeEvent()) { > // Oops, we missed the notification > observerService->NotifyObservers(..., "some-event", ...); > } I don't fully understand what you're trying to do here but I'll note that you can't touch the observer service at all off the main thread.
Reporter | ||
Comment 14•11 years ago
|
||
Ouch. Good point, it seems that I had read too quickly the stack traces of bug 845190. These stack traces are effectively on the main thread, which simplifies the problem considerably. In this case, a non-Atomic<> version of NS_HasFoo() could certainly work.
Reporter | ||
Comment 15•11 years ago
|
||
What about this?
Attachment #769006 -
Attachment is obsolete: true
Attachment #769302 -
Attachment is obsolete: true
Attachment #769006 -
Flags: review?(doug.turner)
Attachment #769302 -
Flags: feedback?(doug.turner)
Attachment #769365 -
Flags: feedback?(khuey)
Attachment #769365 -
Flags: feedback?(doug.turner)
Comment 16•11 years ago
|
||
> These stack traces are effectively on the main thread What does it mean to "effectively" be on the main thread? >diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp >--- a/netwerk/base/src/nsStreamTransportService.cpp >+++ b/netwerk/base/src/nsStreamTransportService.cpp >@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b > nsStreamTransportService::~nsStreamTransportService() > { > NS_ASSERTION(!mPool, "thread pool wasn't shutdown"); > } > > nsresult > nsStreamTransportService::Init() > { >- if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) { >+ if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) { Observers are notified in arbitrary order. Even if we're all on the main thread, this can still be a problem. It could be that the stream transport service receives its xpcom-shutdown-threads message, and then nsStreamStransportService::Init() gets called, and /then/ the app status updater receives its xpcom-shutdown-threads notification. In this case, we'd still crash, no? If the goal is to run nsStreamTransportService::Init() only once, why not do that? nsStreamTransportService::Init() { static bool hasRun = false; NS_ENSURE_FALSE(hasRun, NS_ERROR_FAILURE); hasRun = true; ... } That would be the entirety of the patch.
Updated•11 years ago
|
Attachment #769365 -
Flags: review-
Reporter | ||
Comment 17•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #16) > > These stack traces are effectively on the main thread > > What does it mean to "effectively" be on the main thread? It means that I'm using the French word instead of the English word. s/effectively/indeed/ > > >diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp > >--- a/netwerk/base/src/nsStreamTransportService.cpp > >+++ b/netwerk/base/src/nsStreamTransportService.cpp > > >@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b > > nsStreamTransportService::~nsStreamTransportService() > > { > > NS_ASSERTION(!mPool, "thread pool wasn't shutdown"); > > } > > > > nsresult > > nsStreamTransportService::Init() > > { > >- if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) { > >+ if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) { > > Observers are notified in arbitrary order. > > Even if we're all on the main thread, this can still be a problem. It could > be > that the stream transport service receives its xpcom-shutdown-threads > message, > and then nsStreamStransportService::Init() gets called, and /then/ the app > status updater receives its xpcom-shutdown-threads notification. > > In this case, we'd still crash, no? [...] It seems to me that we're simply defending against different conditions. Iiuc, you want me to defend against double-initialization, while I want to defend against first initialization taking place too late. The issue of double-initialization would cause crashes, indeed, but experiments show that this is not the cause of bug 845190, so this was not my main focus. Of course, we can add a simple check for double-initialization, as you suggest.
Reporter | ||
Comment 18•11 years ago
|
||
Tagging this bug as blocker, as it indirectly prevents landing ParallelJavaScript patches.
Severity: normal → blocker
Assignee | ||
Comment 19•11 years ago
|
||
I think this is a better approach. We should disallow starting workers once we can no longer start them successfully.
Attachment #769365 -
Attachment is obsolete: true
Attachment #769365 -
Flags: feedback?(khuey)
Attachment #769365 -
Flags: feedback?(doug.turner)
Attachment #770371 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•11 years ago
|
Assignee: dteller → khuey
Severity: blocker → normal
Comment 20•11 years ago
|
||
Comment on attachment 770371 [details] [diff] [review] Stop allowing workers to be created at XPCOM shutdown Review of attachment 770371 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/workers/RuntimeService.cpp @@ +1669,5 @@ > obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID); > + observed = NS_FAILED(rv); > + rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID); > + // XXXkhuey what does this do? > + mObserved = observed || NS_FAILED(rv); The "XXXhuey" comment probably should be removed or replaced with something less ad-hoc. :)
Reporter | ||
Comment 21•11 years ago
|
||
Comment on attachment 770371 [details] [diff] [review] Stop allowing workers to be created at XPCOM shutdown Review of attachment 770371 [details] [diff] [review]: ----------------------------------------------------------------- I believe that my code could be useful in the future, but if this smaller patch fixes the issue, I'd be glad to settle for it.
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to Felix S. Klock II [:pnkfelix, :fklock] from comment #20) > Comment on attachment 770371 [details] [diff] [review] > Stop allowing workers to be created at XPCOM shutdown > > Review of attachment 770371 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/workers/RuntimeService.cpp > @@ +1669,5 @@ > > obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID); > > + observed = NS_FAILED(rv); > > + rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID); > > + // XXXkhuey what does this do? > > + mObserved = observed || NS_FAILED(rv); > > The "XXXhuey" comment probably should be removed or replaced with something > less ad-hoc. :) It's a pretty common thing in Mozilla code. http://mxr.mozilla.org/mozilla-central/search?string=//+XXX
Reporter | ||
Comment 23•11 years ago
|
||
Bent, could you please review the patch? The bug is progressively spilling to more and more tests.
Severity: normal → major
Comment 24•11 years ago
|
||
Ben is on vacation this week.
Comment 25•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #24) > Ben is on vacation this week. Do you mind taking it then, jlebar?
Assignee | ||
Comment 26•11 years ago
|
||
Justin is not a module peer for workers.
Assignee | ||
Comment 27•11 years ago
|
||
And this really does need to be reviewed by bent. Have you verified that this actually fixes the issue you're seeing?
Comment 28•11 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27) > And this really does need to be reviewed by bent. > > Have you verified that this actually fixes the issue you're seeing? No, but it was never clear why the patch I wanted to land (bug 860965) turned this intermittent orange into a perma orange in the first place. But I did get backed out over it. I'll push to try to see if it's still happening.
Comment on attachment 770371 [details] [diff] [review] Stop allowing workers to be created at XPCOM shutdown Review of attachment 770371 [details] [diff] [review]: ----------------------------------------------------------------- Isn't this just papering over the real problem? The stream transport service should not allow itself to be re-initialized or initialized after we've started shutting down. We could add this bandaid here (though, we still don't have confirmation that it actually fixes the problem), but eventually we'll have to go through this again later for some other component... Fixing the real problem doesn't sound so hard.
Assignee | ||
Comment 30•11 years ago
|
||
Regardless of whatever happens at the Necko level, I don't think workers should allow you to start a worker when there's no chance it will succeed.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30) > I don't think workers should allow you to start a worker when there's no chance > it will succeed. I don't care super strongly about this specific case but this same argument could be made for XHR and any number of other DOM things. Why should we waste time peppering the entire codebase with shutdown checks?
Comment on attachment 770371 [details] [diff] [review] Stop allowing workers to be created at XPCOM shutdown Review of attachment 770371 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/workers/RuntimeService.cpp @@ +1520,2 @@ > void > +RuntimeService::Shutdown() This will work but I think we can do better... We should go ahead and deliver the Kill message here so that workers can start cleaning themselves up. Then we can do all the thread joins and event loop pumping in Cleanup like we do now.
Attachment #770371 -
Flags: review?(bent.mozilla) → review-
Assignee | ||
Comment 33•11 years ago
|
||
Something like this?
Attachment #770371 -
Attachment is obsolete: true
Attachment #788975 -
Flags: review?(bent.mozilla)
Comment on attachment 788975 [details] [diff] [review] Patch Review of attachment 788975 [details] [diff] [review]: ----------------------------------------------------------------- r=me with this: ::: dom/workers/RuntimeService.cpp @@ +1592,2 @@ > // That's it, no more workers. > mShuttingDown = true; Let's just set this at the start of the function, and let's assert that this is false first. @@ +1711,5 @@ > obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID); > + observed = NS_FAILED(rv); > + rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID); > + // XXXkhuey what does this do? > + mObserved = observed || NS_FAILED(rv); Let's just do: obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID); obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID); mObserved = false;
Attachment #788975 -
Flags: review?(bent.mozilla) → review+
Updated•11 years ago
|
Keywords: checkin-needed
Updated•11 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 35•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/efc4235dd32d
Comment 36•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/efc4235dd32d
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Reporter | ||
Comment 37•11 years ago
|
||
Looks like we have a regression: https://tbpl.mozilla.org/php/getParsedLog.php?id=30184825&tree=Mozilla-Inbound#error1
You need to log in
before you can comment on or make changes to this bug.
Description
•