Closed Bug 888347 Opened 11 years ago Closed 11 years ago

[Worker] Don't initialize nsStreamTransportService during shutdown

Categories

(Core :: DOM: Workers, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: Yoric, Assigned: khuey)

References

Details

(Whiteboard: [Async])

Attachments

(1 file, 5 obsolete files)

Initializing nsStreamTransportService during shutdown leads to crashes, see e.g. bug 845190. We need to avoid this.
Imported from bug 845190.
Assignee: nobody → dteller
Attachment #769005 - Flags: review+
Imported from bug 845190.
Attachment #769006 - Flags: review?(justin.lebar+bug)
Attachment #769006 - Flags: review?(doug.turner)
The main issue I see with this is, the observer service gives no guarantee of what order it runs in.  So the thing that changes the app status could be the first xpcom-shutdown observer run, or it could be the last one run.

Is that OK here?  At what point do we need to declare that "shutdown has happened"?

It might be worthwhile to create a smaller API that's tailored exactly to your needs here, instead of creating an API which exposes all possible states Gecko might be in, particularly because the semantics of exactly when we declare that we've entered/left a particular state are going to be pretty complicated.
Attachment #769006 - Flags: review?(justin.lebar+bug)
Note also that it's tricky to use this API off the main thread.

If you can guarantee that the app state is a ratchet -- it only goes towards shutdown, never in the opposite direction -- then off-main-thread people can use it, if they're careful not to depend on the exact state.  But this may speak to the benefits of having a boolean instead of an enum.
(In reply to Justin Lebar [:jlebar] from comment #4)
> Note also that it's tricky to use this API off the main thread.
> 
> If you can guarantee that the app state is a ratchet -- it only goes towards
> shutdown, never in the opposite direction -- then off-main-thread people can
> use it, if they're careful not to depend on the exact state.  But this may
> speak to the benefits of having a boolean instead of an enum.

Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|.

Now, I can certainly do that in a component specific to netwerk. It's just a bit weird, because it feels like the feature will be useful for more than netwerk.
> It's just a bit weird, because it 
> feels like the feature will be useful for more than netwerk.

Sure, I agree this is a nice thing to have elsewhere.

> Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|.

Indeed.  But you still have to be really careful about how you use this off the main thread.  It's basically safe only as an optimization (like double-checked locking).  You can't do

  if (NS_HasReachedShutdown()) {
    return;
  }
  // Run code which will break if we have reached shutdown

off the main thread because the check is stale immediately after you run it.

Maybe we should make the API main-thread only?
Well, if it's main-thread only, it's pretty useless for my use case.

Also, I am not sure I understand your point about thread-safety. Is it that we could move back from "I have reached 'quit-application'" to "I am back to permanent regime?"
Oh, I now believe I understand your point. I do not want to use this to determine whether we are currently in the process of responding to a notification. I want to use this to determine whether we the notification has been sent at some point in the past. Does that make sense?
(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #8)
>  Does that make sense?

Yes, but I still don't believe it's thread-safe.

If you're off the main thread, and you do

  if (!HasShutDown()) {
    return;
  }
  // Do something which will crash if we have shut down

Then what happens if right after you check HasShutDown, we start shutting down?
(In reply to Justin Lebar [:jlebar] from comment #9)
> (In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow
> bugzilla until July 1st> from comment #8)
> >  Does that make sense?
> 
> Yes, but I still don't believe it's thread-safe.
> 
> If you're off the main thread, and you do
> 
>   if (!HasShutDown()) {
>     return;
>   }
>   // Do something which will crash if we have shut down
> 
> Then what happens if right after you check HasShutDown, we start shutting
> down?

In particular, if there's no locks or any other synchronization mechanism involved, nothing stops us from running *the entire shutdown process* after if(!HasShutDown()) but before // Do something which will crash if we have shut down
Good point.
However, there is a way to make this useful, as follows:

observerService->AddObserver(this, "some-event", true);
if (NS_HasSomeEvent()) {
  // Oops, we missed the notification
  observerService->NotifyObservers(..., "some-event", ...);
}
I believe that this variant will work.
Attachment #769005 - Attachment is obsolete: true
Attachment #769302 - Flags: feedback?(doug.turner)
(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #11)
> Good point.
> However, there is a way to make this useful, as follows:
> 
> observerService->AddObserver(this, "some-event", true);
> if (NS_HasSomeEvent()) {
>   // Oops, we missed the notification
>   observerService->NotifyObservers(..., "some-event", ...);
> }

I don't fully understand what you're trying to do here but I'll note that you can't touch the observer service at all off the main thread.
Ouch. Good point, it seems that I had read too quickly the stack traces of bug 845190. These stack traces are effectively on the main thread, which simplifies the problem considerably.

In this case, a non-Atomic<> version of NS_HasFoo() could certainly work.
What about this?
Attachment #769006 - Attachment is obsolete: true
Attachment #769302 - Attachment is obsolete: true
Attachment #769006 - Flags: review?(doug.turner)
Attachment #769302 - Flags: feedback?(doug.turner)
Attachment #769365 - Flags: feedback?(khuey)
Attachment #769365 - Flags: feedback?(doug.turner)
> These stack traces are effectively on the main thread

What does it mean to "effectively" be on the main thread?

>diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp
>--- a/netwerk/base/src/nsStreamTransportService.cpp
>+++ b/netwerk/base/src/nsStreamTransportService.cpp

>@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b
> nsStreamTransportService::~nsStreamTransportService()
> {
>     NS_ASSERTION(!mPool, "thread pool wasn't shutdown");
> }
> 
> nsresult
> nsStreamTransportService::Init()
> {
>-    if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) {
>+    if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) {

Observers are notified in arbitrary order.

Even if we're all on the main thread, this can still be a problem.  It could be
that the stream transport service receives its xpcom-shutdown-threads message,
and then nsStreamStransportService::Init() gets called, and /then/ the app
status updater receives its xpcom-shutdown-threads notification.

In this case, we'd still crash, no?

If the goal is to run nsStreamTransportService::Init() only once, why not do that?

  nsStreamTransportService::Init()
  {
    static bool hasRun = false;
    NS_ENSURE_FALSE(hasRun, NS_ERROR_FAILURE);
    hasRun = true;
    
    ...
  }

That would be the entirety of the patch.
(In reply to Justin Lebar [:jlebar] from comment #16)
> > These stack traces are effectively on the main thread
> 
> What does it mean to "effectively" be on the main thread?

It means that I'm using the French word instead of the English word.
s/effectively/indeed/

> 
> >diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp
> >--- a/netwerk/base/src/nsStreamTransportService.cpp
> >+++ b/netwerk/base/src/nsStreamTransportService.cpp
> 
> >@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b
> > nsStreamTransportService::~nsStreamTransportService()
> > {
> >     NS_ASSERTION(!mPool, "thread pool wasn't shutdown");
> > }
> > 
> > nsresult
> > nsStreamTransportService::Init()
> > {
> >-    if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) {
> >+    if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) {
> 
> Observers are notified in arbitrary order.
> 
> Even if we're all on the main thread, this can still be a problem.  It could
> be
> that the stream transport service receives its xpcom-shutdown-threads
> message,
> and then nsStreamStransportService::Init() gets called, and /then/ the app
> status updater receives its xpcom-shutdown-threads notification.
> 
> In this case, we'd still crash, no?

[...]

It seems to me that we're simply defending against different conditions. Iiuc, you want me to defend against double-initialization, while I want to defend against first initialization taking place too late. The issue of double-initialization would cause crashes, indeed, but experiments show that this is not the cause of bug 845190, so this was not my main focus.

Of course, we can add a simple check for double-initialization, as you suggest.
Tagging this bug as blocker, as it indirectly prevents landing ParallelJavaScript patches.
Severity: normal → blocker
I think this is a better approach.  We should disallow starting workers once we can no longer start them successfully.
Attachment #769365 - Attachment is obsolete: true
Attachment #769365 - Flags: feedback?(khuey)
Attachment #769365 - Flags: feedback?(doug.turner)
Attachment #770371 - Flags: review?(bent.mozilla)
Assignee: dteller → khuey
Severity: blocker → normal
Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/RuntimeService.cpp
@@ +1669,5 @@
>          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> +      observed = NS_FAILED(rv);
> +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> +      // XXXkhuey what does this do?
> +      mObserved = observed || NS_FAILED(rv);

The "XXXhuey" comment probably should be removed or replaced with something less ad-hoc.  :)
Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

I believe that my code could be useful in the future, but if this smaller patch fixes the issue, I'd be glad to settle for it.
(In reply to Felix S. Klock II [:pnkfelix, :fklock] from comment #20)
> Comment on attachment 770371 [details] [diff] [review]
> Stop allowing workers to be created at XPCOM shutdown
> 
> Review of attachment 770371 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/workers/RuntimeService.cpp
> @@ +1669,5 @@
> >          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> > +      observed = NS_FAILED(rv);
> > +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> > +      // XXXkhuey what does this do?
> > +      mObserved = observed || NS_FAILED(rv);
> 
> The "XXXhuey" comment probably should be removed or replaced with something
> less ad-hoc.  :)

It's a pretty common thing in Mozilla code.

http://mxr.mozilla.org/mozilla-central/search?string=//+XXX
Bent, could you please review the patch? The bug is progressively spilling to more and more tests.
Severity: normal → major
Ben is on vacation this week.
(In reply to Justin Lebar [:jlebar] from comment #24)
> Ben is on vacation this week.

Do you mind taking it then, jlebar?
Justin is not a module peer for workers.
And this really does need to be reviewed by bent.

Have you verified that this actually fixes the issue you're seeing?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27)
> And this really does need to be reviewed by bent.
> 
> Have you verified that this actually fixes the issue you're seeing?

No, but it was never clear why the patch I wanted to land (bug 860965) turned this intermittent orange into a perma orange in the first place. But I did get backed out over it. I'll push to try to see if it's still happening.
Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

Isn't this just papering over the real problem? The stream transport service should not allow itself to be re-initialized or initialized after we've started shutting down. We could add this bandaid here (though, we still don't have confirmation that it actually fixes the problem), but eventually we'll have to go through this again later for some other component...

Fixing the real problem doesn't sound so hard.
Regardless of whatever happens at the Necko level, I don't think workers should allow you to start a worker when there's no chance it will succeed.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> I don't think workers should allow you to start a worker when there's no chance
> it will succeed.

I don't care super strongly about this specific case but this same argument could be made for XHR and any number of other DOM things. Why should we waste time peppering the entire codebase with shutdown checks?
Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/RuntimeService.cpp
@@ +1520,2 @@
>  void
> +RuntimeService::Shutdown()

This will work but I think we can do better... We should go ahead and deliver the Kill message here so that workers can start cleaning themselves up. Then we can do all the thread joins and event loop pumping in Cleanup like we do now.
Attachment #770371 - Flags: review?(bent.mozilla) → review-
Attached patch PatchSplinter Review
Something like this?
Attachment #770371 - Attachment is obsolete: true
Attachment #788975 - Flags: review?(bent.mozilla)
Comment on attachment 788975 [details] [diff] [review]
Patch

Review of attachment 788975 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with this:

::: dom/workers/RuntimeService.cpp
@@ +1592,2 @@
>    // That's it, no more workers.
>    mShuttingDown = true;

Let's just set this at the start of the function, and let's assert that this is false first.

@@ +1711,5 @@
>          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> +      observed = NS_FAILED(rv);
> +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> +      // XXXkhuey what does this do?
> +      mObserved = observed || NS_FAILED(rv);

Let's just do:

  obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
  obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
  mObserved = false;
Attachment #788975 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/efc4235dd32d
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Depends on: 905170
Depends on: 915193
Depends on: 909237
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: