Closed Bug 888347 Opened 11 years ago Closed 11 years ago

[Worker] Don't initialize nsStreamTransportService during shutdown

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla26

People

(Reporter: Yoric, Assigned: khuey)

References

Details

(Whiteboard: [Async])

Attachments

(1 file, 5 obsolete files)

2. Preventing nsStreamTransportService from re-initializing, v9 11 years ago David Teller [:Yoric] - still alive but not very active 1.58 KB, patch	Yoric : review+	Details \| Diff \| Splinter Review
1. Exposing startup/shutdown data 11 years ago David Teller [:Yoric] - still alive but not very active 7.91 KB, patch		Details \| Diff \| Splinter Review
2. Preventing nsStreamTransportService from re-initializing, v10 11 years ago David Teller [:Yoric] - still alive but not very active 1.77 KB, patch		Details \| Diff \| Splinter Review
Preventing nsStreamTransportService from re-initializing, v11 11 years ago David Teller [:Yoric] - still alive but not very active 8.99 KB, patch	justin.lebar+bug : review-	Details \| Diff \| Splinter Review
Stop allowing workers to be created at XPCOM shutdown 11 years ago Kyle Huey (Exited; not receiving bugmail, old account, do not use) 3.93 KB, patch	bent.mozilla : review-	Details \| Diff \| Splinter Review
Patch 11 years ago Kyle Huey (Exited; not receiving bugmail, old account, do not use) 5.53 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review

David Teller [:Yoric] - still alive but not very active

Reporter

Description

•

11 years ago

Initializing nsStreamTransportService during shutdown leads to crashes, see e.g. bug 845190. We need to avoid this.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 1

•

11 years ago

Attached patch 2. Preventing nsStreamTransportService from re-initializing, v9 (obsolete) — Details — Splinter Review

Imported from bug 845190.

Assignee: nobody → dteller

Attachment #769005 - Flags: review+

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 2

•

11 years ago

Attached patch 1. Exposing startup/shutdown data (obsolete) — Details — Splinter Review

Imported from bug 845190.

Attachment #769006 - Flags: review?(justin.lebar+bug)

Attachment #769006 - Flags: review?(doug.turner)

Justin Lebar (not reading bugmail)

Comment 3

•

11 years ago

The main issue I see with this is, the observer service gives no guarantee of what order it runs in.  So the thing that changes the app status could be the first xpcom-shutdown observer run, or it could be the last one run.

Is that OK here?  At what point do we need to declare that "shutdown has happened"?

It might be worthwhile to create a smaller API that's tailored exactly to your needs here, instead of creating an API which exposes all possible states Gecko might be in, particularly because the semantics of exactly when we declare that we've entered/left a particular state are going to be pretty complicated.

Justin Lebar (not reading bugmail)

Updated

•

11 years ago

Attachment #769006 - Flags: review?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Comment 4

•

11 years ago

Note also that it's tricky to use this API off the main thread.

If you can guarantee that the app state is a ratchet -- it only goes towards shutdown, never in the opposite direction -- then off-main-thread people can use it, if they're careful not to depend on the exact state.  But this may speak to the benefits of having a boolean instead of an enum.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 5

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #4)
> Note also that it's tricky to use this API off the main thread.
> 
> If you can guarantee that the app state is a ratchet -- it only goes towards
> shutdown, never in the opposite direction -- then off-main-thread people can
> use it, if they're careful not to depend on the exact state.  But this may
> speak to the benefits of having a boolean instead of an enum.

Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|.

Now, I can certainly do that in a component specific to netwerk. It's just a bit weird, because it feels like the feature will be useful for more than netwerk.

Justin Lebar (not reading bugmail)

Comment 6

•

11 years ago

> It's just a bit weird, because it 
> feels like the feature will be useful for more than netwerk.

Sure, I agree this is a nice thing to have elsewhere.

> Well, we could easily convert this to a set of boolean flags |NS_HasReachedFoo(...)|.

Indeed.  But you still have to be really careful about how you use this off the main thread.  It's basically safe only as an optimization (like double-checked locking).  You can't do

  if (NS_HasReachedShutdown()) {
    return;
  }
  // Run code which will break if we have reached shutdown

off the main thread because the check is stale immediately after you run it.

Maybe we should make the API main-thread only?

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 7

•

11 years ago

Well, if it's main-thread only, it's pretty useless for my use case.

Also, I am not sure I understand your point about thread-safety. Is it that we could move back from "I have reached 'quit-application'" to "I am back to permanent regime?"

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 8

•

11 years ago

Oh, I now believe I understand your point. I do not want to use this to determine whether we are currently in the process of responding to a notification. I want to use this to determine whether we the notification has been sent at some point in the past. Does that make sense?

Justin Lebar (not reading bugmail)

Comment 9

•

11 years ago

(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #8)
>  Does that make sense?

Yes, but I still don't believe it's thread-safe.

If you're off the main thread, and you do

  if (!HasShutDown()) {
    return;
  }
  // Do something which will crash if we have shut down

Then what happens if right after you check HasShutDown, we start shutting down?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 10

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #9)
> (In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow
> bugzilla until July 1st> from comment #8)
> >  Does that make sense?
> 
> Yes, but I still don't believe it's thread-safe.
> 
> If you're off the main thread, and you do
> 
>   if (!HasShutDown()) {
>     return;
>   }
>   // Do something which will crash if we have shut down
> 
> Then what happens if right after you check HasShutDown, we start shutting
> down?

In particular, if there's no locks or any other synchronization mechanism involved, nothing stops us from running *the entire shutdown process* after if(!HasShutDown()) but before // Do something which will crash if we have shut down

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 11

•

11 years ago

Good point.
However, there is a way to make this useful, as follows:

observerService->AddObserver(this, "some-event", true);
if (NS_HasSomeEvent()) {
  // Oops, we missed the notification
  observerService->NotifyObservers(..., "some-event", ...);
}

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 12

•

11 years ago

Attached patch 2. Preventing nsStreamTransportService from re-initializing, v10 (obsolete) — Details — Splinter Review

I believe that this variant will work.

Attachment #769005 - Attachment is obsolete: true

Attachment #769302 - Flags: feedback?(doug.turner)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 13

•

11 years ago

(In reply to David Rajchenbach Teller [:Yoric] <on workweek, will not follow bugzilla until July 1st> from comment #11)
> Good point.
> However, there is a way to make this useful, as follows:
> 
> observerService->AddObserver(this, "some-event", true);
> if (NS_HasSomeEvent()) {
>   // Oops, we missed the notification
>   observerService->NotifyObservers(..., "some-event", ...);
> }

I don't fully understand what you're trying to do here but I'll note that you can't touch the observer service at all off the main thread.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 14

•

11 years ago

Ouch. Good point, it seems that I had read too quickly the stack traces of bug 845190. These stack traces are effectively on the main thread, which simplifies the problem considerably.

In this case, a non-Atomic<> version of NS_HasFoo() could certainly work.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 15

•

11 years ago

Attached patch Preventing nsStreamTransportService from re-initializing, v11 (obsolete) — Details — Splinter Review

What about this?

Attachment #769006 - Attachment is obsolete: true

Attachment #769302 - Attachment is obsolete: true

Attachment #769006 - Flags: review?(doug.turner)

Attachment #769302 - Flags: feedback?(doug.turner)

Attachment #769365 - Flags: feedback?(khuey)

Attachment #769365 - Flags: feedback?(doug.turner)

Justin Lebar (not reading bugmail)

Comment 16

•

11 years ago

> These stack traces are effectively on the main thread

What does it mean to "effectively" be on the main thread?

>diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp
>--- a/netwerk/base/src/nsStreamTransportService.cpp
>+++ b/netwerk/base/src/nsStreamTransportService.cpp

>@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b
> nsStreamTransportService::~nsStreamTransportService()
> {
>     NS_ASSERTION(!mPool, "thread pool wasn't shutdown");
> }
> 
> nsresult
> nsStreamTransportService::Init()
> {
>-    if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) {
>+    if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) {

Observers are notified in arbitrary order.

Even if we're all on the main thread, this can still be a problem.  It could be
that the stream transport service receives its xpcom-shutdown-threads message,
and then nsStreamStransportService::Init() gets called, and /then/ the app
status updater receives its xpcom-shutdown-threads notification.

In this case, we'd still crash, no?

If the goal is to run nsStreamTransportService::Init() only once, why not do that?

  nsStreamTransportService::Init()
  {
    static bool hasRun = false;
    NS_ENSURE_FALSE(hasRun, NS_ERROR_FAILURE);
    hasRun = true;
    
    ...
  }

That would be the entirety of the patch.

Justin Lebar (not reading bugmail)

Updated

•

11 years ago

Attachment #769365 - Flags: review-

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 17

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #16)
> > These stack traces are effectively on the main thread
> 
> What does it mean to "effectively" be on the main thread?

It means that I'm using the French word instead of the English word.
s/effectively/indeed/

> 
> >diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp
> >--- a/netwerk/base/src/nsStreamTransportService.cpp
> >+++ b/netwerk/base/src/nsStreamTransportService.cpp
> 
> >@@ -433,17 +433,17 @@ nsOutputStreamTransport::IsNonBlocking(b
> > nsStreamTransportService::~nsStreamTransportService()
> > {
> >     NS_ASSERTION(!mPool, "thread pool wasn't shutdown");
> > }
> > 
> > nsresult
> > nsStreamTransportService::Init()
> > {
> >-    if (NS_GetAppStatus() >= NS_XPCOM_SHUTDOWN_THREADS) {
> >+    if (NS_HasAppEncountered(NS_LITERAL_CSTRING("xpcom-shutdown-threads"))) {
> 
> Observers are notified in arbitrary order.
> 
> Even if we're all on the main thread, this can still be a problem.  It could
> be
> that the stream transport service receives its xpcom-shutdown-threads
> message,
> and then nsStreamStransportService::Init() gets called, and /then/ the app
> status updater receives its xpcom-shutdown-threads notification.
> 
> In this case, we'd still crash, no?

[...]

It seems to me that we're simply defending against different conditions. Iiuc, you want me to defend against double-initialization, while I want to defend against first initialization taking place too late. The issue of double-initialization would cause crashes, indeed, but experiments show that this is not the cause of bug 845190, so this was not my main focus.

Of course, we can add a simple check for double-initialization, as you suggest.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 18

•

11 years ago

Tagging this bug as blocker, as it indirectly prevents landing ParallelJavaScript patches.

Severity: normal → blocker

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 19

•

11 years ago

Attached patch Stop allowing workers to be created at XPCOM shutdown (obsolete) — Details — Splinter Review

I think this is a better approach.  We should disallow starting workers once we can no longer start them successfully.

Attachment #769365 - Attachment is obsolete: true

Attachment #769365 - Flags: feedback?(khuey)

Attachment #769365 - Flags: feedback?(doug.turner)

Attachment #770371 - Flags: review?(bent.mozilla)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Updated

•

11 years ago

Assignee: dteller → khuey

Severity: blocker → normal

Felix S. Klock II [:pnkfelix, :fklock]

Comment 20

•

11 years ago

Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/RuntimeService.cpp
@@ +1669,5 @@
>          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> +      observed = NS_FAILED(rv);
> +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> +      // XXXkhuey what does this do?
> +      mObserved = observed || NS_FAILED(rv);

The "XXXhuey" comment probably should be removed or replaced with something less ad-hoc.  :)

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 21

•

11 years ago

Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

I believe that my code could be useful in the future, but if this smaller patch fixes the issue, I'd be glad to settle for it.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 22

•

11 years ago

(In reply to Felix S. Klock II [:pnkfelix, :fklock] from comment #20)
> Comment on attachment 770371 [details] [diff] [review]
> Stop allowing workers to be created at XPCOM shutdown
> 
> Review of attachment 770371 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/workers/RuntimeService.cpp
> @@ +1669,5 @@
> >          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> > +      observed = NS_FAILED(rv);
> > +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> > +      // XXXkhuey what does this do?
> > +      mObserved = observed || NS_FAILED(rv);
> 
> The "XXXhuey" comment probably should be removed or replaced with something
> less ad-hoc.  :)

It's a pretty common thing in Mozilla code.

http://mxr.mozilla.org/mozilla-central/search?string=//+XXX

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 23

•

11 years ago

Bent, could you please review the patch? The bug is progressively spilling to more and more tests.

Severity: normal → major

Justin Lebar (not reading bugmail)

Comment 24

•

11 years ago

Ben is on vacation this week.

Shu-yu Guo [:shu]

Comment 25

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #24)
> Ben is on vacation this week.

Do you mind taking it then, jlebar?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 26

•

11 years ago

Justin is not a module peer for workers.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 27

•

11 years ago

And this really does need to be reviewed by bent.

Have you verified that this actually fixes the issue you're seeing?

Shu-yu Guo [:shu]

Comment 28

•

11 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27)
> And this really does need to be reviewed by bent.
> 
> Have you verified that this actually fixes the issue you're seeing?

No, but it was never clear why the patch I wanted to land (bug 860965) turned this intermittent orange into a perma orange in the first place. But I did get backed out over it. I'll push to try to see if it's still happening.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 29

•

11 years ago

Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

Isn't this just papering over the real problem? The stream transport service should not allow itself to be re-initialized or initialized after we've started shutting down. We could add this bandaid here (though, we still don't have confirmation that it actually fixes the problem), but eventually we'll have to go through this again later for some other component...

Fixing the real problem doesn't sound so hard.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 30

•

11 years ago

Regardless of whatever happens at the Necko level, I don't think workers should allow you to start a worker when there's no chance it will succeed.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 31

•

11 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> I don't think workers should allow you to start a worker when there's no chance
> it will succeed.

I don't care super strongly about this specific case but this same argument could be made for XHR and any number of other DOM things. Why should we waste time peppering the entire codebase with shutdown checks?

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 32

•

11 years ago

Comment on attachment 770371 [details] [diff] [review]
Stop allowing workers to be created at XPCOM shutdown

Review of attachment 770371 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/RuntimeService.cpp
@@ +1520,2 @@
>  void
> +RuntimeService::Shutdown()

This will work but I think we can do better... We should go ahead and deliver the Kill message here so that workers can start cleaning themselves up. Then we can do all the thread joins and event loop pumping in Cleanup like we do now.

Attachment #770371 - Flags: review?(bent.mozilla) → review-

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 33

•

11 years ago

Attached patch Patch — Details — Splinter Review

Something like this?

Attachment #770371 - Attachment is obsolete: true

Attachment #788975 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 34

•

11 years ago

Comment on attachment 788975 [details] [diff] [review]
Patch

Review of attachment 788975 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with this:

::: dom/workers/RuntimeService.cpp
@@ +1592,2 @@
>    // That's it, no more workers.
>    mShuttingDown = true;

Let's just set this at the start of the function, and let's assert that this is false first.

@@ +1711,5 @@
>          obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
> +      observed = NS_FAILED(rv);
> +      rv = obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
> +      // XXXkhuey what does this do?
> +      mObserved = observed || NS_FAILED(rv);

Let's just do:

  obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID);
  obs->RemoveObserver(this, NS_XPCOM_SHUTDOWN_OBSERVER_ID);
  mObserved = false;

Attachment #788975 - Flags: review?(bent.mozilla) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

11 years ago

Keywords: checkin-needed

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

11 years ago

Keywords: checkin-needed

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 35

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/efc4235dd32d

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 36

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/efc4235dd32d

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: in-testsuite?

Resolution: --- → FIXED

Target Milestone: --- → mozilla26

Andrei Eftimie

Updated

•

11 years ago

Depends on: 905170

Andrei Eftimie

Updated

•

11 years ago

Depends on: 915193

Andrei Eftimie

Updated

•

11 years ago

Depends on: 909237

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 37

•

11 years ago

Looks like we have a regression:
https://tbpl.mozilla.org/php/getParsedLog.php?id=30184825&tree=Mozilla-Inbound#error1

You need to log in before you can comment on or make changes to this bug.