Closed Bug 967005 Opened 6 years ago Closed 6 years ago

Null-dereference crash if a rejected Promise survives to worker shutdown

Categories

(Core :: DOM: Workers, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox29 + fixed
firefox30 + fixed

People

(Reporter: bzbarsky, Assigned: nsm)

References

Details

Attachments

(1 file, 2 obsolete files)

jandem is seeing stacks where WorkerThreadPrimaryRunnable::Run calls JS_DestroyContext, which triggers CC, which calls ~Promise, which calls MaybeReportRejected, which calls JS_BeginRequest on null, as far as we can tell.

Presumably because nsContentUtils::GetDefaultJSContextForThread is returning null in this case.

First of all, is that expected?  The comments in Run() claim there is another JSContext around in addition to the cx being destroyed.

Second, if this is expected, should we for now just null-check and drop the promise rejection on the floor if !cx?

Third, if we GCed before destroying, would we end up with a non-null JSContext here, or not?
Flags: needinfo?(khuey)
And I guess 4th: Can we add a version of JS_ErrorFromException that doesn't need rooted input?  The rooting is the only thing we want out of the JSContext here.
Promises are supposed to keep the worker alive, so unresolved promises shouldn't cause this kind of trouble... It sounds like the worker is dying too early maybe.
1. Yes.  The comment is wrong.  bholley got rid of it at some point and apparently it never got updated
2. See below
3. Certainly not without CCing appropriately.  And I think contexts inherently root their global these days, so there will be some garbage collected only when we JS_DestroyContext.
4. I have no idea.

I think the real question here is why we are doing this from the dtor, instead of the resolving code.  The Promise definitely can't keep the worker alive until it's dtor runs ... that would effectively create an unbreakable cycle if the Promise were reachable from the global.
Flags: needinfo?(khuey)
Yeah, question 4 was for the JS folks I cced.

> I think the real question here is why we are doing this from the dtor, instead of the
> resolving code.

We want to report unhandled rejections (the promise equivalent of uncaught exceptions).  We don't know at the time the promise is rejected whether it will be unhandled or not.  In fact, we don't know up until the point when no one can call then() on the promise anymore.  That only becomes true when the promise is unreachable, and the only way we have to detect that is when it's GCed...
One option, I suppose, is for a worker to keep track of all its promises and explicitly tell them to report before it goes into shutdown like this, since it knows at that point that the promises are no longer reachable...
Yes, so that can be done by adding unGCd promises as a "feature" to the worker.  They will then be notified when shutdown begins and can report them, and then remove themselves so that they don't block shutdown any further.
Nikhil, want to do that?
Flags: needinfo?(nsm.nikhil)
Assignee: nobody → nsm.nikhil
Flags: needinfo?(nsm.nikhil)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> Yes, so that can be done by adding unGCd promises as a "feature" to the
> worker.  They will then be notified when shutdown begins and can report
> them, and then remove themselves so that they don't block shutdown any
> further.

This seems like a bad idea to me. Every AddFeature call lead to a mutex acquisition and a runnable dispatch to the worker's parent thread. Considering the proliferation of Promises based APIs, a more efficient solution might be better.

jandem, do you have a reproducible testcase?
Flags: needinfo?(jdemooij)
(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> jandem, do you have a reproducible testcase?

See the patch + M-4 failures here: https://tbpl.mozilla.org/?tree=Try&rev=f9e567ee6aea

The patch there was on top of inbound revision 02302b3cdd84. I wasn't able to reproduce it locally though but didn't try very hard. The failures happen in different tests depending on GC/CC timing I think, probably after it runs test_promise.html
Flags: needinfo?(jdemooij)
(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> > Yes, so that can be done by adding unGCd promises as a "feature" to the
> > worker.  They will then be notified when shutdown begins and can report
> > them, and then remove themselves so that they don't block shutdown any
> > further.
> 
> This seems like a bad idea to me. Every AddFeature call lead to a mutex
> acquisition and a runnable dispatch to the worker's parent thread.
> Considering the proliferation of Promises based APIs, a more efficient
> solution might be better.
> 
> jandem, do you have a reproducible testcase?

Yeah, it's actually the wrong behavior, because calling AddFeature will increment the busy count and prevent the worker from shutting down via GC.  So we'd need some sort of one off here for promises.
One of those things where we overthink what is happening.
In the end, it's just that the worker private is setting mJSContext to nullptr and causing this whole crash.
The cx is still around since the JS engine (appropriately) only deletes the cx after the GC is done.
So let's allow Promises to reach that cx. I've currently added another member since I don't know if some worker stuff relies on mJSContext being null at that point.
Attachment #8373824 - Flags: review?(khuey)
A cx-less JS_ErrorFromException would be doable but kind of strange from the perspective of the rest of the API. The ErrorReport is lazily generated, which means JS_ErrorFromException can allocate memory, and if that fails, it reports OOM on cx -- like the whole JSAPI.

Calling this halfway through a GC sounds iffy to me. What if the exception object itself has been collected?
> What if the exception object itself has been collected?

I _think_ it can't do that before we go away and unroot it.
(In reply to Boris Zbarsky [:bz] from comment #13)
> > What if the exception object itself has been collected?
> 
> I _think_ it can't do that before we go away and unroot it.

Also, isn't the exception object on the cx? That is what it feels like from the outside with JS_ClearPendingException() and stuff all 'operating' on cx.
Comment on attachment 8373824 [details] [diff] [review]
Ensure Promises can get a valid JSContext to report unhandled rejection.

Review of attachment 8373824 [details] [diff] [review]:
-----------------------------------------------------------------

This is something bent needs to review.
Attachment #8373824 - Flags: review?(khuey) → review?(bent.mozilla)
> Also, isn't the exception object on the cx? 

We pulled it off there and stashed it in the Promise, so no, it's not.
Oh, that makes sense. It's one of those funny places where we can do an arbitrary number of GCs and not collect everything... oh well.
(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> This seems like a bad idea to me. Every AddFeature call lead to a mutex
> acquisition and a runnable dispatch to the worker's parent thread.

This is only paritally true... The mutex is not normally contended so I think this is a premature optimization worry. Also, runnables are only dispatched when the feature count goes to or from 0.
Comment on attachment 8373824 [details] [diff] [review]
Ensure Promises can get a valid JSContext to report unhandled rejection.

Review of attachment 8373824 [details] [diff] [review]:
-----------------------------------------------------------------

I really think we should go the normal route and use the WorkerFeature mechanism like khuey suggested in comment 6.
Attachment #8373824 - Flags: review?(bent.mozilla) → review-
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #18)
> (In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> > This seems like a bad idea to me. Every AddFeature call lead to a mutex
> > acquisition and a runnable dispatch to the worker's parent thread.
> 
> This is only paritally true... The mutex is not normally contended so I
> think this is a premature optimization worry. Also, runnables are only
> dispatched when the feature count goes to or from 0.

Ben, the ModifyBusyCount runnable is always dispatched since AddFeature() will happen on the worker thread and use ModifyBusyCountFromWorkerThread().
Flags: needinfo?(bent.mozilla)
Are we talking about the same code? I'm here:

  http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#4634

Anyway, let's not worry about perf like this. Do the simple, safe, and correct thing. If it turns out to be slow then we can change the busy count mechanism to benefit promises and other things too.
Flags: needinfo?(bent.mozilla)
Comment on attachment 8381141 [details] [diff] [review]
Report rejected promises when worker stops running.

Review of attachment 8381141 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good. Comment follow:

::: dom/promise/Promise.cpp
@@ +188,5 @@
>  
>  NS_IMPL_CYCLE_COLLECTION_CLASS(Promise)
>  
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(Promise)
> +  tmp->MaybeReportRejectedOnce();

Hm, in two other places below you always follow MaybeReportRejectedOnce with RemoveFeature. You're not calling RemoveFeature here. I think the simplest thing to do would be to put a RemoveFeature call inside MaybeReportRejectedOnce?

@@ +1060,5 @@
> +  // If the Promise was rejected, and there is no reject handler already setup,
> +  // watch for thread shutdown.
> +  if (aState == PromiseState::Rejected &&
> +      !mHadRejectCallback &&
> +      !NS_IsMainThread()) {

Is it ever possible to be on the main thread here? And if so what should happen?

@@ +1066,5 @@
> +    MOZ_ASSERT(worker);
> +    worker->AssertIsOnWorkerThread();
> +
> +    mFeature = new PromiseReportRejectFeature(this);
> +    worker->AddFeature(worker->GetJSContext(), mFeature);

This can fail, so how should you handle it?

::: dom/promise/Promise.h
@@ +39,5 @@
> +    MOZ_ASSERT(mPromise);
> +  }
> +
> +  bool
> +  Notify(JSContext* aCx, workers::Status aStatus) MOZ_OVERRIDE;

Nit: add the virtual keyword.

@@ +160,5 @@
>    void MaybeReportRejected();
>  
> +  void MaybeReportRejectedOnce() {
> +    MaybeReportRejected();
> +    mResult = JS::UndefinedValue();

I'm unclear why this is needed. Can't you just make MaybeReportRejected not do anything if mResult is undefined?
Attachment #8381141 - Flags: review?(bent.mozilla)
It's not possible to be on the main thread if the Promise was created on the worker thread.
That check only exists so we don't AddFeature a main thread promise.

MaybeReportReject() does indeed return if mResult is undefined. MaybeReportRejectOnce() is using that to enforce the once only behaviour.
Attachment #8386476 - Flags: review?(bent.mozilla)
Comment on attachment 8386476 [details] [diff] [review]
Report rejected promises when worker stops running.

Review of attachment 8386476 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: dom/promise/Promise.cpp
@@ +1077,5 @@
> +    worker->AssertIsOnWorkerThread();
> +
> +    mFeature = new PromiseReportRejectFeature(this);
> +    bool added = worker->AddFeature(worker->GetJSContext(), mFeature);
> +    if (!added) {

Nit: No need for the 'added' var, and let's do:

  if (NS_WARN_IF(!worker->AddFeature(worker->GetJSContext(), mFeature))) {
    // ...
   }

::: dom/promise/Promise.h
@@ +33,5 @@
> +class PromiseReportRejectFeature : public workers::WorkerFeature
> +{
> +  // The Promise that owns this feature.
> +  Promise* mPromise;
> +public:

Nit: newline above 'public'
Attachment #8386476 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/30b7c90dca14
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Shouldn't this have automatically been uplifted to fx29 based on the flags?
Flags: needinfo?(kwierso)
Flags: needinfo?(bbajaj)
I believe the patch still needs to request beta approval to get into 29, even if it's tracking that release.
Flags: needinfo?(kwierso)
Nikhil, you have to update the flag of the attachment. Select approval‑mozilla‑{aurora,beta} and fill the form. 
Since the patch is pretty big, please do that asap.

The tracking flags are used by the release team to keep track of important bugs. It does not impact uplift.
Flags: needinfo?(bbajaj) → needinfo?(nsm.nikhil)
Comment on attachment 8386476 [details] [diff] [review]
Report rejected promises when worker stops running.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 915233

User impact if declined: 
Promises used in workers and rejected but not handled will lead to crash.

Testing completed (on m-c, etc.): 
Yes. Tests in suite.

Risk to taking this patch (and alternatives if risky): 
I don't see any risk, it fixes a crash ;) nothing else.

String or IDL/UUID changes made by this patch:
None
Attachment #8386476 - Flags: approval-mozilla-beta?
Attachment #8386476 - Flags: approval-mozilla-aurora?
Flags: needinfo?(nsm.nikhil)
Attachment #8386476 - Flags: approval-mozilla-beta?
Attachment #8386476 - Flags: approval-mozilla-beta+
Attachment #8386476 - Flags: approval-mozilla-aurora?
Attachment #8386476 - Flags: approval-mozilla-aurora+
(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #30)
> Shouldn't this have automatically been uplifted to fx29 based on the flags?

nope, in desktop/mobile world if a bug is tracking you need to request approval for that branch :) In this case you need to request approval-mozilla-beta:? and fill in the approval request comment to get it landed on mozilla-beta(firefox 29).
Flags: in-testsuite?
Depends on: 1020226
You need to log in before you can comment on or make changes to this bug.