Closed Bug 967005 Opened 10 years ago Closed 10 years ago

Null-dereference crash if a rejected Promise survives to worker shutdown

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla30

Tracking Flags:

Tracking

Status

firefox29

fixed

firefox30

fixed

People

(Reporter: bzbarsky, Assigned: nsm)

References

Details

Attachments

(1 file, 2 obsolete files)

Ensure Promises can get a valid JSContext to report unhandled rejection. 10 years ago Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) 4.12 KB, patch	bent.mozilla : review-	Details \| Diff \| Splinter Review
Report rejected promises when worker stops running. 10 years ago Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) 6.95 KB, patch		Details \| Diff \| Splinter Review
Report rejected promises when worker stops running. 10 years ago Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) 7.40 KB, patch	bent.mozilla : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

10 years ago

jandem is seeing stacks where WorkerThreadPrimaryRunnable::Run calls JS_DestroyContext, which triggers CC, which calls ~Promise, which calls MaybeReportRejected, which calls JS_BeginRequest on null, as far as we can tell.

Presumably because nsContentUtils::GetDefaultJSContextForThread is returning null in this case.

First of all, is that expected?  The comments in Run() claim there is another JSContext around in addition to the cx being destroyed.

Second, if this is expected, should we for now just null-check and drop the promise rejection on the floor if !cx?

Third, if we GCed before destroying, would we end up with a non-null JSContext here, or not?

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

10 years ago

Flags: needinfo?(khuey)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

10 years ago

And I guess 4th: Can we add a version of JS_ErrorFromException that doesn't need rooted input?  The rooting is the only thing we want out of the JSContext here.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 2

•

10 years ago

Promises are supposed to keep the worker alive, so unresolved promises shouldn't cause this kind of trouble... It sounds like the worker is dying too early maybe.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 3

•

10 years ago

1. Yes.  The comment is wrong.  bholley got rid of it at some point and apparently it never got updated
2. See below
3. Certainly not without CCing appropriately.  And I think contexts inherently root their global these days, so there will be some garbage collected only when we JS_DestroyContext.
4. I have no idea.

I think the real question here is why we are doing this from the dtor, instead of the resolving code.  The Promise definitely can't keep the worker alive until it's dtor runs ... that would effectively create an unbreakable cycle if the Promise were reachable from the global.

Flags: needinfo?(khuey)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

10 years ago

Yeah, question 4 was for the JS folks I cced.

> I think the real question here is why we are doing this from the dtor, instead of the
> resolving code.

We want to report unhandled rejections (the promise equivalent of uncaught exceptions).  We don't know at the time the promise is rejected whether it will be unhandled or not.  In fact, we don't know up until the point when no one can call then() on the promise anymore.  That only becomes true when the promise is unreachable, and the only way we have to detect that is when it's GCed...

Boris Zbarsky [:bzbarsky]

Reporter

Comment 5

•

10 years ago

One option, I suppose, is for a worker to keep track of all its promises and explicitly tell them to report before it goes into shutdown like this, since it knows at that point that the promises are no longer reachable...

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 6

•

10 years ago

Yes, so that can be done by adding unGCd promises as a "feature" to the worker.  They will then be notified when shutdown begins and can report them, and then remove themselves so that they don't block shutdown any further.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 7

•

10 years ago

Nikhil, want to do that?

Flags: needinfo?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Updated

•

10 years ago

Assignee: nobody → nsm.nikhil

Flags: needinfo?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 8

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> Yes, so that can be done by adding unGCd promises as a "feature" to the
> worker.  They will then be notified when shutdown begins and can report
> them, and then remove themselves so that they don't block shutdown any
> further.

This seems like a bad idea to me. Every AddFeature call lead to a mutex acquisition and a runnable dispatch to the worker's parent thread. Considering the proliferation of Promises based APIs, a more efficient solution might be better.

jandem, do you have a reproducible testcase?

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Comment 9

•

10 years ago

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> jandem, do you have a reproducible testcase?

See the patch + M-4 failures here: https://tbpl.mozilla.org/?tree=Try&rev=f9e567ee6aea

The patch there was on top of inbound revision 02302b3cdd84. I wasn't able to reproduce it locally though but didn't try very hard. The failures happen in different tests depending on GC/CC timing I think, probably after it runs test_promise.html

Flags: needinfo?(jdemooij)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 10

•

10 years ago

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> > Yes, so that can be done by adding unGCd promises as a "feature" to the
> > worker.  They will then be notified when shutdown begins and can report
> > them, and then remove themselves so that they don't block shutdown any
> > further.
> 
> This seems like a bad idea to me. Every AddFeature call lead to a mutex
> acquisition and a runnable dispatch to the worker's parent thread.
> Considering the proliferation of Promises based APIs, a more efficient
> solution might be better.
> 
> jandem, do you have a reproducible testcase?

Yeah, it's actually the wrong behavior, because calling AddFeature will increment the busy count and prevent the worker from shutting down via GC.  So we'd need some sort of one off here for promises.

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 11

•

10 years ago

Attached patch Ensure Promises can get a valid JSContext to report unhandled rejection. (obsolete) — Details — Splinter Review

One of those things where we overthink what is happening.
In the end, it's just that the worker private is setting mJSContext to nullptr and causing this whole crash.
The cx is still around since the JS engine (appropriately) only deletes the cx after the GC is done.
So let's allow Promises to reach that cx. I've currently added another member since I don't know if some worker stuff relies on mJSContext being null at that point.

Attachment #8373824 - Flags: review?(khuey)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Updated

•

10 years ago

status-firefox29: --- → affected

status-firefox30: --- → affected

tracking-firefox29: --- → ?

tracking-firefox30: --- → ?

Jason Orendorff [:jorendorff]

Comment 12

•

10 years ago

A cx-less JS_ErrorFromException would be doable but kind of strange from the perspective of the rest of the API. The ErrorReport is lazily generated, which means JS_ErrorFromException can allocate memory, and if that fails, it reports OOM on cx -- like the whole JSAPI.

Calling this halfway through a GC sounds iffy to me. What if the exception object itself has been collected?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 13

•

10 years ago

> What if the exception object itself has been collected?

I _think_ it can't do that before we go away and unroot it.

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 14

•

10 years ago

(In reply to Boris Zbarsky [:bz] from comment #13)
> > What if the exception object itself has been collected?
> 
> I _think_ it can't do that before we go away and unroot it.

Also, isn't the exception object on the cx? That is what it feels like from the outside with JS_ClearPendingException() and stuff all 'operating' on cx.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 15

•

10 years ago

Comment on attachment 8373824 [details] [diff] [review]
Ensure Promises can get a valid JSContext to report unhandled rejection.

Review of attachment 8373824 [details] [diff] [review]:
-----------------------------------------------------------------

This is something bent needs to review.

Attachment #8373824 - Flags: review?(khuey) → review?(bent.mozilla)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 16

•

10 years ago

> Also, isn't the exception object on the cx? 

We pulled it off there and stashed it in the Promise, so no, it's not.

Jason Orendorff [:jorendorff]

Comment 17

•

10 years ago

Oh, that makes sense. It's one of those funny places where we can do an arbitrary number of GCs and not collect everything... oh well.

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

tracking-firefox29: ? → +

tracking-firefox30: ? → +

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 18

•

10 years ago

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> This seems like a bad idea to me. Every AddFeature call lead to a mutex
> acquisition and a runnable dispatch to the worker's parent thread.

This is only paritally true... The mutex is not normally contended so I think this is a premature optimization worry. Also, runnables are only dispatched when the feature count goes to or from 0.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 19

•

10 years ago

Comment on attachment 8373824 [details] [diff] [review]
Ensure Promises can get a valid JSContext to report unhandled rejection.

Review of attachment 8373824 [details] [diff] [review]:
-----------------------------------------------------------------

I really think we should go the normal route and use the WorkerFeature mechanism like khuey suggested in comment 6.

Attachment #8373824 - Flags: review?(bent.mozilla) → review-

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 20

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #18)
> (In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8)
> > This seems like a bad idea to me. Every AddFeature call lead to a mutex
> > acquisition and a runnable dispatch to the worker's parent thread.
> 
> This is only paritally true... The mutex is not normally contended so I
> think this is a premature optimization worry. Also, runnables are only
> dispatched when the feature count goes to or from 0.

Ben, the ModifyBusyCount runnable is always dispatched since AddFeature() will happen on the worker thread and use ModifyBusyCountFromWorkerThread().

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Updated

•

10 years ago

Flags: needinfo?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 21

•

10 years ago

Are we talking about the same code? I'm here:

  http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#4634

Anyway, let's not worry about perf like this. Do the simple, safe, and correct thing. If it turns out to be slow then we can change the busy count mechanism to benefit promises and other things too.

Flags: needinfo?(bent.mozilla)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 22

•

10 years ago

Attached patch Report rejected promises when worker stops running. (obsolete) — Details — Splinter Review

Attachment #8381141 - Flags: review?(bent.mozilla)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Updated

•

10 years ago

Attachment #8373824 - Attachment is obsolete: true

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 23

•

10 years ago

Comment on attachment 8381141 [details] [diff] [review]
Report rejected promises when worker stops running.

Review of attachment 8381141 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good. Comment follow:

::: dom/promise/Promise.cpp
@@ +188,5 @@
>  
>  NS_IMPL_CYCLE_COLLECTION_CLASS(Promise)
>  
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(Promise)
> +  tmp->MaybeReportRejectedOnce();

Hm, in two other places below you always follow MaybeReportRejectedOnce with RemoveFeature. You're not calling RemoveFeature here. I think the simplest thing to do would be to put a RemoveFeature call inside MaybeReportRejectedOnce?

@@ +1060,5 @@
> +  // If the Promise was rejected, and there is no reject handler already setup,
> +  // watch for thread shutdown.
> +  if (aState == PromiseState::Rejected &&
> +      !mHadRejectCallback &&
> +      !NS_IsMainThread()) {

Is it ever possible to be on the main thread here? And if so what should happen?

@@ +1066,5 @@
> +    MOZ_ASSERT(worker);
> +    worker->AssertIsOnWorkerThread();
> +
> +    mFeature = new PromiseReportRejectFeature(this);
> +    worker->AddFeature(worker->GetJSContext(), mFeature);

This can fail, so how should you handle it?

::: dom/promise/Promise.h
@@ +39,5 @@
> +    MOZ_ASSERT(mPromise);
> +  }
> +
> +  bool
> +  Notify(JSContext* aCx, workers::Status aStatus) MOZ_OVERRIDE;

Nit: add the virtual keyword.

@@ +160,5 @@
>    void MaybeReportRejected();
>  
> +  void MaybeReportRejectedOnce() {
> +    MaybeReportRejected();
> +    mResult = JS::UndefinedValue();

I'm unclear why this is needed. Can't you just make MaybeReportRejected not do anything if mResult is undefined?

Attachment #8381141 - Flags: review?(bent.mozilla)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 24

•

10 years ago

Attached patch Report rejected promises when worker stops running. — Details — Splinter Review

It's not possible to be on the main thread if the Promise was created on the worker thread.
That check only exists so we don't AddFeature a main thread promise.

MaybeReportReject() does indeed return if mResult is undefined. MaybeReportRejectOnce() is using that to enforce the once only behaviour.

Attachment #8386476 - Flags: review?(bent.mozilla)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Updated

•

10 years ago

Attachment #8381141 - Attachment is obsolete: true

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 25

•

10 years ago

Comment on attachment 8386476 [details] [diff] [review]
Report rejected promises when worker stops running.

Review of attachment 8386476 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: dom/promise/Promise.cpp
@@ +1077,5 @@
> +    worker->AssertIsOnWorkerThread();
> +
> +    mFeature = new PromiseReportRejectFeature(this);
> +    bool added = worker->AddFeature(worker->GetJSContext(), mFeature);
> +    if (!added) {

Nit: No need for the 'added' var, and let's do:

  if (NS_WARN_IF(!worker->AddFeature(worker->GetJSContext(), mFeature))) {
    // ...
   }

::: dom/promise/Promise.h
@@ +33,5 @@
> +class PromiseReportRejectFeature : public workers::WorkerFeature
> +{
> +  // The Promise that owns this feature.
> +  Promise* mPromise;
> +public:

Nit: newline above 'public'

Attachment #8386476 - Flags: review?(bent.mozilla) → review+

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 26

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/716ba77a9d0a

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 27

•

10 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/0a70c0402338 for breaking ASAN mochitest-4: https://tbpl.mozilla.org/php/getParsedLog.php?id=35966001&tree=Mozilla-Inbound

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 28

•

10 years ago

let's try again
https://hg.mozilla.org/integration/mozilla-inbound/rev/30b7c90dca14

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 29

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/30b7c90dca14

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox30: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla30

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 30

•

10 years ago

Shouldn't this have automatically been uplifted to fx29 based on the flags?

Flags: needinfo?(kwierso)

Flags: needinfo?(bbajaj)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 31

•

10 years ago

I believe the patch still needs to request beta approval to get into 29, even if it's tracking that release.

Flags: needinfo?(kwierso)

Sylvestre Ledru [:Sylvestre]

Comment 32

•

10 years ago

Nikhil, you have to update the flag of the attachment. Select approval‑mozilla‑{aurora,beta} and fill the form. 
Since the patch is pretty big, please do that asap.

The tracking flags are used by the release team to keep track of important bugs. It does not impact uplift.

Flags: needinfo?(bbajaj) → needinfo?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Assignee

Comment 33

•

10 years ago

Comment on attachment 8386476 [details] [diff] [review]
Report rejected promises when worker stops running.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 915233

User impact if declined: 
Promises used in workers and rejected but not handled will lead to crash.

Testing completed (on m-c, etc.): 
Yes. Tests in suite.

Risk to taking this patch (and alternatives if risky): 
I don't see any risk, it fixes a crash ;) nothing else.

String or IDL/UUID changes made by this patch:
None

Attachment #8386476 - Flags: approval-mozilla-beta?

Attachment #8386476 - Flags: approval-mozilla-aurora?

Flags: needinfo?(nsm.nikhil)

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

Attachment #8386476 - Flags: approval-mozilla-beta?

Attachment #8386476 - Flags: approval-mozilla-beta+

Attachment #8386476 - Flags: approval-mozilla-aurora?

Attachment #8386476 - Flags: approval-mozilla-aurora+

bhavana bajaj [:bajaj]

Comment 34

•

10 years ago

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #30)
> Shouldn't this have automatically been uplifted to fx29 based on the flags?

nope, in desktop/mobile world if a bug is tracking you need to request approval for that branch :) In this case you need to request approval-mozilla-beta:? and fill in the approval request comment to get it landed on mozilla-beta(firefox 29).

Ryan VanderMeulen [:RyanVM]

Comment 35

•

10 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/6738f08f107f

status-firefox29: affected → fixed

[SV Manager] Florin Mezei, QA (:FlorinMezei)

Updated

•

10 years ago

Flags: in-testsuite?

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Depends on: 1020226

You need to log in before you can comment on or make changes to this bug.