Closed Bug 1299887 Opened 3 years ago Closed 3 years ago

Crashes writing to Heap<> during Worker thread shutdown

Categories

(Core :: DOM: Workers, defect)

50 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
firefox50 --- fixed
firefox51 - fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(2 files, 5 obsolete files)

STR:

1) Download and unzip the attached service_worker_talos_crash_case.zip.
2) cd service_worker
3) python -m SimpleHTTPServer 8000
4) Open a fresh nightly 51
5) Navigate to http://localhost:8000/images.html?images=25
6) Observe that the process running the service worker thread crashes.

The crash will look like:

https://crash-stats.mozilla.com/report/index/f5f8cfa6-acd6-4e7a-8e0f-2c9512160901

This only reproduces on 51.  It may not effect earlier branches since they have the older c++ promises.  The crash appears to be triggered by a Promise destructor running during Worker thread shutdown.

From what I gather on IRC, this means that we are doing something not quite right during worker shutdown.  Terrence, what do you think?
Flags: needinfo?(terrence)
[Tracking Requested - why for this release]:
FYI, this only happens in opt.  DEBUG builds (even with optimization enabled) does not trigger.
Adding Till in case this is a spidermonkey Promise issue.
Making the dom/cache FetchHandler cycle collected avoids the crash for me.

Its not clear, however, if this is just working around a bigger problem with CC and worker shutdown.
It seems with spidermonkey promises PromiseNativeHandler objects are now exposed to self-hosted script:

  https://dxr.mozilla.org/mozilla-central/source/dom/promise/Promise.cpp#800

Whose callbacks are themselves cycle collected now:

  https://dxr.mozilla.org/mozilla-central/source/dom/promise/Promise.cpp#405

I wonder if other PromiseNativeHandler implementations need to be cycle collected for this to work in general.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Attachment #8787350 - Attachment is obsolete: true
We have another PromiseNativeHandler object that references a c++ Promise in service worker code.  I haven't seen this trigger the same crash, but we should probably cycle collect it as well.
Attachment #8787363 - Flags: review?(bzbarsky)
Comment on attachment 8787365 [details] [diff] [review]
P2 Cycle collect service worker KeepAliveHandler::InternalHandler. r=bz

I don't see any other PromiseNativeHandler classes that hold a Promise directly.
Attachment #8787365 - Flags: review?(bzbarsky)
Blocks: 1299271
Comment on attachment 8787365 [details] [diff] [review]
P2 Cycle collect service worker KeepAliveHandler::InternalHandler. r=bz

Test failures on try.
Attachment #8787365 - Flags: review?(bzbarsky)
Attachment #8787363 - Flags: review?(bzbarsky)
See Also: → 1299934
Comment on attachment 8787363 [details] [diff] [review]
P1 Make dom/cache FetchHandler cycle collected. r=bz

This is the patch we were discussing last week.  To refresh, we are crashing because:

1) Worker shutdown is triggering a cycle collect
2) The cycle collection is causing the promise holding this PromiseNativeHandler to go away.
3) The PromiseNativeHandler then disposes of its internal mPromise
4) The Promise destruction tries to call back into self-hosted js on the exiting JS runtime triggering badness.

This patch works around the problem by cycle collecting the PromiseNativeHandler as well.  I believe this avoids the issue by dropping the mPromise ref earlier in the CC/GC cycle.  Its kind of a bandaid, though.
Attachment #8787363 - Flags: review?(bzbarsky)
This is the only other PromiseNativeHandler object I could find that directly holds another mPromise.  We don't want to cycle collect this object, though.  I wrote down the reasons in a comment in this patch.
Attachment #8787365 - Attachment is obsolete: true
Attachment #8788504 - Flags: review?(bzbarsky)
>4) The Promise destruction tries to call back into self-hosted js on the exiting JS runtime triggering badness.

It does?  Where is that happening?

What I see from code auditing, and the stack posted earlier in this bug, is that Promise::~Promise only calls mozilla::DropJSObjects(this).  This does eventually end up in CycleCollectedJSRuntime::RemoveJSHolder which tries to do something with the nsScriptObjectTracer.  Is the value it gets out of mJSHolders bogus or something?  Where is the self-hosted JS being called?

> Its kind of a bandaid, though.

Totally seems like one...  Nothing wrong with a band-aid, but it's not quite clear to me what we're _really_ band-aiding over here, and I'd like to understand that.
Flags: needinfo?(bkelly)
Comment on attachment 8788504 [details] [diff] [review]
P2 Note why KeepAliveHandler::InternalHandler is not cycle collected. r=bz

r=me, but seems like this could cause the same crash issues, no?
Attachment #8788504 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #14)
> What I see from code auditing, and the stack posted earlier in this bug, is
> that Promise::~Promise only calls mozilla::DropJSObjects(this).  This does
> eventually end up in CycleCollectedJSRuntime::RemoveJSHolder which tries to
> do something with the nsScriptObjectTracer.  Is the value it gets out of
> mJSHolders bogus or something?  Where is the self-hosted JS being called?

Sorry.  Not into self-hosted, but it has to drop js functions that are defined in self-hosted code.  We did not have to do this with the c++ promises in the past.

> Totally seems like one...  Nothing wrong with a band-aid, but it's not quite
> clear to me what we're _really_ band-aiding over here, and I'd like to
> understand that.

I wonder if we are getting a nullptr rt here:

https://dxr.mozilla.org/mozilla-central/source/xpcom/base/HoldDropJSObjects.cpp#39
Flags: needinfo?(bkelly)
(In reply to Boris Zbarsky [:bz] from comment #15)
> Comment on attachment 8788504 [details] [diff] [review]
> P2 Note why KeepAliveHandler::InternalHandler is not cycle collected. r=bz
> 
> r=me, but seems like this could cause the same crash issues, no?

I remember now.  KeepAliveHandler::InternalHandler uses the WorkerHolder mechanism to explicitly break the cycle here if the worker thread is being terminated:

https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerPrivate.cpp#312
https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerPrivate.cpp#284
A better solution we hashed out over IRC.  Basically, we should try to maintain the lifetime PromiseNativeHandlers had in the past.  The spidermonkey promise change inadvertently extended the life of PromiseNativeHandlers until the JS is GC'd.

This patch uses a shim to more aggressively release the PromiseNativeHandler objects.  The shim then lives until the JS is GC'd.
Attachment #8787363 - Attachment is obsolete: true
Attachment #8788504 - Attachment is obsolete: true
Attachment #8787363 - Flags: review?(bzbarsky)
Attachment #8788594 - Flags: review?(bzbarsky)
Comment on attachment 8788594 [details] [diff] [review]
Use a PromiseNativeHandler shim to ensure real PromiseNativeHandlers are released when the Promise settles. r=bz

>+  NS_DECL_ISUPPORTS
>+NS_IMPL_ISUPPORTS0(PromiseNativeHandlerShim);

PromiseNativeHandlerShim needs to implement cycle collection and cc mInner.
Attachment #8788594 - Flags: review?(bzbarsky) → review-
Updated with more circle gathering.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee6783c3e7e3
Attachment #8788594 - Attachment is obsolete: true
Attachment #8788599 - Flags: review?(bzbarsky)
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

r=me
Attachment #8788599 - Flags: review?(bzbarsky) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05cd37129db4
Use a PromiseNativeHandler shim to ensure real PromiseNativeHandlers are released when the Promise settles. r=bz
https://hg.mozilla.org/mozilla-central/rev/05cd37129db4
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Don't think we have to track this for 51 since it is now fixed.
Releasing PromiseNativeHandlers later was introduced in bug 911216 which was released in FF50.  We should probably uplift this.
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

Approval Request Comment
[Feature/regressing bug #]: Bug 911216
[User impact if declined]: Increased memory footprint and potentially crashes in worker code.  This was revealed in new code in FF51, but the underlying issue exists in FF50 and could occur under certain workloads.
[Describe test coverage new/current, TreeHerder]: No new tests.  Existing tests exercise promises quite a bit.
[Risks and why]: Minimal.
[String/UUID change made/needed]: None.
Attachment #8788599 - Flags: approval-mozilla-aurora?
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

Fixes a recent regression, Aurora50+
Attachment #8788599 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Version: unspecified → 50 Branch
You need to log in before you can comment on or make changes to this bug.