Crashes writing to Heap<> during Worker thread shutdown

RESOLVED FIXED in Firefox 50

Status

()

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bkelly, Assigned: bkelly)

Tracking

(Blocks: 1 bug, {regression})

50 Branch
mozilla51
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48 unaffected, firefox49 unaffected, firefox50 fixed, firefox51- fixed)

Details

Attachments

(2 attachments, 5 obsolete attachments)

(Assignee)

Description

2 years ago
Created attachment 8787323 [details]
service_worker_talos_crash_case.zip

STR:

1) Download and unzip the attached service_worker_talos_crash_case.zip.
2) cd service_worker
3) python -m SimpleHTTPServer 8000
4) Open a fresh nightly 51
5) Navigate to http://localhost:8000/images.html?images=25
6) Observe that the process running the service worker thread crashes.

The crash will look like:

https://crash-stats.mozilla.com/report/index/f5f8cfa6-acd6-4e7a-8e0f-2c9512160901

This only reproduces on 51.  It may not effect earlier branches since they have the older c++ promises.  The crash appears to be triggered by a Promise destructor running during Worker thread shutdown.

From what I gather on IRC, this means that we are doing something not quite right during worker shutdown.  Terrence, what do you think?
Flags: needinfo?(terrence)
(Assignee)

Comment 1

2 years ago
[Tracking Requested - why for this release]:
status-firefox48: --- → unaffected
status-firefox49: --- → unaffected
status-firefox50: --- → unaffected
status-firefox51: --- → affected
tracking-firefox51: --- → ?
Keywords: regression
(Assignee)

Comment 2

2 years ago
FYI, this only happens in opt.  DEBUG builds (even with optimization enabled) does not trigger.
(Assignee)

Comment 3

2 years ago
Adding Till in case this is a spidermonkey Promise issue.
(Assignee)

Comment 4

2 years ago
Created attachment 8787350 [details] [diff] [review]
P1 Make dom/cache FetchHandler cycle collected. r=smaug

Making the dom/cache FetchHandler cycle collected avoids the crash for me.

Its not clear, however, if this is just working around a bigger problem with CC and worker shutdown.
(Assignee)

Comment 5

2 years ago
It seems with spidermonkey promises PromiseNativeHandler objects are now exposed to self-hosted script:

  https://dxr.mozilla.org/mozilla-central/source/dom/promise/Promise.cpp#800

Whose callbacks are themselves cycle collected now:

  https://dxr.mozilla.org/mozilla-central/source/dom/promise/Promise.cpp#405

I wonder if other PromiseNativeHandler implementations need to be cycle collected for this to work in general.
(Assignee)

Updated

2 years ago
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
(Assignee)

Comment 6

2 years ago
Created attachment 8787363 [details] [diff] [review]
P1 Make dom/cache FetchHandler cycle collected. r=bz
Attachment #8787350 - Attachment is obsolete: true
(Assignee)

Comment 7

2 years ago
Created attachment 8787365 [details] [diff] [review]
P2 Cycle collect service worker KeepAliveHandler::InternalHandler. r=bz

We have another PromiseNativeHandler object that references a c++ Promise in service worker code.  I haven't seen this trigger the same crash, but we should probably cycle collect it as well.
(Assignee)

Updated

2 years ago
Attachment #8787363 - Flags: review?(bzbarsky)
(Assignee)

Comment 8

2 years ago
Comment on attachment 8787365 [details] [diff] [review]
P2 Cycle collect service worker KeepAliveHandler::InternalHandler. r=bz

I don't see any other PromiseNativeHandler classes that hold a Promise directly.
Attachment #8787365 - Flags: review?(bzbarsky)
(Assignee)

Updated

2 years ago
Blocks: 1299271
(Assignee)

Comment 10

2 years ago
Comment on attachment 8787365 [details] [diff] [review]
P2 Cycle collect service worker KeepAliveHandler::InternalHandler. r=bz

Test failures on try.
Attachment #8787365 - Flags: review?(bzbarsky)
(Assignee)

Updated

2 years ago
Attachment #8787363 - Flags: review?(bzbarsky)
See Also: → bug 1299934
(Assignee)

Comment 11

2 years ago
Comment on attachment 8787363 [details] [diff] [review]
P1 Make dom/cache FetchHandler cycle collected. r=bz

This is the patch we were discussing last week.  To refresh, we are crashing because:

1) Worker shutdown is triggering a cycle collect
2) The cycle collection is causing the promise holding this PromiseNativeHandler to go away.
3) The PromiseNativeHandler then disposes of its internal mPromise
4) The Promise destruction tries to call back into self-hosted js on the exiting JS runtime triggering badness.

This patch works around the problem by cycle collecting the PromiseNativeHandler as well.  I believe this avoids the issue by dropping the mPromise ref earlier in the CC/GC cycle.  Its kind of a bandaid, though.
Attachment #8787363 - Flags: review?(bzbarsky)
(Assignee)

Comment 12

2 years ago
Created attachment 8788504 [details] [diff] [review]
P2 Note why KeepAliveHandler::InternalHandler is not cycle collected. r=bz

This is the only other PromiseNativeHandler object I could find that directly holds another mPromise.  We don't want to cycle collect this object, though.  I wrote down the reasons in a comment in this patch.
Attachment #8787365 - Attachment is obsolete: true
Attachment #8788504 - Flags: review?(bzbarsky)
>4) The Promise destruction tries to call back into self-hosted js on the exiting JS runtime triggering badness.

It does?  Where is that happening?

What I see from code auditing, and the stack posted earlier in this bug, is that Promise::~Promise only calls mozilla::DropJSObjects(this).  This does eventually end up in CycleCollectedJSRuntime::RemoveJSHolder which tries to do something with the nsScriptObjectTracer.  Is the value it gets out of mJSHolders bogus or something?  Where is the self-hosted JS being called?

> Its kind of a bandaid, though.

Totally seems like one...  Nothing wrong with a band-aid, but it's not quite clear to me what we're _really_ band-aiding over here, and I'd like to understand that.
Flags: needinfo?(bkelly)
Comment on attachment 8788504 [details] [diff] [review]
P2 Note why KeepAliveHandler::InternalHandler is not cycle collected. r=bz

r=me, but seems like this could cause the same crash issues, no?
Attachment #8788504 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 16

2 years ago
(In reply to Boris Zbarsky [:bz] from comment #14)
> What I see from code auditing, and the stack posted earlier in this bug, is
> that Promise::~Promise only calls mozilla::DropJSObjects(this).  This does
> eventually end up in CycleCollectedJSRuntime::RemoveJSHolder which tries to
> do something with the nsScriptObjectTracer.  Is the value it gets out of
> mJSHolders bogus or something?  Where is the self-hosted JS being called?

Sorry.  Not into self-hosted, but it has to drop js functions that are defined in self-hosted code.  We did not have to do this with the c++ promises in the past.

> Totally seems like one...  Nothing wrong with a band-aid, but it's not quite
> clear to me what we're _really_ band-aiding over here, and I'd like to
> understand that.

I wonder if we are getting a nullptr rt here:

https://dxr.mozilla.org/mozilla-central/source/xpcom/base/HoldDropJSObjects.cpp#39
Flags: needinfo?(bkelly)
(Assignee)

Comment 17

2 years ago
(In reply to Boris Zbarsky [:bz] from comment #15)
> Comment on attachment 8788504 [details] [diff] [review]
> P2 Note why KeepAliveHandler::InternalHandler is not cycle collected. r=bz
> 
> r=me, but seems like this could cause the same crash issues, no?

I remember now.  KeepAliveHandler::InternalHandler uses the WorkerHolder mechanism to explicitly break the cycle here if the worker thread is being terminated:

https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerPrivate.cpp#312
https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerPrivate.cpp#284
(Assignee)

Comment 18

2 years ago
Created attachment 8788594 [details] [diff] [review]
Use a PromiseNativeHandler shim to ensure real PromiseNativeHandlers are released when the Promise settles. r=bz

A better solution we hashed out over IRC.  Basically, we should try to maintain the lifetime PromiseNativeHandlers had in the past.  The spidermonkey promise change inadvertently extended the life of PromiseNativeHandlers until the JS is GC'd.

This patch uses a shim to more aggressively release the PromiseNativeHandler objects.  The shim then lives until the JS is GC'd.
Attachment #8787363 - Attachment is obsolete: true
Attachment #8788504 - Attachment is obsolete: true
Attachment #8787363 - Flags: review?(bzbarsky)
Attachment #8788594 - Flags: review?(bzbarsky)
Comment on attachment 8788594 [details] [diff] [review]
Use a PromiseNativeHandler shim to ensure real PromiseNativeHandlers are released when the Promise settles. r=bz

>+  NS_DECL_ISUPPORTS
>+NS_IMPL_ISUPPORTS0(PromiseNativeHandlerShim);

PromiseNativeHandlerShim needs to implement cycle collection and cc mInner.
Attachment #8788594 - Flags: review?(bzbarsky) → review-
(Assignee)

Comment 20

2 years ago
Created attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

Updated with more circle gathering.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee6783c3e7e3
Attachment #8788594 - Attachment is obsolete: true
Attachment #8788599 - Flags: review?(bzbarsky)
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

r=me
Attachment #8788599 - Flags: review?(bzbarsky) → review+

Comment 22

2 years ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05cd37129db4
Use a PromiseNativeHandler shim to ensure real PromiseNativeHandlers are released when the Promise settles. r=bz

Comment 23

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/05cd37129db4
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox51: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Don't think we have to track this for 51 since it is now fixed.
tracking-firefox51: ? → -
(Assignee)

Comment 25

2 years ago
Releasing PromiseNativeHandlers later was introduced in bug 911216 which was released in FF50.  We should probably uplift this.
Blocks: 911216
status-firefox50: unaffected → affected
(Assignee)

Comment 26

2 years ago
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

Approval Request Comment
[Feature/regressing bug #]: Bug 911216
[User impact if declined]: Increased memory footprint and potentially crashes in worker code.  This was revealed in new code in FF51, but the underlying issue exists in FF50 and could occur under certain workloads.
[Describe test coverage new/current, TreeHerder]: No new tests.  Existing tests exercise promises quite a bit.
[Risks and why]: Minimal.
[String/UUID change made/needed]: None.
Attachment #8788599 - Flags: approval-mozilla-aurora?
Comment on attachment 8788599 [details] [diff] [review]
bug1299887_promiseshim.patch

Fixes a recent regression, Aurora50+
Attachment #8788599 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Version: unspecified → 50 Branch
You need to log in before you can comment on or make changes to this bug.