Assertion failure: mSelfRef, at dom/workers/ServiceWorkerPrivate.cpp:360 [@ KeepAliveHandler::RemovePromise]

RESOLVED FIXED in Firefox 55

Status

()

defect
P2
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bc, Assigned: bkelly)

Tracking

(Blocks 2 bugs, {assertion})

53 Branch
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 wontfix, firefox54 wontfix, firefox55 fixed)

Details

(crash signature, )

Attachments

(2 attachments)

1. Install Spider
   https://bclary.com/projects/spider/spider-0.1.0.5-sm+tb+fx+an+fn.xpi

2. firefox -spider -start -quit -url  "https://mobile.twitter.com/trecet?lang=es" -hook "https://bclary.com/projects/spider/test-crash-on-load.js"

This has been occurring since at least late 2016-12 on over 800 urls mostly from mobile.twitter.com and weather.com.

Assertion failure: mSelfRef, at dom/workers/ServiceWorkerPrivate.cpp:360

This requires Spider *and* the hook script. It may be invalid as a result, but since it has been one of our more common crashes on Bughunter, I'll file it anyway. If you think this is invalid, please just mark it.

Reproduces on Windows and Linux in Bughunter but so far I've only been able to reproduce it manually on the Bughunter vms but not on my local workstation. It may be a timing issue that requires a very fast connection which I do not have at home.
It looks like perhaps the service worker thread is being terminated causing us to clear mSelfRef here:

https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerPrivate.cpp#351

And then a promise is settled immediately after causing us to call RemovePromise().

Perhaps we could add a mShuttingDown flag and only assert on these if that is false.
Since this impacts a significant number of widely visited urls, I think we'd fix this soon-ish.
Component: DOM: Workers → DOM: Service Workers
Priority: -- → P2
This is only failing on DEBUG builds, so our users should not be seeing it.  Per comment 1 I think its probably a slightly incorrect assertion and not an actual functional bug.
The downside for me is that Bughunter keeps hitting this which prevents us from finding other crashes which may be hiding behind it. If we can fine tune the assertion condition to properly detect issues that would be great. Barring that, just converting it into a warning would be good enough. Fatal assertions which aren't important enough to fix, aren't important enough to be fatal.
One thing to note however is that I do see this primarily on mobile.twitter.com and though we don't crash with opt desktop builds there, our users are crashing there in Fennec. I took a url and looked up the original crash report for it: bp-2e825d6e-769b-4b5f-a8b9-0a5360170527 which I don't believe has a useful stack, but there *may* be an underlying issue causing this assertion which our Fennec users see in release builds. ccing snorp for thoughts.
(In reply to Bob Clary [:bc:] from comment #5)
> One thing to note however is that I do see this primarily on
> mobile.twitter.com and though we don't crash with opt desktop builds there,
> our users are crashing there in Fennec. I took a url and looked up the
> original crash report for it: bp-2e825d6e-769b-4b5f-a8b9-0a5360170527 which
> I don't believe has a useful stack, but there *may* be an underlying issue
> causing this assertion which our Fennec users see in release builds. ccing
> snorp for thoughts.

Bob, what about that crash report makes you think its related to this?  Just that its on twitter?

The issue in this bug is an assertion which will be compiled out of fennec release channel.  Its unclear to me how it can be a factor in that crash.
Flags: needinfo?(bob)
Please see the comment in the patch.  I think it should be somewhat self explanatory.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=67a10f67f5eea97791b55b4a280fb95ad6f8ed61
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Attachment #8873175 - Flags: review?(bugmail)
The urls I test come from crash-stats. I looked up a url where I hit this assertion in bughunter to get the original crash report the user submitted that I got the url from. There was only one recent crash at that specific url.
Flags: needinfo?(bob)
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #7)
> Created attachment 8873175 [details] [diff] [review]
> Remove KeepAliveHandler::RemovePromise() assertion that can be wrong during
> worker thread shutdown. r=asuth
> 
> Please see the comment in the patch.  I think it should be somewhat self
> explanatory.
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=67a10f67f5eea97791b55b4a280fb95ad6f8ed61

Is it expected to run promise handlers after the worker has reached the terminating state?
(In reply to Cătălin Badea (:catalinb) from comment #9)
> Is it expected to run promise handlers after the worker has reached the
> terminating state?

I think it depends on the source of the promise.  For example, a promise returned from Cache::Match() can resolve after worker shutdown is started.  We have to keep the worker alive because we are expecting an IPC runnable and we want to avoid a crash.  This will let the promise be settled, the IPC actor will be destroyed, and then worker thread shutdown will complete.

Also, you could have an issue where something rejects in its WorkerHolder::Notify() implementation, but due to ordering the KeepAliveHandler gets its Notify() called first.  This would result in a similar problem.
Comment on attachment 8873175 [details] [diff] [review]
Remove KeepAliveHandler::RemovePromise() assertion that can be wrong during worker thread shutdown. r=asuth

Review of attachment 8873175 [details] [diff] [review]:
-----------------------------------------------------------------

Great comment.  I spent more time thinking about whether I should rib you over the excellent styling of the comment than trying to understand the patch, thanks to said comment.
Attachment #8873175 - Flags: review?(bugmail) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e398680c4833
Remove KeepAliveHandler::RemovePromise() assertion that can be wrong during worker thread shutdown. r=asuth
https://hg.mozilla.org/mozilla-central/rev/e398680c4833
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Version: unspecified → 53 Branch
You need to log in before you can comment on or make changes to this bug.