Closed Bug 1387211 Opened 7 years ago Closed 7 years ago

avoid potential deadlock with hybrid event target when shutting down a worker

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox55 --- unaffected
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

Attachments

(1 file, 1 obsolete file)

I'm seeing a deadlock warning in worker shutdown code:

 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "###!!! ERROR: Potential deadlock detected:"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "=== Cyclical dependency starts at"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "--- Mutex : WorkerEventTarget calling context"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "  [stack trace unavailable]"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) ""
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "--- Next dependency:"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) "--- Mutex : WorkerPrivateParent Mutex (currently acquired)"
 0:09.27 PROCESS_OUTPUT: ProcessReader (pid:51199) " calling context"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "  [stack trace unavailable]"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) ""
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "=== Cycle completed at"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "--- Mutex : WorkerEventTarget calling context"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "  [stack trace unavailable]"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) ""
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Deadlock may happen for some other execution"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) ""
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "[Child 51291] WARNING: Potential deadlock detected:"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Cyclical dependency starts at"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Mutex : WorkerEventTarget"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Next dependency:"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Mutex : WorkerPrivateParent Mutex (currently acquired)"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Cycle completed at"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Mutex : WorkerEventTarget"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) "Deadlock may happen for some other execution"
 0:09.28 PROCESS_OUTPUT: ProcessReader (pid:51199) ""

This is because we lock the event target and worker private mutexes in reverse order in these two different cases:

1) During dispatch we lock event target mutex and then worker private mutex.
2) During shutdown we lock worker private mutex and then event target mutex.

This is a clear deadlock risk.
Bill, do you mind reviewing this while Andrea is out on holiday?

In bug 1379243 I added a new event target type to support nsIGlobalObject::EventTargetFor() on worker threads.  This object can outlive the life of the worker, so the worker must effectively disable it at shutdown time.  This point where I am disabling is holding a lock which ends up causing the lock order to be reversed compared to Dispatch which runs the risk of deadlock.

This patch makes us unlock while disabling the event target.  I also disable a bit sooner so that we can be sure more runnables will not be dispatched once the status is set to Killing.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=51a16766ea67c2c1cec7b23a8e4695183e0c25f7
Attachment #8893541 - Flags: review?(wmccloskey)
Hmm, maybe a try without my giant wip patch included would be good:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc1d78566f10c4f7fd6230fecbebe9646691a5eb
Comment on attachment 8893541 [details] [diff] [review]
Avoid potential deadlock during worker shutdown. r=billm

Review of attachment 8893541 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +6322,5 @@
> +    if (aStatus == Killing) {
> +      // We need to unlock while disabling the hybrid event target to avoid
> +      // a possible deadlock.  During dispatch the event target locks its own
> +      // lock and then locks the WorkerPrivate lock.  We would be locking in
> +      // the reverse order here.

I think this comment could be clarified a bit. Maybe something like:
"To avoid deadlock, we always acquire the event target mutex before the worker private mutex. (We do it in this order because that is what works best for event dispatching.) To enforce that order here, we need to unlock the worker private mutex before we lock the event target mutex (in ForgetWorkerPrivate)."
Attachment #8893541 - Flags: review?(wmccloskey) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2024c4bb1c76
Avoid potential deadlock during worker shutdown. r=billm
Comment on attachment 8894539 [details] [diff] [review]
Avoid potential deadlock during worker shutdown. r=billm

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1379243
[User impact if declined]: Potential threading deadlock could prevent clean shutdown or even hang the main thread.  We have not seen this actually trigger, though.
[Is this code covered by automated tests?]: Its hard to write a test to trigger this deadlock, but we do have a deadlock detector that warns about this code.  This patch fixes the deadlock detector warnings.
[Has the fix been verified in Nightly?]: Its hard to verify since its a race condition and we have not seen it trigger in practice.
[Needs manual test from QE? If yes, steps to reproduce]:  No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal risk
[Why is the change risky/not risky?]: There is little risk here since its a reordering of operations so that we don't nest locks in this particular case.  Also, this code is not heavily used yet, so the change of other regressions affecting features is very limited.
[String changes made/needed]: None
Attachment #8894539 - Flags: approval-mozilla-beta?
Comment on attachment 8894539 [details] [diff] [review]
Avoid potential deadlock during worker shutdown. r=billm

Prevent some shutdown hangs, let's uplift for 56 beta 1.
Attachment #8894539 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #8)
> https://hg.mozilla.org/releases/mozilla-beta/rev/cbed0550193a

Sorry if I broke protocol here, but this hasn't landed in m-c yet.  I tend to flag when I push to inbound since there is usually some delay looking at flags, etc.
https://hg.mozilla.org/mozilla-central/rev/2024c4bb1c76
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(In reply to Ben Kelly [:bkelly] from comment #6)
> [Is this code covered by automated tests?]: Its hard to write a test to
> trigger this deadlock, but we do have a deadlock detector that warns about
> this code.  This patch fixes the deadlock detector warnings.
> [Has the fix been verified in Nightly?]: Its hard to verify since its a race
> condition and we have not seen it trigger in practice.
> [Needs manual test from QE? If yes, steps to reproduce]:  No

Setting qe-verify- based on Ben's assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.