Closed
Bug 1370644
Opened 6 years ago
Closed 6 years ago
Deadlock detector seems to pay no attention to the fact that a lock gets unlocked
Categories
(Core :: XPCOM, enhancement)
Core
XPCOM
Tracking
()
People
(Reporter: ehsan.akhgari, Assigned: erahm)
References
Details
Attachments
(1 file)
1.59 KB,
patch
|
froydnj
:
review+
ehsan.akhgari
:
feedback+
|
Details | Diff | Splinter Review |
See bug 1370448. Let me try to walk you through the reported potential deadlock: 12:34:48 INFO - PID 4684 | === Cyclical dependency starts at 12:34:48 INFO - PID 4684 | --- Mutex : TimerThread.mMonitor calling context This lock is held in TimerThread::FindNextFireTimeForCurrentThread() which is being triggered from https://hg.mozilla.org/integration/mozilla-inbound/rev/16798a4167a1#l1.40 (sorry for linking to inbound, this code hasn't landed.) What happens here is like this: * xpcshell js code calls nsIThread::hasPendingEvents() * That gets to https://hg.mozilla.org/integration/mozilla-inbound/rev/16798a4167a1#l1.83 * That gets to https://hg.mozilla.org/integration/mozilla-inbound/rev/16798a4167a1#l1.40 * That gets to https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/xpcom/threads/MainThreadIdlePeriod.cpp#33 * That gets to https://searchfox.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#596 The lock is held and released using MonitorAutoLock. The deadlock detector clearly knows that the lock isn't being held since it's not printing "(currently acquired)" here! 12:34:48 INFO - PID 4684 | [stack trace unavailable] 12:34:48 INFO - PID 4684 | --- Next dependency: 12:34:48 INFO - PID 4684 | --- Mutex : nsSocketTransportService::mLock (currently acquired) https://hg.mozilla.org/integration/mozilla-inbound/rev/551c2fc04c72#l1.12 12:34:48 INFO - PID 4684 | calling context 12:34:48 INFO - PID 4684 | [stack trace unavailable] 12:34:48 INFO - PID 4684 | --- Next dependency: 12:34:48 INFO - PID 4684 | --- Mutex : nsTimerImpl::mMutex (currently acquired) https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/xpcom/threads/nsTimerImpl.cpp#327 12:34:48 INFO - PID 4684 | calling context 12:34:48 INFO - PID 4684 | [stack trace unavailable] 12:34:48 INFO - PID 4684 | === Cycle completed at 12:34:48 INFO - PID 4684 | --- Mutex : TimerThread.mMonitor calling context https://searchfox.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#547 It would be nice to teach the deadlock detector to be gentle and keep silent here and not complain without a reason.
Reporter | ||
Comment 1•6 years ago
|
||
Bug 1371704 is posing a threat to backout bug 1368286 due to this bug for the beta merge to go on smoothly, and bug 1368286 is pretty important to not get backed out for us to not starve our idle queue. As a result of that I'm going to bump up the priority of this to [qf:p1]...
Whiteboard: [qf:p1]
Assignee | ||
Comment 3•6 years ago
|
||
So just to be clear, this is more of warning. The deadlock detector thinks you're setting yourself up for issues because you're acquiring locks in a different order than it expects. It's definitely possible there's a logic error and this is a false positive, I'm going to dig into the code and try to figure out what's going on. If I recall correctly, it's something like the first time you acquired lock C you already held lock A and B. The next time you you acquired C you held lock just lock F (please don't argue the validity of this statement, I still need to look back at the logic). I think a super quick solution is to only use |NS_ERROR| if |maybeImminent == true| [1] and otherwise use an |NS_WARNING|. [1] http://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/xpcom/threads/BlockingResourceBase.cpp#308
Assignee | ||
Comment 4•6 years ago
|
||
This modifies the logic in |CheckAcquisition| to only call |NS_ERROR| if we're really going to deadlock. Instead, if we detect a suspicious cycle, we just use an |NS_WARNING|. This means that we'll still output warning text in debug builds, but we won't cause the process to abort. Ehsan, can you test if this fixes your intermittents? MozReview-Commit-ID: 71mFInWwbDY
Attachment #8876245 -
Flags: review?(nfroyd)
Attachment #8876245 -
Flags: feedback?(ehsan)
Assignee | ||
Comment 5•6 years ago
|
||
I also plan on following up on whether or not our logic is correct, hence the current patch being part 1. I think what we have is good enough to unblock Ehsan for now though.
Reporter | ||
Comment 6•6 years ago
|
||
I made a try push to test whether this will fix the permaorange we saw on beta55: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cb95f92fe8621f69715a6f6ab6ebf977a3061a3
![]() |
||
Comment 7•6 years ago
|
||
Comment on attachment 8876245 [details] [diff] [review] Part 1: Only use NS_ERROR for an imminent failure Review of attachment 8876245 [details] [diff] [review]: ----------------------------------------------------------------- Nobody ever pays attention to warnings, but...
Attachment #8876245 -
Flags: review?(nfroyd) → review+
Reporter | ||
Comment 8•6 years ago
|
||
Comment on attachment 8876245 [details] [diff] [review] Part 1: Only use NS_ERROR for an imminent failure Review of attachment 8876245 [details] [diff] [review]: ----------------------------------------------------------------- Looks like this indeed fixes the issue. Thanks!
Attachment #8876245 -
Flags: feedback?(ehsan) → feedback+
Assignee | ||
Comment 10•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/091c5613b5cfe1e448bd2c93cd2bb4149701ba9f Bug 1370644 - Part 1: Only use NS_ERROR for an imminent failure. r=froydnj
![]() |
||
Comment 11•6 years ago
|
||
Backed out for failing GTest XPCOMDeadlockDetectorTest.XPCOMSanity3DeathTest | Death test: Sanity3_Child(): https://hg.mozilla.org/integration/mozilla-inbound/rev/25b03b08f551411d7618232028584d66498d5bba Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=091c5613b5cfe1e448bd2c93cd2bb4149701ba9f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=106042392&repo=mozilla-inbound [task 2017-06-10T08:13:50.589511Z] 08:13:50 INFO - TEST-START | XPCOMDeadlockDetectorTest.XPCOMSanity3DeathTest [task 2017-06-10T08:13:50.589780Z] 08:13:50 INFO - [WARNING] /home/worker/workspace/build/src/testing/gtest/gtest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 15 threads. [task 2017-06-10T08:13:50.589977Z] 08:13:50 INFO - Program /home/worker/workspace/build/application/firefox/firefox (pid = 1203) received signal 11. [task 2017-06-10T08:13:50.590163Z] 08:13:50 INFO - Stack: [task 2017-06-10T08:13:50.590389Z] 08:13:50 INFO - #01: mozilla::OffTheBooksMutex::~OffTheBooksMutex [xpcom/threads/Mutex.h:57] [task 2017-06-10T08:13:50.590601Z] 08:13:50 INFO - #02: Sanity3_Child [xpcom/tests/gtest/TestDeadlockDetector.cpp:146] [task 2017-06-10T08:13:50.590835Z] 08:13:50 INFO - #03: XPCOMDeadlockDetectorTest_XPCOMSanity3DeathTest_Test::TestBody [xpcom/tests/gtest/TestDeadlockDetector.cpp:174] [task 2017-06-10T08:13:50.591042Z] 08:13:50 INFO - #04: testing::Test::Run [testing/gtest/gtest/src/gtest.cc:2481] [task 2017-06-10T08:13:50.591261Z] 08:13:50 INFO - #05: testing::TestInfo::Run [testing/gtest/gtest/src/gtest.cc:2656] [task 2017-06-10T08:13:50.591475Z] 08:13:50 INFO - #06: testing::TestCase::Run [testing/gtest/gtest/src/gtest.cc:2774] [task 2017-06-10T08:13:50.591703Z] 08:13:50 INFO - #07: testing::internal::UnitTestImpl::RunAllTests [testing/gtest/gtest/src/gtest.cc:4647] [task 2017-06-10T08:13:50.591919Z] 08:13:50 INFO - #08: testing::UnitTest::Run [testing/gtest/gtest/src/gtest.cc:4260] [task 2017-06-10T08:13:50.592132Z] 08:13:50 INFO - #09: mozilla::RunGTestFunc [xpcom/base/nsCOMPtr.h:402] [task 2017-06-10T08:13:50.592348Z] 08:13:50 INFO - #10: XREMain::XRE_mainStartup [toolkit/xre/nsAppRunner.cpp:3818] [task 2017-06-10T08:13:50.592558Z] 08:13:50 INFO - #11: XREMain::XRE_main [toolkit/xre/nsAppRunner.cpp:4735] [task 2017-06-10T08:13:50.592766Z] 08:13:50 INFO - #12: XRE_main [toolkit/xre/nsAppRunner.cpp:4844] [task 2017-06-10T08:13:50.592997Z] 08:13:50 INFO - #13: do_main [browser/app/nsBrowserApp.cpp:237] [task 2017-06-10T08:13:50.593229Z] 08:13:50 INFO - #14: main [browser/app/nsBrowserApp.cpp:312] [task 2017-06-10T08:13:50.593455Z] 08:13:50 INFO - #15: libc.so.6 + 0x20830 [task 2017-06-10T08:13:50.593721Z] 08:13:50 INFO - #16: _start [task 2017-06-10T08:13:50.593901Z] 08:13:50 INFO - Sleeping for 0 seconds. [task 2017-06-10T08:13:50.594196Z] 08:13:50 INFO - Type 'gdb /home/worker/workspace/build/application/firefox/firefox 1203' to attach your debugger to this thread. [task 2017-06-10T08:13:50.594403Z] 08:13:50 INFO - Done sleeping... [task 2017-06-10T08:13:50.594641Z] 08:13:50 WARNING - TEST-UNEXPECTED-FAIL | XPCOMDeadlockDetectorTest.XPCOMSanity3DeathTest | Death test: Sanity3_Child() [task 2017-06-10T08:13:50.594983Z] 08:13:50 INFO - Result: died but not with expected error. [task 2017-06-10T08:13:50.595344Z] 08:13:50 INFO - Expected: ###!!! ERROR: Potential deadlock detected.*=== Cyclical dependency starts at.*--- Mutex : dd.sanity3.m1.*--- Next dependency:.*--- Mutex : dd.sanity3.m2.*--- Next dependency:.*--- Mutex : dd.sanity3.m3.*--- Next dependency:.*--- Mutex : dd.sanity3.m4.*=== Cycle completed at.*--- Mutex : dd.sanity3.m1.*###!!! ASSERTION: Potential deadlock detected.* [task 2017-06-10T08:13:50.595466Z] 08:13:50 INFO - Actual msg: [task 2017-06-10T08:13:50.595736Z] 08:13:50 INFO - [ DEATH ] [1203] WARNING: OOPDeinit() without successful OOPInit(): file /home/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 3531 [task 2017-06-10T08:13:50.595937Z] 08:13:50 INFO - [ DEATH ] ###!!! ERROR: Potential deadlock detected: [task 2017-06-10T08:13:50.596167Z] 08:13:50 INFO - [ DEATH ] === Cyclical dependency starts at [task 2017-06-10T08:13:50.596358Z] 08:13:50 INFO - [ DEATH ] --- Mutex : dd.sanity3.m1 calling context [task 2017-06-10T08:13:50.596621Z] 08:13:50 INFO - [ DEATH ] [stack trace unavailable] [task 2017-06-10T08:13:50.596864Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.597111Z] 08:13:50 INFO - [ DEATH ] --- Next dependency: [task 2017-06-10T08:13:50.597324Z] 08:13:50 INFO - [ DEATH ] --- Mutex : dd.sanity3.m2 calling context [task 2017-06-10T08:13:50.597542Z] 08:13:50 INFO - [ DEATH ] [stack trace unavailable] [task 2017-06-10T08:13:50.597757Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.597968Z] 08:13:50 INFO - [ DEATH ] --- Next dependency: [task 2017-06-10T08:13:50.598210Z] 08:13:50 INFO - [ DEATH ] --- Mutex : dd.sanity3.m3 calling context [task 2017-06-10T08:13:50.598408Z] 08:13:50 INFO - [ DEATH ] [stack trace unavailable] [task 2017-06-10T08:13:50.598625Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.598857Z] 08:13:50 INFO - [ DEATH ] --- Next dependency: [task 2017-06-10T08:13:50.599099Z] 08:13:50 INFO - [ DEATH ] --- Mutex : dd.sanity3.m4 (currently acquired) [task 2017-06-10T08:13:50.599297Z] 08:13:50 INFO - [ DEATH ] calling context [task 2017-06-10T08:13:50.599524Z] 08:13:50 INFO - [ DEATH ] [stack trace unavailable] [task 2017-06-10T08:13:50.599741Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.599975Z] 08:13:50 INFO - [ DEATH ] === Cycle completed at [task 2017-06-10T08:13:50.600166Z] 08:13:50 INFO - [ DEATH ] --- Mutex : dd.sanity3.m1 calling context [task 2017-06-10T08:13:50.600427Z] 08:13:50 INFO - [ DEATH ] [stack trace unavailable] [task 2017-06-10T08:13:50.600640Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.600887Z] 08:13:50 INFO - [ DEATH ] Deadlock may happen for some other execution [task 2017-06-10T08:13:50.601114Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.601340Z] 08:13:50 INFO - [ DEATH ] [1203] WARNING: Potential deadlock detected: [task 2017-06-10T08:13:50.601563Z] 08:13:50 INFO - [ DEATH ] Cyclical dependency starts at [task 2017-06-10T08:13:50.601796Z] 08:13:50 INFO - [ DEATH ] Mutex : dd.sanity3.m1 [task 2017-06-10T08:13:50.602022Z] 08:13:50 INFO - [ DEATH ] Next dependency: [task 2017-06-10T08:13:50.602240Z] 08:13:50 INFO - [ DEATH ] Mutex : dd.sanity3.m2 [task 2017-06-10T08:13:50.602463Z] 08:13:50 INFO - [ DEATH ] Next dependency: [task 2017-06-10T08:13:50.602692Z] 08:13:50 INFO - [ DEATH ] Mutex : dd.sanity3.m3 [task 2017-06-10T08:13:50.602930Z] 08:13:50 INFO - [ DEATH ] Next dependency: [task 2017-06-10T08:13:50.603154Z] 08:13:50 INFO - [ DEATH ] Mutex : dd.sanity3.m4 (currently acquired) [task 2017-06-10T08:13:50.603375Z] 08:13:50 INFO - [ DEATH ] Cycle completed at [task 2017-06-10T08:13:50.603584Z] 08:13:50 INFO - [ DEATH ] Mutex : dd.sanity3.m1 [task 2017-06-10T08:13:50.603812Z] 08:13:50 INFO - [ DEATH ] Deadlock may happen for some other execution [task 2017-06-10T08:13:50.604031Z] 08:13:50 INFO - [ DEATH ] [task 2017-06-10T08:13:50.604274Z] 08:13:50 INFO - [ DEATH ] : file /home/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 308 [task 2017-06-10T08:13:50.604487Z] 08:13:50 INFO - [ DEATH ] Assertion failure: !mOwningThread (destroying a still-owned lock!), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Mutex.h:57 [task 2017-06-10T08:13:50.604759Z] 08:13:50 INFO - [ DEATH ] #01: Sanity3_Child [xpcom/tests/gtest/TestDeadlockDetector.cpp:146] [task 2017-06-10T08:13:50.605096Z] 08:13:50 INFO - [ DEATH ] #02: XPCOMDeadlockDetectorTest_XPCOMSanity3DeathTest_Test::TestBody [xpcom/tests/gtest/TestDeadlockDetector.cpp:174] [task 2017-06-10T08:13:50.605316Z] 08:13:50 INFO - [ DEATH ] #03: testing::Test::Run [testing/gtest/gtest/src/gtest.cc:2481] [task 2017-06-10T08:13:50.605523Z] 08:13:50 INFO - [ DEATH ] #04: testing::TestInfo::Run [testing/gtest/gtest/src/gtest.cc:2656] [task 2017-06-10T08:13:50.605799Z] 08:13:50 INFO - [ DEATH ] #05: testing::TestCase::Run [testing/gtest/gtest/src/gtest.cc:2774] [task 2017-06-10T08:13:50.606031Z] 08:13:50 INFO - [ DEATH ] #06: testing::internal::UnitTestImpl::RunAllTests [testing/gtest/gtest/src/gtest.cc:4647] [task 2017-06-10T08:13:50.606269Z] 08:13:50 INFO - [ DEATH ] #07: testing::UnitTest::Run [testing/gtest/gtest/src/gtest.cc:4260] [task 2017-06-10T08:13:50.606512Z] 08:13:50 INFO - [ DEATH ] #08: mozilla::RunGTestFunc [xpcom/base/nsCOMPtr.h:402] [task 2017-06-10T08:13:50.606751Z] 08:13:50 INFO - [ DEATH ] #09: XREMain::XRE_mainStartup [toolkit/xre/nsAppRunner.cpp:3818] [task 2017-06-10T08:13:50.606972Z] 08:13:50 INFO - [ DEATH ] #10: XREMain::XRE_main [toolkit/xre/nsAppRunner.cpp:4735] [task 2017-06-10T08:13:50.607201Z] 08:13:50 INFO - [ DEATH ] #11: XRE_main [toolkit/xre/nsAppRunner.cpp:4844] [task 2017-06-10T08:13:50.607405Z] 08:13:50 INFO - [ DEATH ] #12: do_main [browser/app/nsBrowserApp.cpp:237] [task 2017-06-10T08:13:50.607700Z] 08:13:50 INFO - [ DEATH ] #13: main [browser/app/nsBrowserApp.cpp:312] [task 2017-06-10T08:13:50.607897Z] 08:13:50 INFO - [ DEATH ] #14: libc.so.6 + 0x20830 [task 2017-06-10T08:13:50.608162Z] 08:13:50 INFO - [ DEATH ] #15: _start [task 2017-06-10T08:13:50.608417Z] 08:13:50 INFO - [ DEATH ] @ /home/worker/workspace/build/src/xpcom/tests/gtest/TestDeadlockDetector.cpp:174 [task 2017-06-10T08:13:50.608673Z] 08:13:50 WARNING - TEST-UNEXPECTED-FAIL | XPCOMDeadlockDetectorTest.XPCOMSanity3DeathTest | test completed (time: 8ms)
Flags: needinfo?(erahm)
Reporter | ||
Comment 12•6 years ago
|
||
I pushed a patch that disables Sanity3DeathTest temporarily to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=daf2ff8ea44bb5612b50c02b18f11553704868e4
Reporter | ||
Comment 13•6 years ago
|
||
More test failures: https://queue.taskcluster.net/v1/task/JfS7g_zhQg-FPYEctv9Xrg/runs/0/artifacts/public/logs/live_backing.log Second try attempt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43aa23e49777b1f6f78cafd1122844d8db52e6ed
Reporter | ||
Comment 16•6 years ago
|
||
Second patch is green on try, I'm gonna land it after checking with erahm.
Comment 17•6 years ago
|
||
Pushed by eakhgari@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c38ea3bf8704 Part 1: Only use NS_ERROR for an imminent failure. r=froydnj
Comment 18•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/a5658a3fd394
status-firefox55:
--- → fixed
Assignee | ||
Comment 19•6 years ago
|
||
Just to note: Ehsan updated the patch to disable the Sanity3 and TwoThreads death tests. This makes sense since we've changed the potential deadlock code to just be a warning. As a follow-up when we have more time to look into this we can probably just re-enable those as non-death tests.
Flags: needinfo?(erahm)
Comment 20•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c38ea3bf8704
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•1 year ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•