Closed Bug 1370448 Opened 7 years ago Closed 7 years ago

Fake deadlock detected due to nsSocketTransportService::StartPollWatchdog()

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

Details

Attachments

(1 file)

My patch in bug 1368286 makes the deadlock detector complain on Windows, for example see this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=464da20f5513409c084c54e88f3ded61510f5fc4

12:34:48     INFO -  TEST-PASS | netwerk/test/unit/test_http2.js | testOnStopRequest - [testOnStopRequest : 81] true == true
12:34:48     INFO -  TEST-PASS | netwerk/test/unit/test_http2.js | testOnStopRequest - [testOnStopRequest : 82] true == true
12:34:48     INFO -  PID 4684 | starting test number 40
12:34:48     INFO -  (xpcshell/head.js) | test pending (2)
12:34:48     INFO -  (xpcshell/head.js) | test pending (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (3)
12:34:48     INFO -  (xpcshell/head.js) | test pending (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (2)
12:34:48     INFO -  (xpcshell/head.js) | test finished (1)
12:34:48     INFO -  exiting test
12:34:48     INFO -  PID 4684 | ###!!! ERROR: Potential deadlock detected:
12:34:48     INFO -  PID 4684 | === Cyclical dependency starts at
12:34:48     INFO -  PID 4684 | --- Mutex : TimerThread.mMonitor calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | --- Next dependency:
12:34:48     INFO -  PID 4684 | --- Mutex : nsSocketTransportService::mLock (currently acquired)
12:34:48     INFO -  PID 4684 |  calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | --- Next dependency:
12:34:48     INFO -  PID 4684 | --- Mutex : nsTimerImpl::mMutex (currently acquired)
12:34:48     INFO -  PID 4684 |  calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | === Cycle completed at
12:34:48     INFO -  PID 4684 | --- Mutex : TimerThread.mMonitor calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | Deadlock may happen for some other execution
12:34:48     INFO -  PID 4684 | [4684] ###!!! ASSERTION: Potential deadlock detected:
12:34:48     INFO -  PID 4684 | Cyclical dependency starts at
12:34:48     INFO -  PID 4684 | Mutex : TimerThread.mMonitor
12:34:48     INFO -  PID 4684 | Next dependency:
12:34:48     INFO -  PID 4684 | Mutex : nsSocketTransportService::mLock (currently acquired)
12:34:48     INFO -  PID 4684 | Next dependency:
12:34:48     INFO -  PID 4684 | Mutex : nsTimerImpl::mMutex (currently acquired)
12:34:48     INFO -  PID 4684 | Cycle completed at
12:34:48     INFO -  PID 4684 | Mutex : TimerThread.mMonitor
12:34:48     INFO -  PID 4684 | Deadlock may happen for some other execution
12:34:48     INFO -  PID 4684 | : 'Error', file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/xpcom/threads/BlockingResourceBase.cpp, line 308
12:35:00     INFO -  PID 4684 | #01: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:383]
12:35:00     INFO -  PID 4684 | #02: nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const &,unsigned int,nsTimerImpl::Callback &&) [xpcom/threads/nsTimerImpl.cpp:224]
12:35:00     INFO -  PID 4684 | #03: nsTimerImpl::InitCommon(unsigned int,unsigned int,nsTimerImpl::Callback &&) [xpcom/threads/nsTimerImpl.cpp:203]
12:35:00     INFO -  PID 4684 | #04: nsTimerImpl::Init(nsIObserver *,unsigned int,unsigned int) [xpcom/threads/nsTimerImpl.cpp:328]
12:35:00     INFO -  PID 4684 | #05: nsTimer::Init(nsIObserver *,unsigned int,unsigned int) [xpcom/threads/nsTimerImpl.h:231]
12:35:00     INFO -  PID 4684 | #06: mozilla::net::nsSocketTransportService::StartPollWatchdog() [netwerk/base/nsSocketTransportService2.cpp:1622]
12:35:00     INFO -  PID 4684 | #07: mozilla::net::nsSocketTransportService::OnDispatchedEvent(nsIThreadInternal *) [netwerk/base/nsSocketTransportService2.cpp:810]
12:35:00     INFO -  PID 4684 | #08: nsThread::PutEvent(already_AddRefed<nsIRunnable>,nsThread::nsNestedEventTarget *) [xpcom/threads/nsThread.cpp:751]
12:35:00     INFO -  PID 4684 | #09: nsThread::DispatchInternal(already_AddRefed<nsIRunnable>,unsigned int,nsThread::nsNestedEventTarget *) [xpcom/threads/nsThread.cpp:809]
12:35:00     INFO -  PID 4684 | #10: nsThread::Dispatch(already_AddRefed<nsIRunnable>,unsigned int) [xpcom/threads/nsThread.cpp:868]
12:35:00     INFO -  PID 4684 | #11: mozilla::net::nsSocketTransportService::Dispatch(already_AddRefed<nsIRunnable>,unsigned int) [netwerk/base/nsSocketTransportService2.cpp:153]
12:35:00     INFO -  PID 4684 | #12: nsIEventTarget::Dispatch(nsIRunnable *,unsigned int) [obj-firefox/dist/include/nsIEventTarget.h:37]
12:35:00     INFO -  PID 4684 | #13: mozilla::net::nsHttpConnectionMgr::PostEvent(void ( mozilla::net::nsHttpConnectionMgr::*)(int,mozilla::net::ARefBase *),int,mozilla::net::ARefBase *) [netwerk/protocol/http/nsHttpConnectionMgr.cpp:291]
12:35:00     INFO -  PID 4684 | #14: mozilla::net::nsHttpConnectionMgr::Shutdown() [netwerk/protocol/http/nsHttpConnectionMgr.cpp:236]
12:35:00     INFO -  PID 4684 | #15: mozilla::net::nsHttpHandler::ShutdownConnectionManager() [netwerk/protocol/http/nsHttpHandler.cpp:2622]
12:35:00     INFO -  PID 4684 | #16: nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverList.cpp:112]
12:35:00     INFO -  PID 4684 | #17: nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverService.cpp:285]
12:35:00     INFO -  PID 4684 | #18: NS_InvokeByIndex
12:35:00     INFO -  PID 4684 | #19: CallMethodHelper::Invoke() [js/xpconnect/src/XPCWrappedNative.cpp:1996]
12:35:00     INFO -  PID 4684 | #20: XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1282]
12:35:00     INFO -  PID 4684 | #21: XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:982]
12:35:00     INFO -  PID 4684 | #22: js::CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/jscntxtinlines.h:293]
12:35:00     INFO -  PID 4684 | #23: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:470]
12:35:00     INFO -  PID 4684 | #24: InternalCall [js/src/vm/Interpreter.cpp:515]
12:35:00     INFO -  PID 4684 | #25: js::jit::DoCallFallback [js/src/jit/BaselineIC.cpp:2453]
12:35:00     INFO -  PID 4684 | #26: ??? (???:???)
12:35:00     INFO -  PID 4684 | #27: ??? (???:???)
12:35:00     INFO -  PID 4684 | #28: ??? (???:???)
12:35:00     INFO -  PID 4684 | #29: EnterBaseline [js/src/jit/BaselineJIT.cpp:164]
12:35:00     INFO -  PID 4684 | #30: js::jit::EnterBaselineAtBranch(JSContext *,js::InterpreterFrame *,unsigned char *) [js/src/jit/BaselineJIT.cpp:270]
12:35:00     INFO -  PID 4684 | #31: Interpret [js/src/vm/Interpreter.cpp:1979]
12:35:00     INFO -  PID 4684 | #32: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:410]
12:35:00     INFO -  PID 4684 | #33: js::ExecuteKernel(JSContext *,JS::Handle<JSScript *>,JSObject &,JS::Value const &,js::AbstractFramePtr,JS::Value *) [js/src/vm/Interpreter.cpp:699]
12:35:00     INFO -  PID 4684 | #34: js::Execute(JSContext *,JS::Handle<JSScript *>,JSObject &,JS::Value *) [js/src/vm/Interpreter.cpp:731]
12:35:00     INFO -  PID 4684 | #35: Evaluate [js/src/jsapi.cpp:4662]
12:35:00     INFO -  PID 4684 | #36: JS::Evaluate(JSContext *,JS::ReadOnlyCompileOptions const &,char const *,unsigned int,JS::MutableHandle<JS::Value>) [js/src/jsapi.cpp:4703]
12:35:00     INFO -  PID 4684 | #37: ProcessArgs [js/xpconnect/src/XPCShellImpl.cpp:1131]
12:35:00     INFO -  PID 4684 | #38: XRE_XPCShellMain(int,char * *,char * *,XREShellData const *) [js/xpconnect/src/XPCShellImpl.cpp:1522]
12:35:00     INFO -  PID 4684 | #39: mozilla::BootstrapImpl::XRE_XPCShellMain(int,char * *,char * *,XREShellData const *) [toolkit/xre/Bootstrap.cpp:53]
12:35:00     INFO -  PID 4684 | #40: NS_internal_main(int,char * *,char * *) [js/xpconnect/shell/xpcshell.cpp:74]
12:35:00     INFO -  PID 4684 | #41: wmain [toolkit/xre/nsWindowsWMain.cpp:118]
12:35:00     INFO -  PID 4684 | #42: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253]
12:35:00     INFO -  PID 4684 | #43: kernel32.dll + 0x53c45
12:35:00     INFO -  PID 4684 | #44: ntdll.dll + 0x637f5
12:35:00     INFO -  PID 4684 | #45: ntdll.dll + 0x637c8

The reason that this is happening is that I'm adding a call to TimerThread::FindNextFireTimeForCurrentThread() indirectly from nsThread::HasPendingEvent() and that function holds the timer thread lock, <https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/xpcom/threads/TimerThread.cpp#596> creating this weird "potential" deadlock, which of course isn't a deadlock since the said timer thread lock has been freed, so there is no actual reference cycle.

In order to make the deadlock detector stop complaining, I'm going to make this job be dispatched off of a runnable from the main thread.  I checked with Dragana on IRC and she said that should be OK.
This is try push with many Windows debug xpcshell-test retriggers that are all green (except for one intermittent):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca91e688ea3b777c10da658ff179e44e831879ed
Attachment #8874711 - Flags: review?(dd.mozilla) → review+
Ehsan, how does that lambda hold the nsSocketTransportService alive during your main thread dispatch?
Flags: needinfo?(ehsan)
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #4)
> Ehsan, how does that lambda hold the nsSocketTransportService alive during
> your main thread dispatch?

No.  Note that this is a thread observer so this code will run on the next event loop run after an event has been dispatched to the STS thread, which means the thread object will be around since during XPCOM shutdown we will run all pending events as we're waiting for the threads to join.  But I suppose I can make it hold the object alive explicitly, it's easy enough to do and avoids the guesswork around the lifetime.
Flags: needinfo?(ehsan)
Or at least add a comment, because I think its somewhat non-obvious.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #6)
> Or at least add a comment, because I think its somewhat non-obvious.

Somewhat is an understatement.  :-)  I'm testing a patch on try to add an explicit reference to just hold the object alive and not rely on this magic!
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/551c2fc04c72
Start poll watchdog timer off of a runnable from the main thread in order to avoid a false positive detected deadlock; r=dragana
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #0)

> The reason that this is happening is that I'm adding a call to
> TimerThread::FindNextFireTimeForCurrentThread() indirectly from
> nsThread::HasPendingEvent() and that function holds the timer thread lock,
> <https://searchfox.org/mozilla-central/rev/
> d441cb24482c2e5448accaf07379445059937080/xpcom/threads/TimerThread.cpp#596>
> creating this weird "potential" deadlock, which of course isn't a deadlock
> since the said timer thread lock has been freed, so there is no actual
> reference cycle.

If the timer thread lock is freed it should have been removed from tracking in the deadlock detector [1], if that's not the case we should file a bug to follow up on that.

[1] http://searchfox.org/mozilla-central/rev/507743376d1ba753d14ab6b9305b7c6358570be8/xpcom/threads/BlockingResourceBase.cpp#236
Flags: needinfo?(ehsan)
Filed bug 1370644 about fixing the deadlock detector.
Flags: needinfo?(ehsan)
https://hg.mozilla.org/mozilla-central/rev/551c2fc04c72
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
See Also: → 1371857
See Also: → 1370776
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: