Fake deadlock detected due to nsSocketTransportService::StartPollWatchdog()

RESOLVED FIXED in Firefox 55

Status

()

Core
Networking
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: Ehsan, Assigned: Ehsan)

Tracking

unspecified
mozilla55
Points:
---

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

a year ago
My patch in bug 1368286 makes the deadlock detector complain on Windows, for example see this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=464da20f5513409c084c54e88f3ded61510f5fc4

12:34:48     INFO -  TEST-PASS | netwerk/test/unit/test_http2.js | testOnStopRequest - [testOnStopRequest : 81] true == true
12:34:48     INFO -  TEST-PASS | netwerk/test/unit/test_http2.js | testOnStopRequest - [testOnStopRequest : 82] true == true
12:34:48     INFO -  PID 4684 | starting test number 40
12:34:48     INFO -  (xpcshell/head.js) | test pending (2)
12:34:48     INFO -  (xpcshell/head.js) | test pending (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (3)
12:34:48     INFO -  (xpcshell/head.js) | test pending (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (3)
12:34:48     INFO -  (xpcshell/head.js) | test finished (2)
12:34:48     INFO -  (xpcshell/head.js) | test finished (1)
12:34:48     INFO -  exiting test
12:34:48     INFO -  PID 4684 | ###!!! ERROR: Potential deadlock detected:
12:34:48     INFO -  PID 4684 | === Cyclical dependency starts at
12:34:48     INFO -  PID 4684 | --- Mutex : TimerThread.mMonitor calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | --- Next dependency:
12:34:48     INFO -  PID 4684 | --- Mutex : nsSocketTransportService::mLock (currently acquired)
12:34:48     INFO -  PID 4684 |  calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | --- Next dependency:
12:34:48     INFO -  PID 4684 | --- Mutex : nsTimerImpl::mMutex (currently acquired)
12:34:48     INFO -  PID 4684 |  calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | === Cycle completed at
12:34:48     INFO -  PID 4684 | --- Mutex : TimerThread.mMonitor calling context
12:34:48     INFO -  PID 4684 |   [stack trace unavailable]
12:34:48     INFO -  PID 4684 | Deadlock may happen for some other execution
12:34:48     INFO -  PID 4684 | [4684] ###!!! ASSERTION: Potential deadlock detected:
12:34:48     INFO -  PID 4684 | Cyclical dependency starts at
12:34:48     INFO -  PID 4684 | Mutex : TimerThread.mMonitor
12:34:48     INFO -  PID 4684 | Next dependency:
12:34:48     INFO -  PID 4684 | Mutex : nsSocketTransportService::mLock (currently acquired)
12:34:48     INFO -  PID 4684 | Next dependency:
12:34:48     INFO -  PID 4684 | Mutex : nsTimerImpl::mMutex (currently acquired)
12:34:48     INFO -  PID 4684 | Cycle completed at
12:34:48     INFO -  PID 4684 | Mutex : TimerThread.mMonitor
12:34:48     INFO -  PID 4684 | Deadlock may happen for some other execution
12:34:48     INFO -  PID 4684 | : 'Error', file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/xpcom/threads/BlockingResourceBase.cpp, line 308
12:35:00     INFO -  PID 4684 | #01: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:383]
12:35:00     INFO -  PID 4684 | #02: nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const &,unsigned int,nsTimerImpl::Callback &&) [xpcom/threads/nsTimerImpl.cpp:224]
12:35:00     INFO -  PID 4684 | #03: nsTimerImpl::InitCommon(unsigned int,unsigned int,nsTimerImpl::Callback &&) [xpcom/threads/nsTimerImpl.cpp:203]
12:35:00     INFO -  PID 4684 | #04: nsTimerImpl::Init(nsIObserver *,unsigned int,unsigned int) [xpcom/threads/nsTimerImpl.cpp:328]
12:35:00     INFO -  PID 4684 | #05: nsTimer::Init(nsIObserver *,unsigned int,unsigned int) [xpcom/threads/nsTimerImpl.h:231]
12:35:00     INFO -  PID 4684 | #06: mozilla::net::nsSocketTransportService::StartPollWatchdog() [netwerk/base/nsSocketTransportService2.cpp:1622]
12:35:00     INFO -  PID 4684 | #07: mozilla::net::nsSocketTransportService::OnDispatchedEvent(nsIThreadInternal *) [netwerk/base/nsSocketTransportService2.cpp:810]
12:35:00     INFO -  PID 4684 | #08: nsThread::PutEvent(already_AddRefed<nsIRunnable>,nsThread::nsNestedEventTarget *) [xpcom/threads/nsThread.cpp:751]
12:35:00     INFO -  PID 4684 | #09: nsThread::DispatchInternal(already_AddRefed<nsIRunnable>,unsigned int,nsThread::nsNestedEventTarget *) [xpcom/threads/nsThread.cpp:809]
12:35:00     INFO -  PID 4684 | #10: nsThread::Dispatch(already_AddRefed<nsIRunnable>,unsigned int) [xpcom/threads/nsThread.cpp:868]
12:35:00     INFO -  PID 4684 | #11: mozilla::net::nsSocketTransportService::Dispatch(already_AddRefed<nsIRunnable>,unsigned int) [netwerk/base/nsSocketTransportService2.cpp:153]
12:35:00     INFO -  PID 4684 | #12: nsIEventTarget::Dispatch(nsIRunnable *,unsigned int) [obj-firefox/dist/include/nsIEventTarget.h:37]
12:35:00     INFO -  PID 4684 | #13: mozilla::net::nsHttpConnectionMgr::PostEvent(void ( mozilla::net::nsHttpConnectionMgr::*)(int,mozilla::net::ARefBase *),int,mozilla::net::ARefBase *) [netwerk/protocol/http/nsHttpConnectionMgr.cpp:291]
12:35:00     INFO -  PID 4684 | #14: mozilla::net::nsHttpConnectionMgr::Shutdown() [netwerk/protocol/http/nsHttpConnectionMgr.cpp:236]
12:35:00     INFO -  PID 4684 | #15: mozilla::net::nsHttpHandler::ShutdownConnectionManager() [netwerk/protocol/http/nsHttpHandler.cpp:2622]
12:35:00     INFO -  PID 4684 | #16: nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverList.cpp:112]
12:35:00     INFO -  PID 4684 | #17: nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverService.cpp:285]
12:35:00     INFO -  PID 4684 | #18: NS_InvokeByIndex
12:35:00     INFO -  PID 4684 | #19: CallMethodHelper::Invoke() [js/xpconnect/src/XPCWrappedNative.cpp:1996]
12:35:00     INFO -  PID 4684 | #20: XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1282]
12:35:00     INFO -  PID 4684 | #21: XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:982]
12:35:00     INFO -  PID 4684 | #22: js::CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/jscntxtinlines.h:293]
12:35:00     INFO -  PID 4684 | #23: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:470]
12:35:00     INFO -  PID 4684 | #24: InternalCall [js/src/vm/Interpreter.cpp:515]
12:35:00     INFO -  PID 4684 | #25: js::jit::DoCallFallback [js/src/jit/BaselineIC.cpp:2453]
12:35:00     INFO -  PID 4684 | #26: ??? (???:???)
12:35:00     INFO -  PID 4684 | #27: ??? (???:???)
12:35:00     INFO -  PID 4684 | #28: ??? (???:???)
12:35:00     INFO -  PID 4684 | #29: EnterBaseline [js/src/jit/BaselineJIT.cpp:164]
12:35:00     INFO -  PID 4684 | #30: js::jit::EnterBaselineAtBranch(JSContext *,js::InterpreterFrame *,unsigned char *) [js/src/jit/BaselineJIT.cpp:270]
12:35:00     INFO -  PID 4684 | #31: Interpret [js/src/vm/Interpreter.cpp:1979]
12:35:00     INFO -  PID 4684 | #32: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:410]
12:35:00     INFO -  PID 4684 | #33: js::ExecuteKernel(JSContext *,JS::Handle<JSScript *>,JSObject &,JS::Value const &,js::AbstractFramePtr,JS::Value *) [js/src/vm/Interpreter.cpp:699]
12:35:00     INFO -  PID 4684 | #34: js::Execute(JSContext *,JS::Handle<JSScript *>,JSObject &,JS::Value *) [js/src/vm/Interpreter.cpp:731]
12:35:00     INFO -  PID 4684 | #35: Evaluate [js/src/jsapi.cpp:4662]
12:35:00     INFO -  PID 4684 | #36: JS::Evaluate(JSContext *,JS::ReadOnlyCompileOptions const &,char const *,unsigned int,JS::MutableHandle<JS::Value>) [js/src/jsapi.cpp:4703]
12:35:00     INFO -  PID 4684 | #37: ProcessArgs [js/xpconnect/src/XPCShellImpl.cpp:1131]
12:35:00     INFO -  PID 4684 | #38: XRE_XPCShellMain(int,char * *,char * *,XREShellData const *) [js/xpconnect/src/XPCShellImpl.cpp:1522]
12:35:00     INFO -  PID 4684 | #39: mozilla::BootstrapImpl::XRE_XPCShellMain(int,char * *,char * *,XREShellData const *) [toolkit/xre/Bootstrap.cpp:53]
12:35:00     INFO -  PID 4684 | #40: NS_internal_main(int,char * *,char * *) [js/xpconnect/shell/xpcshell.cpp:74]
12:35:00     INFO -  PID 4684 | #41: wmain [toolkit/xre/nsWindowsWMain.cpp:118]
12:35:00     INFO -  PID 4684 | #42: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253]
12:35:00     INFO -  PID 4684 | #43: kernel32.dll + 0x53c45
12:35:00     INFO -  PID 4684 | #44: ntdll.dll + 0x637f5
12:35:00     INFO -  PID 4684 | #45: ntdll.dll + 0x637c8

The reason that this is happening is that I'm adding a call to TimerThread::FindNextFireTimeForCurrentThread() indirectly from nsThread::HasPendingEvent() and that function holds the timer thread lock, <https://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/xpcom/threads/TimerThread.cpp#596> creating this weird "potential" deadlock, which of course isn't a deadlock since the said timer thread lock has been freed, so there is no actual reference cycle.

In order to make the deadlock detector stop complaining, I'm going to make this job be dispatched off of a runnable from the main thread.  I checked with Dragana on IRC and she said that should be OK.
(Assignee)

Comment 1

a year ago
Created attachment 8874711 [details] [diff] [review]
Start poll watchdog timer off of a runnable from the main thread in order to avoid a false positive detected deadlock
Attachment #8874711 - Flags: review?(dd.mozilla)
(Assignee)

Comment 2

a year ago
This is try push with many Windows debug xpcshell-test retriggers that are all green (except for one intermittent):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca91e688ea3b777c10da658ff179e44e831879ed
Attachment #8874711 - Flags: review?(dd.mozilla) → review+

Updated

a year ago
Duplicate of this bug: 1369595
Ehsan, how does that lambda hold the nsSocketTransportService alive during your main thread dispatch?
Flags: needinfo?(ehsan)
(Assignee)

Comment 5

a year ago
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #4)
> Ehsan, how does that lambda hold the nsSocketTransportService alive during
> your main thread dispatch?

No.  Note that this is a thread observer so this code will run on the next event loop run after an event has been dispatched to the STS thread, which means the thread object will be around since during XPCOM shutdown we will run all pending events as we're waiting for the threads to join.  But I suppose I can make it hold the object alive explicitly, it's easy enough to do and avoids the guesswork around the lifetime.
Flags: needinfo?(ehsan)
Or at least add a comment, because I think its somewhat non-obvious.
(Assignee)

Comment 7

a year ago
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #6)
> Or at least add a comment, because I think its somewhat non-obvious.

Somewhat is an understatement.  :-)  I'm testing a patch on try to add an explicit reference to just hold the object alive and not rely on this magic!

Comment 8

a year ago
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/551c2fc04c72
Start poll watchdog timer off of a runnable from the main thread in order to avoid a false positive detected deadlock; r=dragana
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #0)

> The reason that this is happening is that I'm adding a call to
> TimerThread::FindNextFireTimeForCurrentThread() indirectly from
> nsThread::HasPendingEvent() and that function holds the timer thread lock,
> <https://searchfox.org/mozilla-central/rev/
> d441cb24482c2e5448accaf07379445059937080/xpcom/threads/TimerThread.cpp#596>
> creating this weird "potential" deadlock, which of course isn't a deadlock
> since the said timer thread lock has been freed, so there is no actual
> reference cycle.

If the timer thread lock is freed it should have been removed from tracking in the deadlock detector [1], if that's not the case we should file a bug to follow up on that.

[1] http://searchfox.org/mozilla-central/rev/507743376d1ba753d14ab6b9305b7c6358570be8/xpcom/threads/BlockingResourceBase.cpp#236
Flags: needinfo?(ehsan)
(Assignee)

Comment 10

a year ago
Filed bug 1370644 about fixing the deadlock detector.
Flags: needinfo?(ehsan)

Comment 11

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/551c2fc04c72
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55

Updated

a year ago
Duplicate of this bug: 1371126

Updated

a year ago
See Also: → bug 1371857

Updated

a year ago
See Also: → bug 1370776
You need to log in before you can comment on or make changes to this bug.