Closed
Bug 1152047
Opened 9 years ago
Closed 9 years ago
mozilla::net::nsWSAdmissionManager::GetSessionCount may deadlock during shutdown
Categories
(Core :: Networking: WebSockets, defect)
Tracking
()
RESOLVED
FIXED
mozilla41
People
(Reporter: mayhemer, Assigned: michal)
References
Details
Attachments
(1 file)
1.28 KB,
patch
|
mcmanus
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
NtWaitForSingleObject ntdll.dll@0x1e1d6 RtlpEnterCriticalSectionContended RtlpEnterCriticalSectionContended PR_Lock mozilla::net::nsWSAdmissionManager::GetSessionCount(int&) mozilla::net::WebSocketChannel::StopSession(nsresult) mozilla::net::WebSocketChannel::ReleaseSession() mozilla::net::WebSocketChannel::ProcessInput(unsigned char*, unsigned int) mozilla::net::WebSocketChannel::OnInputStreamReady(nsIAsyncInputStream*) nsInputStreamReadyEvent::Run() nsThread::ProcessNextEvent(bool, bool*) NS_ProcessNextEvent(nsIThread*, bool) nsSocketTransportService::Run() https://crash-stats.mozilla.com/report/index/5acfd3bf-631b-46fd-b67b-731e42150407#allthreads, thread 9 https://crash-stats.mozilla.com/report/index/73b21efe-1098-4285-b57d-771342150402#allthreads, thread 8
Comment 1•9 years ago
|
||
hm. any ideas on what would cause this honza? The stacks don't actually show anything else holding that lock and its only ever used as an AutoLock. StaticMutex.h does say this though * The same caveats that apply to StaticAutoPtr apply to StaticMutex. In * particular, do not use StaticMutex as a stack variable or a class instance * variable, because this class relies on the fact that global variablies are * initialized to 0 in order to initialize mMutex. It is only safe to use * StaticMutex as a global or static variable. sLock is defined as a static member of a new'd object (a singleton that is lazy allocated). I wonder if the compiler trips up on that? It would be easy enough to make sLock be genuinely global and static.
Reporter | ||
Comment 2•9 years ago
|
||
I have seen happening this when the lock is reentered during a single stack, not necessarily during the one where the deadlock occurred. When you reenter the lock, it gets into a bad state that later leads to a deadlock.
Updated•9 years ago
|
Assignee: nobody → michal.novotny
Assignee | ||
Comment 3•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #1) > sLock is defined as a static member of a new'd object (a singleton that is > lazy allocated). I wonder if the compiler trips up on that? This shouldn't be a problem, see http://stackoverflow.com/questions/1421671/when-are-static-c-class-members-initialized Honza's theory might be right. Methods nsWSAdmissionManager::ConditionallyConnect(), nsWSAdmissionManager::OnConnected() and nsWSAdmissionManager::OnStopSession() can end up calling WebSocketChannel::BeginOpen() while holding the lock and if WebSocketChannel::BeginOpen() fails the lock is reentered in nsWSAdmissionManager::GetSessionCount(). I.e. the potential callstack is: nsWSAdmissionManager::GetSessionCount << lock WebSocketChannel::StopSession WebSocketChannel::AbortSession WebSocketChannel::BeginOpen FailDelayManager::DelayOrBegin FailDelayManager::ConnectNext nsWSAdmissionManager::OnStopSession << lock
Attachment #8605261 -
Flags: review?(mcmanus)
Comment 4•9 years ago
|
||
Comment on attachment 8605261 [details] [diff] [review] fix Review of attachment 8605261 [details] [diff] [review]: ----------------------------------------------------------------- nice analysis!
Attachment #8605261 -
Flags: review?(mcmanus) → review+
Comment 6•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/40155ce1f054
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment 7•9 years ago
|
||
Honza should this also get uplifted to aurora? Did it affect 39 at all? Thanks.
status-firefox39:
--- → ?
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 8•9 years ago
|
||
Same here. Dragana, can you request approvals?
Flags: needinfo?(honzab.moz) → needinfo?(dd.mozilla)
Comment 9•9 years ago
|
||
It's too late for 39 at this point unless we want to do an RC2. I still don't know if this affects 39. The fix could still make it into 40.
Updated•9 years ago
|
tracking-firefox40:
--- → +
tracking-firefox41:
--- → +
Comment 10•9 years ago
|
||
I will needinfo Michal it is his bug
Flags: needinfo?(dd.mozilla) → needinfo?(michal.novotny)
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Liz Henry (:lizzard) from comment #9) > It's too late for 39 at this point unless we want to do an RC2. I still > don't know if this affects 39. > The fix could still make it into 40. Yes, 39 should be affected. But this fix introduced a new problem that was fixed in 1170645, so we would need to uplift that too. Anyway, I didn't see any crash report on the crash-stats for this bug, so I'm not sure how critical this problem really is.
Flags: needinfo?(michal.novotny)
Comment 12•9 years ago
|
||
Michael, can we have the uplift request to aurora? thanks
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8605261 [details] [diff] [review] fix Approval Request Comment [Feature/regressing bug #]: don't know [User impact if declined]: potential deadlock [Describe test coverage new/current, TreeHerder]: there are websocket tests that should cover this code path [Risks and why]: low as long as this lands together with 1170645 which fixed another problem introduced by this patch [String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8605261 -
Flags: approval-mozilla-aurora?
Comment 14•9 years ago
|
||
Comment on attachment 8605261 [details] [diff] [review] fix Improve the stability, taking it. Taking also Bug 1170645
Attachment #8605261 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in
before you can comment on or make changes to this bug.
Description
•