Closed Bug 1358889 Opened 3 years ago Closed 3 years ago
Crash in mozilla::Singleton
Thread Holder::Release Use _i
59 bytes, text/x-review-board-request
This bug was filed from the Socorro interface and is report bp-b80e48d0-6b48-4577-a388-892f60170423. ============================================================= Crashing Thread (10) Frame Module Signature Source 0 xul.dll mozilla::SingletonThreadHolder::ReleaseUse_i() media/mtransport/nr_socket_prsock.cpp:250 1 xul.dll mozilla::runnable_args_memfn<RefPtr<mozilla::SingletonThreadHolder>, void ( mozilla::SingletonThreadHolder::*)(void)>::Run() media/mtransport/runnable_utils.h:170 2 xul.dll mozilla::detail::RunOnThreadInternal media/mtransport/runnable_utils.h:50 3 xul.dll mozilla::NrUdpSocketIpc::~NrUdpSocketIpc() media/mtransport/nr_socket_prsock.cpp:1157 4 xul.dll mozilla::NrUdpSocketIpc::`scalar deleting destructor'(unsigned int) 5 xul.dll mozilla::NrUdpSocketIpc::Release() media/mtransport/nr_socket_prsock.h:240 6 xul.dll mozilla::DispatchedRelease<mozilla::NrUdpSocketIpc>::Run() media/mtransport/runnable_utils.h:244 7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1264 8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:389 9 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:945 10 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1264 11 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:389 12 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 13 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 14 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 15 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:495 16 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 17 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 18 ucrtbase.dll _o__CIpow 19 kernel32.dll BaseThreadInitThunk 20 ntdll.dll __RtlUserThreadStart 21 ntdll.dll _RtlUserThreadStart crash reports with this signature are regressing since firefox 54 and later. they are occurring on 32bit/64bit builds of the browser on various versions of windows so far and always in the content process.
It looks like accessing nullptr. Randy, could you take a look? https://hg.mozilla.org/releases/mozilla-beta/annotate/cf76e00dcd6f/media/mtransport/nr_socket_prsock.cpp#l250
Appears to be happening only since 54. As bug 1341374 landed in 54 it seems likely that this is related to that change.
Looks like the additional dispatching to the right threads introduced by bug 1341374 result in the singleton destructor http://searchfox.org/mozilla-central/source/media/mtransport/nr_socket_prsock.cpp#186 being run before ReleaseUse_i() and nulling |mThread|.
@Randell: the patch would be option #1 which is simply "if the thread has been shut down already there is no point in trying to do rev-counting any more". The other option might be to try to get the destructor running only after all de-ref-counts are done. But I don't know how to accomplish that. The only option would potentially be to dispatch ClearSingletonOnShutdown() call over to the mThread as well and "hope" that it gets executed after all pending ReleaseUse_i() calls?
I think the "right" solution is to convert all of this to the (newer) SharedThreadPool and TaskQueue implementations. Each NrSocketIpc can have it's own TaskQueue. Default for SharedThreadPool is max 4 threads, but you can limit it (to 1 for example for audio in MediaPipeline.cpp). The TaskQueues enforce virtual single-threadedness for each NrSocketIpc, so likely there's no need to limit it to 1 thread for all peerconnections. 4 is probably fine (and in almost all cases, it'll really be 1 max). for now... I don't love this, but it's better than crashing. It'll probably leak (which is ok in this instance), and there's probably a TSAN violation here in theory.
Comment on attachment 8861219 [details] Bug 1358889: check if mThread is around before dereferencing. https://reviewboard.mozilla.org/r/133188/#review136030
Attachment #8861219 - Flags: review?(rjesup) → review+
(In reply to Randell Jesup [:jesup] from comment #6) > I think the "right" solution is to convert all of this to the (newer) > SharedThreadPool and TaskQueue implementations. Created bug 1359623 for that.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/498d5f1c4aea check if mThread is around before dereferencing. r=jesup
Please request Beta approval on this when you get a chance.
Comment on attachment 8861219 [details] Bug 1358889: check if mThread is around before dereferencing. Approval Request Comment [Feature/Bug causing the regression]: Bug 1341374 [User impact if declined]: Fx can crash in WebRTC calls if e10s is on. [Is this code covered by automated tests?]: In general the code is covered by lots of mochitests, but apparently none of them have hit this particular shutdown problem. [Has the fix been verified in Nightly?]: No none steps to repro. Too short time to have reliable numbers from crash stats either. [Needs manual test from QE? If yes, steps to reproduce]: No, see above. [List of other uplifts needed for the feature/fix]: N/A [Is the change risky?]: No [Why is the change risky/not risky?]: The patch only adds an extra reference check to prevent null pointer calls being made. [String changes made/needed]: N/A
Attachment #8861219 - Flags: approval-mozilla-beta?
Comment on attachment 8861219 [details] Bug 1358889: check if mThread is around before dereferencing. Fix a crash. Beta54+. Should be in 54 beta 4.
Attachment #8861219 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Nils Ohlmeier [:drno] from comment #12) > [Feature/Bug causing the regression]: Bug 1341374 > [User impact if declined]: Fx can crash in WebRTC calls if e10s is on. > [Is this code covered by automated tests?]: In general the code is covered > by lots of mochitests, but apparently none of them have hit this particular > shutdown problem. > [Has the fix been verified in Nightly?]: No none steps to repro. Too short > time to have reliable numbers from crash stats either. > [Needs manual test from QE? If yes, steps to reproduce]: No, see above. Setting qe-verify- based on Nils' assessment on manual testing needs.
You need to log in before you can comment on or make changes to this bug.