Closed Bug 1358889 Opened 3 years ago Closed 3 years ago

Crash in mozilla::SingletonThreadHolder::ReleaseUse_i

Categories

(Core :: WebRTC: Networking, defect, P1, critical)

54 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: philipp, Assigned: drno)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-b80e48d0-6b48-4577-a388-892f60170423.
=============================================================
Crashing Thread (10)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::SingletonThreadHolder::ReleaseUse_i() 	media/mtransport/nr_socket_prsock.cpp:250
1 	xul.dll 	mozilla::runnable_args_memfn<RefPtr<mozilla::SingletonThreadHolder>, void ( mozilla::SingletonThreadHolder::*)(void)>::Run() 	media/mtransport/runnable_utils.h:170
2 	xul.dll 	mozilla::detail::RunOnThreadInternal 	media/mtransport/runnable_utils.h:50
3 	xul.dll 	mozilla::NrUdpSocketIpc::~NrUdpSocketIpc() 	media/mtransport/nr_socket_prsock.cpp:1157
4 	xul.dll 	mozilla::NrUdpSocketIpc::`scalar deleting destructor'(unsigned int) 	
5 	xul.dll 	mozilla::NrUdpSocketIpc::Release() 	media/mtransport/nr_socket_prsock.h:240
6 	xul.dll 	mozilla::DispatchedRelease<mozilla::NrUdpSocketIpc>::Run() 	media/mtransport/runnable_utils.h:244
7 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1264
8 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:389
9 	xul.dll 	mozilla::net::nsSocketTransportService::Run() 	netwerk/base/nsSocketTransportService2.cpp:945
10 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1264
11 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:389
12 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
13 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
14 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
15 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:495
16 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
17 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
18 	ucrtbase.dll 	_o__CIpow 	
19 	kernel32.dll 	BaseThreadInitThunk 	
20 	ntdll.dll 	__RtlUserThreadStart 	
21 	ntdll.dll 	_RtlUserThreadStart

crash reports with this signature are regressing since firefox 54 and later. they are occurring on 32bit/64bit builds of the browser on various versions of windows so far and always in the content process.
It looks like accessing nullptr. Randy, could you take a look?

https://hg.mozilla.org/releases/mozilla-beta/annotate/cf76e00dcd6f/media/mtransport/nr_socket_prsock.cpp#l250
Flags: needinfo?(rjesup)
Component: WebRTC → WebRTC: Networking
Rank: 15
Priority: -- → P1
Appears to be happening only since 54. As bug 1341374 landed in 54 it seems likely that this is related to that change.
Looks like the additional dispatching to the right threads introduced by bug 1341374 result in the singleton destructor 
http://searchfox.org/mozilla-central/source/media/mtransport/nr_socket_prsock.cpp#186
being run before ReleaseUse_i() and nulling |mThread|.
@Randell: the patch would be option #1 which is simply "if the thread has been shut down already there is no point in trying to do rev-counting any more". The other option might be to try to get the destructor running only after all de-ref-counts are done. But I don't know how to accomplish that. The only option would potentially be to dispatch ClearSingletonOnShutdown() call over to the mThread as well and "hope" that it gets executed after all pending ReleaseUse_i() calls?
I think the "right" solution is to convert all of this to the (newer) SharedThreadPool and TaskQueue implementations.  Each NrSocketIpc can have it's own TaskQueue.  Default for SharedThreadPool is max 4 threads, but you can limit it (to 1 for example  for audio in MediaPipeline.cpp).  The TaskQueues enforce virtual single-threadedness for each NrSocketIpc, so likely there's no need to limit it to 1 thread for all peerconnections.  4 is probably fine (and in almost all cases, it'll really be 1 max).

for now... I don't love this, but it's better than crashing.  It'll probably leak (which is ok in this instance), and there's probably a TSAN violation here in theory.
Flags: needinfo?(rjesup)
Comment on attachment 8861219 [details]
Bug 1358889: check if mThread is around before dereferencing.

https://reviewboard.mozilla.org/r/133188/#review136030
Attachment #8861219 - Flags: review?(rjesup) → review+
See Also: → 1359623
(In reply to Randell Jesup [:jesup] from comment #6)
> I think the "right" solution is to convert all of this to the (newer)
> SharedThreadPool and TaskQueue implementations.

Created bug 1359623 for that.
Assignee: nobody → drno
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/498d5f1c4aea
check if mThread is around before dereferencing. r=jesup
https://hg.mozilla.org/mozilla-central/rev/498d5f1c4aea
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Please request Beta approval on this when you get a chance.
Flags: needinfo?(drno)
Comment on attachment 8861219 [details]
Bug 1358889: check if mThread is around before dereferencing.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1341374
[User impact if declined]: Fx can crash in WebRTC calls if e10s is on.
[Is this code covered by automated tests?]: In general the code is covered by lots of mochitests, but apparently none of them have hit this particular shutdown problem.
[Has the fix been verified in Nightly?]: No none steps to repro. Too short time to have reliable numbers from crash stats either.
[Needs manual test from QE? If yes, steps to reproduce]: No, see above.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: The patch only adds an extra reference check to prevent null pointer calls being made.
[String changes made/needed]: N/A
Flags: needinfo?(drno)
Attachment #8861219 - Flags: approval-mozilla-beta?
Comment on attachment 8861219 [details]
Bug 1358889: check if mThread is around before dereferencing.

Fix a crash. Beta54+. Should be in 54 beta 4.
Attachment #8861219 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Nils Ohlmeier [:drno] from comment #12)
> [Feature/Bug causing the regression]: Bug 1341374
> [User impact if declined]: Fx can crash in WebRTC calls if e10s is on.
> [Is this code covered by automated tests?]: In general the code is covered
> by lots of mochitests, but apparently none of them have hit this particular
> shutdown problem.
> [Has the fix been verified in Nightly?]: No none steps to repro. Too short
> time to have reliable numbers from crash stats either.
> [Needs manual test from QE? If yes, steps to reproduce]: No, see above.

Setting qe-verify- based on Nils' assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.