Closed Bug 1030372 Opened 11 years ago Closed 10 years ago

Don't use WrapRunnable for Runnables with thread-specific release requirements

Categories

(Core :: WebRTC, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox30 --- wontfix
firefox31 --- wontfix
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 --- wontfix
firefox37 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- wontfix
b2g-v1.3 --- wontfix
b2g-v1.3T --- wontfix
b2g-v1.4 --- wontfix
b2g-v2.0 --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.2 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [webrtc-uplift])

Attachments

(1 file, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #986762 +++ While researching while we're getting a hit or two a month on the above closed bug, I realized that the likely cause we a refcounting fubar in RUN_ON_THREAD() Scenario: Thread A: create runnable holding reference to resource to be destroyed only on Thread B RUN_ON_THREAD(thread_B, runnable); Boom The reason is that the runnable can complete before RUN_ON_THREAD does, and RUN_ON_THREAD puts the runnable in an nsCOMPtr<>, which it then passes to thread_B->Dispatch(runnable, ...). That takes a *new* reference to the runnable, and the nsCOMPtr still holds one, which it releases later - and maybe after the runnable has completed on thread_B, as mentioned above. If the runnable has a resource not meant to be released on Thread_A, you have problems. DataChannelConnection::ReadBlob() is an example, and the reference to DataChannelConnection. A much smaller issue is that if IsOnCurrentThread() fails (very unlikely), the NS_ENSURE_TRUE(rv, rv) will cause the runnable to be released on thread_A, which is not guaranteed safe. In this case, we should just purposely leak the runnable to be safe. The main fix is likely to do thread->Dispatch(runnable.forget()...)
Any nsIRunnable subclass has to be able to deal with the destructor running on either the sending or the receiving thread.
Yes, quite. Anything in the runnable destructor could happen on either thread. If you need to guarantee that it runs on the target thread, you need to do it in the Run() method.
s(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > Yes, quite. Anything in the runnable destructor could happen on either > thread. If you need to guarantee that it runs on the target thread, you need > to do it in the Run() method. Aha. Another footgun to trip over which is non-obvious. I note that NS_ProxyRelease avoids this particular problem by (without comment!) making mDoomed a raw ptr, and explicitly releasing in Run(). It's probably not as surprising for explicit Runnable subclasses, since you typically had a local nsRefPtr<> to hold the runnable, and it survives past Dispatch() or NS_DispatchToMainThread() - they makes it at least knowable that it might get released on that thread, even if it's unlikely. WrapRunnable(nsRefPtr<foo>(already_addrefed_object),..) would *seem* to avoid the problem, but the runnable can still die in this thread. The crashes are safe in that we use a opt/release runtime assert (ASSERT_WEBRTC) in ~DataChannelConnection(). Unmarking sec.
Group: core-security
Depends on: 986762
Summary: RUN_ON_THREAD can release/delete runnables on the wrong (sending) thread → Don't use WrapRunnable for Runnables with thread-specific release requirements
Attachment #8446205 - Attachment is obsolete: true
Attachment #8446369 - Flags: review?(benjamin)
Comment on attachment 8446369 [details] [diff] [review] use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert I don't understand the constraints here, or why mConnection is a raw pointer. Why can't it be a nsCOMPtr but we explicitly null it out in Run()? The checks in RunOnThreadInternal for "IsOnCurrentThread" are very surprising in general. How is it that we don't know which thread we're dispatching events from? And also whether the target thread is still alive? Thread shutdown should almost always be synchronized externally.
Flags: needinfo?(rjesup)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Comment on attachment 8446369 [details] [diff] [review] > use explicit runnable and Release for ReadBlob to avoid wrong-thread release > assert > > I don't understand the constraints here, or why mConnection is a raw pointer. > > Why can't it be a nsCOMPtr but we explicitly null it out in Run()? One of the constraints of DataChannelConnection is that we have to destroy it on either STS or MainThread - it has a number of items it's responsible for, several of which are STS-only, and others rely on implicit used-on-mainthread locking to avoid tons of lock(mLock)'s to cover a very-edge case. I'll upload a cleaner patch, now that I'm fully awake. > The checks in RunOnThreadInternal for "IsOnCurrentThread" are very > surprising in general. How is it that we don't know which thread we're > dispatching events from? And also whether the target thread is still alive? > Thread shutdown should almost always be synchronized externally. RUN_ON_THREAD() is used from all sorts of different threads; I think it could be simplified (and the API made less more in keeping with the rest of the system - especially the name) and the functionality of that and WrapRunnable merged into xpcom (and stop people including stuff from mtransport in a bunch of places).
Flags: needinfo?(rjesup)
I think this is more reasonable, now that I'm fully awake, and has cleaner ownership.
Attachment #8446369 - Attachment is obsolete: true
Attachment #8446369 - Flags: review?(benjamin)
Attachment #8446621 - Flags: review?(benjamin)
Comment on attachment 8446621 [details] [diff] [review] use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert >diff --git a/netwerk/sctp/datachannel/DataChannel.cpp b/netwerk/sctp/datachannel/DataChannel.cpp >+class DataChannelBlobSendRunnable : public nsRunnable >+{ >+public: >+ DataChannelBlobSendRunnable(already_AddRefed<DataChannelConnection>& aConnection, >+ uint16_t aStream) >+ : mConnection(aConnection) >+ , mStream(aStream) {} >+ >+ ~DataChannelBlobSendRunnable() >+ { >+ if (!NS_IsMainThread()) { >+ // explicitly leak the connection if destroyed off mainthread >+ unused << mConnection.forget().take(); This should MOZ_ASSERT, since there is no good condition where this should happen. Keeping the release-mode leak is ok for safety. >+ } >+ } >+ >+ NS_IMETHODIMP Run() >+ { >+ ASSERT_WEBRTC(NS_IsMainThread()); >+ >+ mConnection->SendBinaryMsg(mStream, mData); If mConnection must always be released on this thread, you should null it out here rather than relying on destructor ordering. r=me with those changes
Attachment #8446621 - Flags: review?(benjamin) → review+
blocking-b2g: --- → 2.2?
Priority: -- → P1
FYI: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=70f9be304cbd (with nits addressed). This was in a patch queue for OpenH264, and after that landed I missed the r+ for this (I've since added filters to make review results pop to the top of my bugmail).
Sorry Randell, this was trapped between Seth's bustage. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3cc2c90893
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Not sure if this'll meet the bar for b2g32 approval at this point, but might as well nominate it along with beta and b2g34. (It's not going to get approved for Aurora in time for today's uplift, so might as well go straight to a beta request).
Flags: needinfo?(rjesup)
Yeah. It's a timing issue, but not common. And it's a safe crash in release builds.
Flags: needinfo?(rjesup)
Clearing the blocking nom for 2.2? as this is already fixed/verified on that branch per the status flag
blocking-b2g: 2.2? → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: