Don't use WrapRunnable for Runnables with thread-specific release requirements

RESOLVED FIXED in Firefox 37

Status

()

defect
P1
critical
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: jesup, Assigned: jesup)

Tracking

({crash, intermittent-failure})

Trunk
mozilla37
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox30 wontfix, firefox31 wontfix, firefox32 wontfix, firefox33 wontfix, firefox34 wontfix, firefox35 wontfix, firefox36 wontfix, firefox37 fixed, firefox-esr24 unaffected, firefox-esr31 wontfix, b2g-v1.3 wontfix, b2g-v1.3T wontfix, b2g-v1.4 wontfix, b2g-v2.0 wontfix, b2g-v2.1 wontfix, b2g-v2.2 fixed)

Details

(Whiteboard: [webrtc-uplift])

Attachments

(1 attachment, 2 obsolete attachments)

+++ This bug was initially created as a clone of Bug #986762 +++

While researching while we're getting a hit or two a month on the above closed bug, I realized that the likely cause we a refcounting fubar in RUN_ON_THREAD()

Scenario:
Thread A:
create runnable holding reference to resource to be destroyed only on Thread B
RUN_ON_THREAD(thread_B, runnable);

Boom

The reason is that the runnable can complete before RUN_ON_THREAD does, and RUN_ON_THREAD puts the runnable in an nsCOMPtr<>, which it then passes to thread_B->Dispatch(runnable, ...).  That takes a *new* reference to the runnable, and the nsCOMPtr still holds one, which it releases later - and maybe after the runnable has completed on thread_B, as mentioned above.

If the runnable has a resource not meant to be released on Thread_A, you have problems.  DataChannelConnection::ReadBlob() is an example, and the reference to DataChannelConnection.

A much smaller issue is that if IsOnCurrentThread() fails (very unlikely), the NS_ENSURE_TRUE(rv, rv) will cause the runnable to be released on thread_A, which is not guaranteed safe.  In this case, we should just purposely leak the runnable to be safe.

The main fix is likely to do thread->Dispatch(runnable.forget()...)
Any nsIRunnable subclass has to be able to deal with the destructor running on either the sending or the receiving thread.

Comment 2

5 years ago
Yes, quite. Anything in the runnable destructor could happen on either thread. If you need to guarantee that it runs on the target thread, you need to do it in the Run() method.
s(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> Yes, quite. Anything in the runnable destructor could happen on either
> thread. If you need to guarantee that it runs on the target thread, you need
> to do it in the Run() method.

Aha.  Another footgun to trip over which is non-obvious.  I note that NS_ProxyRelease avoids this particular problem by (without comment!) making mDoomed a raw ptr, and explicitly releasing in Run().

It's probably not as surprising for explicit Runnable subclasses, since you typically had a local nsRefPtr<> to hold the runnable, and it survives past Dispatch() or NS_DispatchToMainThread() - they makes it at least knowable that it might get released on that thread, even if it's unlikely.  WrapRunnable(nsRefPtr<foo>(already_addrefed_object),..) would *seem* to avoid the problem, but the runnable can still die in this thread.

The crashes are safe in that we use a opt/release runtime assert (ASSERT_WEBRTC) in ~DataChannelConnection().  Unmarking sec.
Group: core-security
Depends on: 986762
Summary: RUN_ON_THREAD can release/delete runnables on the wrong (sending) thread → Don't use WrapRunnable for Runnables with thread-specific release requirements
Attachment #8446205 - Attachment is obsolete: true
Attachment #8446369 - Flags: review?(benjamin)

Comment 7

5 years ago
Comment on attachment 8446369 [details] [diff] [review]
use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert

I don't understand the constraints here, or why mConnection is a raw pointer.

Why can't it be a nsCOMPtr but we explicitly null it out in Run()?

The checks in RunOnThreadInternal for "IsOnCurrentThread" are very surprising in general. How is it that we don't know which thread we're dispatching events from? And also whether the target thread is still alive? Thread shutdown should almost always be synchronized externally.
Flags: needinfo?(rjesup)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> Comment on attachment 8446369 [details] [diff] [review]
> use explicit runnable and Release for ReadBlob to avoid wrong-thread release
> assert
> 
> I don't understand the constraints here, or why mConnection is a raw pointer.
> 
> Why can't it be a nsCOMPtr but we explicitly null it out in Run()?

One of the constraints of DataChannelConnection is that we have to destroy it on either STS or MainThread - it has a number of items it's responsible for, several of which are STS-only, and others rely on implicit used-on-mainthread locking to avoid tons of lock(mLock)'s to cover a very-edge case. 

I'll upload a cleaner patch, now that I'm fully awake.

> The checks in RunOnThreadInternal for "IsOnCurrentThread" are very
> surprising in general. How is it that we don't know which thread we're
> dispatching events from? And also whether the target thread is still alive?
> Thread shutdown should almost always be synchronized externally.

RUN_ON_THREAD() is used from all sorts of different threads; I think it could be simplified (and the API made less more in keeping with the rest of the system - especially the name) and the functionality of that and WrapRunnable merged into xpcom (and stop people including stuff from mtransport in a bunch of places).
Flags: needinfo?(rjesup)
I think this is more reasonable, now that I'm fully awake, and has cleaner ownership.
Attachment #8446369 - Attachment is obsolete: true
Attachment #8446369 - Flags: review?(benjamin)
Attachment #8446621 - Flags: review?(benjamin)
Comment on attachment 8446621 [details] [diff] [review]
use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert

>diff --git a/netwerk/sctp/datachannel/DataChannel.cpp b/netwerk/sctp/datachannel/DataChannel.cpp

>+class DataChannelBlobSendRunnable : public nsRunnable
>+{
>+public:
>+  DataChannelBlobSendRunnable(already_AddRefed<DataChannelConnection>& aConnection,
>+                              uint16_t aStream)
>+    : mConnection(aConnection)
>+    , mStream(aStream) {}
>+
>+  ~DataChannelBlobSendRunnable()
>+  {
>+    if (!NS_IsMainThread()) {
>+      // explicitly leak the connection if destroyed off mainthread
>+      unused << mConnection.forget().take();

This should MOZ_ASSERT, since there is no good condition where this should happen. Keeping the release-mode leak is ok for safety.

>+    }
>+  }
>+
>+  NS_IMETHODIMP Run()
>+  {
>+    ASSERT_WEBRTC(NS_IsMainThread());
>+
>+    mConnection->SendBinaryMsg(mStream, mData);

If mConnection must always be released on this thread, you should null it out here rather than relying on destructor ordering.

r=me with those changes
Attachment #8446621 - Flags: review?(benjamin) → review+

Updated

5 years ago
blocking-b2g: --- → 2.2?
Priority: -- → P1
FYI: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=70f9be304cbd (with nits addressed).

This was in a patch queue for OpenH264, and after that landed I missed the r+ for this (I've since added filters to make review results pop to the top of my bugmail).
Sorry Randell, this was trapped between Seth's bustage.
Backed out in
https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3cc2c90893
Relanded this in https://hg.mozilla.org/integration/mozilla-inbound/rev/238f460e7aed since it wasn't the cause of the bustage.
https://hg.mozilla.org/mozilla-central/rev/238f460e7aed
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Not sure if this'll meet the bar for b2g32 approval at this point, but might as well nominate it along with beta and b2g34. (It's not going to get approved for Aurora in time for today's uplift, so might as well go straight to a beta request).
Flags: needinfo?(rjesup)
Yeah.  It's a timing issue, but not common.  And it's a safe crash in release builds.
Flags: needinfo?(rjesup)
Clearing the blocking nom for 2.2? as this is already fixed/verified on that branch per the status flag
blocking-b2g: 2.2? → ---
You need to log in before you can comment on or make changes to this bug.