Closed Bug 1030372 Opened 11 years ago Closed 10 years ago

Don't use WrapRunnable for Runnables with thread-specific release requirements

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla37

Tracking Flags:

Tracking

Status

firefox30

---

wontfix

firefox31

---

wontfix

firefox32

---

wontfix

firefox33

---

wontfix

firefox34

---

wontfix

firefox35

---

wontfix

firefox36

---

wontfix

firefox37

---

fixed

firefox-esr24

---

unaffected

firefox-esr31

---

wontfix

b2g-v1.3

---

wontfix

b2g-v1.3T

---

wontfix

b2g-v1.4

---

wontfix

b2g-v2.0

---

wontfix

b2g-v2.1

---

wontfix

b2g-v2.2

---

fixed

Tracking

Status

firefox30

firefox31

firefox32

firefox33

firefox34

firefox35

firefox36

firefox37

firefox-esr24

firefox-esr31

b2g-v1.3

b2g-v1.3T

b2g-v1.4

b2g-v2.0

b2g-v2.1

b2g-v2.2

relnote-firefox

thunderbird_esr115

thunderbird_esr140

firefox-esr115

firefox-esr140

firefox144

firefox145

firefox146

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [webrtc-uplift])

Attachments

(1 file, 2 obsolete files)

don't hold a reference to a runnable in RUN_ON_THREAD after Dispatch() 11 years ago Randell Jesup [:jesup] (needinfo me) 1.39 KB, patch		Details \| Diff \| Splinter Review
use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert 11 years ago Randell Jesup [:jesup] (needinfo me) 5.16 KB, patch		Details \| Diff \| Splinter Review
use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert 11 years ago Randell Jesup [:jesup] (needinfo me) 5.91 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review

Randell Jesup [:jesup] (needinfo me)

Assignee

Description

•

11 years ago

+++ This bug was initially created as a clone of Bug #986762 +++ While researching while we're getting a hit or two a month on the above closed bug, I realized that the likely cause we a refcounting fubar in RUN_ON_THREAD() Scenario: Thread A: create runnable holding reference to resource to be destroyed only on Thread B RUN_ON_THREAD(thread_B, runnable); Boom The reason is that the runnable can complete before RUN_ON_THREAD does, and RUN_ON_THREAD puts the runnable in an nsCOMPtr<>, which it then passes to thread_B->Dispatch(runnable, ...). That takes a *new* reference to the runnable, and the nsCOMPtr still holds one, which it releases later - and maybe after the runnable has completed on thread_B, as mentioned above. If the runnable has a resource not meant to be released on Thread_A, you have problems. DataChannelConnection::ReadBlob() is an example, and the reference to DataChannelConnection. A much smaller issue is that if IsOnCurrentThread() fails (very unlikely), the NS_ENSURE_TRUE(rv, rv) will cause the runnable to be released on thread_A, which is not guaranteed safe. In this case, we should just purposely leak the runnable to be safe. The main fix is likely to do thread->Dispatch(runnable.forget()...)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 1

•

11 years ago

Any nsIRunnable subclass has to be able to deal with the destructor running on either the sending or the receiving thread.

Benjamin Smedberg

Comment 2

•

11 years ago

Yes, quite. Anything in the runnable destructor could happen on either thread. If you need to guarantee that it runs on the target thread, you need to do it in the Run() method.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 3

•

11 years ago

Attached patch don't hold a reference to a runnable in RUN_ON_THREAD after Dispatch() (obsolete) — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 4

•

11 years ago

s(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > Yes, quite. Anything in the runnable destructor could happen on either > thread. If you need to guarantee that it runs on the target thread, you need > to do it in the Run() method. Aha. Another footgun to trip over which is non-obvious. I note that NS_ProxyRelease avoids this particular problem by (without comment!) making mDoomed a raw ptr, and explicitly releasing in Run(). It's probably not as surprising for explicit Runnable subclasses, since you typically had a local nsRefPtr<> to hold the runnable, and it survives past Dispatch() or NS_DispatchToMainThread() - they makes it at least knowable that it might get released on that thread, even if it's unlikely. WrapRunnable(nsRefPtr<foo>(already_addrefed_object),..) would *seem* to avoid the problem, but the runnable can still die in this thread. The crashes are safe in that we use a opt/release runtime assert (ASSERT_WEBRTC) in ~DataChannelConnection(). Unmarking sec.

Group: core-security

Depends on: 986762

Summary: RUN_ON_THREAD can release/delete runnables on the wrong (sending) thread → Don't use WrapRunnable for Runnables with thread-specific release requirements

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 5

•

11 years ago

Attached patch use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert (obsolete) — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Attachment #8446205 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Attachment #8446369 - Flags: review?(benjamin)

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

status-b2g-v1.3: --- → affected

status-b2g-v1.3T: --- → affected

status-b2g-v1.4: --- → affected

status-b2g-v2.0: --- → affected

status-firefox30: --- → wontfix

status-firefox31: --- → affected

status-firefox32: --- → affected

status-firefox33: --- → affected

status-firefox-esr24: --- → unaffected

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 6

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=043b96374033

Benjamin Smedberg

Comment 7

•

11 years ago

Comment on attachment 8446369 [details] [diff] [review] use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert I don't understand the constraints here, or why mConnection is a raw pointer. Why can't it be a nsCOMPtr but we explicitly null it out in Run()? The checks in RunOnThreadInternal for "IsOnCurrentThread" are very surprising in general. How is it that we don't know which thread we're dispatching events from? And also whether the target thread is still alive? Thread shutdown should almost always be synchronized externally.

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 8

•

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Comment on attachment 8446369 [details] [diff] [review] > use explicit runnable and Release for ReadBlob to avoid wrong-thread release > assert > > I don't understand the constraints here, or why mConnection is a raw pointer. > > Why can't it be a nsCOMPtr but we explicitly null it out in Run()? One of the constraints of DataChannelConnection is that we have to destroy it on either STS or MainThread - it has a number of items it's responsible for, several of which are STS-only, and others rely on implicit used-on-mainthread locking to avoid tons of lock(mLock)'s to cover a very-edge case. I'll upload a cleaner patch, now that I'm fully awake. > The checks in RunOnThreadInternal for "IsOnCurrentThread" are very > surprising in general. How is it that we don't know which thread we're > dispatching events from? And also whether the target thread is still alive? > Thread shutdown should almost always be synchronized externally. RUN_ON_THREAD() is used from all sorts of different threads; I think it could be simplified (and the API made less more in keeping with the rest of the system - especially the name) and the functionality of that and WrapRunnable merged into xpcom (and stop people including stuff from mtransport in a bunch of places).

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 9

•

11 years ago

Attached patch use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert — Details — Splinter Review

I think this is more reasonable, now that I'm fully awake, and has cleaner ownership.

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Attachment #8446369 - Attachment is obsolete: true

Attachment #8446369 - Flags: review?(benjamin)

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Attachment #8446621 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 10

•

11 years ago

Comment on attachment 8446621 [details] [diff] [review] use explicit runnable and Release for ReadBlob to avoid wrong-thread release assert >diff --git a/netwerk/sctp/datachannel/DataChannel.cpp b/netwerk/sctp/datachannel/DataChannel.cpp >+class DataChannelBlobSendRunnable : public nsRunnable >+{ >+public: >+ DataChannelBlobSendRunnable(already_AddRefed<DataChannelConnection>& aConnection, >+ uint16_t aStream) >+ : mConnection(aConnection) >+ , mStream(aStream) {} >+ >+ ~DataChannelBlobSendRunnable() >+ { >+ if (!NS_IsMainThread()) { >+ // explicitly leak the connection if destroyed off mainthread >+ unused << mConnection.forget().take(); This should MOZ_ASSERT, since there is no good condition where this should happen. Keeping the release-mode leak is ok for safety. >+ } >+ } >+ >+ NS_IMETHODIMP Run() >+ { >+ ASSERT_WEBRTC(NS_IsMainThread()); >+ >+ mConnection->SendBinaryMsg(mStream, mData); If mConnection must always be released on this thread, you should null it out here rather than relying on destructor ordering. r=me with those changes

Attachment #8446621 - Flags: review?(benjamin) → review+

Bobby Chien

Updated

•

11 years ago

blocking-b2g: --- → 2.2?

Priority: -- → P1

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 11

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7bd61e25e18f

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 12

•

10 years ago

FYI: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=70f9be304cbd (with nits addressed). This was in a patch queue for OpenH264, and after that landed I missed the r+ for this (I've since added filters to make review results pop to the top of my bugmail).

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

10 years ago

status-firefox31: affected → wontfix

status-firefox32: affected → wontfix

status-firefox33: affected → wontfix

status-firefox34: --- → wontfix

status-firefox35: --- → wontfix

status-firefox36: --- → affected

status-firefox37: --- → affected

status-firefox-esr31: --- → wontfix

Tom S. (please needinfo tschuster)

Comment 13

•

10 years ago

Sorry Randell, this was trapped between Seth's bustage. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3cc2c90893

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 14

•

10 years ago

Relanded this in https://hg.mozilla.org/integration/mozilla-inbound/rev/238f460e7aed since it wasn't the cause of the bustage.

Carsten Book [:Tomcat]

Comment 15

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/238f460e7aed

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla37

Ryan VanderMeulen [:RyanVM]

Comment 16

•

10 years ago

Not sure if this'll meet the bar for b2g32 approval at this point, but might as well nominate it along with beta and b2g34. (It's not going to get approved for Aurora in time for today's uplift, so might as well go straight to a beta request).

status-b2g-v1.3: affected → wontfix

status-b2g-v1.3T: affected → wontfix

status-b2g-v1.4: affected → wontfix

status-b2g-v2.1: --- → affected

status-b2g-v2.2: --- → fixed

status-firefox37: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 17

•

10 years ago

Yeah. It's a timing issue, but not common. And it's a safe crash in release builds.

Flags: needinfo?(rjesup)

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

status-b2g-v2.0: affected → wontfix

status-b2g-v2.1: affected → wontfix

status-firefox36: affected → wontfix

bhavana bajaj [:bajaj]

Comment 18

•

10 years ago

Clearing the blocking nom for 2.2? as this is already fixed/verified on that branch per the status flag

blocking-b2g: 2.2? → ---

You need to log in before you can comment on or make changes to this bug.