830100 - Allow destruction of TransportFlows off the STS thread

Assignee

Description

•

13 years ago

Currently it is unsafe to Release() a TFlow or TLayer off the STS thread because if you are the last holder, you will destroy it and the dtor is not thread-safe. This needs some kind of fixing.

Randell Jesup [:jesup] (needinfo me)

Comment 1

•

13 years ago

When a fix for this lands, we'll need to back out fixes like bug 827878

Blocks: 827878

Eric Rescorla (:ekr)

Assignee

Updated

•

13 years ago

Summary: All destruction of TransportFlows off the STS thread → Allow destruction of TransportFlows off the STS thread

Randell Jesup [:jesup] (needinfo me)

Comment 2

•

13 years ago

Nice to clean up, but wouldn't block going to release

Assignee: nobody → ekr

Priority: -- → P3

Whiteboard: [webrtc][blocking-webrtc-]

Maire Reavy [:mreavy]

Updated

•

13 years ago

Whiteboard: [webrtc][blocking-webrtc-] → [WebRTC][blocking-webrtc+]

Randell Jesup [:jesup] (needinfo me)

Updated

•

13 years ago

Blocks: 842171

Eric Rescorla (:ekr)

Assignee

Comment 3

•

13 years ago

Attached patch Refactor transport flow to allow destruction on any thread (obsolete) — Details — Splinter Review

Eric Rescorla (:ekr)

Assignee

Comment 4

•

13 years ago

Comment on attachment 715440 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 715440 [details] [diff] [review]: ----------------------------------------------------------------- Tim, Adam, This patch is a compromise between the right thing (complete thread safety) and the current thing (no thread safety). It also adds a bunch of asserts to try to detect API misuse. Aside from the usual code review, I would be interested in your opinion of whether this hits the right balance.

Attachment #715440 - Flags: review?(tterribe)

Attachment #715440 - Flags: review?(adam)

Adam Roach [:abr]

Comment 5

•

13 years ago

Comment on attachment 715440 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 715440 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me. Nits inline. ::: media/mtransport/test/transport_unittests.cpp @@ +193,5 @@ > > flow_->SignalPacketReceived.connect(this, &TransportTestPeer::PacketReceived); > } > > + void ConnectSocket(TransportTestPeer *peer) { Now that this is just a thunk to call ConnectSocket_s on the proper thread, I think you want to change the "Dispatch" calls in TransportTest::ConnectSocket and TransportTest::ConnectSocketExpectFail to straight function calls. @@ +323,5 @@ > } > > + TransportLayer::State state() { > + TransportLayer::State tstate; > + WS @@ +327,5 @@ > + > + RUN_ON_THREAD(test_utils->sts_target(), > + WrapRunnableRet(flow_, &TransportFlow::state, &tstate), > + NS_DISPATCH_SYNC); > + WS ::: media/mtransport/transportflow.cpp @@ +21,5 @@ > + if (!CheckThreadInt()) { > + MOZ_ASSERT(SignalStateChange.is_empty()); > + MOZ_ASSERT(SignalPacketReceived.is_empty()); > + } > + if (!layers_.empty()) { Perhaps add a comment that the use of layers_ here is unprotected, and why that's generally going to be okay (either that or be stronger in the header file comment about calling the destructor off the target thread not being safe except in the narrow exceptional case it describes). @@ +46,5 @@ > + // Enforce that if any of the layers have a thread binding, > + // they all have the same binding. > + if (target_) { > + const nsCOMPtr<nsIEventTarget>& lthread = layer->GetThread(); > + WS ::: media/mtransport/transportflow.h @@ +22,5 @@ > // Generally, one reads and writes to the top layer. > + > +// This code has a confusing hybrid threading model which > +// probably needs some eventual refactoring. > +// Please add a bug # for the refactoring effort. @@ +109,5 @@ > void StateChange(TransportLayer *layer, TransportLayer::State state); > void PacketReceived(TransportLayer* layer, const unsigned char *data, > size_t len); > + static void DestroyLayers(std::deque<TransportLayer *> layers); > + WS ::: media/mtransport/transportlayer.h @@ +84,5 @@ > > + // Check if we are on the right thread > + void CheckThread() { > + NS_ABORT_IF_FALSE(CheckThreadInt(), "Wrong thread"); > + } Why did this move to public? I can't find where it's used... @@ +85,5 @@ > + // Check if we are on the right thread > + void CheckThread() { > + NS_ABORT_IF_FALSE(CheckThreadInt(), "Wrong thread"); > + } > + WS ::: media/mtransport/transportlayerloopback.h @@ -129,5 @@ > nsresult QueuePacket(const unsigned char *data, size_t len); > > TransportLayerLoopback* peer_; > nsCOMPtr<nsITimer> timer_; > - nsCOMPtr<nsIEventTarget> target_; I presume this is just simple cleanup, unrelated to the bug -- right?

Attachment #715440 - Flags: review?(adam) → review+

Adam Roach [:abr]

Comment 6

•

13 years ago

Right, so on the broader opinion that you solicited regarding the approach: I think this should be fine until we get around to a more comprehensive refactor, although (as I indicated in my review), I think I'd prefer to see a little more emphatic documentation around the compromise of allowing off-thread access to the layers deque in the flow destructor.

Timothy B. Terriberry (:derf)

Comment 7

•

13 years ago

Comment on attachment 715440 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 715440 [details] [diff] [review]: ----------------------------------------------------------------- This is not too terrible (I say without really understanding all the places a TransportFlow can be derefed), but it all seems a little more complicated than it needs to be. AFAIK the only thread that any flow can be bound to is the STS thread, and baking that assumption in would allow you to get rid of all the target_ tracking code. ::: media/mtransport/transportflow.cpp @@ +23,5 @@ > + MOZ_ASSERT(SignalPacketReceived.is_empty()); > + } > + if (!layers_.empty()) { > + RUN_ON_THREAD(target_, > + WrapRunnableNM(&TransportFlow::DestroyLayers,layers_), So, this copies layers_ by value... multiple times... which is actually a fairly expensive operation. In libstdc++ this involves 8 0.5 kB allocations at a minimum, in addition to copying the whole list. I guess I'm confused why you're using a deque in the first place... you only ever insert items on the front, and never remove them. The only other operation supported is searching for a layer with a specific id. You could simply provide a getter for TransportLayer::downward_ and remove the need for the deque entirely. ::: media/mtransport/transportflow.h @@ +41,5 @@ > +// restriction by thread-locking the signals, but previous > +// attempts have caused deadlocks. > +// > +// Most of these invariants are enforced by hard asserts > +// (i.e., those which fire even in production builds). AFAICT, only the MOZ_CRASH in PushLayer is such a "hard assert". NS_ABORT_IF_FALSE does not abort in debug builds, and neither does MOZ_ASSERT. ::: media/mtransport/transportlayer.h @@ +102,5 @@ > // Return the layer id for this layer > virtual const std::string id() = 0; > > // The id of the flow > + virtual const std::string& flow_id() { This doesn't need to be virtual, does it? Presumably a sub-class would just set flow_id instead of overriding.

Attachment #715440 - Flags: review?(tterribe) → review+

Eric Rescorla (:ekr)

Assignee

Comment 8

•

13 years ago

Attached patch Refactor transport flow to allow destruction on any thread (obsolete) — Details — Splinter Review

Eric Rescorla (:ekr)

Assignee

Updated

•

13 years ago

Attachment #715440 - Attachment is obsolete: true

Eric Rescorla (:ekr)

Assignee

Comment 9

•

13 years ago

Comment on attachment 715440 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 715440 [details] [diff] [review]: ----------------------------------------------------------------- derf: Here is a revised patch. I kept the deque and the thread tracking because I am hoping to use the thread tracking for futureproofing. there is a question below about whether these fails should be hard or not.... ::: media/mtransport/transportflow.cpp @@ +23,5 @@ > + MOZ_ASSERT(SignalPacketReceived.is_empty()); > + } > + if (!layers_.empty()) { > + RUN_ON_THREAD(target_, > + WrapRunnableNM(&TransportFlow::DestroyLayers,layers_), I agree that would work. I guess this is a taste issue, since I think it's nice to have it all in one place. Also, if/when I change these to be RefPtrs, it will make ownership clearer. But let's do this: I will change it from being a deque being a ScopedDeletePtr. This removes the copy. ::: media/mtransport/transportflow.h @@ +41,5 @@ > +// restriction by thread-locking the signals, but previous > +// attempts have caused deadlocks. > +// > +// Most of these invariants are enforced by hard asserts > +// (i.e., those which fire even in production builds). You mean production builds, right? Anyway, do you think we should fix the comment or make these hard crashes? ::: media/mtransport/transportlayer.h @@ +102,5 @@ > // Return the layer id for this layer > virtual const std::string id() = 0; > > // The id of the flow > + virtual const std::string& flow_id() { Fair enough. ::: media/mtransport/transportlayerloopback.h @@ -129,5 @@ > nsresult QueuePacket(const unsigned char *data, size_t len); > > TransportLayerLoopback* peer_; > nsCOMPtr<nsITimer> timer_; > - nsCOMPtr<nsIEventTarget> target_; actually, it was shadowing the same member in TLayer.

Eric Rescorla (:ekr)

Assignee

Updated

•

13 years ago

Attachment #717929 - Flags: review?(tterribe)

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

13 years ago

Comment on attachment 717929 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 717929 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/mtransport/test/transport_unittests.cpp @@ +149,5 @@ > > + void DisconnectDestroyFlow() { > + loopback_->Disconnect(); > + disconnect_all(); // Disconnect from the signals; > + flow_ = nullptr; indent @@ +151,5 @@ > + loopback_->Disconnect(); > + disconnect_all(); // Disconnect from the signals; > + flow_ = nullptr; > + } > + WS @@ +329,5 @@ > + > + RUN_ON_THREAD(test_utils->sts_target(), > + WrapRunnableRet(flow_, &TransportFlow::state, &tstate), > + NS_DISPATCH_SYNC); > + WS ::: media/mtransport/transportflow.h @@ +43,5 @@ > +// restriction by thread-locking the signals, but previous > +// attempts have caused deadlocks. > +// > +// Most of these invariants are enforced by hard asserts > +// (i.e., those which fire even in production builds). I think we should make critical assertions like this hard asserts in production code at least through Aurora and likely Beta, though some/many/most of them should be ones we can disable in the future if we want (in production, or after we've been in production and clean for a cycle or two). See bug 843695 for an example; you can key off the same MOZ_WEBRTC_ASSERT_ALWAYS. Ones not on main performance paths may be more reasonable to make MOZ_CRASH (or to back up a MOZ_ASSERT with failure returns).

Randell Jesup [:jesup] (needinfo me)

Updated

•

13 years ago

Blocks: 835283

Randell Jesup [:jesup] (needinfo me)

Comment 11

•

13 years ago

We should get this one reviewed and landed, as it will fix real race conditions we have (see bug 835283)

Maire Reavy [:mreavy]

Comment 13

•

13 years ago

Updating the priority based on what we now know (should have done this in February when I made this a blocker bug).

Priority: P3 → P1

Timothy B. Terriberry (:derf)

Comment 14

•

13 years ago

Comment on attachment 717929 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 717929 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Eric Rescorla (:ekr) from comment #9) > You mean production builds, right? Err, yes. That. > Anyway, do you think we should fix the comment or make these hard crashes? That depends on how confident you are we won't violate them in production. Given the level of review I've done, I would personally make them hard crashes instead of fixing the comment. Anyway, this version of the patch leaks, and bug 844493 has bitrotted it. Please post a new version and ask for re-review. ::: media/mtransport/transportflow.cpp @@ +32,5 @@ > + WrapRunnableNM(&TransportFlow::DestroyLayers, layers_.forget()), > + NS_DISPATCH_NORMAL); > +} > + > +void TransportFlow::DestroyLayers(std::deque<TransportLayer *>* layers) { This doesn't appear to be taking a ScopedDeletePointer (or any other kind of smart pointer), so this will leak the deque. @@ +46,5 @@ > nsresult rv = layer->Init(); > if (!NS_SUCCEEDED(rv)) > return rv; > > + // Enforce that if any of the layers have a thread binding, This has bitrotted w.r.t. bug 844493. This check now needs to be in PushLayers() as well.

Attachment #717929 - Flags: review?(tterribe) → review-

Eric Rescorla (:ekr)

Assignee

Comment 15

•

13 years ago

Attached patch Refactor transport flow to allow destruction on any thread (obsolete) — Details — Splinter Review

Eric Rescorla (:ekr)

Assignee

Updated

•

13 years ago

Attachment #717929 - Attachment is obsolete: true

Eric Rescorla (:ekr)

Assignee

Comment 16

•

13 years ago

Comment on attachment 731896 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 731896 [details] [diff] [review]: ----------------------------------------------------------------- I believe this addresses's derf's comments, but I'm not operating at 100% this morning.

Attachment #731896 - Flags: review?(tterribe)

Timothy B. Terriberry (:derf)

Comment 17

•

13 years ago

Comment on attachment 731896 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 731896 [details] [diff] [review]: ----------------------------------------------------------------- r+ with a few nits and one important question. ::: media/mtransport/test/transport_unittests.cpp @@ +151,5 @@ > + loopback_->Disconnect(); > + disconnect_all(); // Disconnect from the signals; > + flow_ = nullptr; > + } > + WS. @@ +326,5 @@ > } > > + TransportLayer::State state() { > + TransportLayer::State tstate; > + WS. @@ +330,5 @@ > + > + RUN_ON_THREAD(test_utils->sts_target(), > + WrapRunnableRet(flow_, &TransportFlow::state, &tstate), > + NS_DISPATCH_SYNC); > + WS. ::: media/mtransport/transportflow.cpp @@ +58,5 @@ > } > > nsresult rv = layer->Init(); > if (!NS_SUCCEEDED(rv)) { > // Set ourselves to have failed. I should have asked this in bug 844493, but unlike PushLayers(), we don't destroy all existing layers on error. Should we? Otherwise it seems like we run the risk of that layer changing state on its own and overriding the error state. Feel free to address this in a new bug. @@ +132,5 @@ > } > > // Now destroy the rest of the flow, because it's no longer > // in an acceptable state. > + while (!layers_->empty()) { It doesn't save much code, but it seems like DestroyLayers() could be re-used here if it also cleared the queue.

Attachment #731896 - Flags: review?(tterribe) → review+

Eric Rescorla (:ekr)

Assignee

Comment 18

•

13 years ago

Attached patch Refactor transport flow to allow destruction on any thread — Details — Splinter Review

Eric Rescorla (:ekr)

Assignee

Updated

•

13 years ago

Attachment #731896 - Attachment is obsolete: true

Eric Rescorla (:ekr)

Assignee

Comment 19

•

13 years ago

Comment on attachment 734302 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 734302 [details] [diff] [review]: ----------------------------------------------------------------- Tim, I did a lot of cleanup here and added some more tests.

Attachment #734302 - Flags: review?(tterribe)

Maire Reavy [:mreavy]

Updated

•

13 years ago

Whiteboard: [WebRTC][blocking-webrtc+] → [WebRTC][blocking-webrtc+][webrtc-uplift]

Timothy B. Terriberry (:derf)

Comment 20

•

13 years ago

Comment on attachment 734302 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 734302 [details] [diff] [review]: ----------------------------------------------------------------- r=me, also thanks for adding more tests. ::: media/mtransport/transportflow.cpp @@ +60,5 @@ > } > > nsresult TransportFlow::PushLayer(TransportLayer *layer) { > + CheckThread(); > + ScopedDeletePtr<TransportLayer> layer_tmp(layer); // Destroy on failure. Nice catch!

Attachment #734302 - Flags: review?(tterribe) → review+

Eric Rescorla (:ekr)

Assignee

Comment 21

•

13 years ago

Comment on attachment 734302 [details] [diff] [review] Refactor transport flow to allow destruction on any thread Review of attachment 734302 [details] [diff] [review]: ----------------------------------------------------------------- Adam, I am having trouble committing from this machine.

Attachment #734302 - Flags: checkin?(adam)

Randell Jesup [:jesup] (needinfo me)

Comment 22

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5abbf6fd5466

status-firefox22: --- → affected

status-firefox23: --- → affected

Target Milestone: --- → mozilla22

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Attachment #734302 - Flags: checkin?(adam) → checkin+

Randell Jesup [:jesup] (needinfo me)

Comment 23

•

12 years ago

Backout due to orange (sctp_unittests fail to exit): https://hg.mozilla.org/integration/mozilla-inbound/rev/75406eaf8351 https://tbpl.mozilla.org/php/getParsedLog.php?id=21611268&tree=Mozilla-Inbound

Randell Jesup [:jesup] (needinfo me)

Comment 24

•

12 years ago

Attached patch Avoid deadlocks by sending packets asynchronously (obsolete) — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Comment 25

•

12 years ago

Comment on attachment 735840 [details] [diff] [review] Avoid deadlocks by sending packets asynchronously https://tbpl.mozilla.org/?tree=Try&rev=8f04baa38ba1 Could use the flow argument instead of flow_ in SendPacket_s() I assume, but really doesn't matter.

Attachment #735840 - Flags: review?(ekr)

Randell Jesup [:jesup] (needinfo me)

Comment 26

•

12 years ago

Attached patch Avoid deadlocks by sending packets asynchronously — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Attachment #735840 - Attachment is obsolete: true

Attachment #735840 - Flags: review?(ekr)

Randell Jesup [:jesup] (needinfo me)

Comment 27

•

12 years ago

Comment on attachment 735908 [details] [diff] [review] Avoid deadlocks by sending packets asynchronously Switch to static function per IRC

Attachment #735908 - Flags: review?(ekr)

Eric Rescorla (:ekr)

Assignee

Comment 28

•

12 years ago

Comment on attachment 735908 [details] [diff] [review] Avoid deadlocks by sending packets asynchronously Review of attachment 735908 [details] [diff] [review]: ----------------------------------------------------------------- lgtm with nit below. ::: media/mtransport/test/sctp_unittest.cpp @@ +196,5 @@ > int sent() const { return sent_; } > int received() const { return received_; } > bool connected() const { return connected_; } > > + static TransportResult SendPacket_s(const unsigned char* data, size_t len, TransportFlow *flow) { I think this would be better as a RefPtr or a const RefPtr& so it's clear what the semantics are.

Attachment #735908 - Flags: review?(ekr) → review+

Randell Jesup [:jesup] (needinfo me)

Comment 29

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/18282c2c682d https://hg.mozilla.org/integration/mozilla-inbound/rev/63f2b10a44e2 Green try (of the previous sctp_unittest orange) at https://tbpl.mozilla.org/?tree=Try&rev=8f04baa38ba1

Ryan VanderMeulen [:RyanVM]

Comment 30

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/18282c2c682d https://hg.mozilla.org/mozilla-central/rev/63f2b10a44e2

Status: NEW → RESOLVED

Closed: 12 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: mozilla22 → mozilla23

Jason Smith [:jsmith]

Updated

•

12 years ago

Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift] → [WebRTC][blocking-webrtc+][webrtc-uplift][qa-]

Jason Smith [:jsmith]

Updated

•

12 years ago

status-firefox23: affected → fixed

Randell Jesup [:jesup] (needinfo me)

Comment 31

•

12 years ago

Comment on attachment 734302 [details] [diff] [review] Refactor transport flow to allow destruction on any thread [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Random failures and timing-related crashes. This bug or some equivalent is needed on Aurora for stability. Testing completed (on m-c, etc.): on m-c, and will be getting a lot of testing and fuzzing. Risk to taking this patch (and alternatives if risky): Moderately low - it should only make major changes in the flow when a release would have happened off the STS thread - which would have been a major problem. String or IDL/UUID changes made by this patch: none

Attachment #734302 - Flags: approval-mozilla-aurora?

Randell Jesup [:jesup] (needinfo me)

Comment 33

•

12 years ago

Comment on attachment 735908 [details] [diff] [review] Avoid deadlocks by sending packets asynchronously [Approval Request Comment] See other patch in this bug; this needs to be included with it (bustage fix to unit tests)

Attachment #735908 - Flags: approval-mozilla-aurora?

Alex Keybl [:akeybl]

Updated

•

12 years ago

tracking-firefox22: --- → +

Alex Keybl [:akeybl]

Comment 34

•

12 years ago

Comment on attachment 734302 [details] [diff] [review] Refactor transport flow to allow destruction on any thread We'll also track for release, given this is blocking-webrtc+

Attachment #734302 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Alex Keybl [:akeybl]

Updated

•

12 years ago

Attachment #735908 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Randell Jesup [:jesup] (needinfo me)

Comment 35

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/73b137330c23 https://hg.mozilla.org/releases/mozilla-aurora/rev/b6d86fda0787

status-firefox22: affected → fixed

Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift][qa-] → [WebRTC][blocking-webrtc+][qa-]

Target Milestone: mozilla23 → mozilla22

Refactor transport flow to allow destruction on any thread 13 years ago Eric Rescorla (:ekr) 15.72 KB, patch	derf : review+ abr : review+	Details \| Diff \| Splinter Review
Refactor transport flow to allow destruction on any thread 13 years ago Eric Rescorla (:ekr) 16.97 KB, patch	derf : review-	Details \| Diff \| Splinter Review
Refactor transport flow to allow destruction on any thread 13 years ago Eric Rescorla (:ekr) 19.55 KB, patch	derf : review+	Details \| Diff \| Splinter Review
Refactor transport flow to allow destruction on any thread 13 years ago Eric Rescorla (:ekr) 26.07 KB, patch	derf : review+ akeybl : approval-mozilla-aurora+ jesup : checkin+	Details \| Diff \| Splinter Review
Avoid deadlocks by sending packets asynchronously 12 years ago Randell Jesup [:jesup] (needinfo me) 2.16 KB, patch		Details \| Diff \| Splinter Review
Avoid deadlocks by sending packets asynchronously 12 years ago Randell Jesup [:jesup] (needinfo me) 2.16 KB, patch	ekr : review+ akeybl : approval-mozilla-aurora+	Details \| Diff \| Splinter Review