Closed Bug 1048261 Opened 5 years ago Closed 5 years ago

ASAN failure in the runnable for SetDtlsConnected - doesn't hold a reference to pcimpl

Categories

(Core :: WebRTC: Signaling, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox31 --- unaffected
firefox32 --- fixed
firefox33 --- fixed
firefox34 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- unaffected
b2g-v1.2 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Keywords: crash, regression, sec-critical)

Attachments

(2 files, 3 obsolete files)

With billm's GC/CC patches (https://tbpl.mozilla.org/?tree=Try&rev=ebce8337a8ed), there was an ASAN failure (https://tbpl.mozilla.org/php/getParsedLog.php?id=45173855&tree=Try) which pointed to the fact that the dispatch of the runnable for SetDtlsConnected doesn't hold a reference to the PCImpl (destination object), and in this case it's getting deleted before the runnable executes.

We should take and verify this fix, then uplift as far as possible.  No indication that it's exploitable (and without this GC/CC change, ASAN has never triggered on this), but we should assume it's theoretically possible.
Per Randell, the debug crashes are probably the same issue:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45153067&tree=Try
Attachment #8467056 - Flags: review?(docfaraday)
Comment on attachment 8467056 [details] [diff] [review]
Fix SetDtlsConnected()

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Hard.  You'd need to force the GC/CC into freeing it and get something else allocated there.  With current gC/CC I suspect somewhere between very hard and impossible.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  They do point to a missing refcount, but nothing more beyond that.

Which older supported branches are affected by this flaw? this was caused by a patch that landed in Fx32, only 32, 33, 34, and b2g 2.0

If not all supported branches, which bug introduced the flaw? bug 942367 

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Trivial; should apply cleanly

How likely is this patch to cause regressions; how much testing does it need? Virtually no chance for regression.
Attachment #8467056 - Flags: sec-approval?
Comment on attachment 8467056 [details] [diff] [review]
Fix SetDtlsConnected()

Review of attachment 8467056 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with one nit.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionMedia.cpp
@@ +542,5 @@
>  
>    bool privacyRequested = false;
>    // TODO (Bug 952678) set privacy mode, ask the DTLS layer about that
>    GetMainThread()->Dispatch(
> +    WrapRunnable(nsRefPtr<PeerConnectionImpl>(mParent), 

Nit: trailing ws
Attachment #8467056 - Flags: review?(docfaraday) → review+
Comment on attachment 8467056 [details] [diff] [review]
Fix SetDtlsConnected()

sec-approval+. Let's get this onto Aurora and Beta as well to avoid shipping it.
Attachment #8467056 - Flags: sec-approval? → sec-approval+
Comment on attachment 8467056 [details] [diff] [review]
Fix SetDtlsConnected()

Approval Request Comment
See sec-approval request.

Bug was introduced during Fx32

Minimal risk of regression (adds a refcount to a runnable member).
Attachment #8467056 - Flags: approval-mozilla-beta?
Attachment #8467056 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebb6c174ab00
Waiting on aurora/beta approvals to land there
Target Milestone: --- → mozilla34
Hi Randell, I pushed my GC patch on top of your fix and now I'm getting a different ASAN error.
https://tbpl.mozilla.org/php/getParsedLog.php?id=45203386&full=1&branch=try#error0
Could you please look and see if it's related?
That's a problem.  I can see now why the code that I wrote here was wrong and why Randell's fix is not sufficient.

The problem is that the STS thread doesn't maintain a reference to the PC, and it can't with the current structure.  The best thing I can think of right now requires a proxy for taking the STS thread upcall.  That has some tricky lifecycle issues to deal with.

I don't think that we should uplift the current patch until this one is right.  Can this wait until tomorrow?
Attachment #8467056 - Flags: review-
This applies on top of Randell's patch (attachment 8467056 [details] [diff] [review]) and - I think - addresses the thread dispatch issue at the core of this.  It's much more code, but it seems to at least work.

The problem here is that you have something on main that needs to setup an observer on STS.  That observer runs on STS, but needs to dispatch to main, then destroy itself back on STS.  Oh, and the event that it gets from STS isn't guaranteed, so it needs to observe a second event, which could run while dispatching to main.

This does all that without synchronous dispatches, which are all kinds of bad.

I'm not sure that I got the refcounting around the thread dispatches exactly right.  But any problem there is limited to refcounting of the main and STS threads during dispatch, ...again, I think.  Please double check my logic, it's past my bed time.

https://tbpl.mozilla.org/?tree=Try&rev=887fe2b516d3
Attachment #8467561 - Flags: review?(rjesup)
Attachment #8467561 - Flags: review?(docfaraday)
Comment on attachment 8467561 [details] [diff] [review]
0001-Bug-1048261-Fixing-dispatch-of-DTLS-connected-event.patch

Review of attachment 8467561 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionMedia.cpp
@@ +536,3 @@
>  void
> +PeerConnectionMedia::AddTransportFlow(
> +  int aIndex, bool aRtcp, const mozilla::RefPtr<TransportFlow> &aFlow)

Leave function style matching rest of file

@@ +551,5 @@
>      NS_DISPATCH_NORMAL);
>  }
>  
>  void
> +PeerConnectionMedia::DtlsObserver::Connect(

in keeping with the "standard" of naming thread-specific functions, Connect_s()

@@ +607,5 @@
> +void
> +PeerConnectionMedia::DtlsObserver::Cleanup_s()
> +{
> +  mDispatching = false;
> +  SelfDestruct();

Comment that if SelfDestruct was blocked by mDispatching earlier, it will occur here.
Attachment #8467561 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/ebb6c174ab00

Leaving open for the other patch.
Status: NEW → ASSIGNED
Comment on attachment 8467561 [details] [diff] [review]
0001-Bug-1048261-Fixing-dispatch-of-DTLS-connected-event.patch

Review of attachment 8467561 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe I'm missing something, but why couldn't we just go back to square one and use a PC handle in the dispatch to PeerConnectionImpl::SetDtlsConnected (set up a thunk that attempts to get a strong ref on main, and makes the function call)? Are there other problems we're trying to fix here?
Flags: needinfo?(martin.thomson)
r=jesup already, need to pick Byron's brain about how to make this simpler.
Attachment #8467561 - Attachment is obsolete: true
Attachment #8467561 - Flags: review?(docfaraday)
Attachment #8467869 - Flags: review?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #14)
> Comment on attachment 8467561 [details] [diff] [review]
> 0001-Bug-1048261-Fixing-dispatch-of-DTLS-connected-event.patch
> 
> Review of attachment 8467561 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Maybe I'm missing something, but why couldn't we just go back to square one
> and use a PC handle in the dispatch to PeerConnectionImpl::SetDtlsConnected
> (set up a thunk that attempts to get a strong ref on main, and makes the
> function call)? Are there other problems we're trying to fix here?

Sorry about the delay, I typed a response, but didn't send (was on a call).

This is essentially what the patch does.  We can't use either PeerConnectionImpl or PeerConnectionMedia (they share a lifecycle), so we need something that is coupled the lifecycle of TransportLayerDtls.  If you have any ideas on how to make this simpler, I'm very interested.  I just can't think of anything easier.  That is, without creating a dependency between TransportLayerDtls and PeerConnection*, which I think would be a very bad layering violation.
Flags: needinfo?(martin.thomson)
Based on Byron's feedback, this is a much simpler version of the patch.

https://tbpl.mozilla.org/?tree=Try&rev=b9b2673bac60
Attachment #8467869 - Attachment is obsolete: true
Attachment #8467869 - Flags: review?(docfaraday)
Attachment #8467973 - Flags: review?(rjesup)
Attachment #8467973 - Flags: review?(docfaraday)
And here, for maximal ASAN crashiness: https://tbpl.mozilla.org/?tree=Try&rev=7893633a011a
Comment on attachment 8467973 [details] [diff] [review]
0001-Bug-1048261-Safe-dispatch-from-DTLS-connect-to-PeerC.patch

Review of attachment 8467973 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, if it works
Attachment #8467973 - Flags: review?(docfaraday) → review+
Comment on attachment 8467973 [details] [diff] [review]
0001-Bug-1048261-Safe-dispatch-from-DTLS-connect-to-PeerC.patch

Review of attachment 8467973 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionMedia.cpp
@@ +551,5 @@
>  }
>  
>  void
> +PeerConnectionMedia::DtlsConnected_m(std::string aParentHandle,
> +                                     bool aPrivacyRequested)

Suggest make this const std::string&
Comment on attachment 8467973 [details] [diff] [review]
0001-Bug-1048261-Safe-dispatch-from-DTLS-connect-to-PeerC.patch

Review of attachment 8467973 [details] [diff] [review]:
-----------------------------------------------------------------

Much better, thanks
Attachment #8467973 - Flags: review?(rjesup) → review+
Approval Request Comment
[Feature/regressing bug #]: 942367
[User impact if declined]: none
[Describe test coverage new/current, TBPL]: builds, manual testing, mochitest, asan crashtests x50
[Risks and why]: none identified
[String/UUID change made/needed]: none

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Pretty hard.  See comment 4.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No

Which older supported branches are affected by this flaw?
See comment 4

If not all supported branches, which bug introduced the flaw?
bug 942367

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
the same patch should apply cleanly with minor changes

How likely is this patch to cause regressions; how much testing does it need?
the testing thus far seems adequate
Attachment #8467973 - Attachment is obsolete: true
Attachment #8468107 - Flags: sec-approval?
Attachment #8468107 - Flags: review+
Attachment #8468107 - Flags: approval-mozilla-beta?
Attachment #8468107 - Flags: approval-mozilla-aurora?
Comment on attachment 8467056 [details] [diff] [review]
Fix SetDtlsConnected()

Review of attachment 8467056 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like folly to r- now that I've patched over this.
Attachment #8467056 - Flags: review- → review+
Comment on attachment 8468107 [details] [diff] [review]
0001-Bug-1048261-Safe-dispatch-from-DTLS-connect-to-PeerC.patch

Approvals all around. Let's get it in.
Attachment #8468107 - Flags: sec-approval?
Attachment #8468107 - Flags: sec-approval+
Attachment #8468107 - Flags: approval-mozilla-beta?
Attachment #8468107 - Flags: approval-mozilla-beta+
Attachment #8468107 - Flags: approval-mozilla-aurora?
Attachment #8468107 - Flags: approval-mozilla-aurora+
(In reply to Martin Thomson [:mt] from comment #18)
> And here, for maximal ASAN crashiness:
> https://tbpl.mozilla.org/?tree=Try&rev=7893633a011a

Looks like you're good to go, Bill :)
Bug 1016738 re-landed, so I went ahead and pushed this to inbound to avoid intermittent crashes.

https://hg.mozilla.org/integration/mozilla-inbound/rev/6564f687b3ff
Attachment #8467056 - Flags: approval-mozilla-beta?
Attachment #8467056 - Flags: approval-mozilla-aurora?
Can this be closed now Randell?
Flags: needinfo?(rjesup)
Comment 26 hasn't merged to m-c yet.
Flags: needinfo?(rjesup)
Ahh right.  I couldn't figure out why it hadn't been closed yet, but I forgot to check the date.
https://hg.mozilla.org/mozilla-central/rev/6564f687b3ff
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
I quickly talked to Randell via IRC and asked if there's anything that QE can do to reproduce/verify. Randell mentioned that this issue was rarely seen on Try and other than constantly retriggering tbpl, there was never a reliable way to cause the problem. If there's anything else QE can do to test/verify this issue, please let us know and needinfo! Marking this as qe-verify-
Flags: qe-verify-
Blocks: 942367
Group: core-security
No longer depends on: 942367
Keywords: regression
Duplicate of this bug: 1038349
You need to log in before you can comment on or make changes to this bug.