Closed Bug 801221 Opened 7 years ago Closed 7 years ago

Crash in [@ memmove | ssl_SendSavedWriteData | dtls_SendSavedWriteData ]

Categories

(Core :: WebRTC: Networking, defect, P1, critical)

defect

Tracking

()

VERIFIED FIXED
mozilla19
Tracking Status
firefox17 --- unaffected
firefox18 - fixed
firefox19 + verified
firefox-esr10 --- unaffected
firefox-esr17 --- unaffected

People

(Reporter: ekr, Assigned: ekr)

References

Details

(4 keywords, Whiteboard: [WebRTC], [blocking-webrtc+] [testcase see comment 6] [adv-main18-])

Crash Data

Attachments

(1 file, 4 obsolete files)

I believe this occurred in test harness.

The proximal error was the following crash:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000121c00025
[Switching to process 78458 thread 0x3a03]
0x00007fffffe012b1 in __longcopy ()
(gdb) up
#1  0x00007fffffe0087a in __memcpy ()
(gdb) up
#2  0x000000010b516c8f in __inline_memmove_chk (__dest=0x12118f000,
__src=0x12118f065, __len=4294967195) at _string.h:69
Current language:  auto; currently minimal
(gdb) up
#3  0x000000010b516908 in ssl_SendSavedWriteData (ss=0x1210ed000) at
sslsecur.c:498
(gdb) up
#4  0x000000010b4e6435 in dtls_SendSavedWriteData (ss=0x1210ed000) at
dtlscon.c:742
(gdb) up
#5  0x000000010b4e52cc in dtls_TransmitMessageFlight (ss=0x1210ed000)
at dtlscon.c:722
(gdb) up
#6  0x000000010b4e593e in dtls_FlushHandshakeMessages (ss=0x1210ed000,
flags=0) at dtlscon.c:523
(gdb) down
#5  0x000000010b4e52cc in dtls_TransmitMessageFlight (ss=0x1210ed000)
at dtlscon.c:722
(gdb) down
#4  0x000000010b4e6435 in dtls_SendSavedWriteData (ss=0x1210ed000) at
dtlscon.c:742
(gdb) down
#3  0x000000010b516908 in ssl_SendSavedWriteData (ss=0x1210ed000) at
sslsecur.c:498
(gdb) run

WTC, bsmith, and I have evaluated this crash and suspect either a memory
error elsewhere or a threading issue that is causeing another write to
happen concurrently, thus clearing ss-SendBuf.len. This does not appear
to be a bug in ssl3_SendSavedWriteData().
This error occurred in the WebRTC code.
We need to add NS_ABORT_IF_FALSE(IsOnThread(target_)) to the top of every method in TransportLayerMethods, where:

bool
IsOnThread(nsIEventTarget * target) {
  bool on;
  NS_ENSURE_TRUE(target, false);
  NS_ENSURE_SUCCESS(target->IsOnCurrentThread(&on), false);
  NS_ENSURE_TRUE(on, false);
  return true;
}

Later, it should be changed from NS_ABORT_If_FALSE to MOZ_ASSERT.

Anyway, we need the full stack trace. My initial suspicion is that this is coming from a PR_Close() that is happening in a destructor or destructor-like function, and that destructor is running on the main thread.
I roughly agree with this analysis of what the problem is.

There are some functions that are safe, but pretty much everything after setup needs to have this kind of assert.
Blocks: 801392
Attached patch DTLS init on STS thread (obsolete) — Splinter Review
Eric, do you have a testcase for it?
Assignee: nobody → ekr
Status: NEW → ASSIGNED
This crash also happens on Windows 7 with my testcase from bug 798323 (attachment 671342 [details]). Just open it and wait a couple of seconds and Firefox will crash.

bp-9ed026c1-3557-474d-ab5c-23f712121015
Flags: in-testsuite?
OS: Mac OS X → All
Hardware: x86 → All
Priority: -- → P1
Whiteboard: [WebRTC], [blocking-webrtc+]
Crash Signature: [@ memmove | ssl_SendSavedWriteData | dtls_SendSavedWriteData ]
Summary: Crash in ssl_SendSavedWriteData → Crash in [@ memmove | ssl_SendSavedWriteData | dtls_SendSavedWriteData ]
ekr: where do we stand on the patch?  Is it ready for review?  It doesn't seem to add the ENSURE tests bsmith mentions
Correct. I am adding those now. It rippled into me wanting to add tests for Dispatch success.
Attached patch DTLS init on STS thread (obsolete) — Splinter Review
* * *
Error checks on Dispatch
* * *
Dispach PC error checks
* * *
Dispach PC error checks
Attachment #671234 - Attachment is obsolete: true
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 672935 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is read for review.
Attachment #672935 - Flags: review?(rjesup)
Note: I cannot reproduce this crash.
P.S. I know about the trailing whitespace. will fix before landing.
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 672935 [details] [diff] [review]:
-----------------------------------------------------------------

partial review, will finish later

::: media/mtransport/transportlayer.h
@@ +61,5 @@
>  
> +  // Dispatch a call onto our thread (or run on the same thread if
> +  // thread is not set). This is always synchronous.
> +  nsresult RunOnThread(nsIRunnable *event) {
> +    if (target_ && (target_ != nsRefPtr<nsIThread>(do_GetCurrentThread())))

do_CurrentThread works better with nsCOMPtrs - see nsThreadUtils.h and all the uses of it in the tree.  However, I don't think you need to addref it at all for this test.
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 672935 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +2452,3 @@
>      dtls->SetRole(pc->GetRole() == sipcc::PeerConnectionImpl::kRoleOfferer ?
>                    TransportLayerDtls::CLIENT : TransportLayerDtls::SERVER);
> +                                             dtls->SetIdentity(pc->GetIdentity());

indent

@@ +2492,5 @@
> +
> +    if (NS_FAILED(rv)) {
> +      return NULL;
> +    }
> +    if (NS_FAILED(res)) {

if (NS_FAILED(rv) || NS_FAILED(res)) {
if you please
Attachment #672935 - Flags: review?(rjesup) → review+
Ekr, if you produce a try build please let us know, I can check it.
crashing in memmove on an invalid address is bad news, assuming the worst security-wise.
Keywords: sec-critical
ekr: can we check this in an retest?
Attachment #672935 - Flags: sec-approval?
Attachment #674369 - Attachment is obsolete: true
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

I'm not giving sec-approval since the form wasn't filled out and I have no idea of the implications of the patch. Please renominate and fill in the template that it adds to the comments. Thank you.
Attachment #672935 - Flags: sec-approval? → sec-approval-
(In reply to Al Billings [:abillings] from comment #19)
> Comment on attachment 672935 [details] [diff] [review]
> DTLS init on STS thread
> 
> I'm not giving sec-approval since the form wasn't filled out and I have no
> idea of the implications of the patch. Please renominate and fill in the
> template that it adds to the comments. Thank you.

Sure, no problem. When I read the writeup I thought that bugzilla was going
to add a new comment with the form but that didn't happen.... Jesup sent
me the form so I'll fill it out.
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 672935 [details] [diff] [review]:
-----------------------------------------------------------------

> How easily can the security issue be deduced from the patch?

There are two security issues:

1. A memory overread/overrwrite bug in memmove.
2. A speculative failure to use DTLS with SCTP.

The first of these is potentially serious but hard to deduce (and
we're not 100% sure this patch fixes it).

The second of these issues is that if you fail to push the
DTLS layer on top of the TransportFlow, you end up with just
an unencrypted flow. I believe this is not a real issue because
currently SCTP reuses a media TransportFlow and if those fail
to negotiate DTLS an assert fires. It's also unclear how
failure to push layers could happen, since the code called on
push is XPCOM code of a sort that is routinely used w/o error
checking.


> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

I don't believe so. This just looks like a typical threading
issue.


> Which older supported branches are affected by this flaw?

None. This applies to FF 18 Aurora and M-C, but only with
PeerConnection turned on.


> If not all supported branches, which bug introduced the flaw?

M-C for FF 18.


> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

N/A.


> How likely is this patch to cause regressions; how much testing does it need?

The most likely regressions are asserts in fxns called on the wrong thread,
but we want those to fail so we can fix them.
Attachment #672935 - Flags: sec-approval- → sec-approval?
Comment on attachment 672935 [details] [diff] [review]
DTLS init on STS thread

sec-approval+

Did you mean 18 or 19 in "M-C for FF 18"? If you meant 18 then we'd need this on the Aurora branch too.
Attachment #672935 - Flags: sec-approval? → sec-approval+
I believe this fix is needed on FF18 as well.

FYI, apparently when ekr selected sec-approval? it did't offer the form; not sure why (I tried it and it worked for me)
Attached patch DTLS init on STS thread (obsolete) — Splinter Review
* * *
Error checks on Dispatch
* * *
Dispach PC error checks
* * *
Dispach PC error checks
Comment on attachment 674824 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 674824 [details] [diff] [review]:
-----------------------------------------------------------------

Jesup, there were enough small changes here that I would like a re-review.
Attachment #674824 - Flags: review?(rjesup)
Comment on attachment 674824 [details] [diff] [review]
DTLS init on STS thread

Review of attachment 674824 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/transport_unittests.cpp
@@ +211,5 @@
> +
> +    test_utils.sts_target()->Dispatch(
> +      WrapRunnableRet(flow_, &TransportFlow::PushLayers, layers, &res),
> +      NS_DISPATCH_SYNC);
> +        

trailing space

@@ +275,5 @@
>    }
>  
>    TransportResult SendPacket(const unsigned char* data, size_t len) {
> +    TransportResult ret;
> +    

ditto

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +2446,5 @@
> +    ScopedDeletePtr<TransportLayerIce> ice(new
> +        TransportLayerIce("flow", pc->ice_ctx(),
> +                          pc->ice_media_stream(level-1),
> +                          rtcp ? 2 : 1));
> +    

trailing spaces (2 lines)

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp
@@ +264,5 @@
> +  
> +  // res is invalid unless the dispatch succeeded
> +  if (NS_FAILED(rv))
> +    return rv;
> +  

trailing spaces (2 lines)

::: media/webrtc/signaling/test/mediapipeline_unittest.cpp
@@ +189,5 @@
>  
>    void SetUp() {
>      PRStatus status = PR_NewTCPSocketPair(fds_);
>      ASSERT_EQ(status, PR_SUCCESS);
> +    

trailing space
Attachment #674824 - Flags: review?(rjesup) → review+
Attachment #674824 - Attachment is obsolete: true
Comment on attachment 675130 [details] [diff] [review]
Move mtransport operations onto STS thread;

Carrying forward r+ and sec-approval+ (patch is cleanup of nits from my review plus C++ unit test changes that don't go into the product)
Attachment #675130 - Flags: sec-approval+
Attachment #675130 - Flags: review+
(In reply to Henrik Skupin (:whimboo) from comment #6)
> This crash also happens on Windows 7 with my testcase from bug 798323
> (attachment 671342 [details]). Just open it and wait a couple of seconds and
> Firefox will crash.

Instead of this crash I now see bug 805701 when using this testcase.
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+] [testcase see comment 6]
https://hg.mozilla.org/mozilla-central/rev/a39913319ce6
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Apparent fallout from this patch is breaking DataChannels, which spawns connect threads to call usrsctp_connect() and the like (because they can block).  In general, we need to be able to move all IO off the main thread.  If we must do it all from one thread (STS), then we need to design the API such that this is (largely) hidden from the user.

I'll do some more tests and then likely file a followup bug (non-sec).
Verified by running the test case cited a few times to ensure that this crash no longer reproduces on nightly 10/28.
Status: RESOLVED → VERIFIED
(In reply to Jason Smith [:jsmith] from comment #33)
> Verified by running the test case cited a few times to ensure that this
> crash no longer reproduces on nightly 10/28.

So, as this is marked tracking+ for 18, should this be uplifted to aurora?
Untracking for FF18 since we don't expect to ship WebRTC enabled in that version of Firefox. We'd still accept a very low risk uplift if nominated soon.
Attachment #672935 - Attachment is obsolete: true
Comment on attachment 675130 [details] [diff] [review]
Move mtransport operations onto STS thread;

[Approval Request Comment]
Bug caused by (feature/regressing bug #): initial checkin of mtransport before FF18 uplifted
User impact if declined: Random security-sensitive crashes when using WebRTC
Testing completed (on m-c, etc.): several weeks baking on m-c
Risk to taking this patch (and alternatives if risky): Risk to normal users is minimal as this feature is behind a pref, and the version/API extant in FF18 is primarily good for demos (i.e. it will likely not be targeted by any production service providers or websites).  Risk to users who flip the pref is pretty low; the patch is not tiny but is mostly just adding CheckThread() calls, and the security risk for such users is definitely less than the current code.  The most likely problem would be breaking some aspect of WebRTC.
String or UUID changes made by this patch: None

Note: if we take this, we'll need to take bug 806169 and bug 807647 (both fairly simple, second is fallout from the first) to un-break DataChannels, which this patch will break.
Attachment #675130 - Flags: approval-mozilla-aurora?
Comment on attachment 675130 [details] [diff] [review]
Move mtransport operations onto STS thread;

[Triage Comment]
Please land before Monday morning PT to make it in before the merge. Thanks!
Attachment #675130 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Asked for approval on the two dependent bugs
Duplicate of this bug: 801392
Whiteboard: [WebRTC], [blocking-webrtc+] [testcase see comment 6] → [WebRTC], [blocking-webrtc+] [testcase see comment 6] [adv-main18-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.