Crash in mozilla::dom::UDPSocketParent::ConnectInternal

RESOLVED FIXED in Firefox 52

Status

()

Core
DOM
P3
critical
RESOLVED FIXED
a year ago
24 days ago

People

(Reporter: calixte, Assigned: drno)

Tracking

({crash, csectype-uaf, sec-high})

52 Branch
mozilla54
crash, csectype-uaf, sec-high
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr45 unaffected, firefox51 wontfix, firefox52+ fixed, firefox-esr5252+ fixed, firefox53+ fixed, firefox54+ fixed)

Details

(Whiteboard: [post-critsmash-triage][adv-main52+], crash signature)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(3 attachments)

(Reporter)

Description

a year ago
This bug was filed from the Socorro interface and is 
report bp-782e6b03-4a79-4938-8f94-9417d2161031.
=============================================================

This crash (7 crashes) appeared on nightly the 2016-10-31 just after the patch [1] to fix bug 1198381 which modified the file xpcom/threads/nsThread.cpp which is in the backtrace.

[1] https://hg.mozilla.org/mozilla-central/rev?node=17b19eb241db07356004d211c4d23a7646fd8b09
(Reporter)

Updated

a year ago
Flags: needinfo?(afarre)
It really doesn't feel like it's related to bug 1198381. The changes to nsThread all happen before a runnable is executed, while this call stack is in the middle of the execution of a runnable. I'll look a bit closer on Monday though and ask around for ideas.
Flags: needinfo?(afarre)
Priority: -- → P3

Comment 2

10 months ago
this signature is increasing in frequency after firefox 51 was released - it's also quite common on os x.
status-firefox51: --- → affected
status-firefox53: --- → affected
status-firefox54: --- → affected
OS: Windows 10 → All
Hardware: Unspecified → All
See Also: → bug 1194259
(Assignee)

Comment 3

10 months ago
To add some background: WebRTC attempts a fake connect of a UDP socket to find out the IP address of the default interface on a machine.

I failed to find any code change in WebRTC which could explain this. Now up to version 52 that feature only gets triggered if user has set a userpref to not disclose all of his addresses. From version 52 onward that same feature is also being used as a fallback method if other means of gathering the local IP addresses have failed.
So the most logic explanation for the spike is that more users appear to set this pref (maybe via an add-on)?!

From looking at the code it looks like |mSocket| is null. Which either means the socket got closed, before we call connect() on it, or the socket wasn't even successfully created before we call connect() on it. Neither of which really makes sense to me.
I spotted bug 1336586 while looking at our code. Potentially that could cause this, but I'm not 100% convinced.
I apparently hit this crash just now on Firefox Nightly on Windows 10:
https://crash-stats.mozilla.com/report/index/08ac33be-79af-46ae-9a26-c93502170209

I had reopened Firefox with a full set of tabs from my previous session, and I was clicking through and closing a bunch of tabs in series. Some of the tabs were closed as they were starting to load, so it's possible there's a race with something.

Comment 5

9 months ago
Does anything guarantee that DoConnect() is processed before RecvClose (in case child explicitly closes the connect very early) or ActorDestroy (in case child process crashes or something)?
DoConnect() is called asynchronously. 

The relevant code might be coming from bug 1194259
Flags: needinfo?(rjesup)
Oh, relevant! I have dom.ipc.processCount set to 16.

Comment 7

9 months ago
experienced this issue on Linux/amd64: f64d4244-a80f-4fa1-af4b-aab1c2170209
drno: can you answer Olli's question?

I suspect the answer is no
Flags: needinfo?(rjesup) → needinfo?(drno)
Comment hidden (mozreview-request)

Comment 10

9 months ago
mozreview-review
Comment on attachment 8836279 [details]
Bug 1315248: ensure socket is available for connect and send.

https://reviewboard.mozilla.org/r/111754/#review113056

::: dom/network/UDPSocketParent.cpp:342
(Diff revision 1)
>  UDPSocketParent::RecvOutgoingData(const UDPData& aData,
>                                    const UDPSocketAddr& aAddr)
>  {
> -  MOZ_ASSERT(mSocket);
> +  if (!mSocket) {
> +    MOZ_ASSERT(false, "socket is closed");
> +    return IPC_FAIL(this, "Attempted to send data on closed socket");

curious, is this really a reason to kill the child process?
(Assignee)

Comment 11

9 months ago
WebRTC's networking API's pretty much guarantee that close() can not be called before we have received the result for a connect() call, as nICEr (our ICE stack) assumes to operate on a sync socket API. Further more I assume that IPC calls in the parent get executed in the order which they were received. That's why I failed to see how we actually end up hitting this crash.

But ActorDestroy() is probably the answer how this can happen.
Flags: needinfo?(drno)
(Assignee)

Comment 12

9 months ago
mozreview-review-reply
Comment on attachment 8836279 [details]
Bug 1315248: ensure socket is available for connect and send.

https://reviewboard.mozilla.org/r/111754/#review113056

> curious, is this really a reason to kill the child process?

No. You are right. I forgot what IPC_FAIL() means.
Comment hidden (mozreview-request)
Comment on attachment 8836279 [details]
Bug 1315248: ensure socket is available for connect and send.

https://reviewboard.mozilla.org/r/111754/#review113180

`RecvConnect`, `RecvJoinMulticast` and `RecvLeaveMulticast` should also do a nullptr check on `mSocket` as well.

::: dom/network/UDPSocketParent.cpp:341
(Diff revision 2)
>  mozilla::ipc::IPCResult
>  UDPSocketParent::RecvOutgoingData(const UDPData& aData,
>                                    const UDPSocketAddr& aAddr)
>  {
> -  MOZ_ASSERT(mSocket);
> +  if (!mSocket) {
> +    MOZ_ASSERT(false, "socket is closed");

We should use NS_WARN_IF instead of MOZ_ASSERT, since there might be close or IPC actor error happened simultaneously. Aserting on chrome process seems overkill to me.
Attachment #8836279 - Flags: review?(schien) → review+
Comment hidden (mozreview-request)
(Assignee)

Comment 16

9 months ago
Check back on try result
Flags: needinfo?(drno)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 months ago
Flags: needinfo?(drno)

Comment 19

9 months ago
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/6a3ba8bfb34c
ensure socket is available for connect and send. r=schien

Comment 20

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6a3ba8bfb34c
Status: NEW → RESOLVED
Last Resolved: 9 months ago
status-firefox54: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Please request Aurora/Beta approval on this when you get a chance.
Assignee: nobody → drno
status-firefox51: affected → wontfix
Flags: needinfo?(drno)
(Assignee)

Comment 22

9 months ago
Comment on attachment 8836279 [details]
Bug 1315248: ensure socket is available for connect and send.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1194259 plus probably some timing related changes in later releases.
[User impact if declined]: Crashes on pages using webrtc if the page gets closed (?)
[Is this code covered by automated tests?]: No, as we don't know exactly how to reproduce it and it is most likely highly timing sensitive.
[Has the fix been verified in Nightly?]: No, as we don't know how to reproduce it.
[Needs manual test from QE? If yes, steps to reproduce]: Not clear how to reproduce.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: The patch adds a safeguard against a null pointer reference.
[String changes made/needed]: N/A
Flags: needinfo?(drno)
Attachment #8836279 - Flags: approval-mozilla-beta?
Attachment #8836279 - Flags: approval-mozilla-aurora?
Comment on attachment 8836279 [details]
Bug 1315248: ensure socket is available for connect and send.

avoid null deref in udp code, beta52+, aurora53+
Attachment #8836279 - Flags: approval-mozilla-beta?
Attachment #8836279 - Flags: approval-mozilla-beta+
Attachment #8836279 - Flags: approval-mozilla-aurora?
Attachment #8836279 - Flags: approval-mozilla-aurora+

Comment 24

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/aafe44390b1b
status-firefox53: affected → fixed

Comment 25

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/0d1e2aa8c0c2
status-firefox52: affected → fixed
had to push https://hg.mozilla.org/releases/mozilla-beta/rev/d0d5f1187d755b0d1fc65baaecd0ca2e5d16d842 as follow up to fix a bustage (thanks to ryanvm to point me to the right direction)
(Assignee)

Comment 27

9 months ago
(In reply to Carsten Book [:Tomcat] from comment #26)
> had to push
> https://hg.mozilla.org/releases/mozilla-beta/rev/
> d0d5f1187d755b0d1fc65baaecd0ca2e5d16d842 as follow up to fix a bustage
> (thanks to ryanvm to point me to the right direction)

Ups. Sorry for that. And thanks for fixing it!

Comment 28

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/0d1e2aa8c0c2
https://hg.mozilla.org/releases/mozilla-esr52/rev/d0d5f1187d75
status-firefox-esr52: --- → fixed

Comment 29

9 months ago
hi, there are still crashes on 52.0b7: https://crash-stats.mozilla.com/report/index/ea0d347b-0be1-4cf9-bc43-57eba2170217
do we need to reopen?
Flags: needinfo?(drno)
(Assignee)

Comment 30

9 months ago
After verifying that my patch actually has made it into beta 7 I have to admit that I'm puzzled on what is going on here.

Actually the list of crash reports shows a few crash addresses like this 0xffffffffe5e5e5e5, which if I'm not mistaken indicates user after free.

While looking around in the code I noticed that UDPSocketParent::RecvConnect() dispatches to the STS thread. But none of the other functions in UDPSocketParent do the same dispatch. So my best guess right now is that we are seeing some threading issue here.
In fact nsUDPSocket::InitWithAddress() and nsUDPSocket::Connect() show the same pattern: InitWithAddress() appears to be running on any thread, but Connect() checks for STS.

schien: do you have any thoughts on this?
Status: RESOLVED → REOPENED
Flags: needinfo?(drno) → needinfo?(schien)
Resolution: FIXED → ---

Comment 31

9 months ago
marking this as security-sensitive as precaution as well...
Group: core-security
Group: core-security → dom-core-security
Thanks philipp - was was about to do the same :-)

This smacks of something involving threading or messages; perhaps passing a 'this' pointer to a WrapRunnable without ensuring a lifetime long enough that the runnable is certain to have been destroyed.  It also could be thread coherence - accessing 'this' members (and changing them) from the STS thread and the PBackground thread.

Something like RecvConnect() or sendConnectResponse() -- if 'this' has gone away before the runnable runs, boom.  You can do WrapRunnable(RefPtr<USPSocketParent>(this), ...
Agree with @jesup, this looks fishy when I skim through UDPSocketParent.cpp. We should try hold additional refcount for that WrapRunnable.
Flags: needinfo?(schien)

Updated

9 months ago
Flags: needinfo?(drno)
(Assignee)

Updated

9 months ago
status-firefox52: fixed → affected
status-firefox53: fixed → affected
status-firefox54: fixed → affected
status-firefox-esr52: fixed → affected
Flags: needinfo?(drno)
(In reply to Nils Ohlmeier [:drno] from comment #30)
> Actually the list of crash reports shows a few crash addresses like this
> 0xffffffffe5e5e5e5, which if I'm not mistaken indicates user after free.

FYI crash addresses in Breakpad are always treated as 64-bit even on dumps from 32-bit CPUs, so this is just 0xe5e5e5e5 sign-extended to 64-bit, which is almost certainly some sort of poison pattern.
This is a sec-high in 52, which is also an ESR release.  Any chance of a fix while we're still in beta?
Flags: needinfo?(drno)
Keywords: csectype-uaf, sec-high
(Assignee)

Comment 36

9 months ago
Created attachment 8840079 [details] [diff] [review]
bug1315248_part2.patch
Flags: needinfo?(drno)
Attachment #8840079 - Flags: review?(rjesup)

Updated

9 months ago
Attachment #8840079 - Flags: review?(rjesup) → review+
(Assignee)

Comment 37

9 months ago
Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5844222b8a8feb6277a252317933b19216ba5846
(Assignee)

Comment 38

9 months ago
Comment on attachment 8840079 [details] [diff] [review]
bug1315248_part2.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch? It's probably pretty hard as it appears to be some race condition between getting a PeerConnection to gather IP addresses and shutting it down at the right moment. What makes it easier is that you can try that in a loop without user interaction until your code hits it.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? All releases up to Firefox 46.

If not all supported branches, which bug introduced the flaw? Bug 1194259 added support for UDP connect.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? The patch applies to aurora. I have a patch for beta.

How likely is this patch to cause regressions; how much testing does it need? I think the likelihood of regressions is pretty low, since it only holds a handle to an object to ensure the object still exists when the runable gets executed. As we don't know how to reproduce the problem we can really test the problem itself.
Attachment #8840079 - Flags: sec-approval?
Comment on attachment 8840079 [details] [diff] [review]
bug1315248_part2.patch

sec-approval+ for trunk. We'll want this on branches ASAP as well. Can you please nominate branch patches?
Attachment #8840079 - Flags: sec-approval? → sec-approval+
(Assignee)

Comment 40

9 months ago
Comment on attachment 8840079 [details] [diff] [review]
bug1315248_part2.patch

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1194259 in Fx 46
[User impact if declined]: Crashes the browser if WebRTC is used and tab gets closed.
[Is this code covered by automated tests?]: No, we don't know how to reproduce it.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No STR known.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: I think it is not risky as it only adds a refpointer to keep the object alive until the runnables are executed.
[String changes made/needed]: N/A
Attachment #8840079 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 41

9 months ago
Created attachment 8840187 [details] [diff] [review]
bug1315248_part2_beta.patch

This is just a separate patch for beta (because attachment 8840079 [details] [diff] [review] does not apply cleanly).
Attachment #8840187 - Flags: review+
(Assignee)

Comment 42

9 months ago
Comment on attachment 8840187 [details] [diff] [review]
bug1315248_part2_beta.patch

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1194259 in Fx 46
[User impact if declined]: Crashes the browser if WebRTC is used and tab gets closed.
[Is this code covered by automated tests?]: No, we don't know how to reproduce it.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No STR known.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: I think it is not risky as it only adds a refpointer to keep the object alive until the runnables are executed.
[String changes made/needed]: N/A
Attachment #8840187 - Flags: approval-mozilla-beta?
Comment on attachment 8840079 [details] [diff] [review]
bug1315248_part2.patch

Fix a crash. Aurora53+.
Attachment #8840079 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Updated

9 months ago
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/4fff02e2a6a2d1d8f3ea81c7ab3f30374b9e03c9
Keywords: checkin-needed

Comment 45

9 months ago
uplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/08c817237480
status-firefox53: affected → fixed
Comment on attachment 8840187 [details] [diff] [review]
bug1315248_part2_beta.patch

uaf fix, should be in 52.0b9
Attachment #8840187 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 47

9 months ago
uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/500322a38bc8
status-firefox52: affected → fixed

Comment 48

9 months ago
uplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/500322a38bc8
status-firefox-esr52: affected → fixed
https://hg.mozilla.org/mozilla-central/rev/4fff02e2a6a2
Status: REOPENED → RESOLVED
Last Resolved: 9 months ago9 months ago
status-firefox54: affected → fixed
Resolution: --- → FIXED
(Assignee)

Comment 50

9 months ago
Check back on crash stats once this has been in the public for some time.
Flags: needinfo?(drno)
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main52+]
(Assignee)

Comment 51

9 months ago
Since the landing of the first patched lowered the frequency of the crashes in beta to a really low volume I guess we need to wait for 52 to become release to judge if it is really fixed now with both patches.
status-firefox-esr45: --- → unaffected
tracking-firefox52: --- → ?
tracking-firefox53: --- → ?
tracking-firefox54: --- → ?
tracking-firefox-esr52: --- → ?
Updating tracking flags for 52/53/54 and esr52.
tracking-firefox52: ? → +
tracking-firefox53: ? → +
tracking-firefox54: ? → +
tracking-firefox-esr52: ? → 52+
Group: dom-core-security → core-security-release
(Assignee)

Comment 53

8 months ago
Since 52 becoming release I only see crashes for 51.0.1 or for older beta's of 52. So it looks good.
I'll check back in two weeks to see if this really disappeared from crash stats for good this time.
(Assignee)

Comment 54

8 months ago
No crashes in 52 since release. Now I'm convinced we fixed it properly.
Flags: needinfo?(drno)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.