Bug 1280443 (Closed) - Opened 8 years ago - Closed 8 years ago

Crash in nr_socket_sendto

Categories

(Core :: WebRTC: Networking, defect, P1)

48 Branch

Tracking


RESOLVED FIXED
mozilla50
Tracking Status
firefox48 --- fixed
firefox49 --- fixed
firefox-esr45 --- unaffected
relnote-firefox --- 48+
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

(Keywords: csectype-race, csectype-uaf, sec-critical, Whiteboard: [adv-main48+])

Attachments

(1 file, 1 obsolete file)

From crash-stats: https://crash-stats.mozilla.com/signature/?date=%3E01%2F01%2F2016&signature=nr_socket_sendto&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports

As of 48, we are seeing crashes in nr_socket_sendto, some of which look like UAF. Here's the mtransport stuff that landed in 48: https://bugzilla.mozilla.org/buglist.cgi?list_id=13071777&resolution=FIXED&query_format=advanced&component=WebRTC%3A%20Networking&target_milestone=mozilla48

It might have something to do with ICE restart, but it is hard to say yet. My preliminary analysis of the source did not turn anything up. It may be that some bug elsewhere has crept in, but I looked over the timer stuff and I didn't see anything scary.
backlog: --- → webrtc/webaudio+
Rank: 10
OK, after looking at the timer code some more, there is a fair amount of racy behavior going on, particularly in nsTimerImpl::Release. I'm going to put together a diagnostic patch for this bug to see whether the problem is timers failing to cancel properly.
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
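To illustrate the kind of cancel-vs-fire race being described, here is a hypothetical, simplified timer (these are not nsTimerImpl's real names or structure): if Cancel() on one thread does not synchronize with the timer thread firing the callback, the callback can still run against an object the canceling thread has already started tearing down.

  #include <atomic>
  #include <functional>

  // Hypothetical simplified timer, for illustration only.
  struct RacyTimer {
    std::function<void()> mCallback;
    std::atomic<bool> mCanceled{false};

    // Runs on the timer thread.
    void Fire() {
      // RACE: mCanceled can flip to true between this check and the
      // call, and the canceling thread may have already freed the
      // callback's target object by then.
      if (!mCanceled.load()) {
        mCallback();  // potential use-after-free of the callback target
      }
    }

    // Runs on some other thread; returns without waiting for Fire().
    void Cancel() { mCanceled.store(true); }
  };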
Hm, crash-stats shows a single crash in 39 and then lots of crashes from 48 on, so something appears to have changed in the timing in 48. ICE restart looks like the only thing that changed significantly in 48. The only idea that comes to mind is to double-check that we properly set, reset, and clear all timers on rollback and when ICE restarts fail.
Yeah, I've looked it over, and ICE restart made very few modifications to the nICEr code, which is where all the timers are handled. It could be that we're running into a problem similar to the one found in bug 1279146, where we have some dangling pointers to nICEr objects that we try to use, but so far I have not found a code path that gets from there to this crash. It may be that bug 1279146 is corrupting the heap in a way that causes this bug, but that's a long shot.
Attachment #8763626 - Flags: review?(drno)
Keywords: sec-high
Comment on attachment 8763626 [details] [diff] [review]
Make a timer assert more aggressive

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Very hard, I think.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Not really.
Which older supported branches are affected by this flaw? Unknown; this patch should detect whether canceling an nsTimerImpl fails in rare cases. If that turns out to be the case, it is probably a long-standing bug.
How likely is this patch to cause regressions; how much testing does it need? At most it will cause some null-pointer crashes in rare cases.
Attachment #8763626 - Flags: sec-approval?
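For context, here is a minimal sketch of the diagnostic idea the request describes: clear the callback under a lock on cancel, so that a cancel that failed to take effect shows up as an assertion or null-pointer crash rather than a call through a dangling pointer. This is illustrative only, not the contents of attachment 8763626.

  #include <cassert>
  #include <functional>
  #include <mutex>

  // Illustrative only; not the actual attachment 8763626 change.
  struct DiagnosticTimer {
    std::mutex mLock;
    std::function<void()> mCallback;  // empty after Cancel()

    void Cancel() {
      std::lock_guard<std::mutex> guard(mLock);
      mCallback = nullptr;  // drop the reference instead of leaving it stale
    }

    void Fire() {
      std::lock_guard<std::mutex> guard(mLock);
      // A cancel that failed to take effect now trips an assert (and, in
      // a release build, is simply skipped) rather than calling through a
      // possibly dangling callback.
      assert(mCallback && "timer fired after Cancel()");
      if (mCallback) {
        mCallback();
      }
    }
  };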
Comment on attachment 8763626 [details] [diff] [review]
Make a timer assert more aggressive

Review of attachment 8763626 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8763626 - Flags: review?(drno) → review+
Group: core-security → media-core-security
Comment on attachment 8763626 [details] [diff] [review]
Make a timer assert more aggressive

sec-approval=dveditz
Attachment #8763626 - Flags: sec-approval? → sec-approval+
Keywords: checkin-needed
Comment on attachment 8763626 [details] [diff] [review]
Make a timer assert more aggressive

Approval Request Comment
[Feature/regressing bug #]: If the crash is being caused by flaws in the timer implementation, this problem has likely existed since WebRTC landed.
[User impact if declined]: This could convert a rare UAF crash into a slightly less rare null pointer crash.
[Describe test coverage new/current, TreeHerder]: Given how rare this crash is, it would probably be hard to get coverage.
[Risks and why]: It is possible that we will get null pointer crashes more often, instead of the pretty rare UAF crashes we're seeing.
[String/UUID change made/needed]: None.
Attachment #8763626 - Flags: approval-mozilla-beta?
Attachment #8763626 - Flags: approval-mozilla-aurora?
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Byron, it seems that esr45 is affected, right?
Flags: needinfo?(docfaraday)
Comment on attachment 8763626 [details] [diff] [review]
Make a timer assert more aggressive

Taking it as it fixes a sec-high. Should be in 48 beta 4.
Attachment #8763626 - Flags: approval-mozilla-beta?
Attachment #8763626 - Flags: approval-mozilla-beta+
Attachment #8763626 - Flags: approval-mozilla-aurora?
Attachment #8763626 - Flags: approval-mozilla-aurora+
I have not seen an esr 45 crash, but it is possible it is affected. Also, I should have marked this leave-open, since the patch might not actually fix the root cause. We have to wait and see.
Status: RESOLVED → REOPENED
Flags: needinfo?(docfaraday)
Resolution: FIXED → ---
I see maybe one crash since this landed on 48, and none on 49, for what that is worth.
Whiteboard: [adv-main48+]
Attachment #8763626 - Attachment is obsolete: true
Attachment #8779833 - Flags: review?(drno)
I've run the patch from Try for 40 minutes on a machine where it previously insta-crashed: no problems. On Linux, the same bug would make it i-loop (spin in an infinite loop) for me; that build has also been running for 40 minutes now. The source of the bug was bug 1258753.
Sec-critical, since the dup shows you can insta-crash it with a call through a UAF vtable.
Rank: 10 → 5
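For readers unfamiliar with nICEr: nr_socket calls are dispatched through a per-socket table of function pointers, which is how a stale socket pointer turns into a call through freed memory. A simplified sketch of that dispatch shape (names and signatures abbreviated for illustration; see the nICEr sources for the real definitions):

  #include <cstddef>

  // Simplified shape of nICEr's socket dispatch, for illustration only;
  // the real structs carry more fields and different signatures.
  struct nr_socket_vtbl_sketch {
    int (*sendto)(void* obj, const void* msg, size_t len);
  };

  struct nr_socket_sketch {
    void* obj;                    // implementation-specific state
    nr_socket_vtbl_sketch* vtbl;  // table of function pointers
  };

  int nr_socket_sendto_sketch(nr_socket_sketch* sock, const void* msg,
                              size_t len) {
    // If `sock` has been freed, both `vtbl` and `vtbl->sendto` are loaded
    // from freed memory. An attacker who can reallocate and control that
    // memory controls the indirect-call target, hence sec-critical.
    return sock->vtbl->sendto(sock->obj, msg, len);
  }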
Comment on attachment 8779833 [details] [diff] [review]
Copy this the hard way

Review of attachment 8779833 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8779833 - Flags: review?(drno) → review+
Comment on attachment 8779833 [details] [diff] [review]
Copy this the hard way

Approval Request Comment
[Feature/regressing bug #]: Bug 1258753
[User impact if declined]: We're using a function pointer in freed memory using a fairly simple test-case, which means an arbitrary code execution exploit could be mounted.
[Describe test coverage new/current, TreeHerder]: Getting the timing down has been hard; we've only been able to repro consistently on Windows 10.
[Risks and why]: Pretty low. It is a pretty simple change.
[String/UUID change made/needed]: None.

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Not so easily, because it is hard to work out where the security problem comes in.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Not really. It's pretty subtle.
Which older supported branches are affected by this flaw? 48 and up.
If not all supported branches, which bug introduced the flaw? Bug 1258753.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? I bet this will apply cleanly, but if it does not, it will be easy to make backports.
How likely is this patch to cause regressions; how much testing does it need? Pretty unlikely.
Attachment #8779833 - Flags: sec-approval?
Attachment #8779833 - Flags: approval-mozilla-release?
Attachment #8779833 - Flags: approval-mozilla-beta?
Attachment #8779833 - Flags: approval-mozilla-aurora?
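The patch title hints at the shape of the fix: rather than keeping a pointer into state that an asynchronous callback can outlive, copy the needed data up front. A generic sketch of that pattern, with hypothetical names (this is not the literal patch):

  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical names, illustrating the "copy, don't alias" fix.
  struct PendingSend {
    std::vector<unsigned char> msg;  // owned copy of the payload
    std::string addr;                // owned copy of the destination
  };

  PendingSend QueueSend(const unsigned char* buf, size_t len,
                        const std::string& addr) {
    // Copying "the hard way" here means a timer or callback that runs
    // later never dereferences the caller's possibly-freed buffer or
    // address storage.
    return PendingSend{std::vector<unsigned char>(buf, buf + len), addr};
  }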
I'm confused by the re-opening of this bug. We fixed it in 48 and noted it in the roll up advisory. Did the fix not work correctly?
Flags: needinfo?(docfaraday)
We landed a patch which was described as "This could convert a rare UAF crash into a slightly less rare null pointer crash." Also, Byron meant this to be leave-open, I think (thus his reopening it right after it was closed: "Also, I should have marked this leave-open, since the patch might not actually fix the root cause. We have to wait and see.").

It did quiet down the crashes quite a bit, but not quite 100%. Now that we have a reproducible test case, we were able to track the root cause down. (Ironically, the crash didn't exactly lead us to the source of the problem; it was the i-loop on Linux that pointed directly at the cause.)
I'm also going to second Byron's evaluation: "We're using a function pointer in freed memory using a fairly simple test-case, which means an arbitrary code execution exploit could be mounted." Since it's now known to be easy to force the UAF, and it's a function-pointer UAF, I expect a practical attack could be constructed.
Flags: needinfo?(docfaraday)
I can take it in the 48.0.1 release but I need a sec approval today.
(In reply to Sylvestre Ledru [:sylvestre] from comment #28)
> I can take it in the 48.0.1 release but I need a sec approval today.

Al -- Just want to make sure this is at the top of your queue for this morning. Thanks.
Flags: needinfo?(abillings)
Comment on attachment 8779833 [details] [diff] [review]
Copy this the hard way

sec-approval for this then!
Flags: needinfo?(abillings)
Attachment #8779833 - Flags: sec-approval? → sec-approval+
Comment on attachment 8779833 [details] [diff] [review]
Copy this the hard way

Let's take it in the 48.0.1 dot release.
Attachment #8779833 - Flags: approval-mozilla-release?
Attachment #8779833 - Flags: approval-mozilla-release+
Attachment #8779833 - Flags: approval-mozilla-beta?
Attachment #8779833 - Flags: approval-mozilla-beta+
Attachment #8779833 - Flags: approval-mozilla-aurora?
Attachment #8779833 - Flags: approval-mozilla-aurora+
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Group: media-core-security → core-security-release
Added in the 48.0.1 release notes: "Fix a crash in WebRTC".
Group: core-security-release