Closed Bug 1475431 Opened 3 years ago Closed 3 years ago

Appear.in tab crash in Windows 10 64-Bit clang-cl build

Categories

(Core :: WebRTC, defect, P2)

x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 - wontfix
firefox61 --- wontfix
firefox62 + fixed
firefox63 + fixed

People

(Reporter: ng, Assigned: padenot)

References

Details

(4 keywords, Whiteboard: [fixed on trunk in bug 1476278][post-critsmash-triage][adv-main62+])

Attachments

(4 files)

appear.in call crashes immediately upon granting gUM permissions. No crash is reported in about:crashes. 

Steps to reproduce:
1) On Windows 10 64 bit built with clang-cl, navigate to https://appear.in and enter a room
2) When prompted to grant permission to use microphone (and optionally camera) accept
3) Tab Crash occurs, no crash appears in about:crashes

Expected outcome
There is no tab crash
This was determined via mozregression on mozilla-inbound which lead to the following check-in: https://hg.mozilla.org/mozilla-central/rev/01380eb6a6d0
I followed steps 1 and 2 exactly and no tab crash occurred.

Can you please run your scenario under WinDbg and capture a stack [1] and/or minidump [2]

[1] https://developer.mozilla.org/en-US/docs/Mozilla/How_to_get_a_stacktrace_with_WinDbg
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/Capturing_a_minidump
Flags: needinfo?(na-g)
Crash log, Win 10 64bit (Insider), 64Bit nightly.
Flags: needinfo?(na-g)
I should also have noted that this box is running Windows 10 Insider. I will try reproducing this on another box without Insider.

Requested stack, and logs. I captured the minidump as well.
(2c28.178): Security check failure or stack buffer overrun - code c0000409 (!!! second chance !!!)

Stack:
#226  Id: 2c28.178 Suspend: 1 Teb: 000000c3`c68db000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 000000c3`ca13f7e0 00007ffb`b27140a0 xul!__report_gsfailure(unsigned int64 stack_cookie = 0x0000cd13`ca13f820)+0x1c [f:\dd\vctools\crt\vcstartup\src\gs\gs_report.c @ 220] 
01 000000c3`ca13f820 00007ffb`b271324c xul!`anonymous namespace'::refill(struct cubeb_stream * stm = <Value unavailable error>, void * input_buffer = <Value unavailable error>, long input_frames_count = 0n704, void * output_buffer = <Value unavailable error>, long output_frames_needed = 0n1056)+0x110 [z:\build\build\src\media\libcubeb\src\cubeb_wasapi.cpp @ 522] 
02 (Inline Function) --------`-------- xul!std::unique_ptr<auto_array_wrapper,std::default_delete<auto_array_wrapper> >::operator->+0x5 [z:\build\build\src\vs2017_15.6.6\VC\include\memory @ 2364] 
03 000000c3`ca13f8a0 00007ffb`b2714f89 xul!`anonymous namespace'::refill_callback_duplex(struct cubeb_stream * stm = <Value unavailable error>)+0x14c [z:\build\build\src\media\libcubeb\src\cubeb_wasapi.cpp @ 742] 
04 000000c3`ca13f920 00007ffc`0383c02e xul!`anonymous namespace'::wasapi_stream_render_loop(void * stream = 0x000001df`0090e040)+0x339 [z:\build\build\src\media\libcubeb\src\cubeb_wasapi.cpp @ 875] 
05 000000c3`ca13f9c0 00007ffc`04f33034 ucrtbase!thread_start<unsigned int +0x3e
06 000000c3`ca13f9f0 00007ffb`fc4e367b KERNEL32!BaseThreadInitThunk+0x14
07 (Inline Function) --------`-------- mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,128> >,void +0xe [z:\build\build\src\obj-firefox\dist\include\nsWindowsDllInterceptor.h @ 153] 
08 000000c3`ca13fa20 00007ffc`074c1431 mozglue!patched_BaseThreadInitThunk(int aIsInitialThread = 0n0, void * aStartAddress = 0x00007ffc`0383bff0, void * aThreadParam = 0x000001df`0000fd70)+0xbb [z:\build\build\src\mozglue\build\WindowsDllBlocklist.cpp @ 671] 
09 000000c3`ca13faa0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
If the minidump is needed, let me know how to get it to you.
Mozilla's Google Drive is one option.
We're failing a -GS stack integrity check, hiding the bug for now.
Group: core-security
At the time of the complaint in the minidump, the badness already happened, so this will be tricky to diagnose without sitting in front of your PC.

Any chance you could try capturing this in a Time Travel Trace? https://blogs.msdn.microsoft.com/windbg/2017/08/28/new-windbg-available-in-preview/

To keep the trace file to a sane size you'd probably want to start up the browser, load Appear, and only then attach the recorder to the content process.
[Tracking Requested - why for this release]: sec crash regression in 63 we should keep track of.
Severity: normal → major
Rank: 14
Priority: -- → P2
Just making sure one of you is assigned. Feel free to switch.
Assignee: nobody → dminor
dminor, dmajor...
Assignee: dminor → dmajor
While I'm assuming this is going to track down the root cause for the crash on appear.in do we need a separate bug for looking into why there is no crash report on about:crashes?

My concern here would be if switching to clang builds somehow results in crashes no longer getting reported.
Flags: needinfo?(dmajor)
(In reply to Nils Ohlmeier [:drno] from comment #13)
> While I'm assuming this is going to track down the root cause for the crash
> on appear.in do we need a separate bug for looking into why there is no
> crash report on about:crashes?
> 
> My concern here would be if switching to clang builds somehow results in
> crashes no longer getting reported.

The code considers this failure critical enough that it brings down the process as directly as possible, bypassing exception handlers. This happens for GS failures regardless of compiler. (See also http://www.alex-ionescu.com/?p=69 and bug 1235982 comment 17)

I'm not worried about clang turning off reporting of existing crashes. What is uncertain is whether clang-generated code is hitting GS failures more than MSVC. I don't think we have any way to tell.
Flags: needinfo?(dmajor)
I uploaded a time travel trace to Mozilla's Google Drive.
It's got the same filename and size as the previous minidump, are you sure it's the right file?
Jim, do you know if we get WER data on -GS crashes that deliberately bypass our crash reporter?
Flags: needinfo?(jmathies)
Group: core-security → media-core-security
Attached file asan-output.log
ASAN output
Paul could you take a look at the ASan log in comment 18? In particular this cast looks suspect:  https://searchfox.org/mozilla-central/rev/6f86cc3479f80ace97f62634e2c82a483d1ede40/media/libcubeb/src/cubeb_resampler.cpp#248
Flags: needinfo?(padenot)
Attachment #8992477 - Attachment mime type: text/x-log → text/plain
Thanks all. This is fixed upstream [0], bug for import is bug 1476278. This will uplift cleanly if needed. David, do you think we want this on builds that are still compiling with MSVC (beta, esr, etc.) ? The bug probably exist there as well.

It's going to be really hard for someone to control the values here, they comes from the system. It can be done if a custom audio driver has been installed, I suppose.

[0]: https://github.com/kinetiknz/cubeb/commit/6c4704304288c45d8198173ac4a6fe27d82b1e7c
Flags: needinfo?(padenot)
Flags: needinfo?(achronop)
Attached patch fix.patchSplinter Review
[Security approval request comment]
How easily could an exploit be constructed based on the patch?

This is tripping the stack canary and exiting the process without even touching the crash reporter, so I suppose pretty hard. Also the values written 4 bytes outside the stack are from the system, and cannot be controlled, unless a custom audio driver is being used. 

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

I've disguised this as a style fix, part of a cubeb uplift (that we do very often).

Which older supported branches are affected by this flaw?

I'll let dmajor comment on this, but probably ESR60 and Beta.

If not all supported branches, which bug introduced the flaw?

Bug 1339816 (upstream https://github.com/kinetiknz/cubeb/pull/221)

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This applies cleanly.

How likely is this patch to cause regressions; how much testing does it need?

Unlikely, we'll have confirmation that the fix works well soon.
Assignee: dmajor → padenot
Attachment #8992636 - Flags: sec-approval?
Nico, can you confirm applying this fixes it for you?
Flags: needinfo?(na-g)
(In reply to Paul Adenot (:padenot) from comment #21)
> David, do you think we want this on builds
> that are still compiling with MSVC (beta, esr, etc.) ? The bug probably
> exist there as well.

Uplifting would be the safest option, unless there's some reason not to (e.g. it draws attention to the bug or something)
Paul, I can confirm it no longer crashes with your patch applied.
Flags: needinfo?(na-g)
Duplicate of this bug: 1476110
Daniel, Al is on PTO and it's been a few days, would you mind giving us your judgment on the sec-approval, or finding someone else to do so? It's bringing down the process bypassing the crash-reporter when someone tries to use a microphone on Windows, we'd like to have it sorted out quickly so that the Nightly population can do WebRTC calls on Windows again.

Let me know if you need more info and thanks!
Flags: needinfo?(dveditz)
If you think the value is not controllable it's not sec-critical. I'm suspicious that since the value we're reading out depends on the audio being processed (frame count) and that's up to the page, maybe the other bit being written does too. I'll stick with sec-high out of caution but it could well be sec-moderate.
Comment on attachment 8992636 [details] [diff] [review]
fix.patch

sec-approval=dveditz
Attachment #8992636 - Flags: sec-approval? → sec-approval+
(In reply to Daniel Veditz [:dveditz] from comment #28)
> If you think the value is not controllable it's not sec-critical. I'm
> suspicious that since the value we're reading out depends on the audio being
> processed (frame count) and that's up to the page, maybe the other bit being
> written does too. I'll stick with sec-high out of caution but it could well
> be sec-moderate.

It's a frame count requested by the driver, not by the page. The page cannot decide the buffer size to use here, only the device driver does.
Attached patch uplift.patchSplinter Review
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1339816 (upstream https://github.com/kinetiknz/cubeb/pull/221)
[User impact if declined]: None if we're compiling with MSVC, but we can choose to uplift this if we want to be on the safe side. I'm pretty sure the current code (i.e. without this patch) is wrong but safe.
[Is this code covered by automated tests?]: Yes, however the crash can only be reproduced if compiling with clang-cl 
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: Run any webrtc 
[List of other uplifts needed for the feature/fix]: No
[Is the change risky?]: No
[Why is the change risky/not risky?]: It's very very simple
[String changes made/needed]: No
Attachment #8993734 - Flags: approval-mozilla-beta?
Is attachment 8992636 [details] [diff] [review] the right patch? It looks like it's for a different bug.
Flags: needinfo?(padenot)
It is the right one for central.

But that won’t apply to beta however...
For beta, uplift.patch needs to be used.
Flags: needinfo?(padenot)
Comment on attachment 8993734 [details] [diff] [review]
uplift.patch

Crash fix, has sec-approval, let's uplift for beta 11.
Attachment #8993734 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Why are there two copies of cubeb_resampler.cpp in the tree? Does the other one need to be updated too?
(In reply to David Major [:dmajor] from comment #37)
> Why are there two copies of cubeb_resampler.cpp in the tree? Does the other
> one need to be updated too?

Only one is compiled, the other is an artifact of having cubeb be a dependency of a rust crate.

only the one in media/ is compiled, not the one in third_party/.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
(In reply to Paul Adenot (:padenot) from comment #30)
> It's a frame count requested by the driver, not by the page. The page cannot
> decide the buffer size to use here, only the device driver does.

Would it be predictable given known audio content and the attacker correctly guessing a popular driver?
Group: media-core-security → core-security-release
Whiteboard: [fixed on trunk in bug 1476278]
Setting 63 as fixed as per comment #27
Based on comment 31, it sounds like we can wontfix this for ESR60. Feel free to nominate for uplift you feel strongly otherwise, however.
Target Milestone: --- → mozilla63
Flags: qe-verify-
Whiteboard: [fixed on trunk in bug 1476278] → [fixed on trunk in bug 1476278][post-critsmash-triage]
(In reply to David Major [:dmajor] from comment #17)
> Jim, do you know if we get WER data on -GS crashes that deliberately bypass
> our crash reporter?

We really don't get much of anything from them right now. The data they p[rovide is worthless due to the lack of stacks data and minidumps.
Flags: needinfo?(jmathies)
Whiteboard: [fixed on trunk in bug 1476278][post-critsmash-triage] → [fixed on trunk in bug 1476278][post-critsmash-triage][adv-main62+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.