Closed Bug 1368030 Opened 7 years ago Closed 7 years ago

Intermittent dom/media/tests/mochitest/test_getUserMedia_basicScreenshare.html | application terminated with exit code 5

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

55 Branch
x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 55+ fixed
firefox54 --- wontfix
firefox55 + fixed
firefox56 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: dminor)

References

Details

(Keywords: csectype-uaf, intermittent-failure, sec-high, Whiteboard: [adv-main55+][adv-esr52.3+][post-critsmash-triage])

Attachments

(5 files, 2 obsolete files)

Log contains:

01:45:50     INFO - GECKO(1784) | firefox(1784,0x7fff7c599300) malloc: *** error for object 0xe5e5e5e5e5e5e5e5: pointer being freed was not allocated

so potential UAF?
Group: core-security
Attached file Log snippet
Here's the relevant chunk of the log, for posterity.
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
Version: unspecified → 55 Branch
Group: core-security → media-core-security
Flags: needinfo?(rjesup)
Component: WebRTC → WebRTC: Audio/Video
Flags: needinfo?(rjesup)
Rank: 25
Priority: -- → P2
This is a UAF sec-high; it needs to be a P1
Rank: 25 → 13
Priority: P2 → P1
Attached file crash5.txt
I was able to reproduce and get a callstack by modifying test_getUserMedia_basicScreenshare to run in a loop. Looks like memory corruption to me.

TEST-PASS | /tests/dom/media/tests/mochitest/test_getUserMedia_basicScreenshare.html | ended event successfully fired
Call getUserMedia for {"video":{"mediaSource":"screen"},"fake":false}
libmozglue.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 93139 stopped
* thread #62: tid = 0x445e2b, 0x00000001000935a5 libmozglue.dylib`arena_dalloc(void*, unsigned long) [inlined] arena_run_reg_dalloc(run=0x000000011faee000, bin=<unavailable>, size=<unavailable>) + 83 at mozjemalloc.cpp:2658, name = 'VideoCapture', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000935a5 libmozglue.dylib`arena_dalloc(void*, unsigned long) [inlined] arena_run_reg_dalloc(run=0x000000011faee000, bin=<unavailable>, size=<unavailable>) + 83 at mozjemalloc.cpp:2658 [opt]
   2655		if (elm < run->regs_minelm)
   2656			run->regs_minelm = elm;
   2657		bit = regind - (elm << (SIZEOF_INT_2POW + 3));
-> 2658		MOZ_DIAGNOSTIC_ASSERT((run->regs_mask[elm] & (1U << bit)) == 0);
   2659		run->regs_mask[elm] |= (1U << bit);
   2660	#undef SIZE_INV
   2661	#undef SIZE_INV_SHIFT

Callstack includes https://dxr.mozilla.org/mozilla-central/rev/a49112c7a5765802096b3fc298069b9495436107/dom/media/systemservices/CamerasParent.cpp#774
With https://jsfiddle.net/jib1/01L637Ly/ and debug build I got a different stack.

   ~WriteLockScoped() UNLOCK_FUNCTION() {
=>   rw_lock_.ReleaseLockExclusive();
   }

rw_lock_	webrtc::RWLockWrapper &	0x0000000192a6cd50 : (E5 E5 E5 E5  E5 E5 E5 E5  E5 E5 E5 E5  E5 E5 E5 E5)

Thread 1 Queue : com.apple.main-thread (serial)
#0	0x0000000109c8e2e6 in webrtc::WriteLockScoped::~WriteLockScoped() at /Users/Jan/moz/mozilla-central/media/webrtc/trunk/webrtc/system_wrappers/include/rw_lock_wrapper.h:59
#1	0x0000000109c709e5 in webrtc::WriteLockScoped::~WriteLockScoped() at /Users/Jan/moz/mozilla-central/media/webrtc/trunk/webrtc/system_wrappers/include/rw_lock_wrapper.h:58
#2	0x0000000109d75146 in webrtc::ScreenCapturerHelper::InvalidateRegion(webrtc::DesktopRegion const&) at /Users/Jan/moz/mozilla-central/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_helper.cc:37
#3	0x0000000109d8275e in webrtc::(anonymous namespace)::ScreenCapturerMac::ScreenRefresh(unsigned int, CGRect const*) at /Users/Jan/moz/mozilla-central/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_mac.mm:937
#4	0x0000000109d818b4 in webrtc::(anonymous namespace)::ScreenCapturerMac::ScreenRefreshCallback(unsigned int, CGRect const*, void*) at /Users/Jan/moz/mozilla-central/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_mac.mm:960
#5	0x00007fffe2247308 in display_refresh_proc ()
#6	0x00007fffe2245538 in CGSPostLocalNotification ()
#7	0x00007fffe2244f8f in (anonymous namespace)::notify_datagram_handler(unsigned int, CGSDatagramType, void*, unsigned long, void*) ()
#8	0x00007fffe20f4f39 in CGSDatagramReadStream::dispatch_next_main_queue_datagram() ()
#9	0x00007fffe20f4e09 in ___ZN21CGSDatagramReadStream35dispatch_main_queue_datagrams_asyncEP16dispatch_queue_sPS__block_invoke ()
#10	0x000000010012d6e5 in _dispatch_call_block_and_release ()
#11	0x0000000100123f5c in _dispatch_client_callout ()
#12	0x0000000100132b73 in _dispatch_main_queue_callback_4CF ()
#13	0x00007fffd0884529 in __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ ()
#14	0x00007fffd084546d in __CFRunLoopRun ()
#15	0x00007fffd0844974 in CFRunLoopRunSpecific ()
#16	0x00007fffcfdd0a5c in RunCurrentEventLoopInMode ()
#17	0x00007fffcfdd0799 in ReceiveNextEventCommon ()
#18	0x00007fffcfdd06c6 in _BlockUntilNextEventMatchingListInModeWithFilter ()
#19	0x00007fffce3765b4 in _DPSNextEvent ()
#20	0x00007fffceaf0d6b in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] ()
#21	0x00000001079b9204 in ::-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:](NSEventMask, NSDate *, NSString *, BOOL) at /Users/Jan/moz/mozilla-central/widget/cocoa/nsAppShell.mm:130
#22	0x00007fffce36af35 in -[NSApplication run] ()
#23	0x00000001079bb18c in nsAppShell::Run() at /Users/Jan/moz/mozilla-central/widget/cocoa/nsAppShell.mm:673
#24	0x000000010a4f717b in nsAppStartup::Run() at /Users/Jan/moz/mozilla-central/toolkit/components/startup/nsAppStartup.cpp:283
#25	0x000000010a64c98d in XREMain::XRE_mainRun() at /Users/Jan/moz/mozilla-central/toolkit/xre/nsAppRunner.cpp:4569
#26	0x000000010a64df1a in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) at /Users/Jan/moz/mozilla-central/toolkit/xre/nsAppRunner.cpp:4749
#27	0x000000010a64e69c in XRE_main(int, char**, mozilla::BootstrapConfig const&) at /Users/Jan/moz/mozilla-central/toolkit/xre/nsAppRunner.cpp:4844
#28	0x000000010a661927 in mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) at /Users/Jan/moz/mozilla-central/toolkit/xre/Bootstrap.cpp:45
#29	0x00000001000010d6 in do_main(int, char**, char**) at /Users/Jan/moz/mozilla-central/browser/app/nsBrowserApp.cpp:236
#30	0x0000000100000bb1 in main at /Users/Jan/moz/mozilla-central/browser/app/nsBrowserApp.cpp:309
#31	0x0000000100000ac4 in start ()
This looks like a race between

1. main thread in ScreenCapturerMac::ScreenRefresh waking up on the lock in ScreenCapturerHelper::InvalidateRegion [1], and

2. the VideoCapture thread in ~ScreenCapturerMac (it has called UnregisterRefreshAndMoveHandlers() [2] but it's insufficient).

[1] https://dxr.mozilla.org/mozilla-central/rev/a49112c7a5765802096b3fc298069b9495436107/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_mac.mm#937

[2] https://dxr.mozilla.org/mozilla-central/rev/a49112c7a5765802096b3fc298069b9495436107/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_mac.mm#915-916

Jesup, thoughts on best fix?
Flags: needinfo?(rjesup)
We might need to dispatch something holding it alive to the event thread after removing the listener, or some such (or remove it from that thread).  The Mac docs don't talk to this at all, of course.
Some sort of trampoline object with a locked pointer to the real object that ~realobject can null out might also resolve it.
Flags: needinfo?(rjesup)
Jesup, do you have anyone who can take this?
Flags: needinfo?(rjesup)
dminor, can you take a shot at it?
Flags: needinfo?(rjesup) → needinfo?(dminor)
I'll have a look once I'm back home and have access to my OS X machine.
Assignee: nobody → dminor
Flags: needinfo?(dminor)
(In reply to Dan Minor [:dminor] from comment #10)
> I'll have a look once I'm back home and have access to my OS X machine.

Any timeline?
Flags: needinfo?(dminor)
I spent some time on this last week unsuccessfully. I'll have another look today.
Flags: needinfo?(dminor)
I was unable to reproduce the crash on my machine while running the mochitest in a loop, even when left to run overnight. I also ran it for several hours with my patch applied without any problems. I'd be happier if I could verify this fixes things.
Attachment #8888477 - Flags: review?(rjesup)
Comment on attachment 8888477 [details] [diff] [review]
Fix race condition in ScreenCapturerMac

Review of attachment 8888477 [details] [diff] [review]:
-----------------------------------------------------------------

r+, one nit you may want to address before landing as you perfer

::: media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_mac.mm
@@ +436,5 @@
>  
>  ScreenCapturerMac::~ScreenCapturerMac() {
>    ReleaseBuffers();
> +  rtc::CritScope lock(&screen_callback_data_->crit_sect_);
> +  screen_callback_data_->capturer = nullptr;

Should this be in {}?
Attachment #8888477 - Flags: review?(rjesup) → review+
Comment on attachment 8888477 [details] [diff] [review]
Fix race condition in ScreenCapturerMac

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
I think it would be difficult because the affected code handles screen capture data provided by the OS rather than data from outside sources.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
The comment mentions the race condition, not the UAF. Please let me know if I should redact the comment before landing.

Which older supported branches are affected by this flaw?
All.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
The attached patch will apply with minimal changes.

How likely is this patch to cause regressions; how much testing does it need?
Seems unlikely, this is in shutdown code. I tested it by running a mochitest in a loop for 2 or 3 hours that simulated the shutdown condition without seeing any problems.
Attachment #8888477 - Flags: sec-approval?
Comment on attachment 8888477 [details] [diff] [review]
Fix race condition in ScreenCapturerMac

sec-approval+ for trunk.
We'll want Beta and ESR52 patches made and nominated ASAP as well.
Attachment #8888477 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a60cc1983211905a5bc0aaf9e3414469add6b91

This is going to need rebased patches for Beta & ESR52.
Flags: needinfo?(dminor)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0a60cc198321
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Attached patch bug-1368030-fix-beta.patch (obsolete) — Splinter Review
Approval Request Comment
[Feature/Bug causing the regression]:
This is a bug in the upstream webrtc.org code. It is likely that we have a different threading model which causes us to see this.
[User impact if declined]:
Crashes / potential security exploits.
[Is this code covered by automated tests?]:
Yes.
[Has the fix been verified in Nightly?]:
It has just merged to nightly, not verified there yet.
[Needs manual test from QE? If yes, steps to reproduce]:
No. 
[List of other uplifts needed for the feature/fix]:
None.
[Is the change risky?]:
No.
[Why is the change risky/not risky?]:
This is a small change to the screen capture code that is covered by automated tests.
[String changes made/needed]:
None.
Flags: needinfo?(dminor)
Attachment #8889898 - Flags: approval-mozilla-beta?
had to back this out, turned out this cause frequent failures like https://treeherder.mozilla.org/logviewer.html#?job_id=117543431&repo=mozilla-inbound
Status: RESOLVED → REOPENED
Flags: needinfo?(dminor)
Resolution: FIXED → ---
Flags: needinfo?(dminor)
Attachment #8889898 - Flags: approval-mozilla-beta?
Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/base/criticalsection.cc, line 141

See stack backtrace in the log
When screen_callback_data_ is deleted in the callback, the rtc::CritScope object goes out of scope after the deletion occurs, triggering the assertion here. Switching to using explicit Enter() and Leave() calls in the callbacks fixes things.

I forgot to set --enable-debug in my mozconfig which would have made this easy to notice the first time around. Sorry!
Attachment #8888477 - Attachment is obsolete: true
Attachment #8890495 - Flags: review?(rjesup)
Attachment #8890495 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/baa75a499752
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Depends on: 1384874
Approval Request Comment
[Feature/Bug causing the regression]:
This is a bug in the upstream webrtc.org code. It is likely that we have a different threading model which causes us to see this.
[User impact if declined]:
Crashes / potential security exploits.
[Is this code covered by automated tests?]:
Yes.
[Has the fix been verified in Nightly?]:
Yes.
[Needs manual test from QE? If yes, steps to reproduce]:
No. 
[List of other uplifts needed for the feature/fix]:
None.
[Is the change risky?]:
No.
[Why is the change risky/not risky?]:
This is a small change to the screen capture code that is covered by automated tests.
[String changes made/needed]:
None.
Attachment #8889898 - Attachment is obsolete: true
Attachment #8890850 - Flags: approval-mozilla-beta?
Comment on attachment 8890850 [details] [diff] [review]
bug-1368030-fix-beta.patch

webrtc fix, sec-high, should be in 55.0b13
Attachment #8890850 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined:
Crashes / potential security exploits. 
Fix Landed on Version:
56
Risk to taking this patch (and alternatives if risky): 
Low risk, covered by automated tests.
String or UUID changes made by this patch: 
None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8890884 - Flags: approval-mozilla-esr52?
Comment on attachment 8890884 [details] [diff] [review]
bug-1368030-fix-esr52.patch

Going into trunk and beta, we should take it on ESR52 as well for consistency and protection.
Attachment #8890884 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Whiteboard: [adv-main55+][adv-esr52.3+]
Group: media-core-security → core-security-release
Flags: qe-verify-
Whiteboard: [adv-main55+][adv-esr52.3+] → [adv-main55+][adv-esr52.3+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.