Closed Bug 1367850 Opened 2 years ago Closed 2 years ago

REFTEST PROCESS-CRASH | http://10.252.73.233:33649/tests/dom/media/test/crashtests/1236639.html | application crashed [@ libc.so + 0x4027e]

Categories

(Firefox for Android :: General, defect)

54 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 56
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 + fixed
firefox56 + fixed

People

(Reporter: bc, Assigned: rbarker)

References

Details

(Keywords: csectype-uaf, sec-high, Whiteboard: [adv-main55+])

Attachments

(2 files, 2 obsolete files)

I'm not completely sure this isn't a dupe.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=4ac208f317b37f7ed2e669d5ca23064ebc979103&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-searchStr=autophone&selectedJob=97735888

https://treeherder.mozilla.org/logviewer.html#?job_id=97735888&repo=mozilla-beta&lineNumber=870

REFTEST PROCESS-CRASH | http://10.252.73.233:33649/tests/dom/media/test/crashtests/1236639.html | application crashed [@ libc.so + 0x4027e]
Crash dump filename: /tmp/tmpCviVP9/34e795c8-c451-2540-c2ae-f92c6142f83a.dmp
Operating system: Android
                  0.0.0 Linux 3.10.73-g9741316 #1 SMP PREEMPT Thu Nov 5 02:17:00 UTC 2015 armv8l
CPU: arm
     ARMv1 ARM part(0x4100d030) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,vfpv4,idiva,idivt
     8 CPUs

GPU: UNKNOWN

Crash reason:  SIGSEGV
Crash address: 0xe5e5e5e5
Process uptime: not available

Thread 0 (crashed)
 0  libc.so + 0x4027e
     r0 = 0xe5e5e5e5    r1 = 0x00000000    r2 = 0xd1e37728    r3 = 0xe02cb24c
     r4 = 0xe5e5e5e5    r5 = 0xffb07e10    r6 = 0xffb07e14    r7 = 0xd1e37728
     r8 = 0xffb07e00    r9 = 0xbe414080   r10 = 0xffb07e08   r12 = 0xef37ad5c
     fp = 0xdf8c3c70    sp = 0xffb07dc0    lr = 0xef2acac3    pc = 0xf6fe727e
    Found by: given as instruction pointer in context
 1  libnss3.so!PR_Lock [ptsynch.c:4ac208f317b3 : 177 + 0x5]
     sp = 0xffb07dd0    pc = 0xef2acac3
    Found by: stack scanning
 2  libxul.so!mozilla::OffTheBooksMutex::Lock [BlockingResourceBase.cpp:4ac208f317b3 : 382 + 0x5]
     r4 = 0xd1e37728    sp = 0xffb07dd8    pc = 0xdd4ea083
    Found by: call frame info
 3  libxul.so!mozilla::AndroidBridge::RunDelayedUiThreadTasks [Mutex.h:4ac208f317b3 : 210 + 0x5]

This is not bug 1354853 which is resolved on beta and has a different stack.

I'm going to use bug 1290569 to star this but hide the exploitablity.
Looks like a UAF in some Android widget code. Any ideas, Jim?
Component: Audio/Video: Playback → General
Flags: needinfo?(nchen)
Product: Core → Firefox for Android
Flags: needinfo?(nchen) → needinfo?(rbarker)
Flags: needinfo?(rbarker)
Didn't mean to cancel the ni?

Is there more info somewhere other than that stack trace? Just guessing with out more info, since this is the last test to run could this crash be on shutting down fennec? I believe that is what the log shows? I would assume RunDelayedUiThreadTasks is getting invoked after shutdown? I guess I can look as see if there is a way to prevent that? But under normal operation, since android just kills fennec and it normally does not shutdown cleanly, I don't see this getting hit often. I ran locally multiple times and was unable to reproduce it.
Just the logcat beginning at https://treeherder.mozilla.org/logviewer.html#?job_id=97735888&repo=mozilla-beta&lineNumber=7466
Unfortunately we were still getting our logcat messages trimmed by chatty and I don't see anything. Maybe you will.

I haven't seen this again since it happened.
Looking at the code I'm fairly certain this is a race condition at shutdown. I'm thinking it may be a stale cache value on the UI thread? This call is guarded by two variables but neither one is protected by a lock nor are they atomic. Since both values are set in the main thread I would guess it is possible this guard fails at shutdown? I can make the boolean atomic and see if this still happens?
That sounds like it won't hurt and can only help. I'll keep an eye out and if we don't see it again in a while (months?) we can resolve it.
Assignee: nobody → rbarker
Ugly fix but it is the only solution I could come up with.

Problem:

The while loop in AndroidBridge::RunDelayedUiThreadTasks locks and unlock the Mutex mUiTaskQueueLock as it runs tasks in the UI thread. The AndroidUiThread destroys it self in this while loop with a task. The task is synchronous. Once the task has run, the main thread proceeds to destroy the AndroidBridge along with the mUiTaskQueueLock. When the MutexAutoLock and MutexAutoUnlock objects in AndroidBridge::RunDelayedUiThreadTasks try to lock/unlock the Mutex when they are destroyed, it is possible they are operating on a dangling pointer since the Mutex was already destroyed in the main thread. This is basically a race.

Solution:

When the AndroidUiThread gets destroyed, flush the queue and have the MutexAutoLock and MutexAutoUnlock forget the Mutex so they do not try and manipulate a deleted Mutex and return immediately.

This probably doesn't happen very often in the wild since most users never shutdown fennec.
Attachment #8873644 - Flags: review?(nchen)
Since this is a racey crash on exit, is it a security issue?
What do you think about moving `mUiTaskQueueLock` out of `AndroidBridge`, and just not free the lock when we shut down? We already do something similar in `nsAppShell` [1], where the lock gets initialized on startup but is never deleted on shutdown.

[1] https://dxr.mozilla.org/mozilla-central/rev/bdb2387396b4a74dfefb7c983733eed3625e906a/widget/android/nsAppShell.cpp#89
Flags: needinfo?(rbarker)
(In reply to Jim Chen [:jchen] [:darchons] from comment #9)
> What do you think about moving `mUiTaskQueueLock` out of `AndroidBridge`,
> and just not free the lock when we shut down? We already do something
> similar in `nsAppShell` [1], where the lock gets initialized on startup but
> is never deleted on shutdown.
>

Perhaps just removing AndroidBridge::PostTaskToUiThread() and moving RunDelayedUiThreadTasks, mUiTaskQueue, and its lock to AndroidUiThread makes the most sense. There is only one external call to AndroidBridge::PostTaskToUiThread so I could just change that to mozilla::GetAndroidUiThread()->Dispatch(). Then I think the mUiTaskQueue Mutex could be deleted safely in AndroidUiThread.
Flags: needinfo?(rbarker)
Comment on attachment 8873644 [details] [diff] [review]
0001-Bug-1367850-Flush-UI-task-queue-after-AndroidUiThread-has-been-shutdown-r-jchen-17060116-21a60aa0e51b.patch

Clearing review until I can create a new patch based on feedback.
Attachment #8873644 - Flags: review?(nchen)
Attachment #8873917 - Flags: review?(nchen) → review+
Comment on attachment 8873917 [details] [diff] [review]
0001-Bug-1367850-Move-Android-UI-thread-runnable-queue-from-AndroidBridge-to-AndroidUiThread-r-jchen-17060211-74bc139734bf.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

I don't see how an exploit could be created.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

I'm not sure there is a security problem.

Which older supported branches are affected by this flaw?

introduced in 53.

If not all supported branches, which bug introduced the flaw? Bug 1319850

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

No, I don't think it is an exploitable flaw since it only happens when Fennec is shutdown which really doesn't happen much in the wild.

How likely is this patch to cause regressions; how much testing does it need?

I think it is unlikely to cause a regression.
Attachment #8873917 - Flags: sec-approval?
So I'm pretty sure this is the same bug as Bug 1354853. However I don't believe the patch for that bug actually fixed the problem.
This has missed Firefox 54. We've already done our last beta. I'm giving sec-approval for checkin on June 27 but not before then. That's two weeks into the next cycle. At that point, we'll want it onBeta (55) as well so please nominate a patch when you do the trunk checkin.
Attachment #8873917 - Flags: sec-approval? → sec-approval+
Group: core-security → firefox-core-security
Duplicate of this bug: 1322781
Attached patch Rebased patch for Beta (obsolete) — Splinter Review
Approval Request Comment
[Feature/Bug causing the regression]: Adding support for nsThread to Android UI thread
[User impact if declined]: Possible use after free on shutdown
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: Just prevents mutex from being freed prematurely 
[String changes made/needed]: none
Attachment #8881541 - Flags: approval-mozilla-beta?
Randall pushed this to inbound earlier today.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6f8452cbd4b
Whiteboard: [checkin on 6/27]
https://hg.mozilla.org/mozilla-central/rev/a6f8452cbd4b
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
Group: firefox-core-security → core-security-release
Attachment #8881541 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
This got backed out in https://hg.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136 for causing bug 1376670.
Status: RESOLVED → REOPENED
Flags: needinfo?(rbarker)
Resolution: FIXED → ---
Target Milestone: Firefox 56 → ---
I'm looking into now.
Flags: needinfo?(rbarker)
I've run my new patch through try four times and have not seen the try failures. Do I just push my patch again to inbound or do I need approval? Try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=43321b4ce35e93cae47e5b893c554b83379b228d
You don't need to get approval again. Push when ready :)
Autophone doesn't invoke from -u all. It couldn't handle all of the load from everyone running -u all. To run the test which failed here use try: -b do -p android-api-15 -u autophone-crashtest-dom-media -t none.

I manually invoked the test on your try build. You can see the appropriate test as Cdm1.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=43321b4ce35e93cae47e5b893c554b83379b228d&filter-searchStr=Cdm1

If you want to rerun your test without submitting a new try build, just retrigger it. I retriggered it for a total of 4 times.
https://hg.mozilla.org/mozilla-central/rev/c803c00aed9b
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
Does beta need an updated patch as well?
Flags: needinfo?(rbarker)
(In reply to Julien Cristau [:jcristau] from comment #27)
> Does beta need an updated patch as well?

Yes, I was waiting to see if it gets backed out again since I wasn't 100% confident that I fixed the issue. There is an API change so it won't apply cleanly. I'll rebase and update the beta patch.
Flags: needinfo?(rbarker)
This is the updated version of the patch currently in mc, fixed for Beta.
Attachment #8881541 - Attachment is obsolete: true
Comment on attachment 8883640 [details] [diff] [review]
V2 of Rebased patch for Beta

carrying over beta approval to the new version of this patch
Attachment #8883640 - Flags: approval-mozilla-beta+
(In reply to Randall Barker [:rbarker] from comment #18)
> [Is this code covered by automated tests?]: no
> [Has the fix been verified in Nightly?]: no
> [Needs manual test from QE? If yes, steps to reproduce]: no

Setting qe-verify- based on Randall's assessment on manual testing needs.
Flags: qe-verify-
(In reply to Andrei Vaida [:avaida], Desktop Release QA – please ni? me from comment #32)
> (In reply to Randall Barker [:rbarker] from comment #18)
> > [Is this code covered by automated tests?]: no
> > [Has the fix been verified in Nightly?]: no
> > [Needs manual test from QE? If yes, steps to reproduce]: no
> 
> Setting qe-verify- based on Randall's assessment on manual testing needs.

Verifying it will be difficult. The original bug would intermittently occur on shutdown which general only happens in tests since normal Fennec operation is for the OS to just kill the app. Additionally, if you use an add-on to exit, if the shutdown is taking too long, the OS will kill the app anyway. The best you can probably do is shutdown Fennec a lot with an add-on and ensure that it doesn't crash releasing the mutex after invoking a runnalbe in the UI thread. It wouldn't surprise me if there were other crashes on shutdown since testing shutdown in Fennec is problematic.
Whiteboard: [adv-main55+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.