Closed Bug 1354853 Opened 7 years ago Closed 7 years ago

Categories

(Firefox for Android Graveyard :: Audio/Video, defect, P1)

53 Branch
defect

Tracking

(fennec+, firefox-esr52 wontfix, firefox53 wontfix, firefox54+ fixed, firefox55+ fixed)

RESOLVED FIXED
Firefox 55
Tracking Status
fennec + ---
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 + fixed
firefox55 + fixed

People

(Reporter: bc, Assigned: bechen)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main54+])

Crash Data

Attachments

(1 file, 5 obsolete files)

https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=92ba21762445f89ae0691c4eab0746ca1cb819c2&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-searchStr=autophone

https://treeherder.mozilla.org/logviewer.html#?job_id=89478742&repo=mozilla-aurora&lineNumber=643

Android 7.1 / Pixel device Aurora 53 build crashed with references to deleted memory

REFTEST PROCESS-CRASH | http://10.252.73.230:39858/tests/layout/reftests/ogg-video/poster-15.html == http://10.252.73.230:39858/tests/layout/reftests/ogg-video/poster-ref-green70x30.html | application crashed [@ libc.so + 0x475fa]
Crash dump filename: /tmp/tmpqB6vSh/1559537f-f53c-1b4f-76cb0111-0460c488.dmp
Operating system: Android
                  0.0.0 Linux 3.18.31-g895c4a6 #1 SMP PREEMPT Sun Sep 11 20:29:44 UTC 2016 armv8l
CPU: arm
     ARMv1 Qualcomm part(0x51002050) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
     4 CPUs

GPU: UNKNOWN

Crash reason:  SIGSEGV
Crash address: 0xe5e5e5e5
Process uptime: not available

Thread 0 (crashed)
 0  libc.so + 0x475fa
     r0 = 0xe5e5e5e5    r1 = 0x00000001    r2 = 0x00000000    r3 = 0x00000001
     r4 = 0xe5e5e5e5    r5 = 0xbe884b38    r6 = 0xffe6c3e4    r7 = 0xffe6c4d0
     r8 = 0xbe884b34    r9 = 0xb368a6e8   r10 = 0xea08548c   r12 = 0xe2efad44
     fp = 0xffe6c49c    sp = 0xffe6c3d0    lr = 0xe2e72f1f    pc = 0xeaffb5fa
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::AndroidBridge::RunDelayedUiThreadTasks [Mutex.h:92ba21762445 : 210 + 0x5]
     sp = 0xffe6c3d8    pc = 0xcdb3b66b
    Found by: stack scanning
 2  base.odex + 0x7ca34d
     r4 = 0xec894328    r5 = 0xffe6c760    r6 = 0xffe6c510    r7 = 0xffe6c4d0
     r8 = 0xffe6c760    r9 = 0xea085400    sp = 0xffe6c410    pc = 0xd080d34f
Note we've had reports of this test crashing for some time:
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20summary%3A%22poster-15.html%22
"0xe5e5e5e5" is a UAF marker from Gecko, not the Fennec front-end. Should this be moved into Core?
Flags: needinfo?(ajones)
Filed a public bug to use for sheriffing: bug 1355691
Blake - can you find someone to look into this issue?
Flags: needinfo?(ajones)
tracking-fennec: ? → +
Sorry. I missed this bug. 
Ni John and Benjamin to check.
Flags: needinfo?(jolin)
Flags: needinfo?(bechen)
Priority: -- → P1
Keywords: testcase
At the beginning, I think we should start from bug 1283144, because the link in comment 1 shows that ogg-video/poster-15.html crash again. And also need to know the "shutdown procedure" under media-e10s?
Flags: needinfo?(bechen)
Keywords: testcase
Keywords: testcase
After discuss with :jolin, we believe we have find the root cause.

1.https://searchfox.org/mozilla-central/source/widget/android/nsAppShell.cpp#451
2. https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoThread.java#96 
   https://searchfox.org/mozilla-central/source/widget/android/nsAppShell.cpp#242

The 1:AndroidBridge::DeconstructBridge() race with 2:RunUiThreadCallback(). The first one runs at our gecko thread and the second one runs at java UI thread.

And the crash stack clearly shows that the AndroidBridge::Bridge was deleted because the first line of AndroidBridge::RunDelayedUiThreadTasks is to access a member of AndroidBridge::Bridge.

Also crash scenario always happened at the ogg-video/poster-15.html because it is the last testcase in the list, when finish the last one, the code enter shutdown stage hit the crash.
Flags: needinfo?(jolin)
Attached patch bug1354853.patch (obsolete) — Splinter Review
Assignee: nobody → bechen
Attachment #8863669 - Flags: review?(jolin)
Attached patch bug1354853.patch (obsolete) — Splinter Review
Attachment #8863669 - Attachment is obsolete: true
Attachment #8863669 - Flags: review?(jolin)
Attachment #8863999 - Flags: review?(nchen)
Comment on attachment 8863999 [details] [diff] [review]
bug1354853.patch

Review of attachment 8863999 [details] [diff] [review]:
-----------------------------------------------------------------

Instead of changing `AndroidBridge`, you should add `IsAndroidUiThreadValid()` to `AndroidUiThread`, which returns false if `sThread == null`. Then you can call `IsAndroidUiThreadValid()` inside `GeckoAppShell`.

::: widget/android/AndroidBridge.cpp
@@ +63,5 @@
>  AndroidBridge* AndroidBridge::sBridge = nullptr;
>  static jobject sGlobalContext = nullptr;
>  nsDataHashtable<nsStringHashKey, nsString> AndroidBridge::sStoragePaths;
> +// sBridgeInvalid access only on JAVA UI thread.
> +bool AndroidBridge::sBridgeInvalid = false;

Rename to `sBridgeInvalidForUIThread`

@@ +134,5 @@
>      putenv("NSS_DISABLE_UNLOAD=1");
>  
>      MOZ_ASSERT(!sBridge);
>      sBridge = new AndroidBridge();
> +    sBridgeInvalid = false;

Don't need this line because it's initialized to false.

::: widget/android/AndroidBridge.h
@@ +114,5 @@
>          return sBridge;
>      }
>  
> +    // JAVA UI thread only.
> +    static void SetBridgeInvalid() {

`SetBridgeInvalidForUIThread`

@@ +117,5 @@
> +    // JAVA UI thread only.
> +    static void SetBridgeInvalid() {
> +        sBridgeInvalid = true;
> +    }
> +    static bool IsBridgeInvalid() {

`IsBridgeInvalidForUIThread`
Attachment #8863999 - Flags: review?(nchen) → feedback+
(In reply to Jim Chen [:jchen] [:darchons] from comment #13)
> Comment on attachment 8863999 [details] [diff] [review]
> bug1354853.patch
> 
> Rename to `sBridgeInvalidForUIThread`
> 
> Don't need this line because it's initialized to false.
> 
> `SetBridgeInvalidForUIThread`
> 
> `IsBridgeInvalidForUIThread`

Sorry please ignore these comments.
(In reply to Jim Chen [:jchen] [:darchons] from comment #13)
> Comment on attachment 8863999 [details] [diff] [review]
> bug1354853.patch
> 
> Review of attachment 8863999 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Instead of changing `AndroidBridge`, you should add
> `IsAndroidUiThreadValid()` to `AndroidUiThread`, which returns false if
> `sThread == null`. Then you can call `IsAndroidUiThreadValid()` inside
> `GeckoAppShell`.

 Good idea! However it seems to me that a dedicate flag is still needed because |sThread| is initialized in |CreateOnUiThread::Run()|, which runs on Java UI thread too.
 Also think instead of having |GeckoThreadSupport::RunUiThreadCallback()| validate then run tasks explicitly, we could add a static method AndroidUiThread::RunDelayedTasksIfValid() to hide those details.
Attached patch bug1354853.patch (obsolete) — Splinter Review
Attachment #8863999 - Attachment is obsolete: true
Attachment #8864405 - Flags: review?(nchen)
Attachment #8864405 - Flags: review?(jolin)
Comment on attachment 8864405 [details] [diff] [review]
bug1354853.patch

Review of attachment 8864405 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for getting this done. Please get r+ from Jim before landing.

::: widget/android/AndroidUiThread.cpp
@@ +56,5 @@
> +  static bool SetAndroidUiThreadInvalid() {
> +    return sBridgeValidForUIThread = false;
> +  }
> +  static int64_t RunDelayedTasksIfValid() {
> +    MOZ_ASSERT(AndroidBridge::Bridge());

This assertion would fail when Java UI thread try to run tasks after `AndroidBridge` is destroyed.

@@ +68,2 @@
>  private:
> +  static bool sBridgeValidForUIThread;

Suggest to rename this flag because it has nothing to do with Bridge.
Attachment #8864405 - Flags: review?(jolin) → review+
Attached patch bug1354853.patch (obsolete) — Splinter Review
Attachment #8864405 - Attachment is obsolete: true
Attachment #8864405 - Flags: review?(nchen)
Attachment #8864440 - Flags: review?(nchen)
Comment on attachment 8864440 [details] [diff] [review]
bug1354853.patch

Review of attachment 8864440 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/AndroidUiThread.cpp
@@ +52,5 @@
> +  }
> +  static bool SetAndroidUiThreadValid() {
> +    return sValidForUIThread = true;
> +  }
> +  static bool SetAndroidUiThreadInvalid() {

Use the variable directly. Don't add `IsAndroidUiThreadValid`, `SetAndroidUiThreadValid`, and `SetAndroidUiThreadInvalid`.

@@ +67,2 @@
>  private:
> +  static bool sValidForUIThread;

Rename to `sThreadDestroyed` and put it next to the definition for `sThread`. You only need to set `sThreadDestroyed` to true inside `DestroyOnUiThread::Run` and check it before using `AndroidBridge`.
Attachment #8864440 - Flags: review?(nchen) → review+
Attached patch bug1354853.patch (obsolete) — Splinter Review
Attachment #8864440 - Attachment is obsolete: true
Attachment #8865305 - Flags: review+
Comment on attachment 8865305 [details] [diff] [review]
bug1354853.patch

[Security approval request comment]
Q: How easily could an exploit be constructed based on the patch?
A; I don't think it is easily because the problem occurs at "shutdown" procedure.

Q: Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
A: If the term "use-after-free" expose the flaw, I should change the commit message to "avoid racing on Androidridge object"?

Q:Which older supported branches are affected by this flaw?
A: All.

Q: Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
A: The modification of the patch is simple, should be clean apply all other branches. 

Q: How likely is this patch to cause regressions; how much testing does it need?
A: If the patch introduces regressions, it is reproduce if we run the autophone tests on Android platform.
Attachment #8865305 - Flags: sec-approval?
(In reply to Benjamin Chen [:bechen] from comment #24)
> A: If the term "use-after-free" expose the flaw, I should change the commit
> message to "avoid racing on Androidridge object"?

Yes, please change this.

Will you be checking this in yourself, or asking the sheriffs? If the latter we should get a new patch in so the commit message is usable for them.
Attached patch bug1354853.patchSplinter Review
Remove the "use-after-free" term in commit message.
Attachment #8865305 - Attachment is obsolete: true
Attachment #8865305 - Flags: sec-approval?
Flags: needinfo?(bechen)
Attachment #8865791 - Flags: sec-approval?
Attachment #8865791 - Flags: review+
Comment on attachment 8865791 [details] [diff] [review]
bug1354853.patch

sec-approval=dveditz
Attachment #8865791 - Flags: sec-approval? → sec-approval+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e953db6c88cc
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Group: firefox-core-security → core-security-release
Comment on attachment 8865791 [details] [diff] [review]
bug1354853.patch


Approval Request Comment
[Feature/Bug causing the regression]: bug 1319850
[User impact if declined]: At fennec, might crash when closing app.
[Is this code covered by automated tests?]: Yes, covered by every time when shutting down the testcase.
[Has the fix been verified in Nightly?]: verify it by try server.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d5f5a737af58
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: Low risk.
[Why is the change risky/not risky?]:
Logic of code change is simple
[String changes made/needed]: no
Attachment #8865791 - Flags: approval-mozilla-beta?
Attachment #8865791 - Flags: approval-mozilla-aurora?
See Also: → 1359067
ESR52 was marked as affected and set to tracking+, but we don't actually ship Android releases off that branch. Can we call this bug wontfix for there?
Flags: needinfo?(bechen)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #32)
> ESR52 was marked as affected and set to tracking+, but we don't actually
> ship Android releases off that branch. Can we call this bug wontfix for
> there?

I just download the esr52 codebase, and found the patch can not apply to esr52. It seems that the bug 1319850 doesn't apply to esr52.
So the esr52 "might" has the same issue for android. But if android doesn't run at esr52, we can set it to wontfix.
Flags: needinfo?(bechen)
Comment on attachment 8865791 [details] [diff] [review]
bug1354853.patch

Fix a sec-high. Beta54+. Should be in 54 beta 8.
Attachment #8865791 - Flags: approval-mozilla-beta?
Attachment #8865791 - Flags: approval-mozilla-beta+
Attachment #8865791 - Flags: approval-mozilla-aurora?
Attachment #8865791 - Flags: approval-mozilla-aurora-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main54+]
Group: core-security-release
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: