Closed
Bug 1425930
Opened 3 years ago
Closed 3 years ago
Crash in @0x0 | nsScreenAndroid::GetDensity
Categories
(GeckoView :: General, defect, P1)
Tracking
(firefox-esr6061+ fixed, firefox57 unaffected, firefox58 wontfix, firefox59 wontfix, firefox60 wontfix, firefox61+ fixed, firefox62+ fixed)
RESOLVED
FIXED
mozilla62
People
(Reporter: philipp, Assigned: jesup)
References
Details
(4 keywords, Whiteboard: [adv-main61+][adv-esr60.1+][post-critsmash-triage])
Crash Data
Attachments
(2 files, 2 obsolete files)
1.83 KB,
patch
|
jesup
:
review+
abillings
:
approval-mozilla-beta+
abillings
:
approval-mozilla-esr60+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
1.32 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-59230ead-9e9c-43d7-9664-b70fe0171206. ============================================================= Top 10 frames of crashing thread: 0 @0x0 1 libxul.so nsScreenAndroid::GetDensity widget/android/nsScreenManagerAndroid.cpp 2 libxul.so mozilla::GeckoScreenOrientation::OnOrientationChange widget/android/GeckoScreenOrientation.h:47 3 libxul.so mozilla::jni::detail::ProxyNativeCall<mozilla::GeckoScreenOrientation, mozilla::java::GeckoScreenOrientation, true, false, short, short>::operator widget/android/jni/Natives.h:418 4 libxul.so mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::PutEventInternal xpcom/threads/ThreadEventQueue.cpp:116 5 libxul.so mozilla::detail::RunnableFunction<mozilla::jni::detail::ProxyNativeCall<mozilla::GeckoScreenOrientation, mozilla::java::GeckoScreenOrientation, true, false, short, short> >::Run xpcom/threads/nsThreadUtils.h:529 6 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1037 7 libxul.so mozilla::detail::VariantImplementation<unsigned char, 0, const int, const char*, void > > mfbt/Variant.h:219 8 libxul.so nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::DestructRange xpcom/ds/nsTArray.h:562 9 libxul.so wcsrtombs ============================================================= crashes with this signature are newly appearing in fennec 58 - they are fairly low volume though. the user comment to this crash report says: "occurred when the phone unlocked and rotated simultaneously"
Reporter | ||
Comment 1•3 years ago
|
||
the stack of this signature looks similar
Crash Signature: [@ @0x0 | nsScreenAndroid::GetDensity] → [@ @0x0 | nsScreenAndroid::GetDensity]
[@ mozilla::ObserverList<T>::Broadcast ]
Comment 2•3 years ago
|
||
http://bit.ly/2BmOHxA shows a mix of devices, including some Samsung. ni on Ioana to see if rotating the device after unlocking reproduces the issue (assuming you have one of the devices listed in the crash reports)
Flags: needinfo?(ioana.chiorean)
Comment 3•3 years ago
|
||
Tried with 58.0b14 build 2 on the following devices: - Galaxy J7 ( saw J5 in the report but we don't have it) Android 6.0.1 - Galaxy note 3 ( saw Note 4 in the report but we don't have it) Android 5.0 but I was not able to reproduce it at all. Steps: - rotate the device immediately after unlocking phone
Flags: needinfo?(ioana.chiorean)
Updated•3 years ago
|
Comment 4•3 years ago
|
||
I was able to reproduce this with Samsung Galaxy Note 4 (Android 5.0.1) on latest Nightly build (01/10) and beta 58.0b15. Steps to reproduce: 1. Go to https://browserengine.net/category/firefox/ and wait for page to load; 2. Change device orientation. (1-2 times) Expected result: Firefox opens the page without issues. Actual result: After changing device orientation the page becomes white and Firefox crashes. Notes: - crash report Nightly: https://crash-stats.mozilla.com/report/index/2ac0dbb2-d710-43e8-a546-9d78b0180111 - crash report Beta: https://crash-stats.mozilla.com/report/index/90a808f0-44b7-41bd-af78-030d00180111
Updated•3 years ago
|
Priority: -- → P1
Assignee: nobody → snorp
Neither of the crashes in comment #4 are the crash signature for this bug, and I can't seem to reproduce it locally. GeckoScreenOrientation::OnOrientationChange does not call nsScreenManagerAndroid::GetDensity(), so I think the stack may not quite be right. The stack also indicates that we're trying to deref 0x0, which is not possible in GetDensity() as it only dereferences this+mDensity and this+mDisplayType. Also, we guard against a null screen in OnOrientationChange(). Investigating some more...
Crash Signature: [@ @0x0 | nsScreenAndroid::GetDensity]
[@ mozilla::ObserverList<T>::Broadcast ] → [@ @0x0 | nsScreenAndroid::GetDensity]
[@ mozilla::ObserverList<T>::Broadcast ]
[@ mozilla::GeckoScreenOrientation::OnOrientationChange ]
The crash signature I just added has more strange stacks. It sorta looks like hal::NotifyScreenConfigurationChange() goes off into the weeds somehow?
We need industrial strength debugging support in the form of jchen...
Flags: needinfo?(nchen)
Updated•3 years ago
|
Comment 8•3 years ago
|
||
I see a lot of possible use-after-free in the crash reports.
Group: firefox-core-security
Comment 9•3 years ago
|
||
I think mozilla::ObserverList<T> is just prone to use-after-free. If an observer is ever removed from the list during a broadcast, we could end up using the observer after it's been freed.
Flags: needinfo?(nchen)
Updated•3 years ago
|
Updated•3 years ago
|
Crash Signature: [@ @0x0 | nsScreenAndroid::GetDensity]
[@ mozilla::ObserverList<T>::Broadcast ]
[@ mozilla::GeckoScreenOrientation::OnOrientationChange ] → [@ @0x0 | nsScreenAndroid::GetDensity]
[@ mozilla::ObserverList<T>::Broadcast ]
[@ mozilla::GeckoScreenOrientation::OnOrientationChange ]
[@ @0x0 | mozilla::GeckoScreenOrientation::OnOrientationChange]
Assignee | ||
Comment 10•3 years ago
|
||
I'll note the crash rate seems to be spiking... jchen? thoughts?
Flags: needinfo?(nchen)
Comment 11•3 years ago
|
||
I haven't looked at this recently. Someone should verify if comment 9 is accurate or not.
Flags: needinfo?(nchen)
Assignee | ||
Comment 12•3 years ago
|
||
(In reply to Jim Chen [:jchen] [:darchons] from comment #9) > I think mozilla::ObserverList<T> is just prone to use-after-free. If an > observer is ever removed from the list during a broadcast, we could end up > using the observer after it's been freed. ObserverList<T> clones the list into another nsCOMArray before walking it to do notifies - i.e. removal from the list during NotifyObservers() is totally safe.
Flags: needinfo?(nchen)
Comment 13•3 years ago
|
||
I think you're talking about `nsObserverList`? `ObserverList<T>` is defined in xpcom/ds/Observer.h. The problem with `ObserverList<T>` is that it doesn't ensure validity of the observer pointers, so even though it makes a copy of the array during broadcast [1], the pointers themselves are not guaranteed to be valid throughout the broadcast. [1] https://searchfox.org/mozilla-central/rev/00dd116638001772fe354b081353b73f1cad405d/xpcom/ds/Observer.h#70
Flags: needinfo?(nchen)
Assignee | ||
Comment 14•3 years ago
|
||
Ah. Wonderful naming there..... :-( and none of this is refcounted... It's only used in midi and Hal.cpp. There are two options: block RemoveObserver() from within Notify(), and handle it safely. (And it's not sure this is the cause of the bug, but it's probably good to do one of those.)
Assignee | ||
Comment 15•3 years ago
|
||
Attachment #8976406 -
Flags: review?(nfroyd)
Assignee | ||
Comment 16•3 years ago
|
||
Attachment #8976407 -
Flags: review?(nfroyd)
Assignee | ||
Comment 17•3 years ago
|
||
Note that blocking it will likely result in a bunch of assertion crashes, given the signature for Broadcast... very clear UAFs. Though it would be nice to try to provoke it and see why we're looping through to RemoveObserver, and if it's a bug or expected/correct.
Assignee | ||
Comment 18•3 years ago
|
||
Note: patches aren't tested yet, but they compile (modulo I changed some comments since compiling)
![]() |
||
Comment 19•3 years ago
|
||
Sigh, bug 678694 comment 52 and bug 678694 comment 53 said we were aware of this sort of problem, but we didn't do anything about it. I'm assuming the crashes are coming because we're iterating through the copy, and somebody RemoveObserver()'s an Observer that's not been notified yet. Then we get to the removed observer in our copy of the observers, and we crash? I'm not super keen on attachment 8976406 [details] [diff] [review] blocking notifications, because we'll just keep crashing in this scenario, AIUI. Unless we want to commit it as a way of figuring out what exactly is going on (and protect ourselves from future misuses of the API?) and then we can either decide to fix the uses of ObserverList and/or fix ObserverList itself to not UAF? Or can we push that patch to try and attempt to reproduce the crash with a patched build so we can understand the problem? I think attachment 8976407 [details] [diff] [review] (or something very much like it) is the right way to go (or is at least a reasonable attempt at fixing the bug), but I'm not quite sure what we're trying to do here...
Flags: needinfo?(rjesup)
Comment 20•3 years ago
|
||
This is currently the #10 topcrash on Fennec 61 Beta, FWIW.
Assignee | ||
Comment 21•3 years ago
|
||
Blocking (first patch) would be if we decide the API is "you aren't allowed to call RemoveObserver() inside Notify()", and would help us catch any mis-use (and be safe if one escapes to the wild). Handling RemoveObserver in Notify (second patch) would make it allowed and safe. Probably this is a better solution, overall. It does seem likely that the bug is, in fact, RemoveObserver() called within Notify from looking at the reports, then we crash. There is one other possibility here: this is not a thread-safe interface in any way. If another thread removes an observer while we're broadcasting, this fails badly in all sorts of ways. The Blocking patch might help diagnose how that's occurring, if it is. We could also add thread-safety checks - however, all the Remove/AddObserver calls in Hal.cpp have an AssertMainThread(), so I don't think this is happening. Also most (but not all) Broadcast calls are preceded by an AssertMainThread(). One comment that does concern me however is in NotifySwitchChange: // When callback this notification, main thread may call unregister function first. We should check if this pointer is valid. Others that don't assert: NotifyScreenConfigurationChange(), NotifyNetworkChange(). We could assert mainthread on those and see what happens.... I haven't looked at the Midi code
Flags: needinfo?(rjesup)
Assignee | ||
Comment 22•3 years ago
|
||
This affects ESR60 (do we do ESR for Fennec??). I suppose it might affect the Midi code, if that's enabled. We should fix this in 61 beta.
status-firefox-esr60:
--- → affected
Keywords: csectype-uaf,
sec-high
![]() |
||
Comment 23•3 years ago
|
||
Asserting on removals during iterating so we could get more information sounds attractive, but given that this isn't really happening on nightly or beta, I think we should just deal with removal during iteration. If we want to add some thread assertions somewhere (here or in a separate bug) that's fine with me.
![]() |
||
Comment 24•3 years ago
|
||
Comment on attachment 8976407 [details] [diff] [review] Handle Broadcast()->Notify() calling RemoveObserver() Review of attachment 8976407 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform-linux-android.cpp @@ +306,5 @@ > if (mSamplerTid == -1) { > mSamplerTid = gettid(); > } > int sampleeTid = aRegisteredThread.Info()->ThreadId(); > + MOZ_ASSERT(sampleeTid != mSamplerTid); I don't understand this change; was this for a separate bug? ::: xpcom/ds/Observer.h @@ +56,5 @@ > * @return Whether the observer has been found in the list. > */ > bool RemoveObserver(Observer<T>* aObserver) > { > + if (mObservers.RemoveElement(aObserver)) { Do we care if somebody adds the same observer twice? (Presumably they would remove it twice as well...?
Attachment #8976407 -
Flags: review?(nfroyd) → review+
![]() |
||
Comment 25•3 years ago
|
||
Comment on attachment 8976406 [details] [diff] [review] Block Broadcast()->Notify() calling RemoveObserver() See comment 23 for why we're not taking this path.
Attachment #8976406 -
Flags: review?(nfroyd)
Assignee | ||
Comment 26•3 years ago
|
||
> ::: tools/profiler/core/platform-linux-android.cpp > > + MOZ_ASSERT(sampleeTid != mSamplerTid); > > I don't understand this change; was this for a separate bug? Oops, ignore, change got swept up in qnew. I'll split that off to the other patch I was working on. > > + if (mObservers.RemoveElement(aObserver)) { > > Do we care if somebody adds the same observer twice? (Presumably they would > remove it twice as well...? I don't see a need to arbitrarily block it, though it's unlikely to be useful barring a few edge cases. If they add it twice, they need to remove it twice, so all the logic holds. It isn't blocked today. In general with other lists of this nature we're dealing with reference-counted objects, so people generally assume it's safe to have the same item in the list twice (and remove it twice), and these are unordered lists.
Assignee | ||
Comment 27•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Attachment #8976407 -
Attachment is obsolete: true
Assignee | ||
Updated•3 years ago
|
Assignee: snorp → rjesup
Status: NEW → ASSIGNED
Assignee | ||
Comment 28•3 years ago
|
||
Comment on attachment 8977030 [details] [diff] [review] Handle Broadcast()->Notify() calling RemoveObserver() [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: User impact if declined: UAFs Fix Landed on Version: will be 62 and 61 Risk to taking this patch (and alternatives if risky): Low String or UUID changes made by this patch: none Approval Request Comment [Feature/Bug causing the regression]: bug 678694 (Battery API) [User impact if declined]: Android UAF; might be hard to exploit if it's on shutdown [Is this code covered by automated tests?]: yes (never caught this) [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: no str [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: No [Why is the change risky/not risky?]: Adds safety checks, simple patch. [String changes made/needed]: none [Security approval request comment] How easily could an exploit be constructed based on the patch? Not clear; they'd have to figure out how to leverage a Remove-during-Notify. If these are only on shutdown, tough. If causable at other times, might be possible especially given lack of e10s. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? More or less they make it clear what can happen (especially the comment, but not hard to infer from the code why it's doing this). Which older supported branches are affected by this flaw? All If not all supported branches, which bug introduced the flaw? Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should apply directly How likely is this patch to cause regressions; how much testing does it need? Low risk. Should get some normal field exposure before release, which will also help verify the fix.
Attachment #8977030 -
Flags: sec-approval?
Attachment #8977030 -
Flags: review+
Attachment #8977030 -
Flags: approval-mozilla-esr60?
Attachment #8977030 -
Flags: approval-mozilla-beta?
Comment 29•3 years ago
|
||
sec-approval+ and I'll give the other approvals.
Updated•3 years ago
|
Attachment #8977030 -
Flags: sec-approval?
Attachment #8977030 -
Flags: sec-approval+
Attachment #8977030 -
Flags: approval-mozilla-esr60?
Attachment #8977030 -
Flags: approval-mozilla-esr60+
Attachment #8977030 -
Flags: approval-mozilla-beta?
Attachment #8977030 -
Flags: approval-mozilla-beta+
Updated•3 years ago
|
Attachment #8976406 -
Attachment is obsolete: true
Comment 30•3 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dce0b1b22af0eaaa01030de35c97cd3af473305
Group: firefox-core-security → core-security-release
Backed out for causing build bustage: https://hg.mozilla.org/integration/mozilla-inbound/rev/b19af5ec19d98a3a0ccfd5244a01f95c94162edf Push with bustage: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4dce0b1b22af0eaaa01030de35c97cd3af473305&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Flags: needinfo?(rjesup)
Assignee | ||
Comment 32•3 years ago
|
||
relanded with bustage fix: https://hg.mozilla.org/integration/mozilla-inbound/rev/9d111a95cfbbe2dea1427d22ab342a584d9c8af6
Flags: needinfo?(rjesup)
![]() |
||
Comment 33•3 years ago
|
||
Backed out for build bustages at ../testing/xpcshell/selftest.py::XPCShellTestsTests::testAddTaskSki Push that caused the failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=9d111a95cfbbe2dea1427d22ab342a584d9c8af6 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=179510872&repo=mozilla-inbound&lineNumber=36667 Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/1f0b7e9a6450c28f816ca52d8809c41b6b87b3e5 The XPCShell test logs seem to indicate that we're removing the element from mObservers, and then we're trapping in an invalid array index access in mBroadcastCopy. Maybe we're *also* adding observers during Broadcast? (!) I think the straightforward fix is to check for NoIndex. But I'd like to understand whether the HAL code is doing normal things or dodgy things before applying that fix, which is why apavel and I decided to back it out, rather than apply the bandaid.
Flags: needinfo?(rjesup)
![]() |
||
Comment 34•3 years ago
|
||
Also a lot of Android test crashes, at the new MOZ_ASSERT and related places.
Assignee | ||
Comment 35•3 years ago
|
||
Obviously gets merged when relanding. Passes xpcshell tests now locally
Attachment #8979310 -
Flags: review?(nfroyd)
![]() |
||
Comment 36•3 years ago
|
||
Comment on attachment 8979310 [details] [diff] [review] add missing check that we're in a Broadcast Review of attachment 8979310 [details] [diff] [review]: ----------------------------------------------------------------- Doh. Should have thought of that myself.
Attachment #8979310 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 37•3 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a314710b0acd38afc7de74f0306f514b50d84463
Flags: needinfo?(rjesup)
Assignee | ||
Comment 38•3 years ago
|
||
Since we don't ship ESR android builds, and (if I'm correct) won't ship MIDI enabled in ESR 60, we actually wouldn't need to land this on ESR 60. It's only used in Hal.h/cpp and dom/midi/* (for two lists). This would potentially affect desktop MIDI, if it's enabled (depending on what the observers do).
https://hg.mozilla.org/mozilla-central/rev/a314710b0acd
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
Comment 40•3 years ago
|
||
Sorina, as you were able to reproduce it, can you test it for verification too? Thanks!
Flags: needinfo?(sorina.florean)
Comment 41•3 years ago
|
||
Tested with Samsung Galaxy S6(Android 7.0), Samsung Galaxy Tab S3 (Android 7.0), Samsung Galaxy Note 8 (Android 8.0) and the issue is not reproducible on latest Nightly build. Followed the steps from comment 4 on 58 as well, but wasn't able to reproduce with the devices mentioned above. Tested also with Samsung Galaxy Note 4 (Android 5.0.1) on latest Nightly from Play Store (62.0a1 - 05/23) and after changing device orientation 2-3 times, Fennec stops and no crash report is sent. Here is a video: https://drive.google.com/open?id=1w0Rhj7Mc8U6Ld8mf8SPEEeDstf0MxgE- Tested on a different build downloaded from archive.mozilla.org - http://archive.mozilla.org/pub/mobile/nightly/2018/05/2018-05-23-11-03-27-oak-android-api-16/ and after changing device orientation 7-8 times, Fennec crashes and this is the report: https://crash-stats.mozilla.com/report/index/48fed957-dcde-4d39-8558-636b00180524
Flags: needinfo?(sorina.florean)
Comment 42•3 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/9f007f15b9cf https://hg.mozilla.org/releases/mozilla-esr60/rev/633bd6d37172
Updated•3 years ago
|
Whiteboard: [adv-main61+][adv-esr60.1+]
Updated•3 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main61+][adv-esr60.1+] → [adv-main61+][adv-esr60.1+][post-critsmash-triage]
Updated•2 years ago
|
Product: Firefox for Android → GeckoView
Updated•2 years ago
|
Version: Firefox 58 → 58 Branch
Updated•2 years ago
|
Keywords: crash,
csectype-uaf,
regression,
sec-high
Target Milestone: Firefox 62 → mozilla62
Updated•2 years ago
|
Updated•2 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•