Closed Bug 1173284 (CVE-2015-8511) Opened 10 years ago Closed 10 years ago

Firefox OS Lockscreen passcode bypass due to race condition

Categories

(Firefox OS Graveyard :: Gaia::System::Lockscreen, defect)

ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

(blocking-b2g:2.5+, b2g-v2.0 wontfix, b2g-v2.0M wontfix, b2g-v2.1 wontfix, b2g-v2.1S wontfix, b2g-v2.2 wontfix, b2g-v2.2r wontfix, b2g-master fixed)

RESOLVED FIXED
FxOS-S4 (07Aug)
blocking-b2g 2.5+
Tracking Status
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- wontfix
b2g-v2.2r --- wontfix
b2g-master --- fixed

People

(Reporter: lixia, Assigned: gweng)

References

Details

(Keywords: reporter-external, sec-high, Whiteboard: [v3.0-nexus-5-l][b2g-adv-main2.5+])

Attachments

(5 files)

[1.Description]: [Flame v3.0][Nexus 5 v3.0][Lock Screen]Enable the passcode, restart device, slide to left for unlocking screen, but sometimes it will not prompt user to input the passcode and directly enter homescreen. Found time: 20:47 Attach: passcode.mp4 and logcat_2047.txt [2.Testing Steps]: 1. Enable passcode lock in Settings/Screen Lock. 2. Restart device. 3. Try to unlock screen. [3.Expected Result]: 3. It should prompt user to input passcode for unlocking sceen. [4.Actual Result]: 3. It will not prompt user to input passcode. [5.Reproduction build]: Device: Flame v2.2 build(unaffected) Build ID 20150609162508 Gaia Revision 06edb0f8db7c2f45cde54401a8593663059861a4 Gaia Date 2015-06-08 14:29:09 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/a3d7b08fb0ec Gecko Version 37.0 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150609.211329 Firmware Date Tue Jun 9 21:13:39 EDT 2015 Bootloader L1TC000118D0 Device: Nexus 5 v2.2 build(unaffected) Build ID 20150609081832 Gaia Revision 06edb0f8db7c2f45cde54401a8593663059861a4 Gaia Date 2015-06-08 14:29:09 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/239c59921129 Gecko Version 37.0 Device Name hammerhead Firmware(Release) 5.1 Firmware(Incremental) eng.cltbld.20150609.121628 Firmware Date Tue Jun 9 12:16:44 EDT 2015 Bootloader HHZ12f Device: Flame v3.0 build(affected) Build ID 20150609160220 Gaia Revision 31ef8deec7a04a988eb92309178b87cc0bde8220 Gaia Date 2015-06-08 14:48:40 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/8be8deb10e4f Gecko Version 41.0a1 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150609.193617 Firmware Date Tue Jun 9 19:36:29 EDT 2015 Bootloader L1TC000118D0 Device: Nexus 5 v3.0 build(affected) Build ID 20150609081840 Gaia Revision ea27c4ed5b6083c9e21d233d4804372ac4d5d353 Gaia Date 2015-06-08 03:06:41 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/e10e2e8d8bf2 Gecko Version 41.0a1 Device Name hammerhead Firmware(Release) 5.1 Firmware(Incremental) eng.cltbld.20150609.121002 Firmware Date Tue Jun 9 12:10:19 EDT 2015 Bootloader HHZ11k [6.Reproduction Frequency]: occasionally Recurrence,6/20 [7.TCID]: Free Test
Attached file logcat_2047.txt
Attached video passcode.mp4
[Blocking Requested - why for this release]: The user should always be prompted for a passcode and this is a regression so nominating this 3.0?
blocking-b2g: --- → 3.0?
QA Contact: pcheng
Nom is good. I noticed one case is when the phone is flashed and the device is turned off completely and then turned back on.
I am able to reproduce this by swiping unlock less than one second after the lock screen is first displayed at power on. Waiting more than one second or so prevents the issue from occurring, and it correctly asks for a PIN.
A reflash is not required to trigger the issue.
This is a big concern for our foxfooders. Let's fix ASAP. Greg, any ideas what may be going on?
blocking-b2g: 2.5? → 2.5+
Flags: needinfo?(gweng)
I would check regression window from the time Bug 1094759 got landed, since recently I don't remember I changed any LockScreen control flow except for this one. The PR is here: https://github.com/mozilla-b2g/gaia/pull/30383 If it only exists after this bug than we need to fix it in the bootstraping code. If not, it's a pure LockScreen issue. It would be helpful if we could first identify this.
Flags: needinfo?(gweng)
Yes...I've bisected the bug. While it's without the patch from Bug 1094759, namely the patch: c1ef854924f18357832ddcf98dc6c42391d5599e I cannot reproduce it anymore. However, this is an unfortunately news. Since Alive's gone, and the patch is too huge, it's hard to fix this.
And maybe this should be in the pocket of smoke test, since it bypasses the security check. I don't know why it could escape from our people's check, I think it's another reason why never land big patch without split it.
Hi Michael: as I said, even though this is a serious bug, I may not solve it soon since the patch quite complicated (although I'll try it). Meanwhile I don't know what happened (maybe the exciting working week) to let the first landing (6/4) and report (6/10) are so far from the current revision (6/28). So if we can build some automation or manual test case for the bug, it would be great. But since I can only reproduce it after rebooting on device, I don't know if an automation test running not on an device could work as well.
Flags: needinfo?(mhenretty)
Should we be marking the lockscreen as 'ready' before we've wired up the passcode-lock preference to the lockscreen? --- a/apps/system/lockscreen/js/lockscreen.js +++ b/apps/system/lockscreen/js/lockscreen.js @@ -329,6 +329,11 @@ */ LockScreen.prototype.init = function ls_init() { + window.SettingsListener.observe( + 'lockscreen.passcode-lock.enabled', false, (function(value) { + this.setPassCodeEnabled(value); + }).bind(this)); + this.ready = true; /** * "new style" slider: as described in https://bugzil.la/950884 @@ -407,11 +412,6 @@ this.overlay.classList.remove('uninit'); }).bind(this)); - window.SettingsListener.observe( - 'lockscreen.passcode-lock.enabled', false, (function(value) { - this.setPassCodeEnabled(value); - }).bind(this)); -
b2g-inbound regression window: Last Working Device: Flame BuildID: 20150604012244 Gaia: e0fbadeb78a96137f071d9be7a47ef9fe882d17f Gecko: 64ce149fabf1 Version: 41.0a1 (3.0 Master) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0 First Broken Device: Flame BuildID: 20150604020845 Gaia: c1ef854924f18357832ddcf98dc6c42391d5599e Gecko: 4f7e7631e277 Version: 41.0a1 (3.0 Master) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0 Last Working Gaia & First Broken Gecko - no repro Gaia: e0fbadeb78a96137f071d9be7a47ef9fe882d17f Gecko: 4f7e7631e277 Last Working Gecko & First Broken Gaia - repro Gaia: c1ef854924f18357832ddcf98dc6c42391d5599e Gecko: 64ce149fabf1 Gaia pushlog: https://github.com/mozilla-b2g/gaia/compare/e0fbadeb78a96137f071d9be7a47ef9fe882d17f...c1ef854924f18357832ddcf98dc6c42391d5599e Confirmed that this is caused by bug 1094759.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Tim, can you take a look at this please since the author of the landing for bug 1094759 is no longer active? Looks like that landing caused this issue to occur.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(timdream)
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #12) > Hi Michael: as I said, even though this is a serious bug, I may not solve it > soon since the patch quite complicated (although I'll try it). Meanwhile I > don't know what happened (maybe the exciting working week) to let the first > landing (6/4) and report (6/10) are so far from the current revision (6/28). > So if we can build some automation or manual test case for the bug, it would > be great. But since I can only reproduce it after rebooting on device, I > don't know if an automation test running not on an device could work as well. Hi Greg, thanks for the explanation. If we were to add the `lockscreen.passcode-lock.enabled` and `lockscreen.passcode-lock.code` settings to a Marionette-JS startup (like here [1]), could we reproduce this bug from the context of an integration test? 1.) https://github.com/mozilla-b2g/gaia/blob/da447493053e704533c2b3fbb6be8dc93f71dacc/apps/system/test/marionette/software_home_fullscreen_test.js#L13
Flags: needinfo?(mhenretty) → needinfo?(gweng)
Greg is the best person left to fix this, however given the complexity we would need some time to find the root cause. This reads like a timing issue to me. Fixing it shouldn't be hard once we found the cause.
Flags: needinfo?(timdream)
Tim: Just update the situation: it looks caused by a "delay mozSetting change", revealed by the new bootstrapping and not covered by tests. I've found that 1. LockScreenStateManager would `start` itself to observe "lockscreen.passcode-lock.enabled". If the value is true, unlocking leads to passcode panel. Else, it unlocks straightly. 2. With new bootstrapping chain, the UI would show up since LockScreenWindowManager has been bootstrapped. LockScreenStateManager starts itself after that. This would observe the setting change immediately. And from the log, it does. Since I use SettingsListener, the "change" in fact includes the initial value even without any change after. 3. *However*, the value change bubbled up in an asynchronous way, and it's a really long waiting (about 1 second after the manager observers it). So before the value returns, the whole UI is able to be manipulated, while the passcode-enabled is assumed as false. This is why the bug appears. The tricky part is such long waiting only happens when you do a cold restart. If it only restart b2g process, the reading would be quick enough to hide the symptom. So that's one reason that I guess it can't be discovered in automatic tests, although I would still try. For the bug, solution is to hook LockScreenStateManager and the observing on the bootstraping chain: only when the state manager got the passcode value from the launcher, the OS logo hides itself and make LockScreen window visible. This should fix the bug but I need some time to figure out how to do that. And I still have no idea about how it was covered in the past. Maybe in the previous version OS logo hides itself lately, so user has no chance to do trigger that. Anyway, I don't believe in the previous complex of bootstrapping, we have any good chance to discover it like this. So although the author had gone and we need to take regressions from the lost king, to fix this is still fair enough.
Flags: needinfo?(timdream)
Understood, thank you for the analysis.
Flags: needinfo?(timdream)
Attached file WIP patch
Tim: this is a hotfix. It: 1. Changes booting sequence of LockScreenStateManager as a Promise, for those async steps 2. Bubbles up: a) [Launcher] waiting [LockScreenLauncher] b) [LockScreenLauncher] waiting [LockScreenWindowManager#openApp] c) [LockScreenWindowManager#openApp] waiting [LockScreenStateManager#start] So [Launcher] would only remove the animation/logo after the chain got entirely ready. Almost all these waiting are in Promises. The only one exception is: 3. |LockScreenWindowManager#openApp| would call |createWindow| if the WM is first time launched. So, |createWindow| would eventually fire an event called |lockscreen-frame-bootstrap| to notify stuff inside the *lockscreen frame* to start themselves, which includes |LockScreenStateManager|. As a result, the |#openApp| method needs to know when does this inner bootstrapping progress done. And I add a |lockscreen-frame-bootstrap-done| event to do that. You can image if it's an app there *must* be some IAC things. It's in fact discussed with Alive once, but he didn't adopt that because he didn't think it's necessary. Anyway, after |#openApp| got the |-done| event, it would resolve the promise to let caller (the |LockScreenLauncher|) know it's done, too. So the launcher could now know it's launched, and to let the master launcher keep going. (If my explanation is too confused, we may have it offline.) 4. The impact is the bootstrapping time, which has been sped up by the new bootstrapping, now slower than that because it now need to wait such mozSettings read. And sometimes it's quite long. Maybe we should invite performance team to find out if we can boost this, or if we can optimize it from Gaia to avoid too much platform work in short releases. And as I said, this is just a hotfix. It's in fact need a clear rewriting to make all of these subcomponent bootstrapping processes become Promise-orient. But for LockScreen business I don't think we have such resource so far, so let's pray in the future the entropy of this component would not grow fast.
Attachment #8628283 - Flags: feedback?(timdream)
Comment on attachment 8628283 [details] [review] WIP patch Discussed with Tim. Conclusion: 1. Find how long the settings reading is 2. LockScreen can show it self before the reading done, so we could reduce the waiting time of app interaction 3. However, when it's going to unlock, if the reading is unfinished, block the unlocking until it's done. So we can still switch to the correct unlocking panel 4. If the reading time is still too long and user would feel it "lags" when unlocking, ask UX' opinion before we can address the issue (as a follow-up bug) The point is we all agree that to add more coupling between System and LockScreen is bad. So if it's a pure LockScreen solution, it would be better.
Attachment #8628283 - Flags: feedback?(timdream)
Unfortunately it looks that we hit a mozSettings issues. The log: > I/GeckoConsole( 212): Content JS LOG: >>>>>>>>>>> launching lockscreen 1435570622127 > I/GeckoConsole( 212): Content JS LOG: >>>>>>>>>> start state manager: 1435570623006 > I/GeckoConsole( 212): Content JS LOG: >>>>>>>>>>> animate poweron logo 1435570623095 > I/GeckoConsole( 212): Content JS LOG: >>>>>>>>>>> remove poweron logo 1435570626797 > I/GeckoConsole( 212): Content JS LOG: >>>>>>>> got passcode status: 1435570630566 shows that from user could see LockScreen ("remove poweron logo") to ("got passcode status) is almost 3.8 seconds. And it's not caused by launching or setting up something, because the state manager was already started at "start state manager". And it's also long from there to the final reading. So, the whole time may spend on waiting to read the value. I'm not sure if it's because of new bootstrapping reads too many values from settings before this step. I would still solve this issue by blocking user to unlock the phone if it's waiting to read the value, since to leave a security hole is much serious. After that, I would start to address the performance issue.
Blocks: 1175955
Update: I've patched it to fix the bug. However, the performance issue is very serious, since the animation of unlocking is by-passed, too. So I'm now profiling and try to pin it down.
There is a bug about mozSetting reading is too slow (confidential): Bug 1100188. But for the missing animation I still need to dig in to find out the root cause.
Has the security hole been closed in the latest foxfood OTA update?
Flags: needinfo?(nhirata.bugzilla)
Richard, the bug that is addressing the issue that you're speaking of is this bug. No patches have officially landed, so no. The OTA does not have the fix for the lockscreen issue you are speaking of.
Flags: needinfo?(nhirata.bugzilla) → needinfo?(rsoderberg)
Yeah. Comment 22 indicated that a fix for the security issue had been confirmed, and so I'd hoped we chose to close the security hole (that prevents me from putting *any* credentials on my device for foxfood testing) while the performance investigation continues. Ah well.
Flags: needinfo?(rsoderberg)
Update: still digging in performance issue. Bug 1100188 looks also related to the performance issue.
Assignee: nobody → gweng
Flags: needinfo?(gweng)
Update: Discussed with Tim. Now I'm trying the new solution, but it's quite complicated so we need some more time to fix this.
Comment on attachment 8636455 [details] [review] [gaia] snowmantw:bug1173284-rev3 > mozilla-b2g:master Tim: this is the patch based on what we have discussed. I set feedback to see if you have any opinion about the code, before I start to write the tests. Thanks.
Attachment #8636455 - Flags: feedback?(timdream)
Is it possible to see that patch as two commits - one creating the 'this.bootstrapping = ...' block, and then a second that fixes the lockscreen issue itself?
It's quite complicated: the lockscreen part is because Alive had changed when and how to load the components inside LockScreen. In previous version, they were assumed as synchronous, or ready, at the moment LockScreen was used by LockScreenStateManager. He changed that but not changed how these two component cooperate. So I need to change the LockScreen to make the whole loading process in a promise that the state manager could get it and know when does the lockscreen get fully loaded. In fact, this is a minor change compare to the changes I've done for the state manager. So, yes, you can say it's two commits, but they're so related so I think it's okay to put them in one commit.
This bug occurs on 2.2 on flame. I just checked with today's build: Build ID 20150721162504 Gaia Revision e1e6317f17a840b19af9dbb25f5a771d8d9fa161 Gaia Date 2015-07-15 21:05:11 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/a73051740290 Gecko Version 37.0 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150721.203103 Firmware Date Tue Jul 21 20:31:14 EDT 2015 Bootloader L1TC000118D0
Also occurs in : Build ID 20150721161206 Gaia Revision cc24cce17ab2ebf79f6505103da714fc65bc5ec1 Gaia Date 2015-07-15 18:48:55 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/3856aebece7b Gecko Version 34.0 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150721.193944 Firmware Date Tue Jul 21 19:39:55 EDT 2015 Bootloader L1TC000118D0
Comment on attachment 8636455 [details] [review] [gaia] snowmantw:bug1173284-rev3 > mozilla-b2g:master We really, really need to simplify the code here........
Attachment #8636455 - Flags: feedback?(timdream) → feedback+
Marking this as security-sensitive b2g bug, rating sec-critical
Group: b2g-core-security
Keywords: sec-critical
Summary: [Lock Screen]Sometimes it will not prompt user to input passcode for unlocking screen after restarting device. → [Lock Screen] pass code bypass after restarting device due to settings observer race condition
I'm confident this is a race condition with the settings observers. I discussed parts of this and a dirty hotfix in bug 1187368 (which I'll mark duplicate of this now). Switching to promises looks like the clean solution.
Summary: [Lock Screen] pass code bypass after restarting device due to settings observer race condition → [Lock Screen] pass code bypass after reboot due to settings observer race condition
I have been exploring a very simple promise-based fix. Since it's delaying most of LockScreen's init code to after the SettingsListener getter promises resolve, it unfortunately has some visual regressions, showing a mostly black lock screen on startup, and showing Homescreen placing icons if lockscreen is disabled: Fix pass code reboot bypass with promise Moving the grunt of LockScreen .init() to .initStaged() which is called from a promise that wraps settings getters for the problematic settings values, avoiding the race condition that allows pass code bypass. It has the side effect of showing no lock screen when it is disabled in settings, and it creates two UX artefacts by exposing apps while rendering their UI: - (mostly) black lock screen state on startup - Homescreen rendering icons when lock screen disabled diff --git a/apps/system/lockscreen/js/lockscreen.js b/apps/system/lockscreen/js/lockscreen.js index c3be4b4..b0ac6fe 100755 --- a/apps/system/lockscreen/js/lockscreen.js +++ b/apps/system/lockscreen/js/lockscreen.js @@ -62,7 +62,7 @@ * in Settings API. * Will be ignored if 'enabled' is set to false. */ - passCodeEnabled: false, + passCodeEnabled: true, /* * Boolean returns whether the screen is enabled, as mutated by screenchange @@ -330,6 +330,27 @@ LockScreen.prototype.init = function ls_init() { this.ready = true; + + /** + * Ensure that settings state is up to date before deciding on pass code + * locking or not. This avoids a race conditions with Settings observers. + */ + var pEnabled = window.SettingsListener + .getSettingsLock().get('lockscreen.enabled'); + pEnabled.then((v) => { + this.setEnabled(v['lockscreen.enabled']);}); + + var pPassEnabled = window.SettingsListener + .getSettingsLock().get('lockscreen.passcode-lock.enabled'); + pPassEnabled.then((v) => { + this.setPassCodeEnabled(v['lockscreen.passcode-lock.enabled']);}); + + var pSettings = Promise.all([pEnabled, pPassEnabled]); + pSettings.then(() => {this.initStaged()}); + }; + + LockScreen.prototype.initStaged = + function ls_initStaged() { /** * "new style" slider: as described in https://bugzil.la/950884 * setting this parameter to true causes the LockScreenSlide to render
FxOS always opens the lock screen on reboot even if the user disables it in the settings. This turned out to be an artefact of the very same race condition that causes the pass code bypass. Has this been a conscious design decision?
(In reply to Christiane Ruetten [:cr] from comment #42) > FxOS always opens the lock screen on reboot even if the user disables it in > the settings. This turned out to be an artefact of the very same race > condition that causes the pass code bypass. Has this been a conscious design > decision? It seems like this is a UX design question, so I'm tentatively setting needinfo? Tiffany (UX) to invite comment.
Flags: needinfo?(tshakespeare)
Just a remind: I've attached my patch (see the comment) to fix this according to the current design of state manager (which manages most of UI changes right now and should take over the legacy LockScreen in the future, although we have discovered some concerns so we were trying to migrate to another new state machine, which eliminates lots of issues of this intermediate one). The patch is still in progress because I'm writing unit tests for that, and I could only unfortunately use a slow round-robing strategy to allocate my time on that, because I'm also dealing with some other critical issues in different components. So, what I'm saying is, if you're proposing a new patch, please note the thing should be considered with the fact that the legacy LockScreen, as I mentioned many times, now is in a progress to decentralize itself to multiple components. You'll never want to see another 2000+ lines 'lockscreen.js' appears again (that's what happened in v1.2: at that time it's a ridiculous monolithic component only smaller than the 'window_manager.js' which is also a ridiculous and messy one). And for the solution, as you can see in the second WIP patch (with f+), I did what you posted here as a state condition of LockScreenStateManager, not as a global "init lock" that would postpone every initialization things until it's fully read (result in what you mentioned: the "black lockscreen state on startup"). That's because although the solution is exactly what I also considered to adopt to solve the case, it really hurts our platform because: - User don't know what's happening: all they can see is a black screen without any prompt - While only the unlocking action depends on the value, LockScreen components should be able to do other rendering and preparation things **in parallel**. To block all of these by a value may or may not be used later indeed violates what the performance team is doing: reducing the time user can start use the app as much as possible. So what I've discussed with Tim is, according to Comment 30 and Comment 32, we should show the screen as soon as possible. And only block user only when the value isn't ready but the finger already dragged the slider to the end, which means the intention of unlocking now is presented. So the diagram would be: [Booting]----|show LockScreen | ---- done | | |other preparations| ------ done | user could unlock here | -----------------| |slider rendering | ---- done | | | |read the value | ---------- (an unfortunate long waiting at Comment 22) -- * --> unlock As you can see, to respect the performance and UX principlse we can't make the diagram as: [Booting]----|read the value| ---- (reading....) ---- |show LockScreen...| ----- |user could unlock here| Nevertheless, I think for 2.2 and 2.1, since I believe it's not really a regression but an old bug hidden in the previous tests and bootstrapping which is slower than now (so it covered the symptom well), if the mozSettings reading isn't too long, we may patch them as that.
Flags: needinfo?(cr)
I'm aware, and I'm all supportive of your rewrite, and I'm not proposing to replace your patch. But as you say: it's a huge and complex project, and I am merely posting minimally-intrusive fixes so that we don't stand empty-handed until your rewrite lands.
Flags: needinfo?(cr)
Okay. Thanks for your understanding and affords. I plan to complete this and land the patch in the next week, although it depends on the CI results. Sorry for the delay.
Sorry, *efforts*, a sleepy time :(
It's not remotely exploitable and requires a device falling into the wrong hands ("angry spouse attack"), so I'd down-rate this to sec-moderate, but I'm settling for sec-high because a bug like this can easily result in a serious PR disaster.
Keywords: sec-criticalsec-high
Update: fixed all unit tests of state manager, will continue to fix other failed tests for the patch.
Group: core-security
Flags: sec-bounty?
Okay. All tests are green: https://treeherder.mozilla.org/#/jobs?repo=gaia&revision=1094fb7bc7ecfdc157c0ea0cf0c56a619b847f87 I now will try to add new tests for the new parts. If everything is fine, I will be able to land it within these two days.
Is bug 1187765 a dupe of this one?
Oh, yes, I missed it in the bugmail, sorry.
Jumping into this, I see you are playing with mozSettings observers, and specifically moving code around. Please make sure you do not endup with a codepath that gets triggered multiple times within the systemapp without a matching removeObserver(): otherwise, this will just leak observers and slow down the system.
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #24) > There is a bug about mozSetting reading is too slow (confidential): Bug > 1100188. But for the missing animation I still need to dig in to find out > the root cause. Investigation on this topic showed that the big delay was because the main thread was busy, and thus messages delivery was taking quite some time. Given the context, I get that you are observing the latency during a boot time phase. And this is exactly the moment where this happens. So the 3.8 secs delay you are observing can just be because of this: slow async messages passing because of busy main process.
We have had some profiling, but it only shows the main thread is waiting somethings, not really busy. And we can't see what makes it waiting. Maybe the root cause is just like you said. But we also discovered some extra waiting in settings delay the whole booting time, and some of the requests seems from Gecko. For example, BT enabled value reading delay all the following settings at very beginning, and we've found they are all reading requests, so in theory they could be parallel executed. Anyway, for this bug we can try to find a specific solution first. I just mentioned that Cervantes and me prepare to do some study to see if mozsettings could be more optimised by making it more parallel. And since you once landed a patch about FORCE READONLY flag of setting requests, we want to know if you have some similar plans or concerns, so that we don't need to fix the duplicated bugs. (This is a BTW. But since you mentioned the relative point, I post it here)
We should focus on correctness instead of performance in this bug. It's pretty unacceptable to have a broken lock screen.
Yes. I will land the Gaia patch first. I have almost completed that (see my updates). We may fire another bug for performance.
(In reply to Richard Soderberg [:atoll] from comment #43) > (In reply to Christiane Ruetten [:cr] from comment #42) > > FxOS always opens the lock screen on reboot even if the user disables it in > > the settings. This turned out to be an artefact of the very same race > > condition that causes the pass code bypass. Has this been a conscious design > > decision? > > It seems like this is a UX design question, so I'm tentatively setting > needinfo? Tiffany (UX) to invite comment. If the user disables the lockscreen in settings, then how do they know how to unlock the device? The device should be respecting the user's settings and either prompt them or not depending on the their settings. Adding Harly for further comment if needed. Thanks!
Flags: needinfo?(tshakespeare) → needinfo?(hhsu)
Tiffanie, perhaps this is already clear, but it sounds like this may need clarification: we're talking about two types of locking in this bug: screen locking, requiring the user just to swipe, and pass code locking, requiring pin entry. The side-issue I brought up in comment #42 is that after reboot the device is always screen-locked even when the user disables screen lock. The rest of this bug is about an error in pass code locking, but both these have a common cause. But from your comment I take that invariably booting to screen lock was in fact not a deliberate UX decision.
Comment on attachment 8636455 [details] [review] [gaia] snowmantw:bug1173284-rev3 > mozilla-b2g:master Since it works on my device and passed all tests on CI (excepted one I fixed now) and local console, set review.
Attachment #8636455 - Flags: review?(timdream)
Comment on attachment 8636455 [details] [review] [gaia] snowmantw:bug1173284-rev3 > mozilla-b2g:master I have one major concern around this patch and I want to know if this is something purposely done to make turnaround faster or accidental sloppiness. |this.lockScreenStates.passcodeEnabled| could be either an anno object from |deferredRequest| or an actual boolean. You should try to pick one and stuck with it -- having to leverage dynamic-typing in JS almost always imply questionable design. One alternative to be schedule the state-getting promises on to the mainPromise so that we could ensure the states are always an up-to-date boolean when the first |doTransfer| is being run. In that way you don't need to do complex things like above nor |resolveInnerStates| etc. r=me if I am just wrong, if you don't agree with what I said, or you want to fix this in future bugs.
Attachment #8636455 - Flags: review?(timdream) → review+
Before I reply Tim, I suddenly found it's a regression the root cause is only on 2.2+ (master; see Comment 14). But obviously this bug exists in 2.0, 2.1 and 2.2. I think we better to clarify the regression status: if it's really a regression as Comment 14, the 2.0, 2.1 and 2.2 symptom before Bug 1094759 *couldn't* be the same bug. They must be with different root causes and deserves different solutions in different bugs. So I suggest that we should fire and investigate another bug for 2.2 and older branches before the bug 1094759, or the regression status won't be correct.
Flags: needinfo?(ktucker)
I think that Lockscreen can and must ensure that its (so far unsafe) defaults have been overridden by the settings listener callbacks before deciding on whether or not to screen lock or pass code lock the device. Let's just make sure now once and for all that this won't regress again. The rest of this problem complex lives in other components.
Reproducible on Woodduck as per bug 1185096#c10 Btw, when reproducing this bug it helps to build more CPU load on boot-up by booking the device into an encrypted wireless network, giving you more time to swipe.
Comment 66 doesn't affect the fact it's a regression from Bug 1094759, unless the analyzes are wrong. So I were not suggesting we should give up to patch them, but the bug status must be clarified. So, if they're uncovered by different root causes, we should fire different bugs for them, not try to manage different root causes in one bug with different patches. Otherwise, we should do the regression-window searching again, and to make sure whether it's a regression (since the flag and the conclusion is still held). If not, we need to clear it. It's just a simple flag problem.
By all means, let's restrict this bug to LockScreen. I was just saying that because of its central role to security we should make sure that it in itself has all the safeguards for securely operating in a regressive environment. So are we in agreement that within LockScreen the problem was a race condition with the settings observers not quickly enough overwriting the unsafe defaults? Or did I miss something?
1. If it's not a regression of Bug 1094759, because you and others raised that it appears in older branches, cancel the regression tag and the blocks of Bug 1094759: it's now not correct anymore. And we need to take a look to see if setting it as regression still makes senses, since we now don't have the regression window anymore. 2. If it's still a regression of Bug 1094759, we're discussion two kinds of bug that may have almost the same root cause of racing. But Bug 1094759 make the symptom worse. So, we may need two bugs to track the bug status correctly. Please, take a look at Comment 10 to Comment 14. And I must emphasis it again: I don't say the racing isn't the root case. But apparently the regression status might be wrong: whether it's a regression of Bug 1094759 by previous QA and my result, OR not. There is no third possibility. And I just raise a requirement to clear the bug status according to what we have now.
Let me preface this by restating what the principal problem is: SettingsListener.observe() not only registers the callback with mozSettings.addObserver(), it also triggers the callback by registering it in a mozSettings.createLock().get() 'success' event listener. This is meant for initializing the value on the listener side, but it can take up to several seconds during system bootstrap. This is the race condition we're talking about. Regarding the regression: I have looked at commit 02a14c282c5fb391675246e627a10d1508418c40 from March right before Alive landed his first patches in bug 1094759. The basic anatomy starting from lockscreen_bootstrap.js is the same: 1. this.lockscreen = new LockScreen(); - sets local defaults - .enabled: true - .passCodeEnabled: false - calls .lockIfEnabled(true) even before registering observers - registers settings observers for .enabled and .passCodeEnabled - this starts the race 2. window.lockScreen = this.lockscreen; 3. window.lockScreenStateManager = new window.LockScreenStateManager(); 4. window.lockScreenStateManager.start(window.lockScreen); - maintains it's own state about passcodeEnabled - registers own settings observer for that, no sync with lockScreen - only lockscreen interaction ever is checkPassCodeTimeout 5. this.lockscreen.init(); - activates screen lock if .enabled And later: - user swipes right on LockScreen, triggering event that leads to LockScreen.unlock() - LockScreen.unlock checks .passCodeEnabled, unlocks without pass code if (still) false There was and is no logic that prevents that LockScreen.unlock is reached before the initial settings observer callbacks are fired, overriding the unsafe defaults. If the callbacks fire before swiping to the right and LockScreen.unlock(), pass code entry appears. If they don't, the phone unlocks without passcode. Everything points to the race condition existing prior to that, but it never was a problem as long as the callbacks were fired faster than the user could execute a swipe. I'm sure there were other patches along the way that increased system load and shifted the first round of firing mozSettings observer callbacks to a later and later time to a point where it became relevant. Hence it doesn't make sense to single any one out pushing the race condition beyond some acceptable limit. TL;DR: bug 1094759 did not introduce the observer callback race condition, which I consider to be the root cause. If anything, I speculate it has only emphasized it by making lockscreen bootstrapping faster.
No longer blocks: system-bootstrap
Keywords: regression
I must correct my analysis write-up: The last two items of 1. actually belong to 5., and 1. just sets defaults. Sorry for the spam.
In reply to Comment 65, I double checked the regression window in Comment 14 and it is absolutely correct for this bug. This bug as written in Comment 0 only reproduces on the latest Flame 2.5 build. I cannot reproduce this issue on the latest Flame 2.2, 2.1 or 2.0 builds so that must be a different issue with a different set of steps and a different regression window. Device: Flame 2.5 BuildID: 20150728115244 (issue reproduces 100% as written) Gaia: 862f0895f3f5a97200601542d99a152a46385a0b Gecko: 833403badc41 Version: 42.0a1 (2.5) Firmware Version: v18D User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0 Device: Flame 2.2 (Does not reproduce) BuildID: 20150728111942 Gaia: e1e6317f17a840b19af9dbb25f5a771d8d9fa161 Gecko: 813cb3d271f0 Version: 37.0 (2.2) Firmware Version: v18D User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0 Device: Flame 2.1 (Does not reproduce) BuildID: 20150723133047 Gaia: 9dba58d18006e921546cec62c76074ce81e16518 Gecko: 41e10c6740be Version: 34.0 (2.1) Firmware Version: v18D User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0 Device: Flame 2.0 BuildID: 20150722193116 (Does not reproduce) Gaia: b16ba05481e577bc644ed8966f587a70fe2148e6 Gecko: 2e6f1d4deff9 Version: 32.0 (2.0) Firmware Version: v18D User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0 This issue as written in Comment 0 appears to have been caused by the landing for bug 1094759.
Keywords: qawanted
To clarify the steps given in Comment 6 should be written up as a different bug. However, I cannot reproduce that issue manually following those steps. I'm sliding to unlock as soon as the slider appears but it still prompts me for a passcode on 2.2, 2.1 and 2.0. On master, I am not particularly rushing. I waited like 3 seconds before sliding to unlock after restart and the passcode is not prompted. I hope this helps to clarify what is going on here.
I can reproduce the issue on those builds by swiping before the ui appears. Keep swiping frantically and you can get passed the lockscreen.
Kevin and I spoke offline. I think the difference might have been I have SIM security turned on my SIM. I'll reinvestigate the issue for 2.0, 2.1, 2.2 He reproduced not seeing the SIM pin dialog for 2.2 We probably should separate what I'm seeing into a separate bug from this original issue.
While QA puts great efforts to clarify the situation, I shall land the patch to solve what we have now, for master (2.5). I would also take a look at other branches after that. But first, I need to switches temporarily to the recent intermittent errors of Gij, which bothers lots of people and make test unstable. Naoki: please fire a different bug if you caught the different symptom, thanks.
Okay the patch passed all tests: https://treeherder.mozilla.org/#/jobs?repo=gaia&revision=4586ecefd52a1ff1ea87f7db7f53f9ee9d6ef9b3 and it works on my device. So I now land it.
Keywords: checkin-needed
master: https://github.com/mozilla-b2g/gaia/commit/8bc364b351531cfaaa033229ec2dd81d8e5ebe7d I'm looking for any possible regressions. So NI me at this bug it they do happen. Thanks.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
master: https://github.com/mozilla-b2g/gaia/commit/8bc364b351531cfaaa033229ec2dd81d8e5ebe7d I'm looking for any possible regressions. So NI me at this bug it they do happen. Thanks.
Ahh sorry I suddenly saw one debugging line left in the patch. I now fix it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file The new PR
I created a new one and it's the same with the old (landed) one, but without the debugging line.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Naoki, KTucker, we have successfully reproduced this bug all the way back to 2.0 on Woodduck. Why are you resetting affected status?
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(ktucker)
This is a pretty critical flaw, we should probably sec-review this to make sure this is fixed for sure - can you make sure this is done please cr? Thanks.
Flags: sec-review?(cr)
Keywords: checkin-needed
Target Milestone: --- → FxOS-S4 (07Aug)
Please note the symptom of Bug 1188716 is very different from this bug: in this one, user can easily reproduce it by dragging the slider in an ordinary scene, while in Bug 1188716, as I've tried and the video shows, it needs some edging action to by-pass the lock. If user slide it when the slider shows up, it's hard to reproduce. So, although these bugs are both critical for security and they're with the same root cause, I can fix them with different solution that all suppose to plug the hole. The only different these two patches is in this bug, the patch cares about user experience and the new bootstrapping changes (which definitely makes it more complicated, but also necessary), while another one needn't to care about that since the reading time is much shorter. As a conclusion, I now 1. Fixed this bug while caring user behavior & new bootstrapping 2. Fixing Bug 1188716 with a much simple patch. It is because we can block the by-passing behavior in a simpler way, since the bootstrapping is the legacy one and the reading time user must to wait is not so long So if the code is implemented correctly and will not trigger any new regression, and security team review it as Paul said, we may claim this/these bugs are fixed.
See Also: → 1188934
CR, per comment 88; in comment 35, I used different steps when testing 2.x which is listed in bug 1188716. We seperated them into two separate issues as there was a major regression which is an issue and a harder to reproduce security issue that was split off in bug 1188716. Greg is going to try to address both. I think we're all on the same page that we don't want this bug to escape. Just separating this one from the other one is probably better as ktucker and Greg had noted.
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(ktucker)
Flags: needinfo?(cr)
Naoki, I am able to reproduce this on my Woodduck with 2.0m with just the STR in comment 0 (and without any SIM pin involvement). The mechanism of the flaw is the same, and the code in 2.0 is essentially identical in that respect (I verified). This bug needs to be addressed all the way back to 2.0. KTucker's approach of testing this bug against different B2G versions on the relatively powerful Flame is not suitable for this type of bug. Since it's a race condition, the race window's length varies greatly with device performance and system load during boot up. We already know that latest B2G versions put so much stress on the system that it pushes the race length beyond acceptable limits even on powerful devices like the Flame. And even if the race window is too short to reliably exploit on a Flame with 2.0, it doesn't mean that it's neither exploitable on a slower device or when the device is connecting to an encrypted wireless network, or connected via Bluetooth, or anything else that increases system load during boot-up. Consequently I am resetting affected status and re-opening this bug as incomplete, so we don't lose track of this. If different B2G versions require different fixes, please address this in separate blocking bugs, but please don't reset the tracking flags again unless you can prove my analysis wrong, or – at the very least – test the bug negatively against the slowest device running the most load-demanding config we need to support. I would propose to remove the 2.5 blocker once the sec review of Greg's patch is finished, but I don't expect any issues there.
This should be fixed - cr misunderstood the status meanings here.
Resolution: INCOMPLETE → FIXED
So I feel Christiane is treating this as a "meta" or "tracking" bug for tracking according to the words ("...don't lose tracking ..." and "address this in separate blocking bugs") . I don't mind people to reset the tracking flags if that is true. But that intention should be stated clearly, so maybe we should change the title to add some [tracking] or [meta] to follow the usual convention of bug management.
I think Christiane's main concern was just that this does in fact affect earlier versions, and that is now reflected in the status flags.
Greg - it might be too late, but how possible is this fix to backport to 2.2? I wonder if there is any chance we can get a fix to partners before they ship?
Flags: needinfo?(gweng)
No. This fix is only for code after bug 1094759, as I said at Comment 65. And please note that's why I asked if we should raise another bug with a different symptom which also deserves a different patch, although the root case may be the same. Because before bug 1094759, there is no need to handle the LazyLoading + abnormal reading delay, which significantly add the effort to deal with the bug. You can take a look at Bug 1188716: the solution is simple and thus it's with fewer risks to land it in 2.2, 2.1 and maybe 2.0. And I need to mention that I can't reproduce this bug on 2.2 without the rapid sliding at very beginning of bootstrapping, just like QA did at bug 1188716. So I feel it will be great if Christiane can upload a video about how to reproduce it as STR0 (Flame and Wooduck). Basically, if your reproduction needs to start slide it *before* you see the unlocker, or the settings reading doesn't cost the abnormally long time to complete, it could be solved by applying the patch for Bug 1188716. Otherwise, we *might* need a more complicated patch like this one. Unless UX decided it's tolerable to fix the issue with the freezing delay.
Flags: needinfo?(gweng)
So Paul, that's what I really don't know why secure team looks like to fix them in the same bug. To fork a bug with different symptoms and with different patches won't hurt the fact that they're both valid bug and *all* fixed the same flaw, while we can avoid to uplift the large patch to the stabilizing branches. Moreover, to mix bugs with different symptoms on different branches always mess bug status. To separate them is usually a better way to manage them. And, I'm just so tired to explain that again and again (*see* how much comments I can refer to), while what I'm talking about seemed totally ignored. It's also really annoyed to change the tracking flags without asking (NI) he who opposes it earlier. Please, if someone disagree others' opinions, at least the one should take a look at what the opponent said, and counter strike them based on the opinions or the new solid facts.
Could someone CC me to the newly referred bugs here, please?
Greg, I don't see how producing video proof is going to help. AFAICS, the rapid swiping that helps exploiting Woodduck isn't indicative of a different bug and it isn't even a different symptom. It merely has two effects: 1. It increases system load by hammering the event queue, thus widening the race window. 2. It helps hitting short race windows. Rapid swiping is just an advanced exploit variant of STR0, and the underlying issue is the same: The unlock event is fired before the settings were read. Greg, if you are really convinced that rapid swiping is a different issue, please, point out the code flow to me that is responsible. I can't see how. Until then I maintain that it is as outlined comment 72 / 73. However, there's a very simple way to shut me up about this as proposed in bug 1188934 and before: We could change lockscreen.js: passCodeEnabled: false; // insecure to lockscreen.js: passCodeEnabled: true; // secure in all affected versions, and this problem ceases to be a security issue with a simple one-liner. Its frankly very minor UX regression of having to swipe twice – and that if and only if you're fast enough to hit the race window – could then be dealt with in separate, non-security-relevant bugs, and in the same way that Greg is planning to, anyway. I'd rather have a working pass code lock than never ever having to swipe twice.
Moving tracking of uplifting efforts in affected branches to tracking bug 1189314. If you fix the bypass in other releases, please open a separate bug that blocks 1189314.
I don't know how many times I must repeat that the difference is matter for the different solution. *Sigh* As I said (yes, I've said, again), why it matters is because according the symptom, we can prepare different solutions to eliminate the racing. One is to delay the unlocker to block user action before we successfully reading the value. This is a very simple solution that you also proposed at Comment 41, although your version blocks more than the unlocker (you blocks the whole initialization of the lockscreen). The advantage is it's very simple, so you it's not to risky to uplift to other branches. However, the shortage is it could seriously hurt the user experience if the bootstrapping time is short and user can see the screen lock soon, and that's what we now have after Bug 1094759. So the second solution I used here, is to block the component only when the value is really needed, which means before user unlock it. This gives us a better solution respect performance and UX requirement, plus the changes of new bootstrapping. The shortage is the patch is complicated and mixed with lots of new changes don't exist in the old branches. So to uplift such patch is risky. That's what I've and they still hold true even you still don't get it. Time to cite: > I don't see how producing video proof is going to help. If you know what I'm talking about, and if you've seen my solutions and their differences and what I concern, you'll know it's important to help. > AFAICS, the rapid swiping that helps exploiting Woodduck isn't indicative of a different bug and it isn't even a different symptom. No. Even though they're the same bug (Comment 65, Comment 88, Comment 95 and Bug 1188716 Comment 4), the different racing window before user can see the screen locker means whether we can adopt a more simple solution or not, which is important for uplifting since we need to reduce the risk AMAP. > It merely has two effects: > 1. It increases system load by hammering the event queue, thus widening the race window. > 2. It helps hitting short race windows. > Rapid swiping is just an advanced exploit variant of STR0, and the underlying issue is the same: The unlock event is fired before the settings were read. Greg, if you are really convinced that rapid swiping is a different issue, please, point out the code flow to me that is responsible. I can't see how. Until then I maintain that it is as outlined comment 72 / 73. You're just repeating what I said, and you still don't get it: yes, they're the same bug, and they're the same issue, but they could be solved with different solutions which are better for different branches. And the solutions are both valid since they both block user to unlock the screen by-passing the passcode, but their concerns are not different. So is "Rapid swiping is just an advanced exploit variant of STR0" ? Yes, and I never said it isn't. But if a bug need to be reproduced by rapid swiping + early swiping at bootstrapping, especially when it's in an old branch, it definitely deserves a different solution. *Repeat again*: they're both valid solutions for the same bug, but they're good to the different cases. > However, there's a very simple way to shut me up about this as proposed in bug 1188934 and before: We could change > lockscreen.js: passCodeEnabled: false; // insecure > to > lockscreen.js: passCodeEnabled: true; // secure > in all affected versions, and this problem ceases to be a security issue with a simple one-liner. NO. It *covers* the racing again, and the mindset like this is what really doesn't help. What surprise me is you proposed this as I haven't submit any valid solution, and as you never saw my patch has been landed and waiting for future validation. The fact is, that value should never be used if it's still not read from db. So we either block user action requires that value, or we block the whole initialization of the unlocker. And that's what the two patches implemented for. > However, there's a very simple way to shut me up about this as proposed in bug 1188934 and before: We could change I rather you seriously read what I commented and proposed and landed. If you have found any defects in my patches and concerns for different branches, you can pick and put them here, so we can make it more secure and sophisticated. That's what a effective and ordinary bug should do. But to push me to explain it again and again, and to ignore them all, is not the way to solve the case.
(In reply to Christiane Ruetten [:cr] from comment #99) > Moving tracking of uplifting efforts in affected branches to tracking bug > 1189314. If you fix the bypass in other releases, please open a separate bug > that blocks 1189314. That's what Bug 118871 is for. It's already open, and you're also in the CC list and has been NIed.
Greg, I really tried, but we're moving backwards. Please stop blaming me for your refusal to adopt a security mindset even for the sake of this discussion. Just because I don't interpret things your way doesn't mean that I don't understand what your approach and motivation are. You made that plenty clear by now and repeating your stance until I fall in line isn't helping. While there's lots to be said about comment 100, I'm not going to debate this any more.
We will need this landing backed out. It seems to be causing a smoke test blocker. Please see bug 1189455.
Flags: needinfo?(nhirata.bugzilla)
CR and I spoke about this bug. We both thought the current situation of having to reboot the device is better than having the security issue that the patch tried to resolve. I'm thinking we also have to reevaluate when we're doing backouts to figuring out which break is worse. The break that the patch is resolving or the regression that it causes. I would prefer to either talk to Greg first before backing out, or more so let him decide in this case esp since he's the owner of the patch.
I submitted the patch at Bug 1189455.
Just FYI for other readers of this bug - I've taken the discussions above to email and face to face meetings to figure out the jumble of bugs, versions and issues and figure out a path forward here. I'll post back in this and related bugs once we get a solid plan.
Flags: needinfo?(nhirata.bugzilla)
Flags: sec-bounty? → sec-bounty+
Bug 1188716 covers fixing older releases, so wontfixing this bug for better tracking.
Group: b2g-core-security
Flags: needinfo?(hhsu)
Group: core-security → core-security-release
Flags: sec-review?(cr) → sec-review+
Whiteboard: [v3.0-nexus-5-l] → [v3.0-nexus-5-l][b2g-adv-main2.5+]
Summary: [Lock Screen] pass code bypass after reboot due to settings observer race condition → Firefox OS Lockscreen passcode bypass due to settings observer race condition
Alias: CVE-2015-8511
Summary: Firefox OS Lockscreen passcode bypass due to settings observer race condition → Firefox OS Lockscreen passcode bypass due to race condition
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: