Closed Bug 1188934 Opened 5 years ago Closed 2 years ago

Lockscreen: harden .unlock() against settings observer callback race conditions

Categories

(Firefox OS Graveyard :: Gaia::System::Lockscreen, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cr, Unassigned)

References

Details

(Keywords: sec-want)

Attachments

(1 file)

Bug 1173284 was caused by a race condition with the settings observer callback that it registers. The current fix ensures that LockScreen.unlock() is not called from the outside before Settings have been read.

The proposition is to harden LockScreen itself against the race condition by employing internal checks that lead to refusal to unlock while the race condition is still in progress.
Attached file Github pull request
This adds an internal check to LockScreen.unlock() that detects whether
or not the settings observer callback has written the user-configured
settings value for passCodeEnabled. If it hasn't, unlock() gracefully
refuses to unlock, and logs an error message to the console.

It also introduces LockScreen.forceUnlock() that does not check for
the race condition (identical to the old .unlock() behavior). Please
use this if you really mean to unconditionally unlock and circumvent
the pass code check.
Makes sense?
Flags: needinfo?(gweng)
Keywords: sec-want
You should submit the patch with a clear intention of architure planning, unless you're provide some hotfix for existing bugs. I said lots of times, and I don't know if you're aware of that, but we tried to avoid naïve patch like you do, to add more control flag to fix the architecture problem with bandages. For me, to fix this properly within a tolerable schedule, we should create an async promise queue and manage these racing with that. But again, I don't even know if you have noticed what  I said, that we should and already stop growing LockScreen and move the logic to the state manager, which could handle these check with the queue and state transferring more reasonablly now. As a result, the patch should be implemented by simplifing the unlocking method instead of growing it, meanwhile to protect it from the outside callers. And maybe the most important, stopping adding more flags unless to arrange and queue the dependent steps couldn't solve the case. People suffered from that, a lot.
Flags: needinfo?(gweng)
In my view this check is necessary to prevent future regression, and it's exactly where it needs to be, which is where the race is started and finished. It's certainly preferable to address all of this in an architecture overhaul, but until then the patch would have provided a way forward in terms of security.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Greg & Christiane,

Can you work together on that. Which means that Greg explains how he would like the fix to be done (no need to rant about anything else) with the state manager. I guess pointers to the relevant code could be helpful fro Christiane to move faster.

thanks!
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I understand you want to address all of these issues at a time and fix them all. But please consider we already had the terrible v1.3 window manager which is a giant closure that no one really knows its things are for, and the poor screen manager now no one is really be able to care it, not to mention the notification screen almost messy as the same with LockScreen because of the same legacy code issues. All these are the victims of using flags and monolithic "design" to manage complicated racing and asynchronous flow (and they're not my legacy: I always tried to reduce the size and decentralize them, but never had the luck to achieve the great success).

In the LockScreen case we almost have the same problem, too. So that's why I said to prevent to patch the old component by the old method is very important. Moreover, after the current urgent bugs are cleared, I'm going to remove most of reading and event listening inside LockScreen soon, and hope to deprecate it eventually. In that way, at least we could make sure the logic of managing these states and UI transition are unified and belonging to each states. So if you or anyone want to boost the progress, please consider to start from that. I think the first step is to remove the settings event listeners from LockScreen, and leave it to the state manager and even each state (if it really needs that).
My perspective is that lockscreen.js and its unlock functionality are highly security-critical components, and as such they deserve internal security checks and safeguards that prevent catastrophic failure triggered by brittle code in other places – like we witnessed. The hardening patch I proposed didn't address the code problems we have in the other areas, of course, it just helped to detect and prevent failure, which my experience says will likely happen again in the future. My perspective is also that adding missing safeguards to security-critical components is more important than not growing legacy code, but hey, pushing that opinion part of my job. ;)

I think there is a viable compromise that addresses your concern for not adding to the legacy, which I perfectly did and do understand: I revised my pull request to just change the default to .passCodeEnabled: true and add a test for that. I proposed this as my very first hotfix in bug 1187368, but then it had the side effect that slider input would be ignored while the race was on. This isn't a problem anymore with your latest patch in place, AFAIU.
That being said, adding proper safeguards to high-level logic usually requires much more elaborate analysis and design than on the lower levels. I will see what I can do by the side in the coming weeks when more of Greg's architecture changes trickle in, or that I allocate time for it next quarter. I hope I'll be able say something more substantial after finishing the sec review of your patch for bug 1173284.
Blocks: 1189314
It is unclear to me what's left to make this bug considered fixed?
Flags: needinfo?(gweng)
No? It's still reopened. Right now, there are several options for this bug (since I don't know if you're asking for a short-term or mid-term plan, which are all in the plan), like:

1. We take a look at the method, and firstly protect it. However, the problem is now "unlocking" relates to several methods in different instances, which was especially indicated in Bug 1173284, so to only take a look at the method, namely strictly follow what this bug wants is no enough.

2. We protect the whole unlocking logic by tracing and verifying the whole path, which is back to the original intention of this bug. However, Bug 1173284 already fixed the major flaw and it has been there for 5 months, so I don't know if we can find out more obvious racing cases that this bug want to address without a fully test plan. Right now what I can do for this solution is to test remaining possible edge cases, and to fix them with some limited architecture changes (still not prefer tricky flag manipulations).

3. If the work in 2 shows no obvious racing of unlocking, we then wait for device locking API to be implemented after the next workweek and Gip-to-Gij. It could provide a more clear view of such racing + mozSettings issue since the API already takes settings into consideration, so we can eliminate most racing cases by cancelling the code to manipulate mozSettings and locking/unlocking in the component. However, the issue here is it won't work during at least these two weeks, although I feel it's okay to wait since currently we haven't seen more lethal cases so far.
Flags: needinfo?(gweng)
Firefox OS is not being worked on
Status: REOPENED → RESOLVED
Closed: 5 years ago2 years ago
Resolution: --- → WONTFIX
Group: b2g-core-security
You need to log in before you can comment on or make changes to this bug.