Closed Bug 1592204 Opened 5 years ago Closed 5 years ago

Ignore unknown files in repositories and origin directories during temporary storage initialization

Categories

(Core :: Storage: Quota Manager, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: tt, Assigned: tt)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

It's one of the major issues in storage initialization failures. Now, if there is one and we just stop all the process of initialization.

The filenames of these files seem various (some of them are highly likely to be created by virus), so we take another approach rather than maintain allow-list and the deny-list.

I think we should investigate other options. We need a robust solution with good performance long term.
If we just start ignoring unknown files and directories, the performance will degrade more and more since we would need to check them every time the storage is initialized.

(In reply to Jan Varga [:janv] from comment #1)

I think we should investigate other options. We need a robust solution with good performance long term.
If we just start ignoring unknown files and directories, the performance will degrade more and more since we would need to check them every time the storage is initialized.

I was thinking of trying to delete these files and ignore the results, but I only mentioned ignore. But maybe this still needs to be discussed.

We suspect this to be one possible root cause for the Quota Manager storage issues that led to bug 1592136.

Type: enhancement → defect
Priority: -- → P1

I'm working on a plan how to do this properly. There are many tricky cases like IndexedDB directories, directories that were partially deleted, files that can't be removed, very long filenames, etc.
People also make backups of sqlite databases parallel to our sqlite databases (in the same directory), so removing such files can make some users unhappy. This all needs to be reflected in the plan.

(In reply to Jan Varga [:janv] from comment #4)

People also make backups of sqlite databases parallel to our sqlite databases (in the same directory), so removing such files can make some users unhappy. This all needs to be reflected in the plan.

Mike, does product think it should be a Firefox feature that users can store arbitrary files in their quota-managed directory, some of which may look like valid data to QuotaManager but otherwise violate our invariants? I would suggest we should not given that it increases system complexity and is hard to support because a user may accidentally do something that confuses QuotaManager and its clients leading to the breakage like we're currently experiencing with LSNG and mandating its backout.

Flags: needinfo?(mconca)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #5)

(In reply to Jan Varga [:janv] from comment #4)

People also make backups of sqlite databases parallel to our sqlite databases (in the same directory), so removing such files can make some users unhappy. This all needs to be reflected in the plan.

Mike, does product think it should be a Firefox feature that users can store arbitrary files in their quota-managed directory, some of which may look like valid data to QuotaManager but otherwise violate our invariants? I would suggest we should not given that it increases system complexity and is hard to support because a user may accidentally do something that confuses QuotaManager and its clients leading to the breakage like we're currently experiencing with LSNG and mandating its backout.

Support for arbitrary files stored in the quota-managed directory (or any other Firefox directory, for that matter) is not a desired feature. The structure and usage of that storage area is not an officially documented interface to Firefox and users should have no expectations for any data stored there.

Firefox should be robust to unknown files in the storage area, but I think ignoring, moving, modifying and even deleting extraneous files is within bounds. Data integrity is a fundamental principle of any storage system, and Firefox needs the freedom to deliver on that promise, particularly if we think any of the extra files are being introduced for malicious purposes.

Idea - perhaps the current storage system could be copied to an entirely new top-level directory (e.g. /storage2), the old Firefox storage files deleted, leaving only the miscellaneous files behind. The preserves any potential user data already stored while giving Firefox a fresh start. Any new files that appear in the /storage2 area would be deleted (if possible).

Flags: needinfo?(mconca)

(In reply to Mike Conca [:mconca] from comment #6)

Idea - perhaps the current storage system could be copied to an entirely new top-level directory (e.g. /storage2), the old Firefox storage files deleted, leaving only the miscellaneous files behind. The preserves any potential user data already stored while giving Firefox a fresh start. Any new files that appear in the /storage2 area would be deleted (if possible).

Yeah, I was thinking about something like this. I'll give it more thought.

Leaving the files behind that we don't understand suffers from a potential ongoing privacy leak because the existing file path structure fundamentally includes information we consider sensitive.

That is, let's assume a user visited the website "https://incriminating.example.com". Data for the origin is stored under PROFILE/storage/default/https+++incriminating.example.com. An errant file created by an unpopular file-manager that QuotaManager doesn't know about and doesn't begin with a dot (and therefore isn't known as something that can definitely be deleted) got in there and has broken QuotaManager entirely in the hypothetical.

Anything that retains the path information retains sensitive information. Anything that doesn't increases the complexity. For example, creating a lost-and-found directory that has to deal with duplicate file name collisions.

A compromise would be to create lost-and-found under the origin directory, so unknown stuff would get deleted along with the origin (in that case the deletion would be justified by explicit user action or temporary storage eviction of not persisted origins). Code which traverses directories would have just one check for lost-and-found and ignore that directory (for example, not recursing to that directory during initialization and excluding it from usage calculation).
Not sure if this a good idea, but we could also add a button to "Manage Cookies and Site Data" which would trigger removal of lost-and-found. We could add a nice explanation for that.
I'm afraid that users read release notes only if something stopped working for them, so saying in release notes that we will start deleting unknown stuff in release XX would have limited value.

A compromise between what and what, though? We don't support storing arbitrary data in the storage directory. We also don't support backups of Firefox profiles at a granularity less than the whole profile; one can't take an arbitrary IndexedDB database from an old profile and copy it into a new profile and reliably copy it into a new profile and expect it to work.

I meant a compromise between not deleting unknown files immediately and performance since there would be at most one additional directory entry to check when we scan origin directories next time. I just don't want to replace one extreme when we stop the initialization process on first unknown file with another one when we delete everything unknown immediately. Adding a transition period would be safer IMO. What if there's a bug in our new "cleanup storage directory" function and we accidentally remove something that we shouldn't have removed ? With lost-and-found, there's still a chance to repair it. I know, it sounds too paranoid.
We didn't officially support using the same profile with older FF versions, but we still tried to avoid incompatible changes if we could until we had profile per channel (because people were doing it anyway despite being not officially supported). I think implementation of a simple lost-and-found directory should be quite easy.

Though I like the discussion here: Can we start to describe this in a document? It gets hard to follow all these comments. Once we fix that document, we can use it as new description of this or a new [meta] bug?

I think a document would be good once we have a consensus for this.

(In reply to Jan Varga [:janv] from comment #11)

Adding a transition period would be safer IMO. What if there's a bug in our new "cleanup storage directory" function and we accidentally remove something that we shouldn't have removed ?

We've already been making a safe, slow transition:

  • We[1] identified the specific places QuotaManager was failing initialization in bug 1436188.
  • We gathered all unknown file names for roughly a week on nightly in bug 1529016 (which we disabled in bug 1561611).

Any cleanup logic that deletes should:

  1. Have a preference so that we can immediately disable the logic. We can prepare a disabling patch, have it pre-reviewed and ready to land on lando immediately if we need to disable it.
  2. Have telemetry that reports how many files/origins were wiped (which we should monitor).

Additionally, all fixes ride the trains like normal.

Additional failsafes may also be appropriate, such as making sure that any recursive deletes we perform canonicalize the path for each deletion and verify that the path prefix matches the expected PROFILE/storage/ prefix or origin prefix. (The thing to always be concerned about with recursive deletions is symlinks that cause you to leave the sub-tree you thought you were deleting. I believe we/Gecko may already have preventative logic in place, but it's something I'd want to triple-check.)

With lost-and-found, there's still a chance to repair it. I know, it sounds too paranoid. We didn't officially support using the same profile with older FF versions, but we still tried to avoid incompatible changes if we could until we had profile per channel (because people were doing it anyway despite being not officially supported). I think implementation of a simple lost-and-found directory should be quite easy.

Our motivating concern about people using the same profile with different versions was that it entirely broke QuotaManager and all its quota clients and potentially the only automatic fix that didn't involve UX was wiping all of the QM storage. In this case we're talking about deleting files that are breaking QuotaManager and are not and cannot be visible to content.

My concern about the lost-and-found implementation continues to be that it adds complexity to an already complex piece of software for a use-case we don't support (users intentionally put files there) and that isn't supported by the data we gathered (bug 1529016). It's obviously something we can do, but I don't think it's something we should do.

1: Actually, :tt did pretty much all the work ;)

I understand, but those file names were collected on Nightly only and only in origin directories non recursively. I also understand your point on adding complexity to an already complex piece of software, but I think lost-and-found shouldn't be that hard to implement, we can create just one generic function that should be able to move any unexpected file/directory to lost-and-found. That function can later transform into file/directory removing function or there can be a preference and the function would either move unknown stuff to lost-and-found or just delete unknown stuff immediately.
I will be happy if we later find out that there are no important unknown files, but I wouldn't be happy if we deleted something important even if it affects only few people.
I also wouldn't worry about additional complexity so much. We implemented shadow writes in LSNG, that wasn't trivial for sure, I thought it was useless initially and that we will remove all that code soon anyway when we remove old LS implementation, but it's now obvious that it is worth every penny.

The benefit of being able to pref off LSNG at any time was a clear justification for the implementation complexity of shadow writes. This was a benefit for every user of Firefox in the event of breakage.

Telemetry from activity stream indicates there are at least 86,000 users on release who would have experienced LocalStorage breakage until LSNG was pref'ed off. Their IndexedDB, Service Workers, Push Support, Web Extension storage.local persistence, etc. all continue to be broken. My fundamental concern continues to be that the risk/reward for spending extra time and effort on a piece of functionality to support a small number of hypothetical users with hypothetical important files stored in an explicitly unsupported location that is non-persistent by default is harming these 86,000's users experience by delaying our fixes and/or providing an opportunity for additional breakage that leaves their Firefox profiles broken for another release. The alternative fix for these users is to use "Refresh Firefox" which definitely loses all of their QuotaManager data, all of their extension configuration, etc.

Additionally, the lost-and-found directory has privacy implications along the lines of comment 8. What if the files we move were auto-generated thumbnails of IndexedDB blobs of images which contain the user's private data, and so we are now moving thumbnails of the user's private data? How is the user supposed to know about the data? How is the user supposed to clear the data? Please keep in mind that we never managed to ship a UI in Fennec or currently in Firefox Preview that lets the user list the origins stored by the browser and allows them to clear them, so it's unlikely that any additional UI would be forthcoming for such a feature.

(In reply to Jan Varga [:janv] from comment #9)

Not sure if this a good idea, but we could also add a button to "Manage Cookies and Site Data" which would trigger removal of lost-and-found. We could add a nice explanation for that.

I realize this somewhat addresses my 3rd paragraph above in comment 16. It's not clear to me that there is a good explanation for this that can be concisely presented and that we can expect users to be able to parse and act on.

My best possibility would be: "We detected some unknown files in your storage directory when upgrading your Firefox that Firefox was unable to identify. They may include portions of your private data. [See in file manager] [Clear these files]"

This continues to strike me as something:

  • Most users would not look for or otherwise see.
  • Most users would not be able to act on.
  • Might cause decision anxiety and they do nothing, leaving the data around.
  • Might cause users to search for information about the files and end up on tech support scams or other.

And it's still not clear to me what an example of a successful scenario would look like where this functionality helps a user out. I worry that the goal here is for us not to get yelled at by users in Bugzilla which should not be a primary concern in decision making.

We've already experienced a situation like this and I think we should follow the famous CheckedUnsafePtr approach :) I'll post more details later, stay tuned!

So we have a situation when people propose basically 3 approaches for unknown files:
Ignore them all!
Move them all!
Delete them all!

Solution: IgnoreMoveDelete them all! (obviously not at the same time)

Each approach has its upsides and downsides and as we know the time is limited and we need to act. So I propose that we do this in stages:
FF 71: Start using our own telemetry for temporary storage initialization success rate

FF 72: Start ignoring unknown files in /storage as much as possible with the goal to improve temporary storage initialization success rate as much as possible (it can happen that we'll find out that unknown files are not the only reason for storage initialization failures)
We can also add logging to the console when we find unknown stuff.

FF 73: Start moving unknown files in /storage to lost-and-found sub directories. If an unknown file appears again we'll move it again with the possibility that the previous one gets overwritten in lost-and-found sub directory. Exact details can change because we'll have more time to think about it.
Logging to the console is again an option.

FF 74/75/76 (We'll decide later): Start deleting lost-and-found sub directories and also any new unknown files immediately.
We may add logging for this too.

I think this addresses most of the concerns mentioned so far (either mine or others').

I have some other ideas how to improve storage in future (also using lost-and-found), but it needs more work and I don't want to solve it in this bug, otherwise we never end this discussion :)

(In reply to Jan Varga [:janv] from comment #19)

FF 72: Start ignoring unknown files in /storage as much as possible with the goal to improve temporary storage initialization success rate as much as possible (it can happen that we'll find out that unknown files are not the only reason for storage initialization failures)
We can also add logging to the console when we find unknown stuff.

Why can't we do this step already in FF 71 and 70.x? I think we need to do something in 70.x beyond disabling LSNG to fix the affected users' IndexedDB & Cache.

I would also recommend to separate concerns a bit:

  • this bug is (as of its title) a defect named "Ignore unexpected files during storage initialization". We should just do this right now and lift it up to all affected versions we can apply it safely. And center the discussion here only on issues around "how to safely ignore them", if there are any.
  • all the rest is about avoiding bloating of the directories with unexpected files and/or keeping our directories clean. I would like to see telemetry data on the amount and size of such files before spending effort on this. It might not even be necessary to do anything here, once we don't stumble over them any more. And I'd propose to move this discussion to a new bug, if we want to continue it.

(In reply to Simon Giesecke [:sg] [he/him] from comment #20)

(In reply to Jan Varga [:janv] from comment #19)

FF 72: Start ignoring unknown files in /storage as much as possible with the goal to improve temporary storage initialization success rate as much as possible (it can happen that we'll find out that unknown files are not the only reason for storage initialization failures)
We can also add logging to the console when we find unknown stuff.

Why can't we do this step already in FF 71 and 70.x? I think we need to do something in 70.x beyond disabling LSNG to fix the affected users' IndexedDB & Cache.

In general, ignoring unknown files in some directories can have side effects like wrong usage calculation and broken quota tracking, I wouldn't recommend that for 70.x
On the other hand, if we feel confident about some fixes we can uplift them to 71 beta, I agree.

Just for your information, those 86 000 users (or even more according to the Activity Stream telemetry, so let's say 100K users) with broken storage, that's not something that is a problem in 70 only, it goes much further to the past. At least to the 68 release, telemetry data for older versions doesn't exist anymore or I just can't see it in Activity Stream query. So while LSNG is disabled, the situation shouldn't be worse than it was before 70. As we stated in the LSNG status document, this wasn't so visible in previous versions because most of release users probably don't mind broken IndexedDB, DOM Cache, Activity Stream, etc. On the other hand, broken LocalStorage is a big problem, so LocalStorage is still essential for the web.
100K users out of 250M, that's 0.04% I don't say we shouldn't try to fix their storage, but the number looks a bit different when you compare it with total number of users.

I also want to repeat that something else may be causing storage initialization failures as I stated in bug 1482662 comment 6.

I would rather think twice before doing something and asking for uplifts. I don't want to be responsible for new "accidents". I think we should calm down a bit and get back to work and provide incremental fixes that will reduce storage initialization failures towards fast and reliable Firefox storage.

Assignee: nobody → ttung
Status: NEW → ASSIGNED

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=db1df9c37a827e9a75cc0719ba08d78c945d772c

Assuming we want to separate the bugs for ignoring the unexpected files in origin directories, unexpected folders (clients) in origin directories, and unexpected files in repositories.

See Also: → 1594075
See Also: → 1594080
Attachment #9106268 - Attachment description: Bug 1592204 - P1 - Ignore the unexpected files in origin directories excluding unknown client directories; → Bug 1592204 - P1 - Ignore the unexpected files in origin directories;
Blocks: 1595445
Blocks: 1595447
Blocks: 1595448
Attachment #9106268 - Attachment description: Bug 1592204 - P1 - Ignore the unexpected files in origin directories; → Bug 1592204 - P1 - Ignore the unknown files in origin directories;
Attachment #9106609 - Attachment description: Bug 1592204 - P2 - Ignore the unexpected files in repositories; → Bug 1592204 - P2 - Ignore the unknown files in repositories;
Pushed by ttung@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6786e02d337
P1 - Ignore the unknown files in origin directories; r=asuth,janv
https://hg.mozilla.org/integration/autoland/rev/7943aaba3362
P2 - Ignore the unknown files in repositories; r=janv
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Blocks: 1619893
Blocks: 1619895
Summary: Ignore unexpected files during storage initialization → Ignore unknown files during repository and origin initialization
No longer blocks: 1482662, 1595445, 1595447, 1595448
Summary: Ignore unknown files during repository and origin initialization → Ignore unknown files in repositories and origin directories during temporary storage initialization
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: