Closed Bug 1714573 Opened 6 months ago Closed 2 months ago

Reduce the (theoretical) risk of races during profile deletion

Categories

(Toolkit :: Startup and Profile System, defect)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: jstutte, Assigned: jstutte)

References

Details

Attachments

(1 file)

RemoveProfileFiles unlocks the profile directory before deletion of all files.

Assignee: nobody → jstutte
Attachment #9225195 - Attachment description: WIP: Bug 1714573: Ensure profile deletion takes place before the profile is unlocked. → Bug 1714573: Ensure profile deletion takes place before the profile is unlocked. r?#dom-storage-reviewers,mossop!
Status: NEW → ASSIGNED
Depends on: 1715738
No longer depends on: 1715738
See Also: → 1715738
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eaa1eb96adbe
Ensure profile deletion takes place before the profile is unlocked. r=mossop,dom-storage-reviewers,janv
See Also: → 1715742

This patch does now delete the entire content of the profile directory before unlocking. And once we solved bug 1715742, we can be pretty sure to have just an empty profile directory left before unlocking. But there is still a small chance for a race with another starting instance left, as we cannot delete the profile directory itself before unlocking it. I see basically three paths here:

  1. If bug 1715742 is solved, we can downgrade the root directoy's NS_ENSURE_SUCCESS_VOID(rootDir->Remove(true)); to be non-recursive (as it is not needed any more as a fallback). This should reduce the risk of doing harm to another instance that just started to put files there (what would make the removal just fail).
  2. To be 100% sure to not cause any problems, we could just leave the empty directory in place. It feels not very clean, but the overhead on disk is minimal and there is no sensitive data left, anyway.
  3. We could find a different place for the lock-file (outside the profile directory). I assume this to be non-trivial in terms of finding a "good" place and name for it, but I might be wrong.

I would opt for 1.) (and maybe add it as todo to bug 1715742).

Flags: needinfo?(dtownsend)

So I added some WARNING, and I see:

[task 2021-06-10T20:20:33.689Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.689Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: data.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.691Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.691Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: lock.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.692Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x80520014 (NS_ERROR_FILE_DIR_NOT_EMPTY): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.692Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: extension-store: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.706Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.706Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: data.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.707Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.707Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: lock.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.708Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x80520014 (NS_ERROR_FILE_DIR_NOT_EMPTY): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.708Z] 20:20:33     INFO -  [Parent 3216, StreamTrans #2] WARNING: xulstore: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98

that browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefresh.testFxANoSync ends up to try to delete the profile with actively locked files. I must assume, that this was always the case and we just see this now thanks to the assertion we added.

I do not know if the test has a flaw, but given that also browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefresh.testResetEverything fails in the same way, I believe that we have a general problem with profiles that cannot be removed under Windows here.

lock.mdb and data.mdb appear in several places together. My gut feeling would point to rkv, though.

Flags: needinfo?(jstutte)

The directory that cannot be removed (probably due to lock.mdb and data.mdb) is xulstore. xulstore is based on rkv in the new implementation. I was not yet able to see, if rkv is used there in SafeMode or not, but it seems that we shutdown xulstore only during real FF shutdown at a very late stage. So I would not be surprised to see it holding locks here (assuming that we arrive around some "profile-before-change" step here and are not shutting down at all).

Edit: My guess would be that xulstore should have a listener for "profile-before-change" that makes it flush, close and release files. At least I could not find any path that ensures this. But adding this for a try at the right place inside this combination of .jsm, .rs and .cpp xulstores goes beyond my capabilities, I fear.

Myk, can you help me to shed some light on this?

Flags: needinfo?(myk)

It's been a while since I've looked at the xulstore code, so I don't recall quite how it works anymore, although I'm a bit surprised to hear that it loads in safe mode. Regardless, it should be possible for it to listen for profile-before-change and flush/close/release the files. But I'll have to dig into it some more before I can make a good recommendation for the best place to do that.

(In reply to Myk Melez [:myk] [@mykmelez] from comment #7)

It's been a while since I've looked at the xulstore code, so I don't recall quite how it works anymore, although I'm a bit surprised to hear that it loads in safe mode.

According to get_database(), this seems to be the case since bug 1654192 landed. Still my mental model for rkv tells me that it should not use ldbm in safe mode at all. So I am still puzzled why we see those lock.mdb and data.mdb files at all, but I might ignore how it really works.
In any case, that bug later introduced also the singleton Manager = rkv::Manager<SafeModeEnvironment>; which might contribute to rkv staying alive and keeping files locked longer than expected, too?

Regardless, it should be possible for it to listen for profile-before-change and flush/close/release the files. But I'll have to dig into it some more before I can make a good recommendation for the best place to do that.

Thanks, looking forward for any help! I might spin-off a separate bug for this, though. It feels like it might become more than just a one-liner to me.

Depends on: 1716291
Flags: needinfo?(myk)

While waiting for bug 1716291, I'll try to adjust the patch to make its best effort to delete anything it can but without asserting.

Keywords: leave-open
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9d4405036f08
Ensure profile deletion takes place before the profile is unlocked. r=mossop,dom-storage-reviewers,janv
See Also: → 1716966

(In reply to Jens Stutte [:jstutte] from comment #3)

This patch does now delete the entire content of the profile directory before unlocking. And once we solved bug 1715742, we can be pretty sure to have just an empty profile directory left before unlocking. But there is still a small chance for a race with another starting instance left, as we cannot delete the profile directory itself before unlocking it. I see basically three paths here:

  1. If bug 1715742 is solved, we can downgrade the root directoy's NS_ENSURE_SUCCESS_VOID(rootDir->Remove(true)); to be non-recursive (as it is not needed any more as a fallback). This should reduce the risk of doing harm to another instance that just started to put files there (what would make the removal just fail).
  2. To be 100% sure to not cause any problems, we could just leave the empty directory in place. It feels not very clean, but the overhead on disk is minimal and there is no sensitive data left, anyway.
  3. We could find a different place for the lock-file (outside the profile directory). I assume this to be non-trivial in terms of finding a "good" place and name for it, but I might be wrong.

Actually thinking about this I think we have a reasonably straightforward way to handle this. As part of the remoting service (toolkit/components/remote) we create a lock in a place outside of the profile (https://searchfox.org/mozilla-central/rev/9975889f5c0d5c59bd22121a454beba774cbae71/toolkit/components/remote/nsRemoteService.cpp#62). This allows us to safely check if another Firefox instance is already running without racing with another instance starting up. Crucially we select the user profile and lock it while this startup lock is still active, only unlocking later. If we were to re-acquire this startup lock then it would block any other Firefox instance from starting and selecting a profile. So we could acquire it, unlock the profile and delete the folder then unlock. The only slight edge case is that if a user runs Firefox with the -no-remote command line argument then we do not use this lock at all. That flag is practically useless at this point so I wonder if it should just be removed.

Flags: needinfo?(dtownsend)
See Also: → 1702595

(In reply to Dave Townsend [:mossop] from comment #12)

Actually thinking about this I think we have a reasonably straightforward way to handle this. As part of the remoting service (toolkit/components/remote) we create a lock in a place outside of the profile (https://searchfox.org/mozilla-central/rev/9975889f5c0d5c59bd22121a454beba774cbae71/toolkit/components/remote/nsRemoteService.cpp#62). This allows us to safely check if another Firefox instance is already running without racing with another instance starting up. Crucially we select the user profile and lock it while this startup lock is still active, only unlocking later. If we were to re-acquire this startup lock then it would block any other Firefox instance from starting and selecting a profile. So we could acquire it, unlock the profile and delete the folder then unlock. The only slight edge case is that if a user runs Firefox with the -no-remote command line argument then we do not use this lock at all. That flag is practically useless at this point so I wonder if it should just be removed.

Hi Dave, do you think we need a further follow-up bug for this? To my feeling, the scope of this bug is sufficiently covered now that bug 1716291 (or better bug 1546838) landed and we have some other bugs on file (bug 1715738, bug 1715742 and bug 1716966) to further improve the reporting of problems here.

Flags: needinfo?(dtownsend)

I think it is worth filing the bug and closing this one, no urgency to implement though.

Flags: needinfo?(dtownsend)
See Also: → 1734849
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.