Reduce the (theoretical) risk of races during profile deletion
Categories
(Toolkit :: Startup and Profile System, defect)
Tracking
()
People
(Reporter: jstutte, Assigned: jstutte)
References
Details
Attachments
(1 file)
RemoveProfileFiles
unlocks the profile directory before deletion of all files.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
•
|
||
This patch does now delete the entire content of the profile directory before unlocking. And once we solved bug 1715742, we can be pretty sure to have just an empty profile directory left before unlocking. But there is still a small chance for a race with another starting instance left, as we cannot delete the profile directory itself before unlocking it. I see basically three paths here:
- If bug 1715742 is solved, we can downgrade the root directoy's
NS_ENSURE_SUCCESS_VOID(rootDir->Remove(true));
to be non-recursive (as it is not needed any more as a fallback). This should reduce the risk of doing harm to another instance that just started to put files there (what would make the removal just fail). - To be 100% sure to not cause any problems, we could just leave the empty directory in place. It feels not very clean, but the overhead on disk is minimal and there is no sensitive data left, anyway.
- We could find a different place for the lock-file (outside the profile directory). I assume this to be non-trivial in terms of finding a "good" place and name for it, but I might be wrong.
I would opt for 1.) (and maybe add it as todo to bug 1715742).
Comment 4•4 years ago
|
||
Backed out changeset eaa1eb96adbe (Bug 1714573) for causing marionette failures in test_refresh_firefox.py.
https://hg.mozilla.org/integration/autoland/rev/f53a6f8a9e2d90fe72410f1b37861743087b9166
Push with failures:
https://treeherder.mozilla.org/jobs?repo=autoland&revision=eaa1eb96adbef171682d6b45dbaa25ba750fbba9&selectedTaskRun=XHgDhJhXTTeRpClclG3uhw.0
Failure log:
https://treeherder.mozilla.org/logviewer?job_id=342380649&repo=autoland&lineNumber=34038
Assignee | ||
Comment 5•4 years ago
|
||
So I added some WARNING, and I see:
[task 2021-06-10T20:20:33.689Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.689Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: data.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.691Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.691Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: lock.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.692Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x80520014 (NS_ERROR_FILE_DIR_NOT_EMPTY): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.692Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: extension-store: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.706Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.706Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: data.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.707Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x8052000E (NS_ERROR_FILE_IS_LOCKED): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.707Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: lock.mdb: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
[task 2021-06-10T20:20:33.708Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: NS_ENSURE_SUCCESS_VOID(aDirectoryOrFile->Remove(false)) failed with result 0x80520014 (NS_ERROR_FILE_DIR_NOT_EMPTY): file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:133
[task 2021-06-10T20:20:33.708Z] 20:20:33 INFO - [Parent 3216, StreamTrans #2] WARNING: xulstore: file /builds/worker/checkouts/gecko/toolkit/profile/nsToolkitProfileService.cpp:98
that browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefresh.testFxANoSync
ends up to try to delete the profile with actively locked files. I must assume, that this was always the case and we just see this now thanks to the assertion we added.
I do not know if the test has a flaw, but given that also browser/components/migration/tests/marionette/test_refresh_firefox.py TestFirefoxRefresh.testResetEverything
fails in the same way, I believe that we have a general problem with profiles that cannot be removed under Windows here.
lock.mdb
and data.mdb
appear in several places together. My gut feeling would point to rkv
, though.
Assignee | ||
Comment 6•4 years ago
•
|
||
The directory that cannot be removed (probably due to lock.mdb
and data.mdb
) is xulstore. xulstore is based on rkv in the new implementation. I was not yet able to see, if rkv is used there in SafeMode or not, but it seems that we shutdown xulstore only during real FF shutdown at a very late stage. So I would not be surprised to see it holding locks here (assuming that we arrive around some "profile-before-change" step here and are not shutting down at all).
Edit: My guess would be that xulstore should have a listener for "profile-before-change"
that makes it flush, close and release files. At least I could not find any path that ensures this. But adding this for a try at the right place inside this combination of .jsm
, .rs
and .cpp
xulstores goes beyond my capabilities, I fear.
Myk, can you help me to shed some light on this?
Comment 7•4 years ago
|
||
It's been a while since I've looked at the xulstore code, so I don't recall quite how it works anymore, although I'm a bit surprised to hear that it loads in safe mode. Regardless, it should be possible for it to listen for profile-before-change
and flush/close/release the files. But I'll have to dig into it some more before I can make a good recommendation for the best place to do that.
Assignee | ||
Comment 8•4 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #7)
It's been a while since I've looked at the xulstore code, so I don't recall quite how it works anymore, although I'm a bit surprised to hear that it loads in safe mode.
According to get_database()
, this seems to be the case since bug 1654192 landed. Still my mental model for rkv tells me that it should not use ldbm in safe mode at all. So I am still puzzled why we see those lock.mdb
and data.mdb
files at all, but I might ignore how it really works.
In any case, that bug later introduced also the singleton Manager = rkv::Manager<SafeModeEnvironment>;
which might contribute to rkv staying alive and keeping files locked longer than expected, too?
Regardless, it should be possible for it to listen for
profile-before-change
and flush/close/release the files. But I'll have to dig into it some more before I can make a good recommendation for the best place to do that.
Thanks, looking forward for any help! I might spin-off a separate bug for this, though. It feels like it might become more than just a one-liner to me.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
While waiting for bug 1716291, I'll try to adjust the patch to make its best effort to delete anything it can but without asserting.
Assignee | ||
Updated•4 years ago
|
Comment 10•4 years ago
|
||
Comment 11•4 years ago
|
||
bugherder |
Comment 12•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
This patch does now delete the entire content of the profile directory before unlocking. And once we solved bug 1715742, we can be pretty sure to have just an empty profile directory left before unlocking. But there is still a small chance for a race with another starting instance left, as we cannot delete the profile directory itself before unlocking it. I see basically three paths here:
- If bug 1715742 is solved, we can downgrade the root directoy's
NS_ENSURE_SUCCESS_VOID(rootDir->Remove(true));
to be non-recursive (as it is not needed any more as a fallback). This should reduce the risk of doing harm to another instance that just started to put files there (what would make the removal just fail).- To be 100% sure to not cause any problems, we could just leave the empty directory in place. It feels not very clean, but the overhead on disk is minimal and there is no sensitive data left, anyway.
- We could find a different place for the lock-file (outside the profile directory). I assume this to be non-trivial in terms of finding a "good" place and name for it, but I might be wrong.
Actually thinking about this I think we have a reasonably straightforward way to handle this. As part of the remoting service (toolkit/components/remote
) we create a lock in a place outside of the profile (https://searchfox.org/mozilla-central/rev/9975889f5c0d5c59bd22121a454beba774cbae71/toolkit/components/remote/nsRemoteService.cpp#62). This allows us to safely check if another Firefox instance is already running without racing with another instance starting up. Crucially we select the user profile and lock it while this startup lock is still active, only unlocking later. If we were to re-acquire this startup lock then it would block any other Firefox instance from starting and selecting a profile. So we could acquire it, unlock the profile and delete the folder then unlock. The only slight edge case is that if a user runs Firefox with the -no-remote
command line argument then we do not use this lock at all. That flag is practically useless at this point so I wonder if it should just be removed.
Assignee | ||
Comment 13•4 years ago
|
||
(In reply to Dave Townsend [:mossop] from comment #12)
Actually thinking about this I think we have a reasonably straightforward way to handle this. As part of the remoting service (
toolkit/components/remote
) we create a lock in a place outside of the profile (https://searchfox.org/mozilla-central/rev/9975889f5c0d5c59bd22121a454beba774cbae71/toolkit/components/remote/nsRemoteService.cpp#62). This allows us to safely check if another Firefox instance is already running without racing with another instance starting up. Crucially we select the user profile and lock it while this startup lock is still active, only unlocking later. If we were to re-acquire this startup lock then it would block any other Firefox instance from starting and selecting a profile. So we could acquire it, unlock the profile and delete the folder then unlock. The only slight edge case is that if a user runs Firefox with the-no-remote
command line argument then we do not use this lock at all. That flag is practically useless at this point so I wonder if it should just be removed.
Hi Dave, do you think we need a further follow-up bug for this? To my feeling, the scope of this bug is sufficiently covered now that bug 1716291 (or better bug 1546838) landed and we have some other bugs on file (bug 1715738, bug 1715742 and bug 1716966) to further improve the reporting of problems here.
Comment 14•4 years ago
|
||
I think it is worth filing the bug and closing this one, no urgency to implement though.
Assignee | ||
Updated•4 years ago
|
Description
•