Closed Bug 1722777 Opened 1 year ago Closed 9 months ago

Background task updater leaves profiles or parts of profiles on disk in TmpD (%APPDATA%/Temp)

Categories

(Toolkit :: Application Update, defect, P1)

Desktop
Windows 10
defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox92 --- wontfix
firefox100 --- fixed

People

(Reporter: Gijs, Assigned: nrishel)

References

Details

(Whiteboard: [fidedi-ope])

Attachments

(7 files, 2 obsolete files)

20.72 KB, image/png
Details
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
Attached image image.png

My temp dir appears to have 18 directories of the form MozillaBackgroundTask-{hash}-backgroundupdate[-N], where N ranges from 1-17.

Some are empty, some have a bunch of subfolders that have more subfolders but no files, some have something like the attachment, some just have 2 NSS/password files (key4+cert9).

All of these are from the last 9-ish days. Confusingly, most of the timestamps on the files seem not to be close to times that I would have shut off the machine, so I'm not sure why they wouldn't have been cleaned up successfully.

We should make sure these files get tidied up.

I'll add another data point here. I currently have 146 of these directories. It's true that I have an unusual number of installations, and I do a fair amount of background update testing though. If we narrow it down to directories with the hash of my primary installation though, there are still 42 directories. Those 42 directories are using 1MB total disk space. The timestamps on those directories range from 2021-04-26 to 2021-07-27. I do see a fair number of timestamps that are very close together, which might be from times when I was trying to reproduce bugs on Nightly.

I just tried running the task manually on my primary installation (while Firefox was still running) and did not see a directory get left behind. I then tried running the task while Firefox was not running and with an update pending and a directory got left behind with contents similar to the screenshot in the description. I then ran it again after the update had been installed, and a directory was left behind with only the key4 and cert9 files.

We have another report of this in the wild: https://twitter.com/ericlaw/status/1431326875687792645. NI to me to see what a periodic cleanup would look like.

Flags: needinfo?(nalexander)

I'm calling this an S4. It can potentially consume some disk space, so we should really fix it. But it shouldn't have much impact on users.

Severity: -- → S4
Whiteboard: [fidedi-ope]

set regressor bug # to tracking

Regressed by: 1734262

(In reply to Alice0775 White from comment #5)

set regressor bug # to tracking

Is that bug really the regressor? This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?

(In reply to :Gijs (he/him) from comment #6)

(In reply to Alice0775 White from comment #5)

set regressor bug # to tracking

Is that bug really the regressor? This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?

It's not. This ticket is exacerbated by using the background task apparatus more aggressively, but it predates converting pingsender.

Flags: needinfo?(nalexander)
No longer regressed by: 1734262

(In reply to :Gijs (he/him) from comment #6)

(In reply to Alice0775 White from comment #5)

set regressor bug # to tracking

Is that bug really the regressor?

Yes, It actually started after bug 1734262 landed.
I checked autoland Build and I got the regression window in Bug 1736910 comment #1.

This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?

So, This bug seems to be different from bug 1736910

Assignee: nobody → nrishel
Priority: -- → P1

Consolidate method implementations into BackgroundTasks.cpp, remove unused headers and replaced with forward declarations where possible, removed redundant mozilla:: namespacing for individual variables.

Keywords: leave-open
Keywords: leave-open
Depends on: 1756675

Comment on attachment 9264899 [details]
Bug 1722777 - Pre: Cleanup BackgroundTask. r=nalexander

Revision D139328 was moved to bug 1756675. Setting attachment 9264899 [details] to obsolete.

Attachment #9264899 - Attachment is obsolete: true

Comment on attachment 9264900 [details]
Bug 1722777 - Pre: Add logging for BackgroundTask temporary profile setup and teardown. r=nalexander

Revision D139329 was moved to bug 1756675. Setting attachment 9264900 [details] to obsolete.

Attachment #9264900 - Attachment is obsolete: true

I won't file a sub-ticket for this just yet, but we probably want to disable fast shutdown mode entirely in background tasks. We can use toolkit.shutdown.fastShutdownStage=0 in backgroundtasks.js to do so. Curiously, it's not enough for me when I hack it into a PGO build, and I don't yet understand why not.

I got a chance to dig into this. :nrishel reported that this reproed much more consistently with PGO builds, which was a clue. Focusing on the case where the deletion fails due to cert9.db and key4.db, this is what is happening. Due to challenges refcounting NSS objects, NSS is shutdown as part of XPCOM shutdown. In the "fast shutdown" mode, that shutdown simply doesn't occur. That means the SQLite DBs aren't closed, and (waves hands) that somehow leaves them blocking the temporary profile directory deletion. Hence, this bug. (Although, not the only cause of this bug.)

Fast shutdown mode is only enabled when !DEBUG and !MOZ_PROFILE_GENERATE, explaining why I don't see this locally. If we were to test against a task that makes network requests in test_backgroundtask_deletes_profile.js, I think we'd witness this problem with opt builds.)

Here's what I propose for this ticket. First, we need to fill the hole we've dug:

  1. We need to raise the high-water mark for the number of temporary profile directories immediately.
  2. We need an approach to clean up existing directories incrementally.
  3. If we can use a Windows-only feature like MoveFileEx to make sure these get cleaned up at shutdown time, great. Perhaps let's only do that if we fail to delete the directory after N attempts (say 2 attempts).

Concurrently, we need to stop digging:

  1. Let's get a test establishing a background tasks profile contents highwater mark: Bug 1679443. This is tricky because what the test task does actually matters in this case; profile contents are created on-demand. But we can at least perform a network request to see the NSS files, say.
  2. Let's disable fast shutdown in background tasks mode, so that we are more likely to have an orderly shutdown. It's not great that shutdowns could take longer and/or hang, but I think it's a risk worth taking: we don't have a human around to witness and correct problems. (This doesn't have a ticket yet but should get one.)
  3. Let's turn of these NSS databases entirely: Bug 1757252. We can easily keep this fixed with the tests added by Bug 1679443.
  4. Let's turn off recording Telemetry in background tasks mode (since it won't be submitted): Bug 1757229. This should squash another large source of this particular problem.
  5. As a bonus, over the weekend I dug into not writing cookie databases to disk and it's pretty straightforward: Bug 1675829. I'll get a patch up.

Since we determined the profile directory limit won't be reached for ~6 years we can skip 1.

If we're going to clean up directories incrementally we can skip 3.; in retrospect clean up on shutdown makes more sense for one off file cleanup.

Pushed by nrishel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51a7df84c674
Pre: Move "skip windows" XPCShell test config for test_symlinks.js from source to configuration file. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/a8344f5044e6
Pre: Add comment explaining why Windows recursive folder deletion uses neither `SHFileOperation` nor `IFileOperation` APIs. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/b90cab865ce4
Pre: Reuse existing directory iteration function when deleting folder content on Windows. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/00fa6e16e916
Pre: files contained within a folder implicitly must have a longer path; make this explicit for clarity. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/8ea40ee4caf0
Incrementally remove stale temporary profiles from background tasks which match the name of the current background task profile (before adding unique identifier). r=nalexander,mossop
Attachment #9266631 - Attachment description: Bug 1722777 - Pre: Move "skip windows" XPCShell test config for test_symlinks.js from source to configuration file. → Bug 1722777 - Pre: Move "skip windows" XPCShell test config for test_symlinks.js and test_bug476919.js from source to configuration file.
Pushed by nrishel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f8f914b7ee0c
Pre: Move "skip windows" XPCShell test config for test_symlinks.js and test_bug476919.js from source to configuration file. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/08a582756a53
Pre: Add comment explaining why Windows recursive folder deletion uses neither `SHFileOperation` nor `IFileOperation` APIs. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/633d15f38810
Pre: Reuse existing directory iteration function when deleting folder content on Windows. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/566db8d9f4f8
Pre: files contained within a folder implicitly must have a longer path; make this explicit for clarity. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/3eb6132bff96
Incrementally remove stale temporary profiles from background tasks which match the name of the current background task profile (before adding unique identifier). r=nalexander,mossop
https://hg.mozilla.org/integration/autoland/rev/c63edab0c2bb
Test. r=nrishel,bytesized
Pushed by nrishel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c96cf253726
Pre: Move "skip windows" XPCShell test config for test_symlinks.js and test_bug476919.js from source to configuration file. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/9232381871c4
Pre: Add comment explaining why Windows recursive folder deletion uses neither `SHFileOperation` nor `IFileOperation` APIs. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/e39d7cb0170b
Pre: Reuse existing directory iteration function when deleting folder content on Windows. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/bcb41f5b279a
Pre: files contained within a folder implicitly must have a longer path; make this explicit for clarity. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/6c22e1c0fc14
Incrementally remove stale temporary profiles from background tasks which match the name of the current background task profile (before adding unique identifier). r=nalexander,mossop
https://hg.mozilla.org/integration/autoland/rev/20ab625c5204
Test. r=bytesized

Resolved test issues that caused backouts.

Flags: needinfo?(nrishel)
Flags: in-testsuite+
Blocks: 1746983
You need to log in before you can comment on or make changes to this bug.