Closed Bug 1705676 Opened 3 years ago Closed 2 years ago

Investigate purging HTTP disk cache using backgroundtasks (out-of-process)

Categories

(Core :: Networking: Cache, task, P2)

Unspecified
Windows
task

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: valentin, Assigned: valentin)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(5 files)

We have quite a few shutdown hang bugs caused by slow IO (on Windows) when clearing the cache folder at shutdown. This is more common for users who opted to purge the cache on shutdown.

Seeing how we now have a backgroundtasks process that can handle out-of-process work, it would be nicer if instead of waiting for each individual file to be removed we could let the backgroundtasks process do it.
Going forward we have two options:

  • the main process renames the folder, passes the name to the background tasks, which then removes it. Here I need to investigate if renaming the folder has a better performance than removing each file.
  • the main process changes the path it has for the cache folder, and passes the initial name to the background process. The benefit here is that we don't need to change anything about that folder.

Also something we need to consider - how do we deal with failures? Removing the cache folder is somewhat privacy sensitive, but even if it weren't, it would suck to have multiple cache folders sitting around wasting disk space.

This is an interesting idea; in spirit, it's similar to the way that we launch the pingsender binary as the process exits: it provides performance and robustness during shutdown. Do we already use a threadpool for deletion on Windows? Former Mozilla employee :gps talks about this in his Surprisingly Slow blog post; we probably already do it, but maybe worth talking about.

There are some limitations built into the current background task implementation that impact how tasks like this are structured, especially around temporary profiles; some docs are at https://firefox-source-docs.mozilla.org/toolkit/components/backgroundtasks/index.html. It would be good for us to sketch the design and tradeoffs together if we're serious about doing this.

Fascinating idea!

This is a PoC that dispatches a background task to completely delete the entire cache2 folder instead of individually deleting each file at shutdown.
There are still a bunch of TODOs left to figure out, but I'd appreciate some early feedback for this approach.

Attachment #9242439 - Attachment description: Bug 1705676 - Purge HTTP disk cache using backgroundtasks (out-of-process) r=nalexander → WIP: Bug 1705676 - Purge HTTP disk cache using backgroundtasks (out-of-process) r=nalexander
Attachment #9242439 - Attachment description: WIP: Bug 1705676 - Purge HTTP disk cache using backgroundtasks (out-of-process) r=nalexander → Bug 1705676 - Purge HTTP disk cache using backgroundtasks (out-of-process) r=nalexander

This patch uses a CachePurgeLock for every directory it deletes.
Because we can't possibly know if a folder was already deleted before we
aquire the lock or if it never existed (without the use of a log file) we
can assume that if a folder isn't there after the timeout has expired it's
already been purged by a different cleanup task.

Depends on D132164

valentin: sorry for the delay here. I did get the prerequisite ticket landed, but not on the time frame you were pushing for. Sorry for the delay. Talk when you're back from PTO.

(In reply to Nick Alexander :nalexander [he/him] from comment #6)

valentin: sorry for the delay here. I did get the prerequisite ticket landed, but not on the time frame you were pushing for. Sorry for the delay. Talk when you're back from PTO.

No worries. Hopefully we can land this in this release and enable it on nightly.

Hi Nick, let me know if I should do anything to help with the reviews. Thanks!

Flags: needinfo?(nalexander)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #8)

Hi Nick, let me know if I should do anything to help with the reviews. Thanks!

Sorry for the slow replies. This is on my list, will update by EoW.

Flags: needinfo?(nalexander)

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:valentin, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(dd.mozilla)

The plan is that I will land this after some prerequisites land. NI to me.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(nalexander)
Flags: needinfo?(dd.mozilla)

Assuming there are no other blockers, I'm going to rebase the patches and see if we have a green try run.
Hopefully we can land them this week.

Flags: needinfo?(nalexander)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #12)

Assuming there are no other blockers, I'm going to rebase the patches and see if we have a green try run.
Hopefully we can land them this week.

Valentin -- I'm so sorry for not getting this across the line. There is, of course, a story, but it's not particularly interesting. I found some small issues, that I addressed months ago locally. Rebased try builds are here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=196a4e01863f20230f01b1fc5c499f9cdb22606d
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d71152cecf4b64e1f05714ee9162c45d2ef8faab

Please take the last patch (marked TO FOLD) and merge it into your changes. (I also squashed some commits that do not stand alone in my opinion.) Thanks for taking this back and sorry it's not already done!

Flags: needinfo?(valentin.gosu)

No worries. I'm sure this wasn't the most important thing for you to work on.
Thanks for tracking down the path issue, I was just getting around to investigating it, you probably saved me a few hours 🙂.

Flags: needinfo?(valentin.gosu)

The failures in this Windows TV job, are, I think, because this this line needs a .catch() to ignore errors. It can happen that the process exits before we can close stdin, IIRC.

Not sure what the other errors might be. Bump MOZ_LOG=BackgroundTasks:5 to get more information about potential issues choosing ephemeral profile directories, which might race (still!).

Hi Valentin,

This sounds like a great thing and I think this should be used for anything that does cleanup on shutdown. I'd be happy if I can reuse your work for my component, do you think there is anything on the current patch that blocking reuse?

Despite the name (purgeHTTPCache and CachePurgeLock), it looks like the implementation is already quite generic as the actual behavior doesn't seem to assert cacheDirName is a cache directory, so I think giving a different dir name would do the work. Am I understanding correctly here?

Flags: needinfo?(valentin.gosu)

(In reply to Kagami :saschanaz from comment #17)

This sounds like a great thing and I think this should be used for anything that does cleanup on shutdown. I'd be happy if I can reuse your work for my component, do you think there is anything on the current patch that blocking reuse?

I don't think there are major things blocking reuse. But we should be careful with having many shutdown tasks going at the same time, as they could cause unwanted interactions.

Despite the name (purgeHTTPCache and CachePurgeLock), it looks like the implementation is already quite generic as the actual behavior doesn't seem to assert cacheDirName is a cache directory, so I think giving a different dir name would do the work. Am I understanding correctly here?

Yes, it could potentially be used for other things as well, so I'm not against renaming it. I don't especially like the interface of CachePurgeLock, as you need to manually unlock it all the time, and if you throw or return before that you'll have a bad time 🙁. I plan to change the interface a bit in the future.

Flags: needinfo?(valentin.gosu)
Blocks: 1784840
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/05f442b8dc10
Purge HTTP disk cache using backgroundtasks (out-of-process) r=nalexander,necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/a7467de0421b
Move GetNormalizedAppFile function to MultiInstanceLock header r=nalexander
https://hg.mozilla.org/integration/autoland/rev/a3efb692ab3f
Add CachePurgeLock to lock access to the purged cache directory r=nalexander,necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/6fc586712802
Use CachePurgeLock to prevent multiple background tasks deleting the same folder. r=nalexander,necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/cad6e6b08c7b
Ignore errors when closing stdin in BackgroundTasksTestUtils r=nalexander
Blocks: 1786256
Regressions: 1785018
Regressions: 1787927
No longer depends on: 1791675
Regressions: 1806831
Depends on: 1848542
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: