Closed Bug 1014394 Opened 6 years ago Closed 6 years ago

Stale cache2.Trash files in local profile dir

Categories

(Core :: Networking: Cache, defect)

x86_64
All
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla32

People

(Reporter: gcp, Assigned: mayhemer)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

In the Local profile directory on my Windows machine that's running Nightly, I have a whole pile of

cache2.TrashXXXX

directories. Many of these several days old. The directories are empty.

It looks like something isn't cleaning up correctly.
Any special preferences mainly around the HTTP cache?  You can find out in about:support.
Flags: needinfo?(gpascutto)
browser.cache.disk.capacity	358400
browser.cache.disk.smart_size_cached_value	358400
browser.cache.disk.smart_size.first_run	false
browser.cache.disk.smart_size.use_old_max	false
browser.cache.frecency_experiment	2

This happens on 2/2 Windows machines I have a Nightly on. On the second machine, one of the directories actually has 40M of data in it.

I'm going to have a look at Android right now and potentially start crying.
Flags: needinfo?(gpascutto)
Michal, any idea what can cause this?
Flags: needinfo?(michal.novotny)
Android looks OK, but this is my Linux box:

morbo@mozwell:~/hg/mozilla-central$ ll ~/.cache/mozilla/firefox/kapchq0q.default/
total 60
drwxr-xr-x 18 morbo morbo 4096 Apr 17 08:10 Cache
drwx------  4 morbo morbo 4096 May  5 12:51 cache2.Trash1443559524
drwx------  4 morbo morbo 4096 May  7 12:03 cache2.Trash1546745077
drwx------  4 morbo morbo 4096 Apr 14 13:24 cache2.Trash1841779727
drwx------  4 morbo morbo 4096 Apr 14 13:55 cache2.Trash2122832036
drwx------  4 morbo morbo 4096 Mar 31 11:33 cache2.Trash276789846
drwx------  4 morbo morbo 4096 Mar 31 11:33 cache2.Trash352141461
drwx------  4 morbo morbo 4096 Apr 14 14:11 cache2.Trash586554833
drwx------  4 morbo morbo 4096 Apr 14 10:59 cache2.Trash630220573
drwx------  4 morbo morbo 4096 Apr 14 12:01 cache2.Trash916387910
-rw-------  1 morbo morbo    1 May 23 17:36 _CACHE_CLEAN_
drwx------ 18 morbo morbo 4096 May 20 16:52 OfflineCache
drwxr-xr-x  2 morbo morbo 4096 May 23 17:11 safebrowsing
drwxr-xr-x  2 morbo morbo 4096 May 20 11:34 startupCache
drwxr-xr-x  2 morbo morbo 4096 May 23 16:49 thumbnails
OS: Windows 7 → All
Could you please provide a NSPR log (cache2:5) from a session that runs more than 60 seconds (we start removing old trash directories after 60 seconds)?
Flags: needinfo?(michal.novotny)
Attached file cachelog.txt
Unless I'm misreading the cache2 code, in CacheFileIOManager::FindTrashDirToRemove it's only checking for directories named trashxxxxx not cache2.Trashxxxxxx.

Did someone rename the directory patterns and then not clean up after himself?
So the problem is in nsDeleteDir. This explains why Android isn't affected: there the directories are moved to the Android Cache, which is cleaned by Android itself.
Could it be a problem with CacheObserver::SchduleAutoDelete() which calls CacheStorageService::WipeCacheDirectory() which uses nsDeleteDir to delete the cache directory?
Flags: needinfo?(honzab.moz)
So looking in nsDeleteDir, we seem to be missing RemoveOldTrashes. Looking for callers of that, there's only one in nsDiskCacheDevice::Init().

At a guess, when cache2 replaced cache, nobody added the corresponding call to nsDeleteDir::RemoveOldTrashes()?
nsDeleteDir is written so that nsDeleteDir::RemoveOldTrashes() cannot be called several times. And since the old cache calls it already with argument ".../Cache", we cannot call it again from the new cache with "../cache2". I.e. nsDeleteDir (as is) must be used only to delete the old cache directory.
Gian, are you using the new cache?  What is the setting of "browser.cache.auto_delete_cache_version" pref?

Michal, why cannot nsDeleteDir be called more then once?  I hear this the first time.  As I see the code, there is no base dir or something, but maybe I read it wrong?
Flags: needinfo?(honzab.moz)
>Gian, are you using the new cache?  What is the setting of "browser.cache.auto_delete_cache_version" pref?

Whatever the default in Nightly is.

Seems to be "browser.cache.auto_delete_cache_version=1" on all machines.
According CacheObserver::SchduleAutoDelete() this should do just nothing for current default prefs.  browser.cache.auto_delete_cache_version=1 means to delete the cache2 dir automatically.  But if you are using the new cache (as you say "Whatever the default in Nightly is.") then cache2 dirs should not be trashed.  The code is pretty simpler for this.

Gian, what are the values of browser.cache.use_new_backend and browser.cache.use_new_backend_temp prefs?
(In reply to Honza Bambas (:mayhemer) from comment #14)
> browser.cache.auto_delete_cache_version=1 means to
> delete the cache2 dir automatically.  

...when you are using the old cache.
browser.cache.use_new_backend;0
browser.cache.use_new_backend_temp;true
I'm getting lost :)  Have to think.  Would you be, in case, willing to run a try build with some more logs added to see where from the trashing gets invoked?

BTW, just to confirm, a new cache2.trashXXXXX dir is created on *every* start?  Or at least after some 30 or 60 seconds after the start?

I'm checking the code in a debugger locally and it behaves with the default pref as expected.  No trash dirs..
>Would you be, in case, willing to run a try build with some more logs added to see where from the 
>trashing gets invoked?
>BTW, just to confirm, a new cache2.trashXXXXX dir is created on *every* start?  Or at least after 
>some 30 or 60 seconds after the start?

No, I just see a bunch of these in the profiles of pretty much every Firefox install I have, but they are not created on every start or I'd have a few hundred instead of tens.

This means that a logging build may not do much as I can't reproduce whatever is producing it on demand.

The most recent one seems to date to 15th of May btw.

Could switching to a release/aurora/beta Firefox build on the same profile trigger it? That does happen.
http://hg.mozilla.org/mozilla-central/rev/53c7d68128b9

Using a Firefox that crosses this commit would result in a cache2.Trashxxxx, that only gets deleted after a certain delay, right? Looking at the file dates in comment 4, and my bugmail, I probably did a bisection that day (14th of April). The ones on my Windows machine also look like they might correspond to bisection-like bughunting dates.

If this theory is correct impact for regular users should be minimal.
(In reply to Honza Bambas (:mayhemer) from comment #12)
> Michal, why cannot nsDeleteDir be called more then once?  I hear this the
> first time.  As I see the code, there is no base dir or something, but maybe
> I read it wrong?

nsDeleteDir::DeleteDir() can be called several times, but nsDeleteDir::RemoveOldTrashes() checks whether it was already called (static variable firstRun) and does nothing if it is not a first call. RemoveOldTrashes() is called from nsDiskCacheDevice::Init() which could be called more times in a specific cases, so the firstRun check ensures that we won't try to remove trash directory that is already posted. This could be fixed, but there was simply no need for an additional logic before...

So, right now only RemoveOldTrashes(".../cache") is called and if you use nsDeleteDir::DeleteDir() to remove other directory than the old cache directory (let's say you'll delete "Cache2"), the timer with trash "Cache2.Trashxxxx" is posted. If the timer is run, then the trash is deleted successfully, but if the browser is terminated before the trash is deleted by the timer, nobody is going delete it since RemoveOldTrashes() will not find it on the next start.

Did I describe it understandably?
Aha, got it, thanks.  Hence we should allow multiple calls to nsDeleteDir::RemoveOldTrashes and call it from... somewhere, probably CacheObserver after session restore.

I'll create a patch.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attached patch v1 (obsolete) — Splinter Review
Attachment #8428241 - Flags: review?(michal.novotny)
By looking at dates from comment 4, apparently those are dates close to ending of the the two trial periods.  So there is no current problem in the auto-delete functionality except when Fx is closed sooner then in 30 secs after the start, renamed dirs are not deleted.  The patch is going to fix it.
Comment on attachment 8428241 [details] [diff] [review]
v1

Review of attachment 8428241 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsDeleteDir.h
@@ +53,5 @@
>    /**
>     * Remove all trashes left from previous run. This function does nothing when
>     * called second and more times.
>     */
> +  static nsresult RemoveOldTrashes(nsIFile *cacheDir, bool &firstRun);

You've moved firstRun variable to the caller, so it is now checked per cacheDir. I don't see a reason to pass the argument to this method. Instead, check the variable locally and simply don't call RemoveOldTrashes() when firstRun == false.

::: netwerk/cache2/CacheObserver.cpp
@@ +258,5 @@
> +  if (!cacheDir)
> +    return;
> +
> +  // Schedule delete of any previously undeleted cache2.Trash* dirs.
> +  nsDeleteDir::Init();

nsDeleteDir's lifetime is strictly bound to nsCacheService. It is initialized in nsCacheService::Init() and shut down in nsCacheService::Shutdown(). Initialize nsCacheService to make sure nsDeleteDir is initialized.
Attachment #8428241 - Flags: review?(michal.novotny) → feedback+
(In reply to Michal Novotny (:michal) from comment #24)
> Comment on attachment 8428241 [details] [diff] [review]
> v1
> 
> Review of attachment 8428241 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/cache/nsDeleteDir.h
> @@ +53,5 @@
> >    /**
> >     * Remove all trashes left from previous run. This function does nothing when
> >     * called second and more times.
> >     */
> > +  static nsresult RemoveOldTrashes(nsIFile *cacheDir, bool &firstRun);
> 
> You've moved firstRun variable to the caller, so it is now checked per
> cacheDir. I don't see a reason to pass the argument to this method. Instead,
> check the variable locally and simply don't call RemoveOldTrashes() when
> firstRun == false.

The patch is smaller and code more elegant the way I do it :)

> 
> ::: netwerk/cache2/CacheObserver.cpp
> @@ +258,5 @@
> > +  if (!cacheDir)
> > +    return;
> > +
> > +  // Schedule delete of any previously undeleted cache2.Trash* dirs.
> > +  nsDeleteDir::Init();
> 
> nsDeleteDir's lifetime is strictly bound to nsCacheService. 

How strictly?  And why?

> It is
> initialized in nsCacheService::Init() and shut down in
> nsCacheService::Shutdown(). Initialize nsCacheService to make sure
> nsDeleteDir is initialized.

Oh, really?  cache2 should init the old cache just to make sure cache2 *rare* trashes are gone?  You cannot be serious :)  That would just be an unnecessary perf startup issue and /Cache would be recreted on every startup.  We should find a way to detach them or just not use nsDeleteDir...  what would be a shame.
I see few problems here:
- we never delete old Cache.TrashXXXX folders any more
- we don't correctly shut the deleter down
- when user changes the setting to delete cache after shutdown, we will not delete cache1

For all of them I have a solution w/o starting the cache service during startup time.

I don't see any strict bondage with the old cache that could not be released.  Old cache startup is (however small) time and resource usage regression which I really want to avoid when not necessary.  Michal, can you please enlighten me why we have to start the old cache when only the new cache is on?


For deleting the old cache trashes, we can easily start the deleter for old cache trashes as well in cache2 (to grab them both).

For shutdown, two options: start the cache service in "profile-change-teardown" that is called before "profile-before-change" and indeed "xpcom-shutdown".  This way the deleter will be initialized in the old cache and soon after that correctly shutdown + respecting any preferences for shutdown cleanup.
Flags: needinfo?(michal.novotny)
Attached patch v2 (obsolete) — Splinter Review
- cache2 now takes care of both cache1/2 pending trashes deletion
- cache1 pending trashes deletion scheduling removed from cache1 code (now only cache2 handles it)
- fixes other problem: not deleting cache1 pending trashes until disk device has been started
- cache1 service initialized because we need to get the correct cache1 dir ; happens in "sessionstore-windows-restored" ; still, the IO thread wastes resources and also its start blocks the main thread unnecessarily ; originally I had a plan to instantiate it before shutdown so that the deleter would be shut correctly down and do the proper sanitization job correctly

Anyway, all of this will go away with the old code, so no need to worry that much.
Attachment #8428241 - Attachment is obsolete: true
Attachment #8429466 - Flags: review?(michal.novotny)
Flags: needinfo?(michal.novotny)
Blocks: 961024
Comment on attachment 8429466 [details] [diff] [review]
v2

The patch has one bug disadvantage: does a lot of main thread IO...  CleaupCacheDirectories must happen off the main thread.

One more note - we will need this code to remove also any remnants of cache2 after we move the mobile cache in bug 961024.
Comment on attachment 8429466 [details] [diff] [review]
v2

Review of attachment 8429466 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Honza Bambas (:mayhemer) from comment #26)
> Michal, can you please enlighten me why we have to start the old cache when
> only the new cache is on?

nsDeleteDir must be somewhere shut down and as you already noted, this was missing in the first patch. And you cannot simply add a nsDeleteDir::Shutdown() to the cache2 code and leave it also in nsCacheService since even if the old cache is disabled nsCacheService would be initialized when the appCache is used. So yes, we could init/shutdown the nsDeleteDir somewhere else, but we would need to separate it from nsCacheService.

::: netwerk/cache2/CacheStorageService.cpp
@@ +546,5 @@
> +  nsCOMPtr<nsIFile> cache1Dir, cache2Dir;
> +  nsCacheService::GetDiskCacheDirectory(getter_AddRefs(cache1Dir));
> +  CacheFileIOManager::GetCacheDirectory(getter_AddRefs(cache2Dir));
> +
> +  // Make sure we schedule just once in case WipeCacheDirectory gets called

WipeCacheDirectory -> CleaupCacheDirectories
Attachment #8429466 - Flags: review?(michal.novotny) → review+
Attached patch v2.1 [to land]Splinter Review
- added NS_WARNING when RemoveOldTrashes failed, actually gives a usage to the runOnce static so that we don't build warn on linux
Attachment #8429466 - Attachment is obsolete: true
Attachment #8430373 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/4d081b291d1b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Depends on: 1017636
No longer depends on: 1017636
Gian-Carlo, can you please verify this is fixed?  Thanks.
Flags: needinfo?(gpascutto)
Looks good on Linux and Windows.
Flags: needinfo?(gpascutto)
..based on comment 34.  Thanks!
Status: RESOLVED → VERIFIED
This was either not fixed in Firefox 31 ESR or not fixed correctly.  I am using ESR 31.3.0 under RHEL 6 and the browser is creating and leaving one ~/.cache/mozilla/firefox/default/cache2.Trash* directory each time the browser is launched and closed.  When the browser is next started, it waits 60 seconds and then deletes ONLY THE VERY LAST cache2.Trash directory.  So if you start the browser, then close it in under 60 seconds, then start it again, and close it, etc, you will have a slew of cache2.Trash* directories that are never deleted.  On a multiuser system where users start and close Firefox for looking at calendars and utilities and such, you could have thousands of these directories accumulating every week that are never, ever deleted.

For now, I was forced to introduce a weekly cron script to purge them.
This is fixed in Firefox 32. You would need to use the current release, Firefox 35, patch this locally & build Firefox ESR 31 or wait for ESR 38.
(In reply to Kevin Brosnan from comment #37)
> This is fixed in Firefox 32. You would need to use the current release,
> Firefox 35, patch this locally & build Firefox ESR 31 or wait for ESR 38.

I figured this was serious enough that backporting a fix to the current ESR would be in order.  Maybe not.
"Maintenance of each ESR, through point releases, is limited to high-risk/high-impact security vulnerabilities and in rare cases may also include off-schedule releases that address live security vulnerabilities. Backports of any functional enhancements and/or stability fixes are not in scope."

If you want these, don't use ESR.
You need to log in before you can comment on or make changes to this bug.