Last Comment Bug 754575 - Cache.Trash* files fill up disk space
: Cache.Trash* files fill up disk space
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All Android
: -- normal (vote)
: ---
Assigned To: Mark Finkle (:mfinkle) (use needinfo?)
:
Mentors:
: 757385 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-12 06:38 PDT by Christian Holler (:decoder)
Modified: 2012-06-11 20:12 PDT (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed
fixed
+


Attachments
wip patch (8.72 KB, patch)
2012-05-30 16:21 PDT, Geoff Brown [:gbrown] (pto May 28-June 13)
no flags Details | Diff | Review
C code to detect NTFS in case we go ctypes route (708 bytes, text/plain)
2012-06-01 13:58 PDT, Jason Duell [:jduell] (needinfo? me)
no flags Details
alt WIP patch (4.52 KB, patch)
2012-06-06 14:40 PDT, Mark Finkle (:mfinkle) (use needinfo?)
no flags Details | Diff | Review
alt WIP patch 2 (4.77 KB, patch)
2012-06-07 10:29 PDT, Mark Finkle (:mfinkle) (use needinfo?)
no flags Details | Diff | Review
alt patch (3.26 KB, patch)
2012-06-07 14:08 PDT, Mark Finkle (:mfinkle) (use needinfo?)
blassey.bugs: review+
jduell.mcbugs: review-
Details | Diff | Review
alt patch v2 (4.49 KB, patch)
2012-06-08 09:15 PDT, Mark Finkle (:mfinkle) (use needinfo?)
jduell.mcbugs: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Review
just moves NTFS comment to better place in code (1.59 KB, patch)
2012-06-11 10:13 PDT, Jason Duell [:jduell] (needinfo? me)
no flags Details | Diff | Review

Description Christian Holler (:decoder) 2012-05-12 06:38:27 PDT
While fuzzing on mobile (mozilla-central, debug build, rev 448f554f6acb) I keep seeing folders like "Cache.Trash821552478" in the profile directory, slowly filling up all disk space until I manually delete them. I don't know exactly what conditions lead to this but I had it a second time now that all the space on /data/ was completely exhausted by these. They seem to contain cache data:

$ adb shell ls -l /data/data/org.mozilla.fennec_decoder/files/mozilla/szzixceq.default/Cache.Trash821552478/
-rw------- app_35   app_35              18998 1970-01-01 12:05 _CACHE_001_
drwx------ app_35   app_35                      1970-01-01 12:04 C
-rw------- app_35   app_35                276 1970-01-01 12:04 _CACHE_MAP_
drwx------ app_35   app_35                      1970-01-01 12:04 2
drwx------ app_35   app_35                      1970-01-01 12:04 9
drwx------ app_35   app_35                      1970-01-01 12:04 0
drwx------ app_35   app_35                      1970-01-01 12:04 F
drwx------ app_35   app_35                      1970-01-01 12:04 E
drwx------ app_35   app_35                      1970-01-01 12:04 A
drwx------ app_35   app_35                      1970-01-01 12:04 7
drwx------ app_35   app_35                      1970-01-01 12:04 4
drwx------ app_35   app_35                      1970-01-01 12:04 8
drwx------ app_35   app_35                      1970-01-01 12:04 6
-rw------- app_35   app_35               4096 1970-01-01 12:05 _CACHE_002_
drwx------ app_35   app_35                      1970-01-01 12:04 D
drwx------ app_35   app_35                      1970-01-01 12:04 B
drwx------ app_35   app_35                      1970-01-01 12:04 1
drwx------ app_35   app_35                      1970-01-01 12:04 3
drwx------ app_35   app_35                      1970-01-01 12:04 5
-rw------- app_35   app_35               4183 1970-01-01 12:05 _CACHE_003_
Comment 1 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-14 08:11:40 PDT
Weird.  The date is 01-01-1970?  What's the date on your device set to?   What device is this?
Comment 2 Christian Holler (:decoder) 2012-05-14 08:49:10 PDT
(In reply to Naoki Hirata :nhirata from comment #1)
> Weird.  The date is 01-01-1970?  What's the date on your device set to?  
> What device is this?

These are Tegra boards. I cannot reach them right now due to some IT failure in MV, but I'll check their date once I have access again.
Comment 3 Christian Holler (:decoder) 2012-05-16 17:22:58 PDT
(In reply to Naoki Hirata :nhirata from comment #1)
> What's the date on your device set to?  

It's indeed 01-01-1970. But I guess thats normal for the Tegra boards.
Comment 4 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-17 08:53:38 PDT
I wonder if it's some sort of y2k or some similar issue?  Could you try setting the device to be the current date/time and see what occurs after some time please?
Comment 5 Christian Holler (:decoder) 2012-05-17 09:05:12 PDT
(In reply to Naoki Hirata :nhirata from comment #4)
> I wonder if it's some sort of y2k or some similar issue?  Could you try
> setting the device to be the current date/time and see what occurs after
> some time please?

I can try that later but I doubt it has something to do with date/time at all. I encounter this mainly during minimization of testcases, that means the browser crashes a lot. I could imagine that some cleanup is done on regular exit of Fennec, that is of course not done when it crashes. And starting up lots of times and not exiting regularly might leave something over.
Comment 6 david 2012-05-19 08:28:55 PDT
On my device Firefox Beta is eating the storage memory till i get a memory warning and my phone is getting really slow. 
When i see in the app manager it's shown that "data" is growing, the "cache" is still empty (0).
Comment 7 Christian Holler (:decoder) 2012-05-19 08:37:47 PDT
(In reply to david from comment #6)
> On my device Firefox Beta is eating the storage memory till i get a memory
> warning and my phone is getting really slow. 
> When i see in the app manager it's shown that "data" is growing, the "cache"
> is still empty (0).

Is your device rooted so you could peak into the profile directory using ADB?
Comment 8 david 2012-05-19 08:50:56 PDT
Yes, it is rooted, running with CM7.
With ADB you mean the Android Debug Bridge? I don't have it installed. Is this quickly done on Ubuntu?
Comment 9 Christian Holler (:decoder) 2012-05-19 16:53:15 PDT
(In reply to david from comment #8)
> Yes, it is rooted, running with CM7.
> With ADB you mean the Android Debug Bridge? I don't have it installed. Is
> this quickly done on Ubuntu?

It should be, yes (I'm using Ubuntu as well and I remember it being really easy here). I think you need the Android SDK available at http://developer.android.com/sdk/index.html . Once you have that installed, just connect your device with an USB cable to your computer and using the "adb" command you should be able to run shell commands on the device (e.g. "adb shell ls").
Comment 10 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-05-21 08:56:35 PDT
A couple of things to watch for:
 - the cache is typically invalidated and trashed (moved to Cache.Trashxxx) on the first run after a crash
 - the Cache.Trashxxx file is subsequently deleted only after a delay (60 seconds? I can't quite remember) and on some devices, the deletion may take some time (an additional 60s or so)

I sometimes see an accumulation of Cache.Trash directories when testing, because I crash repeatedly, and Fennec doesn't get a chance to clean up.
Comment 11 Gian-Carlo Pascutto [:gcp] 2012-05-22 06:17:34 PDT
*** Bug 757385 has been marked as a duplicate of this bug. ***
Comment 12 Gian-Carlo Pascutto [:gcp] 2012-05-22 06:19:15 PDT
There's a dump of the profile directory from another user who's experiencing this: https://bugzilla.mozilla.org/show_bug.cgi?id=757385
Comment 13 Christian Holler (:decoder) 2012-05-22 06:21:07 PDT
(In reply to Geoff Brown [:gbrown] from comment #10)
> A couple of things to watch for:
>  - the cache is typically invalidated and trashed (moved to Cache.Trashxxx)
> on the first run after a crash
>  - the Cache.Trashxxx file is subsequently deleted only after a delay (60
> seconds? I can't quite remember) and on some devices, the deletion may take
> some time (an additional 60s or so)

That might explain it. I'm encountering this bug mainly during minimization of test cases which involves many short runs which often crash. I worked around this by having our harness delete the files before each run.
Comment 14 Mark Finkle (:mfinkle) (use needinfo?) 2012-05-22 06:21:40 PDT
Nom'd for blocking since this seems to be happening in the field
Comment 17 Brad Lassey [:blassey] (use needinfo?) 2012-05-22 11:51:16 PDT
Geoff, we were discussing putting these files in the android cache dir such that android can clean them up on its own if we're running out of disk space.
Comment 18 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-05-22 12:06:02 PDT
(In reply to Brad Lassey [:blassey] from comment #17)
> Geoff, we were discussing putting these files in the android cache dir such
> that android can clean them up on its own if we're running out of disk space.

That's bug 742558. Firefox likes to manage its own cache; there's currently no allowance made for an outside entity cleaning it up, so it's not just a simple matter of changing the directory.
Comment 19 Brad Lassey [:blassey] (use needinfo?) 2012-05-22 12:07:38 PDT
moving the trash files should be relatively safe, no?
Comment 20 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-05-22 12:12:13 PDT
Oh, sorry, I misinterpreted. Moving an invalid cache to the Android cache directory for future deletion seems safe and easy.
Comment 21 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-05-30 09:02:21 PDT
I tested cache invalidation and trash cleanup and found everything working as designed:
 - max 20 MB (approx) disk cache on Android
 - disk cache invalidated when Fennec crashes or is killed
 - robocop tests always kill Fennec when test completes
 - Cache moved to Cache.Trash when invalid
 - Cache.Trash deleted in background after 90 s delay

Moving the trash destination to the Android application cache directory has the advantage that the OS should clean up the .Trashes for us when free space is low. The OS cleaning also seems to happen after a delay, so we might not improve on this scenario much.
Comment 22 (Back on May31) Kartikaya Gupta (email:kats@mozilla.com) 2012-05-30 10:24:02 PDT
Another user reporting running out of disk space: https://support.mozilla.org/en-US/questions/928351
Comment 23 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-05-30 16:21:06 PDT
Created attachment 628532 [details] [diff] [review]
wip patch

This patch changes behavior on Android such that the disk cache "trash dir" is now a sub-directory of the Android cache dir. 

Currently we move an invalid Cache from, say:
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache 
to:
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache.TrashNNNN

With this patch, we move from 
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache 
to:
/data/data/org.mozilla.fennec/cache/Cache.TrashNNNN

The advantage is that, if the volume is low on space, Android may delete the trash folder itself, so that space is reclaimed even if Fennec has exited.


This is a first draft/wip patch. It seems to work for me but is very rough/inelegant and needs testing. 

I am on PTO until June 11; feel free to take this bug!
Comment 24 Brad Lassey [:blassey] (use needinfo?) 2012-05-30 20:29:29 PDT
gcp, can you pick this up?
Comment 25 Jason Duell [:jduell] (needinfo? me) 2012-06-01 13:54:58 PDT
Comment on attachment 628532 [details] [diff] [review]
wip patch

Review of attachment 628532 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsICacheService.idl
@@ +58,5 @@
>       * Event target which is used for I/O operations
>       */
>      readonly attribute nsIEventTarget cacheIOTarget;
> +
> +    attribute nsIFile cacheTrashDir;

Well, it's a gross thing to have an IDL be platform-specific, but we need to make sure this never, ever gets used on an NTFS-based filesystem (windows):  if we try to move Cache to a different directory (rather than renaming it in the current directory) on NTFS the rename (which we block browser startup on) takes as long as actually deleting everything (due to traversing all files to update NTFS permissions: see bug 670911).  That could be fixed by changing our underlying NTFS file code to have a flag that allows moving w/o the check, but you don't want to go there for this bug.

I'd be fine with having SetCacheDir simply throw if you try to use it on NTFS.  Alas, right now we don't have a way to determine is an nsIFile is on NTFS. bbondy tells me we could use ctypes to access the win32 API:  

  "You could also simply add an attribute to nsILocalFileWin.idl which wraps a call to GetVolumeInformation and checks for NTFS easily. http://msdn.microsoft.com/en-us/library/windows/desktop/aa364993%28v=vs.85%29.aspx"

But given that this is android-specific for now, and we're in a hurry (I am: are you? ;)  I suggest we simply QI the nsIFile passed into SetCacheTrashDir and throw if it's a  nsILocalFileWin.   Make sure to add a comment that this is just to prevent NTFS usage, and that we can write a more precise test for that if we ever need one (i.e. we add Windows phone support, or something like that), and/or tweak the NFTS rename code to not do checks. Cite bug 670911 in the comment.
Comment 26 Jason Duell [:jduell] (needinfo? me) 2012-06-01 13:58:06 PDT
Created attachment 629326 [details]
C code to detect NTFS in case we go ctypes route

I don't expect we want to use ctypes, but in case we do, bbondy kindly gave me this C code that does the NTFS check that we could base it on.
Comment 27 Brad Lassey [:blassey] (use needinfo?) 2012-06-05 08:32:26 PDT
So, given that this is ifdef'd to Android, I don't think we need to care too much about NTFS. However, this is not the first time that we've wanted to change behavior based on the file system (see: https://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileUnix.cpp#1072) so we may want to add a method to nsIFile to get that information in the future.
Comment 28 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-06 06:45:25 PDT
Could we just use the Directory Service to return a new directory for the Cache Trash folder? Instead of adding IDL for a specific folder property?
Comment 29 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-06 14:40:43 PDT
Created attachment 630726 [details] [diff] [review]
alt WIP patch

This patch simple puts the Android cache folder into the env and grabs it in C++. We do the same thing for Downloads:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/nsDownloadManager.cpp#1182

The patch has some debugging crap in it. I am trying to verify the Trash folder is created and I am having a hard time.
Comment 30 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-06 14:41:31 PDT
I have verified the Android Cache folder is correctly received in DeleteDir, but that's as far as I have got.
Comment 31 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-07 10:29:44 PDT
Created attachment 631033 [details] [diff] [review]
alt WIP patch 2

Fallback to current behavior if no android cache folder is found, less debugging and XUL bustage fix. I sent this to Try server:
https://tbpl.mozilla.org/?tree=Try&rev=89444a668270
Comment 32 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-07 10:30:32 PDT
Stealing from Gian-Carlo for now
Comment 33 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-07 14:08:35 PDT
Created attachment 631130 [details] [diff] [review]
alt patch

The patch ran fine on Try, so asking for reviews.
Comment 34 Jason Duell [:jduell] (needinfo? me) 2012-06-07 17:29:28 PDT
Comment on attachment 631130 [details] [diff] [review]
alt patch

Review of attachment 631130 [details] [diff] [review]:
-----------------------------------------------------------------

I'm surprised this works (do you know for sure it does?), for the following reason:

   http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDeleteDir.cpp#243

When we changed the code to always rename Cache -> Cache.trashRANDOM__NUMBER within the same directory (to avoid NTFS nastiness), we changed the MoveToNative() call to pass null for the "new directory" parameter.  And we don't even wind up using the full 'trash' directory for the rename, but just the leafName.  Poking through the MovetoNative code for Unix filesystems, I see we wind up getting the file's original directory for 'newdir' when it's passed a blank.   So I strongly suspect that you're going to the trouble of getting a new trash directory name ("/foo/CACHE_DIRECTORY/Cache.trash"), but then we're grabbing just "Cache.trash" from it (in nsDeleteDir.cpp:214), appending a random number to it, then passing it to MovetoNative, where it winds up living in the same directory it started, not the one you assigned in GetTrashDir.

If I'm right about this, the easiest fix is probably to keep your existing logic, but also compare 'dirIn.parent' with 'Trashdir.parent' right before the MoveToNative call:

  targetDir = nsnull;
  #if ANDROID
    if (dirIn.parent != TrashDir.parent)
       targetDir = Trashdir.parent
  #endif

   MoveToNative(targetDir, leaf);
    

It's vital that we pass nsnull to MoveToNative in the NTFS case (we only skip the GUI-hanging ACL checks if the param is null), so it won't work to always simply pass TrashDir.parent as the first param.

Sorry this code has gotten so tricky to navigate.

::: netwerk/cache/nsDeleteDir.cpp
@@ +268,5 @@
> +  char* cachePath = getenv("CACHE_DIRECTORY");
> +  if (cachePath) {
> +    rv = NS_NewNativeLocalFile(nsDependentCString(cachePath),
> +                               true, getter_AddRefs(*result));
> +

Not checking rv here.  Guard the rest of the logic in this branch with an if(NS_SUCCEEDED(rv)), and the if NS_FAILED after the #endif will handle the failure case.
Comment 35 Brad Lassey [:blassey] (use needinfo?) 2012-06-07 20:19:46 PDT
Comment on attachment 631130 [details] [diff] [review]
alt patch

Review of attachment 631130 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsDeleteDir.cpp
@@ +282,2 @@
>    nsresult rv = target->Clone(getter_AddRefs(*result));
> +#endif

I'd rather have this as:

nsresult rv;
#ifdef MOZ_WIDGET_ANDROID

  blah blah

  } else 
#endf
  {
    rv = target->Clone(getter_AddRefs(*result));
  }

just so that if that default behavior ever changes, we don't miss the change on android
Comment 36 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-08 09:15:20 PDT
Created attachment 631426 [details] [diff] [review]
alt patch v2

Thanks a ton for the explanation Jason!

This patch takes parts of Geoff's DeleteDir WIP patch, the parts Jason warned me about. It also makes the change Brad asked for regarding the fallback.

I tested this on my Nexus S. I loaded a page and killed Fennec quickly a few times. It created 2 Cache.Trash#### folders in the Android cache folder. I then let the app run for a few minutes and the Cache.Trash#### folders were removed by Fennec.
Comment 37 Jason Duell [:jduell] (needinfo? me) 2012-06-11 10:13:19 PDT
Created attachment 631933 [details] [diff] [review]
just moves NTFS comment to better place in code

mfinkle: please make the comment change in this patch (just apply on top of your patch) before you land.
Comment 38 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-11 13:46:23 PDT
landed with the comment fix:
https://hg.mozilla.org/mozilla-central/rev/44777d90bbd6
Comment 39 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-11 13:48:10 PDT
Comment on attachment 631426 [details] [diff] [review]
alt patch v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: unused cache files can fill up the device
Testing completed (on m-c, etc.): just landed on m-c... needs to bake
Risk to taking this patch (and alternatives if risky): low to med risk, but bake time should give confidence.
String or UUID changes made by this patch: none
Comment 40 Alex Keybl [:akeybl] 2012-06-11 17:25:11 PDT
Comment on attachment 631426 [details] [diff] [review]
alt patch v2

[Triage Comment]
Given the medium risk profile, we shouldn't land in our last beta next week. Additionally, giving this one day of bake time probably won't get us the feedback we need. If we're going to fix before release, let's fix now. Approved for branches.
Comment 41 Alex Keybl [:akeybl] 2012-06-11 17:33:15 PDT
(In reply to Alex Keybl [:akeybl] from comment #40)
> Comment on attachment 631426 [details] [diff] [review]
> alt patch v2
> 
> [Triage Comment]
> Given the medium risk profile, we shouldn't land in our last beta next week.
> Additionally, giving this one day of bake time probably won't get us the
> feedback we need. If we're going to fix before release, let's fix now.
> Approved for branches.

(I'll note that this approval is based upon the fact that we think that the worst regression this could cause would be a startup time regression)
Comment 42 Matt Brubeck (:mbrubeck) 2012-06-11 17:43:11 PDT
This landed on Aurora and Beta but was backed out because of a build error:

https://hg.mozilla.org/releases/mozilla-aurora/rev/9d3ca1eaa001
https://hg.mozilla.org/releases/mozilla-beta/rev/86d23e17bd96

https://tbpl.mozilla.org/php/getParsedLog.php?id=12566994&tree=Mozilla-Aurora
/builds/slave/m-aurora-andrd-xul/build/netwerk/cache/nsDeleteDir.cpp:279: error: cannot convert 'nsGetterAddRefs<nsIFile>' to 'nsILocalFile**' for argument '3' to 'nsresult NS_NewNativeLocalFile_P(const nsACString_internal&, bool, nsILocalFile**)'
Comment 43 Mark Finkle (:mfinkle) (use needinfo?) 2012-06-11 20:12:25 PDT
The patch needed a tweak to handle the nsILocalFile -> nsIFile changes on aurora and beta:
https://hg.mozilla.org/releases/mozilla-aurora/rev/e135fca7989a
https://hg.mozilla.org/releases/mozilla-beta/rev/ed059049854e

Note You need to log in before you can comment on or make changes to this bug.