Closed Bug 754575 Opened 12 years ago Closed 12 years ago

Cache.Trash* files fill up disk space

Categories

(Core :: Networking: Cache, defect)

All
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox14 --- fixed
firefox15 --- fixed
firefox16 --- fixed
blocking-fennec1.0 --- +

People

(Reporter: decoder, Assigned: mfinkle)

References

Details

Attachments

(4 files, 3 obsolete files)

While fuzzing on mobile (mozilla-central, debug build, rev 448f554f6acb) I keep seeing folders like "Cache.Trash821552478" in the profile directory, slowly filling up all disk space until I manually delete them. I don't know exactly what conditions lead to this but I had it a second time now that all the space on /data/ was completely exhausted by these. They seem to contain cache data:

$ adb shell ls -l /data/data/org.mozilla.fennec_decoder/files/mozilla/szzixceq.default/Cache.Trash821552478/
-rw------- app_35   app_35              18998 1970-01-01 12:05 _CACHE_001_
drwx------ app_35   app_35                      1970-01-01 12:04 C
-rw------- app_35   app_35                276 1970-01-01 12:04 _CACHE_MAP_
drwx------ app_35   app_35                      1970-01-01 12:04 2
drwx------ app_35   app_35                      1970-01-01 12:04 9
drwx------ app_35   app_35                      1970-01-01 12:04 0
drwx------ app_35   app_35                      1970-01-01 12:04 F
drwx------ app_35   app_35                      1970-01-01 12:04 E
drwx------ app_35   app_35                      1970-01-01 12:04 A
drwx------ app_35   app_35                      1970-01-01 12:04 7
drwx------ app_35   app_35                      1970-01-01 12:04 4
drwx------ app_35   app_35                      1970-01-01 12:04 8
drwx------ app_35   app_35                      1970-01-01 12:04 6
-rw------- app_35   app_35               4096 1970-01-01 12:05 _CACHE_002_
drwx------ app_35   app_35                      1970-01-01 12:04 D
drwx------ app_35   app_35                      1970-01-01 12:04 B
drwx------ app_35   app_35                      1970-01-01 12:04 1
drwx------ app_35   app_35                      1970-01-01 12:04 3
drwx------ app_35   app_35                      1970-01-01 12:04 5
-rw------- app_35   app_35               4183 1970-01-01 12:05 _CACHE_003_
Weird.  The date is 01-01-1970?  What's the date on your device set to?   What device is this?
(In reply to Naoki Hirata :nhirata from comment #1)
> Weird.  The date is 01-01-1970?  What's the date on your device set to?  
> What device is this?

These are Tegra boards. I cannot reach them right now due to some IT failure in MV, but I'll check their date once I have access again.
(In reply to Naoki Hirata :nhirata from comment #1)
> What's the date on your device set to?  

It's indeed 01-01-1970. But I guess thats normal for the Tegra boards.
I wonder if it's some sort of y2k or some similar issue?  Could you try setting the device to be the current date/time and see what occurs after some time please?
(In reply to Naoki Hirata :nhirata from comment #4)
> I wonder if it's some sort of y2k or some similar issue?  Could you try
> setting the device to be the current date/time and see what occurs after
> some time please?

I can try that later but I doubt it has something to do with date/time at all. I encounter this mainly during minimization of testcases, that means the browser crashes a lot. I could imagine that some cleanup is done on regular exit of Fennec, that is of course not done when it crashes. And starting up lots of times and not exiting regularly might leave something over.
On my device Firefox Beta is eating the storage memory till i get a memory warning and my phone is getting really slow. 
When i see in the app manager it's shown that "data" is growing, the "cache" is still empty (0).
(In reply to david from comment #6)
> On my device Firefox Beta is eating the storage memory till i get a memory
> warning and my phone is getting really slow. 
> When i see in the app manager it's shown that "data" is growing, the "cache"
> is still empty (0).

Is your device rooted so you could peak into the profile directory using ADB?
Yes, it is rooted, running with CM7.
With ADB you mean the Android Debug Bridge? I don't have it installed. Is this quickly done on Ubuntu?
(In reply to david from comment #8)
> Yes, it is rooted, running with CM7.
> With ADB you mean the Android Debug Bridge? I don't have it installed. Is
> this quickly done on Ubuntu?

It should be, yes (I'm using Ubuntu as well and I remember it being really easy here). I think you need the Android SDK available at http://developer.android.com/sdk/index.html . Once you have that installed, just connect your device with an USB cable to your computer and using the "adb" command you should be able to run shell commands on the device (e.g. "adb shell ls").
A couple of things to watch for:
 - the cache is typically invalidated and trashed (moved to Cache.Trashxxx) on the first run after a crash
 - the Cache.Trashxxx file is subsequently deleted only after a delay (60 seconds? I can't quite remember) and on some devices, the deletion may take some time (an additional 60s or so)

I sometimes see an accumulation of Cache.Trash directories when testing, because I crash repeatedly, and Fennec doesn't get a chance to clean up.
There's a dump of the profile directory from another user who's experiencing this: https://bugzilla.mozilla.org/show_bug.cgi?id=757385
(In reply to Geoff Brown [:gbrown] from comment #10)
> A couple of things to watch for:
>  - the cache is typically invalidated and trashed (moved to Cache.Trashxxx)
> on the first run after a crash
>  - the Cache.Trashxxx file is subsequently deleted only after a delay (60
> seconds? I can't quite remember) and on some devices, the deletion may take
> some time (an additional 60s or so)

That might explain it. I'm encountering this bug mainly during minimization of test cases which involves many short runs which often crash. I worked around this by having our harness delete the files before each run.
Nom'd for blocking since this seems to be happening in the field
blocking-fennec1.0: --- → ?
Geoff, we were discussing putting these files in the android cache dir such that android can clean them up on its own if we're running out of disk space.
Assignee: nobody → gbrown
blocking-fennec1.0: ? → +
(In reply to Brad Lassey [:blassey] from comment #17)
> Geoff, we were discussing putting these files in the android cache dir such
> that android can clean them up on its own if we're running out of disk space.

That's bug 742558. Firefox likes to manage its own cache; there's currently no allowance made for an outside entity cleaning it up, so it's not just a simple matter of changing the directory.
moving the trash files should be relatively safe, no?
Oh, sorry, I misinterpreted. Moving an invalid cache to the Android cache directory for future deletion seems safe and easy.
Assignee: gbrown → nobody
Component: General → Networking: Cache
Product: Fennec Native → Core
QA Contact: general → networking.cache
I tested cache invalidation and trash cleanup and found everything working as designed:
 - max 20 MB (approx) disk cache on Android
 - disk cache invalidated when Fennec crashes or is killed
 - robocop tests always kill Fennec when test completes
 - Cache moved to Cache.Trash when invalid
 - Cache.Trash deleted in background after 90 s delay

Moving the trash destination to the Android application cache directory has the advantage that the OS should clean up the .Trashes for us when free space is low. The OS cleaning also seems to happen after a delay, so we might not improve on this scenario much.
Assignee: nobody → gbrown
Attached patch wip patchSplinter Review
This patch changes behavior on Android such that the disk cache "trash dir" is now a sub-directory of the Android cache dir. 

Currently we move an invalid Cache from, say:
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache 
to:
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache.TrashNNNN

With this patch, we move from 
/data/data/org.mozilla.fennec/files/mozilla/xxx.default/Cache 
to:
/data/data/org.mozilla.fennec/cache/Cache.TrashNNNN

The advantage is that, if the volume is low on space, Android may delete the trash folder itself, so that space is reclaimed even if Fennec has exited.


This is a first draft/wip patch. It seems to work for me but is very rough/inelegant and needs testing. 

I am on PTO until June 11; feel free to take this bug!
gcp, can you pick this up?
Assignee: gbrown → gpascutto
Comment on attachment 628532 [details] [diff] [review]
wip patch

Review of attachment 628532 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsICacheService.idl
@@ +58,5 @@
>       * Event target which is used for I/O operations
>       */
>      readonly attribute nsIEventTarget cacheIOTarget;
> +
> +    attribute nsIFile cacheTrashDir;

Well, it's a gross thing to have an IDL be platform-specific, but we need to make sure this never, ever gets used on an NTFS-based filesystem (windows):  if we try to move Cache to a different directory (rather than renaming it in the current directory) on NTFS the rename (which we block browser startup on) takes as long as actually deleting everything (due to traversing all files to update NTFS permissions: see bug 670911).  That could be fixed by changing our underlying NTFS file code to have a flag that allows moving w/o the check, but you don't want to go there for this bug.

I'd be fine with having SetCacheDir simply throw if you try to use it on NTFS.  Alas, right now we don't have a way to determine is an nsIFile is on NTFS. bbondy tells me we could use ctypes to access the win32 API:  

  "You could also simply add an attribute to nsILocalFileWin.idl which wraps a call to GetVolumeInformation and checks for NTFS easily. http://msdn.microsoft.com/en-us/library/windows/desktop/aa364993%28v=vs.85%29.aspx"

But given that this is android-specific for now, and we're in a hurry (I am: are you? ;)  I suggest we simply QI the nsIFile passed into SetCacheTrashDir and throw if it's a  nsILocalFileWin.   Make sure to add a comment that this is just to prevent NTFS usage, and that we can write a more precise test for that if we ever need one (i.e. we add Windows phone support, or something like that), and/or tweak the NFTS rename code to not do checks. Cite bug 670911 in the comment.
I don't expect we want to use ctypes, but in case we do, bbondy kindly gave me this C code that does the NTFS check that we could base it on.
So, given that this is ifdef'd to Android, I don't think we need to care too much about NTFS. However, this is not the first time that we've wanted to change behavior based on the file system (see: https://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileUnix.cpp#1072) so we may want to add a method to nsIFile to get that information in the future.
Could we just use the Directory Service to return a new directory for the Cache Trash folder? Instead of adding IDL for a specific folder property?
Attached patch alt WIP patch (obsolete) — Splinter Review
This patch simple puts the Android cache folder into the env and grabs it in C++. We do the same thing for Downloads:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/nsDownloadManager.cpp#1182

The patch has some debugging crap in it. I am trying to verify the Trash folder is created and I am having a hard time.
I have verified the Android Cache folder is correctly received in DeleteDir, but that's as far as I have got.
Attached patch alt WIP patch 2 (obsolete) — Splinter Review
Fallback to current behavior if no android cache folder is found, less debugging and XUL bustage fix. I sent this to Try server:
https://tbpl.mozilla.org/?tree=Try&rev=89444a668270
Attachment #630726 - Attachment is obsolete: true
Stealing from Gian-Carlo for now
Assignee: gpascutto → mark.finkle
Attached patch alt patch (obsolete) — Splinter Review
The patch ran fine on Try, so asking for reviews.
Attachment #631033 - Attachment is obsolete: true
Attachment #631130 - Flags: review?(blassey.bugs)
Attachment #631130 - Flags: review?(jduell.mcbugs)
Comment on attachment 631130 [details] [diff] [review]
alt patch

Review of attachment 631130 [details] [diff] [review]:
-----------------------------------------------------------------

I'm surprised this works (do you know for sure it does?), for the following reason:

   http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDeleteDir.cpp#243

When we changed the code to always rename Cache -> Cache.trashRANDOM__NUMBER within the same directory (to avoid NTFS nastiness), we changed the MoveToNative() call to pass null for the "new directory" parameter.  And we don't even wind up using the full 'trash' directory for the rename, but just the leafName.  Poking through the MovetoNative code for Unix filesystems, I see we wind up getting the file's original directory for 'newdir' when it's passed a blank.   So I strongly suspect that you're going to the trouble of getting a new trash directory name ("/foo/CACHE_DIRECTORY/Cache.trash"), but then we're grabbing just "Cache.trash" from it (in nsDeleteDir.cpp:214), appending a random number to it, then passing it to MovetoNative, where it winds up living in the same directory it started, not the one you assigned in GetTrashDir.

If I'm right about this, the easiest fix is probably to keep your existing logic, but also compare 'dirIn.parent' with 'Trashdir.parent' right before the MoveToNative call:

  targetDir = nsnull;
  #if ANDROID
    if (dirIn.parent != TrashDir.parent)
       targetDir = Trashdir.parent
  #endif

   MoveToNative(targetDir, leaf);
    

It's vital that we pass nsnull to MoveToNative in the NTFS case (we only skip the GUI-hanging ACL checks if the param is null), so it won't work to always simply pass TrashDir.parent as the first param.

Sorry this code has gotten so tricky to navigate.

::: netwerk/cache/nsDeleteDir.cpp
@@ +268,5 @@
> +  char* cachePath = getenv("CACHE_DIRECTORY");
> +  if (cachePath) {
> +    rv = NS_NewNativeLocalFile(nsDependentCString(cachePath),
> +                               true, getter_AddRefs(*result));
> +

Not checking rv here.  Guard the rest of the logic in this branch with an if(NS_SUCCEEDED(rv)), and the if NS_FAILED after the #endif will handle the failure case.
Attachment #631130 - Flags: review?(jduell.mcbugs) → review-
Comment on attachment 631130 [details] [diff] [review]
alt patch

Review of attachment 631130 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsDeleteDir.cpp
@@ +282,2 @@
>    nsresult rv = target->Clone(getter_AddRefs(*result));
> +#endif

I'd rather have this as:

nsresult rv;
#ifdef MOZ_WIDGET_ANDROID

  blah blah

  } else 
#endf
  {
    rv = target->Clone(getter_AddRefs(*result));
  }

just so that if that default behavior ever changes, we don't miss the change on android
Attachment #631130 - Flags: review?(blassey.bugs) → review+
Attached patch alt patch v2Splinter Review
Thanks a ton for the explanation Jason!

This patch takes parts of Geoff's DeleteDir WIP patch, the parts Jason warned me about. It also makes the change Brad asked for regarding the fallback.

I tested this on my Nexus S. I loaded a page and killed Fennec quickly a few times. It created 2 Cache.Trash#### folders in the Android cache folder. I then let the app run for a few minutes and the Cache.Trash#### folders were removed by Fennec.
Attachment #631130 - Attachment is obsolete: true
Attachment #631426 - Flags: review?(jduell.mcbugs)
Whiteboard: [has new patch][needs review jduell]
Attachment #631426 - Flags: review?(jduell.mcbugs) → review+
mfinkle: please make the comment change in this patch (just apply on top of your patch) before you land.
landed with the comment fix:
https://hg.mozilla.org/mozilla-central/rev/44777d90bbd6
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [has new patch][needs review jduell]
Comment on attachment 631426 [details] [diff] [review]
alt patch v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: unused cache files can fill up the device
Testing completed (on m-c, etc.): just landed on m-c... needs to bake
Risk to taking this patch (and alternatives if risky): low to med risk, but bake time should give confidence.
String or UUID changes made by this patch: none
Attachment #631426 - Flags: approval-mozilla-beta?
Attachment #631426 - Flags: approval-mozilla-aurora?
Comment on attachment 631426 [details] [diff] [review]
alt patch v2

[Triage Comment]
Given the medium risk profile, we shouldn't land in our last beta next week. Additionally, giving this one day of bake time probably won't get us the feedback we need. If we're going to fix before release, let's fix now. Approved for branches.
Attachment #631426 - Flags: approval-mozilla-beta?
Attachment #631426 - Flags: approval-mozilla-beta+
Attachment #631426 - Flags: approval-mozilla-aurora?
Attachment #631426 - Flags: approval-mozilla-aurora+
(In reply to Alex Keybl [:akeybl] from comment #40)
> Comment on attachment 631426 [details] [diff] [review]
> alt patch v2
> 
> [Triage Comment]
> Given the medium risk profile, we shouldn't land in our last beta next week.
> Additionally, giving this one day of bake time probably won't get us the
> feedback we need. If we're going to fix before release, let's fix now.
> Approved for branches.

(I'll note that this approval is based upon the fact that we think that the worst regression this could cause would be a startup time regression)
This landed on Aurora and Beta but was backed out because of a build error:

https://hg.mozilla.org/releases/mozilla-aurora/rev/9d3ca1eaa001
https://hg.mozilla.org/releases/mozilla-beta/rev/86d23e17bd96

https://tbpl.mozilla.org/php/getParsedLog.php?id=12566994&tree=Mozilla-Aurora
/builds/slave/m-aurora-andrd-xul/build/netwerk/cache/nsDeleteDir.cpp:279: error: cannot convert 'nsGetterAddRefs<nsIFile>' to 'nsILocalFile**' for argument '3' to 'nsresult NS_NewNativeLocalFile_P(const nsACString_internal&, bool, nsILocalFile**)'
Target Milestone: --- → mozilla16
The patch needed a tweak to handle the nsILocalFile -> nsIFile changes on aurora and beta:
https://hg.mozilla.org/releases/mozilla-aurora/rev/e135fca7989a
https://hg.mozilla.org/releases/mozilla-beta/rev/ed059049854e
Target Milestone: mozilla16 → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: