Closed Bug 1342722 Opened 3 years ago Closed 3 years ago

Firefox cache used all available inodes on the system.

Categories

(Core :: Networking: Cache, defect)

45 Branch
x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: benland100, Assigned: michal)

Details

(Whiteboard: [necko-active])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170223115752

Steps to reproduce:

We use a small javascript enabled webpage that updates with GET queries that returns very little data frequently. After having this webpage running for a few months we started to notice the system had used nearly all of the available inodes. The culprit ended up being Firefox caching the results of these GET queries!


Actual results:

Even though the total cache size was small (well under the 350MB default) the total number of files in the cache2 directory was enormous due to their small size - in the tens of millions. Needless to say the system ran out of inodes and even deleting all the files has taken a significant amount of time.


Expected results:

One would think that Firefox should have some mechanism to safeguard against using up all the inodes on the system due to caching. Of course lowering the total cache size is a work around: lowering the cache to 1MB will effectively cap the total files around 250,000 assuming a 4096 filesystem block size. However, who really needs over a million cache entries?
Component: Untriaged → Networking: Cache
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
Michal, should we have a file number limit?
Flags: needinfo?(michal.novotny)
(In reply to benland100 from comment #0)
> Even though the total cache size was small (well under the 350MB default)
> the total number of files in the cache2 directory was enormous due to their
> small size - in the tens of millions. Needless to say the system ran out of
> inodes and even deleting all the files has taken a significant amount of
> time.

Where did you see tens of millions entries? In cache2/entries or in cache2/doomed?

> One would think that Firefox should have some mechanism to safeguard against
> using up all the inodes on the system due to caching. Of course lowering the
> total cache size is a work around: lowering the cache to 1MB will
> effectively cap the total files around 250,000 assuming a 4096 filesystem
> block size. However, who really needs over a million cache entries?

I don't understand your computation. 250000 small files (smaller than a block size) will consume 1GB, not 1MB. Anyway, we round entry size up to kilobytes, i.e. even if the file contains only 1 byte, we count its size as 1KB. So with standard cache size 350MB the maximum number of files is 358400. If you really have tens of millions files, then I can see two possible causes:

1) We shouldn't generate empty entry files, but if we do then it could lead to the described problem. Check you cache2/entries directory for empty files.

2) Eviction doesn't work for some reason. In this case about:cache would show "Storage in use" bigger than "Maximum storage size".
Flags: needinfo?(michal.novotny) → needinfo?(benland100)
The files I saw were in cache2/entries. 

You're right I did get my orders of magnitude wrong in the calculation in the original message, so it's unclear to me how this happened as it should have exceeded the cache size. None of the files that I checked were zero in size, but checking comprehensively will take some time (one cannot simply ls a directory with tens of millions of files in it). I've cleared out the files in cache2/entries on the machine that actually ran out of inodes so we could use it again, but I have 3 others with millions of files so I'll check those for zero size files.

Unfortunately about:cache seems to hang forever calculating stats for the disk cache - not entirely unexpected.
(In reply to benland100 from comment #3)
> Unfortunately about:cache seems to hang forever calculating stats for the
> disk cache - not entirely unexpected.

We don't iterate through the files to get size of the cache. We use index in memory which keeps track of all files. So it shouldn't take much time to show the summary of the cache stats. Even if we're building the index right now, about:cache should show size of the entries we've parsed so far without waiting for end of the building process. OTOH, in Preferences->Advanced->Network we show "calculating" in case index is not ready.
The disk cache on one of the machines with millions of files in cache2/entries did finally display results:

Number of entries:     7486143
Maximum storage size:  358400 KiB
Storage in use:        4038660 KiB

In the first 150k files none of them are size zero. Should have a final result in about an hour. Interestingly though this seems to indicate that the "Storage in use" cannot be the disk space used since it is less than 4kB * number of entries.
Yes, "Storage in use" is wrong here. The size overflows here http://searchfox.org/mozilla-central/rev/4039fb4c5833706f6880763de216974e00ba096c/netwerk/cache2/CacheStorageService.cpp#448

But in CacheIndex internally we have a correct value. So overlimit eviction should keep the cache size within the defined limit. A problem can arise when the storage is too slow, because writing entries has a bigger priority than eviction, so we could end up not being able to evict the entries as quickly as the new ones are created.

You could verify the above by keeping firefox idle (ideally put it into offline mode to make sure we don't do any network request that could be cached) and watching "Number of entries". It should decrease over time.
As it happens I left firefox running with no tabs open over the last hour and the number of entries did decrease by about 1000. This seems quite slow though - no wonder it was able to build up many millions of files with a page updating frequently.

Perhaps unimportant now but I can also confirm that none of the files in cache2/entries were zero size. They had an average data size of 703 bytes.
Yes, this is slow. The question is whether it is slow because there are so many files, or whether we've written so many files because the storage is slow in general. Also, It would be good to know if only deleting files is slow. Could you please do 2 tests?

1) Turn on logging of cache activity where I could see what takes most of the time when evicting entries. Easiest would be to turn this in section Logging in about:networking for a while when Firefox is idle and is slowly deleting entries, but this requires Firefox 52, i.e. at least current beta. You could also follow instruction at https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging to start logging in 51 release but this will generate huge amount of data on startup. MOZ_LOG for this bug should be "timestamp,sync,cache2:5".

2) It would be good to know how fast are other IO operations. Quit Firefox and delete cache/index file. Then start Firefox and remember the exact time. Let it run with no tabs open and watch when the index is created. The problem is that once we build the index we'll rewrite it periodically due to eviction. You can use following command to find the first creation time:

while ! stat index 2>/dev/null ; do sleep 1; done

The whole stat output will be interesting. Difference between access time and modify time tells how slow is writing to the disk. Exact number of entries can be calculated from size. And of course most interesting is the difference between access time and Firefox startup time.
(In reply to Michal Novotny (:michal) from comment #6)
> Yes, "Storage in use" is wrong here. The size overflows here
> http://searchfox.org/mozilla-central/rev/
> 4039fb4c5833706f6880763de216974e00ba096c/netwerk/cache2/CacheStorageService.
> cpp#448

Filed bug 1343191 for it.
Assignee: nobody → michal.novotny
Whiteboard: [necko-active]
Attached patch fixSplinter Review
Fix in this patch just ensures that the cache doesn't grow out of control. We still don't know what's exactly slow on eviction in this specific case.
Attachment #8846149 - Flags: review?(honzab.moz)
Comment on attachment 8846149 [details] [diff] [review]
fix

Review of attachment 8846149 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFileIOManager.h
@@ +473,5 @@
>    nsTArray<CacheFileHandle *>          mSpecialHandles;
>    nsTArray<RefPtr<CacheFile> >         mScheduledMetadataWrites;
>    nsCOMPtr<nsITimer>                   mMetadataWritesTimer;
>    bool                                 mOverLimitEvicting;
> +  bool                                 mCacheSizeOnHardLimit;

would use a comment..
Attachment #8846149 - Flags: review?(honzab.moz) → review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/474f08673e24
Firefox cache used all available inodes on the system, r=honzab
https://hg.mozilla.org/mozilla-central/rev/474f08673e24
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Flags: needinfo?(benland100)
You need to log in before you can comment on or make changes to this bug.