As per bug 105843, if we crash or shutdown uncleanly we rename the old cache directory to a random number. We attempt 10 times to find a cache directory name that is unused. We don't currently call srand() though before calling rand(). And so it loops through 10 times once you have 10 such directories and then you have no cache at all. This could lead to all sorts of slowdowns and reports on telemetry data.
By the way, the amount of IO that bug 105843 generates, probably causes significant jank everywhere else that depends on IO (everywhere).
Created attachment 622755 [details] [diff] [review] Patch v1. Verified that more than 10 directories can be created now. I noticed because I've been using the Metro browser and unclean shutdown is currently very common there.
I wonder if any rand() implementations always return the same sequence of results after a particular result regardless of srand(); i.e. is there always extra hidden state or not? If RAND_MAX is 32767 and srand() takes uint32, probably there is hidden state, but it's not mandated. rand() sucks... ;-) What's the test sequence? Why aren't the Trash cache dirs getting deleted to begin with such that you get 10? That seems the more serious bug. That all said, this is certainly an improvement!
It gets deleted over time, so if you have a big cache directory to delete, and your browser crashes 9 other times then you could reach this state. Without srand it was returning the same sequence of 10 numbers in the for loop for me.
Comment on attachment 622755 [details] [diff] [review] Patch v1. Review of attachment 622755 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for noticing this. The only way we should be able to get lots of undeleted cache directories is if firefox is run over and over without ever getting a chance to run for more than a minute, and crashes (or is killed) each time. (does that match your experience, Brian?) This is because we currently wait for 60 seconds after startup before deleting the old cache directories, in an attempt to prevent slowing down startup. Perhaps that's too long to wait on mobile? So long as we eventually get a chance to delete these directories, it's not too bad to accumulate a number of them (they're not massive: 1/2 MB each). And yes, this whole scheme of deleting (sometimes large + with too many files) Cache directories stinks, and causes lots more I/O than needed. We need to replace it.
It is entirely possible that none of my 10 sessions lasted longer than a minute, but the directories can take some time to delete even after started since you have to delete 1 file at a time. But once your cache is disabled for the session, that session may be alive for a long time and your cache will be disabled the whole time.
> It is entirely possible that none of my 10 sessions lasted longer than a minute Did you crash each time? You should only get a new Cache.trash-RAND## each time you crash. You can have successful sessions in between the crashes, if they are less than a minute (plus time to delete the old directories, as you point out), and still keep accumulating. With this fix you should statistically speaking always get a Cache directory. So the risk becomes filling up your disk should you always have sessions than are under a minute or so. I thought that was a minimal risk factor, but perhaps I'm wrong?
If you crash or have an unclean shutdown. In Metro you currently get that commonly, but we might be able to improve that. I probably wouldn't have noticed this if I wasn't using the Metro browser in its current state.
Comment on attachment 622755 [details] [diff] [review] Patch v1. As close to a zero-risk code patch as you can get (just seeds random number generator, which we forgot to do in orig code). Prevents possible scenario where users who have crashed repeatedly within a minute or so of startup no longer get a Cache for successful sessions, until one of those sessions lasts long enough to delete the old Cache directories. So, not a very common situation (or extremely serious: browser still works, just w/o HTTP cache, so slower). But very low risk. String changes made by this patch: none.
Comment on attachment 622755 [details] [diff] [review] Patch v1. [Triage Comment] thanks for the risk assessment, glad the post-crash user experience will be improved with this fix.
Is there something QA can do to verify this is fixed?
It's not easy to reproduce but I verified it locally.
Thanks Brian, are you able to verify this fixed in Firefox 13, 14, or 15? or was it just a locally built version?
No only in my locally built version, maybe you can kill firefox and set no access on the folders for your user. The folders should accumulate up until it reaches 10 and then it should disable the cache altogether. I'm not sure about exact details about how to test this though, but something along those lines.
Adding [qa+] to test along the lines of comment 17, though we aren't sure if that will verify this bug 100%.
(In reply to Brian R. Bondy [:bbondy] from comment #17) > No only in my locally built version, maybe you can kill firefox and set no > access on the folders for your user. The folders should accumulate up until > it reaches 10 and then it should disable the cache altogether. I'm not sure > about exact details about how to test this though, but something along those > lines. So in Nightly 2012-05-09 the numbers of cache folders stopped at 10 after end-tasking the FF process multiple times. But I was able to see more than 10 cache folders on Nightly 16.0a1 (2012-06-05) which then disappeared after ~5 min. From my understandings is this the right behavior? And what has to do with this the access on folders for my user ?
That is the purpose of this task yes. There should be more than 10 in the new build as you confirmed, so I would call this verified.
I'll come back with results from FF 13, 14 and 15, since I've only verified this on FF 16 (latest nightly).
More than 10 cache folders seen on FF 13 release, FF 14b6 and FF 15 2012-06-04. This is verified fixed.