Closed Bug 745075 Opened 12 years ago Closed 12 years ago

Flush disk cache when going into background so we don't lose the disk cache if Fennec is OOM killed

Categories

(Firefox for Android Graveyard :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(blocking-fennec1.0 beta+)

RESOLVED FIXED
Firefox 14
Tracking Status
blocking-fennec1.0 --- beta+

People

(Reporter: mfinkle, Assigned: bnicholson)

References

Details

Attachments

(1 file)

Bug 105843 tells the story of how the Necko disk cache is "corrupted" when the app crashes. On Android, we can be OOM killed by the OS when the app is in the background and the OS needs more memory.

When the app is accessed again, we reload the previous tabs from sessionstore, but we don't get any disk cache benefit since the disk cache was lost.

Can we flush the disk cache on a notification? We could then send a notification when Fennec goes into the background. If the app is OOM killed and is restored, we should have a valid cache to pull resources and load the previous tabs quicker.
Attached patch patchSplinter Review
This patch closes the disk cache when pausing, and re-initializes it when resuming. I verified that the cache is still intact after OOM kills and when force closing.
Attachment #614885 - Flags: review?(mark.finkle)
Comment on attachment 614885 [details] [diff] [review]
patch

Looks sane to me, but Brad and/or a Necko peer should look at this too.
Attachment #614885 - Flags: review?(mark.finkle)
Attachment #614885 - Flags: review?(blassey.bugs)
Attachment #614885 - Flags: feedback+
(In reply to Brian Nicholson (:bnicholson) from comment #1)
> This patch closes the disk cache when pausing, and re-initializes it when
> resuming. I verified that the cache is still intact after OOM kills and when
> force closing.

To be a little clearer, the whole cache service is shutdown / re-init'd in this patch, so other cache stuff is affected too. The memory cache will be discarded, for instance.

At first, this approach seemed too heavy-weight, but I'm feeling better about it now. The memory cache loss is likely no big deal, since most content is probably in the disk cache. Also, shutting down the whole cache service will reduce memory use...perhaps improving our chance of avoiding being killed.

cc'ing some cache people who might be interested...
blocking-fennec1.0: --- → ?
When restoring from OOM, this fix will load page resources from disk cache instead of reloading them from the network. Meaning we should show page content faster during the restore startup.
Comment on attachment 614885 [details] [diff] [review]
patch

Review of attachment 614885 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me, but Geoff has more experience with the disk cache
Attachment #614885 - Flags: review?(blassey.bugs) → review?(gbrown)
Geoff, one of the concerns is that shutting down the cache service will throw out the mem cache even if we're not gonna get killed. Can you comment please?
blocking-fennec1.0: ? → beta+
(In reply to Brad Lassey [:blassey] from comment #6)
> Geoff, one of the concerns is that shutting down the cache service will
> throw out the mem cache even if we're not gonna get killed. Can you comment
> please?

I thought I already had - see comment 3! 

I see it like this: With this patch, the mem cache is cleared whenever fennec goes to the background, so we lose an opportunity for future mem cache hits. *But* content only goes to the mem cache if it cannot be cached in the disk cache (if it cannot be persisted). I just visited 10 complex pages and checked my mem cache -- only a dozen resources in there, most of them trivial. Throwing that away when I go to the background seems reasonable. Also, shutting down the entire cache service and clearing the mem cache has the advantage of saving additional memory, which seems like a win from the perspective of avoiding oom kills.
Another concern I had was that the Init or Shutdown might be time-intense. I tested for a while on the Galaxy S and found that Init is quite snappy - <5ms typically - but Shutdown can take quite a while - 2000 to 5000 ms sometimes.

Is that Shutdown time a concern? The UI goes down right away even so. I tried switching back to Fennec while Shutdown was running and everything worked out fine - Fennec came to foreground and seemed responsive. 

This looks like a good change to me, but I'll test more if there is concern.
Comment on attachment 614885 [details] [diff] [review]
patch

Right now the memory cache is seriously underused (it used to store most SSL content: now it only stores pages that are explicitly flagged not to be stored on disk).  We've talked about making the memory cache more of a "write-through" cache for the disk (bug 602611), but we're not there yet.   So for now I think tossing the memory cache is fine--even good, given OOM pressure--since we get the trade-off of avoiding disk cache corruption.  But I want Michal's opinion too--there might be a lighter-weight way to have the disk cache pause its operations and mark the cache "clean" w/o a full shutdown.

Michal: I was wondering if Android gives us enough time to shut down the cache when we get the AndroidGeckoEvent::ACTIVITY_PAUSING event.  blassey tells me we can have "as much time as we want", so it should be ok that shutdown blocks while it waits for the cache thread to finish pending cache operations.
Attachment #614885 - Flags: feedback?(michal.novotny)
Comment on attachment 614885 [details] [diff] [review]
patch

Review of attachment 614885 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.
Attachment #614885 - Flags: review?(gbrown) → review+
(In reply to Geoff Brown [:gbrown] from comment #8)
> Another concern I had was that the Init or Shutdown might be time-intense. I
> tested for a while on the Galaxy S and found that Init is quite snappy -
> <5ms typically - but Shutdown can take quite a while - 2000 to 5000 ms
> sometimes.
> 
> Is that Shutdown time a concern? The UI goes down right away even so. I
> tried switching back to Fennec while Shutdown was running and everything
> worked out fine - Fennec came to foreground and seemed responsive. 

Thanks for the perf check. That was another aspect we were worried about. Your tests seem to indicate perf won't be an issue. The 2-5 sec shutdown time doesn't seem to come into play.
Comment on attachment 614885 [details] [diff] [review]
patch

(In reply to Jason Duell (:jduell) from comment #9)
> But I want Michal's opinion too--there might be a lighter-weight way to have
> the disk cache pause its operations and mark the cache "clean" w/o a full
> shutdown.

Now, there is no other way to shutdown the cache cleanly. And I don't think we could come up with any faster method. nsCacheService::Shutdown() performs only the necessary steps, so if it is slow, it is probably caused by the IO that we simply need to do.
Attachment #614885 - Flags: feedback?(michal.novotny) → feedback+
Assignee: nobody → bnicholson
Do we need to check this on our old friend the Galaxy S which is about the worst of the I/O offenders?
(In reply to JP Rosevear [:jpr] from comment #13)
> Do we need to check this on our old friend the Galaxy S which is about the
> worst of the I/O offenders?

Already done (at least briefly) - comment 8.
It would be worth checking to see if the memory cache shutdown actually provides frees RAM from the process--see bug 413947 comment 7 (and the memory cache is just a hashtable that stores individually-alloc'd nsCacheEntries, so we don't do any kind of block allocation).   Prob not a big deal in practice since I don't expect the memory cache is actually getting very large.  We could modify the code so nsCacheEntries get their own allocator that we return to the OS when we background.
https://hg.mozilla.org/mozilla-central/rev/ac561faaee40
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14
Depends on: 746548
Depends on: 746697
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: