Flush disk cache when going into background so we don't lose the disk cache if Fennec is OOM killed

RESOLVED FIXED in Firefox 14

Status

()

RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: mfinkle, Assigned: bnicholson)

Tracking

unspecified
Firefox 14
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-fennec1.0 beta+)

Details

Attachments

(1 attachment)

Bug 105843 tells the story of how the Necko disk cache is "corrupted" when the app crashes. On Android, we can be OOM killed by the OS when the app is in the background and the OS needs more memory.

When the app is accessed again, we reload the previous tabs from sessionstore, but we don't get any disk cache benefit since the disk cache was lost.

Can we flush the disk cache on a notification? We could then send a notification when Fennec goes into the background. If the app is OOM killed and is restored, we should have a valid cache to pull resources and load the previous tabs quicker.
(Assignee)

Comment 1

7 years ago
Created attachment 614885 [details] [diff] [review]
patch

This patch closes the disk cache when pausing, and re-initializes it when resuming. I verified that the cache is still intact after OOM kills and when force closing.
Attachment #614885 - Flags: review?(mark.finkle)
Comment on attachment 614885 [details] [diff] [review]
patch

Looks sane to me, but Brad and/or a Necko peer should look at this too.
Attachment #614885 - Flags: review?(mark.finkle)
Attachment #614885 - Flags: review?(blassey.bugs)
Attachment #614885 - Flags: feedback+
(In reply to Brian Nicholson (:bnicholson) from comment #1)
> This patch closes the disk cache when pausing, and re-initializes it when
> resuming. I verified that the cache is still intact after OOM kills and when
> force closing.

To be a little clearer, the whole cache service is shutdown / re-init'd in this patch, so other cache stuff is affected too. The memory cache will be discarded, for instance.

At first, this approach seemed too heavy-weight, but I'm feeling better about it now. The memory cache loss is likely no big deal, since most content is probably in the disk cache. Also, shutting down the whole cache service will reduce memory use...perhaps improving our chance of avoiding being killed.

cc'ing some cache people who might be interested...
(Assignee)

Updated

7 years ago
blocking-fennec1.0: --- → ?
When restoring from OOM, this fix will load page resources from disk cache instead of reloading them from the network. Meaning we should show page content faster during the restore startup.
Comment on attachment 614885 [details] [diff] [review]
patch

Review of attachment 614885 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me, but Geoff has more experience with the disk cache
Attachment #614885 - Flags: review?(blassey.bugs) → review?(gbrown)
Geoff, one of the concerns is that shutting down the cache service will throw out the mem cache even if we're not gonna get killed. Can you comment please?
blocking-fennec1.0: ? → beta+
(In reply to Brad Lassey [:blassey] from comment #6)
> Geoff, one of the concerns is that shutting down the cache service will
> throw out the mem cache even if we're not gonna get killed. Can you comment
> please?

I thought I already had - see comment 3! 

I see it like this: With this patch, the mem cache is cleared whenever fennec goes to the background, so we lose an opportunity for future mem cache hits. *But* content only goes to the mem cache if it cannot be cached in the disk cache (if it cannot be persisted). I just visited 10 complex pages and checked my mem cache -- only a dozen resources in there, most of them trivial. Throwing that away when I go to the background seems reasonable. Also, shutting down the entire cache service and clearing the mem cache has the advantage of saving additional memory, which seems like a win from the perspective of avoiding oom kills.
Another concern I had was that the Init or Shutdown might be time-intense. I tested for a while on the Galaxy S and found that Init is quite snappy - <5ms typically - but Shutdown can take quite a while - 2000 to 5000 ms sometimes.

Is that Shutdown time a concern? The UI goes down right away even so. I tried switching back to Fennec while Shutdown was running and everything worked out fine - Fennec came to foreground and seemed responsive. 

This looks like a good change to me, but I'll test more if there is concern.

Comment 9

7 years ago
Comment on attachment 614885 [details] [diff] [review]
patch

Right now the memory cache is seriously underused (it used to store most SSL content: now it only stores pages that are explicitly flagged not to be stored on disk).  We've talked about making the memory cache more of a "write-through" cache for the disk (bug 602611), but we're not there yet.   So for now I think tossing the memory cache is fine--even good, given OOM pressure--since we get the trade-off of avoiding disk cache corruption.  But I want Michal's opinion too--there might be a lighter-weight way to have the disk cache pause its operations and mark the cache "clean" w/o a full shutdown.

Michal: I was wondering if Android gives us enough time to shut down the cache when we get the AndroidGeckoEvent::ACTIVITY_PAUSING event.  blassey tells me we can have "as much time as we want", so it should be ok that shutdown blocks while it waits for the cache thread to finish pending cache operations.
Attachment #614885 - Flags: feedback?(michal.novotny)
Comment on attachment 614885 [details] [diff] [review]
patch

Review of attachment 614885 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.

Updated

7 years ago
Attachment #614885 - Flags: review?(gbrown) → review+
(In reply to Geoff Brown [:gbrown] from comment #8)
> Another concern I had was that the Init or Shutdown might be time-intense. I
> tested for a while on the Galaxy S and found that Init is quite snappy -
> <5ms typically - but Shutdown can take quite a while - 2000 to 5000 ms
> sometimes.
> 
> Is that Shutdown time a concern? The UI goes down right away even so. I
> tried switching back to Fennec while Shutdown was running and everything
> worked out fine - Fennec came to foreground and seemed responsive. 

Thanks for the perf check. That was another aspect we were worried about. Your tests seem to indicate perf won't be an issue. The 2-5 sec shutdown time doesn't seem to come into play.
Comment on attachment 614885 [details] [diff] [review]
patch

(In reply to Jason Duell (:jduell) from comment #9)
> But I want Michal's opinion too--there might be a lighter-weight way to have
> the disk cache pause its operations and mark the cache "clean" w/o a full
> shutdown.

Now, there is no other way to shutdown the cache cleanly. And I don't think we could come up with any faster method. nsCacheService::Shutdown() performs only the necessary steps, so if it is slow, it is probably caused by the IO that we simply need to do.
Attachment #614885 - Flags: feedback?(michal.novotny) → feedback+
(Reporter)

Updated

7 years ago
Assignee: nobody → bnicholson

Comment 13

7 years ago
Do we need to check this on our old friend the Galaxy S which is about the worst of the I/O offenders?
(In reply to JP Rosevear [:jpr] from comment #13)
> Do we need to check this on our old friend the Galaxy S which is about the
> worst of the I/O offenders?

Already done (at least briefly) - comment 8.

Comment 16

7 years ago
It would be worth checking to see if the memory cache shutdown actually provides frees RAM from the process--see bug 413947 comment 7 (and the memory cache is just a hashtable that stores individually-alloc'd nsCacheEntries, so we don't do any kind of block allocation).   Prob not a big deal in practice since I don't expect the memory cache is actually getting very large.  We could modify the code so nsCacheEntries get their own allocator that we return to the OS when we background.
https://hg.mozilla.org/mozilla-central/rev/ac561faaee40
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14

Updated

7 years ago
Depends on: 746548

Updated

7 years ago
Depends on: 746697
You need to log in before you can comment on or make changes to this bug.