Closed Bug 602611 Opened 14 years ago Closed 8 years ago

cache recently-used content in memory

Categories

(Core :: Networking: Cache, defect)

Hardware: x86
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX

People

(Reporter: shaver, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Snappy:P3][necko-backlog])

Attachments

(1 file)

We don't cache pretty much anything in memory, which means that we go to disk all. the. time. and are slow.

We should cache recently-used content in memory, so that we can be fast.
Do we want to try to do this for 2.0?  E.g. a write-through cache where we just treat memory as a cache level above disk?  Seems like too big a change... :(
Blocks: http_cache
If the OS VMM and disk cache are working correctly, we shouldn't need to go to disk any more than we would if we implemented our own disk cache. If we are going to disk too often then maybe we are not expressing our intentions to the OS appropriately.

Possible strategies:

1. Use modifiers like FILE_ATTRIBUTE_TEMPORARY when creating cache files, to delay writing the files out as much as possible.

2. Use an LRU cache for the last N used cache files; instead of closing each cache file when we are done writing/reading it, leave it mmap'ed until we open the (N+1)th cache file (see the sketch below). This would give us a "memory cache" that the OS can page more efficiently than it would page our heap.

http://www.varnish-cache.org/trac/wiki/ArchitectNotes
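
A minimal sketch of strategy 2, under stated assumptions (POSIX mmap, a hypothetical CacheFileLru helper that is not part of necko): keep the last N cache files mapped read-only and unmap the least-recently-used mapping when an (N+1)th file is opened.

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

struct MappedFile {
  void*  addr = nullptr;
  size_t len  = 0;
};

// Hypothetical helper, not necko code: an LRU of mmap'ed cache files.
class CacheFileLru {
public:
  explicit CacheFileLru(size_t maxOpen) : mMaxOpen(maxOpen) {}

  // Return a read-only mapping for |path|; on a miss, mmap the file and,
  // if more than N files are now mapped, unmap the least-recently-used one.
  MappedFile* Get(const std::string& path) {
    auto it = mIndex.find(path);
    if (it != mIndex.end()) {
      mLru.splice(mLru.begin(), mLru, it->second);  // mark as most recent
      return &it->second->second;
    }
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return nullptr; }
    MappedFile mf;
    mf.len  = static_cast<size_t>(st.st_size);
    mf.addr = mmap(nullptr, mf.len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                              // the mapping survives the close
    if (mf.addr == MAP_FAILED) return nullptr;
    mLru.emplace_front(path, mf);
    mIndex[path] = mLru.begin();
    if (mLru.size() > mMaxOpen) {           // opened the (N+1)th file: evict
      auto& victim = mLru.back();
      munmap(victim.second.addr, victim.second.len);
      mIndex.erase(victim.first);
      mLru.pop_back();
    }
    return &mLru.front().second;
  }

private:
  size_t mMaxOpen;
  std::list<std::pair<std::string, MappedFile>> mLru;
  std::unordered_map<std::string,
      std::list<std::pair<std::string, MappedFile>>::iterator> mIndex;
};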
> If the OS VMM and disk cache are working correctly

At least on Mac (and notoriously on Windows) they seem not to be... Or alternatively, the context-switching or other overhead of doing I/O, even cached, is too high?  Note that I didn't measure whether we hit disk; I just observed that we take more than 10ms to read from cache, generally speaking.

The mmapping thing might work...  Note that our memory cache is just 32mb max, so it's not that likely to actually page out much unless we're being paged out in general.
FILE_ATTRIBUTE_TEMPORARY isn't what we want, since we don't delete it right away.  

Varnish's architecture is good, but consumes a lot of address space.  For a dedicated server like Varnish, this is exactly the right thing, but given the mappings consumed by other parts of our system and the OS (especially with graphics drivers in play now) I think it could be problematic for us.

Using fewer cache files is a good idea, and I think we have bugs to move in that direction for a number of reasons.  Collecting assets in a memory cache would also let us write out assets that are related to each other in contiguous regions on disk, improving locality when we're reloading from cache.  It's unlikely that the kernel will be able to do as good a job, given the knowledge it has.

As an FF4-timeframe possibility, caching FDs might work well, and maybe have less risk of introducing a leak.  I'll see if we have any cache-performance harnesses that I can poke at to see why it takes so long to read from a just-written cache entry.  (I suspect we can't read a cache entry until it's finished being written, either, so the async stuff might be causing us some pain?)
Might as well make the memory cache bigger while we're at it.  Right now we max out at 32 MB, as bz points out on IRC.
FILE_ATTRIBUTE_TEMPORARY simply re-prioritizes writing to disk so that it is done after writes of files without FILE_ATTRIBUTE_TEMPORARY and before writes to the page file. So, if you had a cache write with FILE_ATTRIBUTE_TEMPORARY and then a SQLite database write without it, the SQLite write will nearly always hit the disk first, which is probably what we want. (Though the fsync() after the SQLite write should already handle that case.)  FILE_FLAG_DELETE_ON_CLOSE is the separate flag that actually makes the file temporary.
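
For illustration only, a minimal Win32 sketch of the distinction being drawn here; the paths are hypothetical and this is not the actual cache code.

#include <windows.h>

void OpenCacheFilesSketch() {
  // FILE_ATTRIBUTE_TEMPORARY only re-prioritizes the write-back: the data
  // is kept in the system cache as long as possible, but the file persists.
  HANDLE cacheBlock = CreateFileW(L"C:\\cache\\_CACHE_001_",   // hypothetical path
                                  GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                                  FILE_ATTRIBUTE_TEMPORARY, nullptr);

  // FILE_FLAG_DELETE_ON_CLOSE is what actually makes a file temporary:
  // it is removed when the last handle to it is closed.
  HANDLE scratch = CreateFileW(L"C:\\cache\\scratch.tmp",      // hypothetical path
                               GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                               nullptr);

  if (cacheBlock != INVALID_HANDLE_VALUE) CloseHandle(cacheBlock);
  if (scratch != INVALID_HANDLE_VALUE) CloseHandle(scratch);   // deleted here
}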
I'm not sure that delaying writing is what we need here.  We want to speed up _reading_.  If delaying writing means that files hang out in ram and the os is more likely to hand them back fast, that _might_ work.  But the case where I ran into this has a file I've had cached forever.  So it's just in the cache, period.  It's been recently read, but not recently written, in the case I care about.

I'll try to do some testing of this on Windows next week early (so far the >10ms reads I ran into were on Mac).  We should probably just create a timeline of the cache read and see what's going on....
Blocks: 454001
(In reply to comment #1)
> E.g. a write-through cache where we just
> treat memory as a cache level above disk?

I have some ideas about this and would like to give it a shot. The approach would be to re-bind cache-entries to the disk-device when they are evicted from memory-device.
Assignee: nobody → bjarne
Status: NEW → ASSIGNED
We'd want to make sure not to leave write-to-disk until eviction, or we add I/O when we're trying to quickly store something new.  I think everything in memory should also be on disk; maybe that's what you mean by "rebind", though.
Mmmm..  with "rebind" I merely mean updating cache book-keeping so that the entry is managed by the disk-device instead of the memory-device.

The plan is to bind (subject to policies and flags, of course) new entries to the memory-device (md) first - currently we try the disk-device (dd) first, which is why we store pretty much everything on disk. Whenever an entry gets evicted from md it is re-bound to dd (still subject to policies and flags). This seems to be working quite nicely and I'm currently experimenting with how to write data to disk without copying in memory (seems impossible, in fact). *When* to write data is also to be considered carefully, but initially it will happen when the entry is evicted from md (which is what you don't want, I know). When all this works I'll set up a mechanism to flush such entries to disk on a background-thread somehow.

This strategy should allow us to efficiently re-use data just loaded over the network since it should reside in memory-cache. It would, however, not help the situation bz describes in comment #7, but I believe it addresses the initial concern in comment #0. (I'd be very happy to be corrected here if I'm wrong! :} )
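
As a rough sketch of the flow described above, with placeholder types rather than the real nsCacheService/nsCacheDevice interfaces:

struct CacheEntry;  // placeholder for the real cache entry type

struct CacheDevice {
  virtual bool BindEntry(CacheEntry* entry) = 0;   // take ownership / store data
  virtual void EvictEntry(CacheEntry* entry) = 0;  // drop bookkeeping
  virtual ~CacheDevice() = default;
};

class CacheServiceSketch {
public:
  // New entries are tried against the memory-device (md) first, falling
  // back to the disk-device (dd) -- the reverse of the current order.
  bool BindNewEntry(CacheEntry* entry, bool allowMemory, bool allowDisk) {
    if (allowMemory && mMemoryDevice->BindEntry(entry)) return true;
    return allowDisk && mDiskDevice->BindEntry(entry);
  }

  // When md needs room, the entry is re-bound to dd instead of being
  // dropped, so it stays reachable; the actual write to disk could happen
  // here or later on a background thread.
  void OnMemoryEviction(CacheEntry* entry, bool allowDisk) {
    mMemoryDevice->EvictEntry(entry);
    if (allowDisk) mDiskDevice->BindEntry(entry);
  }

private:
  CacheDevice* mMemoryDevice = nullptr;
  CacheDevice* mDiskDevice   = nullptr;
};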
Note that on mobile we'll probably not want to store the same item in both memory and disk at the same time, as we've got little space to play with there (some disk caches are only 4-6 MB).
Another nutty idea: if we delay writing until eviction, we could batch a bunch of less-recently-used RAM cache entries, write them (async of course) to disk, and then flush them and the CACHE_MAP.  I.e. if we do writes to cache less frequently, we might be able to afford some relatively rare fsync calls, and actually be able to recover from crashes most of the time.   

But if fsync really sucks as hard as it sounds like it does on some platforms (global filesystem flush per call, etc), then maybe it's still not worth doing.

Also, Nick Hurley is gathering data that's quite relevant to this bug (OMG: actual DATA :) and I suspect we'll find that OS file caching is not as good a pal as we hoped.
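
A minimal sketch of the batching idea, assuming hypothetical PendingWrite records and a file descriptor for the cache index; a real implementation would also make the data files durable before updating the index.

#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

struct PendingWrite {          // a cold memory-cache entry queued for disk
  std::string       path;
  std::vector<char> data;
};

// Write a whole batch of evicted entries, then pay for a single fsync on
// the cache index instead of one per entry ("relatively rare fsync calls").
bool FlushBatch(const std::vector<PendingWrite>& batch, int cacheMapFd) {
  for (const PendingWrite& w : batch) {
    int fd = open(w.path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    ssize_t n = write(fd, w.data.data(), w.data.size());
    close(fd);
    if (n != static_cast<ssize_t>(w.data.size())) return false;
  }
  // Simplification: only the index (_CACHE_MAP_) is synced here; for real
  // crash recovery the data files would need to be durable first.
  return fsync(cacheMapFd) == 0;
}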
Bjarne -- Do you have a wip patch for this still? I am curious about your approach.
(In reply to Jason Duell (:jduell) from comment #12)
> Another nutty idea: if we delay writing until eviction, we could batch a bunch
> of less-recently-used RAM cache entries ... [snip]

Referring to comment #10: We stuff things into mem-cache and when we need to evict (subject to lru-sp) entries from the mem-cache we "re-bind" it to the disk-cache. The part about batching is a little unclear - do you mean e.g. that upon evicting an entry we could grab a batch of less-recently-used entries and evict/rebind them all to disk in one go?

> I.e. if we do writes to
> cache less frequently, we might be able to afford some relatively rare fsync
> calls, and actually be able to recover from crashes most of the time. 

Probably worth a shot...  Btw, has anyone actually tried to add the necessary fsyncs to recover from crashes and measured the effect? I know that calling fsync is generally considered to be a performance-killer but is there any hard data to back this up in our case?

> Also, Nick Hurley is gathering data that's quite relevant to this bug (OMG:
> actual DATA :) and I suspect we'll find that OS file caching is not as good
> a pal as we hoped

Is there a bug or wiki-page tracking this work?

(In reply to Geoff Brown [:gbrown] from comment #13)
> Bjarne -- Do you have a wip patch for this still? I am curious about your
> approach.

I do but it's been some time... Iirc, there were some issues which made me put the work on hold but I'll dig up the patch and get back here about it.
(In reply to Bjarne (:bjarne) from comment #14)
> (In reply to Jason Duell (:jduell) from comment #12)
> > Also, Nick Hurley is gathering data that's quite relevant to this bug (OMG:
> > actual DATA :) and I suspect we'll find that OS file caching is not as good
> > a pal as we hoped
> 
> Is there a bug or wiki-page tracking this work?

I plan on adding my results to https://wiki.mozilla.org/Necko/MobileCache sometime today
(In reply to Bjarne (:bjarne) from comment #14)
> Probably worth a shot...  Btw, has anyone actually tried to add the
> necessary fsyncs to recover from crashes and measured the effect? I know
> that calling fsync is generally considered to be a performance-killer but is
> there any hard data to back this up in our case?

I wanted to do it, but it isn't a simple change and it would take quite a lot of time just to test it. And since I was told that fsync is unacceptable, I decided not to implement it. But if there is any chance that fsync would be accepted, I'll be happy to give it a try.
Is it the principle of using fsync which is unacceptable? If not, I'd say it may be worth measuring cache performance with a simple, naive fsync strategy and comparing it to the cache without fsyncs. Especially if this brings us closer to being able to use the cache after a crash. (With "simple strategy" I mean a quick implementation which fsyncs frequently enough to be representative, without necessarily covering all cases.)
Attached patch: Proposed fix (Splinter Review)
This is the current thinking. It needs a fix for bug #528569 but I did not set up any dependency since it is only *this* proposed solution which needs the fix, not a general solution to this bug.

The idea is to use the existing mem-device as a "1st level" cache and the existing disk-device as "2nd level" cache. We always store new entries in mem-device first, and when the resource is read from the net we set up a backup-entry in the disk-device. Conversely, if we find a resource in the disk-device, we set up and write to an entry in mem-device when the client reads the cached resource from disk. (All this subject to policies of-course.)

Earlier versions of this pass surprisingly many tests on tryserver, but there are a number of TODOs in the code on which I would like comments before requesting review.

I'm working on how to test this. No benchmarking has been done yet.
Attachment #574459 - Flags: feedback?(michal.novotny)
Bjarne:

Let me see if I follow you:  So when a new cacheable URI is loaded, this would store it in the memory cache.  If/when we get a cache hit for it, we would then store it into the disk cache as well.

Is that correct?   Does this mean that memory cache entries that get evicted before they are read do not wind up on the disk cache?

[We still have a very high limit on the max size of an entry in the memory cache, right (up to 90% of the total cache size, IIRC)?  That might not work well with a strategy that evicts unread memory cache entries w/o storing them on disk.]

Also, what do we do on shutdown (and/or memory cache flush--which we may start doing a lot on mobile, every time the browser stops being the foreground app).  Ideally we save entries to disk at those times?
Couldn't we just use mmap() and let the OS page cache handle it? Or is mmap() not reliable on all platforms?
Two issues with that:

1)  We want the disk cache to persist across browser restarts.
2)  mmaping our whole disk cache would use too much address space, which is all too limited on Windows.
(In reply to Jason Duell (:jduell) from comment #19)
> Bjarne:
> 
> Let me see if I follow you:

You don't...  :)  But it's my fault - I wrote comment #18 way too late in the evening (or perhaps way too early in the morning).

When loading a new cacheable resource we try to place it in mem-device. When the output-stream for the mem-device-entry is requested we return a tee which stores a copy in disk-device. Book-keeping is handled properly so that we will find the entry in disk-device if it gets evicted from mem-device.

When a cached resource is found in the disk-device, we replace the input-stream from it with a tee which writes a copy into a mem-device-entry and do book-keeping so that the next time it is requested it will be found in the mem-device.

I believe this closely resembles a so-called write-through cache (as opposed to a write-back cache). The benefit of this scheme is that we let the read/write from the client drive the data-copy to and from disk.
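
A bare-bones sketch of the two tees, using placeholder stream types rather than the actual necko stream interfaces:

#include <cstddef>

struct Stream {                               // placeholder stream interface
  virtual size_t Read(char* buf, size_t count) = 0;
  virtual size_t Write(const char* buf, size_t count) = 0;
  virtual ~Stream() = default;
};

// Read path: the client reads from the disk-device entry, and every block
// read is also copied into a freshly created mem-device entry.
class TeeInputStream : public Stream {
public:
  TeeInputStream(Stream* diskSource, Stream* memCopy)
    : mDiskSource(diskSource), mMemCopy(memCopy) {}

  size_t Read(char* buf, size_t count) override {
    size_t n = mDiskSource->Read(buf, count);
    if (n > 0) mMemCopy->Write(buf, n);       // populate the memory device
    return n;
  }
  size_t Write(const char*, size_t) override { return 0; }  // read-only

private:
  Stream* mDiskSource;
  Stream* mMemCopy;
};

// Write path: data coming from the network is written to the mem-device
// entry, and a copy is mirrored into the backup disk-device entry, so the
// client's own reads/writes drive the copying to and from disk.
class TeeOutputStream : public Stream {
public:
  TeeOutputStream(Stream* memEntry, Stream* diskBackup)
    : mMemEntry(memEntry), mDiskBackup(diskBackup) {}

  size_t Read(char*, size_t) override { return 0; }         // write-only
  size_t Write(const char* buf, size_t count) override {
    size_t n = mMemEntry->Write(buf, count);
    if (n > 0) mDiskBackup->Write(buf, n);
    return n;
  }

private:
  Stream* mMemEntry;
  Stream* mDiskBackup;
};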
Attachment #574459 - Flags: feedback?(hurley)
Whiteboard: [Snappy[
Whiteboard: [Snappy[ → [Snappy]
(In reply to Jason Duell (:jduell) from comment #11)
> Note that on mobile we'll probably not want to store the same item in both
> memory and disk at the same time, as we've got little space to play with
> there (some disk caches are only 4-6 MB).

Space is certainly at a premium on mobile...but we are pretty concerned with performance too, and I am hopeful that this change will improve cache use on mobile. 

Current mobile default settings give us a 1 MB memory cache and a 10 MB disk cache. Recall that with both caches enabled, content is written to the disk cache unless persistent storage is forbidden: so the memory cache is under-utilized. On the other hand, if we disable the mobile disk cache, and rely on the 1 MB memory cache, we often see content evicted before it can be re-accessed.

Rather than seeing Bjarne's approach as wasting space, I look at it as better-utilizing the memory cache. For a typical page set, the disk cache should have approximately the same content and the memory cache should have more relevant content, giving us more hits from memory.

There's lots of testing ahead and we need to look at performance numbers still (and won't that be fun!?) but I think the approach is sound.
Whiteboard: [Snappy] → [Snappy:P3]
> Current mobile default settings give us a 1 MB memory cache and a 10 MB disk cache.

At that ratio (a 10% overhead), write-through sounds good.  Let's give it a go and see how it improves performance.
I've been planning to use Tp4 to measure improvements for this patch. So far results are not promising: a general performance-loss with two exceptions over the whole suite (assuming I interpret the results correctly).

I will look into this, but is there any input on whether Tp4 is suitable for measuring performance for this patch? The approach introduces overhead which is expected to be outweighed by faster access to recently used content, but I am starting to get second thoughts about whether we will see that effect in "stock" Tp4...

Maybe we should get started on a (micro)benchmark which includes the effect of re-using stuff?
Running the existing microbenchmarks shows a clear performance-hit with this patch. I'll re-run and double-check build-options, but I think those are ok. More work to be done on the patch itself... :(
(Note to myself: Don't compare recent results with older results.)

Previous comment is wrong: There is no significant difference when running microbenchmarks with and without this patch (as long as you use the same repo, build-options and runtime-setup).

Will also double-check conclusion from comment #25. I was comparing my results with Tp4-results from before the summer (which were available).
> but is there any input on whether Tp4 is suitable for measuring performance for this patch? 

I seem to remember Geoff (and then Nick) using a tweaked version of tp (not sure if tp4 or not) that ensured there was some actual resource reuse in the cache.  If you're using vanilla tp4 there's a good chance that's not a good benchmark. Ask Nick/Geoff.
Nick provided great info about his tweaked tp in bug 648429 (comments 127-131).
Speculation about your tp4 results: I think normal tp4 would not show much improvement with this patch. As I recall, the tp4 page set is pretty large, and each page is accessed in sequence, then the whole sequence is repeated. If the page set size is much bigger than the memory cache capacity, most resources will be evicted from the memory cache before being revisited (some won't be, and it's hard to predict because of LRU-SP eviction, but still...) -- and we will end up hitting the disk cache almost as much as without the patch.
I'm seeing a rather mixed bag of results running this on mobile devices. Sometimes it helps a little, sometimes it hurts a little (page load time seems to be +/- 10% at worst/best, often closer to the baseline). I'm going to test this with a modified tp4 on desktop to see if the larger memory cache size makes this a more definite win. If it is, it could also serve as an indicator that we might want to up the memory cache size on mobile a bit, too (heresy in some circles, I know). Heck, I'll go ahead and test that case (bigger memory cache) on mobile, too, and see what happens.
(In reply to Bjarne (:bjarne) from comment #22)
> When loading a new cacheable resource we try to place it in mem-device. When
> the output-stream for the mem-device-entry is requested we return a tee
> which stores a copy in disk-device.

I am not surprised that there isn't clear evidence of a win yet.

With this approach, isn't the entry effectively stored in memory twice? Once in our memory cache, and once in the operating system's buffer cache. This is because the operating system already implements a two-level write-through or write-back cache.

The problem with us doing this in our userspace is that it makes us look more memory-hungry to the operating system. This encourages Android to kill us more often. The advantage of relying on the operating system's buffer cache is that the operating system doesn't/shouldn't count the buffer cache against us. And, it can automatically free memory by flushing the buffer cache to disk at its discretion. Both of these effects reduce the likelihood that Android will kill our process to reclaim memory.

The disadvantage of relying on the operating system's buffer cache is that you have less control over what gets evicted from memory--but only if the operating system has a policy of paging out the buffer cache instead of, or before, swapping. If the operating system is just as eager to swap as it is to page out buffer cache pages, then it would be a wash. (I suspect that OSs will have at least *some* preference towards paging out the buffer cache over swapping, but I have no evidence right now.)

The other potential reason that using the memory cache more, and the disk cache less, could hurt performance is locality: Using the memory cache more causes us to grow the memory cache towards its maximum size. IIRC, it isn't allocated as one big block, so that means it is fragmented all over the heap. Not only does this potentially make locality of reference bad for the cache itself, but it also potentially spreads out non-cache data used by other parts of the system, reducing those subsystems' locality of reference.

*IF* we were to write to the disk cache less, we may be able to reduce the amount of work that ext3/ext4's horrible fsync() would do, speeding up other parts of the browser/system that are waiting on fsyncs. However, with the approach Bjarne outlined, teeing the data to our internal memory cache and the operating system's buffer cache, we would be writing just as much stuff to the operating system's buffer cache, so fsync would have to do just as much work as in a pure-disk-cache approach. (Another way to reduce the impact of the disk cache on fsyncs would be to write the cache on a different logical disk than the permanent data like places. Perhaps, at least on Android, the special cache directory returned by getCacheDir [1] might be on a different filesystem from the rest of the system, so using the cache directory may be a win here. I wouldn't be surprised if other operating systems, especially Windows, treated temp directories specially in some beneficial way here too.)

[1] http://developer.android.com/reference/android/content/Context.html#getCacheDir%28%29
(In reply to Boris Zbarsky (:bz) from comment #3)
> At least on Mac (and notoriously on Windows) they seem to not be.... Or
> alternately the context-switching or other overhead of doing I/O, even
> cached, is too high?  Note that I didn't measure whether we hit disk; I just
> observed that we take more than 10ms to read from cache, generally speaking.

(In reply to Boris Zbarsky (:bz) from comment #7)
> [T]he case where I ran into this has a file I've had cached forever.  So
> it's just in the cache, period.  It's been recently read, but not
> recently written[.]
> 
> I'll try to do some testing of this on Windows next week early (so far the
> >10ms reads I ran into were on Mac).  We should probably just create a
> timeline of the cache read and see what's going on....

I think it is very important to reproduce and understand what Boris reported here. If it takes 10ms to return an entry that should already be paged into the operating system's buffer cache, then how much of that 10ms is our fault, and how much of it is the operating system's fault? If 9ms of it is our fault, then attempting to avoid using the operating system's buffer cache is probably not going to be a win.
Hmm.  I thought I'd commented in this bug, but apparently not.

I did more testing at the time and found that part of the problem was simply that starting to read from a cache entry required something like three separate trips through the event loop.  So if there were events posted that took a few ms to process (repaints as we're doing the pageload, say), we got screwed on the cache latency just from that.
A constructive discussion is nice. :)

Let me first clarify my understanding of the goal of this bug so we don't waste time arguing about issues arising from misaligned goals: The basic assumption is that locating and reading content from the memory-cache is faster than locating and reading from the disk-cache. From this and the general locality-principle, IMO the goal must be to increase the chance to find content we have used recently in the memory-cache. Complicating factors are that we need the cache to be persistent so we cannot just switch to using a pure memory cache, and we must have the ability to honor flags to store things only in memory or on disk as a separate file.

The suggested approach is rather straightforward and certainly not the only one. A completely different go at this may be to let recently used entries stay "active" in nsCacheService, betting on the OS to keep their content in memory-buffers, avoiding explicit writes to disk (by delaying deactivation), and speeding up search (active entries are kept in a hashtable). This could e.g. be combined with applying the LRU-SP replacement algorithm to *active* entries to decide when to deactivate. With this approach, though, we don't control nor report how much memory we use for caching (which may or may not be considered an issue), and we'd also have to redo quite a lot of logic which makes decisions based on entries being active or not. But it should be doable and may even be a way to include Jason's idea from comment #12.
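
A rough sketch of that alternative, with placeholder types and a plain LRU standing in for LRU-SP (the real policy also weighs entry size and access frequency):

#include <list>
#include <string>
#include <unordered_map>

struct ActiveEntry { std::string key; /* data, metadata, ... */ };

// Placeholder sketch, not nsCacheService code: instead of deactivating an
// entry as soon as the last consumer lets go, keep the most recently used
// entries "active" (in the fast hashtable, and hopefully in the OS buffer
// cache) and only really deactivate the coldest one when the set grows.
class DelayedDeactivation {
public:
  explicit DelayedDeactivation(size_t maxActive) : mMaxActive(maxActive) {}

  void RequestDeactivate(ActiveEntry* entry) {
    Touch(entry);
    if (mRecency.size() > mMaxActive) {
      ActiveEntry* coldest = mRecency.back();
      mRecency.pop_back();
      mIndex.erase(coldest->key);
      ReallyDeactivate(coldest);
    }
  }

  void Touch(ActiveEntry* entry) {          // mark as most recently used
    auto it = mIndex.find(entry->key);
    if (it != mIndex.end()) mRecency.erase(it->second);
    mRecency.push_front(entry);
    mIndex[entry->key] = mRecency.begin();
  }

private:
  void ReallyDeactivate(ActiveEntry* /*entry*/) {
    // Hand the entry back to the existing deactivation path (write out
    // pending data, release resources).
  }

  size_t mMaxActive;
  std::list<ActiveEntry*> mRecency;
  std::unordered_map<std::string, std::list<ActiveEntry*>::iterator> mIndex;
};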

(In reply to Brian Smith (:bsmith) from comment #33)
> With this approach, isn't the entry effectively stored in memory twice? Once
> in our memory cache, and once in the operating system's buffer cache. 

Correct, one copy is made implicitly at the discretion of the OS. See also comment #11 and comment #23.

> The problem with us doing this in our userspace is that it makes us look
> more memory-hungry to the operating system. This encourages Android to kill
> us more often. The advantage of relying on the operating system's buffer
> cache is that the operating system doesn't/shouldn't count the buffer cache
> against us. And, it can automatically free memory by flushing the buffer
> cache to disk at its discretion. Both of these effects reduce the likelihood
> that Android will kill our process to reclaim memory.

This is a good (and new to me) point which sounds like an argument against using an explicit memory-cache at all. I don't know how to relate to this: Any input from the mobile team?

See also alternative approach described above.

> The disadvantage of relying on the operating system's buffer cache is that
> you have less control over what gets evicted from memory

Yes. We (hopefully) have more control of what to evict since we know about http, hence we *should* be able to improve the general hit-rate due to a better replacement-algorithm (see bug #648605 for why we cannot do this also for the disk-cache). There is also the point that searching the memory-cache is faster than searching the disk-cache (due to how data has to be organized) so locating something in the memory-cache is generally faster. And see also the alternative approach suggested above.

> The other potential reason using the memory cache more, and the disk cache
> less, could hurt performance is locality: Using the memory cache more causes
> us to grow the memory cache towards its maximum size. IIRC, it isn't
> allocated as one big block, so that means it is fragmented all over the
> heap.

Hmm...  subtle point. But actually more of an argument against the way we implement the mem-cache, no? I'd love to hear more arguments for or against this.

> *IF* we were to write to the disk cache less, we may be able to reduce the
> amount of work that ext3/ext4's horrible fsync() would do, speeding up other
> parts of the browser/system that are waiting on fsyncs. However, with the
> approach Bjarne outlined, teeing the data to our internal memory cache and
> the operating system's buffer cache, we would be writing just as much stuff
> to the operating system's buffer cache, so fsync would have to do just as
> much work as in a pure-disk-cache approach. 

We do need to store things on disk in order to make them persistent and we rely on the OS to deal with this in an efficient manner. Having said this, I'm open to the idea of enhancing this patch to delay writes to disk when we read from the net. (And see the alternative approach suggested above, which implicitly may do as indicated here.)

(In reply to Boris Zbarsky (:bz) from comment #35)
> I did more testing at the time and found that part of the problem was simply
> that starting to read from a cache entry required something like three
> separate trips through the event loop.  So if there were events posted that
> took a few ms to process (repaints as we're doing the pageload, say), we got
> screwed on the cache latency just from that.

I guess this is independent of where the cache-entry is stored (i.e. it also applies to entries loaded from the memory-cache)? Sounds like something to be addressed separately with the potential for good ROI - bug #612632?
(In reply to Geoff Brown [:gbrown] from comment #30)
> Nick provided great info about his tweaked tp in bug 648429 (comments
> 127-131).

Yes - I've seen this. Thanks Nick! I'll try to find time to check it out unless someone has already reported results from this.
> bug #612632?

Oh, that's right.  I forgot I filed a separate bug on that!
(In reply to Brian Smith (:bsmith) from comment #33)
> (In reply to Bjarne (:bjarne) from comment #22)
> > When loading a new cacheable resource we try to place it in mem-device. When
> > the output-stream for the mem-device-entry is requested we return a tee
> > which stores a copy in disk-device.
> 
> I am not surprised that there isn't clear evidence of a win yet.
> 
> With this approach, isn't the entry effectively stored in memory twice? Once
> in our memory cache, and once in the operating system's buffer cache. This
> is because the operating system already implements a two-level write-through
> or write-back cache.
> 
> The problem with us doing this in our userspace is that it makes us look
> more memory-hungry to the operating system. This encourages Android to kill
> us more often. The advantage of relying on the operating system's buffer
> cache is that the operating system doesn't/shouldn't count the buffer cache
> against us. And, it can automatically free memory by flushing the buffer
> cache to disk at its discretion. Both of these effects reduce the likelihood
> that Android will kill our process to reclaim memory.

Normal file IO goes through VFS caches and those are in no way tied to the app that caused those caches to fill up. Reading every file on the system will not penalize your memory usage unless you mmap().
(In reply to Bjarne (:bjarne) from comment #36)
> Let me first clarify my understanding of the goal of this bug so we don't
> waste time arguing about issues arising from misaligned goals: The basic
> assumption is that locating and reading content from the memory-cache is
> faster than locating and reading from the disk-cache.

This has not been demonstrated to be true. And, even if it is demonstrated to be true, we should try to make the disk cache faster first (e.g. possibly bug 612632) before we try to use the memory cache more, for the reasons I mentioned in comment 33. Only when we find that we can't make the disk cache fast enough, and understand why, should we spend more memory on it.

> This is a good (and new to me) point which sounds like an argument against
> using an explicit memory-cache at all.

Yes, that is what I am arguing that we should try to do first.

> Yes. We (hopefully) have more control of what to evict since we know about
> http, hence we *should* be able to improve the general hit-rate due to a
> better replacement-algorithm (see bug #648605 for why we cannot do this also
> for the disk-cache).

AFAICT, we should be comparing the OS's buffer cache replacement algorithm to the memory cache's replacement algorithm, not the disk cache's replacement algorithm to the memory cache's replacement algorithm.

> There is also the point that searching the memory-cache
> is faster than searching the disk-cache (due to how data has to be
> organized) so locating something in the memory-cache is generally faster.

How much faster is it? Is there anything we can do to make searching the disk cache (nearly) as fast as searching the memory cache?

> > The other potential reason using the memory cache more, and the disk cache
> > less, could hurt performance is locality: Using the memory cache more causes
> > us to grow the memory cache towards its maximum size. IIRC, it isn't
> > allocated as one big block, so that means it is fragmented all over the
> > heap.
> 
> Hmm...  subtle point. But actually more of an argument against the way we
> implement the mem-cache, no?

Right. We could definitely try to avoid heap fragmentation, if it would be a large win.

> I'm open to the idea of enhancing this patch to delay writes to disk when we
> read from the net. (And see the alternative approach suggested above, which
> implicitly may do as indicated here.)

If we have seen a problem where we are writing too much data to disk unnecessarily, or too quickly, then we should have a bug or bugs for those issues. And, in that case, it may indeed make sense to implement a write-back cache. See also comment 2 about hinting to the OS that the write to disk should be delayed.

> I guess this is independent of where the cache-entry is stored (i.e. also
> applies to entries loaded form memory-cache)? Sounds like something to be
> addressed separately with the potential for good ROI - bug #612632?

Right. I think my comment in comment #2 is still valid (at least, unrefuted): "If the OS VMM and disk cache are working correctly, we shouldn't need to go to disk any more than we would if we implemented our own disk cache. If we are going to disk too often then maybe we are not expressing our intentions to the OS appropriately."

The first step in this bug should be finding out if/when/why the OS VMM is not working optimally for our application, or whether the problem is in our own code (e.g. bug 612632).

To me, this bug feels like we're creating a solution for an unknown problem. Even if somebody posts a patch for this bug that makes the cache 10x faster, it would still be bad to not understand *why* there was such an improvement.
Hmm... eventually I think I see and understand your point here (as well as your opening-phrase in comment #33).

You're arguing that the OS does a really good job at caching recently used parts of the cache-file and that we won't see clear wins with an explicit approach because of this. Moreover, you argue that this implies we shouldn't bother to cache explicitly in memory. Right?

I don't know much about how good the VMM or file-caching is in various OSes. Comments 3 and 4 convinced me early on that we needed something explicit, but I see your points (and I read results). Need to think about it...  other input on this is also really welcome.

What do you think about the alternative approach from comment #36 as a way to utilize OS-buffering even more?
(In reply to Brian Smith (:bsmith) from comment #40)
> > There is also the point that searching the memory-cache
> > is faster than searching the disk-cache (due to how data has to be
> > organized) so locating something in the memory-cache is generally faster.
> 
> How much faster is it? Is there anything we can do to make searching the
> disk cache (nearly) as fast as searching the memory cache?

We don't have telemetry for this. Anyway, to find a disk entry we need to read the nsDiskCacheEntry from the disk if the entry is inactive, so I would expect this to be much slower due to IO.
(In reply to Brian Smith (:bsmith) from comment #40)
> Right. I think my comment in comment #2 is still valid (at least,
> unrefuted): "If the OS VMM and disk cache are working correctly, we
> shouldn't need to go to disk any more than we would if we implemented our
> own disk cache. If we are going to disk too often then maybe we are not
> expressing our intentions to the OS appropriately."

Brian, while I do think it'd be great if we could rely on the OS to effectively do our memory caching for us, I'm not sure we can. Think of the filesystems that we know of where fsync causes the entire filesystem to be flushed to disk. If any process running on a machine with one of those filesystems calls fsync, our disk cache will be flushed, and its entries in the buffer cache will be marked as clean, and thus available for re-use by other data, effectively destroying any reliability in the OS-run "memory cache" for us. Similarly, if we happen to be running on a machine that has other processes doing a lot of IO, then the OS will be removing entries from the buffer cache more often, quite possibly to the detriment of our implicit memory cache. Thus, even if the OS' buffer cache is "working correctly", we may end up taking a hit.

This doesn't mean that I don't agree that we should try to make our disk cache faster (we should, which is why looking at that is currently on our Q1 goals list), but neither do I think we should consider just getting rid of our memory cache altogether (as you advocated a bit farther down in comment 40), for this and for other reasons (content cached only for the benefit of view-source, etc).
Comment on attachment 574459 [details] [diff] [review]
Proposed fix

Rebinding from memory to disk won't work correctly if you append data to an existing entry, which is present only in memory. There will be only the new appended data in the disk entry (if it doesn't fail at all).

The same problem exists in the opposite direction (disk->memory) when not reading the whole content (the memory entry won't be complete, which is probably OK) or not reading from the beginning (again, the beginning is missing).

I think that we shouldn't rebind all entries from disk to memory. We should try to select only those entries that are small and will probably be used again in the future (js, small icons, etc...). E.g. nsHttpChannel could make this decision and set appropriate metadata.

Also, the terminology is confusing. Writing a copy to the disk cache is definitely not a rebinding, but rather a backing up or mirroring. mSecondaryCacheDevice can in fact be only a disk device, so the name is misleading.
Attachment #574459 - Flags: feedback?(michal.novotny)
Lots of good input here - thanks guys! I'll add some comments and a proposed strategy in the end...

(In reply to Michal Novotny (:michal) from comment #42)
> Anyway, to find a disk entry we need to
> read the nsDiskCacheEntry from the disk if the entry is inactive, so I would
> expect this to be much slower due to IO.

Not if the disk entry has been recently used. If we accept Brian's premises then the OS has cached such entries in memory already. You are absolutely correct if this entry has *not* been recently used, but then we wouldn't find it in our explicit mem-cache either...

(In reply to Nick Hurley [:hurley] from comment #43)
> Think of the
> filesystems that we know of where fsync causes the entire filesystem to be
> flushed to disk. If any process running on a machine with one of those
> filesystems calls fsync, our disk cache will be flushed, and its entries in
> the buffer cache will be marked as clean, and thus available for re-use by
> other data, effectively destroying any reliability in the OS-run "memory
> cache" for us. Similarly, if we happen to be running on a machine that has
> other processes doing a lot of IO, then the OS will be removing entries from
> the buffer cache more often, quite possibly to the detriment of our implicit
> memory cache.

I may be over-interpreting here but this sounds similar to comment #4, second paragraph. By relying on the OS we are completely at the mercy of other processes and the efficiency of the VMM. For Varnish this is fine, but the browser lives in a different environment and should deal with this differently.

OTOH, the architecture-document for Varnish (link in comment #2) points out that even if we do manage a memory-cache explicitly, the OS may have paged our memory out and we're back to disk-access anyway. Our process cannot be "blamed" for it but it's nevertheless access to disk.

(In reply to Michal Novotny (:michal) from comment #44)

Thanks for pointing out these possible bugs! The test must cover them...

> I think that we shouldn't rebind from disk to memory all entries. We should
> try to select only those entries that are small and will be probably used in
> the future again (js, small icons, etc...). E.g. this decision can make
> nsHttpChannel which will set appropriate metadata.

This is a very good idea! We might even avoid the issues you describe around writing and reading parts of entries by making sure to copy whole entries even if the client aborts (with small entries this would probably not hit us too hard).

> Also, the terminology is confusing. Writing a copy to the disk cache is
> definitely not a rebinding, but rather a backing up or mirroring.
> mSecondaryCacheDevice can in fact be only a disk device, so the name is
> misleading.

Yeah...  :]  I know.  The WIP patch is the result of several iterations on my end and terminology must be cleaned up in a final patch.

So, to move further with this we should IMO

1) Decide if we want to trust the OS and VMM enough to take charge of memory-buffering. I.e. do we want to manage caching in memory explicitly or not? (Not talking about non-persistent entries - these will be dealt with.) Note that the situation in this respect is likely to be different on mobile and desktop so total consensus may not be reached, but IMO we must make this decision anyway in order to be constructive in our next steps.

2) If we delegate to OS, come up with some ideas for how we can help it by re-organizing the disk-cache (see e.g. alternative approach in comment #36) and/or setting flags etc (see comment #2)

3) If we don't delegate, we either

  3.1) refine and bugfix (comment #44) the current approach, or
  3.2) try out a different approach like e.g. a write-back cache (see comment #12)

4) Alternatively, we drop this idea and come up with a completely new disk-cache solution. This could, however, be done in parallel with efforts in this area.

We should probably also establish a benchmark which has some effect of reusing cached entries, to measure the effect of different approaches. The existing cache-microbenchmarks are ok for checking that we don't degrade performance, but we should also see improved performance somehow. Tp4 does not seem to be right for this (see comment #31) but maybe the modified Tp4 referenced in comment #29 and comment #30 will do the job.

In any case, generally improving both mem-cache and disk-cache devices is useful (unless option 4 above is selected) and should be done independently of this.
(In reply to Bjarne (:bjarne) from comment #41)
> You're arguing that the OS does a really good job at caching recently used
> parts of the cache-file and that we wont see clear wins with an explicit
> approach because of this. Moreover, you argue that this implies we shouldn't
> bother to cache explicitly in memory. Right?

Not exactly. What I am saying is that we should identify where and why the disk cache is slow, and see if we can make it faster, first. Like I said, I think we're building a solution without knowing what the problem is.

> What do you think about the alternative approach from comment #36 as a way
> to utilize OS-buffering even more?

In order to compare different solutions, we have to understand the problem. I feel like I don't understand the problem.

To be clear, I am not saying that this work *shouldn't* be done. I am saying that we should prioritize this work against other work based on evidence of performance problems that we've gathered. And, hopefully, we won't have to expand our use of the memory cache at all, and we will be able to save memory.

(In reply to Michal Novotny (:michal) from comment #42)
> (In reply to Brian Smith (:bsmith) from comment #40)
> > How much faster is it? Is there anything we can do to make searching the
> > disk cache (nearly) as fast as searching the memory cache?
> 
> We don't have telemetry for this.

We don't need telemetry, necessarily. Even something simple, like "I instrumented the cache and on some test runs I found that 40% of the time was waiting for syscall XXX reading file YYY, 20% of the time was spent waiting for events in the event queue, 20% of the time was spent reading file ZZZ, and 10% of the time was spent blah blah" would be incredibly helpful.

> Anyway, to find a disk entry we need to read the nsDiskCacheEntry from the
> disk if the entry is inactive, so I would expect this to be much slower due
> to IO.

1. If we have a memory cache miss, we would have to access the disk cache anyway, and incur this cost.
2. At startup, we will always incur the cost of the disk cache because the memory cache will be empty.
3. There are a bunch of ways to pre-warm the disk cache (e.g. opening the files early and using fadvise, preloading the indexes into memory, etc.)
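
A minimal sketch of option 3, assuming a hypothetical list of cache block-file paths and a POSIX platform:

#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

// Open the cache files early and hint that they will be read soon, so the
// kernel can start readahead before the first real cache lookup.
void PrewarmCacheFiles(const std::vector<std::string>& paths) {
  for (const std::string& p : paths) {
    int fd = open(p.c_str(), O_RDONLY);
    if (fd < 0) continue;
#ifdef POSIX_FADV_WILLNEED
    posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);  // offset 0, len 0 = whole file
#endif
    close(fd);
  }
}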
(In reply to Nick Hurley [:hurley] from comment #43)
> If any process running on a machine with one of those
> filesystems calls fsync, our disk cache will be flushed, and its entries in
> the buffer cache will be marked as clean, and thus available for re-use by
> other data, effectively destroying any reliability in the OS-run "memory
> cache" for us.

Is this really true? I understand that the buffer cache pages will be marked clean, but won't they still be reused in LRU order? And, don't OS's implement anything that prevents writers from starving readers, especially from separate processes?

> Similarly, if we happen to be running on a machine that has
> other processes doing a lot of IO, then the OS will be removing entries from
> the buffer cache more often, quite possibly to the detriment of our implicit
> memory cache. Thus, even if the OS' buffer cache is "working correctly", we
> may end up taking a hit.

When Firefox Mobile is in the foreground, I bet it is pretty uncommon for a background app to be doing tons of I/O that would totally wipe out the buffer cache. For a variety of reasons, particularly battery usage, background applications on the phone really shouldn't be writing to disk unless absolutely necessary.

> but neither do I think we should consider just getting rid of
> our memory cache altogether (as you advocated a bit farther down in comment
> 40), for this and for other reasons (content cached only for the benefit of
> view-source, etc).

At least on mobile, I would rather disable the "view source" feature than spend memory on caching contents in memory for a feature almost nobody (*on a mobile phone*) will use.

On my phone, Firefox gets killed *way* too often, and that makes it unbearably slow to use. IMO, we should be trying really hard to use *fewer* resources, not more.
(In reply to Brian Smith (:bsmith) from comment #47)
> Is this really true? I understand that the buffer cache pages will be marked
> clean, but won't they still be reused in LRU order? And, don't OS's
> implement anything that prevents writers from starving readers, especially
> from separate processes?

I'm sure they'll be reused in some sort of a sane order, and I'm sure there's starving prevention (unless there's some wildly pathological FS/OS out there I don't know about). What I'm saying is, our idea of what we want kept in memory and the OS' idea of what we should have available to us in memory may (and likely are) different, and we can't necessarily rely on the OS to do what we want it to do, especially if the user happens to have a lot of other IO going on (compiles running, OS updates downloading, etc, etc).

> When Firefox Mobile is in the foreground, I bet it is pretty uncommon for a
> background app to be doing tons of I/O that would totally wipe out the
> buffer cache. For a variety of reasons, particularly battery usage,
> background applications on the phone really shouldn't be writing to disk
> unless absolutely necessary.

On mobile, sure (for the most part). I must've missed where you were arguing primarily for mobile in your comment that I originally replied to, in which case I pretty much agree with you (especially on the view-source use case... I don't even know how to view source on mobile!). I was taking your comments as applying globally to all firefoxen, mobile and desktop, sorry for any confusion on my part :)

It should also be noted that, while user-installed apps are unlikely to be doing much IO while in the background on mobile, that doesn't mean other apps aren't. There have already been issues with this on Samsung devices running their RFS being noticeably slower than they should be because of FS bottlenecks. (RFS fortunately seems to be abandoned now, although there are still plenty of devices out there running RFS, I don't believe there are any new models being produced with it.)

Having different interactions between the memory and disk caches for desktop vs. mobile is something I would at least like to look at when we start reconsidering the cache in Q1, but in the meantime, I think this work at least has the potential to be useful on desktop. Also, given that we aren't increasing the size of the memory cache for this feature, it has no more potential for excessive resource usage on mobile than we already have. Whether or not we use more resources in practice (i.e., have a filled memory cache when we didn't previously, causing us to request more memory from the OS) is another question, and we don't (currently) have the data to answer that.
> I bet it is pretty uncommon for a
> background app to be doing tons of I/O that would totally wipe out the
> buffer cache

It's not just other apps.  If we rely on the filesystem's memory cache, and we save anything large to the filesystem (certainly large cache files: possibly youtube vids, etc--I'm not sure if we save such things to a temp file or not) we'll blow away all the small things (CSS for youtube.com) that we could keep in a manually-controlled memory cache.   That said, this whole discussion is speculative, so data would be nice to have.

Re: mobile getting killed more often due to having a memory cache: we should look into whether we can evict the memory cache when we're notified of memory pressure (possibly using ashmem?  Not clear: see bug 413947 comment 6).
Comment on attachment 574459 [details] [diff] [review]
Proposed fix

Feedback given earlier, forgot to clear the flag
Attachment #574459 - Flags: feedback?(hurley)
-> default owner
Assignee: bjarne → nobody
Whiteboard: [Snappy:P3] → [Snappy:P3][necko-backlog]
Beaten by cache2.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX