LRU-SP not fully implemented in memory cache: small entries avoid eviction

RESOLVED FIXED in mozilla10



Networking: Cache


(Reporter: gbrown, Assigned: gbrown)







(1 attachment, 4 obsolete attachments)

The memory cache implements most features of the LRU-SP cache algorithm, but diverges in one significant way. The Cheng/Kambayashi paper states that the choice of eviction entry is determined by considering the set of entries at the heads of all eviction lists, and choosing the one with the greatest cost (cost = time-since-last-access x size / nref). The memory cache instead simply takes entries in the order of the eviction lists (which are segmented roughly by size / nref), without comparing costs across lists.

A consequence of the memory cache implementation is that small items tend to avoid eviction, even if they are unused for a long time. This behavior was documented in bug 397275, but the departure from the LRU-SP algorithm was not noted at the time. Indeed, this detail is hardly mentioned in the Cheng paper and is easy to miss -- I believe it was simply overlooked in the original implementation.
Created attachment 537195 [details] [diff] [review]
Patch to correct deviation from LRU-SP

I have tested this patch in some ad-hoc scenarios only -- it seems to fix the problem with small items not being evicted.
Assignee: nobody → gbrown

Comment 2

line 394: it's not necessary to recalculate maxCost, as it's identical to entryCost.

I would just rewrite the inner part of the loop as:

        if ((entry != &mEvictionList[i]) &&
            (!entry->IsInUse())) {
            entryCost = (PRUint64)
                (now - entry->LastFetched()) * entry->Size() /
                PR_MAX(1, entry->FetchCount());
            if (!maxEntry || (entryCost > maxCost)) {
                maxEntry = entry;
                maxCost = entryCost;
            }
        }
This function will be called a lot (for every item added to the cache), and it calculates now every time, which can be slow. Maybe you should keep the original check for mTotalSize and mInactiveSize at the start of the function to avoid that? And then change the while-loop into a do-while.
Created attachment 540636 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Optimized as suggested - thanks Jo!
Attachment #537195 - Attachment is obsolete: true
Created attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP
Attachment #540636 - Attachment is obsolete: true

Comment 5

geoff - who is the right person to review this?
(In reply to Doug Turner (:dougt) from comment #5)
> geoff - who is the right person to review this?

Michal or Bjarne. I have been holding off on a review request while the team sorts out how to test the effectiveness of cache algorithm changes.
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Let's get feedback at least, and if it looks good, +r.

Bjarne/Michal: I assigned to Michal, but either of you could take.
Attachment #540639 - Flags: feedback?(michal.novotny)
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Heard on dev.platform:

"what I did see though, is that without the patch in bug #661900, the cache ends up just filled with tiny ad-related content and is practically useless."

So let's bump this in priority and see if we can land this soon.
Attachment #540639 - Flags: feedback?(michal.novotny) → review?(michal.novotny)
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

> +     // LRU-SP eviction selection: Check the head of each segment (each
> +     // eviction list, kept in LRU order) and select the maximal-cost
> +     // entry for eviction. Cost is time-since-accessed * size / nref.

Shouldn't we check the first unused entry? If the first entry is in use we won't evict any other entry from that list.
Attachment #540639 - Flags: review?(michal.novotny)
Created attachment 558560 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Yes, I agree: It is better to check the first not-in-use entry in each queue, rather than skip the queue entirely if the head entry is in use. Patch updated.
Attachment #540639 - Attachment is obsolete: true
Attachment #558560 - Flags: review?(michal.novotny)
Attachment #558560 - Flags: review?(michal.novotny) → review+
Any objections to landing this now?
I assume you went through the oranges on the try build and verified they're unrelated?  If so, you've got r+, so it should be OK to land.  Maybe wait until Michal is awake again to make sure...
I went through the oranges and I think that they all are unrelated to this change.
Me too.
Keywords: checkin-needed


Keywords: checkin-needed
Whiteboard: [inbound]
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla9
Geoff (or anyone using fennec regularly): please verify this gets rid of the "lots of small stale ad objects" issue and mark VERIFIED if so.
It looks good to me: I tested on Android, with light browsing over a period of about 8 hours. At the end of the 8 hours, my 1M memory cache had about 100 entries (average 10K/entry), with the largest entry about 80K, and many of the entries corresponding to recent activity.
This _might_ have regressed Mobile Tp4 <>, so I have temporarily backed it out: <>.

If the backout doesn't change the numbers for this test, I will reland it either today or tomorrow.
Resolution: FIXED → ---
The backout was merged to mozilla-central:

It seems to have resolved the regression.
Target Milestone: mozilla9 → ---

Comment 21

The possibility exists that the new algorithm deletes, let's say, 100 small files where the old one evicted 1 large file to free the same number of bytes, due to the different nature of the selection (see the do-while loop). That might explain the regression.
(In reply to Jo Hermans from comment #21)

Absolutely. I'll try to collect some data like number of evictions and hit rate for the Talos run with/without this patch...probably next week.
Patrick, do you have a bug open for the cache telemetry patch you wrote yesterday?

Geoff:  I recommend that you build on that patch and add eviction and overall cache hit rate to the telemetry stats, rather than doing a one-off instrumentation just for this bug (should be just as easy).  If you run into questions, ask.  We could really use those stats in general, so let's do it now.
(In reply to Jason Duell (:jduell) from comment #23)
> Patrick, do you have a bug open for the cache telemetry patch you wrote
> yesterday?

nick hurley owns that action item.
It's bug 686948. We also have plans to do telemetry for number of evictions (and dooms). I can make that the next one I work on, should be simple enough.
The regression (comment 19) is in remote-tp4m_nochrome. This test loads 21 pages, 2 times (1, 2, 3, ... 21, 1, 2, 3, ... 21). Each page comes from a test webserver, set up with copies of very popular sites (google, yandex, baidu, etc). The time to load each page is measured and reported in logs, but the overall metric for the test is the "average median" time: the average, excluding the largest value, of the per-page median page load times, themselves computed excluding each page's largest value. Since there are only 2 measurements per page, and the largest measurement is discarded, the median page load time for a page corresponds to the minimum page load time for that page. In the vast majority of cases, a page's first load time is greater than its second load time. So the tp4m time is, generally, the average of the page load times for the second cycle (when caching may be applicable).

I examined logs from try server tp4m-nochrome runs to investigate the effect of this patch on memory cache hit rate and evictions. There is some variability between test runs, but results like this are typical:

            Hits       Misses     Evictions  tp4m time
original    324        516        112        547.5
with patch  241        597        222        585.9

==> The patch to correct LRU-SP decreases the number of Necko memory cache hits, increases the number of evictions, and increases the overall test time, for tp4m-remote.
I think these results are predictable (if not predicted!) and not necessarily anything to be concerned about. The tp4m page set is approx 8 MB, while the mobile memory cache is only 1 MB. If the memory cache were strict LRU, we would expect all tp4m pages to be evicted long before being revisited, resulting in no hits. LRU-SP (either implementation) improves on LRU in this test by evicting larger entries first: Some smaller entries are kept in the cache long enough to result in hits when the second test cycle comes around. The original LRU-SP implementation can be viewed as holding on to those smaller entries more aggressively than my patch, resulting in more hits and better performance -- during this type of test. With my patch, each entry's last access time is taken into account also, so some smaller entries are evicted while larger entries that have been recently loaded are kept in the cache, resulting in fewer hits in this cyclical test. 

In "real life" / "typical" browsing, not every resource that is loaded once will be loaded again: keeping entries in the cache that have not been accessed for an extended period impairs our ability to store recently accessed content, which ought to be more relevant.

On the other hand, I am reluctant to check in this patch when our only test results are negative! Feedback welcome...
Do we have any idea how much cache reuse there is within the 8 MB?  If there's none,
then yes, regular LRU would roll over dead, and the fact that your patch has slightly less good perf than our previous LRU-SP is overcompensated-for by the fact that it avoids what comment 8 describes, i.e. the cache filled with never-evicted small files.

[A cache that gets clogged up with small obsolete items will indeed do better on a very large repeated dataset than any LRU-ish algorithm, but that's like saying my broken watch has the correct time twice a day: it's a spurious performance difference.]
(In reply to Jason Duell (:jduell) from comment #28)
> Do we have any idea how much cache reuse there is within the 8 MB?  

There is very little cache reuse: In a single cycle of the tp4m test with evictions disabled there are 434 cache requests total, and only 28 of these are duplicates (14 requests are each made 2x).
Ok, so the tp4m test doesn't seem too worrying.  

Do we have any benchmark anywhere that has a cache access pattern that's more realistic (more cache hits, and with hits skewed towards being closer to the original use)?   I don't know of any.  Did your patch do better on any of the benchmarks?  Like you said, it would be nice to have some positive test results, too.   Given how hard it is to write a good caching benchmark, I'd even be happy if you just browsed around on a mobile device for a while with/without the patch and report back on the difference in hit rate.  (Can I get you to write the telemetry patch for RAM cache hit rate? bug 687085 :)
(In reply to Jason Duell (:jduell) from comment #30)
> Do we have any benchmark anywhere that has a cache access pattern that's
> more realistic (more cache hits, and with hits skewed towards being closer
> to the original use)?   I don't know of any.  

I don't know of any either. But it is fairly easy to manipulate the tp4m test to show the benefit of this patch:
 - change the -tpcycles from 2 to 1, so that every page in the manifest is loaded just once (unless it is entered in the manifest more than once)
 - change the manifest such that some pages are loaded twice or more early in the test, then different pages are loaded twice or more later in the test

Selecting some duplicate pages at random, I got these results:

            Hits       Misses     Evictions  tp4m time
original    97         502        75         1129
with patch  116        474        115        1047

The original cache implementation does not adapt well to the changing access pattern: It makes fewer evictions than the new implementation, but keeps the wrong ones (presumably the small entries that were stored early in the test) resulting in more misses.
Nice work--I'm sold.  The tp regression is bogus.

Let's re-land--put something in your checkin msg like "TP4 REGRESSION OK: see bug" :)
Created attachment 563872 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Updated patch comment only. Carrying forward r+ from michal.novotny.
Attachment #558560 - Attachment is obsolete: true
Attachment #563872 - Flags: review+
Keywords: checkin-needed
Last Resolved: 6 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla10