Bug 661900 (Closed)
Opened 13 years ago • Closed 13 years ago
LRU-SP not fully implemented in memory cache: small entries avoid eviction
Categories: Core :: Networking: Cache (defect)
Status: RESOLVED FIXED
Target Milestone: mozilla10
People
(Reporter: gbrown, Assigned: gbrown)
Details
(Whiteboard: Note: TP4M REGRESSION EXPECTED / OK)
Attachments (1 file, 4 obsolete files)
patch, 3.72 KB (gbrown: review+)
The memory cache implements most features of the LRU-SP cache algorithm, but diverges in one significant way. The Cheng/Kambayashi paper states that the choice of eviction entry is determined by considering the set of entries at the heads of all eviction lists, and choosing the one with the greatest cost (cost=time-since-last-access x size / nref). The memory cache simply takes entries in the order of the eviction lists (roughly size / nref). A consequence of the memory cache implementation is that small items tend to avoid eviction, even if they are unused for a long time. This behavior was documented in bug 397275, but the departure from the LRU-SP algorithm was not noted at the time. Indeed, this detail is hardly mentioned in the Cheng paper and is easy to miss -- I believe it was simply overlooked in the original implementation.
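As a concrete illustration of the cost function described above, here is a minimal sketch. The Entry struct and names are hypothetical for illustration, not the actual Necko nsCacheEntry API:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical cache-entry record; the field names are illustrative
// and do not match the actual Necko nsCacheEntry interface.
struct Entry {
    uint32_t lastFetched;  // time of last access (seconds)
    uint32_t size;         // stored size in bytes
    uint32_t fetchCount;   // nref: how many times the entry was fetched
};

// LRU-SP cost = time-since-last-access * size / nref.
// The entry with the greatest cost is the eviction candidate.
uint64_t EvictionCost(const Entry& e, uint32_t now) {
    return static_cast<uint64_t>(now - e.lastFetched) * e.size
           / std::max<uint32_t>(1, e.fetchCount);
}
```

Under this cost, a small entry that has gone untouched for a long time eventually outranks a large, recently used one, which is exactly the property lost by walking the lists in size/nref order only.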
Assignee
Comment 1 • 13 years ago
I have tested this patch in some ad-hoc scenarios only -- it seems to fix the problem with small items not being evicted.
Assignee
Updated • 13 years ago
Assignee: nobody → gbrown
Comment 2 • 13 years ago
Line 394: it's not necessary to recalculate maxCost, as it's identical to entryCost. I would just rewrite the inner part of the loop as:

    if ((entry != &mEvictionList[i]) && (!entry->IsInUse())) {
        entryCost = (PRUint64)
            (now - entry->LastFetched()) * entry->Size() /
            PR_MAX(1, entry->FetchCount());
        if (!maxEntry || (entryCost > maxCost)) {
            maxEntry = entry;
            maxCost = entryCost;
        }
    }

This function will be called a lot (for every item added to the cache), and it recalculates "now" every time, which can be slow. Maybe you need to keep the original check for mTotalSize and mInactiveSize at the start of the function to avoid that? And then change the while-loop into a do-while.
Assignee
Comment 3 • 13 years ago
Optimized as suggested - thanks Jo!
Attachment #537195 - Attachment is obsolete: true
Assignee
Comment 4 • 13 years ago
Attachment #540636 - Attachment is obsolete: true
Comment 5 • 13 years ago
geoff - who is the right person to review this?
Assignee
Comment 6 • 13 years ago
(In reply to Doug Turner (:dougt) from comment #5)
> geoff - who is the right person to review this?

Michal or Bjarne. I have been holding off on a review request while the team sorts out how to test the effectiveness of cache algorithm changes.
Comment 7 • 13 years ago
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Let's get feedback at least, and if it looks good, r+. Bjarne/Michal: I assigned to Michal, but either of you could take it.
Attachment #540639 - Flags: feedback?(michal.novotny)
Comment 8 • 13 years ago
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

Heard on dev.platform: "what I did see though, is that without the patch in bug #661900, the cache ends up just filled with tiny ad-related content and is practically useless." So let's bump this in priority and see if we can land this soon.
Attachment #540639 - Flags: feedback?(michal.novotny) → review?(michal.novotny)
Comment 9 • 13 years ago
Comment on attachment 540639 [details] [diff] [review]
Patch to correct deviation from LRU-SP

> + // LRU-SP eviction selection: Check the head of each segment (each
> + // eviction list, kept in LRU order) and select the maximal-cost
> + // entry for eviction. Cost is time-since-accessed * size / nref.

Shouldn't we check the first unused entry? If the first entry is in use, we won't evict any other entry from that list.
Attachment #540639 - Flags: review?(michal.novotny)
Assignee
Comment 10 • 13 years ago
Yes, I agree: It is better to check the first not-in-use entry in each queue, rather than skip the queue entirely if the head entry is in use. Patch updated.
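A minimal sketch of the selection agreed on here, assuming hypothetical types (this is not the actual nsMemoryCacheDevice code): each eviction list is walked in LRU order, the first entry not in use is that list's candidate, and the maximal-cost candidate across all lists wins.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical entry type for illustration only.
struct Entry {
    uint32_t lastFetched;
    uint32_t size;
    uint32_t fetchCount;
    bool     inUse;
};

// For each eviction list (kept in LRU order), skip entries that are in
// use, take the first unused entry as that list's candidate, and return
// the maximal-cost candidate overall (or nullptr if none is evictable).
const Entry* SelectEvictionCandidate(
        const std::vector<std::vector<Entry>>& evictionLists,
        uint32_t now) {
    const Entry* maxEntry = nullptr;
    uint64_t maxCost = 0;
    for (const auto& list : evictionLists) {
        for (const auto& e : list) {
            if (e.inUse)
                continue;  // in use: keep scanning this list
            uint64_t cost = static_cast<uint64_t>(now - e.lastFetched)
                            * e.size / std::max<uint32_t>(1, e.fetchCount);
            if (!maxEntry || cost > maxCost) {
                maxEntry = &e;
                maxCost = cost;
            }
            break;  // only the first unused entry per list competes
        }
    }
    return maxEntry;
}
```

Skipping in-use entries (rather than giving up on the whole list when its head is busy) is exactly the refinement comment 9 asked for.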
Attachment #540639 - Attachment is obsolete: true
Attachment #558560 - Flags: review?(michal.novotny)
Updated • 13 years ago
Attachment #558560 - Flags: review?(michal.novotny) → review+
Assignee
Comment 11 • 13 years ago
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=cd08c8a97175
Assignee
Comment 12 • 13 years ago
Any objections to landing this now?
Comment 13 • 13 years ago
I assume you went through the oranges on the try build and verified they're unrelated? If so, you've got r+, so it should be OK to land. Maybe wait until Michal is awake again to make sure...
Comment 14 • 13 years ago
I went through the oranges and I think they are all unrelated to this change.
Updated • 13 years ago
Keywords: checkin-needed
Whiteboard: [inbound]
Comment 16 • 13 years ago
http://hg.mozilla.org/mozilla-central/rev/d469840a370f
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla9
Comment 17 • 13 years ago
Geoff (or anyone using fennec regularly): please verify this gets rid of the "lots of small stale ad objects" issue and mark VERIFIED if so.
Assignee
Comment 18 • 13 years ago
It looks good to me: I tested on Android, with light browsing over a period of about 8 hours. At the end of the 8 hours, my 1M memory cache had about 100 entries (average 10K/entry), with the largest entry about 80K, and many of the entries corresponding to recent activity.
Status: RESOLVED → VERIFIED
Comment 19 • 13 years ago
This _might_ have regressed Mobile Tp4 <http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/c1767648781e4425#>, so I have temporarily backed it out: <https://hg.mozilla.org/integration/mozilla-inbound/rev/7e132eee5650>. If the backout doesn't change the numbers for this test, I will reland it either today or tomorrow.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 20 • 13 years ago
The backout was merged to mozilla-central: https://hg.mozilla.org/mozilla-central/rev/7e132eee5650
It seems to have resolved the regression.
Target Milestone: mozilla9 → ---
Comment 21 • 13 years ago
The possibility exists that the new algorithm evicts, say, 100 small files where the old one took 1 large file to free the same number of bytes, due to the different nature of the selection (see the do-while loop). That might explain the regression.
Assignee
Comment 22 • 13 years ago
(In reply to Jo Hermans from comment #21)

Absolutely. I'll try to collect some data, like number of evictions and hit rate, for the Talos run with/without this patch... probably next week.
Comment 23 • 13 years ago
Patrick, do you have a bug open for the cache telemetry patch you wrote yesterday?

Geoff: I recommend that you build on that patch and add eviction and overall cache hit rate to the telemetry stats, rather than doing a one-off instrumentation just for this bug (should be just as easy). If you run into questions, ask. We could really use those stats in general, so let's do it now.
Comment 24 • 13 years ago
(In reply to Jason Duell (:jduell) from comment #23)
> Patrick, do you have a bug open for the cache telemetry patch you wrote
> yesterday?

nick hurley owns that action item.
Comment 25 • 13 years ago
It's bug 686948. We also have plans to do telemetry for number of evictions (and dooms). I can make that the next one I work on, should be simple enough.
Assignee
Comment 26 • 13 years ago
The regression (comment 19) is in remote-tp4m_nochrome. This test loads 21 pages, 2 times (1, 2, 3, ... 21, 1, 2, 3, ... 21). Each page comes from a test webserver, set up with copies of very popular sites (google, yandex, baidu, etc). The time to load each page is measured and reported in logs, but the overall metric for the test is the "average median" time: the average, excluding the largest value, of the median page load times, excluding the largest value, for each page. Since there are only 2 measurements per page, and the largest measurement is discarded, the median page load time for a page corresponds to the minimum page load time for that page. In the vast majority of cases, a page's first load time is greater than its second load time. So the tp4m time is, generally, the average of the page load times for the second cycle (when caching may be applicable).

I examined logs from try server tp4m-nochrome runs to investigate the effect of this patch on memory cache hit rate and evictions. There is some variability between test runs, but results like this are typical:

                 Hits    Misses    Evictions    tp4m time
    original      324       516          112        547.5
    with patch    241       597          222        585.9

==> The patch to correct LRU-SP decreases the number of Necko memory cache hits, increases the number of evictions, and increases the overall test time, for tp4m-remote.
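For clarity, the "average median" score described above can be sketched for the two-loads-per-page case. The function name and structure here are mine, not the actual Talos code:

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Sketch of the tp4m "average median" for two load times per page:
// the per-page median (largest value excluded) reduces to the minimum
// of the two loads, and the overall score is the mean of the per-page
// minima with the single largest minimum dropped.
double Tp4mAverageMedian(const std::vector<std::pair<double, double>>& loads) {
    std::vector<double> medians;
    medians.reserve(loads.size());
    for (const auto& page : loads)
        medians.push_back(std::min(page.first, page.second));
    // Exclude the largest per-page median, then average the rest.
    double largest = *std::max_element(medians.begin(), medians.end());
    double sum = std::accumulate(medians.begin(), medians.end(), 0.0) - largest;
    return sum / static_cast<double>(medians.size() - 1);
}
```

Since the second load of a page is almost always the faster one, this score is effectively the mean second-cycle load time, which is why cache behavior dominates the metric.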
Assignee
Comment 27 • 13 years ago
I think these results are predictable (if not predicted!) and not necessarily anything to be concerned about. The tp4m page set is approx 8 MB, while the mobile memory cache is only 1 MB. If the memory cache were strict LRU, we would expect all tp4m pages to be evicted long before being revisited, resulting in no hits. LRU-SP (either implementation) improves on LRU in this test by evicting larger entries first: some smaller entries are kept in the cache long enough to result in hits when the second test cycle comes around.

The original LRU-SP implementation can be viewed as holding on to those smaller entries more aggressively than my patch, resulting in more hits and better performance during this type of test. With my patch, each entry's last access time is taken into account also, so some smaller entries are evicted while larger entries that have been recently loaded are kept in the cache, resulting in fewer hits in this cyclical test.

In "real life" / "typical" browsing, not every resource that is loaded once will be loaded again: keeping entries in the cache that have not been accessed for an extended period impairs our ability to store recently accessed content, which ought to be more relevant. On the other hand, I am reluctant to check in this patch when our only test results are negative! Feedback welcome...
Comment 28 • 13 years ago
Do we have any idea how much cache reuse there is within the 8 MB? If there's none, then yes, regular LRU would roll over dead, and the fact that your patch has slightly less good perf than our previous LRU-SP is overcompensated for by the fact that it avoids what comment 8 describes, i.e. the cache filled with never-evicted small files. [A cache that gets clogged up with small obsolete items will indeed do better on a very large repeated dataset than any LRU-ish algorithm, but that's like saying my broken watch has the correct time twice a day: it's a spurious performance difference.]
Assignee
Comment 29 • 13 years ago
(In reply to Jason Duell (:jduell) from comment #28)
> Do we have any idea how much cache reuse there is within the 8 MB?

There is very little cache reuse: in a single cycle of the tp4m test with evictions disabled there are 434 cache requests total, and only 28 of these are duplicates (14 requests are each made 2x).
Comment 30 • 13 years ago
Ok, so the tp4m test doesn't seem too worrying. Do we have any benchmark anywhere that has a cache access pattern that's more realistic (more cache hits, and with hits skewed towards being closer to the original use)? I don't know of any. Did your patch do better on any of the benchmarks? Like you said, it would be nice to have some positive test results, too.

Given how hard it is to write a good caching benchmark, I'd even be happy if you just browsed around on a mobile device for a while with/without the patch and report back on the difference in hit rate. (Can I get you to write the telemetry patch for RAM cache hit rate? bug 687085 :)
Assignee
Comment 31 • 13 years ago
(In reply to Jason Duell (:jduell) from comment #30)
> Do we have any benchmark anywhere that has a cache access pattern that's
> more realistic (more cache hits, and with hits skewed towards being closer
> to the original use)? I don't know of any.

I don't know of any either. But it is fairly easy to manipulate the tp4m test to show the benefit of this patch:
- change the -tpcycles from 2 to 1, so that every page in the manifest is loaded just once (unless it is entered in the manifest more than once)
- change the manifest such that some pages are loaded twice or more early in the test, then different pages are loaded twice or more later in the test

Selecting some duplicate pages at random, I got these results:

                 Hits    Misses    Evictions    tp4m time
    original       97       502           75         1129
    with patch    116       474          115         1047

The original cache implementation does not adapt well to the changing access pattern: it makes fewer evictions than the new implementation, but keeps the wrong ones (presumably the small entries that were stored early in the test), resulting in more misses.
Comment 32 • 13 years ago
Nice work--I'm sold. The tp regression is bogus. Let's re-land--put something in your checkin msg like "TP4 REGRESSION OK: see bug" :)
Assignee
Comment 33 • 13 years ago
Updated patch comment only. Carrying forward r+ from michal.novotny.
Attachment #558560 - Attachment is obsolete: true
Attachment #563872 - Flags: review+
Assignee
Updated • 13 years ago
Keywords: checkin-needed
Whiteboard: Note: TP4M REGRESSION EXPECTED / OK
Comment 34 • 13 years ago
https://hg.mozilla.org/mozilla-central/rev/5bfd700622c7
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla10