Closed Bug 496593 Opened 15 years ago Closed 15 years ago

Image cache entry comparison backwards (evicts the entries we want to keep)

Categories

(Core :: Graphics: ImageLib, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: bzbarsky, Assigned: joe)

References

Details

(Whiteboard: [3.5RC2?])

Attachments

(1 file, 1 obsolete file)

See 457809 comment 114.  We need to fix this ASAP.  I wouldn't block on this, or even take it for 1.9.1.0 given the bake time we have available, but we should consider taking for 1.9.1.x.

And it would be good to add tests that verify the behavior here.
Flags: wanted1.9.1?
Flags: blocking1.9.2?
Flags: wanted1.9.1? → wanted1.9.1+
Attached patch JESUS CHRIST (obsolete) — Splinter Review
Ok, I tested this when I wrote the cache. Seriously.

Anyways, I *also* tested this change, and went over it in detail with shaver and jrmuizel. This is correct, and way, way clearer.

I am thinking about how to correctly test it.
Attachment #381862 - Flags: superreview?(bzbarsky)
Attachment #381862 - Flags: review?(vladimir)
Attachment #381862 - Flags: approval1.9.1?
Pretend the comment says the following:

// Returns true if we should prefer evicting cache entry |two| over cache         
// entry |one|.

Because I've convinced myself that that's the case.
Comment on attachment 381862 [details] [diff] [review]
JESUS CHRIST

It looks fine to me (and wow was the previous code hard to understand), as long as returning one < two to mean that "one is either bigger than two or was NOT touched more recently than two" is the desired semantic.
Comment on attachment 381862 [details] [diff] [review]
JESUS CHRIST

Please do fix the comment as indicated.

This code doesn't do what we want it to do, if I understood the goal correctly.  As written, this function will prefer to evict |two| if oneweight is small, right?  But oneweight's partial derivatives with respect to both touched time and size are positive.  So oneweight is smaller when the image is smaller and older.

Hence this function will prefer to evict two if one is smaller and older.  Or in other words it'll evict the bigger and newer images.

Note that in the old code the partial derivatives were correctly of opposite signs, and it was just the < 0 comparison that was wrong.

If we are in fact trying to evict large old images (which I think we d), and if a large total weight corresponds to being evicted (which it definitely does) then the partial derivative of weight wrt size needs to be positive, while that wrt last modification time needs to be negative.  So we either need to make sCacheTimeWeight negative (and set sizeweight to 1.0 + sCacheTimeWeight), or put a minus sign in front of the time component of total weight.  I think the latter would be clearer, myself.
Attachment #381862 - Flags: superreview?(bzbarsky)
Attachment #381862 - Flags: superreview-
Attachment #381862 - Flags: review?(vladimir)
Attachment #381862 - Flags: review+
> (which I think we d)

"do", not "d".  ;)
Testing ideas, which assume that you can change the cache size from script:

Test 1:  Have three images each of size N.  Set cache size to 1.5N.  Do new Image() and set its src to image1.  Wait for its onload to fire.  Do new Image() and set its src to image2.  Wait for its onload to fire.  Drop the ref to image1.  GC.  Drop the ref to image 2.  GC.  Do new Image() and set its src to image3.  Wait for its onload to fire.  Set up an HTTP on-modify-request observer.  Do new Image() and set its src to image1.  Wait for its onload to fire.  Assert that we saw the on-modify-request for image1 (because it was older than image2, and should have been evicted from the cache when image3 loaded, if I read the code right).

Test 2: As test 1, but point that last image to image2, not image1.  Assert thatwe do not see the on-modify-request callback.

Similar tests can be devised for the size thing, especially if you make sure the size disparity is big enough.  Given that we weight size and time equally, and that time is going to be almost equal for any given pair of images (since it's just a PRTime, right?), a factor of 2 difference is size should be plenty.

Which makes me think we should be weighting size less and time more, possibly, but that can be a separate bug.
(In reply to comment #4)
> (From update of attachment 381862 [details] [diff] [review])
> If we are in fact trying to evict large old images (which I think we d), and if
> a large total weight corresponds to being evicted (which it definitely does)
> then the partial derivative of weight wrt size needs to be positive, while that
> wrt last modification time needs to be negative.  So we either need to make
> sCacheTimeWeight negative (and set sizeweight to 1.0 + sCacheTimeWeight), or
> put a minus sign in front of the time component of total weight.  I think the
> latter would be clearer, myself.

RIGHT. I knew there was a reason the previous code calculated timeweight "backwards" with regard to sizeweight.

(In reply to comment #6)
> Which makes me think we should be weighting size less and time more, possibly,
> but that can be a separate bug.

I have a blog post in my todo list that details the method I used (simulating the caches) to determine that this weighting was best. (Mostly a flickr browse, which maybe weights size too highly, though.)
Attached patch omg, etc, part 2Splinter Review
This patch also adds some explanatory comments, because omg, was this code ever opaque.
Attachment #381862 - Attachment is obsolete: true
Attachment #382167 - Flags: superreview?(bzbarsky)
Attachment #382167 - Flags: review?(vladimir)
Attachment #381862 - Flags: review?(vladimir)
Attachment #381862 - Flags: approval1.9.1?
Attachment #382167 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 382167 [details] [diff] [review]
omg, etc, part 2

This code made my head hurt, but slightly less so than previous versions.
fwiw, try server claims a slight Tp rise on Linux/Mac; a wash or a bit of a win on Windows depending on the machine....  I'm not sure how much to trust it.
With small deltas like that, I think we're safe to land on 1.9.2 and see what Talos on our real perf machines says.
Attachment #382167 - Flags: approval1.9.1?
(In reply to comment #10)
> fwiw, try server claims a slight Tp rise on Linux/Mac; a wash or a bit of a win
> on Windows depending on the machine....  I'm not sure how much to trust it.

Keep in mind that talos cycles 10 times over the pageset, not clearing caches between each cycle. Keeping old and large images in this setting may not be too different from keeping new and small images. However, this fix should improve a normal user-experience.
Just for the heck of it I decided to measure how much time is spent in the cleanup-loop in imgLoader::CheckCacheLimits(). The first observation is that this method is called a large number of times, typically removing one image from the cache in each call. This is obviously because the cleanup-loop essentially iterates until current-cache-size < max-cache-size, and removing just one image will often make this expression true.

I then decided to let the cleanup-loop iterate until current-cache-size is below 75% of max-cache-size. Suddenly, average time spent in imgLoader::CheckCacheLimits() per loaded page dropped from 1.4ms to 0.4ms. Funny enough, this did not have any effect on overall talos performance (no improvement, no degrading). I suspect this is due to a similar effect as mentioned in comment above : Since talos cycle over all pages several times, we are guaranteed to benefit from keeping old images around as long as possible.

However, for normal browsing it might make sense to clean up the image-cache a bit more aggressively. In fact, if we want to be really fancy, we could let imgLoader::CheckCacheLimits() evict entries from cache until current-cache-size is below some limit, then post an event to let a separate (low-priority?) thread carefully evict entries until current-cache-size is below some lower limit (yes, I know it would require thread-safety in certain parts of imglib).
http://hg.mozilla.org/mozilla-central/rev/de5bca83e682
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Could you write something about risk/reward here, in case we do an RC2.
Depends on: 498692
OS: Mac OS X → All
Hardware: x86 → All
Flags: wanted1.9.1.x?
Flags: blocking1.9.1.1?
Any word on the Tp regression? I'm hesitant to block 1.9.1.1 on this bug because of it.
Flags: wanted1.9.1.x? → wanted1.9.1.x+
There were no clear conclusions when I re-reversed it and tried it on the Try server, no.
blocking1.9.1: --- → needed
Flags: blocking1.9.1.1? → blocking1.9.1.1-
Attachment #382167 - Flags: approval1.9.1? → approval1.9.1.1?
Comment on attachment 382167 [details] [diff] [review]
omg, etc, part 2

After talking with Joe, I don't think we really need to take this on the branch. He said that user benefit would only be felt in a pathalogical case, and that if we wanted to minimize risk, we should just wait until 3.6.

If any of the past reviewers feel like this is a wrong decision, please renominate for approval!
Attachment #382167 - Flags: approval1.9.1.1? → approval1.9.1.1-
blocking1.9.1: needed → ---
Flags: wanted1.9.1.x+
Why only in pathalogical cases? It seems like caching of images is completely disabled by this. I would have thought that caching is pretty important?
I mean that users would only notice in pathological cases. Image caching is reasonably important, but we never throw away images that are currently in use, so the common use case - opening lots of tabs to a single site - won't see any improvement from this patch. It's only once you've navigated away from a site, then go back to it, that you see any benefit from this.
I don't know why we think that many-tabs-from-same-site is the common case (IIRC average tab count is < 5, so...).  Image caching matters a lot in page-load cases for real users, from everything I've heard, which is probably why bz was so earnest in comment 0.

We need tests for this, too.  Would have caught the problem before first checkin, probably! :)
Flags: in-testsuite?
Hm, I'd have to run a test to see if loading a page from the same site in the same tab has any guarantees about whether those images stay in the cache.

That depends a lot on how content tears itself down and builds itself up; if there is a new reference to the image before the old one is released, then it's not possible for the image caching behaviour to have an impact on this. (I'm also not sure whether bfcache holds on to images.) Fodder for an image-caching test suite.
bfcache holds on to images.
What about going from page to page on the same site using a single tab? Then the previous page will not be pinning the images in the cache. Or going to another site and then clicking the back button?

On the flip side, how big are the risks here?
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.