Closed Bug 1122446 Opened 5 years ago Closed 5 years ago

Tune the SurfaceCache for downscale-during-decode

Categories

(Core :: ImageLib, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox37 --- fixed
firefox38 --- fixed

People

(Reporter: seth, Assigned: seth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

There's a fairly serious issue with the way the SurfaceCache works right, when paired with downscale-during-decode. I think it's probably easier to explain via an example timeline of what could happen when the user views an image document that's too large to fit on their screen at full size.

1. User navigates to http://foo.com/bar.jpg. Intrinsic size: 1000x1000.
2. We decode and display bar.jpg at 500x500 because anything larger won't fit on their screen.
3. The user clicks on the image to zoom in.
4. We decode and display bar.jpg at 1000x1000 and let the user scroll as they see fit.
5. The user clicks again to shrink the image back to 500x500. We display the 500x500 surface we've previously decoded.
6. The user leaves this window open on their desktop and goes about their day.

The problem here is that throughout all these steps, bar.jpg is locked, because it's visible. That means that *all surfaces associated with it* are locked, and cannot be discarded. Now, the user is only looking at the 500x500 surface after step 5, so they should only be using 1MB of memory for this image. Unfortunately, they previously looked at the 1000x1000 surface, so we decoded it, using an additional 4MB. Even though they zoomed back out and will only display the 500x500 version for the rest of the day, we can't discard the 1000x100 version, because the image is locked. So the user ends up using 5MB of memory to display a 1MB image. That's worse than the pre-downscale-during-decode situation, where they would've used 4MB!

With a little tuning, we can fix this, and ensure that the user only needs 1MB of memory to display that image.
So there's a pretty simple solution here. Before, when we called
SurfaceCache::LockImage(), we locked all of the image's persistent surfaces, and
when we called SurfaceCache::UnlockImage(), we unlocked all of the image's
persistent surfaces. Now, we make LockImage() *not* lock the existing surfaces.
Instead, the new behavior will be that to lock a surface, you need to either
insert it or access it (via Lookup() or LookupBestMatch()) while the image is
locked.

This enables the following strategy which you can think of as kind of like a
tracing GC, if you enjoy that kind of metaphor:

1. We decide to decode an image to a new size.

2. We unlock all of its existing surfaces.

3. We decode the image. All of the surfaces we generate during decoding will be
   inserted into the SurfaceCache, so if the image is locked, those new surfaces
   will be too.

4. When we're done decoding, we send out an invalidation rect. (We actually
   already do this!) This causes all of the visible instances of the image to
   redraw themselves.

5. Surfaces which are used by some visible instance of the image will get
   accessed, so if the image is locked, they will be too.

6. Other surfaces stay unlocked and will eventually expire if something doesn't
   access them soon.

That's exactly what this patch implements.

One more minor point: the logic is actually a bit subtle to unlock the existing
surfaces by calling UnlockImage() followed by LockImage(); if the image is
already unlocked, UnlockImage() does nothing and then LockImage() *locks it*,
which we definitely don't want. Calling both functions also requires taking
SurfaceCache's lock twice, which is unnecessary. To avoid both of those issues,
I've added a new function, SurfaceCache::UnlockSurfaces(), which does the right
thing no matter what.

BTW, I've fixed two minor mistakes in comments in SurfaceCache.h, even though
they have nothing to do with this change. They didn't seem to warrant their own
patch.
Attachment #8550195 - Flags: review?(dholbert)
BTW, I think it's worth noting that I'd eventually like to change the terminology in SurfaceCache to describe this behavior better. I'm thinking of replacing the Lifetime concept ("Transient" and "Persistent" surfaces) with "Active" and "Inactive" surfaces, similar to what we use in the layer system.

I considered making the switch as part of this patch but I think it's better to wait until we're totally switched over to downscale-during-decode, which will remove the need for transient surfaces totally (they're only used for HQ scaling, which downscale-during-decode will replace) and let me keep the new terminology conceptually cleaner.
Depends on: 1119774
Version: unspecified → Trunk
Comment on attachment 8550195 [details] [diff] [review]
Tune SurfaceCache for downscale-during-decode

Review of attachment 8550195 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments below addressed as you see fit.


> Bug 1122446 - Tune SurfaceCache for downscale-during-decode. r=dholbert

This commit message is a bit vague, and doesn't really get at what's changing here. Maybe replace with something like this:
 "Give cached surfaces a chance to expire, when starting a downscale-during-decode"?

::: image/src/SurfaceCache.cpp
@@ +624,5 @@
> +  void UnlockSurfaces(const ImageKey aImageKey)
> +  {
> +    nsRefPtr<ImageSurfaceCache> cache = GetImageCache(aImageKey);
> +    if (!cache || !cache->IsLocked()) {
> +      return;  // Already unlocked.

So here, in UnlockSurfaces, we're returning early "if (!cache || !cache->IsLocked())".

But the (existing) UnlockImage() impl, right above this, only checks "!cache", in its version of this early-return. It doesn't check  if the cache is already unlocked.  Should we add that check? (to UnlockImage)

(It's not clear to me why these methods would differ on that point.  I suspect it doesn't affect actual functionality -- it just would save us from doing some redundant work. It seems like it's worth making these methods as similar as possible, aside from the one intentional difference -- the absence of a "cache->SetLocked(false)" call.)

::: image/src/SurfaceCache.h
@@ +225,5 @@
>     * Each surface in the cache has a lifetime, either Transient or Persistent.
>     * Transient surfaces can expire from the cache at any time. Persistent
>     * surfaces can ordinarily also expire from the cache at any time, but if the
> +   * image they're associated with is locked when they are inserted, and never
> +   * becomes unlocked, then these surfaces will never expire. This means that

This "if the image...never becomes unlocked" feels like an edge-casey (and potentially buggy) scenario, & hence seems a bit odd to be the main scenario that we explain here.

I think you might really want to say something more like the following (focusing on the common case where the image *is* eventually unlocked):
   * [...] if the
   * image they're associated with is locked when they are inserted, then
   * these surfaces won't expire until after that image is unlocked. This
   * means that [...]

(This is the documentation for Insert(), btw, if it's not clear from the splinter context.)

@@ +282,5 @@
>    /**
> +   * Locks an image. Any of the image's persistent surfaces which are either
> +   * inserted or accessed while the image is locked will not expire.
> +   *
> +   * Locking an image does not automatically lock that image's surfaces. A call

s/surfaces/existing surfaces/, in that last line there.

Otherwise this is a bit contradictory, because locking an image *does* automatically lock the image's *future* surfaces, as the previous sentence mentioned.  It's just the pre-existing ones that don't get locked.

@@ +285,5 @@
> +   *
> +   * Locking an image does not automatically lock that image's surfaces. A call
> +   * to LockImage() guarantees that persistent surfaces which are inserted
> +   * before the next call to UnlockImage() or UnlockSurfaces() for that image
> +   * will not expire. Images that accessed via Lookup() or LookupBestMatch()

I think you need s/will not expire/will not expire until $X./   (I think $X is "Unlock[Image|Surfaces]() is called", but I'm not sure.)

Otherwise, this seems to be saying that these surfaces just never ever expire.  ("guarantees that persistent surfaces...will not expire.")

Also, the final last bit of this quoted chunk is missing the word "are" -- s/that accessed/that are accessed/

@@ +287,5 @@
> +   * to LockImage() guarantees that persistent surfaces which are inserted
> +   * before the next call to UnlockImage() or UnlockSurfaces() for that image
> +   * will not expire. Images that accessed via Lookup() or LookupBestMatch()
> +   * between a LockImage() call and the matching UnlockImage() or
> +   * UnlockSurfaces() call will also not expire. Any other surfaces owned by the

As above, this "will also not expire" probably needs an "until..." clarification.  (Though maybe this one will be clear from context, if you just clarify the earlier "will not expire" statement.)
Attachment #8550195 - Flags: review?(dholbert) → review+
Thanks for the review! All good points; I'll fix those problems.

(In reply to Daniel Holbert [:dholbert] from comment #3)
> This "if the image...never becomes unlocked" feels like an edge-casey (and
> potentially buggy) scenario, & hence seems a bit odd to be the main scenario
> that we explain here.

I think this hits on a good point: a disproportionate amount of documentation space is devoted to surfaces that can't be rematerialized, which is a *very* small category of surfaces that basically boils down to "chrome images". We should be able to get rid of any support for that once I do the ImageLib cache rewrite, which will allow us to pin things in the Necko cache and reload them if we need them again. That will let us discard source data for *all* images and we won't need to do anything special or clever for these chrome images.

Combined with a complete transition to downscale-during-decode, which will let us get rid of the lifetime concept (since only HQ scaled surfaces are transient), I think we're going to see SurfaceCache start to get a lot simpler in the next six months.

Hopefully this patch marks the apex (nadir?) of its complexity! =)
This should address all the review comments.

I ended up using the commit message Daniel recommended in comment 3; it sums up the patch pretty well.
Attachment #8550195 - Attachment is obsolete: true
Comment on attachment 8550670 [details] [diff] [review]
Give cached surfaces a chance to expire, when starting a downscale-during-decode

Looks good! Noticed one minor comment typo:

>+++ b/image/src/SurfaceCache.h
>   /**
>    * Insert a surface into the cache. If a surface with the same ImageKey and
[...]
>+   * called or before the surface is touched again by Lookup(). Second, they
>+   * image they are associated with must never be unlocked.

s/Second, they/Second, the/
Attachment #8550670 - Flags: review+
Attachment #8550670 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/869afff72f63
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment on attachment 8550764 [details] [diff] [review]
Give cached surfaces a chance to expire, when starting a downscale-during-decode

Approval Request Comment
[Feature/regressing bug #]: Pulling in downscale-during-decode refactoring to ensure that patch compatibility between 37 and 38+.
[User impact if declined]: Already approved.
[Describe test coverage new/current, TreeHerder]: In 38 for a long time.
[Risks and why]: Low risk; in 38 for a long time.
[String/UUID change made/needed]: None.
Attachment #8550764 - Flags: approval-mozilla-aurora?
Comment on attachment 8550764 [details] [diff] [review]
Give cached surfaces a chance to expire, when starting a downscale-during-decode

Aurora approval previously granted for a collection of ImageLib related bugs that Seth worked on and QE verified before uplift.
Attachment #8550764 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.