Closed Bug 1060200 Opened 10 years ago Closed 10 years ago

Store HQ scaled frames in the SurfaceCache

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35
Tracking Status
relnote-firefox --- 35+

People

(Reporter: seth, Assigned: seth)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(4 files, 5 obsolete files)

5.48 KB, patch
Details | Diff | Splinter Review
3.28 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
2.62 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
29.54 KB, patch
Details | Diff | Splinter Review
We want to support multiple scaled frames at the same time in RasterImage, because not doing so needlessly limits us and is the store of bugs that can peg Firefox at 100% CPU. We also want to move all the image data that we can in imagelib into SurfaceCache, so we can easily have global memory management policies that apply to everything in imagelib.

In this bug, we'll achieve the first goal and move closer to the second by moving HQ scaled frames into the SurfaceCache.
Comment on attachment 8481051 [details] [diff] [review]
(Part 1) - Add support for RasterImage surfaces to SurfaceCache

>diff --git a/image/src/VectorImage.cpp b/image/src/VectorImage.cpp
>--- a/image/src/VectorImage.cpp
>+++ b/image/src/VectorImage.cpp
>   DrawableFrameRef frameRef =
>     SurfaceCache::Lookup(ImageKey(this),
>-                         SurfaceKey(params.size, aSVGContext,
>-                                    animTime, aFlags));
>+                         VectorSurfaceKey(params.size,
>+                                          params.svgContext,
>+                                          params.animationTime));

Hmm... So we're dropping "flags" from the hash key here, for VectorImage. Is that safe? I think it is...?

It looks like the only flags we care about in VectorImage are FLAG_BYPASS_SURFACE_CACHE and FLAG_SYNC_DECODE, neither of which influence the rendering. So it seems like it's fine that we're ignoring the flags in making the hash key for the surface cache.

Maybe add a comment above the VectorSurfaceKey() implementation, though, as an explicit heads-up about this, saying something like:
  // NOTE: If we ever add a new FLAG_* to imgIContainer::Draw() that
  // influences VectorImage rendering, then we'll need to make this
  // function require aFlags to use as part of the hash key.

r=me with something like that
Attachment #8481051 - Flags: review?(dholbert) → review+
Blocks: 977459
Blocks: 969325
Blocks: 1050815
Blocks: 1013224
Blocks: 1006353
(In reply to Daniel Holbert [:dholbert] from comment #2)
> Hmm... So we're dropping "flags" from the hash key here, for VectorImage. Is
> that safe? I think it is...?
> 
> It looks like the only flags we care about in VectorImage are
> FLAG_BYPASS_SURFACE_CACHE and FLAG_SYNC_DECODE, neither of which influence
> the rendering. So it seems like it's fine that we're ignoring the flags in
> making the hash key for the surface cache.

Yeah, I realized when reasoning about which flags we cared about for RasterImage that we cared about *no* flags for VectorImage. =) I agree that we could use a comment there, though!

(We could probably also use a comment about the fact that we don't care about animation for RasterImage because it's currently not possible to store animated frames in the SurfaceCache.)
This version of the patch adds the comments discussed in comment 2 and comment 3.
Attachment #8481051 - Attachment is obsolete: true
This patch adds a way to release all the surfaces an imgFrame holds without destroying it. We'll use that in part 3 to allow an imgFrame that is the result of a failed scale operation to remain in the SurfaceCache, serving as a sentinel that keeps us from trying to reattempt the scale over and over, while still releasing the memory that imgFrame was using.
Attachment #8481121 - Flags: review?(tnikkel)
Attachment #8481121 - Attachment is obsolete: true
Attachment #8481121 - Flags: review?(tnikkel)
Ack, sorry. I changed my mind about part 2. I'm going to solve that problem another way.
Alright, here's the new part 2. The approach in the previous part 2 was flawed, since the first time Lookup() got called the empty imgFrame would have been removed from the cache anyway, since you couldn't create a DrawableFrameRef to it. Given that, and the fact that on reflection the current code works in exactly the same way (if scaling fails we will just retry), I've decided to just do the same thing more directly rather than increasing imgFrame's internal complexity.

This patch adds a function called SurfaceCache::RemoveIfPresent(), and I think its behavior is probably pretty obvious. =)

So it's worth asking the question: even if the current code doesn't do anything to rate limit scaling requests that keep failing, wouldn't it be better to solve that problem? I think that we'd be better off deferring that problem for a while, because once downscale-during-decode arrives (and it's coming very soon), scaling failure and decode failure will be essentially the same thing, and we already have a way of handling decode failure. (Right now, if a decode *ever* fails, we'll never try again. I think that's probably the right thing, because there doesn't seem to be any reason to think that future attempts will succeed if one attempt fails.)
Attachment #8481139 - Flags: review?(dholbert)
Here's the patch with the 'meat' of the change. We store HQ scaled frames in the SurfaceCache. ScaleRequest and ScaleResult are totally removed, which triggers some refactoring in various places that I think is universally an improvement.

RasterImage used to pay attention to three 'ScaleStatus' states. They all have equivalents in the new code.

SCALE_PENDING meant that a scale job had started but hadn't yet finished. Now, that's represented by the situation where SurfaceCache::Lookup successfully returns an imgFrame, but calling imgFrame::ImageCompleted() on that frame returns false.

SCALE_DONE meant that a scale job was done and we could draw the resulting imgFrame. Now we know that's the case if SurfaceCache::Lookup returns an imgFrame and calling imgFrame::ImageCompleted() returns true.

SCALE_INVALID meant that scaling failed and the resulting imgFrame was invalid, so we should scale again. Now we handle that case by removing the imgFrame from the SurfaceCache, so Lookup() just fails and we request a new scale if needed. (As discussed in comment 7, that's the existing behavior, and it's not worth diverging from that right now since downscale-during-decode will render any HQ-scaling-specific policy we make obsolete very soon.)
Attachment #8481140 - Flags: review?(tnikkel)
I've tested, and this patch series does indeed resolve bug 1050815 - we get all of the HQ scaled versions we request, and no 100% CPU problem. I haven't checked the other dependent bugs yet, but I'm confident that it'll resolve them too.

Here's a try job:

https://tbpl.mozilla.org/?tree=Try&rev=4875f9f2ad2a
(In reply to Seth Fowler [:seth] from comment #9)
> Here's a try job:

So green, so green...
Attachment #8481139 - Flags: review?(dholbert) → review+
Comment on attachment 8481140 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

A few nits on ScaleRunner in Part 3:

>+class ScaleRunner : public nsRunnable
> {
[...]
>+  ~ScaleRunner()
>   {

nit: The destuctor should be private or protected (since this is a refcounted object, which means we don't want just anybody to be able to call 'delete' on it).

It should also probably be marked 'virtual' (for clarity -- the parent class, nsRunnable, has a virtual destructor, so we will too).

[...]
>   NS_IMETHOD Run()
>   {
[...]
>+    if (mState == eReady) {
>+      // Collect information from the frames that we need to scale.

Nit: Run() should be marked as MOZ_OVERRIDE.
(In reply to Daniel Holbert [:dholbert] from comment #11)
> A few nits on ScaleRunner in Part 3:

Agreed; I'll make those changes. Thanks for the reviews!
Blocks: 846315
Blocks: 1065502
This version of part 3 addresses Daniel's comments.
Attachment #8488499 - Flags: review?(tnikkel)
Attachment #8481140 - Attachment is obsolete: true
Attachment #8481140 - Flags: review?(tnikkel)
Comment on attachment 8488499 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

Just so that it's clear: this patch makes us no longer call LockImage on the RasterImage while we are scaling, this is okay because we now hold a nsRefPtr to the imgFrame that we are scaling, whereas we didn't before ("before" meaning before this patch and one of your other recent patches).

Shouldn't we be purging entries in the surface cache in RasterImage::Discard for this image?
(In reply to Timothy Nikkel (:tn) from comment #15)
> Comment on attachment 8488499 [details] [diff] [review]
> (Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest
> and ScaleResult
> 
> Just so that it's clear: this patch makes us no longer call LockImage on the
> RasterImage while we are scaling, this is okay because we now hold a
> nsRefPtr to the imgFrame that we are scaling, whereas we didn't before
> ("before" meaning before this patch and one of your other recent patches).

We did that to avoid getting discarded while scaling. As you say, since we hold a strong reference to the imgFrame in ScaleRunner now, that's no longer a problem.

> Shouldn't we be purging entries in the surface cache in RasterImage::Discard
> for this image?

Yeah, we probably should. Though the surfaces get freed on a timer anyway; we don't need the DiscardTracker timer for things in the SurfaceCache, and indeed the SurfaceCache is about to replace the DiscardTracker totally. But as long as we support explicit calls to RequestDiscard() we'll want to remove SurfaceCache entries in RasterImage::Discard.

The very next bug in the series (bug 1060869) does add this feature. (And adds a significant amount of code to SurfaceCache to support image locking.)
Attachment #8488499 - Flags: review?(tnikkel) → review+
(In reply to Seth Fowler [:seth] from comment #16)
> The very next bug in the series (bug 1060869) does add this feature. (And
> adds a significant amount of code to SurfaceCache to support image locking.)

Yeah, I saw that after.
sorry had to back this changes out since i guess this caused https://tbpl.mozilla.org/php/getParsedLog.php?id=48072061&tree=Mozilla-Inbound
So I investigated the failures that led to this getting backed out, and it looks like at the time we're destroying the RasterImage object that leads to the failure, the SurfaceCache instance has already been torn down. (In other words, ShutdownModule has already called SurfaceCache::Shutdown().)

Given the much different timing of the xpcshell test environment, it's not surprising that this could happen. Given that RasterImage objects are refcounted and kept alive indirectly by all sorts of other objects, I'm actually surprised this hasn't been an issue so far!

To make this safe, we need to make SurfaceCache functions fail gracefully when the instance has been torn down. This patch does this by changing the MOZ_ASSERT(sInstance) calls in the various SurfaceCache functions into if() checks.
Attachment #8490435 - Flags: review?(dholbert)
Attachment #8488499 - Attachment is obsolete: true
(In reply to Seth Fowler [:seth] from comment #20)
> Given that RasterImage objects are
> refcounted and kept alive indirectly by all sorts of other objects,

(This should have said "Image objects" and not "RasterImage objects". This is actually the first patch that puts surfaces from RasterImage objects in the SurfaceCache.)

Also, to state it more clearly: I've renumbered things so that the old part 3 is now part 4.
Attachment #8490435 - Flags: review?(dholbert) → review+
Thanks for the quick review! Just in case there are any lingering issues, I've pushed another try job:

https://tbpl.mozilla.org/?tree=Try&rev=dfe4dfa12191
This version of the patch is rebased against bug 1067207 and makes a minor change (refusing to scale frames with padding) that I promised to make in that bug.

The failures in the try job above are actually coming from bug 1067207. I believe I've fixed that patch, so here's a fresh try job:

https://tbpl.mozilla.org/?tree=Try&rev=8e75f43e78e8
Attachment #8490436 - Attachment is obsolete: true
So I'd expect that to be happening if we were suddenly using HQ scaled frames on B2G. I've pushed a modified version of the patch that will crash if we do that:

https://tbpl.mozilla.org/?tree=Try&rev=0ac86eb833ef
OK, after seeing those results I have come to the conclusion that, although I'm not really sure why, this patch seems to have fixed those tests on B2G. Nothing wrong seems to be happening here; we're not using scaled frames inappropriately, or anything like that.

I'd feel more comfortable if I understood exactly how this patch caused those tests to start passing, but it's important to get this bug in ASAP, as it fixes some important user-visible problems. I don't think it's worth taking the time to debug any further at this point. So I went ahead and removed the 'fails-if(B2G)' annotations from those tests.

Pushed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/1a4717057623
https://hg.mozilla.org/integration/mozilla-inbound/rev/af506098d6b6
https://hg.mozilla.org/integration/mozilla-inbound/rev/17e14ef96366
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa25a75b4ad9
https://bugzilla.mozilla.org/show_bug.cgi?id=795072#c6 (rather nsfw) ain't fixed yet.
Opening the mentioned links in that order, when opening the last link the tiny image is hq-scaled but the bigger version isn't.

When loading the last link first or reloading it with Shift-F5, both versions of the image are hq-scaled.

Should I open a new bug?
(In reply to Elbart from comment #30)
> Should I open a new bug?

I couldn't reproduce this on my local build that has the patches in this bug. I'm not sure whether you're testing on Nightly or your own build, but if you're using Nightly, I suggest to wait for tomorrow's build and see if you can still reproduce the problem. If so, please file a new imagelib bug and CC me on it.

BTW, I'd *really* appreciate it if you could create a new testcase that isn't NSFW. Just change the image to a cat picture or something. It will get the bug fixed faster.
(In reply to Seth Fowler [:seth] from comment #31)
> (In reply to Elbart from comment #30)
> > Should I open a new bug?
> 
> I couldn't reproduce this on my local build that has the patches in this
> bug. I'm not sure whether you're testing on Nightly or your own build, but
> if you're using Nightly,

I used the latest inbound I could find just minutes before I sent the posting.
I will wait for the next Nightly, to be sure.
(In reply to Elbart from comment #32)
> I used the latest inbound I could find just minutes before I sent the
> posting.
> I will wait for the next Nightly, to be sure.

If you're using inbound, that should be good enough. Maybe there's something different in how I'm testing it? I'm opening each link in a new tab, looking at that tab, then opening the next one. (The new bug would be a good place to put more precise steps to reproduce; we don't need to keep talking about it here.)
Depends on: 1071666
Depends on: 1069369
No longer depends on: 1069369
Depends on: 895412
Seth, is there any chance of uplifting this to 34 or is that too risky / not an option / dependent on too many other bugs that got fixed on 35?
Flags: needinfo?(seth)
While I have no doubt Seth's code is of awesome quality and stability, uplifting something of this magnitude to what is going to be Beta in less than two weeks really scares me.
(In reply to Milan Sreckovic [:milan] from comment #35)
> While I have no doubt Seth's code is of awesome quality and stability,
> uplifting something of this magnitude to what is going to be Beta in less
> than two weeks really scares me.

Thanks for the diplomatic answer. =)

I agree with Milan; this patch it built on a whole stack of complex changes that would need to be uplifted. It was only last week I was racing against the clock to fix hard-to-reproduce crash bugs that were threatening to force a backout. Though I wish it were otherwise, I think we need to let this ride the trains normally to make absolutely sure all the bugs are squashed.
Flags: needinfo?(seth)
Release Note Request (optional, but appreciated)
[Why is this notable]: Less ressource usage for scaled images (especially very big images scaled to very small ones), see e.g. Bug 846315
[Suggested wording]: Reduced ressource usage for scaled images
[Links (documentation, blog post, etc)]:
relnote-firefox: --- → ?
ressource -> resource
Depends on: 1128467
Depends on: 1256603
Blocks: 927507
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: