Store HQ scaled frames in the SurfaceCache

RESOLVED FIXED in mozilla35

Status

()

RESOLVED FIXED
4 years ago
20 days ago

People

(Reporter: seth, Assigned: seth)

Tracking

(Depends on: 2 bugs, Blocks: 1 bug)

unspecified
mozilla35
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(relnote-firefox 35+)

Details

Attachments

(4 attachments, 5 obsolete attachments)

5.48 KB, patch
Details | Diff | Splinter Review
3.28 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
2.62 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
29.54 KB, patch
Details | Diff | Splinter Review
(Assignee)

Description

4 years ago
We want to support multiple scaled frames at the same time in RasterImage, because not doing so needlessly limits us and is the store of bugs that can peg Firefox at 100% CPU. We also want to move all the image data that we can in imagelib into SurfaceCache, so we can easily have global memory management policies that apply to everything in imagelib.

In this bug, we'll achieve the first goal and move closer to the second by moving HQ scaled frames into the SurfaceCache.
(Assignee)

Comment 1

4 years ago
Created attachment 8481051 [details] [diff] [review]
(Part 1) - Add support for RasterImage surfaces to SurfaceCache
Attachment #8481051 - Flags: review?(dholbert)
Comment on attachment 8481051 [details] [diff] [review]
(Part 1) - Add support for RasterImage surfaces to SurfaceCache

>diff --git a/image/src/VectorImage.cpp b/image/src/VectorImage.cpp
>--- a/image/src/VectorImage.cpp
>+++ b/image/src/VectorImage.cpp
>   DrawableFrameRef frameRef =
>     SurfaceCache::Lookup(ImageKey(this),
>-                         SurfaceKey(params.size, aSVGContext,
>-                                    animTime, aFlags));
>+                         VectorSurfaceKey(params.size,
>+                                          params.svgContext,
>+                                          params.animationTime));

Hmm... So we're dropping "flags" from the hash key here, for VectorImage. Is that safe? I think it is...?

It looks like the only flags we care about in VectorImage are FLAG_BYPASS_SURFACE_CACHE and FLAG_SYNC_DECODE, neither of which influence the rendering. So it seems like it's fine that we're ignoring the flags in making the hash key for the surface cache.

Maybe add a comment above the VectorSurfaceKey() implementation, though, as an explicit heads-up about this, saying something like:
  // NOTE: If we ever add a new FLAG_* to imgIContainer::Draw() that
  // influences VectorImage rendering, then we'll need to make this
  // function require aFlags to use as part of the hash key.

r=me with something like that
Attachment #8481051 - Flags: review?(dholbert) → review+
(Assignee)

Updated

4 years ago
Blocks: 977459
(Assignee)

Updated

4 years ago
Blocks: 969325
(Assignee)

Updated

4 years ago
Blocks: 1050815
(Assignee)

Updated

4 years ago
Blocks: 1013224
(Assignee)

Updated

4 years ago
Blocks: 1006353
(Assignee)

Comment 3

4 years ago
(In reply to Daniel Holbert [:dholbert] from comment #2)
> Hmm... So we're dropping "flags" from the hash key here, for VectorImage. Is
> that safe? I think it is...?
> 
> It looks like the only flags we care about in VectorImage are
> FLAG_BYPASS_SURFACE_CACHE and FLAG_SYNC_DECODE, neither of which influence
> the rendering. So it seems like it's fine that we're ignoring the flags in
> making the hash key for the surface cache.

Yeah, I realized when reasoning about which flags we cared about for RasterImage that we cared about *no* flags for VectorImage. =) I agree that we could use a comment there, though!

(We could probably also use a comment about the fact that we don't care about animation for RasterImage because it's currently not possible to store animated frames in the SurfaceCache.)
(Assignee)

Comment 4

4 years ago
Created attachment 8481118 [details] [diff] [review]
(Part 1) - Add support for RasterImage surfaces to SurfaceCache

This version of the patch adds the comments discussed in comment 2 and comment 3.
(Assignee)

Updated

4 years ago
Attachment #8481051 - Attachment is obsolete: true
(Assignee)

Comment 5

4 years ago
Created attachment 8481121 [details] [diff] [review]
(Part 2) - Add a way to release all the surfaces an imgFrame holds

This patch adds a way to release all the surfaces an imgFrame holds without destroying it. We'll use that in part 3 to allow an imgFrame that is the result of a failed scale operation to remain in the SurfaceCache, serving as a sentinel that keeps us from trying to reattempt the scale over and over, while still releasing the memory that imgFrame was using.
Attachment #8481121 - Flags: review?(tnikkel)
(Assignee)

Updated

4 years ago
Attachment #8481121 - Attachment is obsolete: true
Attachment #8481121 - Flags: review?(tnikkel)
(Assignee)

Comment 6

4 years ago
Ack, sorry. I changed my mind about part 2. I'm going to solve that problem another way.
(Assignee)

Comment 7

4 years ago
Created attachment 8481139 [details] [diff] [review]
(Part 2) - Add SurfaceCache::RemoveIfPresent so invalid entries can be freed eagerly

Alright, here's the new part 2. The approach in the previous part 2 was flawed, since the first time Lookup() got called the empty imgFrame would have been removed from the cache anyway, since you couldn't create a DrawableFrameRef to it. Given that, and the fact that on reflection the current code works in exactly the same way (if scaling fails we will just retry), I've decided to just do the same thing more directly rather than increasing imgFrame's internal complexity.

This patch adds a function called SurfaceCache::RemoveIfPresent(), and I think its behavior is probably pretty obvious. =)

So it's worth asking the question: even if the current code doesn't do anything to rate limit scaling requests that keep failing, wouldn't it be better to solve that problem? I think that we'd be better off deferring that problem for a while, because once downscale-during-decode arrives (and it's coming very soon), scaling failure and decode failure will be essentially the same thing, and we already have a way of handling decode failure. (Right now, if a decode *ever* fails, we'll never try again. I think that's probably the right thing, because there doesn't seem to be any reason to think that future attempts will succeed if one attempt fails.)
Attachment #8481139 - Flags: review?(dholbert)
(Assignee)

Comment 8

4 years ago
Created attachment 8481140 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

Here's the patch with the 'meat' of the change. We store HQ scaled frames in the SurfaceCache. ScaleRequest and ScaleResult are totally removed, which triggers some refactoring in various places that I think is universally an improvement.

RasterImage used to pay attention to three 'ScaleStatus' states. They all have equivalents in the new code.

SCALE_PENDING meant that a scale job had started but hadn't yet finished. Now, that's represented by the situation where SurfaceCache::Lookup successfully returns an imgFrame, but calling imgFrame::ImageCompleted() on that frame returns false.

SCALE_DONE meant that a scale job was done and we could draw the resulting imgFrame. Now we know that's the case if SurfaceCache::Lookup returns an imgFrame and calling imgFrame::ImageCompleted() returns true.

SCALE_INVALID meant that scaling failed and the resulting imgFrame was invalid, so we should scale again. Now we handle that case by removing the imgFrame from the SurfaceCache, so Lookup() just fails and we request a new scale if needed. (As discussed in comment 7, that's the existing behavior, and it's not worth diverging from that right now since downscale-during-decode will render any HQ-scaling-specific policy we make obsolete very soon.)
Attachment #8481140 - Flags: review?(tnikkel)
(Assignee)

Comment 9

4 years ago
I've tested, and this patch series does indeed resolve bug 1050815 - we get all of the HQ scaled versions we request, and no 100% CPU problem. I haven't checked the other dependent bugs yet, but I'm confident that it'll resolve them too.

Here's a try job:

https://tbpl.mozilla.org/?tree=Try&rev=4875f9f2ad2a
(Assignee)

Comment 10

4 years ago
(In reply to Seth Fowler [:seth] from comment #9)
> Here's a try job:

So green, so green...
Attachment #8481139 - Flags: review?(dholbert) → review+
Comment on attachment 8481140 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

A few nits on ScaleRunner in Part 3:

>+class ScaleRunner : public nsRunnable
> {
[...]
>+  ~ScaleRunner()
>   {

nit: The destuctor should be private or protected (since this is a refcounted object, which means we don't want just anybody to be able to call 'delete' on it).

It should also probably be marked 'virtual' (for clarity -- the parent class, nsRunnable, has a virtual destructor, so we will too).

[...]
>   NS_IMETHOD Run()
>   {
[...]
>+    if (mState == eReady) {
>+      // Collect information from the frames that we need to scale.

Nit: Run() should be marked as MOZ_OVERRIDE.
(Assignee)

Comment 12

4 years ago
(In reply to Daniel Holbert [:dholbert] from comment #11)
> A few nits on ScaleRunner in Part 3:

Agreed; I'll make those changes. Thanks for the reviews!
(Assignee)

Updated

4 years ago
Blocks: 846315
(Assignee)

Updated

4 years ago
Duplicate of this bug: 795072

Updated

4 years ago
Blocks: 1065502
(Assignee)

Comment 14

4 years ago
Created attachment 8488499 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

This version of part 3 addresses Daniel's comments.
Attachment #8488499 - Flags: review?(tnikkel)
(Assignee)

Updated

4 years ago
Attachment #8481140 - Attachment is obsolete: true
Attachment #8481140 - Flags: review?(tnikkel)
Comment on attachment 8488499 [details] [diff] [review]
(Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

Just so that it's clear: this patch makes us no longer call LockImage on the RasterImage while we are scaling, this is okay because we now hold a nsRefPtr to the imgFrame that we are scaling, whereas we didn't before ("before" meaning before this patch and one of your other recent patches).

Shouldn't we be purging entries in the surface cache in RasterImage::Discard for this image?
(Assignee)

Comment 16

4 years ago
(In reply to Timothy Nikkel (:tn) from comment #15)
> Comment on attachment 8488499 [details] [diff] [review]
> (Part 3) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest
> and ScaleResult
> 
> Just so that it's clear: this patch makes us no longer call LockImage on the
> RasterImage while we are scaling, this is okay because we now hold a
> nsRefPtr to the imgFrame that we are scaling, whereas we didn't before
> ("before" meaning before this patch and one of your other recent patches).

We did that to avoid getting discarded while scaling. As you say, since we hold a strong reference to the imgFrame in ScaleRunner now, that's no longer a problem.

> Shouldn't we be purging entries in the surface cache in RasterImage::Discard
> for this image?

Yeah, we probably should. Though the surfaces get freed on a timer anyway; we don't need the DiscardTracker timer for things in the SurfaceCache, and indeed the SurfaceCache is about to replace the DiscardTracker totally. But as long as we support explicit calls to RequestDiscard() we'll want to remove SurfaceCache entries in RasterImage::Discard.

The very next bug in the series (bug 1060869) does add this feature. (And adds a significant amount of code to SurfaceCache to support image locking.)
Attachment #8488499 - Flags: review?(tnikkel) → review+
(In reply to Seth Fowler [:seth] from comment #16)
> The very next bug in the series (bug 1060869) does add this feature. (And
> adds a significant amount of code to SurfaceCache to support image locking.)

Yeah, I saw that after.
sorry had to back this changes out since i guess this caused https://tbpl.mozilla.org/php/getParsedLog.php?id=48072061&tree=Mozilla-Inbound
(Assignee)

Comment 20

4 years ago
Created attachment 8490435 [details] [diff] [review]
(Part 3) - Handle SurfaceCache function calls after shutdown

So I investigated the failures that led to this getting backed out, and it looks like at the time we're destroying the RasterImage object that leads to the failure, the SurfaceCache instance has already been torn down. (In other words, ShutdownModule has already called SurfaceCache::Shutdown().)

Given the much different timing of the xpcshell test environment, it's not surprising that this could happen. Given that RasterImage objects are refcounted and kept alive indirectly by all sorts of other objects, I'm actually surprised this hasn't been an issue so far!

To make this safe, we need to make SurfaceCache functions fail gracefully when the instance has been torn down. This patch does this by changing the MOZ_ASSERT(sInstance) calls in the various SurfaceCache functions into if() checks.
Attachment #8490435 - Flags: review?(dholbert)
(Assignee)

Comment 21

4 years ago
Created attachment 8490436 [details] [diff] [review]
(Part 4) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

(Rebased and renumbered to part 4.)
(Assignee)

Updated

4 years ago
Attachment #8488499 - Attachment is obsolete: true
(Assignee)

Comment 22

4 years ago
(In reply to Seth Fowler [:seth] from comment #20)
> Given that RasterImage objects are
> refcounted and kept alive indirectly by all sorts of other objects,

(This should have said "Image objects" and not "RasterImage objects". This is actually the first patch that puts surfaces from RasterImage objects in the SurfaceCache.)

Also, to state it more clearly: I've renumbered things so that the old part 3 is now part 4.
Attachment #8490435 - Flags: review?(dholbert) → review+
(Assignee)

Comment 23

4 years ago
Thanks for the quick review! Just in case there are any lingering issues, I've pushed another try job:

https://tbpl.mozilla.org/?tree=Try&rev=dfe4dfa12191
(Assignee)

Comment 24

4 years ago
Created attachment 8490471 [details] [diff] [review]
(Part 4) - Store HQ scaled frames in SurfaceCache and remove ScaleRequest and ScaleResult

This version of the patch is rebased against bug 1067207 and makes a minor change (refusing to scale frames with padding) that I promised to make in that bug.

The failures in the try job above are actually coming from bug 1067207. I believe I've fixed that patch, so here's a fresh try job:

https://tbpl.mozilla.org/?tree=Try&rev=8e75f43e78e8
(Assignee)

Updated

4 years ago
Attachment #8490436 - Attachment is obsolete: true
(Assignee)

Comment 27

4 years ago
So I'd expect that to be happening if we were suddenly using HQ scaled frames on B2G. I've pushed a modified version of the patch that will crash if we do that:

https://tbpl.mozilla.org/?tree=Try&rev=0ac86eb833ef
(Assignee)

Comment 28

4 years ago
OK, after seeing those results I have come to the conclusion that, although I'm not really sure why, this patch seems to have fixed those tests on B2G. Nothing wrong seems to be happening here; we're not using scaled frames inappropriately, or anything like that.

I'd feel more comfortable if I understood exactly how this patch caused those tests to start passing, but it's important to get this bug in ASAP, as it fixes some important user-visible problems. I don't think it's worth taking the time to debug any further at this point. So I went ahead and removed the 'fails-if(B2G)' annotations from those tests.

Pushed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/1a4717057623
https://hg.mozilla.org/integration/mozilla-inbound/rev/af506098d6b6
https://hg.mozilla.org/integration/mozilla-inbound/rev/17e14ef96366
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa25a75b4ad9

Comment 30

4 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=795072#c6 (rather nsfw) ain't fixed yet.
Opening the mentioned links in that order, when opening the last link the tiny image is hq-scaled but the bigger version isn't.

When loading the last link first or reloading it with Shift-F5, both versions of the image are hq-scaled.

Should I open a new bug?
(Assignee)

Comment 31

4 years ago
(In reply to Elbart from comment #30)
> Should I open a new bug?

I couldn't reproduce this on my local build that has the patches in this bug. I'm not sure whether you're testing on Nightly or your own build, but if you're using Nightly, I suggest to wait for tomorrow's build and see if you can still reproduce the problem. If so, please file a new imagelib bug and CC me on it.

BTW, I'd *really* appreciate it if you could create a new testcase that isn't NSFW. Just change the image to a cat picture or something. It will get the bug fixed faster.

Comment 32

4 years ago
(In reply to Seth Fowler [:seth] from comment #31)
> (In reply to Elbart from comment #30)
> > Should I open a new bug?
> 
> I couldn't reproduce this on my local build that has the patches in this
> bug. I'm not sure whether you're testing on Nightly or your own build, but
> if you're using Nightly,

I used the latest inbound I could find just minutes before I sent the posting.
I will wait for the next Nightly, to be sure.
(Assignee)

Comment 33

4 years ago
(In reply to Elbart from comment #32)
> I used the latest inbound I could find just minutes before I sent the
> posting.
> I will wait for the next Nightly, to be sure.

If you're using inbound, that should be good enough. Maybe there's something different in how I'm testing it? I'm opening each link in a new tab, looking at that tab, then opening the next one. (The new bug would be a good place to put more precise steps to reproduce; we don't need to keep talking about it here.)
Depends on: 1071666
Depends on: 1069369
(Assignee)

Updated

4 years ago
No longer depends on: 1069369

Updated

4 years ago
Depends on: 895412

Comment 34

4 years ago
Seth, is there any chance of uplifting this to 34 or is that too risky / not an option / dependent on too many other bugs that got fixed on 35?
Flags: needinfo?(seth)
While I have no doubt Seth's code is of awesome quality and stability, uplifting something of this magnitude to what is going to be Beta in less than two weeks really scares me.
(Assignee)

Comment 36

4 years ago
(In reply to Milan Sreckovic [:milan] from comment #35)
> While I have no doubt Seth's code is of awesome quality and stability,
> uplifting something of this magnitude to what is going to be Beta in less
> than two weeks really scares me.

Thanks for the diplomatic answer. =)

I agree with Milan; this patch it built on a whole stack of complex changes that would need to be uplifted. It was only last week I was racing against the clock to fix hard-to-reproduce crash bugs that were threatening to force a backout. Though I wish it were otherwise, I think we need to let this ride the trains normally to make absolutely sure all the bugs are squashed.
Flags: needinfo?(seth)

Comment 37

4 years ago
Release Note Request (optional, but appreciated)
[Why is this notable]: Less ressource usage for scaled images (especially very big images scaled to very small ones), see e.g. Bug 846315
[Suggested wording]: Reduced ressource usage for scaled images
[Links (documentation, blog post, etc)]:
relnote-firefox: --- → ?
ressource -> resource

Updated

4 years ago
Duplicate of this bug: 1105878
relnote-firefox: ? → 35+

Updated

4 years ago
Depends on: 1128467

Updated

3 years ago
Depends on: 1256603

Updated

20 days ago
Blocks: 927507
You need to log in before you can comment on or make changes to this bug.