The default bug view has changed. See this FAQ.

Cache GradientStops instead of gfxPatterns

RESOLVED FIXED in mozilla21

Status

()

Core
Layout
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jrmuizel, Assigned: avih)

Tracking

(Blocks: 1 bug)

unspecified
mozilla21
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [snappy])

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

4 years ago
Our gfxPattern cache is currently way too specific, which causes lots of unnecessary cache misses. The only thing that we actually need to key on is the color stops.

We should change the cache to hold GradientStops when using Azure and nothing when not. We can then pass the cached GradientStops into the gfxPattern when we go to use it.

This should greatly increase our cache hit rate.
(Reporter)

Updated

4 years ago
Blocks: 835284
(Assignee)

Comment 1

4 years ago
What would the cache key be? now it's used as Lookup(aGradient, oneCellArea.Size(), flags, backendType). Should we just exclude the size?

Also, is it too early to think of implications on bug 826422 (render a tile once, then tile the raster) and more specifically bug 835284 which depends on it (also cache those tile rasters), which would probably need the cell size (or else it would need a 4k x 4k px surface to keep the same resolution as the cache)?
(Reporter)

Comment 2

4 years ago
(In reply to Avi Halachmi (:avih) from comment #1)
> What would the cache key be? now it's used as Lookup(aGradient,
> oneCellArea.Size(), flags, backendType). Should we just exclude the size?

The cache key should be, the same as the parameters to CreateGradientStops()  (It should include the stops and the extend mode) plus the backendType. Currently we hash more than just that in aGradient, we'd need to reduce the hash or add a new function to hash just that.

> Also, is it too early to think of implications on bug 826422 (render a tile
> once, then tile the raster)

This should have no impact on that bug. We'll still need to evaluate whether it makes sense to use that path on the different graphics backends.

> and more specifically bug 835284 which depends
> on it (also cache those tile rasters), which would probably need the cell
> size.

The interactions with this bug will be more complex. Personally, I'd like to avoid having to create this cache at all. I'd like to get gradient performance to the point where we don't need it. That being said, we may not get there. In that case I think we're going to need a separate cache for these things, because there cache key will need to be a lot more specific.

> (or else it would need a 4k x 4k px surface to keep the same resolution
> as the cache)?

We don't want to do this.
(Assignee)

Comment 3

4 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)
> > (or else it would need a 4k x 4k px surface to keep the same resolution
> > as the cache)?
> 
> We don't want to do this.

I thought this implication was obvious ;)
(Assignee)

Comment 4

4 years ago
Created attachment 713354 [details] [diff] [review]
Cache GradientStops instead of gfxPattern

- Cache hit-rate has gone up considerably (including tab animation and about:addons) and appears to be as good as it gets.

- I haven't yet compared performance with the old code since the change is too big for a mode-by-pref with the same binary. I'll compare soon and post the results. However, preliminary results show speedup of tab animation.

- The block of the removed |if (pattern == null)| was kept at the same indentation to avoid bigger diff from indentation only. I'll indent it properly before landing.

Thanks to Jeff for the patch sketch on which this patch is based, and his hand-holding ;)
Assignee: nobody → avihpit
Status: NEW → ASSIGNED
Attachment #713354 - Flags: review?(jmuizelaar)
(Assignee)

Comment 5

4 years ago
Nit: the case of (stopScale == 0.0) sets the offsets of the alternative stop elements as already normalized (offset 0.0), while generally the stops array holds un-normalized values which are later normalized for rawStops or gradientPattern. However, normalizing these already-normalized offsets later results in no change since after scaling by 0 the offsets remain 0.
(Assignee)

Comment 6

4 years ago
Created attachment 713390 [details] [diff] [review]
CTOR init order, fixed hash key, nit from comment #5.
Attachment #713354 - Attachment is obsolete: true
Attachment #713354 - Flags: review?(jmuizelaar)
Attachment #713390 - Flags: review?(jmuizelaar)
(Assignee)

Comment 7

4 years ago
There appears to be some regression from this patch. When loading http://console-to-chrome.appspot.com , there are no cache misses, but several seconds (5-10) are spent during the initial paint of the page at gfxContext::Fill of PaintGradient. Clean nightly doesn't seem to have this behavior.

I still don't have a reduced case. Here's the profile of such hang (the first few seconds of the profile): http://people.mozilla.com/~bgirard/cleopatra/#report=b6affcd1199140fef8173860d1ca2170800a3a3a
(Reporter)

Comment 8

4 years ago
Comment on attachment 713390 [details] [diff] [review]
CTOR init order, fixed hash key, nit from comment #5.

Review of attachment 713390 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. It would be good to get either the author/reviewer of the original cache to take a look too.

::: layout/base/nsCSSRendering.cpp
@@ +323,5 @@
>      hash = AddToHash(hash, aKey->mBackendType);
> +    hash = AddToHash(hash, aKey->mRepeating);
> +    for (uint32_t i = 0; i < aKey->mStops.Length(); i++) {
> +      hash = AddToHash(hash, aKey->mStops[i].color.ToABGR());
> +      hash = AddToHash(hash, static_cast<uint32_t>(aKey->mStops[i].offset * (1<<31)));

I think it's probably better to just hash the bit representation of the double. Really AddToHash should probably allow passing in a double, but we can do this as a follow up. In the mean time is there any reason you chose 1<<31 instead of 1<<32?

@@ +344,5 @@
> +      }
> +    }
> +    return (aKey->mBackendType == mBackendType) &&
> +           (aKey->mRepeating == mRepeating) &&
> +           sameStops;

I'd reorder this so that we're checking sameStops first as it is less likely to be equal.

@@ +2347,5 @@
> +    }
> +
> +    bool isRepeat = aGradient->mRepeating || forceRepeatToCoverTiles;
> +
> +    if(backendType == gfx::BACKEND_DIRECT2D) {

This should use ctx->IsCairo()

@@ +2350,5 @@
> +
> +    if(backendType == gfx::BACKEND_DIRECT2D) {
> +      // Offscreen gradient raster cache:
> +      // On some backends (e.g. D2D), the GradientStops object holds an offscreen surface
> +      // which is the rasterization of the gradient base line. This surface can use

s/which is the rasterization of the gradient base line/which is a lookup table used to evaluate the gradient/
Attachment #713390 - Flags: review?(jmuizelaar) → review+
Comment on attachment 713390 [details] [diff] [review]
CTOR init order, fixed hash key, nit from comment #5.

Review of attachment 713390 [details] [diff] [review]:
-----------------------------------------------------------------

This looks adequate.

::: layout/base/nsCSSRendering.cpp
@@ -322,2 @@
>      hash = AddToHash(hash, aKey->mBackendType);
> -    hash = aKey->mGradient->Hash(hash);

This is not used anymore, maybe you can remove it?

See: http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleStruct.cpp#1380

@@ +2364,5 @@
> +      GradientCacheData* cached = gGradientCache->Lookup(rawStops, isRepeat, backendType);
> +      mozilla::RefPtr<mozilla::gfx::GradientStops> gs = cached ? cached->mStops : nullptr;
> +      if (!gs) {
> +        // CreateGradientStops is expensive (possibly lazily)
> +        gs = ctx->GetDrawTarget()->CreateGradientStops(&(rawStops[0]), stops.Length(), isRepeat ? gfx::EXTEND_REPEAT : gfx::EXTEND_CLAMP);

|rawStops.Elements()| instead of |&(rawStops[0])|, maybe?
(Assignee)

Comment 10

4 years ago
Created attachment 714156 [details] [diff] [review]
v3 - Address comment #8 and comment #9, few consts, slight comments changes

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> Comment on attachment 713390 [details] [diff] [review]
> > +      hash = AddToHash(hash, static_cast<uint32_t>(aKey->mStops[i].offset * (1<<31)));
> 
> I think it's probably better to just hash the bit representation of the
> double. Really AddToHash should probably allow passing in a double, but we
> can do this as a follow up. In the mean time is there any reason you chose
> 1<<31 instead of 1<<32?

This is a float. I don't think there's a generic way to hash a float to uint32 which is better than the default coercion, and if there was one, I doubt it'll map the range of 0-1 to the full range, which is what we need. As for why not 1<<32: if we multiply this by 1.0 (most last stop offsets) and cast to uint32, the result will be 0, which will get the same hash key as most first stop offsets.  I chose a power of 2 (and not MAX_UINT32) since I'm guessing there could be compiler/cpu/fpu optimizations for this case and we're still left with 31 precision bits.

Still on the issue of float hash. Generally this is not a good idea as it's equivalent to float comparison without an epsilon range. Our case actually has such small error accumulated since the stop offsets could depend on the line length, which is also derived from cell area, for which we'll sometimes still want cache hits (depends on the stops units etc). However, our stops calculations are in doubles, and we hash a float of the offset, which masks any accumulated errors which should theoretically result at the same offset, and is practically equivalence to using epsilon (tab gradients isn't this case since it's a vertical gradient which is only affected by the cell height - which never changes for tabs). So we're good with the float hash afterall.

(In reply to Avi Halachmi (:avih) from comment #7)
> There appears to be some regression from this patch...

Turns out there isn't a regression. It was caused by bad profile (I had gfx.canvas.azure.backend = d2d,skia,cairo instead of direct2d,skia,cairo).


Initial performance tests of this patch on m-c and ux branches (updated today) show nice improvement on m-c, but little improvement with ux (australis). Here are my results with both branches (for complete test setup and tab-animation-repeats patch, see bug 837885 comment #3):

Average paint times results (in ms)
(on windows 7 x64, i7-3630qm, HD4000)
-------------------------------------
m-c
open: 3.5 close: 3.3

m-c cache-key
open: 2.6 close: 2.4

ux
open: 7.3 close: 5.2

ux cache-key
open: 6.9 close: 5.3


Unless there are reservation, I think we'll push this patch (sans the extra indentation which I still didn't remove).
Attachment #713390 - Attachment is obsolete: true
(Reporter)

Comment 11

4 years ago
(In reply to Avi Halachmi (:avih) from comment #10)
> 
> Still on the issue of float hash. Generally this is not a good idea as it's
> equivalent to float comparison without an epsilon range. Our case actually
> has such small error accumulated since the stop offsets could depend on the
> line length, which is also derived from cell area, for which we'll sometimes
> still want cache hits (depends on the stops units etc). However, our stops
> calculations are in doubles, and we hash a float of the offset, which masks
> any accumulated errors which should theoretically result at the same offset,
> and is practically equivalence to using epsilon (tab gradients isn't this
> case since it's a vertical gradient which is only affected by the cell
> height - which never changes for tabs). So we're good with the float hash
> afterall.

We do float comparison without an epsilon range in KeyEquals() so this won't matter. Further, doing direct float comparison here is fine because we only really care to cache gradients that are actually the same, not just close. If that's not sufficient we can always tune things afterwards.
(Assignee)

Comment 12

4 years ago
Created attachment 714422 [details] [diff] [review]
v4 - Fix hash key (again)

This uses the float bits as hash, except for the cases of 0.0 and -0.0, which both map to 0.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #11)
> We do float comparison without an epsilon range in KeyEquals() so this won't
> matter. Further, doing direct float comparison here is fine because we only
> really care to cache gradients that are actually the same, not just close.
> If that's not sufficient we can always tune things afterwards.

1. The float comparison in KeyEquals() are fine since these floats are casted from double, hence mask accumulated errors as well.

2. We actually do have offset values which are different before normalization, which should still map to identical values after normalization, but are not identical as doubles after normalization (depending on the gradient line length), but are identical as floats.

However, when I tried to use offset as doubles for the cache key and tried that with gradients which should have identical normalized stops, I noticed that while they're not always identical, they still map to a small finite number of different doubles (3-4). The testcase I used is the following gradient (with randomized width animation, or width which depends on timestamp):

> background: linear-gradient(to right, green 10px, blue 20px);
Attachment #714156 - Attachment is obsolete: true
(Assignee)

Comment 13

4 years ago
Created attachment 714501 [details] [diff] [review]
v5 - Fixed indentation

Try of v4 (identical except indentation): https://tbpl.mozilla.org/?tree=Try&rev=23dce688fac3
Attachment #714422 - Attachment is obsolete: true
(Reporter)

Comment 14

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/edeee46d8c73
(Assignee)

Comment 15

4 years ago
> Average paint times results (in ms)
> (on windows 7 x64, i7-3630qm, HD4000)
> -------------------------------------
> m-c
> open: 3.5 close: 3.3
> 
> m-c cache-key
> open: 2.6 close: 2.4
> 
> ux
> open: 7.3 close: 5.2
> 
> ux cache-key
> open: 6.9 close: 5.3

Adding some more measurements with the exact same 4 builds:

AMD E-350 (+6310 igpu) 4G ram (HW accel off by default)
-----------------------------------
m-c
open: 6.6 close: 5.3

m-c cache-key
open: 6.7 close: 5.4

m-c (forced D2D)
open: 10.2 close: 9.9

m-c cache-key (forced D2D)
open: 8.3 close: 7

ux
open: 11.9 close: 7.5

ux cache-key
open: 11.8 close: 7.6


Intel Atom N450 (igpu) 1G ram (HW accel off by default)
-----------------------------------------------
m-c
open: 8.1 close: 7.2

m-c cache-key
open: 8.7 close: 8.3

ux
open: 13.4 close: 9.6

ux cache-key
open: 12.9 close: 9.6


Important to note that frame interval averages were at 17ms or lower for all the above measurements, except for both the UX builds on the N450, where they were 18.6ms for tab open, and 17.1ms on average for tab close.

So it seems this patch brings the D2D performance much closer to the non-accelerated performance for tab animation, but possibly introduces some regression (~7%) on some non-accelerated cases (the Atom).
https://hg.mozilla.org/mozilla-central/rev/edeee46d8c73
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Updated

4 years ago
Whiteboard: [snappy]
(Reporter)

Updated

4 years ago
Blocks: 837542
(Assignee)

Comment 17

4 years ago
(In reply to Avi Halachmi (:avih) from comment #15)
> Intel Atom N450 (igpu) 1G ram (HW accel off by default)
> -----------------------------------------------
> m-c
> open: 8.1 close: 7.2
> 
> m-c cache-key
> open: 8.7 close: 8.3
> 
> ux
> open: 13.4 close: 9.6
> 
> ux cache-key
> open: 12.9 close: 9.6
>
> ... but possibly introduces some
> regression (~7%) on some non-accelerated cases (the Atom).

mconley revisited these tests and performed them again, trying to be more carefull and making sure they're consistent. These are the new results:

mc:
Overall tab-close average (over 15 animations): Interval: 17.3 Paint: 7.9
Overall tab-open average (over 15 animations): Interval: 16.1 Paint: 8.5
    
mc-ck
Overall tab-close average (over 15 animations): Interval: 17.7 Paint: 7.8
Overall tab-open average (over 15 animations): Interval: 16.6 Paint: 8.4
     
ux:
Overall tab-close average (over 15 animations): Interval: 16.3 Paint: 12.2
Overall tab-open average (over 15 animations): Interval: 21.6 Paint: 16.9
     
ux-ck:
Overall tab-close average (over 15 animations): Interval: 15.4 Paint: 9.6
Overall tab-open average (over 15 animations): Interval: 19.9 Paint: 14.5

There appears to be no regression afterall even when D2D is disabled.
(Reporter)

Comment 18

4 years ago
Comment on attachment 714501 [details] [diff] [review]
v5 - Fixed indentation

[Approval Request Comment]
Bug caused by (feature/regressing bug #): D2D
User impact if declined: Bad tab gradient performance because of a poor cache the potential for large memory usage.
Testing completed (on m-c, etc.): Has been on m-c for a while. The code should be decently tested and this is mostly just a performance fix and is less likely to have correctness problems.
Risk to taking this patch (and alternatives if risky): Should be easy to back out.
String or UUID changes made by this patch: None
Attachment #714501 - Flags: approval-mozilla-aurora?
Comment on attachment 714501 [details] [diff] [review]
v5 - Fixed indentation

low risk perf win :) Additionally this is easy to back out if needed.

I am ccing QA(juanb,ashughes) on this to keep an eye on any new regressions related to hangs or impacting performance they may come across during testing when D2d is enabled/disabled.

Please let QA know if they can help with any specific or exploratory testing/verfication here.
Attachment #714501 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.