Closed Bug 1624468 Opened 5 years ago Closed 5 years ago

Add a fast path for more gradient types in WR

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gw, Assigned: bpeers)

References

(Blocks 1 open bug)

Details

Attachments

(9 files)

Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-26 1_13_44 p.m.).html.zip 5 years ago Bert Peers [:bpeers] 265.56 KB, application/x-zip-compressed		Details
Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-26 3_34_41 p.m.).html.zip 5 years ago Bert Peers [:bpeers] 510.51 KB, application/x-zip-compressed		Details
stop_hack_bugzilla.png 5 years ago Bert Peers [:bpeers] 431.44 KB, image/png		Details
Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-31 5_19_38 p.m.).html.zip 5 years ago Bert Peers [:bpeers] 1.73 MB, application/x-zip-compressed		Details
Bug 1624468 - Add a fast path for more gradient types in WR 5 years ago Bert Peers [:bpeers] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1624468 - Add a fast path for more gradient types in WR 5 years ago Bert Peers [:bpeers] 47 bytes, text/x-phabricator-request		Details \| Review
timings.svg 5 years ago Bert Peers [:bpeers] 88.55 KB, image/svg+xml		Details
GTX1050 5 years ago Bert Peers [:bpeers] 87.83 KB, image/svg+xml		Details
gradient_documentation_v1.html 5 years ago Bert Peers [:bpeers] 205.07 KB, text/html		Details

Glenn Watson [:gw]

Reporter

Description

•

5 years ago

We have three gradient types supported in WR - linear, radial and conic. These shaders can be quite expensive to draw into the main scene, and also break batching due to requiring a special shader.

We have a fast path for simple linear gradients, which:

Draws the gradient with a custom gradient render task into a texture cache entry that is persisted across frames.
Uses the brush_image shader to draw this cached gradient into the main scene instead of running the full gradient shader.

By selecting this path only for simple gradients (horizontal/vertical, no hard stops, etc) we can draw the cache entry with a very small size, and rely on stretching / bilinear filtering of the brush_image shader when drawing the gradient image into the main scene.

There's two goals of this work (we might want to create separate bugs for each part?):

Support this fast caching path for radial and conic gradients where possible. This should be relatively simple, by either duplicating or abstracting the current linear gradient fast path into shared code.
Consider expanding support in this fast path for more complex gradients (e.g. hard stops, angle start/stop line). It's an open question here how (or if) it's possible to support some of these cases - we can discuss this in detail on matrix/zoom.

The ultimate outcome would be that all gradients get drawn into the main scene via brush_image (with all the gradients drawn into the persistent render task cache), although this probably isn't feasible in all cases.

Glenn Watson [:gw]

Reporter

Comment 1

•

5 years ago

Bert, might be of interest to you to work on?

Flags: needinfo?(bpeers)

Glenn Watson [:gw]

Reporter

Comment 2

•

5 years ago

I should add - some of the assumptions related to how gradients are currently implemented may no longer be true, so don't feel constrained by the current implementation details, it might be worth reconsidering these.

For example, a current constraint is that the gradient should should be as fast as possible, since it's often used on full-screen background gradients. However, if the gradient shader is instead only writing to a small number of render task cached pixels, that are persisted across frames, it might be reasonable to have a more complex shader implementation that allows it to handle more complex gradient types, since drawing into the main scene becomes a straightforward image shader.

Jamie Nicol [:jnicol]

Updated

•

5 years ago

Blocks: wr-perf

Priority: -- → P3

Bert Peers [:bpeers]

Assignee

Updated

•

5 years ago

Assignee: nobody → bpeers

Flags: needinfo?(bpeers)

Bert Peers [:bpeers]

Assignee

Comment 3

•

5 years ago

•

Edited

Attached file Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-26 1_13_44 p.m.).html.zip — Details

Thanks for the overview Glenn.
My first reaction when looking at the code was a bit of surprise :) ... as the gradient shader that we cache is not very complex or heavy on ALU; so I wouldn't expect it to be ALU bound and thus beaten by sampling from a 512x16 cached colorstrip.

Thus the first thing I've done is set up some benchmarks with 4 large gradients: two cacheable and two that aren't (once due to a slight angle, once due to hardstops). I checked in RenderDoc that the caching kicks in as expected.

The result seems to be that there's not an awful lot between cache-vs-no-cache, not enough to distinguish it from normal noise inside a run (the attached HTML has some charts, the error line on each bar is min-max) or between runs (the five groups of bars).

In fact, if anything, the non-cached version seems to be consistently faster on one of the tests for me :/

Disclaimer though: I am on a Quadro P400, which has extremely slow memory bandwidth of 32GB/sec (!). A single 1:1 blit of a 3000x1500xRGBA8 takes over 1ms. It's possible that sampling is actually worse than calculating the gradient (even though the ALU also needs to get its data from a LUT, so, I don't know -- perhaps raw loads come through faster than a bottlenecked texture sampler?). I'll see if I can test on my laptop as well.

I guess for more complex gradients the balance could shift a bit; but even then, we're ad-hoc reimplementing a caching mechanism. You may be right that investing that time in making picture caching more powerful, and/or unifying some of these gradient types to have fewer batch breaks, could pay off more.

Anyway let me know if there's a flaw in my thinking or benchmarking here :) I'll try to grab a few more data points. We can chat a bit next week. Cheers!

Bert Peers [:bpeers]

Assignee

Comment 4

•

5 years ago

Attached file Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-26 3_34_41 p.m.).html.zip — Details

Ok, good thing I checked a few more GPUs: seems I was too pessimistic, and there is definitely a win from the caching. My confidence is still not a 100%, because I would expect both cached tests to perform very similarly -- once the gradient is baked into the cache, it wouldn't really matter anymore what it looks like when blitting it? So I'm not sure why linear-smooth would be ~10% slower than aligned-gradient (source yaml is in the previous HTML file I attached), hmm...

The point about extending the gradient cache versus improving picture-cache/batching is still valid, let's discuss :)

Bert Peers [:bpeers]

Assignee

Comment 5

•

5 years ago

•

Edited

Attached image stop_hack_bugzilla.png — Details

I've experimented a bit with removing the limitation of 4 stops: if there's more, render multiple cached gradients and emit multiple instances to stitch them together. The single cache_handle becomes a Vec of { RenderTaskHandle, prim rect } as emitted by prepare_interned_prim_for_render. It would then also open the door for hard stops -- each hard stop just starts a new segment.

It's just a hack for now, getting my bearings, but it does work, see the attached image. I can tidy it up a bit more next week, add more test cases, support hard stops, etc.

Other than that, I'm still a bit wary of turning this into a bad re-implementation of picture caching :)
Radial/conic gradients seem like they would need 2D textures; perhaps more than one due to hard stops, with alpha testing. Upscaling wouldn't be lossless like it is with the aligned linear gradients.

Misaligned linear gradients have a few options: 2D bake; rotating UVs in the image shader; rotating the vertices and clipping;... Anything other than the bake might cancel out the performance advantage and/or introduce more batching breakage. Maybe there is some more advanced blitter hiding that we can use.

Bert Peers [:bpeers]

Assignee

Comment 6

•

5 years ago

Attached file Bug 1624468 - Add a fast path for more gradient types in WR (2020-03-31 5_19_38 p.m.).html.zip — Details

Cleaned up the hacks
Added hard stops support
Add a bunch of reftests - zooming, clamping, hard stops, clip shape, etc.
Patch forthcoming :)

Bert Peers [:bpeers]

Assignee

Comment 7

•

5 years ago

Attached file Bug 1624468 - Add a fast path for more gradient types in WR — Details

Pulsebot

Comment 8

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3c84f9fd1f93 Add a fast path for more gradient types in WR r=gw

Bert Peers [:bpeers]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Andrei Ciure[:aciure]

Comment 9

•

5 years ago

•

Edited

Backed out changeset 3c84f9fd1f93 (bug 1624468) for causing css-gradients reftest failures
https://hg.mozilla.org/integration/autoland/rev/4d16948c70a718218e6d877768eec27b5e75568d

push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=ecb0f5f13cd361a46f920ad5441ece7473add713&failure_classification_id=2

Flags: needinfo?(bpeers)

Bert Peers [:bpeers]

Assignee

Comment 10

•

5 years ago

•

Edited

Hi Andrei, thanks for catching the regression.
I've checked the failures visually, and they are expected and acceptable -- minor differences in rendering due to pre-caching images.
I've updated the fuzzy values for the 3 failing tests and started a new try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f15d0f2dbfbb3daed0c06564d7163266c7bf95a6
I'll try to re-land once that goes green. Thanks.

Edit: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc315e6d871e11bc53c8c10db8984edceff3beb1

Flags: needinfo?(bpeers)

Bert Peers [:bpeers]

Assignee

Comment 11

•

5 years ago

Hi Glenn, I'm curious on your thoughts for best bang-for-buck next steps?

For caching conic/radial gradients, we would lose quality from upsampling, more than from linear gradients. In particular the center might lose its sharpness, I'm not sure if that's acceptable.
Hard stops would be disallowed; adding them would mean multiple 2D images overlapping, with alpha cutouts.
So both are doable but is it a good idea vs. just relying on picture caching?

Likewise for linear gradients, removing the horizontal-vertical-only restriction would require baking a 2D texture, I think.
We could also rotate UVs in the image brush, but, it doesn't do so right now, and there'd be clamping overhead since the gradient is atlassed. That would seem a batch breaking change, and at some point that overhead must approach the original ALU shader, which isn't all that complex.

So I'm wondering if you had some ideas when you first wrote up this task. Thanks!

Flags: needinfo?(gwatson)

Pulsebot

Comment 12

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9dc84057c6a9 Add a fast path for more gradient types in WR r=gw

Andrei Ciure[:aciure]

Comment 13

•

5 years ago

•

Edited

Backed out changeset 9dc84057c6a9 (bug 1624468) for causing gradient-move-stops.html to fail
https://hg.mozilla.org/integration/autoland/rev/4e874b1ed303fa1ca1960d3caed4cced4288c49d

push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=linux%2C18.04%2Cx64%2Cquantumrender%2Cdebug%2Cweb%2Cplatform%2Ctests%2Ctest-linux1804-64-qr%2Fdebug-web-platform-tests-reftest-e10s-1%2Cw%28wr1%29&fromchange=9dc84057c6a93793865a92a15e54a9231c0f9b03&tochange=4e874b1ed303fa1ca1960d3caed4cced4288c49d&selectedJob=295875161

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=295875161&repo=autoland&lineNumber=14737

[task 2020-04-02T02:43:15.015Z] 02:43:15 [task 2020-04-02T02:43:15.027Z] 02:43:15 [task 2020-04-02T02:43:15.762Z] 02:43:15 [task 2020-04-02T02:43:15.879Z] 02:43:15 [task 2020-04-02T02:43:15.888Z] 02:43:15 [task 2020-04-02T02:43:15.920Z] 02:43:15 [task 2020-04-02T02:43:16.123Z] 02:43:16 [task 2020-04-02T02:43:16.483Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.521Z] 02:43:16 [task 2020-04-02T02:43:16.522Z] 02:43:16 [task 2020-04-02T02:43:16.522Z] 02:43:16 [task 2020-04-02T02:43:17.157Z] 02:43:17 [task 2020-04-02T02:43:17.259Z] 02:43:17 [task 2020-04-02T02:43:17.267Z] 02:43:17 [task 2020-04-02T02:43:17.267Z] 02:43:17 [task 2020-04-02T02:43:17.409Z] 02:43:17 [task 2020-04-02T02:43:17.439Z] 02:43:17 INFO - TEST-START | /css/css-images/gradient-move-stops.html
INFO - PID 13842 | 1585795395018 Marionette INFO Testing http://web-platform.test:8000/css/css-images/gradient-move-stops.html == http://web-platform.test:8000/css/css-images/gradient-move-stops-ref.html
INFO - PID 13842 | 1585795395754 Marionette INFO No differences allowed
INFO - TEST-UNEXPECTED-PASS | /css/css-images/gradient-move-stops.html | Testing http://web-platform.test:8000/css/css-images/gradient-move-stops.html == http://web-platform.test:8000/css/css-images/gradient-move-stops-ref.html
INFO - TEST-INFO expected FAIL | took 846ms
INFO - PID 13842 | [GPU 13905, Compositor] WARNING: Possibly dropping task posted to updater thread: file /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZUpdater.cpp, line 371
INFO - PID 13842 | 1585795396115 Marionette INFO Stopped listening on port 43590
INFO - PID 13842 | [Child 14190, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp, line 3352
INFO - PID 13842 | nsStringStats
INFO - PID 13842 | => mAllocCount: 13510
INFO - PID 13842 | => mReallocCount: 0
INFO - PID 13842 | => mFreeCount: 13510
INFO - PID 13842 | => mShareCount: 18304
INFO - PID 13842 | => mAdoptCount: 350
INFO - PID 13842 | => mAdoptFreeCount: 426
INFO - PID 13842 | => Process ID: 14190, Thread ID: 140365280118656
INFO - PID 13842 | [GPU 13905, Compositor] WARNING: Possibly dropping task posted to updater thread: file /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZUpdater.cpp, line 371
INFO - PID 13842 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
INFO - PID 13842 | [2020-04-02T02:43:17Z WARN xulstore::persist] tried to remove key that isn't in the store
INFO - PID 13842 | [2020-04-02T02:43:17Z WARN xulstore::persist] tried to remove key that isn't in the store
INFO - PID 13842 | [Child 14027, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp, line 3352
INFO - PID 13842 | nsStringStats

Flags: needinfo?(bpeers)

Bert Peers [:bpeers]

Assignee

Comment 14

•

5 years ago

•

Edited

I've verified in Chrome and Edge that gradient-move-stops.html and gradient-move-stops-ref.html do look the same, and indeed they don't in the current Nightly; but they do, after my change. So I assume the best thing to do is to remove the "if webrender expect fail" in gradient-move-stops.html.ini.

I'll stop trying to "save time and resources" by carefully picking my try runs, because clearly I'm not -- so here's a try auto run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbd24005ee3d19012ed820ee875f4382f641c5c0

(I'm not adding a new pair of YAML reftests because I'm not sure how to translate this:
background-image: linear-gradient(to right, yellow, blue 70%, green 0);
to a YAML file that'd be any different from the reference 0.7 blue, 0.7 green, 1.0 green.)

Edit: try auto doesn't seem to work. New run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20d5672c6804436227b41bb7476af6a957bbed0a

Flags: needinfo?(bpeers)

Pulsebot

Comment 15

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dfa94a886779 Add a fast path for more gradient types in WR r=gw

Stefan Hindli [:stefan_hindli]

Comment 16

•

5 years ago

Backed out changeset dfa94a886779 (Bug 1624468) for causing web platform permafailures in /css/css-images/gradient-move-stops.html CLOSED TREE

https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=7b509941a4f552261a47e5b75ea755fd5e6b9ea4&searchStr=windows%2C10%2Cx64%2Cquantumrender%2Cdebug%2Cweb%2Cplatform%2Ctests%2Ctest-windows10-64-qr%2Fdebug-web-platform-tests-reftest-e10s-4%2Cw%28wr4%29&selectedJob=296015444

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296015444&repo=autoland&lineNumber=20944

Flags: needinfo?(bpeers)

Pulsebot

Comment 17

•

5 years ago

Backout by shindli@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/92912f17f2b8 Backed out changeset dfa94a886779 for causing web platform permafailures in /css/css-images/gradient-move-stops.html CLOSED TREE

Bert Peers [:bpeers]

Assignee

Comment 18

•

5 years ago

Right, so if I understand correctly, if the test expects failure then Windows says:
TEST-UNEXPECTED-PASS | /css/css-images/gradient-move-stops.html | Testing http://web-platform.test:8000/css/css-images/gradient-move-stops.html == http://web-platform.test:8000/css/css-images/gradient-move-stops-ref.html

but if I change the test to expect equality, Linux says:

TEST-UNEXPECTED-FAIL | /css/css-images/gradient-move-stops.html | Testing http://web-platform.test:8000/css/css-images/gradient-move-stops.html == http://web-platform.test:8000/css/css-images/gradient-move-stops-ref.html

Flags: needinfo?(bpeers)

Bert Peers [:bpeers]

Assignee

Comment 19

•

5 years ago

Attached file Bug 1624468 - Add a fast path for more gradient types in WR — Details

Add support for repeating gradients in the cached fast path.

Glenn Watson [:gw]

Reporter

Comment 20

•

5 years ago

(In reply to Bert Peers [:bpeers] from comment #11)

Hi Glenn, I'm curious on your thoughts for best bang-for-buck next steps?

For caching conic/radial gradients, we would lose quality from upsampling, more than from linear gradients. In particular the center might lose its sharpness, I'm not sure if that's acceptable.
Hard stops would be disallowed; adding them would mean multiple 2D images overlapping, with alpha cutouts.
So both are doable but is it a good idea vs. just relying on picture caching?

Likewise for linear gradients, removing the horizontal-vertical-only restriction would require baking a 2D texture, I think.
We could also rotate UVs in the image brush, but, it doesn't do so right now, and there'd be clamping overhead since the gradient is atlassed. That would seem a batch breaking change, and at some point that overhead must approach the original ALU shader, which isn't all that complex.

So I'm wondering if you had some ideas when you first wrote up this task. Thanks!

I think you're probably right - it's probably not worth looking into radial / conic gradients, or the other limitations at this point. In future, we might revisit if we come across some pages where these show up in profiles. Thanks for looking into this!

Flags: needinfo?(gwatson)

Bert Peers [:bpeers]

Assignee

Comment 21

•

5 years ago

•

Edited

Attached image timings.svg — Details

Thanks for the input Glenn. I think you're right, when we see a real-world case, it'll be easier to judge how much time to invest further into special cases.

I've got a green Try for the above 2 fails, but will hold off until after freeze to land.

Intel HD630 is happy with all the optimizations. GTX1050 agrees, except for the repeat in D69476 which shows a regression. Intel probably needs it more badly though and sees gains of up to 20%, so it's probably still valid? If we change our minds, it's easy enough to put back ExtendMode == Clamp in supports_caching, no need to rollback the whole patch.

Bert Peers [:bpeers]

Assignee

Comment 22

•

5 years ago

Attached image GTX1050 — Details

GTX1050

Bert Peers [:bpeers]

Assignee

Comment 23

•

5 years ago

Comment on attachment 9138161 [details] timings.svg HD630

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 24

•

5 years ago

== Change summary for alert #25534 (as of Thu, 02 Apr 2020 17:58:36 GMT) ==

Improvements:

7% Heap Unclassified windows10-64-shippable-qr opt 64,626,073.69 -> 59,793,242.54

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25534

Bert Peers [:bpeers]

Assignee

Comment 25

•

5 years ago

Attached file gradient_documentation_v1.html — Details

Repeating gradient Documentation version 1 as of April 6, 2020.

Pulsebot

Comment 26

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/376011a4881e Add a fast path for more gradient types in WR r=gw

Andrei Ciure[:aciure]

Comment 27

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/376011a4881e

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 28

•

5 years ago

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #24)

== Change summary for alert #25534 (as of Thu, 02 Apr 2020 17:58:36 GMT) ==

Improvements:

7% Heap Unclassified windows10-64-shippable-qr opt 64,626,073.69 -> 59,793,242.54

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25534

False alarm, sorry.

Pulsebot

Comment 29

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1b2acec83166 Add a fast path for more gradient types in WR r=gw

Stefan Hindli [:stefan_hindli]

Comment 30

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1b2acec83166

Nicolas Silva [:nical]

Comment 31

•

5 years ago

I think you're probably right - it's probably not worth looking into radial / conic gradients, or the other limitations at this point. In future, we might revisit if we come across some pages where these show up in profiles. Thanks for looking into this!

Let's close and revisit in a new bug if/when it makes sense.

Status: NEW → RESOLVED

Closed: 5 years ago

Keywords: leave-open

Resolution: --- → FIXED

Timothy Nikkel (:tnikkel)

Updated

•

4 years ago

Regressions: 1703141

You need to log in before you can comment on or make changes to this bug.