Open Bug 1479792 Opened 6 years ago Updated 22 days ago

Use ClientStorage for texture upload on Mac

Categories

(Core :: Graphics: WebRender, enhancement, P3)

63 Branch
enhancement

Tracking Status
firefox63 --- affected

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 3 open bugs, Blocks 2 open bugs)

Details

(Whiteboard: wr-planning)

The current upload path still seems pretty suboptimal.
Blocks: wr-mac
Priority: -- → P3

Here's a profile that shows some of the problems: https://perfht.ml/2XQPazL

Depends on: 1604546

Discussed this with Dzmitry a bit.

(In reply to Markus Stange [:mstange] from comment #1)

> Here's a profile that shows some of the problems: https://perfht.ml/2XQPazL

There are two steps to eliminating this copy:

  • First, we want to remove the glBufferSubData call. Rather than having the driver copy our data into the PBO, we want to have a mapped PBO and copy into the mapping ourselves. Bug 1602550 will help with this.
  • Then, we want to eliminate the copy by having the RenderBackend write data directly into the PBO mapping. This is bug 1604546.
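The two steps above can be sketched without any GL calls. This is only a model, under stated assumptions: a plain `Vec<u8>` stands in for the mapped PBO memory, and `MappedStaging`, `copy_in`, `reserve`, and `fill_pattern` are hypothetical names, not WebRender APIs. `copy_in` corresponds to the first step (we copy into the mapping ourselves) and `reserve` to the second (the producer writes straight into the mapping, so the intermediate copy disappears).

```rust
/// Stand-in for a mapped PBO. In real code this memory would come from
/// mapping a GL buffer; here it is an ordinary Vec so the sketch runs anywhere.
struct MappedStaging {
    bytes: Vec<u8>,
    cursor: usize,
}

impl MappedStaging {
    fn with_capacity(capacity: usize) -> Self {
        MappedStaging { bytes: vec![0; capacity], cursor: 0 }
    }

    /// Step 1: the caller already holds the pixels in its own buffer and
    /// copies them into the mapping (one CPU copy, no driver-side copy).
    /// Returns the offset of the copied data, or None if it does not fit.
    fn copy_in(&mut self, src: &[u8]) -> Option<usize> {
        let start = self.cursor;
        let end = start.checked_add(src.len())?;
        if end > self.bytes.len() {
            return None;
        }
        self.bytes[start..end].copy_from_slice(src);
        self.cursor = end;
        Some(start)
    }

    /// Step 2: hand out a slice of the mapping so the producer writes its
    /// output in place and no intermediate buffer exists at all.
    fn reserve(&mut self, len: usize) -> Option<(usize, &mut [u8])> {
        let start = self.cursor;
        let end = start.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        self.cursor = end;
        Some((start, &mut self.bytes[start..end]))
    }
}

/// Hypothetical producer, standing in for whatever generates the pixels.
fn fill_pattern(dst: &mut [u8], value: u8) {
    dst.fill(value);
}
```

Note that both methods fail when the mapping is too small, which is the "guess their sizes" problem that a mapped-buffer scheme has to deal with.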

At that point, texture upload with PBOs "should" be as efficient as using ClientStorage. (We should make a small test app to make sure that this is the case. The driver might not be doing exactly what we expect.)

However, adding a path for texture upload that uses ClientStorage textures would allow for the following:

  • It would mean that we wouldn't need to map buffers ahead of time and guess their sizes. ClientStorage allows us to make our own allocations without talking to GL.
  • It would probably encounter fewer bugs with GL drivers on macOS.

This is probably the path that gets taken for things like software-decoded video. We can improve this by having Gecko do the upload (which will be using ClientStorage from an existing code path) and then expose the texture to WebRender as an external-image-native-texture rather than as a bag of bytes. I'll file a separate bug about this. (I thought we already had one on it but I cannot find it at the moment.)

(In reply to Markus Stange [:mstange] from comment #2)

> This is probably the path that gets taken for things like software-decoded video. We can improve this by having Gecko do the upload (which will be using ClientStorage from an existing code path) and then expose the texture to WebRender as an external-image-native-texture rather than as a bag of bytes. I'll file a separate bug about this. (I thought we already had one on it but I cannot find it at the moment.)

I assume bug 1403618 is meant. (?)

Depends on: 1403618

YouTube VP9 video playback also shows the problem pretty obviously.

Whiteboard: wr-planning

As a first step toward comment 2, we should see if we can get a texture upload path that works well enough on Mac using PBOs. Using https://github.com/jrmuizel/client-storage-rs should make it easy to compare PBOs and client storage.

Related to bug 1640952, which is required to get the staging belt overhead close to that of client storage.

No longer blocks: wr-mac-nightly
Depends on: 1544050

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

> As a first step toward comment 2, we should see if we can get a texture upload path that works well enough on Mac using PBOs. Using https://github.com/jrmuizel/client-storage-rs should make it easy to compare PBOs and client storage.

I wrote a benchmarking script to try out all the different texture upload methods in this test app. The benchmark renders 300 frames with vsync off.

Results on 2019 MBP with Intel UHD Graphics 630:

['--apple-format', '--texture-storage', '--pbo', '1'] 8.89855162s, 8.831704057s, 8.85521106s
['--apple-format', '--texture-storage', '--pbo', '1', '--pbo-reallocate-buffer'] 8.923318468s, 8.902088411s, 8.919655329s
['--apple-format', '--texture-storage', '--pbo', '2'] 8.843013143s, 8.829951477s, 8.843549752s
['--apple-format', '--texture-storage', '--pbo', '2', '--pbo-reallocate-buffer'] 8.876591648s, 8.892376154s, 8.871327972s
['--apple-format', '--texture-storage', '--client-storage'] 7.124985015s, 7.289057942s, 7.229166325s
['--apple-format', '--texture-storage', '--texture-rectangle', '--pbo', '1'] 8.844125617s, 8.835762945s, 8.859816758s
['--apple-format', '--texture-storage', '--texture-rectangle', '--pbo', '1', '--pbo-reallocate-buffer'] 8.863563841s, 8.873639903s, 8.892736942s
['--apple-format', '--texture-storage', '--texture-rectangle', '--pbo', '2'] 8.784697434s, 8.820122089s, 8.831129245s
['--apple-format', '--texture-storage', '--texture-rectangle', '--pbo', '2', '--pbo-reallocate-buffer'] 8.88656608s, 8.873930213s, 8.888188097s
['--apple-format', '--texture-storage', '--texture-rectangle', '--client-storage'] 7.233205088s, 7.228350659s, 7.183014964s
['--apple-format', '--texture-storage', '--texture-array', '--pbo', '1'] 8.833866519s, 8.840146932s, 8.838348655s
['--apple-format', '--texture-storage', '--texture-array', '--pbo', '1', '--pbo-reallocate-buffer'] 8.860797364s, 9.043897408s, 8.887586905s
['--apple-format', '--texture-storage', '--texture-array', '--pbo', '2'] 8.831442529s, 8.860461698s, 8.851406145s
['--apple-format', '--texture-storage', '--texture-array', '--pbo', '2', '--pbo-reallocate-buffer'] 8.890968003s, 8.899123153s, 8.915493115s
['--apple-format', '--texture-storage', '--texture-array', '--client-storage'] 7.173057978s, 7.156567533s, 7.200522028s
['--apple-format', '--texture-rectangle', '--pbo', '1'] 4.70128211s, 4.695886643s, 4.698270348s
['--apple-format', '--texture-rectangle', '--pbo', '1', '--pbo-reallocate-buffer'] 7.770578027s, 7.686980041s, 7.750275808s
['--apple-format', '--texture-rectangle', '--pbo', '2'] 4.706645099s, 4.660831551s, 4.663864238s
['--apple-format', '--texture-rectangle', '--pbo', '2', '--pbo-reallocate-buffer'] 4.474235542s, 4.464127835s, 4.47519886s
['--apple-format', '--texture-rectangle', '--client-storage'] 3.228521601s, 3.225977283s, 3.234716105s
['--apple-format', '--texture-array', '--pbo', '1'] 4.693244955s, 4.697209266s, 4.697757732s
['--apple-format', '--texture-array', '--pbo', '1', '--pbo-reallocate-buffer'] 7.704128247s, 7.719084604s, 7.611890613s
['--apple-format', '--texture-array', '--pbo', '2'] 4.710451563s, 4.680535763s, 4.693366143s
['--apple-format', '--texture-array', '--pbo', '2', '--pbo-reallocate-buffer'] 4.465463219s, 4.470689369s, 4.474402022s
['--apple-format', '--texture-array', '--client-storage'] 4.459025465s, 4.461897402s, 4.484525335s
['--swizzle', '--texture-storage', '--pbo', '1'] 4.689885722s, 4.660584195s, 4.701891236s
['--swizzle', '--texture-storage', '--pbo', '1', '--pbo-reallocate-buffer'] 7.718375091s, 7.695965396s, 7.662461593s
['--swizzle', '--texture-storage', '--pbo', '2'] 4.719654608s, 4.686956754s, 4.688548537s
['--swizzle', '--texture-storage', '--pbo', '2', '--pbo-reallocate-buffer'] 4.466023294s, 4.459831265s, 4.465406599s
['--swizzle', '--texture-storage', '--client-storage'] 4.456106935s, 4.425488113s, 4.445511299s

No longer depends on: 1544050
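To make the timings in the benchmark comment easier to compare, here is a small sketch that averages the three runs of a configuration and converts the total into milliseconds per frame, using the 300 frames stated in that comment. The function names are illustrative and not part of the test app. For `--apple-format --texture-rectangle`, it gives a mean of about 4.70s for `--pbo 1` versus about 3.23s for `--client-storage`, so the PBO path takes roughly 1.45x as long in that configuration.

```rust
/// Frames rendered per benchmark run, as stated in the benchmark comment.
const FRAMES: f64 = 300.0;

/// Parse one timing such as "4.70128211s" into seconds.
fn parse_seconds(field: &str) -> Option<f64> {
    field.trim().strip_suffix('s')?.parse().ok()
}

/// Mean of the comma-separated timings that follow the flag list on a
/// result line. Returns None if any field is not of the form "<number>s".
fn mean_seconds(timings: &str) -> Option<f64> {
    let values: Vec<f64> = timings
        .split(',')
        .map(parse_seconds)
        .collect::<Option<Vec<f64>>>()?;
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

/// Average cost of one frame in milliseconds for a run of FRAMES frames.
fn ms_per_frame(total_seconds: f64) -> f64 {
    total_seconds / FRAMES * 1000.0
}
```

For example, `mean_seconds("3.228521601s, 3.225977283s, 3.234716105s")` can be passed to `ms_per_frame` to get the per-frame cost of the client storage run.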

We put together a "shortsighted" plan for what will happen next here:

The kinds of uploads that we need to deal with:

  1. imagelib - content process, also needs to respect alignment
  2. blobs - gpu process, needs pointer passing, lifetime is extended
  3. glyphs - gpu process (scene building), async and ahead of time, needs alignment
  4. video - lifetime issues, cross-process, can encapsulate the client storage

Complications:

  • lifetimes (client storage buffer needs to stay alive for the duration of the texture)
  • PBOs can only be allocated on the GL threads
  • PBOs can't work with shmems
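The lifetime complication can be illustrated with ownership. This is a minimal sketch, assuming hypothetical types `ClientBuffer` and `ClientStorageTexture` (neither exists in WebRender under these names) and no real GL: the texture handle holds a reference-counted pointer to its backing memory, so the memory cannot be freed while the texture is alive, even after the producer drops its own reference.

```rust
use std::sync::Arc;

/// CPU-side pixel memory that the driver would read from directly when
/// client storage is in use. Here it is just a Vec.
struct ClientBuffer {
    bytes: Vec<u8>,
}

/// Hypothetical texture handle. It keeps its backing memory alive for as
/// long as the texture itself exists.
struct ClientStorageTexture {
    id: u32,
    backing: Arc<ClientBuffer>,
}

impl ClientStorageTexture {
    /// In real code this is where the texture would be created from the
    /// buffer's pointer; this sketch only records the ownership.
    fn new(id: u32, backing: Arc<ClientBuffer>) -> Self {
        ClientStorageTexture { id, backing }
    }

    /// The memory the texture was created from.
    fn pixels(&self) -> &[u8] {
        &self.backing.bytes
    }
}
```

`Arc` is used rather than `Rc` so that the buffer can be allocated on one thread and handed to another, since client storage allocations do not need to happen on a GL thread.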

Current issues:

  • We're currently hitting zero-fill-on-demand when copying into the PBO
  • reallocating is bad
    • the driver searches the orphan lists
    • the driver zero-fills on demand
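As a CPU-side analogy for why reuse is preferable to reallocation, here is a minimal sketch of a pool that hands back previously released buffers instead of allocating fresh (and freshly zeroed) memory. `BufferPool` and its methods are hypothetical and say nothing about how a GL driver manages orphaned PBOs internally.

```rust
/// Keeps released staging buffers so later uploads can reuse memory that
/// has already been touched, instead of asking for new zeroed memory.
struct BufferPool {
    free: Vec<Vec<u8>>,
    fresh_allocations: usize,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Vec::new(), fresh_allocations: 0 }
    }

    /// Return a buffer of exactly `len` bytes, reusing a released buffer
    /// when one with enough capacity is available.
    fn acquire(&mut self, len: usize) -> Vec<u8> {
        if let Some(index) = self.free.iter().position(|b| b.capacity() >= len) {
            let mut buffer = self.free.swap_remove(index);
            // Capacity is sufficient, so this never reallocates.
            buffer.resize(len, 0);
            return buffer;
        }
        self.fresh_allocations += 1;
        vec![0; len]
    }

    /// Give a buffer back to the pool once its upload has completed.
    fn release(&mut self, buffer: Vec<u8>) {
        self.free.push(buffer);
    }
}
```

Counting `fresh_allocations` is one simple way to check whether memory is actually being reused across frames.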

Plan:

  • kvark: write a reduced test case for the AMD issue
  • miko: make software video decoding use the existing client storage infrastructure
  • jeff: look into why/if we are page faulting on a re-used orphan PBO
    • i.e., why isn't the memory being reused for blobs?

No longer blocks: wr-mac-block

Client storage upload for software-decoded video and canvas2d was implemented in bug 1536515.

No longer blocks: 1536515
Depends on: 1536515
Depends on: 1690682
Depends on: 1690685

I filed two bugs about using client storage in the texture cache:

Severity: normal → S3
Blocks: wr-todos