1599502 - A long time is spent in glBufferData during draw_instanced_batch (Intel)

Reporter

Description

•

5 years ago

In this profile of a 1080p VP9 60FPS video playing, we spend 13% of the non-idle time in glBufferData -> gleAcquireBufferData -> gleGetFreeOrphanNode: https://perfht.ml/2XQPazL

Markus Stange [:mstange]

Reporter

Comment 1

•

5 years ago

It's not clear to me which path from draw_instanced_batch to buffer_data_untyped is being taken here because there's a lot of inlining going on.

Dzmitry Malyshau [:kvark]

Comment 2

•

5 years ago

•

Edited

One idea I was thinking about earlier today is to create bigger buffers when uploading. That would make the driver do less allocation and internal buffer renaming, hopefully avoiding the associated slowness.

Here is what we do today:

FrameBuilder:
  - make vectors of instance data, one per batch
Renderer:
  - upload texture data
  - for each target
    - for each batch
      - create a buffer with the instance data for *this batch*
      - draw

Instead, we could do the following:

FrameBuilder:
  - make vectors of instance data, one per *type of* batch
  - an actual batch would then just contain a range into that instance vector
Renderer:
  - upload texture data
  - upload all batch data
  - for each target
    - for each batch
      - bind the relevant buffer (that is already on GPU)
      - draw with specified base instance

Aside from leaving the driver less work for managing the buffers (tracking, renaming, allocating), this approach also has the benefit of reducing our heap allocations. It also plays better with the Szeged fork.
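
To make the proposed layout concrete, here is a minimal sketch of the FrameBuilder side: one shared instance vector per batch type, with each batch holding only a range into it. All names (`BatchKind`, `Instance`, `Batch`, `FrameInstances`, `push_batch`) are hypothetical and do not correspond to actual WebRender types. The GPU upload and the draw call are only indicated in comments.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical batch types; the real renderer has its own set.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum BatchKind {
    Quad,
    Text,
}

/// Hypothetical per-instance payload.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Instance {
    data: [i32; 4],
}

/// A batch no longer owns its instances. It only records where they
/// live inside the shared vector for its kind.
#[derive(Debug, PartialEq)]
struct Batch {
    kind: BatchKind,
    instances: Range<usize>,
}

#[derive(Default)]
struct FrameInstances {
    per_kind: HashMap<BatchKind, Vec<Instance>>,
    batches: Vec<Batch>,
}

impl FrameInstances {
    /// Appends a batch's instances to the shared vector for `kind`
    /// and records the range they occupy.
    fn push_batch(&mut self, kind: BatchKind, instances: &[Instance]) {
        let shared = self.per_kind.entry(kind).or_default();
        let start = shared.len();
        shared.extend_from_slice(instances);
        let end = shared.len();
        self.batches.push(Batch { kind, instances: start..end });
    }
}

fn main() {
    let inst = |n: i32| Instance { data: [n; 4] };
    let mut frame = FrameInstances::default();
    frame.push_batch(BatchKind::Quad, &[inst(1), inst(2)]);
    frame.push_batch(BatchKind::Text, &[inst(3)]);
    frame.push_batch(BatchKind::Quad, &[inst(4)]);

    // Renderer side (not implemented here): upload each vector in
    // `frame.per_kind` once, then issue one draw per batch using the
    // range start as the base instance.
    for batch in &frame.batches {
        println!(
            "{:?}: base_instance={} count={}",
            batch.kind,
            batch.instances.start,
            batch.instances.len()
        );
    }
}
```

With this layout the number of buffer uploads per frame scales with the number of batch types rather than the number of batches. Drawing with a base instance assumes the target API supports it (in desktop GL that would be something like glDrawElementsInstancedBaseInstance); where it does not, the range start would have to be applied some other way.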

Aria Beingessner [:Gankra]

Updated

•

5 years ago

Blocks: wr-73

Keywords: perf

Priority: -- → P3

Aria Beingessner [:Gankra]

Comment 3

•

5 years ago

dropping wr-73 on the assumption that this is mac-specific

No longer blocks: wr-73

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: texture-upload-perf

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: wr-intel

Summary: A long time is spent in glBufferData during draw_instanced_batch on my Intel GPU → A long time is spent in glBufferData during draw_instanced_batch on my Intel GPU on Mac

Jeff Muizelaar [:jrmuizel]

Updated

•

5 years ago

Blocks: 1576637

Nicolas Silva [:nical]

Comment 4

•

5 years ago

We are seeing issues with that on non-mac platforms as well.

Summary: A long time is spent in glBufferData during draw_instanced_batch on my Intel GPU on Mac → A long time is spent in glBufferData during draw_instanced_batch (Intel)

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: wr-perf-p1

Kris Taeleman (:ktaeleman)

Comment 5

•

5 years ago

@Markus: Could you check if this is still occurring?

Flags: needinfo?(mstange.moz)

Dzmitry Malyshau [:kvark]

Comment 6

•

4 years ago

https://phabricator.services.mozilla.com/D102333 is implementing the instance data consolidation, which reduces the number of PBOs we create for the instance data. The last try push with artifacts is https://treeherder.mozilla.org/jobs?repo=try&revision=c8330a8863a258f68e3b77c3aba8917007e41653 . Where is this reproducible, exactly? If I can't find a good repro case, I'd have to ask one of you guys to test an artifact from this build.

Flags: needinfo?(nical.bugzilla)

Markus Stange [:mstange]

Reporter

Comment 7

•

4 years ago

(In reply to Kris Taeleman (:ktaeleman) from comment #5)

@Markus: Could you check if this is still occurring?

I haven't noticed it recently... but I also don't currently get driver symbols in my profiles (bug 1683758), and it's an issue that gets worse over time as we accumulate orphaned PBOs. I'm not sure how to reproduce it. The 1080p video case from comment 0 no longer reproduces it on macOS because those videos are now handled in the native compositor.

Flags: needinfo?(mstange.moz)

Dzmitry Malyshau [:kvark]

Comment 8

•

4 years ago

•

Edited

I looked at a profile of the Element web client just scrolling back and forth on mac with Intel 550. The CPU timings for draw_instanced_batch take about 1%-1.5% of total time, while the number of draw calls is within 50-100. This doesn't reproduce the issue, so I'm unable to optimize it here.

Edit: after playing with the profile some more, I see a total of 6% of the time spent in drawing batches.

Nicolas Silva [:nical]

Comment 9

•

4 years ago

From a quick profile of scrolling Element on Linux + nvidia with proprietary drivers I see ~14% of the renderer frame time spent in glBufferData under draw_instanced_batch. Note that the total render time was pretty good, so even if 14% is a somewhat significant portion, it's a portion of something small and I'm not overly worried.

No longer blocks: wr-perf-p1

Flags: needinfo?(nical.bugzilla)

Dzmitry Malyshau [:kvark]

Comment 10

•

4 years ago

Nicolas, are these numbers with or without the change I linked? If the time is already OK and doesn't need fixing, is there any reason to keep the issue open?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 11

•

4 years ago

These numbers are without the change (just an official nightly build). I just checked on the kangax compatibility table which has loads of primitives and is a bit heavier on the renderer thread (around 16ms Renderer::update on average) https://kangax.github.io/compat-table/es6/

On Linux+Intel, the time spent in glBufferData is 17% of frame building vs 40% spent in glDrawElementsInstanced.
I'm giving this number because I have a Linux box handy, but some Intel+Windows numbers would probably be more useful (Linux tends to do better on average for renderer times).

I think that things used to be worse, and picture caching has helped a lot by papering over driver overhead in common cases. I wouldn't say that this is very high priority, but if you think there is significant room for improvement and the fruit is hanging low enough, we can keep it open.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Updated

•

4 years ago

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → FIXED

Categories

(Core :: Graphics: WebRender, defect, P3)

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Blocks 4 open bugs)

Details

(Keywords: perf)

Crash Data

Security

(public)
