Closed Bug 1642495 Opened 5 years ago Closed 5 years ago

Lots of time spent in buffer_data_untyped|gleGetFreeOrphanNode on macOS

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: kvark)

References

(Blocks 3 open bugs)

Details

Attachments

(3 files)

Limit WebRender instance buffer sizes, 5 years ago, Dzmitry Malyshau [:kvark], 47 bytes, text/x-phabricator-request
Use the same usage hint in WebRender for one-time reset, 5 years ago, Dzmitry Malyshau [:kvark], 47 bytes, text/x-phabricator-request
Switch all WebRender HW-accelerated GPU cache updates to Scatter, 5 years ago, Dzmitry Malyshau [:kvark], 47 bytes, text/x-phabricator-request

Jeff Muizelaar [:jrmuizel]

Reporter

Description

•

5 years ago

https://share.firefox.dev/2ZXsHV1

This happened while scrolling a zoomed in hackernews page.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

Blocks: wr-mac, wr-perf

Timothy Nikkel (:tnikkel)

Updated

•

5 years ago

Blocks: desktop-zoom-release

Jessie [:jbonisteel] pls NI

Updated

•

5 years ago

Severity: -- → S3

Priority: -- → P3

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

Summary: Blocking in buffer_data_untyped on macOS → Lots of time spent in buffer_data_untyped|gleGetFreeOrphanNode on macOS

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 1

•

5 years ago

Looking at a profile from Instruments, it looks like we're spending a lot of time in a loop that traverses a hashtable/linked list in gleGetFreeOrphanNode.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 2

•

5 years ago

In 10.15 the buffer recycling logic looks something like:

if (size > 0x20000) {
   round size up to nearest 0x1000
} else {
   bucket into [0x1000, 0x2000, 0x4000, 0x8000, 0x10000, 0x20000]
}

It then does a hash lookup based on the bucketed size and looks for an orphaned buffer that matches.
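The size bucketing described above can be sketched in Rust as follows. This is only an illustration, not the driver's code: the name `bucketed_size` is hypothetical, and the sketch assumes that "bucket into" means picking the smallest listed bucket that can hold the requested size.

```rust
/// Bucket sizes listed in the comment above, in bytes.
const BUCKETS: [usize; 6] = [0x1000, 0x2000, 0x4000, 0x8000, 0x10000, 0x20000];

/// Hypothetical helper: returns the size class that would be used as the hash key.
fn bucketed_size(size: usize) -> usize {
    if size > 0x20000 {
        // Large requests: round up to the next multiple of 0x1000.
        (size + 0xfff) & !0xfff
    } else {
        // Assumption: small requests use the smallest bucket that fits them.
        BUCKETS
            .iter()
            .copied()
            .find(|&bucket| size <= bucket)
            .unwrap_or(0x20000)
    }
}
```

Under these assumptions, requests of very different sizes can land in the same size class, so the match on size alone is coarse and other properties of the orphan decide whether it is reused.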

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 3

•

5 years ago

For 72 calls to buffer_data in one frame, we go around the search loop 3426 times.

Dzmitry Malyshau [:kvark]

Assignee

Comment 4

•

5 years ago

We should have a define like MAX_INSTANCE_BUFFER_SIZE in renderer.rs, compute the maximum number of instances from it, and chunk the instance vectors accordingly. This is easy to do, cheap to do, and will make sure we never hit the bad case on macOS.
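A minimal sketch of that idea, assuming a limit of 0x20000 bytes (a value picked here only because it matches the bucketing threshold from comment 2, not taken from the patch). The `Instance` layout and the helper names are hypothetical.

```rust
use std::mem::size_of;

/// Assumed byte limit for one instance buffer (illustrative value only).
const MAX_INSTANCE_BUFFER_SIZE: usize = 0x20000;

/// Hypothetical instance layout: 16 floats, i.e. 64 bytes.
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct Instance {
    data: [f32; 16],
}

/// Maximum number of instances of type `T` that fit under the byte limit.
fn max_instances_per_buffer<T>() -> usize {
    // `.max(1)` guards against zero-sized types and oversized instances.
    (MAX_INSTANCE_BUFFER_SIZE / size_of::<T>().max(1)).max(1)
}

/// Splits an instance list into chunks that each fit in one buffer.
fn instance_chunks<T>(instances: &[T]) -> std::slice::Chunks<'_, T> {
    instances.chunks(max_instances_per_buffer::<T>())
}
```

Each chunk would then get its own buffer upload and draw call, so no single upload exceeds the limit.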

Dzmitry Malyshau [:kvark]

Assignee

Comment 5

•

5 years ago

Attached file: Limit WebRender instance buffer sizes

This is an attempt to improve our relationship with the drivers.
Currently, we work with the instance buffer limit dictated by the macOS drivers.
We can consider lowering it, since that will only make the driver's work easier.

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → dmalyshau

Status: NEW → ASSIGNED

Dzmitry Malyshau [:kvark]

Assignee

Comment 6

•

5 years ago

I have a feeling this isn't going to solve HN, since it's unlikely to be hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 7

•

5 years ago

I see a large number of draw calls when zooming. I suspect there might be some badness here where the time we spend looking for an orphaned buffer is proportional to the number of draw calls we're doing, because we end up with that many buffers.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 8

•

5 years ago

(In reply to Dzmitry Malyshau [:kvark] from comment #6)

I have a feeling this isn't going to solve HN, since it's unlikely hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?

Yeah, I think it's actually the small buffers causing the problem, not the big ones.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 9

•

5 years ago

It looks like the reason we don't find a match among the orphaned buffers is mostly that the usage doesn't match.
We're looking for GL_STREAM_DRAW and at least some of the orphaned buffers are GL_DYNAMIC_DRAW.
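To illustrate why a usage mismatch is expensive, here is a toy model of an orphan pool in Rust. Everything in it is an assumption made for illustration (the `OrphanPool` type, the per-bucket list, the linear scan); the real driver internals are not known beyond what is described in this bug.

```rust
use std::collections::HashMap;

/// Usage hints relevant to this bug (modelled as a plain enum).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Usage {
    StreamDraw,
    DynamicDraw,
}

/// Hypothetical orphaned buffer record.
struct Orphan {
    usage: Usage,
}

/// Toy pool: orphans grouped by bucketed size, plus a counter of scan steps.
#[derive(Default)]
struct OrphanPool {
    buckets: HashMap<usize, Vec<Orphan>>,
    steps: usize,
}

impl OrphanPool {
    /// Adds an orphaned buffer to the list for its size bucket.
    fn orphan(&mut self, bucket: usize, usage: Usage) {
        self.buckets.entry(bucket).or_default().push(Orphan { usage });
    }

    /// Scans the bucket's list for an orphan with the same usage.
    /// Returns true and removes it on a hit; a miss walks the whole list.
    fn take(&mut self, bucket: usize, usage: Usage) -> bool {
        let list = match self.buckets.get_mut(&bucket) {
            Some(list) => list,
            None => return false,
        };
        for i in 0..list.len() {
            self.steps += 1;
            if list[i].usage == usage {
                list.swap_remove(i);
                return true;
            }
        }
        false
    }
}
```

In this model, every request whose usage never matches pays for a full walk of the list, so the total cost grows with both the number of requests and the number of orphans.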

Dzmitry Malyshau [:kvark]

Assignee

Comment 10

•

5 years ago

Attached file: Use the same usage hint in WebRender for one-time reset

This should let the GL drivers re-use the PBOs more aggressively
and traverse the orphan list less.
FWIW, it doesn't look like ANGLE differentiates between StreamDraw and DynamicDraw:
https://searchfox.org/mozilla-central/rev/598e50d2c3cd81cd616654f16af811adceb08f9f/gfx/angle/checkout/src/libANGLE/renderer/d3d/BufferD3D.cpp#65-66

Dzmitry Malyshau [:kvark]

Assignee

Comment 11

•

5 years ago

Attached file: Switch all WebRender HW-accelerated GPU cache updates to Scatter

Scattered GPU updates use data transfers most efficiently, since
they need a single slice of a buffer to do all the updates per frame, instead
of uploading each small section of a row independently.
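The difference between the two paths can be sketched like this. The types and function names are hypothetical and much simpler than WebRender's actual GPU cache code; the sketch only shows how many small per-section uploads collapse into one flat batch of addresses and blocks per frame.

```rust
/// One GPU cache block: four f32 values (assumed layout for this sketch).
type Block = [f32; 4];

/// Hypothetical description of one dirty section of a cache row.
struct RowUpdate {
    row: u16,
    first_column: u16,
    blocks: Vec<Block>,
}

/// Row-based path: one upload per dirty section.
fn row_upload_count(updates: &[RowUpdate]) -> usize {
    updates.len()
}

/// Scatter path: flatten all sections into parallel lists of
/// destination addresses ([column, row]) and block data,
/// which can then be uploaded together once per frame.
fn build_scatter_batch(updates: &[RowUpdate]) -> (Vec<[u16; 2]>, Vec<Block>) {
    let mut addresses = Vec::new();
    let mut blocks = Vec::new();
    for update in updates {
        for (i, block) in update.blocks.iter().enumerate() {
            addresses.push([update.first_column + i as u16, update.row]);
            blocks.push(*block);
        }
    }
    (addresses, blocks)
}
```

With the scatter batch, the number of buffer uploads per frame stays constant regardless of how many row sections changed.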

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 12

•

5 years ago

•

Edited

10.15 also has a hard coded maximum combined size of orphaned buffers of 0x4000000 bytes (~67MB) and will free orphans until it gets below that max.
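A rough sketch of that trimming behaviour, assuming orphans are freed oldest-first (the eviction order is not stated here) and taking "below" literally as strictly less than the maximum. The function name and the queue of sizes are hypothetical.

```rust
use std::collections::VecDeque;

/// Hard-coded maximum combined size of orphaned buffers, in bytes.
const MAX_ORPHAN_BYTES: usize = 0x4000000;

/// Frees orphans (represented only by their sizes) until the combined
/// size is below the maximum. Returns how many orphans were freed.
fn trim_orphans(orphans: &mut VecDeque<usize>) -> usize {
    let mut total: usize = orphans.iter().sum();
    let mut freed = 0;
    while total >= MAX_ORPHAN_BYTES {
        // Assumption: the oldest orphan (front of the queue) goes first.
        match orphans.pop_front() {
            Some(size) => {
                total -= size;
                freed += 1;
            }
            None => break,
        }
    }
    freed
}
```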

Dzmitry Malyshau [:kvark]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Pulsebot

Comment 13

•

5 years ago

Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a9f33b1f23a0 Limit WebRender instance buffer sizes r=gw

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 14

•

5 years ago

The usage hint definitely improves things. gleGetFreeOrphanNode was the hottest function before and is much lower after.

Pulsebot

Comment 15

•

5 years ago

Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ec174e49592b Use the same usage hint in WebRender for one-time reset r=gw

Narcis Beleuzu [:NarcisB]

Comment 16

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a9f33b1f23a0

Bogdan Tara[:bogdan_tara | bogdant]

Comment 17

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ec174e49592b

Markus Stange [:mstange]

Comment 18

•

5 years ago

See also bug 1645716 for some more discussion on this problem.

Pulsebot

Comment 19

•

5 years ago

Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7abdf136d365 Switch all WebRender HW-accelerated GPU cache updates to Scatter r=gw

Narcis Beleuzu [:NarcisB]

Comment 20

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7abdf136d365

Dzmitry Malyshau [:kvark]

Assignee

Comment 21

•

5 years ago

I think the patches improved the situation. Ultimately, we need to move to a world where PBOs are allocated in fixed-size chunks, and in smaller numbers than today. This would make them trivially reusable (by either us or the driver) and put less stress on the user-mode driver.

Kris Taeleman (:ktaeleman)

Updated

•

5 years ago

Blocks: desktop-zoom-post
No longer blocks: desktop-zoom-release

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: wr-renderer-perf
No longer blocks: wr-perf

Kris Taeleman (:ktaeleman)

Comment 22

•

5 years ago

@kvark: can you close this bug and create a followup bug instead?

Flags: needinfo?(dmalyshau)

Dzmitry Malyshau [:kvark]

Assignee

Comment 23

•

5 years ago

That other bug is already on file -
https://bugzilla.mozilla.org/show_bug.cgi?id=1602550

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Flags: needinfo?(dmalyshau)

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Keywords: leave-open
