Lots of time spent in buffer_data_untyped|gleGetFreeOrphanNode on macOS
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
People
(Reporter: jrmuizel, Assigned: kvark)
References
(Blocks 3 open bugs)
Details
Attachments
(3 files)
https://share.firefox.dev/2ZXsHV1
This happened while scrolling a zoomed in hackernews page.
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 1•5 years ago
|
||
Looking at a profile from Instruments it looks like we're spending a lot of time looping traversing a hashtable/linked list in gleGetFreeOrphanNode
Reporter | ||
Comment 2•5 years ago
|
||
In 10.15 the buffer recycling logic looks something like:
if (size > 0x20000) {
round size up to nearest 0x1000
} else {
bucket into [0x1000, 0x2000, 0x4000, 0x8000, 0x10000, 0x20000]
}
do a hash lookup based on the bucketed size and look for an orphaned bucket that matches.
Reporter | ||
Comment 3•5 years ago
|
||
for 72 calls to buffer_data in one frame we go around the search loop 3426 times.
Assignee | ||
Comment 4•5 years ago
|
||
We should have a define like MAX_INSTANCE_BUFFER_SIZE
in renderer.rs
, compute the maximum number of instances from it, and chunk the instance vectors accordingly. This is easy to do, cheap to do, and will make sure we never hit the bad case on macOS.
Assignee | ||
Comment 5•5 years ago
|
||
This is an attempt to improve our relationship with the drivers.
Currently, we work with the instance buffer limit dictated by the macOS drivers.
We can consider lowering it, since it will only make the driver work eaiser.
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
I have a feeling this isn't going to solve HN, since it's unlikely hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?
Reporter | ||
Comment 7•5 years ago
|
||
I see some large number of draw calls when zooming, I suspect there might be some badness here where the amount of time we spend looking for an orphaned buffer is proportional to the number of draw calls that we're doing because we end up with that many buffers.
Reporter | ||
Comment 8•5 years ago
|
||
(In reply to Dzmitry Malyshau [:kvark] from comment #6)
I have a feeling this isn't going to solve HN, since it's unlikely hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?
Yeah, I think it's actually the small buffers causing the problem not the big ones.
Reporter | ||
Comment 9•5 years ago
|
||
It looks like the reason we don't find a match with the orphaned buffers is because mostly because the usage doesn't match.
We're looking for GL_STREAM_DRAW and at least some of the orphaned buffers are GL_DYNAMIC_DRAW.
Assignee | ||
Comment 10•5 years ago
|
||
This should let the GL drivers to re-use the PBOs more aggressively,
and traverse the orphan list less.
Fwiw, it doesn't look like Angle differentiates between StreamDraw and DynamicDraw:
https://searchfox.org/mozilla-central/rev/598e50d2c3cd81cd616654f16af811adceb08f9f/gfx/angle/checkout/src/libANGLE/renderer/d3d/BufferD3D.cpp#65-66
Assignee | ||
Comment 11•5 years ago
|
||
scattered GPU updates use data transfers most efficiently, since
they need a single slice of a buffer to do all the updates per frame, instead
of uploading each small section of a row independently.
Reporter | ||
Comment 12•5 years ago
•
|
||
10.15 also has a hard coded maximum combined size of orphaned buffers of 0x4000000 bytes (~67MB) and will free orphans until it gets below that max.
Assignee | ||
Updated•5 years ago
|
Comment 13•5 years ago
|
||
Reporter | ||
Comment 14•5 years ago
|
||
The usage hint definitely improves things. gleGetFreeOrphanNode is the hotest function before and is much lower after.
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
bugherder |
Comment 17•5 years ago
|
||
bugherder |
Comment 18•5 years ago
|
||
See also bug 1645716 for some more discussion on this problem.
Comment 19•5 years ago
|
||
Comment 20•5 years ago
|
||
bugherder |
Assignee | ||
Comment 21•5 years ago
|
||
I think the patches improved the situation. Ultimately, we need to move to a world when PBOs are allocated in fixed-size chunks, and in less amount than today. This would make them trivially reusable (by either us or the driver), and stress the user-mode driver side less.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 22•5 years ago
|
||
@kvark: can you close this bug and create a followup bug instead?
Assignee | ||
Comment 23•5 years ago
|
||
That other bug is already on file -
https://bugzilla.mozilla.org/show_bug.cgi?id=1602550
Updated•5 years ago
|
Description
•