Closed Bug 1642495 Opened 5 years ago Closed 5 years ago

Lots of time spent in buffer_data_untyped|gleGetFreeOrphanNode on macOS

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: kvark)

References

(Blocks 3 open bugs)

Details

Attachments

(3 files)

https://share.firefox.dev/2ZXsHV1

This happened while scrolling a zoomed in hackernews page.

Blocks: wr-mac, wr-perf
Severity: -- → S3
Priority: -- → P3
Summary: Blocking in buffer_data_untyped on macOS → Lots of time spent in buffer_data_untyped|gleGetFreeOrphanNode on macOS

Looking at a profile from Instruments it looks like we're spending a lot of time looping traversing a hashtable/linked list in gleGetFreeOrphanNode

In 10.15 the buffer recycling logic looks something like:

if (size > 0x20000) {
   round size up to nearest 0x1000
} else {
   bucket into [0x1000, 0x2000, 0x4000, 0x8000, 0x10000, 0x20000]
}

do a hash lookup based on the bucketed size and look for an orphaned bucket that matches.

for 72 calls to buffer_data in one frame we go around the search loop 3426 times.

We should have a define like MAX_INSTANCE_BUFFER_SIZE in renderer.rs, compute the maximum number of instances from it, and chunk the instance vectors accordingly. This is easy to do, cheap to do, and will make sure we never hit the bad case on macOS.

This is an attempt to improve our relationship with the drivers.
Currently, we work with the instance buffer limit dictated by the macOS drivers.
We can consider lowering it, since it will only make the driver work eaiser.

Assignee: nobody → dmalyshau
Status: NEW → ASSIGNED

I have a feeling this isn't going to solve HN, since it's unlikely hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?

I see some large number of draw calls when zooming, I suspect there might be some badness here where the amount of time we spend looking for an orphaned buffer is proportional to the number of draw calls that we're doing because we end up with that many buffers.

(In reply to Dzmitry Malyshau [:kvark] from comment #6)

I have a feeling this isn't going to solve HN, since it's unlikely hitting the instance limit anyway.
Perhaps, we can test the build before going forward with it?

Yeah, I think it's actually the small buffers causing the problem not the big ones.

It looks like the reason we don't find a match with the orphaned buffers is because mostly because the usage doesn't match.
We're looking for GL_STREAM_DRAW and at least some of the orphaned buffers are GL_DYNAMIC_DRAW.

This should let the GL drivers to re-use the PBOs more aggressively,
and traverse the orphan list less.
Fwiw, it doesn't look like Angle differentiates between StreamDraw and DynamicDraw:
https://searchfox.org/mozilla-central/rev/598e50d2c3cd81cd616654f16af811adceb08f9f/gfx/angle/checkout/src/libANGLE/renderer/d3d/BufferD3D.cpp#65-66

scattered GPU updates use data transfers most efficiently, since
they need a single slice of a buffer to do all the updates per frame, instead
of uploading each small section of a row independently.

10.15 also has a hard coded maximum combined size of orphaned buffers of 0x4000000 bytes (~67MB) and will free orphans until it gets below that max.

Keywords: leave-open
Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a9f33b1f23a0 Limit WebRender instance buffer sizes r=gw

The usage hint definitely improves things. gleGetFreeOrphanNode is the hotest function before and is much lower after.

Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ec174e49592b Use the same usage hint in WebRender for one-time reset r=gw

See also bug 1645716 for some more discussion on this problem.

Pushed by dmalyshau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7abdf136d365 Switch all WebRender HW-accelerated GPU cache updates to Scatter r=gw

I think the patches improved the situation. Ultimately, we need to move to a world when PBOs are allocated in fixed-size chunks, and in less amount than today. This would make them trivially reusable (by either us or the driver), and stress the user-mode driver side less.

Blocks: wr-renderer-perf
No longer blocks: wr-perf

@kvark: can you close this bug and create a followup bug instead?

Flags: needinfo?(dmalyshau)

That other bug is already on file -
https://bugzilla.mozilla.org/show_bug.cgi?id=1602550

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(dmalyshau)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: