Closed Bug 1683323 Opened 5 years ago Closed 5 years ago

2.2 - 9.55% Explicit Memory / Heap Unclassified / Resident Memory (linux1804-64-shippable-qr) regression on push 5589923a52c80ed09c8ce383afb0385cabc0ec18 (Tue December 15 2020)

Categories

(Core :: Graphics: WebRender, defect)

defect

Tracking

RESOLVED WONTFIX
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- unaffected
firefox84 --- unaffected
firefox85 --- unaffected
firefox86 --- affected

People

(Reporter: aesanu, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected an awsy performance regression from push 5589923a52c80ed09c8ce383afb0385cabc0ec18. As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Absolute values (old vs new)
10% Heap Unclassified linux1804-64-shippable-qr 257,021,375.60 -> 281,573,267.14
5% Explicit Memory linux1804-64-shippable-qr 517,304,866.81 -> 543,739,422.35
3% Resident Memory linux1804-64-shippable-qr 975,430,047.02 -> 1,004,053,983.23
2% Explicit Memory linux1804-64-shippable-qr 522,660,870.67 -> 534,168,924.52

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(gwatson)

If I understand correctly, the patch that caused this regression (pt 2 in that series) was landed, then backed out, and has re-landed on m-c now. So is there another perf alert that shows the heap usage regressing again? (Asking because pt 3 also landed yesterday but was backed out, and I want to make sure I'm investigating the right part of the patch.)

Flags: needinfo?(gwatson) → needinfo?(aesanu)

OK, I was able to repro locally and can see what is going on now.

Previously, what we tried to do was allocate the smallest possible set of render targets (in terms of size) that would allow WebRender to draw the render task graph. Then we'd keep these in a memory pool until a threshold was exceeded, and garbage collect them.

We try to reuse render targets from the pool on the next frame, because GPU driver allocations of render targets can be expensive and cause frame spikes. The problem with this scheme is that the nature of web content means the required sizes of render targets often change from frame to frame, so we spend a lot of time freeing old render targets and allocating new ones from the GPU driver. So, even though we're using a render target memory pool, the sizes often don't match, and we still end up seeing a lot of frame spikes due to driver memory allocations.
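For context, here is a rough Rust sketch of that kind of size-keyed pool with a garbage-collection threshold. The names (TargetPool, acquire, release, etc.) are illustrative stand-ins, not WebRender's actual types; the point is just to show why reuse fails when the requested sizes keep changing.

    // Hypothetical sketch of the old pooling scheme. Targets are reused only
    // when sizes match exactly, and the pool is trimmed once it grows past a
    // byte threshold.
    use std::collections::HashMap;

    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct TargetSize { width: i32, height: i32 }

    struct RenderTarget { size: TargetSize /* plus an opaque GPU handle */ }

    struct TargetPool {
        free: HashMap<TargetSize, Vec<RenderTarget>>,
        pool_bytes: usize,
        gc_threshold_bytes: usize,
    }

    impl TargetPool {
        // Reuse a pooled target with a matching size, or ask the GPU driver
        // for a new one (the expensive path that causes frame spikes).
        fn acquire(&mut self, size: TargetSize) -> RenderTarget {
            if let Some(target) = self.free.get_mut(&size).and_then(|v| v.pop()) {
                self.pool_bytes -= bytes_for(size);
                return target;
            }
            allocate_from_driver(size)
        }

        // Return a target to the pool; free everything back to the driver
        // once the pool exceeds the garbage-collection threshold.
        fn release(&mut self, target: RenderTarget) {
            self.pool_bytes += bytes_for(target.size);
            self.free.entry(target.size).or_default().push(target);
            if self.pool_bytes > self.gc_threshold_bytes {
                self.free.clear(); // simplified: real code would evict more selectively
                self.pool_bytes = 0;
            }
        }
    }

    fn bytes_for(size: TargetSize) -> usize {
        (size.width as usize) * (size.height as usize) * 4 // assume RGBA8
    }

    fn allocate_from_driver(size: TargetSize) -> RenderTarget {
        RenderTarget { size } // stand-in for a real GPU allocation
    }

If the sizes requested this frame don't match the sizes released last frame, acquire falls through to allocate_from_driver almost every time, which is exactly the spiky behaviour described above.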

Now, whenever we create a shared render target, we create it at a fixed size (2048 x 2048), and that is used for allocating render tasks. Once that 2k target is fully allocated, we then allocate another 2k x 2k target, and so on.

This has a couple of effects:

  • In some cases, especially on simple pages, the allocated memory can be higher as we've allocated targets (size 2k) larger than strictly needed.
  • We get much better reuse of render targets, so we reduce frame performance spikes because we don't need to alloc/free targets from the GPU driver.
  • Although the reported memory use in some cases is higher, from some testing it seems to be much more stable and as good or better on complex pages (since we don't have targets of unusual sizes retained in the memory pool until they are garbage collected).
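As a concrete illustration of the fixed-size scheme described above, here is a minimal Rust sketch. SharedTarget, the shelf packer, and allocate_task are hypothetical simplifications (WebRender uses a real texture allocator), but they show the key point: render tasks are packed into the existing fixed-size targets, and a new 2048 x 2048 target is only requested from the driver when all of them are full.

    // Minimal sketch of the fixed-size scheme (illustrative names, not the
    // actual WebRender types). Every shared target is 2048 x 2048.
    const SHARED_TARGET_SIZE: i32 = 2048;

    struct SharedTarget {
        // A very simple shelf packer, standing in for the real allocator.
        shelf_y: i32,  // top of the current shelf
        shelf_h: i32,  // height of the current shelf
        cursor_x: i32, // next free x position on the current shelf
    }

    impl SharedTarget {
        fn new() -> Self {
            SharedTarget { shelf_y: 0, shelf_h: 0, cursor_x: 0 }
        }

        // Try to place a w x h task; returns its origin within this target.
        fn allocate(&mut self, w: i32, h: i32) -> Option<(i32, i32)> {
            if self.cursor_x + w <= SHARED_TARGET_SIZE && h <= self.shelf_h {
                let origin = (self.cursor_x, self.shelf_y);
                self.cursor_x += w;
                return Some(origin);
            }
            // Open a new shelf below the current one if it fits.
            let new_y = self.shelf_y + self.shelf_h;
            if w <= SHARED_TARGET_SIZE && new_y + h <= SHARED_TARGET_SIZE {
                self.shelf_y = new_y;
                self.shelf_h = h;
                self.cursor_x = w;
                return Some((0, new_y));
            }
            None
        }
    }

    // Allocate a render task into the first shared target with room, creating
    // a new fixed-size target only when all existing ones are full.
    fn allocate_task(targets: &mut Vec<SharedTarget>, w: i32, h: i32) -> (usize, (i32, i32)) {
        for (i, target) in targets.iter_mut().enumerate() {
            if let Some(origin) = target.allocate(w, h) {
                return (i, origin);
            }
        }
        let mut fresh = SharedTarget::new();
        let origin = fresh.allocate(w, h).expect("task larger than a shared target");
        targets.push(fresh);
        (targets.len() - 1, origin)
    }

The trade-off in the first bullet falls out directly: a simple page whose tasks would fit in, say, a 512 x 512 target still pays for a full 2048 x 2048 allocation, but that same target can be reused unchanged on the next frame instead of being freed and reallocated by the driver.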

Given those findings, I think it's fine to close this as a WONTFIX. Does that sound reasonable to you?

10% jumps in heap-unclassified always make me nervous that there's something we should be measuring that we currently aren't. Is there a memory probe we need to add here?

Yes - I'm not sure exactly what's involved in doing that in WR, but it does sound like a good idea to ensure those allocations are included.

We don't have access to the allocation or pointer itself (they are allocated by the GPU driver and we have an opaque handle), which might complicate things?

We can calculate a rough estimate of what the GPU driver is going to allocate; maybe that's enough?
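For what it's worth, a rough estimate like that could be as simple as dimensions times bytes-per-pixel for the target's format, summed over live targets. A hypothetical sketch (the names and formats here are illustrative, not an actual WebRender memory-reporter API):

    // Since we only hold an opaque driver handle, approximate the driver-side
    // cost of a render target from its dimensions and format.
    enum TextureFormat { Rgba8, Rgba16F }

    fn bytes_per_pixel(format: TextureFormat) -> usize {
        match format {
            TextureFormat::Rgba8 => 4,
            TextureFormat::Rgba16F => 8,
        }
    }

    // Estimated GPU memory for one render target; summing this over all live
    // targets gives a figure a memory reporter could expose.
    fn estimated_target_bytes(width: usize, height: usize, format: TextureFormat) -> usize {
        width * height * bytes_per_pixel(format)
    }

It wouldn't capture driver padding or internal overhead, but it would at least move these allocations out of heap-unclassified-style blind spots.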

Flags: needinfo?(gwatson) → needinfo?(ryanvm)

I don't think I'm in a position to weigh in on that, sorry. I'll defer to the judgement of the gfx team :)

Flags: needinfo?(ryanvm)

Removing the needinfo since the patch that needs investigation was identified.

Flags: needinfo?(aesanu)

jrmuizel, given the details in [1] and the memory improvements in other areas from this patch series [2], are you OK with closing this as WONTFIX?

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1683323#c3
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1682365#c29

Flags: needinfo?(jmuizelaar)

Yep.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jmuizelaar)
Resolution: --- → WONTFIX
Has Regression Range: --- → yes