Closed Bug 1875295 Opened 1 year ago Closed 1 year ago

7.14 - 5.95% perf_reftest_singletons tiny-traversal-singleton.html / perf_reftest_singletons tiny-traversal-singleton.html (Linux) regression on Fri January 5 2024

Categories

(Core :: Panning and Zooming, defect)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr115 --- unaffected
firefox121 --- unaffected
firefox122 --- unaffected
firefox123 --- affected

People

(Reporter: aglavic, Unassigned)

References

(Regression)

Details

(4 keywords)

Perfherder has detected a talos performance regression from push 216834d50c953afb41fe8f7ad157ad7c1740912d. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
7% perf_reftest_singletons tiny-traversal-singleton.html linux1804-64-shippable-qr e10s fission stylo webrender 980.94 -> 1,050.98

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 41048

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(tnikkel)

Set release status flags based on info from the regressing bug 1872563

This is a bit odd, the regressing patch is a fairly straight forward simplification of a data structure. And it looks like the regression went away with bug 1649696 a little while later.

It does appear to be real and independent from bug 1649696 . (by backing out each bug separately and together)

But I don't understand why at all. There is supposed to be a 70ms regression here. The code changed by the regressing patch only happens during the paint phase. But in the entire profile of a run of this test there are only 2 ms in the painting phase!

Comparing profiles before/after there really isn't much difference either, looking at all phases, the numbers are about the same.

I tried limiting the change to WebRenderLayerScrollData only (skipping the HitTestingTreeNode changes), that shows the regression as well.

I tried changing the rect to a maybe<rect>, but that also shows the regression.

I also happened upon the Region serialization code, it just sends the rects of the region one by one, and then an empty rect sentinel. So switching from region to rect saves nothing in the common case of an empty region/rect, and saves one rect otherwise. Not as much of a win as I thought.

I've done a bunch more experiments.

If I remove one rect from WebRenderLayerScrollData it makes the score slightly worse.

If I add one rect, or one region, or ten rects, or ten regions to WebRenderLayerScrollData then the regression is fixed (all of these seemed to produce about the same score on this test).

I'm not sure what's going on here, but given these facts:

the code that was changed by the regressing patch is barely executed (max 2 ms vs the 70 ms regression)

increasing the size of the structure improves things, and decreasing the size of the structure makes things worse

the regression only appeared on one micro test on one platform

any further investigation would take more time and unlikely to lead to any kind of win that is worth that time

I think it makes sense to close this and leave the original patch in. Let me know if anyone has any other ideas.

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(tnikkel)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.