1843930 - Painting is more than 2x slower than Chrome on the Editor-CodeMirror subtest of speedometer3

Reporter

Description

•

2 years ago

https://speedometer-preview.netlify.app/?suite=Editor-CodeMirror

According to the comparison report, on Windows, Firefox spends 2.17x as much time painting during the measured time compared to Chrome. Eliminating the difference would improve our total time on this subtest by 2.6%.

Firefox: https://share.firefox.dev/3KmYW6z
Chrome: https://share.firefox.dev/3JZTvde

I noticed a few things in the profiles:

Retained Display List partial update appears to fail, because we spend just as much time doing the full rebuild as we spend in the partial update rebuild.
We hit fallback painting (WebRenderCommandBuilder::PushItemAsImage), probably for the Windows scrollbar arrows.
There's a lot of time spent in RetainedDisplayListBuilder::ComputeRebuildRegion. I think this page would probably be faster to paint with RDL turned off entirely.

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-452

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

2 years ago

URL: https://speedometer-preview.netlify.a...

Timothy Nikkel (:tnikkel)

Assignee

Comment 1

•

2 years ago

Hmm, I don't get PushItemAsImage in my profiles at all. I tested on macos and on Windows 10 (there are scrollbars with arrows visible). The only thing that shows up as a blob with gfx.webrender.debug.blob.paint-flashing on is the svg inside the buttons at the top.

Maybe you can share some more info about your setup so I can try to emulate that to reproduce?

Flags: needinfo?(mstange.moz)

Timothy Nikkel (:tnikkel)

Assignee

Comment 2

•

2 years ago

I think the profile is hitting this

https://searchfox.org/mozilla-central/rev/7a4c08f2c3a895c9dc064734ada320f920250c1f/widget/Theme.cpp#1317

but I don't see how I would not be hitting that. Maybe it's just so fast on my machine there are no samples in it?

Markus Stange [:mstange]

Reporter

Comment 3

•

2 years ago

That's possible. It is a rather small part of the profile. The profile was taken with ?iterationCount=100 and a profiling interval of 0.25ms.

Flags: needinfo?(mstange.moz)

Timothy Nikkel (:tnikkel)

Assignee

Comment 4

•

2 years ago

Seems like the major thing that makes ComputeRebuildRegion take so long is calling nsLayoutUtils::GetTransformToAncestor. It looks like we call it 3 separate times for the same frame. Two of them in the displayport getter, and a third when we transform the rect in ProcessFrameInternal. Calling it once and sharing it would be a good optimization.

Timothy Nikkel (:tnikkel)

Assignee

Comment 5

•

2 years ago

The long partial display list updates look to be when the syntax highlighting it turned on. We get ~300 modified frames, which is just below the limit of a partial update (500). They end up making the dirty rect be everything but the toolbar of buttons at the top, so we are rebuilding almost the whole thing anyway, so all the other retained stuff (merging, preprocessing etc) is just pure overhead.

It makes me think that a good heuristic to add might be to bail out of a partial update if the root dirty rect becomes most of the whole page. The root dirty rect doesn't tell the whole story because we can store an override dirty rect on every frame with a display port and every stacking context. So the root dirty rect could be large, but we immediately enter a stacking context or displayport with a very small rect. But I'm not sure if that consideration is important.

Timothy Nikkel (:tnikkel)

Assignee

Comment 6

•

2 years ago

And the scrollbar parts that we use fallback for, they aren't blobs, they are images because nsDisplayThemedBackground::MustPaintOnContentSide returns true. But I think since we are using the non-native theme we are just drawing using simple DrawTarget commands and not asking the OS for bitmaps, at least in the common cases, that we can just use blobs, which should be a small win.

Bob Hood [:bhood]

Updated

•

2 years ago

Assignee: nobody → tnikkel

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

2 years ago

Severity: -- → S3

Priority: -- → P2

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

2 years ago

Comment 7

•

1 year ago

(In reply to Timothy Nikkel (:tnikkel) from comment #4)

Seems like the major thing that makes ComputeRebuildRegion take so long is calling nsLayoutUtils::GetTransformToAncestor. It looks like we call it 3 separate times for the same frame. Two of them in the displayport getter, and a third when we transform the rect in ProcessFrameInternal. Calling it once and sharing it would be a good optimization.

Bug 1566826, comment 3 also notes a case where getting transform matrices is significant and caching would save us time.

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

1 year ago

Updated

•

1 year ago

Depends on: 1853712

Bugzilla

Painting is more than 2x slower than Chrome on the Editor-CodeMirror subtest of speedometer3

Categories

(Core :: Web Painting, defect, P2)

Tracking

()

People

(Reporter: mstange, Assigned: tnikkel)

References

(Blocks 1 open bug,
URL
)

Details

(Whiteboard: [sp3])

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Updated

Updated

Comment 7

Updated

Updated