Closed Bug 1539027 Opened 10 months ago Closed 5 months ago

Demo spends 150-200ms styling 1000 elements in Windows (lots of time spent in roundf in layout code)

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox68 --- affected

People

(Reporter: mayankleoboy1, Assigned: emilio)

References

Details

Attachments

(2 files)

Attached file rotate.html

Open the attachment, and run it

http://bit.ly/2UgiJf6

Spends a lot of time styling. Maybe we can be faster?

The demo is run by clicking on the " scroll with moztranform test " line at top left.

I have no idea how real world test is this. Feel free to WONTFIX.

bug 1495411 has a slightly different test case, which also is slow. But the time there is dominated by layer stuff.

Flags: needinfo?(emilio)
See Also: → 1495411

Yeah, the styling time is actually pretty good there, but we spend a ridiculous amount of time in DidSetComputedStyle.

For some reason, my profile looks pretty different: http://bit.ly/2TE8Cfq

Do you have any hint on why what may be causing the difference? Still there are a few wins to be had there... Making mAnonKids not allocate will help trim a bit here, making style refcounting a bit faster maybe will shave a bit more from that restyle.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

For some reason, my profile looks pretty different: http://bit.ly/2TE8Cfq

Do you have any hint on why what may be causing the difference?

Windows Vs Linux and the native theming/drawing of widgets?

Anyway, I took a new profile, with a fresh install and sequential styling : http://bit.ly/2UcveId

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Do you have any hint on why what may be causing the difference?

Or maybe a CPU difference? I have a mobile 2C/4T system, so less number of rayon threads to spin?

Hmm, not really, that should not be the issue, specially if you're using sequential styling for the profile.

Mike, do you know (or know anyone who would know) anything about roundf on windows being slow? The amount of roundf time spent in the profile in comment 3 and earlier is very worrisome... The fact that roundf gets compiled into a dll call just to do float rounding is worrisome on its own. Or maybe there's something wrong with the profiler.

Mayank, can you do something for me? Can you profile the same test-case in a build from before bug 1527410 landed (which introduced the relevant roundf call in this case I believe), and post the result here?

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mayankleoboy1)

That's more a question for David, but that looks a lot like it could be a cousin of https://bugs.llvm.org/show_bug.cgi?id=40541.

Flags: needinfo?(mh+mozilla) → needinfo?(dmajor)

I downloaded 24 January build from the mozregression tool (x64, opt)
Buildid : https://hg.mozilla.org/mozilla-central/rev/b55ddb97722a51b3365f1a72bdb416f028fd2073

Profile with sequential styling : http://bit.ly/2UbJXTX

(times in restyles look to be reduced to 99ms, instead of 160ms)

Flags: needinfo?(mayankleoboy1)

Note that in that new profile roundf is also a dll call.

Right, though the roundf calls are now in rust code as expected. I basically moved the calls from:

https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/third_party/rust/app_units/src/app_unit.rs#277

To here:

https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/layout/style/nsStyleCoord.h#67

So that's good, that means that the profiler isn't lying. It also means that that rust version also doesn't manage to completely optimize away the dll call, which means that this might not be a massive regression.

However it's still a pity to spend so much time rounding numbers, is there a way to not pay the function call cost? These callsites are pretty hot.

Tentatively P2-ing since this seems pretty bad, and my patches made it happen more often (in this case, though less in other) by moving the computation to happen later.

We should probably figure out a way to get this into a better state.

Priority: -- → P2
Summary: Demo spends 150-200ms styling 1000 elements → Demo spends 150-200ms styling 1000 elements in Windows (lots of time spent in roundf in layout code)

https://hg.mozilla.org/try/rev/d66a78adf5f5 is what I'm trying at the moment.

Mind profiling the builds at https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f162f2642893f8d205f2996d611caeeafe48e40 when they're ready?

And thanks so much for everything Mayank :)

Flags: needinfo?(emilio) → needinfo?(mayankleoboy1)
Flags: needinfo?(emilio)

quick question : do i download the pgo build, or the opt build?

downloaded the opt build :
https://hg.mozilla.org/try/rev/6f162f2642893f8d205f2996d611caeeafe48e40

http://bit.ly/2UcREsR with sequential styling
profile doesnt seem to be symbolicated properly.

LLVM doesn't manage to use any fast code for froundf, so use NSToIntRound
instead, which is probably what we should be using anyway for consistency.

Let's try that patch, we can reprofile once it's on a Nightly build. That patch should probably land regardless.

Assignee: nobody → emilio
Flags: needinfo?(emilio)
Keywords: leave-open

And this is the PGO profile, with sequential styling : http://bit.ly/2UgN1hy
this looks closer to the profile in comment 7

Flags: needinfo?(mayankleoboy1)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fb7fb6f72fb1
Use a hopefully faster rounding for float -> app units conversion. r=dholbert

I assume I'm no longer needed for this.

Flags: needinfo?(dmajor)
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.