Firefox spends more CPU time resolving style during Speedometer 3 TodoMVC-*-Complex-DOM "prepare" steps
Categories
(Core :: CSS Parsing and Computation, defect)
People
(Reporter: mstange, Unassigned, NeedInfo)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [sp3])
Bug 1926423 improved this a lot but there is still a very large difference in CPU time spent on resolving style on these tests.
During the "prepare" step of some of the TodoMVC Complex-DOM tests on sp3, we spend much more time doing Style things than Chrome.
This is still the biggest contributor to using more CPU time overall (bug 1925359).
You can run the Svelte test here: https://www.browserbench.org/Speedometer3.0/?suite=TodoMVC-Svelte-Complex-DOM&iterationCount=5
Profile of style::parallel::style_trees across all of sp3: https://share.firefox.dev/3C1dPdf
This profile shows around 60% in various sources of overhead:
- self.style_source.clone() during to_applicable_declaration_block
- Allocating boxed rule nodes in ensure_child
- Running the Arc<T> drop implementation in ensure_child, which calls is_static and does the refcount decrement
- StyleSource::read() (maybe in is_read_only_lock() but unclear) during update_for_node and insert_ordered_rules_with_important
and maybe 30% in what looks like useful work.
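As a rough illustration of the clone and drop items above, here is a minimal sketch of how cloning an Arc-backed handle once per matched rule turns into refcount traffic on shared memory. DeclarationBlock and StyleSource below are simplified, hypothetical stand-ins rather than the real style system types, and collect_applicable and refcount are names invented for this example.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a declaration block shared between rule nodes.
struct DeclarationBlock {
    #[allow(dead_code)]
    declarations: Vec<&'static str>,
}

// Hypothetical stand-in for a style source: a cheap handle to shared data.
#[derive(Clone)]
struct StyleSource(Arc<DeclarationBlock>);

// Produces one handle per matched rule. Every clone is an atomic increment
// on the counter stored next to the shared DeclarationBlock, and every
// later drop is an atomic decrement on that same counter.
fn collect_applicable(source: &StyleSource, matched_rules: usize) -> Vec<StyleSource> {
    (0..matched_rules).map(|_| source.clone()).collect()
}

// Reports how many handles currently point at the shared block.
fn refcount(source: &StyleSource) -> usize {
    Arc::strong_count(&source.0)
}
```

The work per clone is tiny, so if it dominates a profile, the likely cost is the first touch of the cache line holding the counter rather than the increment itself.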
Comment 1•4 days ago
Hmm, I'm a bit confused about the profiler, but:
self.style_source.clone() during to_applicable_declaration_block
This is not particularly avoidable, but also, how can this be? This code doesn't seem particularly crazy, and there's no way that's more expensive than the actual work we need to do for styling (if we hit that code-path, we're already doing a full selector-match...).
Is there any chance the profiler is somehow coalescing calls to Arc::clone or something?
Allocating boxed rule nodes in ensure_child
Hmm, there's no easy way those can be unboxed afaict, we can try to recycle them somehow I guess (not trivial)...
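For reference, the general idea of recycling boxed nodes can be sketched with a simple free list. This is only a single-threaded illustration under assumed names (RuleNode, NodePool, take, give_back are all hypothetical); the real rule tree is shared across threads, which is the part that makes this not trivial and which this sketch does not address.

```rust
// Hypothetical, heavily simplified rule node.
struct RuleNode {
    level: u8,
    children: Vec<usize>,
}

// A free list of previously allocated boxes that can be handed out again
// instead of going back to the allocator.
struct NodePool {
    free: Vec<Box<RuleNode>>,
    fresh_allocations: usize,
}

impl NodePool {
    fn new() -> Self {
        NodePool { free: Vec::new(), fresh_allocations: 0 }
    }

    // Reuses a recycled box when one is available, otherwise allocates.
    fn take(&mut self, level: u8) -> Box<RuleNode> {
        match self.free.pop() {
            Some(mut node) => {
                // Reset the recycled node so no stale state leaks through.
                node.level = level;
                node.children.clear();
                node
            }
            None => {
                self.fresh_allocations += 1;
                Box::new(RuleNode { level, children: Vec::new() })
            }
        }
    }

    // Returns a node to the pool instead of freeing it.
    fn give_back(&mut self, node: Box<RuleNode>) {
        self.free.push(node);
    }
}
```

Taking a node, giving it back, and taking another one results in a single heap allocation for the node itself.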
StyleSource::read() (maybe in is_read_only_lock() but unclear) during update_for_node and insert_ordered_rules_with_important
So, this one is trivial to test out, by commenting out this release assert or turning it into a debug_assert!.
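The difference between the two kinds of assertion can be sketched as follows. LockId, Guarded, and both read methods are hypothetical names for illustration, not the actual locking API in the style system. The point is only that assert! keeps its check (and the memory access it needs) in optimized builds, while debug_assert! is compiled out there by default.

```rust
// Hypothetical identifier for the lock that protects a piece of data.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LockId(u32);

// Hypothetical wrapper that remembers which lock owns the data.
struct Guarded<T> {
    owner: LockId,
    data: T,
}

impl<T> Guarded<T> {
    // The ownership check runs in every build, including release builds,
    // so the read of `self.owner` stays on the hot path.
    fn read_release_checked(&self, held: LockId) -> &T {
        assert!(self.owner == held, "read with a guard from a different lock");
        &self.data
    }

    // The ownership check only runs when debug assertions are enabled;
    // in a default release build it is removed entirely.
    fn read_debug_checked(&self, held: LockId) -> &T {
        debug_assert!(self.owner == held, "read with a guard from a different lock");
        &self.data
    }
}
```

Profiling a build with the second variant would show whether the time attributed to the check disappears or simply moves to the next access of the same data.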
This codepath is indeed very hot, but I'm a bit skeptical about this being the culprit, in the sense that the profiles seem to hint at places where we're memory bound (the first time that we touch the relevant ApplicableDeclarationBlock / DeclarationBlock).
I suspect that removing that memory access would just move the CPU time elsewhere? Markus, do you know how exactly this is getting measured?
Comment 2•3 days ago
Bug 1925335 is another where we see refcount-related things taking what seems like an unusually high amount of time on this same machine. Added to see also.