Closed
Bug 572200
Opened 15 years ago
Closed 13 years ago
Consider always caching reset structs on rulenodes when asked for them
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla19
Tracking | Status | |
---|---|---|
firefox18 | --- | fixed |
People
(Reporter: bzbarsky, Assigned: bzbarsky)
References
Details
Attachments
(5 files)
3.52 KB,
patch
|
Details | Diff | Splinter Review | |
7.59 KB,
patch
|
Details | Diff | Splinter Review | |
3.26 KB,
patch
|
bzbarsky
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
1.04 KB,
patch
|
bzbarsky
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
17.43 KB,
patch
|
bzbarsky
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
I used the attached patch to record which rulenodes had PropagateDependentBit called on them, on the assumption that we'd cache at those callsites. Then in ~nsRuleNode I classified rulenodes into 4 bins: either the rulenode has _some_ reset struct cached on itself or it does not, and either it's had its PropagateDependentBit member called or it has not.
The data varies slightly depending on the test load, but here are the numbers for our Tp4 test:
43471 had PropagateDependentBit but no cached data
62778 had cached data but not PropagateDependentBit
109515 had both
78368 had neither.
So looks like about 41% of rulenodes are saving the memory needed by a nsCachedResetData. 14% would stop doing so if we cached in PropagateDependentBit. The memory involved is 14 words; I'm still working on measuring the performance impact, if any.
If I look at actual dependent bits, not just the nodes that had PropagateDependentBit called on them, then a higher percentage of nodes has the bit but not cached data; apparently some rulenodes have dependent bits but never have style data gotten from them directly.
For inherit structs we'd almost certainly not want to do this. I didn't measure, but the chance of many rulenodes having at least one inherit struct is ... low.
![]() |
Assignee | |
Comment 1•15 years ago
|
||
I tried doing this, and it seems to be a 1% win or so, maybe, on the benchmark I was measuring on. I'll try to get more data, ideally with less noise...
![]() |
Assignee | |
Updated•15 years ago
|
Priority: -- → P1
![]() |
Assignee | |
Updated•15 years ago
|
Priority: P1 → P2
![]() |
Assignee | |
Comment 2•14 years ago
|
||
One other note. If we did this, so that the mCachedResetData would never be null for a rulenode, then we could also ensure that mCachedResetData is non-null for a style context by just pointing it at mRuleNode's mCachedResetData as needed.
We could further cache structs on the rulenode that asked for them, so that reset structs are treated in the rule tree just like inherit structs in the context tree: always cached right where you want them...
So the idea here is that you're proposing that we cache structs when we set dependent bits, possibly only on the requesting node rather than on all nodes on the path? It seems like a classic space vs. size tradeoff; I don't have an opinion on what the right thing is other than that we should measure the tradeoff.
![]() |
Assignee | |
Comment 4•14 years ago
|
||
> you're proposing that we cache structs when we set dependent bits
Yes, pretty much. And that if we do this for all nodes in that path, then we should consider not heap-allocating the mCachedResetData for rulenodes.
Any idea what would actually be useful to measure here?
![]() |
Assignee | |
Comment 5•14 years ago
|
||
I did do some measurements basically using the following testcase:
1) Load HTML5 single-page spec.
2) Change body.style.top.
3) Flush out restyles only (not layout). Measure the time this step takes.
Note that the vast majority of the time is reparenting style contexts, so there is no selector matching or anything involved; just recursing down in ReResolveStyleContext, creating the new style contexts, and the various Get/PeekStyleData involved, plus the CalcDifference calls.
On my machine a vanilla Mac build does this in 275ms or so.
If I implement the suggestion from this bug, that drops to 260ms.
If I then inline the fast path in nsRuleNode::GetStyle* (which is now always hit after the first time for that rulenode), it drops to 245ms.
The overall time for that testcase is about 23% recursive calls in ReResolveStyleContext (maybe we should pass along fewer arguments... if we can), 15% GetChildList calls on nsBlockFrame and nsContainerFrame (mostly the latter; see bug 645597), 6% the actual creation of new style contexts.
The places where time is spent getting styles are DidSetStyleContext (5% of total time after the changes listed above), CalcStyleDifference (20-25% of the time after said changes).
So there's no way we can get more than a 30% speedup on that testcase no matter what we do in this bug. I _think_ that testcase is the one where we call into the ruletree most in comparison to other things we're doing. I could be wrong, of course.
I find myself wishing for the ability to write self-modifying code so that after the first GetStyle* call and caching the data I can just self-modify away all the null-checks and gunk and have a fast-path that gets from the cache. I've clearly been spending too much time with the jits. :(
I wonder whether a better approach is reducing the number of style structs we have, such that we can stop lazily heap-allocating the storage for them. That might get us enough wins in terms of not needing null-checks plus better cache locality
![]() |
Assignee | |
Comment 6•14 years ago
|
||
To continue:
to do even better than the above numbers (if nothing else because there would be no cost then to caching everything on every rulenode, etc).
![]() |
Assignee | |
Comment 7•14 years ago
|
||
This could be an interesting project for a layout intern if we ever get one: measuring just what properties we actually care about how much and figuring out whether our current set of style structs is the right one...
![]() |
Assignee | |
Comment 8•14 years ago
|
||
So as a result of profiling the testcase in bug 780341 comment 56, I came back to this.
I wrote a somewhat similar patch to Boris's, also based on the idea of caching only on rule nodes that actually have a style context (though that's technically slightly different from Boris's idea of doing it on-request), rather than the intermediate rule nodes in the chain that aren't pointed to. Except I took a slightly different tack, and decided to (a) mark rule nodes that have a style context pointing to them and (b) maintain the invariant that any rule node that has a style context pointing to it *always* has any style structs in its mDependentBits cached, which allows completely eliminating nsRuleNode::GetParentData/GetParent##struct_ and removing the mDependentBits check in nsRuleNode::GetStyleData/GetStyle##struct_ around the call to it. (I'm also doing it for both reset and inherited structs; I'm not sure whether that's a good idea or not, though I expect it won't make much difference, and I think the uniformity is probably valuable.)
This ended up giving me a 14% improvement on the testcase, which is at least intended to have the same performance characteristics as the gaia homescreen, and thus basically be a real testcase (related to dynamic changes of transform). (This is substantially bigger, by percentage, than the 275ms to 260ms that Boris reported in comment 5; presumably the additional inlining that Boris used there to get to 245ms could help here as well.)
The patches I tested with are:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/rule-node-know-used-directly
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/rule-node-dont-destroy-structs
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/38d0e857c5a4/cache-on-used-rule-nodes
I think it's worth noting that this is likely to have the most performance improvement on testcases that are doing simple dynamic changse to non-inherited properties on elements with a lot of descendants that match a lot of style rules.
![]() |
Assignee | |
Comment 10•13 years ago
|
||
Hmm. So those patches don't give us an always-non-null mCachedResetData, right? So comment 2 won't really apply...
Even without that I think it's a good idea.
Some performance numbers:
without patches: https://tbpl.mozilla.org/?tree=Try&rev=1c8e1f919a6d
with patches: https://tbpl.mozilla.org/?tree=Try&rev=283945137774
![]() |
Assignee | |
Comment 12•13 years ago
|
||
I'm not seeing any obvious positive changes there based on http://perf.snarkfest.net/compare-talos/index.html?oldRevs=1c8e1f919a6d&newRev=283945137774&submit=true :(
Are any of the numbers a page loading performance test? None of them look like it to me. (Do we still run such a test?)
I was also interested in looking at memory numbers for the pageload tests.
![]() |
Assignee | |
Comment 14•13 years ago
|
||
> Are any of the numbers a page loading performance test?
Hrm. I don't know.... :(
![]() |
Assignee | |
Comment 15•13 years ago
|
||
The claim on irc is that tp5n_paint may or may not be the pageload time. And that compare-talos didn't get updated when people once again renamed tests....
On the assumption that tp5n_paint is the relevant pageloading performance test, and tp5n_main_rss_paint is the relvant pageloading memory test:
Roughly a 1% improvement on tp5n_paint Linux32:
Welch Two Sample t-test
data: c(366.71, 369.69, 368.22, 367.83, 368.06) and c(365.59, 366.56, 363.35, 361.9, 368.34)
t = 2.3839, df = 5.35, p-value = 0.05955
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1695921 6.0775921
sample estimates:
mean of x mean of y
368.102 365.148
Roughly a 2% improvement on tp5n_paint Linux64:
Welch Two Sample t-test
data: c(317.95, 323.61, 322.27, 322.78, 319.85) and c(314.7, 315.12, 316.87, 316.68, 313.85)
t = 4.8944, df = 6.264, p-value = 0.002417
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.953949 8.742051
sample estimates:
mean of x mean of y
321.292 315.444
Roughly a 1% increase on tp5n_main_rss_paint Linux32:
Welch Two Sample t-test
data: c(109.9, 109.9, 109.8, 109.6, 109.3) and c(110.7, 111.2, 114.9, 110.5, 110.3)
t = -2.1024, df = 4.141, p-value = 0.101
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.1915361 0.5515361
sample estimates:
mean of x mean of y
109.70 111.52
Much smaller increase on tp5n_main_rss_paint Linux64:
Welch Two Sample t-test
data: c(143.4, 143.2, 142.8, 142.7, 142.4) and c(143.3, 142.8, 143.3, 142.9, 143)
t = -0.7752, df = 6.388, p-value = 0.4659
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.6576939 0.3376939
sample estimates:
mean of x mean of y
142.90 143.06
Performance tests on additional platforms:
baseline: https://tbpl.mozilla.org/?tree=Try&rev=dedd3fa71549
with patches: https://tbpl.mozilla.org/?tree=Try&rev=88183b4e69e8
Some but not all of those results (before patches, then after):
OSX 10.6 tp5n_paint:
data: c(257.82, 258, 258.68, 258.84, 257.93) and c(255.72, 257.29, 256.31, 255.71, 257.52)
t = 3.993, df = 6.208, p-value = 0.006694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.6838871 2.8041129
sample estimates:
mean of x mean of y
258.254 256.510
OSX 10.6 tp5n_main_rss_paint:
data: c(179.2, 179.8, 179.2, 178.7, 179.2) and c(178.6, 177.9, 179.1, 178.1, 178.7)
t = 2.6702, df = 7.667, p-value = 0.02944
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.09608106 1.38391894
sample estimates:
mean of x mean of y
179.22 178.48
WINNT 6.1 tp5n_paint:
data: c(318.46, 318.38, 315.55, 316.99, 313.13) and c(314.06, 314.48, 314.6, 314.35, 312.33)
t = 2.3477, df = 5.366, p-value = 0.06225
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1849325 5.2609325
sample estimates:
mean of x mean of y
316.502 313.964
WINNT 6.1 tp5n_pbytes_paint:
data: c(119.7, 121.1, 121.4, 121.9, 119.9) and c(121.3, 120.7, 121, 120, 121.2)
t = -0.0819, df = 6.182, p-value = 0.9373
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.226767 1.146767
sample estimates:
mean of x mean of y
120.80 120.84
Attachment #678991 -
Flags: review?(bzbarsky)
Attachment #678992 -
Flags: review?(bzbarsky)
Attachment #678994 -
Flags: review?(bzbarsky)
![]() |
Assignee | |
Comment 22•13 years ago
|
||
Comment on attachment 678991 [details] [diff] [review]
, part 1: Make rule nodes know whether they're used directly by a style context.
r=me
Attachment #678991 -
Flags: review?(bzbarsky) → review+
![]() |
Assignee | |
Comment 23•13 years ago
|
||
Comment on attachment 678992 [details] [diff] [review]
, part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own.
r=me
Attachment #678992 -
Flags: review?(bzbarsky) → review+
![]() |
Assignee | |
Comment 24•13 years ago
|
||
Comment on attachment 678994 [details] [diff] [review]
, part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly.
I don't think allocation from the presshell arena is infallible yet. We should really fix that...
r=me modulo that.
Attachment #678994 -
Flags: review?(bzbarsky) → review+
Depends on: 809533
(In reply to Boris Zbarsky (:bz) from comment #24)
> Comment on attachment 678994 [details] [diff] [review]
> , part 3: Cache data that lives in the rule tree on every relevant rule
> node that has a style context directly pointing to it directly.
>
> I don't think allocation from the presshell arena is infallible yet. We
> should really fix that...
>
> r=me modulo that.
ok, fixing that in bug 809533.
Comment 27•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/bc2720708bc2
https://hg.mozilla.org/mozilla-central/rev/d5566569a5ce
https://hg.mozilla.org/mozilla-central/rev/074e0ab07f0b
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Comment on attachment 678991 [details] [diff] [review]
, part 1: Make rule nodes know whether they're used directly by a style context.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen.
Testing completed (on m-c, etc.): on mozilla-central
Risk to taking this patch (and alternatives if risky): Very low; just setting a bit on an an object.
String or UUID changes made by this patch: None
Attachment #678991 -
Flags: approval-mozilla-aurora?
Comment on attachment 678992 [details] [diff] [review]
, part 2: Allow rule nodes to cache data owned by an ancestor rule node: don't destroy structs that we don't own.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen.
Testing completed (on m-c, etc.): on mozilla-central
Risk to taking this patch (and alternatives if risky): Low; just tells destructor not to destroy objects that we don't own.
String or UUID changes made by this patch: None
Attachment #678992 -
Flags: approval-mozilla-aurora?
Comment on attachment 678994 [details] [diff] [review]
, part 3: Cache data that lives in the rule tree on every relevant rule node that has a style context directly pointing to it directly.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Lack of significant performance improvement to dynamic changes of non-inherited CSS properties on elements with large numbers of descendants, which is a signficant improvement for the B2G homescreen.
Testing completed (on m-c, etc.): on mozilla-central
Risk to taking this patch (and alternatives if risky): The risk from this patch is probably relatively low simply because the code is exercised so well by just about everything in the browser that we'd notice quickly if there were something wrong with it. That said, there's always the possibility that it has some sort of mistake that we don't catch quickly (e.g., a new rare crash, a new memory leak).
String or UUID changes made by this patch: None
Attachment #678994 -
Flags: approval-mozilla-aurora?
Comment 31•13 years ago
|
||
Comment on attachment 678991 [details] [diff] [review]
, part 1: Make rule nodes know whether they're used directly by a style context.
Approving on aurora as the risk is manageable and this is something that helps b2g
Attachment #678991 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•13 years ago
|
Attachment #678992 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•13 years ago
|
Attachment #678994 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/af8428e09956
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/44de69873585
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/dd0abb35994c
status-firefox18:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•