Closed Bug 43457 Opened 24 years ago Closed 24 years ago

MEM: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: attinasi, Assigned: pierre)

References

(Blocks 1 open bug)

Details

(Keywords: embed, perf, Whiteboard: [whitebox])

Attachments

(21 files)

1.27 KB, text/plain
Details
175.62 KB, patch
Details | Diff | Splinter Review
1.02 KB, text/plain
Details
255.84 KB, patch
Details | Diff | Splinter Review
137.10 KB, text/plain
Details
25.98 KB, text/plain
Details
8.65 KB, text/plain
Details
137.20 KB, text/plain
Details
252.90 KB, patch
Details | Diff | Splinter Review
257.70 KB, patch
Details | Diff | Splinter Review
15.74 KB, image/gif
Details
252.95 KB, patch
Details | Diff | Splinter Review
11.40 KB, text/plain
Details
6.31 KB, patch
Details | Diff | Splinter Review
281.25 KB, patch
Details | Diff | Splinter Review
11.96 KB, patch
Details | Diff | Splinter Review
15.10 KB, image/gif
Details
14.43 KB, image/gif
Details
15.10 KB, image/gif
Details
207.66 KB, patch
Details | Diff | Splinter Review
5.93 KB, patch
Details | Diff | Splinter Review
Style Context Data sharing currently saves a lot of memory at the expense of a moderate performance hit. We need to improve the sharing design to minimize the performance hit and to increase the space savings. Some initial ideas include: - Allocate StyleContextData out of a Pool or Arena to minimize heap allocations - Share style data across documents: presumably this means storing the style context cache (or a style data cache) at a more global scope than the style set currently used to root the collection. - Enable the FastCaching algorithm: this requires closing the encapsulation-hole of GetMutableStyleData and having the style system manage changes to style data effectively (updating the CRC, updating the style context cache, possibly validating the style data is consistent) This bug is intended to track ideas and log results of experimentation and prototyping of designs that improve style context sharing in terms of performance and memory footprint (including bloat).
Added depends on bug 39618 since there is valuable information in that bug that pertains to this one.
Status: NEW → ASSIGNED
Depends on: 39618
Target Milestone: --- → M20
Adding myself to the Cc list...
Adding perf keyword
Keywords: perf
Moving all non-nsbeta3 bugs to future milestone: these will be worked on after beta3/rtm.
Target Milestone: M20 → Future
adding myself to interest list
Keywords: embed
Setting milestone. Adding additional idea too: Sharing is now all or nothing (all style data for a context is shared or not). A finer granularity of sharing, where each style struct can be shared seperately, would increase the sharing potential.
Target Milestone: Future → mozilla0.8
I broke up nsStyleSpacing into 4 smaller structs (nsStyleMargin, nsStylePadding, nsStyleBorder and nsStyleOutline). It is the first step to reduce the memory footprint as described in news://news.mozilla.org/3A5A072F.5F0B752D%40netscape.com
Attached file list of modified files
I have reviewed the code and I have only a couple of minor questions/comments: Zeroth, I'm amazed at the amount of changes, and the lack of problems - well done! First, I noticed that nsButtonFrameRenderer uses two methods for getting the borderAndPadding, one is to get border and padding individually and then combine them, and the other is to use the new shortcut eStyleStruct_BorderAndPaddingShortcut. Is this intentional? I would have thought that the shortcut would be the way to go in all cases. Secondly, I started seeing this #ifdef XP_MAC \ pragma mark - \ #endif construct with no documentation: what is it, and should it be removed or commented? Third, it looks like nsCSSRendering has an 'overloaded' argument type now, since there are casts of aBorderStyle to nsStyleOutline. This is seriously ugly, and should probably be changed to pass in both types and check for one being null. At the very least, document the hackery. Fourth, I saw several examples where there is a call like 'this->CalcBorderPadding(...)'. Why bother with the 'this->'? It is equivalent to 'CalcBorderPadding(...)'. See the nsGfxCheckboxControlFrame.cpp diff for the first example of this construct. Fifth, nsHTMLReflowState.h introduced forward declarations for nsStyleBorder,Margin, and Padding, but none was there previously for nsStyleSpacing. I'm curious why we need the forward declarations now. r=attinasi, assuming these points are addressed, especially point (3).
1) Fixed. 2) "#pragma mark -" puts separators in the list of functions of the Mac editor. You can see them in other particularly long files that are often modified by Mac folks. 3) Correct, I made uglier something that was already there. It's fixed. 4) Fixed, but this notation can be found in many places. 5) There *was* a forward declaration of nsStyleSpacing. You comments allowed me to make a global search on nsStyleSpacing and to remove it entirely, including in comments.
Performance regression caused by this checkin is in bug 66263
Important Comment from bug 66263 ------ Additional Comments From Axel Hecht 2001-01-24 23:24 ------- This checkin has caused a regression in the XSLT code, which caused some worries to both Peter and me over the last few days. The backing out made our life happy again. To be specific, we couldn't style a xhtml body tag anymore. If code like the one backed out should go back in, could you (pierre?) please speak out load, so we can test in advance? Thanx Axel
Moving to Mozilla 1.0 - this cannot be completed in the next week or two...
Target Milestone: mozilla0.8 → mozilla1.0
I tried to apply this patch (to test XSLT), to no avail. I got around the whitespace mess, but the filenames are just not found. I need a commandline to apply this on unix, I tried all the commandline opts I know. Btw, XSLT is targeted to go into the default builds for 0.9, if this patch is the code that caused us to regress this will be pretty serious. Axel
Is the performance problems solved now?
It seems that the patches have been checked in on Feb 7. Does XSLT work now? This also caused some troubles for us trying to break content from layout, but I believe peterv has solved them for that case(?)
XSLT didn't regress this time. Axel
From an email about some profiling data: Brendan Eich wrote: - 27% of this time (9msec) is spent in 70 calls to nsXBLBinding::GenerateAnonymousContent(). About half of this time (5msec) goes to nsXMLElement::CloneNode() (copying the binding). - 20% of the time (17msec) is spent in StyleSetImpl::ResolveStyleFor(). - Most of the time (9msec) is spent in 160 calls to NS_NewStyleContext(). The single biggest consumer is AccumlateCRC() (1msec). 1msec is not big in light of other numbers, but it's still a lot of cycles. Marc, wouldn't a rotating xor a la http://lxr.mozilla.org/mozilla/source/js/src/jsdhash.c#74 be sufficient? I don't see the need for full-on Galois field arithmetic here! I'm all for making the CRC calculation faster, however we do need to make sure that we still get a good ration of unique CRCs to total CRCs. In fact, I bet we can also limit the CRC calculation to only select fields int he StyleContextData structures, and still get most of the benefit. Me thinks that this is worth looking into - being carefule to track the CRC uniqueness.
I will soon have another big checkin. Reassigned to myself. Set milestone to mozilla0.8.1 (March 13th).
Assignee: attinasi → pierre
Status: ASSIGNED → NEW
Priority: P3 → P2
Summary: PERF: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation → MEM: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation
Target Milestone: mozilla1.0 → mozilla0.8.1
Moving to mozilla0.9
Target Milestone: mozilla0.8.1 → mozilla0.9
Blocks: 71874
I've got something that apparently works nicely. The StyleData, which accounts for about half the memory used by the style system, is reduced by 65% or 70%. The overall reduction for the style system is around 35%, more on complex pages. The reduction can reach a couple of hundreds K on pages like Netscape.com. Apparently again, the performance wasn't hurt much, if at all. I'll post more accurate data later. I'm going to attach the patch for other people to start play with it.
Attached file list of modified files
For those who want to review the code or test the diffs, the basic principles are the following: 1) Change of interfaces to modify the style data: GetMutableStyleData() was replaced by 2 methods: ReadMutableStyleData() and WriteMutableStyleData(). Look into nsIMutableStyleContext.h for more info on the helper classes and the implementation. 2) Memory footprint reduction: In nsStyleContext.cpp, the nsStyleContextData now contains pointers to the 15 individual style structures, instead of the structures themselves. When a pointer is null, it means that we are inheriting the data from the parent. 3) Simplification of the code: The 15 pointers in nsStyleContextData are stored as a union of individual pointers and an array of generic pointers. The array allows to loop through the structures instead of duplicating the code 15 times in switch() statements. This simplification required a slightly unfortunate change in the definition of nsStyleStruct (in nsStyleStruct.h) where we now expose methods that should be hidden from everybody but the StyleContext. It's a small drawback for a big gain.
I'm trying to build with the latest patch, and I get mozilla/content/base/src/nsStyleContext.cpp:4218:35: nsStyleContextDebug.cpp: No such file or directory Did you miss a file in the diff?
In nsStyleContext.cpp, methods StyleDisplayImpl::SetFrom and StyleDisplayImpl::CopyTo, you haven't made the same change within the #ifdef IBMBIDI block that you made in the surrounding code. Another tiny point: at the very beginning of the same file, there is a regression from "Bidi" to "BiDi" in a comment. Please change this back to "Bidi" to retain consistency.
Thanks to Mike and Simon for pointing out these errors. I'm going to attach the files themselves instead of the diffs. Note that the debug features in nsStyleContextDebug.cpp/.h can only be activated on the Mac for now. Someone has to adapt IsTimeToDumpDebugData() for other platforms.
Attached file nsStyleContext.cpp
Attached file nsStyleContextDebug.h
nsStyleContextDebug.cpp and nsStyleContextDebug.h should be stored in the same directory as nsStyleContext.cpp (mozilla/content/base/src)
Adding myself to the list.
Rather than exposing some methods, couldn't you make StyleContext a friend of nsStyleStruct ? (Didn't look at the patch, so dunno if this makes sense.)
If you broke the XUL stuff, you'll know just by running the product. If a navigator window and a prefs window comes up and looks ok, then the XUL struct must work.
Heikki: I could have declared the methods private and make StyleContextImpl and nsStyleContextData friends of nsStyleStruct but it seemed to me that it wasn't much of an improvement. It would also require to move some debugging code into a wrapper class and declare it as friend of nsStyleStruct.
Dave & others: Here is yet another version of nsStyleContext.cpp that fixes some obvious problems with XUL...
Blocks: 72381
Pierre, what testing have you done on this? Specifically, have you run the style/block/table regression tests? Have you run John Morrison's Load Time tests? Just curious.
I did not test against any test suites yet. Please check Block & Tables as they require a Window machine and I'll take care of the CSS test suites. The tests I did until now, and that I just completed, where about memory footprint and performance in Debug and Optimized builds, before and after my changes. As you can see, we have a slowdown of around 5% as a trade-off to reduce the memory by 30% to 40% in the Style System overall, which represents about 80k for Yahoo, 160k for CNN and Netscape and 275k for MSN. The savings for the chrome are a bit disappointing, around 15%, which represents 64k for Navigator and 100k for Messenger. The percentage is so low in the chrome because most of the memory allocated for the style system in XUL files is taken by CSSDeclarations and StyleRules. The XUL files generate between 2 and 2.5 times the average number of declarations and style rules that we find in web pages. Slowdown/DBG(%) Slowdown/OPT(%) Footprint(%) Footprint(Kb) Yahoo 2 2 -30 -83 Slashdot 4 6 -35 -140 CNN 5 0 -31 -158 Netscape 1 2 -35 -170 MSN 3 5 -40 -275 Navigator.xul n/a n/a -14 -64 Messenger.xul n/a n/a -18 -100 CSSDeclarationImpl CSSStyleRuleImpl Yahoo 332 582 Slashdot 339 584 CNN 466 710 Netscape 380 625 MSN 402 644 Navigator.xul 928 1389 Messenger.xul 1002 1457
Moving to correct QA Contact- here ya go Ian!
QA Contact: ckritzer → ian
I see no r= or sr=, but understand from pierre that marc has not tested the patch (or perhaps neither tested nor reviewed it? The message forwarded to me isn't as clear as an r= in the bug). Has anyone tried the patch on Linux? Should these changes carpool in, to get verification and followup fixes if needed? Today is not a good day to avoid a carpool. /be
I posted a request for review last Monday but got basically no response. My partner Marc is way too busy fighting other fires in Layout. Could someone please help me to review these changes? I would especially appreciate if someone could build and play a bit with Mozilla on Unix. Thanks!
Sounds like testing is more important than reviewing at this stage. Pierre, I can try an up-to-date patch -- should I just detach the diffs and the whole files and use them, or are they now slightly out of date with respect to top of trunk? Cc'ing jrgm, who should run the patch too, if he's able. /be
To build it, I need (?): attach_id=28072 -- diffs to layout/content, but remove the nsStyleContext*.(h|cpp) stuff attach_id=28118 -- nsStyleContextDebug.cpp attach_id=28119 -- nsStyleContextDebug.h attach_id=28142 -- nsStyleContext.cpp (updated version) where the last three are whole files, and go in mozilla/content/base/src. Is that correct? I will do this on win32, as my linux box is (currently) the test server, and my mac build-fu is weak. However, if someone can, running the gecko perf tests (which are more specific to this bug); and the block/table/layout regression tests would be good if people are concerned.
I haven't build in the past 4 days but according to Bonsai & CVS, it should still be ok. To merge the changes: - apply the diffs - add nsStyleContextDebug.cpp/.h to mozilla/content/base/src - overwrite nsStyleContext.cpp
sorry. I didn't get to this quite yet, but I will do it by 3pm tomorrow.
once we get a build of some sort, we can coordinate to run the block/table/layout regression tests which Pierre wasn't able to. I'll get this coordinate with the cia@netscape.com group. Thanks.
I checked in nsStyleContextDebug.cpp/.h and updated my local tree. I'm going to attach a new set of diffs in a single file. To try the new code, just do an update and apply the patch.
Sorry, I finally got these run done. As pierre mentioned, the times are about 4% slower, on either the first load (variable range), or a subsequent visit when cached (fairly consistent across the board). (This was for a tree pulled ~11pm last night, built, ran test, patched, built, ran test 2, on win2k).
Hrm, that's a pretty steep price to pay. What were the savings again?
jrgm, still got that build? can you provide a pointer to it? how 'bout we get curt to to a special mem test run to look at that graph?
Yes, I still have that build on win2k. It's not on a shared drive, but I can make it available (eek! virus). But, those memory tests run on linux, don't they? Are you really looking for a linux build (which waterson made).
I can potentially do a round of builds on three platforms for this once we get scc's string changes tested and in.
I have a fix for the two regressions detected by QA in the test builds that were made last week. I sent Leaf the new diffs this morning in order to generate new test builds. I'm going to attach them there too.
People started to inquire about performance lately. Here is a copy... Chris Waterson wrote: > > Asa Dotzler wrote: > > > It looks like Pierre is working to get the Style Context Sharing changes > > landed for Mozilla 0.9. drivers is looking for input on this one. Do > > you think it's something we want to do. Without further objection I > > suspect that they are going to try to land it this week. I'm getting > > ready to start waving off major landings until after 0.9 real soon now > > but this could make it in sooner than that. > > > --Asa > > Pierre, have you fixed the performance problems yet? If so, then let's > take the changes if you can land them by tomorrow. > > If not, then I think these should be waved off for mozilla-0.9. We're > simply out of time to fix the performance problems. (I'd originally > endorsed ``just landing the changes'' because the landing date was early > last week.) > > My $0.02. > > chris It was never in my plans to fix the 4% performance drop before landing because nobody objected that the trade-off, memory footprint vs performance, was not very much acceptable. Phil inquired about the perf yesterday. Here is my response (you are mentioned in it, Chris). It also describes two ways to improve the speed a bit: one which consists in tuning up the memory allocators (sfraser usually does that on the Mac, I don't know for other platforms), another in adding arenas. As for checking in, I've been waiting for the Build team to make some test builds since Thursday morning. They promised them for sometime today, Tuesday. QA needs to do a last round of verifications, then I'll checkin. When doing a first round of verification 10 days ago, QA found 2 regressions. It's fixed now so if you like I can skip this second round of tests and just checkin. Pierre Pierre Saslawsky wrote: > > Phil Peterson wrote: > > > Chris and Pierre, I heard some talk today that the CSS footprint work > > comes with some performance degradation. Is this the case? If so, could > > we talk sometime tomorrow about the details of how much heap space is > > saved and how much performance is lost? I'd like to understand this > > before the work gets checked in. > > > > -- Phil. > > Basically, we save 70% of the StyleData which represents half the footprint > of the style system. It means that we save 30% to 40% of the entire memory > used by the style system. > > The savings represent approximately: > > * 80k on Yahoo > * 160k on Netscape and CNN > * 275k on MSN > * 64k per empty Navigator window (no sidebar) > * 100k per empty Messenger window (no lists, just the chrome) > > I originally estimated the slowdown at less than 5%. John Morrison > confirmed that it is in fact 4%. Note that this degradation has been > measured without doing any optimization at all. > > Chris Waterson had the same question and concern as you: > > ------- Additional Comments From Chris Waterson 2001-03-27 23:59 > ------- > Hrm, that's a pretty steep price to pay. What were the savings > again? > > When I gave him the numbers, his response was: > > Chris Waterson wrote: > > > > Wow, that's great. > > > > chris > > Anyhow, the easiest way to improve the performance a bit would be to tune > up the memory allocators to reserve larger pools of blocks that match the > individual style structures (we no longer allocate the StyleData as a > single block of 800 bytes but as 14 or 15 smaller blocks that correspond to > the style structures like font, margin, color etc...). Tuning up the > memory allocators is something we should always do before shipping a > version anyhow (and that we did, afaik). It would be a good idea to do it > after I checkin as the pattern of the block sizes we allocate across the > app would change quite a bit for at least 5 of them (out of the 15). > > Another way to improve the performance would be to get these style > structures out of arenas. It involves a little bit of work. I haven't > looked into it but at first sight it's something that can be done in a few > days. > > Below are the comments and chart from John Morrison: > > ------- Additional Comments From John Morrison 2001-03-27 21:35 > ------- > Sorry, I finally got these run done. As pierre mentioned, the > times are > about 4% slower, on either the first load (variable range), or a > subsequent > visit when cached (fairly consistent across the board). (This was > for a > tree pulled ~11pm last night, built, ran test, patched, built, > ran test 2, > on win2k). > > More details are at http://bugzilla.mozilla.org/show_bug.cgi?id=43457 > > Pierre > > > [Image] > > >
A point to consider is that when I landed the first part of these changes back in February, I did a 1-line fix that improved the overall performance by 5%. Could we consider the two parts of my changes as a whole, and say that a 5% improvement minus a 4% slowdown is still a 1% improvement? Another idea: why don't we land my changes and disable the style context sharing? Mine reduce the StyleData by 30 to 40%, the style context sharing by 15 to 40%. Mine cause a 4% performance drop across the board, the style context sharing a 10% performance drop inside the style system (http://bugzilla.mozilla.org/ showattachment.cgi?attach_id=10050). We need to figure out what was the overall performance degradation of the style context sharing across the app. It was enabled them on December 14th 2000 according to the bug report. Was there any page load measurements taken back then?
Pierre has an interesting idea here, about disabling style context sharing when we enable his style data sharing. However, it is not clear what the savings would be for style data sharing if style context sharing is disabled - it may not yield the same level of memory benefit. Also, style context sharing is actually faster than what I had originally measured. This is largely due to the faster StyleContextCache being put in, which I believe resulted in only a 5% slowdown for Viewer (and even less for SeaMonkey, reported anecdotally). I'd love to see what the performance cost of style conext sharing is for SeaMonkey, and now we have the tools to do that (eg. jrgm). I suggest running tests to characterize the performance and memory gains/losses across two dimensions: With and Without context sharing With and Without data sharing If we see that the footprint gains due to style context sharing alone are less than the gains due to style data sharing alone, and the performance gain from disabling style context sharing make style data sharing's performance loss acceptable (or zero), then we should turn off context sharing and turn on style data sharing. I think this is a good approach because unlike the style context sharing can be enabled or disabled with a #define in one file, so it ie easy to keep in the tree while we work on performance issues in style data sharing. (I tried to make this more confusing, but this is as obfuscated as I could get it...)
[mid-air collision - resubmittig - however, I noted that marc raised similar points to mine] BTW, there hasn't been much comments about the interaction of your "StyleData sharing" and the previous "StyleContext sharing". Could you brief us a little bit on what is happening on this front? Is there any inter-dependence? When both are used, do your changes entail an extra overhead in the implementation of the style context sharing? If both are used, does that lead to "double" (or some other factor) extra savings. Do your "low-level" styledata changes provide an independent alternative that aptly removes the need for the most expensive "higher-level" style-context sharing? [I am just asking these questions as an indication of the things that some might find useful to understand. Thanks]
StyleContext and StyleData sharing are independant. I'll try to get some numbers today.
Here is the first batch of numbers... In terms of memory footprint, the StyleData sharing represents an improvement of 7% to 21% over the StyleContext sharing. I compared the overall size used by the style system. If we compare just the style data, the difference is twice as much. I collected the numbers from 3 test pages only: Yahoo, Netscape and MSN. Now I have to compare the performance. That's where I fear some problems because we have much more memory allocations with the StyleData sharing. We'll see... It will take a couple of hours: I have to make some optimized builds. ================================================================ Count TotalSize MinSize MaxSize AvgSize ================================================================ Yahoo - StyleContext sharing only StyleContextData 177 135228 764 764 764 StyleContextImpl 295 20954 56 110 71 *** Total *** 2459 291325 ---------------------------------------------------------------- Yahoo - StyleData sharing only StyleContextData 293 74164 68 884 253 StyleContextImpl 293 20810 56 110 71 *** Total *** 2573 230053 ================================================================ Netscape - StyleContext sharing only StyleContextData 369 281916 764 764 764 StyleContextImpl 1035 72902 56 138 70 *** Total *** 3613 508247 ---------------------------------------------------------------- Netscape - StyleData sharing only StyleContextData 1035 247012 68 884 238 StyleContextImpl 1035 72902 56 138 70 *** Total *** 4279 473279 ================================================================ MSN - StyleContext sharing only StyleContextData 587 448468 764 764 764 StyleContextImpl 1418 105952 56 110 74 *** Total *** 4324 716391 ---------------------------------------------------------------- MSN - StyleData sharing only StyleContextData 1411 319808 68 884 226 StyleContextImpl 1411 105464 56 110 74 *** Total *** 5141 587179 ================================================================ Results: Yahoo Data/Context = 230053/291325 = 79% or -21% Netscape Data/Context = 473279/508247 = 93% or -7% MSN Data/Context = 587179/716391 = 82% or -18%
A quick food for thought: if the style data are shared (i.e., using pointers to the actual values), then may be the style context sharing logic can be re-worked a little bit so as to determine the sharing based on these shared data pointers. This way, no cost will be incurred in accummulating the CRC with the actual values. Seems like combining the two in this way might be of some help in the -4% perf problem.
It's already the way it works now: the CRC is not computed for these style data that are inherited from the parent. But, wait, I've got some good news. Here is the second batch of numbers... In terms of performance, the StyleData sharing represents an improvement of 2% to 4% over the StyleContext sharing. I compared the overall performance of the application. If we compare just the time spent inside the style system, the difference should be at least 3 times higher. I collected the numbers from 2 test pages only: Yahoo and CNN. I could not test Netscape and MSN like before because of a regression that was in the tree when I checked out last week, which causes some pages not to post a notification when the page is fully loaded. It appears that the removal of CRC calculations and table lookup that we have in the StyleContext sharing has offset the larger number of memory allocations in the StyleData sharing. It's important to note that in my version, the nsStyleContextData is still allocated separately from the StyleContextImpl. If I put it this structure back into the StyleContextImpl, we'll allocate larger blocks overall but half as many. This could represent an additional improvement in performance. I'll see if I can work something out relatively quickly. ================================================================== Yahoo - StyleContext sharing only 814+812+817+817+814+815+815+815+811/9 = 814 ms ------------------------------------------------------------------ Yahoo - StyleData sharing only 801+792+794+797+793+796+796+799+796/9 = 796 ms ================================================================== CNN - StyleContext sharing only 1315+1301+1363+1275+1320+1309+1262+1277+1274+1301/10 = 1300 ms ------------------------------------------------------------------ CNN - StyleData sharing only 1238+1242+1241+1246+1254+1260+1257+1260+1244+1256/10 = 1250 ms ================================================================== Results: Yahoo StyleData/StyleContext = 796/814 = 97.8% or -2.2% CNN StyleData/StyleContext = 1250/1300 = 96.2% or -3.8% Note: In both cases, I dropped 1 out of the 10 measures taken with Yahoo because it was clearly out of bounds (something like 900ms instead of 800).
Conclusion: Without any tune-up of any kind, the StyleData sharing represents improvements over the StyleContext sharing of: 7% to 21% in terms of memory footprint in the style system overall, 2% to 4% in terms of performance across the application. So... may I checkin now? I would feel more comfortable with another round verifications but that would require making it a priority for the Build team to generate some builds on Win32 and Unix, and for QA to play with them a little bit. Otherwise if the time is running short, I can just checkin. After all, that's the way everybody else proceeds. Who wants to sign on this?
Assume the following situation: - you build your own mozilla - you have plenty of memory - you want to tune for speed instead of size Will there be a way to turn off these two things (style data sharing / style context sharing)? Or is the performance loss irreversible?
From pierre analysis, it appears that StyleData sharing is faster and more memory efficient than StyleContext sharing. This can be understood that there is usually more chance for two things (style contexts) to share a common "subpart" (style data) than to be identical as a whole. Plus, there is a cost in determining if sharing is possible in the first place. And it appears that this determination is less expensive with StyleData sharing that with StyleContext sharing. So pierre, what is your call, do you recommend to enable StyleData sharing and disable StyleContext sharing by default?
Sure, it's smaller and faster. I'm fairly confident it won't break anything because we already went through a first round of QA verifications. But a second one wouldn't hurt either. I'll let the drivers decide how to proceed because it would require the Build and QA team to spend some extra time on it. And of course, I'll be glad to check in. I don't like to have the code sitting in my local tree. If it's checked in, it would allow me or others to start playing with an idea I had earlier this week. I'll post it here when I can get access to my email account again, or maybe I'll open a related bug report. Basically it consists in storing the style data into hash tables. It would be a killer in terms of memory footprint but there are some risks in terms of performance. It's worth doing some instrumentation and get some numbers at least. After moz0.9, of course.
To answer Andreas's question, I think it is out of the question to disable both optimizations. According to Marc in bug 39618, the StyleContexts + StyleData on CNN used to take about 1Mb. With either method we can divide it by 3.
linux bits in http://ftp.mozilla.org/pub/mozilla/nightly/latest-43457/ windows builds are running now, and will appear in the same directory.
I tried out the win32-talback drop (in fact I am sending this comment with it). No weirdness to report so far. Yep, pierre, now that instrumentation has shown an improved perf, your call makes sense. Indeed, the style data sharing is "inclusive" of the style context sharing (i.e., the very fact that two style contexts are identical means that all their style data are indentical and shared). By single-mindely focussing on style data sharing, maybe more consolidation and efficiency can be implemented in due time before 1.0.
So pierre is measuring a speedup, yet at one point I thought Netcsape QA measured a slowdown (I'm guessing with jrgm's test suite). Pierre, I presume you've only tried this on Mac? Might other platforms be affected differently? (e.g., because malloc() is more expensive on Win32/Linux due to lock contention?) It'd be good to undesrtand why we're seeing different reports. (Maybe the delta is just too small to realisitically resolve with our tools, in which case, performance should be considered a non-issue.) Out of curiosity, what do you see on a larger pages? (e.g., nsCSSFrameConstructor.cpp in LXR, tinderbox, table stress test.) Do we know the impact of this stuff on chrome? (Menus, tree, hover, tooltips, etc.) Maybe hyatt could help you assess that. Are you still looking for r= and sr=? Or are those covered?
Apparently this is the first time it is tested in isolation (without SC sharing)
Chris, - It is indeed the first time StyleData sharing is tested independently from StyleContext sharing. - I tested on the Mac only. - Results may vary depending on your platform but if experience is of any indication, they should really not vary by a lot. For instance, when I predicted a slowdown lower than 5% for SC + SD sharing on the Mac, jrgm found around 4% on other platforms. - I will test larger pages later today. - I don't know the impact on the chrome but if experience is of any indication, it will be good. - There has not been any formal review because _you_ and other people declined to reply to my repeated requests. So with Brendan's suggestion, I went through the process of a landing, with verification builds handed over to QA before checking in. I also supported Marc's standpoint that for these changes a thorough testing would be much more important that a formal review. Anyhow, if you'd like to review now, please be my guest: the diffs are attached. If you want the exact code, I can attach new diffs where StyleContext sharing is disabled.
Were jrgm's tests run using optimized builds with these changes, or debug? Was the comparison made to a debug or optimized build? Could that account for the performance hit?
pierre, I apologize if you felt that I ignored you. (IIRC, when you asked me to sr=, I replied, asking if Marc liked the changes. You said something along the lines of ``let's just get a Unix build to work first''.) If you'd still like me to sr= these changes, attach the latest diffs to the bug and I'll go through them.
Leaf, all the performance measurements are done with optimized bits. The debug builds are at least 30% slower, often much more than that. Chris, thanks for your offer. I'll attach the diffs within the next hour or so. Look at my comments from [2001-03-18 19:30] for more info.
So to make sure I understand, the current plan is to disable style context sharing and enable style data sharing.
Yes, the plan is to disable StyleContext sharing and use StyleData sharing instead because StyleData sharing results in a smaller memory footprint and a better performance. We'll address the performance problem and re-enable StyleContext sharing later. The benefit of having StyleContext sharing together with StyleData sharing is that both optimizations work separately. StyleData sharing reduces the memory used by the nsStyleContextData by 70%, while StyleContext sharing reduces it by 60% or so. The two reductions are almost cumulative in real life tests. Chris: I'm going to attach new diffs with StyleData sharing only. I had to do a complete update (meaning: you shouldn't have any problem to apply the patch).
Pierre, I'm confused by the new patch (30514)... Are you just removing the #define SHARE_STYLECONTEXTS or removing the context sharing code? For example, what happended to the GETSCDATA macros? Like I said, I'm confused.
Never mind - I see what is going on. Sorry for the noise.
I think you should talk to scc about the nsMutableStyleXXX stuff. I'd bet he could help you design a solution that avoids the C-style casts, and is therefore type-safe. As it stands, you're forcing a programmer to circumvent the compiler's type-checking mechanisms to use this stuff! Personally, I'd have overridden |operator->()| and |operator*()|; e.g., class nsMutableStyleBorder { public: nsMutableStyleBorder(nsIMutableStyleContext*) {...} nsMutableStyleBorder(nsIStyleContext*) {...} nsStyleBorder* operator->() { return NS_STATIC_CAST(nsStyleBorder*, mStyleStruct); } const nsStyleBorder* operator->() const { return NS_STATIC_CAST(const nsStyleBorder*, mStyleStruct); } nsStyleBorder& operator*() { return *NS_STATIC_CAST(nsStyleBorder*, mStyleStruct); } const nsStyleBorder& operator*() const { return *NS_STATIC_CAST(const nsStyleBorder*, mStyleString); } }; s.t. a user could do: nsMutableStyleBorder styleBorder(mStyleContext); styleBorder->mTextAlign = /*whatever*/; MapHTMLBorderStyle(aPresContext, *styleBorder, tableFrame); It might even read more nicely as nsMutableStyleBorder styleBorder = GetStyleBorderFrom(mStyleContext); Kinda makes it more clear that you're extracting something from the style context, rather than ``constructing'' something. (Maybe not. Take it or leave it.) You could probably do this with templates and typedefs with a lot less typing, too. (But that's really scc's domain...) +#ifdef XP_MAC +#pragma mark - +#endif What does this do? struct nsStyleStruct { + // These declarations should not be here but they make + // the code in nsStyleContext.cpp so much easier!... + // It allows to store the style data as a union of individual + // data structures and an array of generic data structures. + virtual void ResetFrom(const nsStyleStruct* aParent, nsIPresContext* aPresContext) {}; + virtual void CopyTo(nsStyleStruct* aDest) const {}; + virtual PRInt32 CalcDifference(const nsStyleStruct* aOther) const {return 0;}; + virtual PRUint32 ComputeCRC32(PRUint32 aCrc) const {return 0;}; }; Should these be pure virtual? Or do you really have subclasses that delegate up to these implementations? Is AssignFrom() a better name than ResetFrom()? Would symmetry be better if you had a single |CopyTo()| method to copy structures, and a |GetDefaultsFrom(nsIPresContext*)| to fill in default values? e.g., if (aParent) aParent->CopyTo(STYLESTRUCT[i]); else STYLESTRUCT[i]->GetDefaultsFrom(aPresContext); There are a couple places where you currently do STYLESTRUCT[i]->CopyFrom(nsnull, aPresContext); which is a bit weird, and might make more sense using the GetDefaultsFrom() scheme. Certainly gets rid of a bit of duplicate assignment code. + switch (aSID) { + case eStyleStruct_Font: + if (aPresContext) { + result = new DATAIMPL(Font)(aPresContext->GetDefaultFontDeprecated(), aPresContext->GetDefaultFixedFontDeprecated()); + } + else { + result = new DATAIMPL(Font); + } + break; + case eStyleStruct_Color: result = new DATAIMPL(Color); break; + case eStyleStruct_List: result = new DATAIMPL(List); break; + case eStyleStruct_Position: result = new DATAIMPL(Position); break; + case eStyleStruct_Text: result = new DATAIMPL(Text); break; + case eStyleStruct_Display: result = new DATAIMPL(Display); break; + case eStyleStruct_Table: result = new DATAIMPL(Table); break; + case eStyleStruct_Content: result = new DATAIMPL(Content); break; + case eStyleStruct_UserInterface: result = new DATAIMPL(UserInterface); break; + case eStyleStruct_Print: result = new DATAIMPL(Print); break; + case eStyleStruct_Margin: result = new DATAIMPL(Margin); break; + case eStyleStruct_Padding: result = new DATAIMPL(Padding); break; + case eStyleStruct_Border: result = new DATAIMPL(Border); break; + case eStyleStruct_Outline: result = new DATAIMPL(Outline); break; Could you use a function table here to avoid the fact that switch degenerates to branch-and-test? (Not sure if this is called a lot; if not, don't worry about it.) +static PRBool CanShareStyleData(nsIStyleContext* aStyleContext) { -#define max_structs eStyleStruct_Max + PRBool bSharingSupported = PR_TRUE; Hungarian notation. Yuck. +nsStyleStruct* gStructPointer[eStyleStruct_Max][kStructIndexMax]; +bool gStructBusy[eStyleStruct_Max][kStructIndexMax]; Please use PRBool, or PRPackedBool. + static bool initialized = false; + if (!initialized) { Ibid. + short structType = aSID - 1; + for (short structIndex = 0; structIndex < kStructIndexMax; structIndex++) { + if (!gStructBusy[structType][structIndex]) { + result = gStructPointer[structType][structIndex]; + gStructBusy[structType][structIndex] = true; PR_TRUE. + NS_ASSERTION(false, "Can't release mutable style struct"); NS_ERROR(...) I don't really understand why you've decided to pool the style structs for purposes of nsMutableStyleXXX reading and writing? Why didn't you just make a member variable in each nsMutableStyleXXX class that would serve as the temporary storage for reading and writing the struct? + if (aSID == eStyleStruct_BorderPaddingShortcut) { +/* + nsMargin border, padding; + if (GETDATA(Border)->GetBorder(border)) { + if (GETDATA(Padding)->GetPadding(padding)) { + border += padding; + ((nsStyleBorderPadding&)aStruct).SetBorderPadding(border); + } + } +*/ + nsMargin border, padding; Use #if 0, and comment as to why this is not currently part of the code. Index: mozilla/layout/xul/base/src/outliner/src/nsOutlinerBodyFrame.h =================================================================== RCS file: /m/pub/mozilla/layout/xul/base/src/outliner/src/nsOutlinerBodyFrame.h,v retrieving revision 1.25 diff -u -2 -r1.25 nsOutlinerBodyFrame.h --- nsOutlinerBodyFrame.h 2001/04/11 02:44:25 1.25 +++ nsOutlinerBodyFrame.h 2001/04/12 00:15:59 @@ -29,4 +29,5 @@ #include "nsIScrollbarMediator.h" #include "nsIWidget.h" +#include "nsHashtable.h" Why does nsOutlineBodyFrame now need nsHashTable.h? Ok, that's all I can do for now. I'll hope marc has covered the hard stuff. :-)
I'm gonna have to back Chris up on this one, Pierre. The preferred way to make an object act like a pointer to another object is to implement |operator->| and |operator*|. For a non-pointer object, these have no built-in implementation, so you're not removing functionality. |operator&| is defined for _every_ object, so redefining that takes away a power permanently, and for no reason. Someday, someone might want to pass one by address. If you override |operator&|, they can't. And you can do the same job better (more easily used by clients) with |operator->| and |operator*|; plus it'll be that much easier for someone trying to understand your code, because that's the idiom for smart-pointer style implementations. Two other things Chris pointed out are worth re-iterating: (1) avoid C-style casts; new style casts catch errors that old casts hide; and (2) |bool|, |true|, and |false|, aren't defined on every platform I'm ashamed to say. Good use of the stack-based transaction idiom for style modification though. I like it. I only looked at the auto-style objects; I've got to go examine the mutable-style cast situation now. More comments when I have a better understanding of the big picture.
1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct: ---------------------------------------------------------------------------- scc was copied on the review requests too. My fault if I wasn't more specific when I contacted him because that's where his input would have been the most valuable: your proposal is the kind of solution that I hoped he had made. I like the solution but if you don't mind, I'd prefer to implement it after checking in because it is only a syntactical change to enforce the type-checking. 2) GetStyleXXXFrom(mStyleContext): ---------------------------------- I prefer the current notation where it is clear that the helper classes are in fact constructing something, because it is the destructor that causes the modified data to be written in the style context. You might have seen a couple of places, once in Table code and maybe once in Block layout, where I added blocks {...} to make sure the data was written before we continued the processing. 3) Templates: ------------- Same thing as (1). I'd say let's checkin and work on the notation later if scc comes with a better idea. 4) #pragma mark -: ------------------ It draws a separator in the list of functions that pops up in the CodeWarrior editor. Very convenient... You can see it in large files written or maintained by folks who use the Mac as main developement machine. 5) Pure virtual functions in nsStyleStruct: ------------------------------------------- It was my idea at first, of course. Unfortunately it would have required all the classes that inherit from nsStyleStruct to declare and implement these methods. The breakup of Layout into 2 DLLs made the task a bit more complex, but especially the result would have been much worse. 6) Is AssignFrom() a better name than ResetFrom()? -------------------------------------------------- Maybe... It has been there forever, I kept it. I did not like it much at first too but I got used to it. Other names could have been InheritFrom() or InitializeFrom(). I usually try to be flexible when I work on code developed by someone else, and after all ResetFrom() makes sense too. 7) Symetry of ResetFrom() and CopyTo(): --------------------------------------- There is no symetry between ResetFrom() and CopyTo(). The symetrical function to CopyTo() was SetFrom(). It was so symetrical that the code was duplicated (but reversed) in the 15 implementations. One function copied member variables from | this| to the destination; the other function from the origin to |this|. It's something I described to Marc and you 2 or 3 weeks ago: I removed nsIMutableStyleContext::SetStyle() which wasn't used anywhere and it allowed me to remove the 15 SetFrom() methods, and replace the calls to SetFrom() with calls to CopyTo() in the 2 places where it was done. It doesn't sound obvious at first but the two issues were indeed related. 8) Replace ResetFrom() with CopyTo() + GetDefaultsFrom(): --------------------------------------------------------- We cannot replace ResetFrom() with the following lines, as you suggest. if (aParent) aParent->CopyTo(STYLESTRUCT[i]); else STYLESTRUCT[i]->GetDefaultsFrom(aPresContext); ResetFrom() defines the inheritance. Some properties are inherited, some others are not. We cannot replace that with a CopyTo(). 9) CopyFrom(nsnull): -------------------- There is no CopyFrom(nsnull), but we have some ResetFrom(nsnull) and they make sense. It means that we create a style structure that doesn't inherit from any parent. We need this, for instance, to resolve the style within tables in quirks mode (Nav4x tables act almost like embedded documents: they don't inherit anything but a couple of properties). 10) Replace switch() with jump table: ------------------------------------- I believe good compilers do it automatically when the number of cases within the switch() reaches a certain threshold. 11) Yucky "bSharingSupported" Hungarian notation: ------------------------------------------------- Yes, I agree. Marc indeed developed a few bad habits in previous lives but they are so negligible compared to the quantity and the quality of the work he otherwise produces that I would have felt really bad pointing out that kind of things to him. So I kept the name. 12) Replace bool with PRBool: ----------------------------- Correct. 13) Replace ASSERTION(false) with NS_ERROR(): --------------------------------------------- Correct. 14) Member variable instead of pooling: --------------------------------------- To minimize the impact on the existing code, I wanted ReadMutableStyleData() to behave exactly like the current GetMutableStyleData() with the only difference that the caller must call WriteMutableStyleData() when he's done with the changes. 15) Commented-out code: ----------------------- Correct: I should have removed the these lines instead of commenting them out. 16) nsHashTable.h in nsOutlineBodyFrame: ---------------------------------------- nsOutlineBodyFrame always used a nsHashTable but it was getting the declaration from nsIStyleSet.h. The problem is that nsStyleSet.h doesn't include nsHashTable.h when SHARE_STYLECONTEXTS is not defined so when I disabled the StyleContext sharing, I prefered to add the #include "nsHashTable.h" in nsOutlineBodyFrame which is the real customer, instead of always including it in nsIStyleSet.h, even when not needed. ==== Thanks a bunch for this very thorough review. The actions I'm proposing to take are: 1) Give some builds with the current code to jrgm to evaluate the performance. My own evaluation shows that we can expect a small gain but following your recommendation, I'm going to test with very large test pages. 2) Fix the small points: #12 + #13 + #15. 3) Checkin if that's ok with everybody. 4) Ask scc if he's got a better solution than the one you proposed in #1 for type-checkin, plus possibly something with templates as suggested in #3. 5) Check in these changes for type-checking whenever we want or can. 6) Continue our merry way towards new adventures. I'm not planning to change anything on the following points: #4, #7, #8, #9, #14, #16. I think my explanations are sufficient on #5 but I may adapt if someone comes up with a better solution. I promise I tried very hard. I think my explanations are sufficient on the following points but I will adapt if you insist: #2, #6, #10, #11.
Scott: I posted my comments before I read yours. I agree with you and Chris on #1. If you prefer, I can implement it before checking in but since it doesn't change anything to the execution, I would not want that to cause another round of verification builds for QA. About the mutable style, my main guideline in all the declarations was to have as little an impact as possible in the code outside the StyleContext. GetMutableStyleData() is called in approximately 130 places in 35 files, and the pointers to the mutable style are used in probably hundreds of places (and fortunately nowhere in a persistent way). The thing that will make you frown the most is probably #5. If you want to give it a try, you're welcome. I sincerely tried many solutions with pure virtual methods or just virtual methods (like having a second base class that the StyleXXXImpl structures could inherit from, etc...) but given the facts that: - customers had to be able to use an object created by the StyleContext (ReadMutableStyleData) - the StyleContext had to be able to use an object created by the customer (GetStyle) - customers could be in another DLL (Layout) I believe I ended up with a very honorable solution but if there is a better one, I'm confident you will find it (beware of crashes when virtual tables don't match).
I will cast my vote for #1 to be addressed before check-in. Motivation: strike the iron while it is hot... Now that all eyes are on this, let's make (or at least try to make) the most of the brainstorming that is going on and the eagerness around w.r.t. reaping the savings of this style data sharing.
> 1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct: What rbs said: you're essentially redefining the API at this point (for all practical purposes, callers must use your nsMutableStyleXXX objects). If we don't fix this now, you'll be chasing the tree as people begin to start writing new code that uses the new API. One thing that I forgot to ask last night was, what is the linkage impact of these new classes? Does layout need to link against content now? (I guess not, since you've not changed any makefiles AFAICT.) Which would mean the code probably gets duplicated in each of content and layout. (Tangent: we need to redouble our efforts to get content & layout converted to using NS_IMPL_NSGETMODULE so they can be recombined for release.) > 2) GetStyleXXXFrom(mStyleContext): > ---------------------------------- > I prefer the current notation where it is clear that the helper classes are in > fact constructing something, Agreed. That makes sense. > 5) Pure virtual functions in nsStyleStruct: > ------------------------------------------- > It was my idea at first, of course. Unfortunately it would have required > all the classes that inherit from nsStyleStruct to declare and implement > these methods. What style structs *don't* need this? Why? > 14) Member variable instead of pooling: > --------------------------------------- > To minimize the impact on the existing code, I wanted ReadMutableStyleData() > to behave exactly like the current GetMutableStyleData() with the only > difference that the caller must call WriteMutableStyleData() when he's > done with the changes. But because of the nsMutableStyleXXX classes, no sane person would call [Read|Write]MutableStyleData() by hand, would they? (Are we bending over backwards to support a non-use case?) I'll have to read the code more carefully to really understand what you're trying to do here. More comments later.
1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct: ---------------------------------------------------------------------------- Agreed but for the reason rbs mentions. I did redefine the API but not the way you seem to think. The previous API was: nsStyleStruct* GetMutableStyleData(nsStyleStructID aSID) Now it is: nsStyleStruct* ReadMutableStyleData(nsStyleStructID aSID) nsresult WriteMutableStyleData(nsStyleStructID aSID, nsStyleStruct* aStyleStruct) The difference being that everywhere there is a Get(), we have to replace it with a pair of Read()/Write(). The nsMutableStyleXXX objects were defined only as helper classes to ease the transition of the existing code to the new API. My goal was to change all the GetMutableStyleData() to something else that fits in only 1 or 2 lines. I did not want the Write() calls to appear because it would have made the task much more complex (ie. make sure Write() is called before every |return| or |break| etc...) However callers are not required to use the helper classes. The following snippet is perfectly valid: nsStyleBorder* border = mContext->ReadMutableStyleData(eStyleStruct_Border); border->SetBorderStyle(0,NS_STYLE_BORDER_STYLE_SOLID); nsresult result = mContext->WriteMutableStyleData(eStyleStruct_Border, border); 1.1) Linkage impact: -------------------- None. 1.2) Duplication of the code: ----------------------------- Yes, the code of the helper classes is duplicated in Content and Layout. It doesn't take much space, it is fairly straightforward and it doesn't do much anyhow (for those who haven't looked at the diffs, the helper classes call Read()/Write() when the objects are created/destructed). <gentle_settling_of_accounts> Honestly it doesn't bother me as much (and far from it) as the big duplication of "real" code that you approved in nsIStyleContext.h, without my reviewing, in order to enable the split between Content and Layout: http://bonsai.mozilla.org/ cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIStyleContext.h&root=/ cvsroot&subdir=mozilla/layout/base/public&command=DIFF_FRAMESET&rev1=3.88&rev2= 3.89 </gentle_settling_of_accounts> 5) Pure virtual functions in nsStyleStruct: ------------------------------------------- I haven't made myself clear, sorry. One of the aspects of the problem is related to the linkage (or absence of) between Content and Layout. GetMutableStyleData() is called in a few places in Layout (in the CSSFrameConstructor mainly) so if nsStyleStruct had pure virtual functions, we would have to link the two libraries. 14) Member variable instead of pooling: --------------------------------------- See the code snippet in #1. The helper classes are not mandatory, not using them removes a little overhead. Some folks may like the "Read/Modify/Write" paradigm better, after all it's a very common one in computing. Besides we can't always force people to do the way we like: nsCOMPtrs have been there for ages and we can still see some NS_RELEASE() checked in everyday. ---- As an incidental note, I very much enjoy this particular code-review. French people have a thing about arguing. I wish all the big changes were under the same scrutiny (as much as I wish the small ones were given some slack, but that's another debate) so thank you, and scc and rbs, for your input.
pierre: my 2 cents, but they were gained over years at much greater expense: Don't treat the Read/Write pattern as a good low-level device for the masses. Do consider making it private/protected/friend/whatever-restricts-it and mandating the helper classes. nsXPIDLString, nsCOMPtr, etc. and their lack show the folly of letting low-level, relatively-error-prone tools loose in a large, mixed-strength community such as Mozilla's. What is the overhead of the helper classes? It should be noise to zero, I hope. /be
> <gentle_settling_of_accounts> Touche! > 5) Pure virtual functions in nsStyleStruct: > ------------------------------------------ > I haven't made myself clear, sorry. One of the aspects of the problem is > related to the linkage (or absence of) between Content and Layout. > GetMutableStyleData() is called in a few places in Layout (in the > CSSFrameConstructor mainly) so if nsStyleStruct had pure virtual functions, > we would have to link the two libraries. I don't understand. Pure virtual methods should *eliminate* linkage dependencies, not *cause* them. Could you please elaborate? > 14) Member variable instead of pooling: > --------------------------------------- > See the code snippet in #1. The helper classes are not mandatory, not > using them removes a little overhead. Some folks may like the > "Read/Modify/Write" paradigm better, after all it's a very common one > in computing. Besides we can't always force people to do the way we like: > nsCOMPtrs have been there for ages and we can still see some NS_RELEASE() > checked in everyday. But my point is, your scheme isn't required to support the read/modify/write paradigm. I believe that I've now understood the code a bit better, and I think I see what you're trying to accomplish now. (I think I could even guess at the code geneology, and how you've arrived at the implementation you did.) I think you should: 1. Change ReadStyleData() to copy the style data into a caller-provided struct. WriteStyleData() will write data back from the struct. 2. Your auto-classes should each have an appropriate struct as a member. Each will use its member to call [Read|Write]StyleData. 3. Any macho men that want to use [Read|Write]StyleData directly will just have to have their own style struct on the stack as a buffer. This should simplify the back-end implementation of [Read|Write]StyleData. It also removes the artificial ``only four live requests'' at a time. It even makes it so that we could one day hope to have a thread-safe layout engine, etc. (Your current implementation is just one more stake in thread-safety's heart here!) FWIW, I am enjoying this too. It's been a good learning experience. Also, I *don't* think I'm hen-pecking here: I claim that I *am* arguing about the big changes. You're fundamentally altering how clients will interact with the style system, and your implementation will have far-reaching consequences.
Chris Waterson wrote: >1. Change ReadStyleData() to copy the style data into a caller-provided > struct. WriteStyleData() will write data back from the struct. This is what the old methods on nsIMutableStyleContext NS_IMETHOD GetStyle(nsStyleStructID aSID, nsStyleStruct& aStruct) const; NS_IMETHOD SetStyle(nsStyleStructID aSID, const nsStyleStruct& aStruct); did, and Peter Linss put those in to facilitate migration from GetMutableStyle calls to a better encapsulation that does not expose the internal data structures of the style context.
I think you're missing my point. I want ReadStyleData() to copy the style data into a *caller-provided buffer*. (I think the caller can do a better job managing the memory that ReadStyleData() can; specifically, the caller can do it in a re-entrant way.)
Chris, GetStyle(...) does exactly that, which is why I brought it up. The caller creates their style struct and passes it to GetStyle, then mucks with it, and then calls SetStyle passing it back in. Does that make sense, or am I missing something? example usage - nsStyleDisplay disp; GetStyle(eStyleStruct_Display, disp); disp.mDisplay = NS_STYLE_DISPLAY_NONE; SetStyle(eStyleStruct_Display, disp);
Am I being dense? What is the point of the style struct pool then? I've got the diffs applied in a tree. I'll spend some more time looking at them.
Brendan: sure, the overhead isn't much higher than, let's say, the one we have in switch() statements vs jump tables, so no problem: we can enforce the use of the helper classes. Marc & Chris: There is an important point that you may have missed and you will see how the issues of interfaces and pure virtual methods are intertwined... In the current implementation, the nsStyleXXX structures don't have any real base class other than the empty nsStyleStruct which is basically a void*. This leads to some aberrations like big duplications of code or useless switch() statements. Compare the current implementation of StyleContextImpl::CalcStyleDifference(): http://lxr.mozilla.org/seamonkey/source/content/base/src/nsStyleContext.cpp#3658 with what I propose. Same thing for GetStyle(), GetStyleData() etc... or even RemapStyle(). In the current code, none of the 5 basic methods of the nsStyleXXX structures (ResetFrom, CopyTo, SetFrom, CalcDifference and ComputeCRC32) is inherited. They could very well have completely different names and different parameters from a nsStyleXXX struct to another. Yucky, isn't it? It was really important in my opinion to clean that up in order to improve the legibility of the current code and ease: - The implementation of the StyleData sharing. - The support for more data structures in the style context, like the one added recently for XUL. We may have some more for CSS2/3 modules that we don't support yet (aural, webfonts, ruby and whatnot) or external modules like SVG. - Implement the memory model we want in the future. The cleanup allowed to declare the nsStyleContextData as a union of individual data structures (StyleXXXImpl*) and an array of generic data structures (nsStyleStruct*). So... I made these 5 functions (later reduced to 4, having removed SetFrom) part of the base class nsStyleStruct and declared them pure virtual. The problem is that: 1) In public interfaces, nsStyleStruct is the base class for the structures that are manipulated by the customers (nsIStyleContext.h declares nsStyleXXX structures which all inherit from nsStyleStruct). 2) Some customers in other DLLs (read "Layout" but it could be other ones) declare some of these structures on the stack, for instance to call GetStyle() and fill up the structure they are passing by reference. 3) None of these public interfaces implements these 5 functions. The only place where the functions are implemented is in the StyleXXXImpl classes inside nsStyleContext.cpp (StyleFontImpl, StyleColorImpl, etc...) 4) Result: You get link errors in these other DLLs (er, "Layout") because you construct objects with pure virtual methods that are not implemented anywhere. A "solution" would have been to somehow declare the nsStyleStruct as an XPCOM object and provide constructors in the StyleXXXImpl, or something along that line... Another "solution" could have been to link back Content and Layout... Nothing we wanted anyhow because another requirement was that the customers should be able to continue to access directly the member variables of the nsStyleXXX objects which are defined the public interfaces. If you have another idea, I would be interested to look it but I would prefer if you played with it a little bit in the comfort of your own tree. I think I already spent enough time on it and I still believe that my solution is close to the best you can get. C'mon, I double-dare you :-) ...That was about the pure virtual methods. Now about the APIs... I maintain (you will be interested!) that SetStyle() is evil, as well as any API where we would have to accept a structure allocated by the customer with the purpose of modifying the style context. This is for almost the same reason as above: the implementations are done in the StyleXXXImpl, which are a certain way private to the StyleContext, and when writing into the StyleContext we need to receive back objects that we have created in order to be able to talk to them (ie. objects that implement our code). The principle behind it is fairly clear. From the StyleContext standpoint, we publish interfaces to these objects but we must remain the entity that creates them. It's nothing new: COM objects work the same. Basically, my points are: 1) We need a base class with the 5, or 4, basic methods for the reasons listed above. 2) Because of (1), the base class nsStyleStruct must come with its own dummy implementation of these methods so that (a) other DLLs can link correctly and (b) the virtual tables are identical everywhere and customers can access the member variables directly inside the nsStyleXXX structures. 3) Because of (1), we generally cannot accept objects that we don't create. The only exception (I think, but there is maybe another one) is GetStyle() because we fill-up the structure passed by the customer by doing "ourOwnStruct->CopyTo(& aCustomerStruct)".
Damn, it looks like the diffs for nsCSSStyleRule.cpp are missing.
To be honest, if you are looking for better solutions for the declaration of nsStyleStruct, the closest thing I found involved a big reshuffling of all the StyleXXXImpl and quite a bit of modifications in the StyleContext too: - Leave nsStyleStruct the way it is. - Use the nsStyleXXX structures for the only purpose of passing data back and forth with the customers, meaning that when we receive nsStyleXXX, we have to copy it into a StyleXXXImpl before we do anything (perf?). - Make the StyleXXXImpl structures not inherit from nsStyleStruct but from another base class that implements these 4 functions. - Declare a nsStyleXXX as a member variable in its corresponding StyleXXXImpl. It sounds simple like that but I think there was a problem (besides the reshuffling) and I can't remember what.
With the patches applied, looking at a tree, it's generally much easier to understand. :-) > I maintain (you will be interested!) that SetStyle() is evil, as well as > any API where we would have to accept a structure allocated by the > customer with the purpose of modifying the style context. This is for > almost the same reason as above: the implementations are done in the > StyleXXXImpl, which are a certain way private to the StyleContext, and > when writing into the StyleContext we need to receive back objects that > we have created in order to be able to talk to them (ie. objects that > implement our code). I find this argument to be pretty weak. I think the *real* issue is that this has been designed around trying to use C++ vtable dispatch in the back-end implementation, and that's lead you down this garden path. The style structs are the interface by which your consumers communicate with you about style data. Even if you did, say, need to compute a whole bunch of extra information for your own private use (which you don't), I still don't think the argument holds water. You don't need to *create* the style struct to be able to ``talk'' to it efficiently. You need to know its style struct ID. Think of *that* as your vtable. If you find the pervasive `switch' logic too burdensome (I would!), then roll your own dispatch! Places where you've used C++'s build-in virtual dispatch should simply use the style struct ID to do hand-rolled dispatch: use the style struct IDs indexes into a table of function pointers, and a *lot* of the ugly switch statement code will go away. I don't think that this would really entail much change to what you've already got. But, it would 1. Get rid of the need for you to have the nsStyleStruct implement any virtual methods. (This is unsighly anyway: it's propagated implementation detail out to a place where nobody should know or care about it.) 2. Get rid of the ad hoc style struct pool, which is another nail in the layout re-entrancy coffin. 3. Still allow you to achieve switch-less nirvana. 4. Still allow you to use your own private storage, if the day comes where style maintains ``extra'' computed information that's not visible to consumers of the API. What do you think?
You are correct about the origin of the problem, I wanted to keep the C++ dispatch. And I still do: I'll give a second look at what I described above because when I tried it, I had in fact the StyleXXXImpl inheriting both from a base class that declared the pure virtual methods and from the 'empty' nsStyleStruct. If we cut the inheritance between nsStyleStruct and StyleXXXImpl, we are left with quite a bit of reorg in the StyleXXXImpl, but I prefer to put the burden on these objects rather than on the rest of the code. The benefits would be the same, with in addition the C++ dispatch and inheritance, and less changes in StyleContext and StyleContextData. Same question: what do you think? And what do you mean by "Still allow you to use your own private storage..."? How can I have private storage if I receive a structure that I did not create?
Never mind... I guess I can put a "void* mPrivate" in nsStyleStruct and store whatever I want in it during ReadMutableStyleData().
> You are correct about the origin of the problem, I wanted to keep the C++ > dispatch. And I still do: Why? What is it buying you? What I'm proposing is the same design you've got now, with a slightly different mechanism. AFAICT, C++ is just getting in the way of your design here. I could understand if you were doing virtual dispatch all over the place (and C++'s syntactic sugar was worth it), but you have: - four places where you call the virtualized CopyTo() - four places where you call ResetFrom() - three places where you call CalcDifference() - two places where you call ComputeCRC32() So the syntactic brain-print of doing your own dispatch is marginal. And it's all *internal* to the style context's implementation: you're not forcing this mechanism on your users. As you note, a top-to-bottom C++ design where the nsStyleStruct is an abstract base class won't work here because of linkage issues, even though it might have been the most natural. But the alternative that you suggest seems like the worst of both worlds: you'll still have plenty of places where you need the big ugly switch statements, and you're only taking advantage of the C++ syntactic sugar in a dozen places. (Well, a baker's dozen.) > And what do you mean by "Still allow you to use your own private storage..."? The structures used to *communicate* style information need not have anything to do with the structures that you use to *store* that style information in the style context.
>As you note, a top-to-bottom C++ design where the nsStyleStruct is an abstract >base class won't work here because of linkage issues, I haven't made myself clear maybe: nsStyleStruct and the nsStyleXXX structures would not change compared to what is currently in the tree. Only the StyleXXXImpl would change and since they are internal, I can do pretty much what I want with them, including not inherit from nsStyleStruct at all, and inherit from a base class with pure virtual methods. It's not the worst of both worlds at all: of course, we won't have the ugly switch statements back. The solution I propose is identical in all points to yours in its interfaces with the outside but in addition, it keeps the benefit of using true C++ objects in the inside (and we are going to need it for the next round of optimizations). As for the private storage, I guess you meant it could be handled through the helper classes. If not, please stop playing games and tell me what's on your mind (or tell me you want to play and I'll gladly play too :-).
> The solution I propose is identical in all points to yours in its > interfaces with the outside but in addition, it keeps the benefit > of using true C++ objects in the inside (and we are going to need > it for the next round of optimizations). Okay, I guess I'm most concerned about the external interface, so I'd be willing to cave on this. I still think that the mechanism that I propose would be simpler and cleaner. But, I don't have to do the typing. :-) So we're clear, could you elaborate as to what the benefit of using ``true C++ objects'' are here? How will it affect your future optimizations? > As for the private storage, I guess you meant it could be handled through the > helper classes. If not, please stop playing games and tell me what's on your > mind (or tell me you want to play and I'll gladly play too :-). My point was that the storage need not have anything to do with the interface used to communicate the style data. Since I'm not sure *why* you'd want to do this, my example will be rather bizarre, extreme, and contrived but...here goes. Let's say, for example, that instead of storing the font information like we do now, we wanted (for unknown some reason) to store a handle to the resolved font on the operating system. A referent to the following would be what would be stored in the style context: struct nsStyleFontPrivate { nsIFont* mFont; }; To get information *into* this object, your ``write'' dispatch for eStyleStruct_Font would examine the nsStyleFont struct, use GFX to find an appropriate font, and stuff it into an nsStyleFontPrivate structure. Conversely, to get information *out of* this object, your ``read'' dispatch would use the nsIFont object to fill in the nsStyleFont structure. Like I said, contrived. But the point is, the storage need not have anything to do with the structures used to communicate about the storage.
About C++: I sent you and other people earlier this week a description of a memory model that could bring huge gains in terms of memory footprint. It consists in storing the style data (nsStyleXXX) in a library composed of hash tables, with one hash table per type of style data. We'd have one for all the fonts used in the document, one for all the colors, etc... To implement it, we would need to add some methods and member variables to a base class that all the StyleXXXImpl inherit from. For instance, to store the CRC locally, or refcount these objects, or maybe add some debug functions on the way... Overall, I don't know if a system like this one would be viable in terms of performance but if it's not, we'll find something else which may or may not benefit from having a base class for StyleXXXImpl. I just want to keep the possibility open. About the storage: Understood. We were not talking about the same thing. The kind of private storage that I had in mind was a tag that we would set in ReadMutable() and retrieve in WriteMutable(). It could be very useful for some optimizations. For instance, we could store a direct pointer to the data inside the library, or the CRC of the previous data. With the helper classes, we can pass something like that between Read() and Write(). It would be a strong argument in favor of Brendan's suggestion for not exposing the 2 methods but just the helper classes.
Ok, so I'll look forward to seeing what you come up with. (I have to admit I was a little bit befuddled by those emails that were flying around; I'll probably be able to understand them better now.) > It would be a strong argument in favor of Brendan's suggestion for not > exposing the 2 methods but just the helper classes. I'm all for it.
scc/waterson, have you had any chance to look at the template magic that was alluded to earlier? It looks something elegant might be possible in this area.
I hope the code will be easier to understand than some of my comments. The diffs are following. As a summary: - there is no pooling anymore (the helper classes contain the style structure that is used for Read/Write) - the APIs are easier to use (1 line for each call to GetMutableStyleData) - private methods are no longer exposed (nsStyleStruct is an empty struct like before). All the other points are fixed too, except #5 about the templates: we'll see if scc makes a proposal. My concern is the performance. According to some measurements, it should be less than 2% slower than the current code and maybe 2 or 3% faster. It's difficult to tell because it seems that something happened in the tree lately that causes larger fluctuations in the results I'm getting. I did several implementations because of the performance: - In the first one, the style data (nsStyleXXX) were embedded as a member variable inside the StyleBlobs (formerly StyleXXXImpl). It was too slow (between 5 and 10% slower). - In the second one, the StyleBlobs inherited from both the style data (as they do now) and from a base class. It was still slightly too slow. - In the third one, just as a test, I put back in the pooling of style structs that are used for Read/Write. It was slightly faster than the current implementation. - In the last implementation (diffs attached), I took back the second implementation and streamlined the helper classes to win some cycles. It seems to have worked a bit. My guess is that it is slightly slower than the current builds but depending on the platform it may very well be slightly faster too. Chris & Jrgm: if you do any kind of performance evaluation, please don't use a daily build to compare with a fresh pull over which you apply my changes. Use the same pull for both (ie. pull, build, take measurements, apply the diffs over the same build, take more measurements). Chris: If you think that performance matters more than trying to keep Layout thread-safe (or at least the style resolution part of Layout), I can attach some diffs with the pooling for Read/Write. The rest would not change, it's hidden by the helper classes.
I'll do test builds again today with the latest patch.
Just an idea. Might this be a more elegant way to declare the helper classes?
Thanks for your speedy turnaround on this stuff, pierre! I think the helper classes are much more natural now. - I'm surprised that you see a speedup with your pooled contexts. What is happening differently there? Seems like the only difference would be whether we were copying to/from the stack versus copying to/from your pool. Could you post a patch that illustrates how the pooled implementation works, too? (If you still have it around.) - I'm a bit fuzzy on what the difference is between nsIStyleContext and nsIMutableStyleContext. Is the distinction not important once these changes have been made? If so, should we log a task to remove one or the other? - I just noticed that there's another re-entrancy problem in RemapStyle(). Sorry that I didn't notice that before :-(. Rather than making the temporary remap data in the static global area, how about making the PresContext own it? Anyway, I'd be willing to let this slide if you'll commit to fixing it eventually. - I still think the tail is wagging you with respect to your dogged insistence on using C++ virtual dispatch. That said, I'm flattered that you were able to try so many alternative implementations over the weekend and were still able to eschew my proposal. :-) As an example of this, note that it's unfortunate that you need both SetFrom() and CopyTo() methods on your nsStyleBlob class. In each place where you call these routines, you have the style struct ID available to you! The code blaot is minor, but you were doing this to avoid maintentance hassle, remember? Remind me again why C++ is going to make things so great here someday? - There's still some use of C++ |bool| in here that may bite you on some of the ports. - Consider use of a C++ template to define your helper classes; see above patch.
1) Speedup with pooled strutures: I can't really explain it. Most of the constructors/destructors are null. The difference was about 10ms over 800ms to load Yahoo, and 30ms over 1300ms to load CNN. I can send you the diffs but it would take a little bit of time to merge them again. If you just want an illustration of how it works, you can look at the previous diffs. Otherwise would you seriously consider this solution or did you ask out of curiosity? We could put these structures in the PresContext like the rest of the permanent data. I was planning to keep it as a backup solution if jrgm finds some performance problems. 2) Difference between nsIStyleContext and nsIMutableStyleContext: The goal, I guess, was to limit the scope of what people can do during style resolution. I don't think it would be a good idea to change it but you can file a bug to remove nsIMutableStyleContext if you like. 3) Pool in RemapStyle(): Sure, it can be moved to the PresContext. I'll have other things to store there too. Besides it will allow us to break WrapRemapStyle() in two as it should be. 4) C++: I worked with jump tables long ago, before C++. Jump tables can be really nifty, they ensure a very robust code that can easily sustain the burden of the years. The best example is probably jwz's mime engine, one of the only pieces of code that's survived since 3.0. I gave you twice a description of the evolution that I would like to see for the memory model in the style context, but really it doesn't matter. Whatever we do and whenever we do it, I'd like to keep it open and flexible so I'll stick with C++. Unless you veto it, of course. As always, it's all a question of what we want to see in the tree and when we want to see it but nobody else issued an opinion so I'll let you decide. Thanks for volunteering, by the way. 5) bools: They only ones left are in a debug code that I did not maintain. I did not decide yet whether I'll update it or remove it. Not part of the build, as they say. 6) Templates: Really neat, thanks! I thought only scc could speak template...
> The goal, I guess, was to limit the scope of what people can do > during style resolution. I.e., during style resolution you get the ``immutable'' nsIStyleContext? > I don't think it would be a good idea to change it but you can file > a bug to remove nsIMutableStyleContext if you like. Not sure what this means. It seems to me that either: 1. nsIStyleContext and nsIMutableStyleContext are redundant: all style contexts are immutable, all the time. In which case let's do away with one of extra interfaces. 2. nsIStyleContext indicates an immutable style context, and therefore should not have a GetStyleData() member that allows mutation. In which case, the helper classes are conspiring with a bunch of naughty callers to write to supposedly ``immutable'' contexts. Given your comments, it looks like we sort of tried to pay lip-service to the latter (e.g., nsIStyleContext returns a |const nsStyleData*|), but we then turn around and write to the ``immutable'' contexts anyway! (The helper classes all toss the |const| and write data back regardless of whether the original context was mutable or not). I'm just trying to understand what the intent of the API is here. I'm not asking you to fix it right now; rather, I'm asking you to sketch out a roadmap of where things ought be, and asking you to commit to Making It Right Some Fine Day (i.e., file a bug). > Sure, it can be moved to the PresContext. I'll have other things to > store there too. Besides it will allow us to break WrapRemapStyle() > in two as it should be. Ok, great. Again, bug filed please. > I worked with jump tables long ago, before C++. Jump tables can be > really nifty, they ensure a very robust code that can easily sustain > the burden of the years. The best example is probably jwz's mime > engine, one of the only pieces of code that's survived since 3.0. I > gave you twice a description of the evolution that I would like to > see for the memory model in the style context, but really it doesn't > matter. Whatever we do and whenever we do it, I'd like to keep it > open and flexible so I'll stick with C++. This either is a non sequitor, or an extremely sarcastic remark. I'm guessing the latter. :-) I'll happily sr= the code as it stands (I can't _force_ you to do anything, but at least the impact is constrained). To be frank, I jump tables would provide a more elegant solution here. As it stands, 1. The C++ vtable is redundant with the style struct ID. 2. You need a translation layer to switch back and forth between the SIDs and the vtables anyway. 3. There are only a handful of places where the C++ syntactic sugar is even used. 4. You've ended up with redundant code (CopyTo, SetFrom). But hey, it's your name on the cvs blame! :-) > They only ones left are in a debug code that I did not maintain. I > did not decide yet whether I'll update it or remove it. Not part of > the build, as they say. Ok. > Really neat, thanks! I thought only scc could speak template... I'm an amateur. If you're seriously considering them, let's let scc look at it first. So, a couple rhetorical questions left: 1. nsIStyleContext vs. nsIMutableStyleContext. Need roadmap and a bug, or a rebuttal as to why it's good as it is. 2. Need a bug for re-entrancy issue in WrapRemapStyle(). 3. Need blessing from scc if you decide to use templates. sr=waterson. Thanks for your patience.
I ran with the builds that waterson passed on to me, for linux and win2k. I ran (1) win2k/500Mhz/128MB with normal settings, looking at 'already cached' load times, (2) win2k/500MHz/128MB with disk cache set to 0KB, forcing a network request for all elements of a page, (3) on a slower linux system (266MHz/128MB) looking at 'already cached' load times, and (4) on a 800Mhz linux system (results not shown but they're the same story). I all cases, the code with pierre's changes measured in as "something like" 2% slower than the baseline build. But the reason I say "someting like" is that I don't think this page loading test is precise to that level. This is a wash (with a small hint of small difference if tested on a very quiet network under very careful initial conditions ;-).
I tried running jrgm's tests (default settings) on an Extremely Crappy Machine (133MHz PC with 48MB of RAM running Win98). Here's what I found: AVG MED MAX MIN TIMES before 7442 7206 19838 1363 8568 7061 6912 7009 7037 after 7805 7591 20357 1613 9005 7457 7362 7436 7351 delta -363 -385 -519 -250 -437 -396 -450 -427 -314 %improvement -4.8 -5.3 -2.6 -20.9 -5.1 -5.6 -6.5 -6.1 -4.5 This was run from inside Mozilla proper; I'll take another set of measurements in the mfcEmbed shell. Win32 and Linux Builds from the tree this morning are available (internally only, sorry) at ftp://btek.mcom.com/pub/. pierre, could you produce another set of patches that use the global contexts instead of the stack-based ones? If there really is a delta, I'll roll up my sleeves and profile.
Actually, before anyone goes off hacking more code, maybe we should get a 2nd opinion. Could somebody else confirm or refute my results on a low-end machine?
I'd say: low-end, yes, but with a little bit more memory. I think that if the difference is more important than on the other machines (the slowdown is 5 or 6% instead of 2%), it's because we allocate around 3 times as many blocks as we used to (the blocks are way smaller) and we hit much more often I don't know what Win32 function that cleans up the memory. I'm no Windows specialist so I don't know if it makes sense...
Still, I would be curious to see if what I noticed with globals vs stack can be confirmed on other platforms. I'll make diffs but not until late.
Regarding the [im]mutable question, let me chime in to point to a comment of Peter Linss that I once saw in bug 1230. Basically, he was saying that there were two pathways to the style structs, and only some blessed objects were allowed to use the mutable pathway. On the same topic, in a reply to an e-mail message where I asked if MathML frames could use the GetMutable API to alter their resolved style data [I asked this because I was hoping that the MathML frames could change their style contexts so as to reflect MathML attributes that are not supported by the Style System], he replied something along the lines: "Nope, unless you are a style rule". And he said that he was planning to remove that method (that's why it was deprecated BTW), and via this process to fix the bad callers (I guess by this he meant calls that are modifying the style context outside the style resolution -- these can be seen on pierre's migration patch: the mutations at the frame-level, e.g., [nsGfxTextControl,Table,nsCSS]Frame). Pierre might better explain the Right Thing that should to be done for these random mutations: it is a costly style re-resolution in the associated sub-trees to bring the style tree back in sync. I guess that maybe why Peter Linss wanted to avoid the profileration of this practice, and to only let the possibility of straight mutations to the blessed objects during the controlled style resolution process. Hope these comments provide some insights into waterson's point 1 and to the genesis of the problem.
Also, unless there are some trickery that I am not aware, these mutations that a frame makes outside the style resolution are lost when a re-resolve is fired from one of its ancestor frames.
I kicked off a test run(s) with a Not Quite As **** 64MB/166MHz Windows ME system, and I'll update when I have that info.
Correct on all points. If we want to simplify the API, we should leave nsIMutableStyleContext the way it is and remove Read/WriteMutableStyle from nsIStyleContext. We have been bitten with Marc with various problems coming from gfx scrollframes that hack into the style context. If you look at the diffs, you will notice that both StyleContext sharing and StyleData sharing are disabled for the contexts that correspond to a small list of scrolling elements.. Chris, I'm going to attach a partial diffs that uses globals in Read/Write. I had a full backup so it should compile ok. The files included in the diffs are: nsStyleContext.cpp nsIMutableStyleContext.h nsIStyleContext.h nsIStyleSet.h
Okay, so that system was not (is not?) well. I had to clean a bunch of stuff up to get it to run sanely at all (e.g., a virus program that meant that deleting an old profile took >5min). I'm still suspicious of the state of that machine, so I don't think these numbers mean a heck of a lot (e.g., it is nominally a faster machine with more memory than what waterson tested, but the times are slower than what he measured). But, here they are anyways (I only did 3 cycles; first is uncached, 2 and 3 are cached): avg med max min #1 #2 #3 before 9674 9694 25181 1694 9864 9283 9878 after 9842 9772 26890 1691 9851 9871 9805 -168 -78 -1709 3 13 -588 73 -1.7% -0.8% -6.8% 0.2% 0.1% -6.3% 0.7%
I'll check in tonight if nobody objects. Chris, will you have enough time to get some performance data about using globals instead of stack variables?
Pierre, drivers@mozilla.org already decided to keep this change out of 0.9, so we don't have too many confounding variables in trying to unregress performance (we have regressions already, as well as stability bugs and other things to fix). waterson sees a 5-6% slowdown on a low-end machine running jrgm's tests, but his results need to be confirmed. I understand he's gonna help sort out the performance effects with you. Until they're well in hand, and especially right now, please do not check this patch into the trunk. /be
I would like to checkin the change of APIs: that's all the files in the diffs, except for nsStyleContext.cpp and part of .h files. The reason is that I'm leaving on sabbatical in a little bit more than 3 weeks and I expect not to have the time to work on performance or memory footprint again before mid-july. I have to take care of the bug list that really got out of hands in the last several weeks. The change of API that I would like to checkin should not have any effect on performance whatsoever. The nsMutableXXX helper class will be reduced to a simple constructor that calls the current GetMutableStyleData(). Please advise.
Note also that the platform that Waterson tested on does not match the requirements for Netscape 6 (64Mb) http://home.netscape.com/browsers/6/datasheet/index.html?cp=n6i#platforms Jrgm's tests on a 64Mb machine shows an average slowdown of 1.7% and a medium slowdown of 0.8%.
Jrgm's tests also show a speedup of 0.1% on first load on a machine and what appears like a negligible slowdown on another one (no numbers for this one, but a graph at http://bugzilla.mozilla.org/showattachment.cgi?attach_id=31092)
Ooops, I'm leaving on sabbatical in a bit more than 2 weeks, not 3. Yeehaaa!
drivers@mozilla.org don't see the need to get the API changes in for 0.9 -- the patches here are backed up, as well as on your machine. It's very hard right now to get any review and testing cycles, because everyone is rushing for the 0.9 deadline. We'll have plenty of confounding and causative variables to untangle tomorrow, and for a week (until 0.9 releases). Do we know exactly what the performance effects of the API-only patch is? Unless you get a windows build and jrgm tests going fast, I don't know those effects. They should be zero, but Murphy was an optimist. We don't need any more risk here, even if it is tiny -- we don't even have the "brainprint" to take these changes into the mix. Sorry, and we will get them in after 0.9, with performance improvements too, I hope -- possibly even next week if you, waterson, et al. can pin down where the cycles are going and optimize them away. /be
Sorry, I can't commit myself to work a week on this or even 3 days, I think. I have more than a hundred bugs to triage before I leave (out of 170+), and more than a couple of them would be nice to fix for 0.9.1 or 1.0. I'm not sure I'll be back before the tree closes for 1.0. Checking in the API would allow someone else to work on the style system memory model much more easily, as only 3 files would be different. The change of API would consist in changing the calls from: nsStyleXXX myStyle = myContext->GetMutableStyleData(eStyleStruct_XXX); to: nsMutableStyleXXX myStyle(myContext); where the template for nsMutableStyleXXX would be: template <class T, nsStyleStructID SID> class basic_nsAutoMutableStyle { protected: T mStyleStruct; public: basic_nsAutoMutableStyle(nsIMutableStyleContext* aContext) { aContext->GetMutableStyleData(SID, &mStyleStruct); } basic_nsAutoMutableStyle(nsIStyleContext* aContext) { aContext->GetMutableStyleData(SID, &mStyleStruct); } T* get() { return &mStyleStruct; } T* operator->() { return get(); } T& operator*() { return *get(); } }; As you can see, it's pretty much empty. A point to consider is that whatever the risk of checking in this empty code might be, it would better be managed if I were still around for a couple of weeks to cover any problems that may arise. After all, if risk there is, it would make even less sense to finally proceed to do the checkin the day before I leave. I would check it in now without hesitation but it's up to drivers@mozilla.org to balance the risk involved by any code, even empty, and the benefit to the Community of making it possible for its members to work on something important. Anyhow, API or not, now or later, a full resolution is in the Future for me. Marking as such.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9 → Future
Of course my point was that this mostly empty code doesn't need any performance testing and to answer your concerns, it was reviewed (and even contributed to) by waterson.
Based on the current API, an even emptier code would be: template <class T, nsStyleStructID SID> class basic_nsAutoMutableStyle { protected: T* mStyleStruct; public: basic_nsAutoMutableStyle(nsIMutableStyleContext* aContext) { mStyleStruct = aContext->GetMutableStyleData(SID); } basic_nsAutoMutableStyle(nsIStyleContext* aContext) { mStyleStruct = aContext->GetMutableStyleData(SID); } T* get() { return mStyleStruct; } T* operator->() { return get(); } T& operator*() { return *get(); } };
pierre, I understand that you're going on sabbatical in short order, but I don't think that means we need to rush this checkin. The patches aren't going anywhere (they're in the bug, as well as two of my trees). The tree is going to be locked down so there won't be much bit-rot with respect to the new APIs. I think this should be a no-brainer to land as soon as the tree opens for mozilla-0.9.1 (Friday? Monday?). By then we'll have sorted out this performance weirdness. (I'm not leaving you high and dry on that; just need to knock out my own 0.9 bugs.)
On the Very Crappy Machine (133MHz Win98, but with more memory this time, 160MB), results with mfcEmbed show little difference in page load time with jrgm's test; in fact, it looks like style sharing has a slight edge: AVG MED MAX MIN ---------- TIMES ----------- before 3653 3486 10512 643 4602 3369 3449 3425 3420 after 3520 3456 10538 644 3765 3471 3429 3460 3475 delta 133 30 -26 -1 %improv 3.6 0.8 -0.2 -0.1 I'll next try to repeat the seamonkey results to see if there's something happening with chrome that's introducing the 5% penalty that my previous run indicated.
It confirms what I have been seeing. The fluctuations in performance seem to have increased in the past 2 weeks, and the difference between the before/after results are beyond the margin of error. So why not apply the old mantra and keep recounting until we get the numbers we want?
I'm all for that!
Any follow-up on this? Is it in a branch somewhere? [Reading the code, it looks like there could be room for futher tweaks to make it more space-time efficient in subsequent iterations.]
Nothing changed. According to jrgm, the code I'm proposing is on average 1.7% slower than the original code. It means that a page that takes 2 seconds to load would take 2.034 seconds. That's 3.4 hundredth of a second more and this is totally unacceptable to drivers@mozilla.org. I guess my vision of performance and memory improvement in software engineering was a bit outdated. I always thought that to improve memory a lot, performance had to be compromised a little bit and vice versa, to improve performance a lot, one had to use a little bit more memory. As a result by doing successive steps (- a lot of memory + a bit of cpu - a lot of cpu + a bit of memory) we still had in the end a lot less memory a lot less cpu. Mozilla is different. We don't compromise. With my best smileys ;-) of course... :-)
Blocks: 47159
Would that be possible to have a branch with the latest? Continuous large diffs are scary. With a branch, anyone could pull, and try, and perhaps add or refine. And QA could easily pull and try from there too if some progress is made.
My first preference is that drivers@mozilla.org treat the bug fairly and allow the code to go in. We can even use APIs with globals instead of stack variables and it will reduce the 1.7% slowdown a bit, or maybe even go faster than the current code as I measured on the Mac. If it is not possible, my second preference is to checkin the change of APIs so that the entire changes would be limited to just a few files (nsStyleContext.cpp/ .h, nsIMutableStyleContext.h and maybe nsIStyleSet.h). If it is not possible either, a branch is nice but it would rate fairly low on my list of priorities for moz0.9.1. Anybody can do it anyhow. I would be able to help, though, for instance to update the diffs. Otherwise I'm leaving this Friday, I'll be back July 9th, and as Chris wrote: the patches aren't going anywhere. Note that if I checkin this thing, the support for the aural stylesheets will move on top of my list. Accessibility issues have a high priority in 6.5.
pierre, I don't think anyone is averse to this going in for mozilla-0.9.1, but maybe you are privvy to some email from drivers@ that I am not.
Not really: the only things I heard from drivers were the conclusions copied on this bug report. What you are saying seems like good news. If a checkin is ok, I can get in any solution: StyleData sharing with or without StyleContext sharing, APIs with or without variables on the stack. As far as I can tell, with StyleData sharing only and with global variables, it uses less memory and it goes a tad faster than the current code base.
Pierre, I commented for drivers on 2001-04-17. At 19:33, I wrote: "...we will get them in after 0.9, with performance improvements too, I hope -- possibly even next week if you, waterson, et al. can pin down where the cycles are going and optimize them away." This by way of asking you to hold off until after 0.9 had branched, with an apology for the request that you wait. Previously that day, I had commented: "Pierre, drivers@mozilla.org already decided to keep this change out of 0.9, so we don't have too many confounding variables in trying to unregress performance (we have regressions already, as well as stability bugs and other things to fix). waterson sees a 5-6% slowdown on a low-end machine running jrgm's tests, but his results need to be confirmed. I understand he's gonna help sort out the performance effects with you. Until they're well in hand, and especially right now, please do not check this patch into the trunk." It sounds like you guys have sorted things out, and that the performance effects are well in hand. Sorry to leave any doubt in your mind about checking into the trunk, but really, I don't think you can say that drivers was clearly against you checking in a 1.7% performance hit, since you didn't know that was the size of the hit when I wrote my comments. Please be fair here, and by all means check in. /be
Time to weigh in here. I am of the opinion that all of the efforts taken so far to reduce the style system's footprint, although admirable, are moving in the wrong direction. None of the patches that have been produced have addressed the fundamental problem of the style system, namely that it eagerly resolves all possible values for every style context. The sharing strategies employed thus far continue to work under a fundamentally broken assumption: that all style data must be computed and only after a match is found can data be thrown away. This patch adds even more complicated sharing logic, and although it achieves a runtime footprint reduction, it does nothing to reduce the obscene amount of churn that the style system generates as it resolves all of these property values. I have been working on a proof of concept for a new system of rule matching that uses lazy resolution and that uses other tricks to reduce both footprint and performance costs. I've gotten far enough with the code that I'm confident it will be not only a large performance win, but that it will also be a large footprint win. I would like to work on developing this new rule-matching strategy for real, but the problem I have now is that I'm working off the current trunk, and have touched the same files (in particular, I've rewritten style contexts and the way rules are matched). If this patch goes in, I'll be in a world of hurt from a merge perspective. Given that I'd like to continue work on this new system, I'd prefer it if this patch did not go into the tree. I don't think it moves in the right direction, and I don't think it addresses the fundamental performance problems that plague the style system.
Actually, if you want to check it in, I won't hold it up. I'll just deal. :)
If it is a bigger performance and footprint win, as well as more simple and correct in logic and is within a timeframe of landing in, oh, say the next 2-3 weeks without getting pushed back and back and back further for testing, then I say wait for it. Perhaps Pierre should take a look at your work Doug and see if it warrants scrapping the landing coming late today?
I checked in. Dave: As long as we have a StyleContext, you and everybody else will benefit from the change of API. You can overwrite my version of nsStyleContext.cpp with yours and update the mutable APIs. It should not take you more than a couple of hours to do so. I did it a couple times when switching between the previous version and the new one. About the process, I think it has always been clear that Marc, myself and now Daniel welcome and encourage anybody to help on the style system. You should have contacted us way sooner, before doing any implementation. We learned by chance last Sunday that you were in the midst of some serious rework in the style system by reading a chat on IRC: <hyatt> argh <hyatt> com sucks <hyatt> darnit <hyatt> the style system uses too much COM <hyatt> i'm trying not to comify my new rule nodes. <hyatt> and i can't get around it <bryner> hyatt: what are you rewriting? <hyatt> rule matching. <bryner> do you have a way to speed it up? <hyatt> hell yes <hyatt> and to make it take much less memory I asked you what was going on... >You mentioned in IRC that you were speeding up rule matching. Is it internal >to XUL only? Or is it something like nsICSSPseudoComparator which changes a >little bit the style system in order to make XUL much faster? Or something else? > >Just curious. The nsICSSPseudoComparator is maybe not the most elegant thing >one can come across in the style system but it is very localized and so much >worth it! Are you working on something like that? And you replied... > I'm doing what I talked about with you and Marc in an earlier email... > namely replacing the mRules member of StyleContextImpl with a lexicographic > tree. Right now every style context holds on to all the rules it matched as > an nsISupportsArray... I'm replacing that with a structure that enables > sharing... I, and apparently Marc too, don't recall this mail. Or at least we were not aware that you had started an implementation. Besides, the 'mRules' member is one of the things that Marc and I did not change. It is fairly separate from the work we did when reducing the memory footprint of the computed style. As I said, it will take you a couple of hours to update your tree. I can give you some pointers if you want. For the rest, you have a CVS account and you can checkin pretty much what you want as long as you have it approved by the drivers and go through the same kind of testing we did for StyleContext and StyleData sharing, but I think you are missing some opportunities by not collaborating with those who usually work on that code, not to mention the risk of redundant work.
Apparently the bloat went down 3 Mb on Linux and 4 Mb on other flavors of Unix. I'm stunned, I did not expect the numbers to be so high. People say the application still works. Let's wait for some more testing and performance metrics from QA.
Here is the email that I sent to you, brendan, attinasi, and waterson. Maybe you don't recall it, but I sent it straight to you, and you even responded to it. ======= I was thinking about the style system sharing stuff last night, and I came up with an idea on how to generalize techniques used in the outliner widget to make our style sharing code even faster. What I would propose is that the style system have two levels of cache: the first line of defense would be a new cache that I'm going to outline momentarily, and the second line of defense would be the style sharing code that pierre and attinasi have done. This new cache complements the existing cache, and both are necessary for maximal footprint reduction and allocation reduction. I'll call the new cache the level 1 cache, and the existing cache, the level 2 cache. Here's the basic idea for the level 1 cache. The style system maintains a state machine that treats a sorted list of rules as an input word. This state machine transitions on the rules in the input word until all have been processed. The resultant final state contains all of the style data for that input word. On a miss, i.e., when no style data is found at the final state, you then check the level 2 cache. You have to allocate the style data and peform t he CRC computation to find matching style data structs that can then be cached at the final state in the level 1 cache. The first level cache will allow us to avoid even doing the allocations of temporary data that we then throw away, and we won't have to do a CRC computation (which gets expensive since there are many things to compare). Thoughts? I would be very interested in translating the outliner state machine into the style system if you guys think it's a good idea. I have all the data structures for this already implemented and tested. I think this works really well, because it complements the code you guys have already written. Level 1 (the new cache) makes sure the same set of rules gives you an immediate answer with no allocation. Level 2 (the existing cache) ensures that different sets of rules that map to similar style data end up being shared. Dave (hyatt@netscape.com) ===== I presented this idea to you on 4/4/2001, one month ago. Both you and attinasi responded. I describe the exact approach in this email of treating the sorted list of rules as an input word, fueling a state machine/lexicographic tree. To claim that I have been working on this in a vacuum without mentioning my ideas is completely ridiculous.
Attinasi's response to the input word paragraph. Again, you are copied on the mail along with brendan and waterson. ============== "This sounds like a very nice mechanism. I'm wondering if it is essentially similar to the sibling style sharing that happens now, before the style context data sharing phase [1]. Currently, the style system looks at all child style contexts to see if there are any with exactly the same rules (comparison by hash, but not a CRC) and if it finds such a context, it shares it immediatly, without having to fully resolve style for it. Your level 1 cache sounds like a more global version of this (not restricted to just sharing among siblings)." ==================
> Apparently the bloat went down 3 Mb on Linux and 4 Mb on other flavors of Unix. > I'm stunned, I did not expect the numbers to be so high. People say the > application still works. Let's wait for some more testing and performance > metrics from QA. If you're basing this on the bloat numbers from tinderbox, I think they went down because you didn't add logging for the classes that are no longer part of the StyleContextData object and you removed (#ifdef-ed out) the logging for StyleContextData.
David (Baron), I may have been wrong but the calculation is as following: in StyleContextImpl::SizeOf(), we had the size of everything that belong to the style context, minus the style data which is counted separately (line 3970) // count the style data seperately if (mStyleData) { mStyleData->SizeOf(aSizeOfHandler,localSize); } Then inside nsStyleContextData::SizeOf(), we had the size of all the style data structures that are not null (line 2344) // add the sizes of the individual style blobs if (mData.mBlobs.mFont) aSize += sizeof(StyleFontBlob); if (mData.mBlobs.mColor) aSize += sizeof(StyleColorBlob); if (mData.mBlobs.mList) aSize += sizeof(StyleListBlob); I did add logging for the blobs and I ifdef'd out counting the StyleContextData in the StyleContextImpl otherwise it would have been counted twice: once in line 2342 in nsStyleContextData::SizeOf(): // get the size of an empty instance and add to the sizeof handler aSize = sizeof(*this); ...and a second time in line 3955 in StyleContextImpl::SizeOf() // get the size of an empty instance and add to the sizeof handler aSize = sizeof(*this); Where is the bug?
Read "we add" instead of "we had". Twice. Time to sleep.
StyleContextImpl::SizeOf has nothing to do with the bloat logging done by nsTraceRefcnt. That uses MOZ_COUNT_CTOR/MOZ_COUNT_DTOR or NS_LOG_ADDREF/NS_LOG_RELEASE. You commented out (I think) the NS_LOG_ADDREF/NS_LOG_RELEASE pair for the StyleContextData, and you didn't add any MOZ_COUNT_CTOR/MOZ_COUNT_DTOR for it or for any of the other structs that are now separately allocated.
Gasp! Correct. But at least the StyleContextData is counted as part of the StyleContext (with a waste of 1 pointer per StyleContext). The blobs however are *not* counted.
dbaron, please review. If my estimates are correct, and if the Blobs indeed take 3 to 4 Mb, the gains should be around 500K to 800K. It will be much more when we address the performnce problems and re-enable StyleContext sharing.
David (Hyatt): I remember this discussion but you were then talking about complementary mechanisms (level 1 cache & level 2 cache) and yesterday, you described what Marc and I had done (the level 2 cache) as "moving in the wrong direction, not addressing the fundamental problem, and working under a fundamentally broken assumption". The change of tone made me assume that you were off on something completely different. This being said, reducing the memory footprint by 40% then by 60% doesn't seem to me as moving in the wrong direction. Ok, too bad the two things are not enabled at the same time yet, but what kind of savings do you think you will get, and how does it compare with the proposal I sent last month? I propose we get together tomorrow, put down ideas, list problems, define a roadmap, and open a separate bug report or a thread on the newsgroups to allow other people to participate. I really am excited about what you are working on, I just don't see the antagonism with the work that has been done by different persons over the past few years, and as often I believe that it is out of exchanges and compromises that the best solutions can be found. Especially after a good fight. Take your bat.
I fixed the nsTraceRefcnt but the tinderboxen still show a gain of 2Mb, sometimes 3Mb. I think all the structures are accounted for, if someone wants to double check. Otherwise it's not 4Mb but I'll still take it. An interesting test would be to get the bloat stats for StyleContext + StyleData sharing. Any volunteer? Enable SHARE_STYLECONTEXTS in nsIStyleSet.h line #47 and recompile Content, Layout and Editor.
r=dbaron on attachment 33121 [details] [diff] [review], although technically you should have a MOZ_DECL_CTOR_COUNTER for each one as well (but it's currently a no-op and I don't think that's likely to change)
Blocks: 78961
I checked in the MOZ_COUNT_CTOR yesterday (my apologies for this crass ignorance) and as I wrote, the bloat seems to be 2 or 3Mb lower. Here is the data I collected from the various tinderboxen: The "before" numbers comes from before my checkin on 05/03 at 06:12. The "after" numbers comes from after my checkin on 05/03 at 19:18. ------------------------------------------------------- T-BOX OS BEFORE AFTER GAIN ------------------------------------------------------- coffee linux 26 - 27 24 -(25) -2 (+) shrike linux 29 -(30) (27)- 28 -2 (-) speedracer sun 27 - 28 24 - 25 -3 bismark sun 29 26 - 27 -2 (+) cement irix 31 29 -2 monkeypox linux 29 26 -3 muerte bsd 30 27 -3 nebiros sun 29 26 -3 senna linux 29 26 -3 ------------------------------------------------------- 2Mb or 3Mb less bloat is enough of a step in the right direction. Closed as Fixed. The debate shall continue under bug 78961 and in the n.p.m.style newsgroup at news://news.mozilla.org/3AF314D7.4C896990%40netscape.com I would like to thank all the 34 people copied on this bug for their interest and their participation, with special thanks to the QA folks for their dedication, jrgm for his kindness, Brendan for his wisdom, and Waterson for his volunterism. If you want to continue to follow the debate, don't forget to add your email to the CC list under bug 78961.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I actualy does not see any mem usage improvements... For me it is now worse than it was. I can go very quickly to 70Mb mem used by mozilla after the check-in. Can someone confirm? PS On Win98.
Whiteboard: [whitebox]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: