Closed Bug 43457 Opened 24 years ago Closed 23 years ago

MEM: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: attinasi, Assigned: pierre)

References

(Blocks 1 open bug)

Details

(Keywords: embed, perf, Whiteboard: [whitebox])

Attachments

(21 files)

1.27 KB, text/plain
Details
175.62 KB, patch
Details | Diff | Splinter Review
1.02 KB, text/plain
Details
255.84 KB, patch
Details | Diff | Splinter Review
137.10 KB, text/plain
Details
25.98 KB, text/plain
Details
8.65 KB, text/plain
Details
137.20 KB, text/plain
Details
252.90 KB, patch
Details | Diff | Splinter Review
257.70 KB, patch
Details | Diff | Splinter Review
15.74 KB, image/gif
Details
252.95 KB, patch
Details | Diff | Splinter Review
11.40 KB, text/plain
Details
6.31 KB, patch
Details | Diff | Splinter Review
281.25 KB, patch
Details | Diff | Splinter Review
11.96 KB, patch
Details | Diff | Splinter Review
15.10 KB, image/gif
Details
14.43 KB, image/gif
Details
15.10 KB, image/gif
Details
207.66 KB, patch
Details | Diff | Splinter Review
5.93 KB, patch
Details | Diff | Splinter Review
Style Context Data sharing currently saves a lot of memory at the expense of a 
moderate performance hit. We need to improve the sharing design to minimize the 
performance hit and to increase the space savings.

Some initial ideas include:
- Allocate StyleContextData out of a Pool or Arena to minimize heap allocations
- Share style data across documents: presumably this means storing the style 
context cache (or a style data cache) at a more global scope than the style set 
currently used to root the collection.
- Enable the FastCaching algorithm: this requires closing the encapsulation-hole 
of GetMutableStyleData and having the style system manage changes to style data 
effectively (updating the CRC, updating the style context cache, possibly 
validating the style data is consistent)

This bug is intended to track ideas and log results of experimentation and 
prototyping of designs that improve style context sharing in terms of 
performance and memory footprint (including bloat).
Added depends on bug 39618 since there is valuable information in that bug that 
pertains to this one.
Status: NEW → ASSIGNED
Depends on: 39618
Target Milestone: --- → M20
Adding myself to the Cc list...
Adding perf keyword
Keywords: perf
Moving all non-nsbeta3 bugs to future milestone: these will be worked on after 
beta3/rtm.
Target Milestone: M20 → Future
adding myself to interest list
Keywords: embed
Setting milestone.

Adding additional idea too: Sharing is now all or nothing (all style data for a
context is shared or not). A finer granularity of sharing, where each style
struct can be shared seperately, would increase the sharing potential.
Target Milestone: Future → mozilla0.8
I broke up nsStyleSpacing into 4 smaller structs (nsStyleMargin, nsStylePadding, 
nsStyleBorder and nsStyleOutline).  It is the first step to reduce the memory 
footprint as described in
       news://news.mozilla.org/3A5A072F.5F0B752D%40netscape.com
Attached file list of modified files
I have reviewed the code and I have only a couple of minor questions/comments:

Zeroth, I'm amazed at the amount of changes, and the lack of problems - well done!

First, I noticed that nsButtonFrameRenderer uses two methods for getting the
borderAndPadding, one is to get border and padding individually and then combine
them, and the other is to use the new shortcut
eStyleStruct_BorderAndPaddingShortcut. Is this intentional? I would have thought
that the shortcut would be the way to go in all cases.

Secondly, I started seeing this #ifdef XP_MAC \ pragma mark - \ #endif construct
with no documentation: what is it, and should it be removed or commented?

Third, it looks like nsCSSRendering has an 'overloaded' argument type now, since
there are casts of aBorderStyle to nsStyleOutline. This is seriously ugly, and
should probably be changed to pass in both types and check for one being null.
At the very least, document the hackery.

Fourth, I saw several examples where there is a call like
'this->CalcBorderPadding(...)'. Why bother with the 'this->'? It is equivalent
to 'CalcBorderPadding(...)'. See the nsGfxCheckboxControlFrame.cpp diff for the
first example of this construct.

Fifth, nsHTMLReflowState.h introduced forward declarations for
nsStyleBorder,Margin, and Padding, but none was there previously for
nsStyleSpacing. I'm curious why we need the forward declarations now.

r=attinasi, assuming these points are addressed, especially point (3).
1) Fixed.
2) "#pragma mark -" puts separators in the list of functions of the Mac editor.  
You can see them in other particularly long files that are often modified by Mac 
folks.
3) Correct, I made uglier something that was already there.  It's fixed.
4) Fixed, but this notation can be found in many places.
5) There *was* a forward declaration of nsStyleSpacing.  You comments allowed me 
to make a global search on nsStyleSpacing and to remove it entirely, including in 
comments.
Performance regression caused by this checkin is in bug 66263
Important Comment from bug 66263

------ Additional Comments From Axel Hecht 2001-01-24 23:24 -------

This checkin has caused a regression in the XSLT code, which caused some worries
to both Peter and me over the last few days. The backing out made our life
happy again.
To be specific, we couldn't style a xhtml body tag anymore.
If code like the one backed out should go back in, could you (pierre?) please
speak out load, so we can test in advance?

Thanx

Axel
Moving to Mozilla 1.0 - this cannot be completed in the next week or two...
Target Milestone: mozilla0.8 → mozilla1.0
I tried to apply this patch (to test XSLT), to no avail.
I got around the whitespace mess, but the filenames are just not found.
I need a commandline to apply this on unix, I tried all the commandline opts
I know.
Btw, XSLT is targeted to go into the default builds for 0.9, if this patch is
the code that caused us to regress this will be pretty serious.

Axel
Is the performance problems solved now?
It seems that the patches have been checked in on Feb 7. Does XSLT work now?

This also caused some troubles for us trying to break content from layout, but I
believe peterv has solved them for that case(?)
XSLT didn't regress this time.

Axel
From an email about some profiling data:

Brendan Eich wrote:
- 27% of this time (9msec) is spent in 70 calls to
nsXBLBinding::GenerateAnonymousContent(). About half of this
time (5msec) goes to nsXMLElement::CloneNode() (copying the
binding).

- 20% of the time (17msec) is spent in
StyleSetImpl::ResolveStyleFor().

- Most of the time (9msec) is spent in 160 calls to
NS_NewStyleContext(). The single biggest consumer is
AccumlateCRC() (1msec).
1msec is not big in light of other numbers, but it's still a lot of cycles.

Marc, wouldn't a rotating xor a la
http://lxr.mozilla.org/mozilla/source/js/src/jsdhash.c#74 be sufficient?  I
don't see the need for full-on Galois field arithmetic here! I'm all for making
the CRC calculation faster, however we do need to make sure that we still get a
good ration of unique CRCs to total CRCs. In fact, I bet we can also limit the
CRC calculation to only select fields int he StyleContextData structures, and
still get most of the benefit.

Me thinks that this is worth looking into - being carefule to track the CRC
uniqueness.
I will soon have another big checkin.  Reassigned to myself.

Set milestone to mozilla0.8.1 (March 13th).

Assignee: attinasi → pierre
Status: ASSIGNED → NEW
Priority: P3 → P2
Summary: PERF: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation → MEM: Style Context Sharing needs to be improved to increase sharing and reduce performance degradation
Target Milestone: mozilla1.0 → mozilla0.8.1
Moving to mozilla0.9
Target Milestone: mozilla0.8.1 → mozilla0.9
Blocks: 71874
I've got something that apparently works nicely.  The StyleData, which accounts 
for about half the memory used by the style system, is reduced by 65% or 70%.  
The overall reduction for the style system is around 35%, more on complex pages.  
The reduction can reach a couple of hundreds K on pages like Netscape.com.  
Apparently again, the performance wasn't hurt much, if at all.  I'll post more 
accurate data later.  I'm going to attach the patch for other people to start 
play with it.
Attached file list of modified files
For those who want to review the code or test the diffs, the basic principles are 
the following:

1) Change of interfaces to modify the style data:
GetMutableStyleData() was replaced by 2 methods: ReadMutableStyleData() and 
WriteMutableStyleData().  Look into nsIMutableStyleContext.h for more info on the 
helper classes and the implementation.

2) Memory footprint reduction:
In nsStyleContext.cpp, the nsStyleContextData now contains pointers to the 15 
individual style structures, instead of the structures themselves.  When a 
pointer is null, it means that we are inheriting the data from the parent.

3) Simplification of the code:
The 15 pointers in nsStyleContextData are stored as a union of individual 
pointers and an array of generic pointers.  The array allows to loop through the 
structures instead of duplicating the code 15 times in switch() statements.  This 
simplification required a slightly unfortunate change in the definition of 
nsStyleStruct (in nsStyleStruct.h) where we now expose methods that should be 
hidden from everybody but the StyleContext.  It's a small drawback for a big 
gain.
I'm trying to build with the latest patch, and I get

mozilla/content/base/src/nsStyleContext.cpp:4218:35: nsStyleContextDebug.cpp: No
such file or directory

Did you miss a file in the diff?
In nsStyleContext.cpp, methods StyleDisplayImpl::SetFrom and
StyleDisplayImpl::CopyTo, you haven't made the same change within the #ifdef
IBMBIDI block that you made in the surrounding code.

Another tiny point: at the very beginning of the same file, there is a
regression from "Bidi" to "BiDi" in a comment. Please change this back to "Bidi"
to retain consistency. 

Thanks to Mike and Simon for pointing out these errors.  I'm going to attach the 
files themselves instead of the diffs.

Note that the debug features in nsStyleContextDebug.cpp/.h can only be activated 
on the Mac for now.  Someone has to adapt IsTimeToDumpDebugData() for other 
platforms.
Attached file nsStyleContext.cpp
Attached file nsStyleContextDebug.h
nsStyleContextDebug.cpp and nsStyleContextDebug.h should be stored in the same 
directory as nsStyleContext.cpp (mozilla/content/base/src)
Adding myself to the list.
Rather than exposing some methods, couldn't you make StyleContext a friend of
nsStyleStruct ? (Didn't look at the patch, so dunno if this makes sense.)
If you broke the XUL stuff, you'll know just by running the product.  If a
navigator window and a prefs window comes up and looks ok, then the XUL struct
must work.
Heikki: I could have declared the methods private and make StyleContextImpl and 
nsStyleContextData friends of nsStyleStruct but it seemed to me that it wasn't 
much of an improvement.  It would also require to move some debugging code into a 
wrapper class and declare it as friend of nsStyleStruct.
Dave & others: Here is yet another version of nsStyleContext.cpp that fixes some 
obvious problems with XUL...
Blocks: 72381
Pierre, what testing have you done on this? Specifically, have you run the
style/block/table regression tests? Have you run John Morrison's Load Time
tests? Just curious.
I did not test against any test suites yet.  Please check Block & Tables as they 
require a Window machine and I'll take care of the CSS test suites.

The tests I did until now, and that I just completed, where about memory 
footprint and performance in Debug and Optimized builds, before and after my 
changes.  As you can see, we have a slowdown of around 5% as a trade-off to 
reduce the memory by 30% to 40% in the Style System overall, which represents 
about 80k for Yahoo, 160k for CNN and Netscape and 275k for MSN.

The savings for the chrome are a bit disappointing, around 15%, which represents 
64k for Navigator and 100k for Messenger.  The percentage is so low in the chrome 
because most of the memory allocated for the style system in XUL files is taken 
by CSSDeclarations and StyleRules.  The XUL files generate between 2 and 2.5 
times the average number of declarations and style rules that we find in web 
pages.


               Slowdown/DBG(%)   Slowdown/OPT(%)   Footprint(%)   Footprint(Kb)
Yahoo               2                 2                -30            -83
Slashdot            4                 6                -35           -140
CNN                 5                 0                -31           -158
Netscape            1                 2                -35           -170
MSN                 3                 5                -40           -275
Navigator.xul      n/a               n/a               -14            -64
Messenger.xul      n/a               n/a               -18           -100


                CSSDeclarationImpl    CSSStyleRuleImpl
Yahoo                 332                   582
Slashdot              339                   584
CNN                   466                   710
Netscape              380                   625
MSN                   402                   644
Navigator.xul         928                  1389
Messenger.xul        1002                  1457
Moving to correct QA Contact- here ya go Ian!
QA Contact: ckritzer → ian
I see no r= or sr=, but understand from pierre that marc has not tested the
patch (or perhaps neither tested nor reviewed it?  The message forwarded to me
isn't as clear as an r= in the bug).

Has anyone tried the patch on Linux?

Should these changes carpool in, to get verification and followup fixes if
needed?  Today is not a good day to avoid a carpool.

/be
I posted a request for review last Monday but got basically no response.  My 
partner Marc is way too busy fighting other fires in Layout.

Could someone please help me to review these changes?  I would especially 
appreciate if someone could build and play a bit with Mozilla on Unix.  Thanks!
Sounds like testing is more important than reviewing at this stage.  Pierre, I
can try an up-to-date patch -- should I just detach the diffs and the whole
files and use them, or are they now slightly out of date with respect to top of
trunk?

Cc'ing jrgm, who should run the patch too, if he's able.

/be
To build it, I need (?):
attach_id=28072 -- diffs to layout/content, but remove the 
                   nsStyleContext*.(h|cpp) stuff
attach_id=28118 -- nsStyleContextDebug.cpp   
attach_id=28119	-- nsStyleContextDebug.h     
attach_id=28142	-- nsStyleContext.cpp        (updated version)
where the last three are whole files, and go in mozilla/content/base/src.
Is that correct?

I will do this on win32, as my linux box is (currently) the test
server, and my mac build-fu is weak.

However, if someone can, running the gecko perf tests (which are more specific 
to this bug); and the block/table/layout regression tests would be good if 
people are concerned.
I haven't build in the past 4 days but according to Bonsai & CVS, it should still 
be ok.  To merge the changes:
- apply the diffs
- add nsStyleContextDebug.cpp/.h to mozilla/content/base/src
- overwrite nsStyleContext.cpp
sorry. I didn't get to this quite yet, but I will do it by 3pm tomorrow.
once we get a build of some sort, we can coordinate to run the
block/table/layout regression tests which Pierre wasn't able to.  I'll get this
coordinate with the cia@netscape.com group.  Thanks.
I checked in nsStyleContextDebug.cpp/.h and updated my local tree.  I'm going to 
attach a new set of diffs in a single file.  To try the new code, just do an 
update and apply the patch.
Sorry, I finally got these run done. As pierre mentioned, the times are
about 4% slower, on either the first load (variable range), or a subsequent
visit when cached (fairly consistent across the board). (This was for a 
tree pulled ~11pm last night, built, ran test, patched, built, ran test 2, 
on win2k).
Hrm, that's a pretty steep price to pay. What were the savings again?
jrgm, 
still got that build? 
can you provide a pointer to it?
how 'bout we get curt to to a special mem test run to look at that graph?
Yes, I still have that build on win2k. It's not on a shared drive, but I 
can make it available (eek! virus). But, those memory tests run on linux,
don't they? Are you really looking for a linux build (which waterson made).
I can potentially do a round of builds on three platforms for this once we get 
scc's string changes tested and in.
I have a fix for the two regressions detected by QA in the test builds that were 
made last week.  I sent Leaf the new diffs this morning in order to generate new 
test builds.  I'm going to attach them there too.
People started to inquire about performance lately.  Here is a copy...


Chris Waterson wrote:
> 
> Asa Dotzler wrote:
> 
> > It looks like Pierre is working to get the Style Context Sharing changes
> > landed for Mozilla 0.9.  drivers is looking for input on this one.  Do
> > you think it's something we want to do.  Without further objection I
> > suspect that they are going to try to land it this week.  I'm getting
> > ready to start waving off major landings until after 0.9 real soon now
> > but this could make it in sooner than that.
> 
> > --Asa
> 
> Pierre, have you fixed the performance problems yet? If so, then let's
> take the changes if you can land them by tomorrow.
> 
> If not, then I think these should be waved off for mozilla-0.9. We're
> simply out of time to fix the performance problems. (I'd originally
> endorsed ``just landing the changes'' because the landing date was early
> last week.)
> 
> My $0.02.
> 
> chris


It was never in my plans to fix the 4% performance drop before landing
because nobody objected that the trade-off, memory footprint vs performance,
was not very much acceptable.  Phil inquired about the perf yesterday.  Here
is my response (you are mentioned in it, Chris).  It also describes two ways
to improve the speed a bit: one which consists in tuning up the memory
allocators (sfraser usually does that on the Mac, I don't know for other
platforms), another in adding arenas.

As for checking in, I've been waiting for the Build team to make some test
builds since Thursday morning.  They promised them for sometime today,
Tuesday.  QA needs to do a last round of verifications, then I'll checkin. 
When doing a first round of verification 10 days ago, QA found 2 regressions.
 It's fixed now so if you like I can skip this second round of tests and just
checkin. 

Pierre


Pierre Saslawsky wrote:
> 
> Phil Peterson wrote:
> 
> > Chris and Pierre, I heard some talk today that the CSS footprint work
> > comes with some performance degradation. Is this the case? If so, could
> > we talk sometime tomorrow about the details of how much heap space is
> > saved and how much performance is lost? I'd like to understand this
> > before the work gets checked in.
> >
> > -- Phil.
> 
> Basically, we save 70% of the StyleData which represents half the footprint
> of the style system.  It means that we save 30% to 40% of the entire memory
> used by the style system.
> 
> The savings represent approximately:
> 
>    * 80k on Yahoo
>    * 160k on Netscape and CNN
>    * 275k on MSN
>    * 64k per empty Navigator window (no sidebar)
>    * 100k per empty Messenger window (no lists, just the chrome)
> 
> I originally estimated the slowdown at less than 5%.  John Morrison
> confirmed that it is in fact 4%.  Note that this degradation has been
> measured without doing any optimization at all.
> 
> Chris Waterson had the same question and concern as you:
> 
>      ------- Additional Comments From Chris Waterson 2001-03-27 23:59
>      -------
>      Hrm, that's a pretty steep price to pay. What were the savings
>      again?
> 
> When I gave him the numbers, his response was:
> 
>      Chris Waterson wrote:
>      >
>      > Wow, that's great.
>      >
>      > chris
> 
> Anyhow, the easiest way to improve the performance a bit would be to tune
> up the memory allocators to reserve larger pools of blocks that match the
> individual style structures (we no longer allocate the StyleData as a
> single block of 800 bytes but as 14 or 15 smaller blocks that correspond to
> the style structures like font, margin, color etc...).  Tuning up the
> memory allocators is something we should always do before shipping a
> version anyhow (and that we did, afaik).  It would be a good idea to do it
> after I checkin as the pattern of the block sizes we allocate across the
> app would change quite a bit for at least 5 of them (out of the 15).
> 
> Another way to improve the performance would be to get these style
> structures out of arenas.  It involves a little bit of work.  I haven't
> looked into it but at first sight it's something that can be done in a few
> days.
> 
> Below are the comments and chart from John Morrison:
> 
>      ------- Additional Comments From John Morrison 2001-03-27 21:35
>      -------
>      Sorry, I finally got these run done. As pierre mentioned, the
>      times are
>      about 4% slower, on either the first load (variable range), or a
>      subsequent
>      visit when cached (fairly consistent across the board). (This was
>      for a
>      tree pulled ~11pm last night, built, ran test, patched, built,
>      ran test 2,
>      on win2k).
> 
> More details are at http://bugzilla.mozilla.org/show_bug.cgi?id=43457
> 
> Pierre
> 
> 
> [Image]
> 
> 
>
A point to consider is that when I landed the first part of these changes back in 
February, I did a 1-line fix that improved the overall performance by 5%.
Could we consider the two parts of my changes as a whole, and say that a 5% 
improvement minus a 4% slowdown is still a 1% improvement?

Another idea: why don't we land my changes and disable the style context sharing?  
Mine reduce the StyleData by 30 to 40%, the style context sharing by 15 to 40%.  
Mine cause a 4% performance drop across the board, the style context sharing a 
10% performance drop inside the style system (http://bugzilla.mozilla.org/
showattachment.cgi?attach_id=10050).  We need to figure out what was the overall 
performance degradation of the style context sharing across the app.  It was 
enabled them on December 14th 2000 according to the bug report.  Was there any 
page load measurements taken back then?
Pierre has an interesting idea here, about disabling style context sharing when
we enable his style data sharing. However, it is not clear what the savings
would be for style data sharing if style context sharing is disabled - it may
not yield the same level of memory benefit. Also, style context sharing is
actually faster than what I had originally measured. This is largely due to the
faster StyleContextCache being put in, which I believe resulted in only a 5%
slowdown for Viewer (and even less for SeaMonkey, reported anecdotally). I'd
love to see what the performance cost of style conext sharing is for SeaMonkey,
and now we have the tools to do that (eg. jrgm).

I suggest running tests to characterize the performance and memory gains/losses
across two dimensions:

With and Without context sharing
With and Without data sharing

If we see that the footprint gains due to style context sharing alone are less
than the gains due to style data sharing alone, and the performance gain from
disabling style context sharing make style data sharing's performance loss
acceptable (or zero), then we should turn off context sharing and turn on style
data sharing. I think this is a good approach because unlike the style context
sharing can be enabled or disabled with a #define in one file, so it ie easy to
keep in the tree while we work on performance issues in style data sharing. (I
tried to make this more confusing, but this is as obfuscated as I could get it...)
[mid-air collision - resubmittig - however, I noted that marc raised similar 
points to mine]

BTW, there hasn't been much comments about the interaction of your "StyleData 
sharing" and the previous "StyleContext sharing". Could you brief us a little 
bit on what is happening on this front? Is there any inter-dependence? When both
are used, do your changes entail an extra overhead in the implementation of the
style context sharing? If both are used, does that lead to "double" (or some
other factor) extra savings. Do your "low-level" styledata changes provide an
independent alternative that aptly removes the need for the most expensive 
"higher-level" style-context sharing?  [I am just asking these questions as an 
indication of the things that some might find useful to understand. Thanks]
StyleContext and StyleData sharing are independant.  I'll try to get some numbers 
today. 
Here is the first batch of numbers...

In terms of memory footprint, the StyleData sharing represents an improvement of
7% to 21% over the StyleContext sharing.  I compared the overall size used by the 
style system.  If we compare just the style data, the difference is twice
as much.  I collected the numbers from 3 test pages only: Yahoo, Netscape and 
MSN.

Now I have to compare the performance.  That's where I fear some problems because 
we have much more memory allocations with the StyleData sharing.  We'll see... It 
will take a couple of hours: I have to make some optimized builds.



================================================================
                         Count TotalSize MinSize MaxSize AvgSize
================================================================
Yahoo - StyleContext sharing only
    StyleContextData       177    135228     764     764     764
    StyleContextImpl       295     20954      56     110      71
    *** Total ***         2459    291325
----------------------------------------------------------------
Yahoo - StyleData sharing only
    StyleContextData       293     74164      68     884     253
    StyleContextImpl       293     20810      56     110      71
    *** Total ***         2573    230053
================================================================
Netscape - StyleContext sharing only
    StyleContextData       369    281916     764     764     764
    StyleContextImpl      1035     72902      56     138      70
    *** Total ***         3613    508247
----------------------------------------------------------------
Netscape - StyleData sharing only
    StyleContextData      1035    247012      68     884     238
    StyleContextImpl      1035     72902      56     138      70
    *** Total ***         4279    473279
================================================================
MSN - StyleContext sharing only
    StyleContextData       587    448468     764     764     764
    StyleContextImpl      1418    105952      56     110      74
    *** Total ***         4324    716391
----------------------------------------------------------------
MSN - StyleData sharing only
    StyleContextData      1411    319808      68     884     226
    StyleContextImpl      1411    105464      56     110      74
    *** Total ***         5141    587179
================================================================

Results:
    Yahoo    Data/Context = 230053/291325 = 79% or -21%
    Netscape Data/Context = 473279/508247 = 93% or -7%
    MSN      Data/Context = 587179/716391 = 82% or -18%
A quick food for thought: if the style data are shared (i.e., using pointers to
the actual values), then may be the style context sharing logic can be re-worked
a little bit so as to determine the sharing based on these shared data pointers.
This way, no cost will be incurred in accummulating the CRC with the actual
values. Seems like combining the two in this way might be of some help in the
-4% perf problem.
It's already the way it works now: the CRC is not computed for these style data 
that are inherited from the parent.  But, wait, I've got some good news.

Here is the second batch of numbers...

In terms of performance, the StyleData sharing represents an improvement of
2% to 4% over the StyleContext sharing.  I compared the overall performance
of the application.  If we compare just the time spent inside the style system,
the difference should be at least 3 times higher.  I collected the numbers
from 2 test pages only: Yahoo and CNN.  I could not test Netscape and MSN
like before because of a regression that was in the tree when I checked
out last week, which causes some pages not to post a notification when the
page is fully loaded.

It appears that the removal of CRC calculations and table lookup that
we have in the StyleContext sharing has offset the larger number of
memory allocations in the StyleData sharing.

It's important to note that in my version, the nsStyleContextData is
still allocated separately from the StyleContextImpl.  If I put it
this structure back into the StyleContextImpl, we'll allocate larger
blocks overall but half as many.  This could represent an additional
improvement in performance.  I'll see if I can work something out
relatively quickly.


==================================================================
Yahoo - StyleContext sharing only
    814+812+817+817+814+815+815+815+811/9 = 814 ms
------------------------------------------------------------------
Yahoo - StyleData sharing only
    801+792+794+797+793+796+796+799+796/9 = 796 ms
==================================================================
CNN - StyleContext sharing only
    1315+1301+1363+1275+1320+1309+1262+1277+1274+1301/10 = 1300 ms
------------------------------------------------------------------
CNN - StyleData sharing only
    1238+1242+1241+1246+1254+1260+1257+1260+1244+1256/10 = 1250 ms
==================================================================

Results:
    Yahoo StyleData/StyleContext = 796/814   = 97.8% or -2.2%
    CNN   StyleData/StyleContext = 1250/1300 = 96.2% or -3.8%


Note: In both cases, I dropped 1 out of the 10 measures taken with Yahoo
because it was clearly out of bounds (something like 900ms instead of 800).
Conclusion:

Without any tune-up of any kind, the StyleData sharing represents improvements 
over the StyleContext sharing of:
    7% to 21% in terms of memory footprint in the style system overall,
    2% to 4% in terms of performance across the application.

So... may I checkin now?  I would feel more comfortable with another round 
verifications but that would require making it a priority for the Build team to 
generate some builds on Win32 and Unix, and for QA to play with them a little 
bit.  Otherwise if the time is running short, I can just checkin.  After all, 
that's the way everybody else proceeds.

Who wants to sign on this?
Assume the following situation:
- you build your own mozilla
- you have plenty of memory
- you want to tune for speed instead of size
Will there be a way to turn off these two things (style data sharing / style
context sharing)? Or is the performance loss irreversible?
From pierre analysis, it appears that StyleData sharing is faster and more 
memory efficient than StyleContext sharing. This can be understood that there is
usually more chance for two things (style contexts) to share a common "subpart" 
(style data) than to be identical as a whole. Plus, there is a cost in
determining if sharing is possible in the first place. And it appears that this 
determination is less expensive with StyleData sharing that with StyleContext 
sharing. 

So pierre, what is your call, do you recommend to enable StyleData sharing and
disable StyleContext sharing by default?
Sure, it's smaller and faster.  I'm fairly confident it won't break anything 
because we already went through a first round of QA verifications.  But a second 
one wouldn't hurt either.  I'll let the drivers decide how to proceed because it 
would require the Build and QA team to spend some extra time on it.


And of course, I'll be glad to check in.  I don't like to have the code sitting 
in my local tree.  If it's checked in, it would allow me or others to start 
playing with an idea I had earlier this week.  I'll post it here when I can get 
access to my email account again, or maybe I'll open a related bug report.  
Basically it consists in storing the style data into hash tables.  It would be a 
killer in terms of memory footprint but there are some risks in terms of 
performance.  It's worth doing some instrumentation and get some numbers at 
least.  After moz0.9, of course.
To answer Andreas's question, I think it is out of the question to disable both 
optimizations.  According to Marc in bug 39618, the StyleContexts + StyleData on 
CNN used to take about 1Mb.  With either method we can divide it by 3.
linux bits in http://ftp.mozilla.org/pub/mozilla/nightly/latest-43457/

windows builds are running now, and will appear in the same directory.
I tried out the win32-talback drop (in fact I am sending this comment with it).
No weirdness to report so far. Yep, pierre, now that instrumentation has shown 
an improved perf, your call makes sense. Indeed, the style data sharing is
"inclusive" of the style context sharing (i.e., the very fact that two style
contexts are identical means that all their style data are indentical and shared).
By single-mindely focussing on style data sharing, maybe more consolidation and
efficiency can be implemented in due time before 1.0.
So pierre is measuring a speedup, yet at one point I thought Netcsape QA
measured a slowdown (I'm guessing with jrgm's test suite). Pierre, I presume
you've only tried this on Mac? Might other platforms be affected differently?
(e.g., because malloc() is more expensive on Win32/Linux due to lock
contention?) It'd be good to undesrtand why we're seeing different reports.
(Maybe the delta is just too small to realisitically resolve with our tools, in
which case, performance should be considered a non-issue.)

Out of curiosity, what do you see on a larger pages? (e.g.,
nsCSSFrameConstructor.cpp in LXR, tinderbox, table stress test.)

Do we know the impact of this stuff on chrome? (Menus, tree, hover, tooltips,
etc.) Maybe hyatt could help you assess that.

Are you still looking for r= and sr=? Or are those covered?
Apparently this is the first time it is tested in isolation (without SC sharing)
Chris,
- It is indeed the first time StyleData sharing is tested independently from 
StyleContext sharing.
- I tested on the Mac only.
- Results may vary depending on your platform but if experience is of any 
indication, they should really not vary by a lot.  For instance, when I predicted 
a slowdown lower than 5% for SC + SD sharing on the Mac, jrgm found around 4% on 
other platforms.
- I will test larger pages later today.
- I don't know the impact on the chrome but if experience is of any indication, 
it will be good.
- There has not been any formal review because _you_ and other people declined to 
reply to my repeated requests.  So with Brendan's suggestion, I went through the 
process of a landing, with verification builds handed over to QA before checking 
in.  I also supported Marc's standpoint that for these changes a thorough testing 
would be much more important that a formal review. Anyhow, if you'd like to 
review now, please be my guest: the diffs are attached.  If you want the exact 
code, I can attach new diffs where StyleContext sharing is disabled.
Were jrgm's tests run using optimized builds with these changes, or debug? Was 
the comparison made to a debug or optimized build? Could that account for the 
performance hit?
pierre, I apologize if you felt that I ignored you. (IIRC, when you asked me to 
sr=, I replied, asking if Marc liked the changes. You said something along the 
lines of ``let's just get a Unix build to work first''.) If you'd still like me 
to sr= these changes, attach the latest diffs to the bug and I'll go through 
them.
Leaf, all the performance measurements are done with optimized bits.  The debug 
builds are at least 30% slower, often much more than that.

Chris, thanks for your offer.  I'll attach the diffs within the next hour or so.  
Look at my comments from [2001-03-18 19:30] for more info.
So to make sure I understand, the current plan is to disable style context
sharing and enable style data sharing.
  
Yes, the plan is to disable StyleContext sharing and use StyleData sharing 
instead because StyleData sharing results in a smaller memory footprint and a 
better performance.

We'll address the performance problem and re-enable StyleContext sharing later.  
The benefit of having StyleContext sharing together with StyleData sharing is 
that both optimizations work separately.  StyleData sharing reduces the memory 
used by the nsStyleContextData by 70%, while StyleContext sharing reduces it by 
60% or so.  The two reductions are almost cumulative in real life tests.

Chris: I'm going to attach new diffs with StyleData sharing only.  I had to do a 
complete update (meaning: you shouldn't have any problem to apply the patch).
Pierre, I'm confused by the new patch (30514)... Are you just removing the
#define SHARE_STYLECONTEXTS or removing the context sharing code? For example,
what happended to the GETSCDATA macros? Like I said, I'm confused.
Never mind - I see what is going on. Sorry for the noise.
I think you should talk to scc about the nsMutableStyleXXX stuff. I'd
bet he could help you design a solution that avoids the C-style casts,
and is therefore type-safe. As it stands, you're forcing a programmer
to circumvent the compiler's type-checking mechanisms to use this
stuff!

Personally, I'd have overridden |operator->()| and |operator*()|; e.g.,

  class nsMutableStyleBorder {
  public:
    nsMutableStyleBorder(nsIMutableStyleContext*) {...}
    nsMutableStyleBorder(nsIStyleContext*) {...}
    nsStyleBorder* operator->() {
      return NS_STATIC_CAST(nsStyleBorder*, mStyleStruct); }
    const nsStyleBorder* operator->() const {
      return NS_STATIC_CAST(const nsStyleBorder*, mStyleStruct); }
    nsStyleBorder& operator*() {
      return *NS_STATIC_CAST(nsStyleBorder*, mStyleStruct); }
    const nsStyleBorder& operator*() const {
      return *NS_STATIC_CAST(const nsStyleBorder*, mStyleString); }
  };

s.t. a user could do:

  nsMutableStyleBorder styleBorder(mStyleContext);
  styleBorder->mTextAlign = /*whatever*/;
  MapHTMLBorderStyle(aPresContext, *styleBorder, tableFrame);

It might even read more nicely as

  nsMutableStyleBorder styleBorder = GetStyleBorderFrom(mStyleContext);

Kinda makes it more clear that you're extracting something from the
style context, rather than ``constructing'' something. (Maybe
not. Take it or leave it.)

You could probably do this with templates and typedefs with a lot less
typing, too. (But that's really scc's domain...)



+#ifdef XP_MAC
+#pragma mark -
+#endif

What does this do?

 struct nsStyleStruct {
+  // These declarations should not be here but they make
+  // the code in nsStyleContext.cpp so much easier!...
+  // It allows to store the style data as a union of individual
+  // data structures and an array of generic data structures.
+  virtual void ResetFrom(const nsStyleStruct* aParent, nsIPresContext* 
aPresContext) {};
+  virtual void CopyTo(nsStyleStruct* aDest) const {};
+  virtual PRInt32 CalcDifference(const nsStyleStruct* aOther) const {return 
0;};
+  virtual PRUint32 ComputeCRC32(PRUint32 aCrc) const {return 0;};
 };

Should these be pure virtual? Or do you really have subclasses that
delegate up to these implementations?

Is AssignFrom() a better name than ResetFrom()? Would symmetry be
better if you had a single |CopyTo()| method to copy structures, and a
|GetDefaultsFrom(nsIPresContext*)| to fill in default values? e.g.,

  if (aParent)
    aParent->CopyTo(STYLESTRUCT[i]);
  else
    STYLESTRUCT[i]->GetDefaultsFrom(aPresContext);

There are a couple places where you currently do

  STYLESTRUCT[i]->CopyFrom(nsnull, aPresContext);

which is a bit weird, and might make more sense using the
GetDefaultsFrom() scheme. Certainly gets rid of a bit of duplicate
assignment code.


+  switch (aSID) {
+    case eStyleStruct_Font:
+      if (aPresContext) {
+        result = new DATAIMPL(Font)(aPresContext->GetDefaultFontDeprecated(), 
aPresContext->GetDefaultFixedFontDeprecated());
+      }
+      else {
+        result = new DATAIMPL(Font);
+      }
+      break;
+    case eStyleStruct_Color:          result = new DATAIMPL(Color);         
break;
+    case eStyleStruct_List:           result = new DATAIMPL(List);          
break;
+    case eStyleStruct_Position:       result = new DATAIMPL(Position);      
break;
+    case eStyleStruct_Text:           result = new DATAIMPL(Text);          
break;
+    case eStyleStruct_Display:        result = new DATAIMPL(Display);       
break;
+    case eStyleStruct_Table:          result = new DATAIMPL(Table);         
break;
+    case eStyleStruct_Content:        result = new DATAIMPL(Content);       
break;
+    case eStyleStruct_UserInterface:  result = new DATAIMPL(UserInterface); 
break;
+    case eStyleStruct_Print:          result = new DATAIMPL(Print);         
break;
+    case eStyleStruct_Margin:         result = new DATAIMPL(Margin);        
break;
+    case eStyleStruct_Padding:        result = new DATAIMPL(Padding);       
break;
+    case eStyleStruct_Border:         result = new DATAIMPL(Border);        
break;
+    case eStyleStruct_Outline:        result = new DATAIMPL(Outline);       
break;

Could you use a function table here to avoid the fact that switch
degenerates to branch-and-test? (Not sure if this is called a lot; if
not, don't worry about it.)

+static PRBool CanShareStyleData(nsIStyleContext* aStyleContext)
 {
-#define max_structs eStyleStruct_Max
+  PRBool bSharingSupported = PR_TRUE;

Hungarian notation. Yuck.

+nsStyleStruct* gStructPointer[eStyleStruct_Max][kStructIndexMax];
+bool           gStructBusy[eStyleStruct_Max][kStructIndexMax];

Please use PRBool, or PRPackedBool.

+  static bool initialized = false;
+  if (!initialized) {

Ibid.

+  short structType = aSID - 1;
+  for (short structIndex = 0; structIndex < kStructIndexMax; structIndex++) {
+    if (!gStructBusy[structType][structIndex]) {
+      result = gStructPointer[structType][structIndex];
+      gStructBusy[structType][structIndex] = true;

PR_TRUE.

+  NS_ASSERTION(false, "Can't release mutable style struct");

NS_ERROR(...)


I don't really understand why you've decided to pool the style structs
for purposes of nsMutableStyleXXX reading and writing? Why didn't you
just make a member variable in each nsMutableStyleXXX class that would
serve as the temporary storage for reading and writing the struct?


+  if (aSID == eStyleStruct_BorderPaddingShortcut) {
+/*
+    nsMargin border, padding;
+    if (GETDATA(Border)->GetBorder(border)) {
+      if (GETDATA(Padding)->GetPadding(padding)) {
+        border += padding;
+        ((nsStyleBorderPadding&)aStruct).SetBorderPadding(border);
+      }
+    }
+*/
+    nsMargin border, padding;

Use #if 0, and comment as to why this is not currently part of the code.



Index: mozilla/layout/xul/base/src/outliner/src/nsOutlinerBodyFrame.h
===================================================================
RCS file: 
/m/pub/mozilla/layout/xul/base/src/outliner/src/nsOutlinerBodyFrame.h,v
retrieving revision 1.25
diff -u -2 -r1.25 nsOutlinerBodyFrame.h
--- nsOutlinerBodyFrame.h	2001/04/11 02:44:25	1.25
+++ nsOutlinerBodyFrame.h	2001/04/12 00:15:59
@@ -29,4 +29,5 @@
 #include "nsIScrollbarMediator.h"
 #include "nsIWidget.h"
+#include "nsHashtable.h"

Why does nsOutlineBodyFrame now need nsHashTable.h?


Ok, that's all I can do for now. I'll hope marc has covered the hard stuff. :-)
I'm gonna have to back Chris up on this one, Pierre.  The preferred way to make
an object act like a pointer to another object is to implement |operator->| and
|operator*|.  For a non-pointer object, these have no built-in implementation,
so you're not removing functionality.  |operator&| is defined for _every_
object, so redefining that takes away a power permanently, and for no reason.
Someday, someone might want to pass one by address.  If you override
|operator&|, they can't.  And you can do the same job better (more easily used
by clients) with |operator->| and |operator*|; plus it'll be that much easier
for someone trying to understand your code, because that's the idiom for
smart-pointer style implementations.

Two other things Chris pointed out are worth re-iterating: (1) avoid C-style
casts; new style casts catch errors that old casts hide; and (2) |bool|, |true|,
and |false|, aren't defined on every platform I'm ashamed to say.

Good use of the stack-based transaction idiom for style modification though.  I
like it.  I only looked at the auto-style objects; I've got to go examine the
mutable-style cast situation now.  More comments when I have a better
understanding of the big picture.
1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct:
----------------------------------------------------------------------------
scc was copied on the review requests too.  My fault if I wasn't more specific 
when I contacted him because that's where his input would have been the most 
valuable: your proposal is the kind of solution that I hoped he had made.  I like 
the solution but if you don't mind, I'd prefer to implement it after checking in 
because it is only a syntactical change to enforce the type-checking.


2) GetStyleXXXFrom(mStyleContext):
----------------------------------
I prefer the current notation where it is clear that the helper classes are in 
fact constructing something, because it is the destructor that causes the 
modified data to be written in the style context.  You might have seen a couple 
of places, once in Table code and maybe once in Block layout, where I added 
blocks {...} to make sure the data was written before we continued the 
processing.


3) Templates:
-------------
Same thing as (1).  I'd say let's checkin and work on the notation later if scc 
comes with a better idea.


4) #pragma mark -:
------------------
It draws a separator in the list of functions that pops up in the CodeWarrior 
editor.  Very convenient...  You can see it in large files written or maintained 
by folks who use the Mac as main developement machine.


5) Pure virtual functions in nsStyleStruct:
-------------------------------------------
It was my idea at first, of course.  Unfortunately it would have required all the 
classes that inherit from nsStyleStruct to declare and implement these methods.  
The breakup of Layout into 2 DLLs made the task a bit more complex, but 
especially the result would have been much worse.


6) Is AssignFrom() a better name than ResetFrom()?
--------------------------------------------------
Maybe...  It has been there forever, I kept it.  I did not like it much at first 
too but I got used to it.  Other names could have been InheritFrom() or 
InitializeFrom().  I usually try to be flexible when I work on code developed by 
someone else, and after all ResetFrom() makes sense too.


7) Symetry of ResetFrom() and CopyTo():
---------------------------------------
There is no symetry between ResetFrom() and CopyTo().  The symetrical function to 
CopyTo() was SetFrom().  It was so symetrical that the code was duplicated (but 
reversed) in the 15 implementations.  One function copied member variables from |
this| to the destination; the other function from the origin to |this|.  It's 
something I described to Marc and you 2 or 3 weeks ago: I removed 
nsIMutableStyleContext::SetStyle() which wasn't used anywhere and it allowed me 
to remove the 15 SetFrom() methods, and replace the calls to SetFrom() with calls 
to CopyTo() in the 2 places where it was done.  It doesn't sound obvious at first 
but the two issues were indeed related.


8) Replace ResetFrom() with CopyTo() + GetDefaultsFrom():
---------------------------------------------------------
We cannot replace ResetFrom() with the following lines, as you suggest.

  if (aParent)
    aParent->CopyTo(STYLESTRUCT[i]);
  else
    STYLESTRUCT[i]->GetDefaultsFrom(aPresContext);

ResetFrom() defines the inheritance.  Some properties are inherited, some others 
are not.  We cannot replace that with a CopyTo().


9) CopyFrom(nsnull):
--------------------
There is no CopyFrom(nsnull), but we have some ResetFrom(nsnull) and they make 
sense.  It means that we create a style structure that doesn't inherit from any 
parent.  We need this, for instance, to resolve the style within tables in quirks 
mode (Nav4x tables act almost like embedded documents: they don't inherit 
anything but a couple of properties).


10) Replace switch() with jump table:
-------------------------------------
I believe good compilers do it automatically when the number of cases within the 
switch() reaches a certain threshold.


11) Yucky "bSharingSupported" Hungarian notation:
-------------------------------------------------
Yes, I agree.  Marc indeed developed a few bad habits in previous lives but they 
are so negligible compared to the quantity and the quality of the work he 
otherwise produces that I would have felt really bad pointing out that kind of 
things to him.  So I kept the name.


12) Replace bool with PRBool:
-----------------------------
Correct.


13) Replace ASSERTION(false) with NS_ERROR():
---------------------------------------------
Correct.


14) Member variable instead of pooling:
---------------------------------------
To minimize the impact on the existing code, I wanted ReadMutableStyleData() to 
behave exactly like the current GetMutableStyleData() with the only difference 
that the caller must call WriteMutableStyleData() when he's done with the 
changes.


15) Commented-out code:
-----------------------
Correct: I should have removed the these lines instead of commenting them out.


16) nsHashTable.h in nsOutlineBodyFrame:
----------------------------------------
nsOutlineBodyFrame always used a nsHashTable but it was getting the declaration 
from nsIStyleSet.h.  The problem is that nsStyleSet.h doesn't include 
nsHashTable.h when SHARE_STYLECONTEXTS is not defined so when I disabled the 
StyleContext sharing, I prefered to add the #include "nsHashTable.h" in 
nsOutlineBodyFrame which is the real customer, instead of always including it in 
nsIStyleSet.h, even when not needed.

====

Thanks a bunch for this very thorough review.  The actions I'm proposing to take 
are:
1) Give some builds with the current code to jrgm to evaluate the performance.  
My own evaluation shows that we can expect a small gain but following your 
recommendation, I'm going to test with very large test pages.
2) Fix the small points: #12 + #13 + #15.
3) Checkin if that's ok with everybody.
4) Ask scc if he's got a better solution than the one you proposed in #1 for 
type-checkin, plus possibly something with templates as suggested in #3.
5) Check in these changes for type-checking whenever we want or can.
6) Continue our merry way towards new adventures.

I'm not planning to change anything on the following points: #4, #7, #8, #9, #14, 
#16.

I think my explanations are sufficient on #5 but I may adapt if someone comes up 
with a better solution.  I promise I tried very hard.

I think my explanations are sufficient on the following points but I will adapt 
if you insist: #2, #6, #10, #11.
Scott: I posted my comments before I read yours.  I agree with you and Chris on 
#1.  If you prefer, I can implement it before checking in but since it doesn't 
change anything to the execution, I would not want that to cause another round of 
verification builds for QA.

About the mutable style, my main guideline in all the declarations was to have as 
little an impact as possible in the code outside the StyleContext.  
GetMutableStyleData() is called in approximately 130 places in 35 files, and the 
pointers to the mutable style are used in probably hundreds of places (and 
fortunately nowhere in a persistent way).

The thing that will make you frown the most is probably #5.  If you want to give 
it a try, you're welcome.  I sincerely tried many solutions with pure virtual 
methods or just virtual methods (like having a second base class that the 
StyleXXXImpl structures could inherit from, etc...) but given the facts that:
- customers had to be able to use an object created by the StyleContext 
(ReadMutableStyleData)
- the StyleContext had to be able to use an object created by the customer 
(GetStyle)
- customers could be in another DLL (Layout)
I believe I ended up with a very honorable solution but if there is a better one, 
I'm confident you will find it (beware of crashes when virtual tables don't 
match).
I will cast my vote for #1 to be addressed before check-in. Motivation: strike
the iron while it is hot... Now that all eyes are on this, let's make (or at
least try to make) the most of the brainstorming that is going on and the
eagerness around w.r.t. reaping the savings of this style data sharing.
> 1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct:

What rbs said: you're essentially redefining the API at this point (for all 
practical purposes, callers must use your nsMutableStyleXXX objects). If we 
don't fix this now, you'll be chasing the tree as people begin to start writing 
new code that uses the new API.

One thing that I forgot to ask last night was, what is the linkage impact of 
these new classes? Does layout need to link against content now? (I guess not, 
since you've not changed any makefiles AFAICT.) Which would mean the code 
probably gets duplicated in each of content and layout. (Tangent: we need to 
redouble our efforts to get content & layout converted to using 
NS_IMPL_NSGETMODULE so they can be recombined for release.)

> 2) GetStyleXXXFrom(mStyleContext):
> ----------------------------------
> I prefer the current notation where it is clear that the helper classes are in 
> fact constructing something,

Agreed. That makes sense.

> 5) Pure virtual functions in nsStyleStruct:
> -------------------------------------------
> It was my idea at first, of course.  Unfortunately it would have required
> all the classes that inherit from nsStyleStruct to declare and implement
> these methods.  

What style structs *don't* need this? Why?

> 14) Member variable instead of pooling:
> ---------------------------------------
> To minimize the impact on the existing code, I wanted ReadMutableStyleData()
> to behave exactly like the current GetMutableStyleData() with the only
> difference that the caller must call WriteMutableStyleData() when he's
> done with the changes.

But because of the nsMutableStyleXXX classes, no sane person would call 
[Read|Write]MutableStyleData() by hand, would they? (Are we bending over 
backwards to support a non-use case?) I'll have to read the code more carefully 
to really understand what you're trying to do here. More comments later.

1) Override the operators in nsMutableStyleXXX instead of nsAutoStyleStruct:
----------------------------------------------------------------------------
Agreed but for the reason rbs mentions.

I did redefine the API but not the way you seem to think.
The previous API was:
  nsStyleStruct* GetMutableStyleData(nsStyleStructID aSID)
Now it is:
  nsStyleStruct* ReadMutableStyleData(nsStyleStructID aSID)
  nsresult WriteMutableStyleData(nsStyleStructID aSID, nsStyleStruct* 
aStyleStruct)

The difference being that everywhere there is a Get(), we have to replace it with 
a pair of Read()/Write().  The nsMutableStyleXXX objects were defined only as 
helper classes to ease the transition of the existing code to the new API.  My 
goal was to change all the GetMutableStyleData() to something else that fits in 
only 1 or 2 lines.  I did not want the Write() calls to appear because it would 
have made the task much more complex (ie. make sure Write() is called before 
every |return| or |break| etc...)

However callers are not required to use the helper classes.  The following 
snippet is perfectly valid:
  nsStyleBorder* border = mContext->ReadMutableStyleData(eStyleStruct_Border);
  border->SetBorderStyle(0,NS_STYLE_BORDER_STYLE_SOLID);
  nsresult result = mContext->WriteMutableStyleData(eStyleStruct_Border, border);


1.1) Linkage impact:
--------------------
None.


1.2) Duplication of the code:
-----------------------------
Yes, the code of the helper classes is duplicated in Content and Layout.  It 
doesn't take much space, it is fairly straightforward and it doesn't do much 
anyhow (for those who haven't looked at the diffs, the helper classes call 
Read()/Write() when the objects are created/destructed).

<gentle_settling_of_accounts>
Honestly it doesn't bother me as much (and far from it) as the big duplication of 
"real" code that you approved in nsIStyleContext.h, without my reviewing, in 
order to enable the split between Content and Layout: http://bonsai.mozilla.org/
cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIStyleContext.h&root=/
cvsroot&subdir=mozilla/layout/base/public&command=DIFF_FRAMESET&rev1=3.88&rev2=
3.89
</gentle_settling_of_accounts>


5) Pure virtual functions in nsStyleStruct:
-------------------------------------------
I haven't made myself clear, sorry.  One of the aspects of the problem is related 
to the linkage (or absence of) between Content and Layout.  GetMutableStyleData() 
is called in a few places in Layout (in the CSSFrameConstructor mainly) so if 
nsStyleStruct had pure virtual functions, we would have to link the two 
libraries.


14) Member variable instead of pooling:
---------------------------------------
See the code snippet in #1.  The helper classes are not mandatory, not using them 
removes a little overhead.  Some folks may like the "Read/Modify/Write" paradigm 
better, after all it's a very common one in computing.  Besides we can't always 
force people to do the way we like: nsCOMPtrs have been there for ages and we can 
still see some NS_RELEASE() checked in everyday.

----
As an incidental note, I very much enjoy this particular code-review.  French 
people have a thing about arguing.  I wish all the big changes were under the 
same scrutiny (as much as I wish the small ones were given some slack, but that's 
another debate) so thank you, and scc and rbs, for your input.
pierre: my 2 cents, but they were gained over years at much greater expense:

Don't treat the Read/Write pattern as a good low-level device for the masses. 
Do consider making it private/protected/friend/whatever-restricts-it and
mandating the helper classes.  nsXPIDLString, nsCOMPtr, etc. and their lack show
the folly of letting low-level, relatively-error-prone tools loose in a large,
mixed-strength community such as Mozilla's.

What is the overhead of the helper classes?  It should be noise to zero, I hope.

/be
> <gentle_settling_of_accounts>

Touche!

> 5) Pure virtual functions in nsStyleStruct:
> ------------------------------------------
> I haven't made myself clear, sorry.  One of the aspects of the problem is
> related to the linkage (or absence of) between Content and Layout.
> GetMutableStyleData() is called in a few places in Layout (in the
> CSSFrameConstructor mainly) so if nsStyleStruct had pure virtual functions,
> we would have to link the two libraries.

I don't understand. Pure virtual methods should *eliminate* linkage 
dependencies, not *cause* them. Could you please elaborate?

> 14) Member variable instead of pooling:
> ---------------------------------------
> See the code snippet in #1.  The helper classes are not mandatory, not
> using them removes a little overhead.  Some folks may like the
> "Read/Modify/Write" paradigm better, after all it's a very common one
> in computing.  Besides we can't always force people to do the way we like:
> nsCOMPtrs have been there for ages and we can still see some NS_RELEASE()
> checked in everyday.

But my point is, your scheme isn't required to support the read/modify/write 
paradigm. I believe that I've now understood the code a bit better, and I 
think I see what you're trying to accomplish now. (I think I could even guess at 
the code geneology, and how you've arrived at the implementation you did.)

I think you should:

1. Change ReadStyleData() to copy the style data into a caller-provided
   struct. WriteStyleData() will write data back from the struct.

2. Your auto-classes should each have an appropriate struct as a member.
   Each will use its member to call [Read|Write]StyleData.

3. Any macho men that want to use [Read|Write]StyleData directly will just
   have to have their own style struct on the stack as a buffer.

This should simplify the back-end implementation of [Read|Write]StyleData. It 
also removes the artificial ``only four live requests'' at a time. It even makes 
it so that we could one day hope to have a thread-safe layout engine, etc. (Your 
current implementation is just one more stake in thread-safety's heart here!)

FWIW, I am enjoying this too. It's been a good learning experience. Also, I 
*don't* think I'm hen-pecking here: I claim that I *am* arguing about the big 
changes. You're fundamentally altering how clients will interact with the style 
system, and your implementation will have far-reaching consequences.
Chris Waterson wrote:

>1. Change ReadStyleData() to copy the style data into a caller-provided
>   struct. WriteStyleData() will write data back from the struct.

This is what the old methods on nsIMutableStyleContext

 NS_IMETHOD GetStyle(nsStyleStructID aSID, nsStyleStruct& aStruct) const; 
 NS_IMETHOD SetStyle(nsStyleStructID aSID, const nsStyleStruct& aStruct);

did, and Peter Linss put those in to facilitate migration from GetMutableStyle
calls to a better encapsulation that does not expose the internal data
structures of the style context.
I think you're missing my point. I want ReadStyleData() to copy the style data 
into a *caller-provided buffer*. (I think the caller can do a better job 
managing the memory that ReadStyleData() can; specifically, the caller can do it 
in a re-entrant way.)
Chris, GetStyle(...) does exactly that, which is why I brought it up. The caller
creates their style struct and passes it to GetStyle, then mucks with it, and
then calls SetStyle passing it back in. Does that make sense, or am I missing
something?

example usage -
 nsStyleDisplay disp;
 GetStyle(eStyleStruct_Display, disp);
 disp.mDisplay = NS_STYLE_DISPLAY_NONE;
 SetStyle(eStyleStruct_Display, disp);

Am I being dense? What is the point of the style struct pool then? I've got the 
diffs applied in a tree. I'll spend some more time looking at them.
Brendan: sure, the overhead isn't much higher than, let's say, the one we have in 
switch() statements vs jump tables, so no problem: we can enforce the use of the 
helper classes.

Marc & Chris: There is an important point that you may have missed and you will 
see how the issues of interfaces and pure virtual methods are intertwined...

In the current implementation, the nsStyleXXX structures don't have any real base 
class other than the empty nsStyleStruct which is basically a void*.  This leads 
to some aberrations like big duplications of code or useless switch() statements.  
Compare the current implementation of StyleContextImpl::CalcStyleDifference():
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsStyleContext.cpp#3658
with what I propose.  Same thing for GetStyle(), GetStyleData() etc... or even 
RemapStyle().

In the current code, none of the 5 basic methods of the nsStyleXXX structures 
(ResetFrom, CopyTo, SetFrom, CalcDifference and ComputeCRC32) is inherited.  They 
could very well have completely different names and different parameters from a 
nsStyleXXX struct to another.  Yucky, isn't it?  It was really important in my 
opinion to clean that up in order to improve the legibility of the current code 
and ease:
- The implementation of the StyleData sharing.
- The support for more data structures in the style context, like the one added 
recently for XUL.  We may have some more for CSS2/3 modules that we don't support 
yet (aural, webfonts, ruby and whatnot) or external modules like SVG.
- Implement the memory model we want in the future.
The cleanup allowed to declare the nsStyleContextData as a union of individual 
data structures (StyleXXXImpl*) and an array of generic data structures 
(nsStyleStruct*).

So... I made these 5 functions (later reduced to 4, having removed SetFrom) part 
of the base class nsStyleStruct and declared them pure virtual.

The problem is that:
1) In public interfaces, nsStyleStruct is the base class for the structures that 
are manipulated by the customers (nsIStyleContext.h declares nsStyleXXX 
structures which all inherit from nsStyleStruct).
2) Some customers in other DLLs (read "Layout" but it could be other ones) 
declare some of these structures on the stack, for instance to call GetStyle() 
and fill up the structure they are passing by reference.
3) None of these public interfaces implements these 5 functions.  The only place 
where the functions are implemented is in the StyleXXXImpl classes inside 
nsStyleContext.cpp (StyleFontImpl, StyleColorImpl, etc...)
4) Result: You get link errors in these other DLLs (er, "Layout") because you 
construct objects with pure virtual methods that are not implemented anywhere.

A "solution" would have been to somehow declare the nsStyleStruct as an XPCOM 
object and provide constructors in the StyleXXXImpl, or something along that 
line...  Another "solution" could have been to link back Content and Layout...  
Nothing we wanted anyhow because another requirement was that the customers 
should be able to continue to access directly the member variables of the 
nsStyleXXX objects which are defined the public interfaces.

If you have another idea, I would be interested to look it but I would prefer if 
you played with it a little bit in the comfort of your own tree.  I think I 
already spent enough time on it and I still believe that my solution is close to 
the best you can get.  C'mon, I double-dare you :-)

...That was about the pure virtual methods.  Now about the APIs...

I maintain (you will be interested!) that SetStyle() is evil, as well as any API 
where we would have to accept a structure allocated by the customer with the 
purpose of modifying the style context.  This is for almost the same reason as 
above: the implementations are done in the StyleXXXImpl, which are a certain way 
private to the StyleContext, and when writing into the StyleContext we need to 
receive back objects that we have created in order to be able to talk to them 
(ie. objects that implement our code).  The principle behind it is fairly clear.  
From the StyleContext standpoint, we publish interfaces to these objects but we 
must remain the entity that creates them.  It's nothing new: COM objects work the 
same.

Basically, my points are:
1) We need a base class with the 5, or 4, basic methods for the reasons listed 
above.
2) Because of (1), the base class nsStyleStruct must come with its own dummy 
implementation of these methods so that (a) other DLLs can link correctly and (b) 
the virtual tables are identical everywhere and customers can access the member 
variables directly inside the nsStyleXXX structures.
3) Because of (1), we generally cannot accept objects that we don't create.  The 
only exception (I think, but there is maybe another one) is GetStyle() because we 
fill-up the structure passed by the customer by doing "ourOwnStruct->CopyTo(&
aCustomerStruct)".
Damn, it looks like the diffs for nsCSSStyleRule.cpp are missing.
To be honest, if you are looking for better solutions for the declaration of 
nsStyleStruct, the closest thing I found involved a big reshuffling of all the 
StyleXXXImpl and quite a bit of modifications in the StyleContext too:
- Leave nsStyleStruct the way it is.
- Use the nsStyleXXX structures for the only purpose of passing data back and 
forth with the customers, meaning that when we receive nsStyleXXX, we have to 
copy it into a StyleXXXImpl before we do anything (perf?).
- Make the StyleXXXImpl structures not inherit from nsStyleStruct but from 
another base class that implements these 4 functions.
- Declare a nsStyleXXX as a member variable in its corresponding StyleXXXImpl.

It sounds simple like that but I think there was a problem (besides the 
reshuffling) and I can't remember what.
With the patches applied, looking at a tree, it's generally much easier to
understand. :-)

> I maintain (you will be interested!) that SetStyle() is evil, as well as
> any API where we would have to accept a structure allocated by the
> customer with the purpose of modifying the style context. This is for
> almost the same reason as above: the implementations are done in the
> StyleXXXImpl, which are a certain way private to the StyleContext, and
> when writing into the StyleContext we need to receive back objects that
> we have created in order to be able to talk to them (ie. objects that
> implement our code).

I find this argument to be pretty weak. I think the *real* issue is that this
has been designed around trying to use C++ vtable dispatch in the back-end
implementation, and that's lead you down this garden path.

The style structs are the interface by which your consumers communicate with you
about style data. Even if you did, say, need to compute a whole bunch of extra
information for your own private use (which you don't), I still don't think the
argument holds water.

You don't need to *create* the style struct to be able to ``talk'' to it
efficiently. You need to know its style struct ID. Think of *that* as your
vtable. If you find the pervasive `switch' logic too burdensome (I would!), then
roll your own dispatch! Places where you've used C++'s build-in virtual dispatch
should simply use the style struct ID to do hand-rolled dispatch: use the style
struct IDs indexes into a table of function pointers, and a *lot* of the ugly
switch statement code will go away.

I don't think that this would really entail much change to what you've already
got. But, it would

1. Get rid of the need for you to have the nsStyleStruct implement any
   virtual methods. (This is unsighly anyway: it's propagated implementation
   detail out to a place where nobody should know or care about it.)

2. Get rid of the ad hoc style struct pool, which is another nail in the
   layout re-entrancy coffin.

3. Still allow you to achieve switch-less nirvana.

4. Still allow you to use your own private storage, if the day comes where
   style maintains ``extra'' computed information that's not visible to
   consumers of the API.

What do you think?
You are correct about the origin of the problem, I wanted to keep the C++ 
dispatch.  And I still do: I'll give a second look at what I described above 
because when I tried it, I had in fact the StyleXXXImpl inheriting both from a 
base class that declared the pure virtual methods and from the 'empty' 
nsStyleStruct.

If we cut the inheritance between nsStyleStruct and StyleXXXImpl, we are left 
with quite a bit of reorg in the StyleXXXImpl, but I prefer to put the burden on 
these objects rather than on the rest of the code.  The benefits would be the 
same, with in addition the C++ dispatch and inheritance, and less changes in 
StyleContext and StyleContextData.

Same question: what do you think?

And what do you mean by "Still allow you to use your own private storage..."?  
How can I have private storage if I receive a structure that I did not create?
Never mind... I guess I can put a "void* mPrivate" in nsStyleStruct and store 
whatever I want in it during ReadMutableStyleData().
> You are correct about the origin of the problem, I wanted to keep the C++ 
> dispatch.  And I still do:

Why? What is it buying you? What I'm proposing is the same design you've got
now, with a slightly different mechanism. AFAICT, C++ is just getting in the way
of your design here.

I could understand if you were doing virtual dispatch all over the place (and
C++'s syntactic sugar was worth it), but you have:

  - four places where you call the virtualized CopyTo()
  - four places where you call ResetFrom()
  - three places where you call CalcDifference()
  - two places where you call ComputeCRC32()

So the syntactic brain-print of doing your own dispatch is marginal.

And it's all *internal* to the style context's implementation: you're not
forcing this mechanism on your users.

As you note, a top-to-bottom C++ design where the nsStyleStruct is an abstract
base class won't work here because of linkage issues, even though it might have
been the most natural. But the alternative that you suggest seems like the worst
of both worlds: you'll still have plenty of places where you need the big ugly
switch statements, and you're only taking advantage of the C++ syntactic sugar
in a dozen places. (Well, a baker's dozen.)

> And what do you mean by "Still allow you to use your own private storage..."?  
The structures used to *communicate* style information need not have anything to
do with the structures that you use to *store* that style information in the
style context.

>As you note, a top-to-bottom C++ design where the nsStyleStruct is an abstract
>base class won't work here because of linkage issues,

I haven't made myself clear maybe: nsStyleStruct and the nsStyleXXX structures 
would not change compared to what is currently in the tree.  Only the 
StyleXXXImpl would change and since they are internal, I can do pretty much what 
I want with them, including not inherit from nsStyleStruct at all, and inherit 
from a base class with pure virtual methods.  It's not the worst of both worlds 
at all: of course, we won't have the ugly switch statements back.  The solution I 
propose is identical in all points to yours in its interfaces with the outside 
but in addition, it keeps the benefit of using true C++ objects in the inside 
(and we are going to need it for the next round of optimizations).

As for the private storage, I guess you meant it could be handled through the 
helper classes.  If not, please stop playing games and tell me what's on your 
mind (or tell me you want to play and I'll gladly play too :-).
> The solution I propose is identical in all points to yours in its
> interfaces with the outside but in addition, it keeps the benefit
> of using true C++ objects in the inside (and we are going to need
> it for the next round of optimizations).

Okay, I guess I'm most concerned about the external interface, so I'd be 
willing to cave on this. I still think that the mechanism that I propose 
would be simpler and cleaner. But, I don't have to do the typing. :-)

So we're clear, could you elaborate as to what the benefit of using ``true C++ 
objects'' are here? How will it affect your future optimizations?

> As for the private storage, I guess you meant it could be handled through the 
> helper classes.  If not, please stop playing games and tell me what's on your 
> mind (or tell me you want to play and I'll gladly play too :-).

My point was that the storage need not have anything to do with the interface 
used to communicate the style data. Since I'm not sure *why* you'd want to do 
this, my example will be rather bizarre, extreme, and contrived but...here goes.

Let's say, for example, that instead of storing the font information like we do 
now, we wanted (for unknown some reason) to store a handle to the resolved font 
on the operating system. A referent to the following would be what would be 
stored in the style context:

  struct nsStyleFontPrivate {
    nsIFont* mFont;
  };

To get information *into* this object, your ``write'' dispatch for 
eStyleStruct_Font would examine the nsStyleFont struct, use GFX to find an 
appropriate font, and stuff it into an nsStyleFontPrivate structure.

Conversely, to get information *out of* this object, your ``read'' dispatch 
would use the nsIFont object to fill in the nsStyleFont structure.

Like I said, contrived. But the point is, the storage need not have anything to 
do with the structures used to communicate about the storage.
About C++:
I sent you and other people earlier this week a description of a memory model 
that could bring huge gains in terms of memory footprint.  It consists in storing 
the style data (nsStyleXXX) in a library composed of hash tables, with one hash 
table per type of style data.  We'd have one for all the fonts used in the 
document, one for all the colors, etc...  To implement it, we would need to add 
some methods and member variables to a base class that all the StyleXXXImpl 
inherit from.  For instance, to store the CRC locally, or refcount these objects, 
or maybe add some debug functions on the way...  Overall, I don't know if a 
system like this one would be viable in terms of performance but if it's not, 
we'll find something else which may or may not benefit from having a base class 
for StyleXXXImpl.  I just want to keep the possibility open.

About the storage:
Understood.  We were not talking about the same thing.  The kind of private 
storage that I had in mind was a tag that we would set in ReadMutable() and 
retrieve in WriteMutable().  It could be very useful for some optimizations.  For 
instance, we could store a direct pointer to the data inside the library, or the 
CRC of the previous data.  With the helper classes, we can pass something like 
that between Read() and Write().  It would be a strong argument in favor of 
Brendan's suggestion for not exposing the 2 methods but just the helper classes.
Ok, so I'll look forward to seeing what you come up with. (I have to admit I was
a little bit befuddled by those emails that were flying around; I'll probably be
able to understand them better now.)

> It would be a strong argument in favor of Brendan's suggestion for not
> exposing the 2 methods but just the helper classes.

I'm all for it.

scc/waterson, have you had any chance to look at the template magic that was
alluded to earlier? It looks something elegant might be possible in this area.
I hope the code will be easier to understand than some of my comments. The diffs 
are following. As a summary:
- there is no pooling anymore (the helper classes contain the style structure 
that is used for Read/Write)
- the APIs are easier to use (1 line for each call to GetMutableStyleData)
- private methods are no longer exposed (nsStyleStruct is an empty struct like 
before).

All the other points are fixed too, except #5 about the templates: we'll see if 
scc makes a proposal.  My concern is the performance.  According to some 
measurements, it should be less than 2% slower than the current code and maybe 2 
or 3% faster.  It's difficult to tell because it seems that something happened in 
the tree lately that causes larger fluctuations in the results I'm getting.

I did several implementations because of the performance:
- In the first one, the style data (nsStyleXXX) were embedded as a member 
variable inside the StyleBlobs (formerly StyleXXXImpl).  It was too slow (between 
5 and 10% slower).
- In the second one, the StyleBlobs inherited from both the style data (as they 
do now) and from a base class. It was still slightly too slow.
- In the third one, just as a test, I put back in the pooling of style structs 
that are used for Read/Write.  It was slightly faster than the current 
implementation.
- In the last implementation (diffs attached), I took back the second 
implementation and streamlined the helper classes to win some cycles.  It seems 
to have worked a bit.  My guess is that it is slightly slower than the current 
builds but depending on the platform it may very well be slightly faster too.

Chris & Jrgm: if you do any kind of performance evaluation, please don't use a 
daily build to compare with a fresh pull over which you apply my changes.  Use 
the same pull for both (ie. pull, build, take measurements, apply the diffs over 
the same build, take more measurements).

Chris: If you think that performance matters more than trying to keep Layout 
thread-safe (or at least the style resolution part of Layout), I can attach some 
diffs with the pooling for Read/Write.  The rest would not change, it's hidden by 
the helper classes.
I'll do test builds again today with the latest patch.
Just an idea. Might this be a more elegant way to declare the helper classes?
Thanks for your speedy turnaround on this stuff, pierre! I think the
helper classes are much more natural now.

- I'm surprised that you see a speedup with your pooled contexts. What
  is happening differently there? Seems like the only difference would
  be whether we were copying to/from the stack versus copying to/from
  your pool.

  Could you post a patch that illustrates how the pooled
  implementation works, too? (If you still have it around.)

- I'm a bit fuzzy on what the difference is between nsIStyleContext
  and nsIMutableStyleContext. Is the distinction not important once
  these changes have been made? If so, should we log a task to remove
  one or the other?

- I just noticed that there's another re-entrancy problem in
  RemapStyle().  Sorry that I didn't notice that before :-(. Rather
  than making the temporary remap data in the static global area, how
  about making the PresContext own it? Anyway, I'd be willing to let
  this slide if you'll commit to fixing it eventually.

- I still think the tail is wagging you with respect to your dogged
  insistence on using C++ virtual dispatch. That said, I'm flattered
  that you were able to try so many alternative implementations over
  the weekend and were still able to eschew my proposal. :-)

  As an example of this, note that it's unfortunate that you need both
  SetFrom() and CopyTo() methods on your nsStyleBlob class. In each
  place where you call these routines, you have the style struct ID
  available to you! The code blaot is minor, but you were doing this
  to avoid maintentance hassle, remember?

  Remind me again why C++ is going to make things so great here
  someday?

- There's still some use of C++ |bool| in here that may bite you on
  some of the ports.

- Consider use of a C++ template to define your helper classes; see
  above patch.
1) Speedup with pooled strutures:
I can't really explain it.  Most of the constructors/destructors are null.  The 
difference was about 10ms over 800ms to load Yahoo, and 30ms over 1300ms to load 
CNN.  I can send you the diffs but it would take a little bit of time to merge 
them again.  If you just want an illustration of how it works, you can look at 
the previous diffs.  Otherwise would you seriously consider this solution or did 
you ask out of curiosity?  We could put these structures in the PresContext like 
the rest of the permanent data.  I was planning to keep it as a backup solution 
if jrgm finds some performance problems.

2) Difference between nsIStyleContext and nsIMutableStyleContext:
The goal, I guess, was to limit the scope of what people can do during style 
resolution.  I don't think it would be a good idea to change it but you can file 
a bug to remove nsIMutableStyleContext if you like.

3) Pool in RemapStyle():
Sure, it can be moved to the PresContext. I'll have other things to store there 
too.  Besides it will allow us to break WrapRemapStyle() in two as it should be.

4) C++:
I worked with jump tables long ago, before C++.  Jump tables can be really nifty, 
they ensure a very robust code that can easily sustain the burden of the years.  
The best example is probably jwz's mime engine, one of the only pieces of code 
that's survived since 3.0.  I gave you twice a description of the evolution that 
I would like to see for the memory model in the style context, but really it 
doesn't matter.  Whatever we do and whenever we do it, I'd like to keep it open 
and flexible so I'll stick with C++.  Unless you veto it, of course.  As always, 
it's all a question of what we want to see in the tree and when we want to see it 
but nobody else issued an opinion so I'll let you decide.  Thanks for 
volunteering, by the way.

5) bools:
They only ones left are in a debug code that I did not maintain.  I did not 
decide yet whether I'll update it or remove it.  Not part of the build, as they 
say.

6) Templates:
Really neat, thanks!  I thought only scc could speak template...
> The goal, I guess, was to limit the scope of what people can do
> during style resolution.

I.e., during style resolution you get the ``immutable''
nsIStyleContext?

> I don't think it would be a good idea to change it but you can file
> a bug to remove nsIMutableStyleContext if you like.

Not sure what this means. It seems to me that either:

1. nsIStyleContext and nsIMutableStyleContext are redundant: all style
   contexts are immutable, all the time. In which case let's do away
   with one of extra interfaces.

2. nsIStyleContext indicates an immutable style context, and therefore
   should not have a GetStyleData() member that allows mutation. In
   which case, the helper classes are conspiring with a bunch of
   naughty callers to write to supposedly ``immutable'' contexts.

Given your comments, it looks like we sort of tried to pay lip-service
to the latter (e.g., nsIStyleContext returns a |const nsStyleData*|),
but we then turn around and write to the ``immutable'' contexts anyway!
(The helper classes all toss the |const| and write data back
regardless of whether the original context was mutable or not).

I'm just trying to understand what the intent of the API is here. I'm
not asking you to fix it right now; rather, I'm asking you to sketch
out a roadmap of where things ought be, and asking you to commit to
Making It Right Some Fine Day (i.e., file a bug).

> Sure, it can be moved to the PresContext. I'll have other things to
> store there too.  Besides it will allow us to break WrapRemapStyle()
> in two as it should be.

Ok, great. Again, bug filed please.

> I worked with jump tables long ago, before C++.  Jump tables can be
> really nifty, they ensure a very robust code that can easily sustain
> the burden of the years.  The best example is probably jwz's mime
> engine, one of the only pieces of code that's survived since 3.0.  I
> gave you twice a description of the evolution that I would like to
> see for the memory model in the style context, but really it doesn't
> matter.  Whatever we do and whenever we do it, I'd like to keep it
> open and flexible so I'll stick with C++. 

This either is a non sequitor, or an extremely sarcastic remark. I'm
guessing the latter. :-)

I'll happily sr= the code as it stands (I can't _force_ you to do
anything, but at least the impact is constrained). To be frank, I
jump tables would provide a more elegant solution here. As it stands,

1. The C++ vtable is redundant with the style struct ID.

2. You need a translation layer to switch back and forth between the
   SIDs and the vtables anyway.

3. There are only a handful of places where the C++ syntactic sugar is
   even used.

4. You've ended up with redundant code (CopyTo, SetFrom).

But hey, it's your name on the cvs blame! :-)

> They only ones left are in a debug code that I did not maintain.  I
> did not decide yet whether I'll update it or remove it.  Not part of
> the build, as they say.

Ok.

> Really neat, thanks!  I thought only scc could speak template...

I'm an amateur. If you're seriously considering them, let's let scc
look at it first.

So, a couple rhetorical questions left:

1. nsIStyleContext vs. nsIMutableStyleContext. Need roadmap and a bug,
   or a rebuttal as to why it's good as it is.

2. Need a bug for re-entrancy issue in WrapRemapStyle().

3. Need blessing from scc if you decide to use templates.

sr=waterson. Thanks for your patience.
I ran with the builds that waterson passed on to me, for linux and win2k. 
I ran (1) win2k/500Mhz/128MB with normal settings, looking at 'already cached'
load times, (2) win2k/500MHz/128MB with disk cache set to 0KB, forcing a 
network request for all elements of a page, (3) on a slower linux system 
(266MHz/128MB) looking at 'already cached' load times, and (4) on a 800Mhz 
linux system (results not shown but they're the same story). 

I all cases, the code with pierre's changes measured in as "something like"
2% slower than the baseline build. But the reason I say "someting like" is 
that I don't think this page loading test is precise to that level. This is 
a wash (with a small hint of small difference if tested on a very quiet network
under very careful initial conditions ;-).
I tried running jrgm's tests (default settings) on an Extremely Crappy Machine
(133MHz PC with 48MB of RAM running Win98). Here's what I found:

              AVG   MED   MAX   MIN  TIMES 
before       7442  7206 19838  1363  8568  7061  6912  7009  7037
after        7805  7591 20357  1613  9005  7457  7362  7436  7351
delta        -363  -385  -519  -250  -437  -396  -450  -427  -314
%improvement -4.8  -5.3  -2.6 -20.9  -5.1  -5.6  -6.5  -6.1  -4.5

This was run from inside Mozilla proper; I'll take another set of measurements
in the mfcEmbed shell. Win32 and Linux Builds from the tree this morning are
available (internally only, sorry) at ftp://btek.mcom.com/pub/.

pierre, could you produce another set of patches that use the global contexts
instead of the stack-based ones? If there really is a delta, I'll roll up my
sleeves and profile.
Actually, before anyone goes off hacking more code, maybe we should get a 2nd
opinion. Could somebody else confirm or refute my results on a low-end machine?
I'd say: low-end, yes, but with a little bit more memory.  I think that if the 
difference is more important than on the other machines (the slowdown is 5 or 6% 
instead of 2%), it's because we allocate around 3 times as many blocks as we used 
to (the blocks are way smaller) and we hit much more often I don't know what 
Win32 function that cleans up the memory.  I'm no Windows specialist so I don't 
know if it makes sense...
Still, I would be curious to see if what I noticed with globals vs stack can be 
confirmed on other platforms.  I'll make diffs but not until late.
Regarding the [im]mutable question, let me chime in to point to a comment of
Peter Linss that I once saw in bug 1230. Basically, he was saying that there 
were two pathways to the style structs, and only some blessed objects were
allowed to use the mutable pathway. On the same topic, in a reply to an e-mail
message where I asked if MathML frames could use the GetMutable API to alter
their resolved style data [I asked this because I was hoping that the MathML
frames could change their style contexts so as to reflect MathML attributes that
are not supported by the Style System], he replied something along the lines:
"Nope, unless you are a style rule". And he said that he was planning to remove
that method (that's why it was deprecated BTW), and via this process to fix the
bad callers (I guess by this he meant calls that are modifying the style context
outside the style resolution -- these can be seen on pierre's migration patch:
the mutations at the frame-level, e.g., [nsGfxTextControl,Table,nsCSS]Frame).
Pierre might better explain the Right Thing that should to be done for these
random mutations: it is a costly style re-resolution in the associated sub-trees
to bring the style tree back in sync. I guess that maybe why Peter Linss wanted
to avoid the profileration of this practice, and to only let the possibility of
straight mutations to the blessed objects during the controlled style resolution
process. Hope these comments provide some insights into waterson's point 1 and
to the genesis of the problem.
Also, unless there are some trickery that I am not aware, these mutations
that a frame makes outside the style resolution are lost when a re-resolve is
fired from one of its ancestor frames.
I kicked off a test run(s) with a Not Quite As **** 64MB/166MHz Windows ME
system, and I'll update when I have that info.
Correct on all points.  If we want to simplify the API, we should leave 
nsIMutableStyleContext the way it is and remove Read/WriteMutableStyle from 
nsIStyleContext. We have been bitten with Marc with various problems coming from 
gfx scrollframes that hack into the style context.  If you look at the diffs, you 
will notice that both StyleContext sharing and StyleData sharing are disabled for 
the contexts that correspond to a small list of scrolling elements..

Chris, I'm going to attach a partial diffs that uses globals in Read/Write.  I 
had a full backup so it should compile ok.  The files included in the diffs are:
    nsStyleContext.cpp
    nsIMutableStyleContext.h
    nsIStyleContext.h
    nsIStyleSet.h
Okay, so that system was not (is not?) well. I had to clean a bunch of stuff 
up to get it to run sanely at all (e.g., a virus program that meant that 
deleting an old profile took >5min). I'm still suspicious of the state of 
that machine, so I don't think these numbers mean a heck of a lot (e.g., it
is nominally a faster machine with more memory than what waterson tested, but 
the times are slower than what he measured). But, here they are anyways (I 
only did 3 cycles; first is uncached, 2 and 3 are cached):

	avg	med	max	min	#1	#2	#3
before	9674	9694	25181	1694	9864	9283	9878
after	9842	9772	26890	1691	9851	9871	9805
	-168	-78	-1709	3	13	-588	73
	-1.7%	-0.8%	-6.8%	0.2%	0.1%	-6.3%	0.7%

I'll check in tonight if nobody objects.

Chris, will you have enough time to get some performance data about using globals 
instead of stack variables?
Pierre, drivers@mozilla.org already decided to keep this change out of 0.9, so
we don't have too many confounding variables in trying to unregress performance
(we have regressions already, as well as stability bugs and other things to
fix).  waterson sees a 5-6% slowdown on a low-end machine running jrgm's tests,
but his results need to be confirmed.  I understand he's gonna help sort out the
performance effects with you.  Until they're well in hand, and especially right
now, please do not check this patch into the trunk.

/be
I would like to checkin the change of APIs: that's all the files in the diffs, 
except for nsStyleContext.cpp and part of .h files.  The reason is that I'm 
leaving on sabbatical in a little bit more than 3 weeks and I expect not to have 
the time to work on performance or memory footprint again before mid-july.  I 
have to take care of the bug list that really got out of hands in the last 
several weeks.

The change of API that I would like to checkin should not have any effect on 
performance whatsoever.  The nsMutableXXX helper class will be reduced to a 
simple constructor that calls the current GetMutableStyleData().

Please advise.
Note also that the platform that Waterson tested on does not match the 
requirements for Netscape 6 (64Mb)
http://home.netscape.com/browsers/6/datasheet/index.html?cp=n6i#platforms

Jrgm's tests on a 64Mb machine shows an average slowdown of 1.7% and a medium 
slowdown of 0.8%.
Jrgm's tests also show a speedup of 0.1% on first load on a machine and what 
appears like a negligible slowdown on another one (no numbers for this one, but a 
graph at http://bugzilla.mozilla.org/showattachment.cgi?attach_id=31092)
Ooops, I'm leaving on sabbatical in a bit more than 2 weeks, not 3.  Yeehaaa!
drivers@mozilla.org don't see the need to get the API changes in for 0.9 -- the
patches here are backed up, as well as on your machine.

It's very hard right now to get any review and testing cycles, because everyone
is rushing for the 0.9 deadline.  We'll have plenty of confounding and causative
variables to untangle tomorrow, and for a week (until 0.9 releases).  Do we know
exactly what the performance effects of the API-only patch is?  Unless you get a
windows build and jrgm tests going fast, I don't know those effects.  They
should be zero, but Murphy was an optimist.

We don't need any more risk here, even if it is tiny -- we don't even have the
"brainprint" to take these changes into the mix.  Sorry, and we will get them in
after 0.9, with performance improvements too, I hope -- possibly even next week
if you, waterson, et al. can pin down where the cycles are going and optimize
them away.

/be
Sorry, I can't commit myself to work a week on this or even 3 days, I think.  I 
have more than a hundred bugs to triage before I leave (out of 170+), and more 
than a couple of them would be nice to fix for 0.9.1 or 1.0.  I'm not sure I'll 
be back before the tree closes for 1.0.

Checking in the API would allow someone else to work on the style system memory 
model much more easily, as only 3 files would be different.  The change of API 
would consist in changing the calls from:

  nsStyleXXX myStyle = myContext->GetMutableStyleData(eStyleStruct_XXX);

to:

  nsMutableStyleXXX myStyle(myContext);

where the template for nsMutableStyleXXX would be:

    template <class T, nsStyleStructID SID>
    class basic_nsAutoMutableStyle
    {
    protected:
      T      mStyleStruct;

    public:
      basic_nsAutoMutableStyle(nsIMutableStyleContext* aContext) {
        aContext->GetMutableStyleData(SID, &mStyleStruct);
      }

      basic_nsAutoMutableStyle(nsIStyleContext* aContext) {
        aContext->GetMutableStyleData(SID, &mStyleStruct);
      }

      T* get() { return &mStyleStruct; }
      T* operator->() { return get(); }
      T& operator*() { return *get(); }
     };

As you can see, it's pretty much empty.  A point to consider is that whatever the 
risk of checking in this empty code might be, it would better be managed if I 
were still around for a couple of weeks to cover any problems that may arise.  
After all, if risk there is, it would make even less sense to finally proceed to 
do the checkin the day before I leave.  I would check it in now without 
hesitation but it's up to drivers@mozilla.org to balance the risk involved by any 
code, even empty, and the benefit to the Community of making it possible for its 
members to work on something important.

Anyhow, API or not, now or later, a full resolution is in the Future for me.  
Marking as such.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9 → Future
Of course my point was that this mostly empty code doesn't need any performance 
testing and to answer your concerns, it was reviewed (and even contributed to) by 
waterson.
Based on the current API, an even emptier code would be:

    template <class T, nsStyleStructID SID>
    class basic_nsAutoMutableStyle
    {
    protected:
      T*      mStyleStruct;

    public:
      basic_nsAutoMutableStyle(nsIMutableStyleContext* aContext) {
        mStyleStruct = aContext->GetMutableStyleData(SID);
      }

      basic_nsAutoMutableStyle(nsIStyleContext* aContext) {
        mStyleStruct = aContext->GetMutableStyleData(SID);
      }

      T* get() { return mStyleStruct; }
      T* operator->() { return get(); }
      T& operator*() { return *get(); }
     };
pierre, I understand that you're going on sabbatical in short order, but I don't
think that means we need to rush this checkin. The patches aren't going anywhere
(they're in the bug, as well as two of my trees). The tree is going to be locked
down so there won't be much bit-rot with respect to the new APIs.

I think this should be a no-brainer to land as soon as the tree opens for
mozilla-0.9.1 (Friday? Monday?). By then we'll have sorted out this performance
weirdness. (I'm not leaving you high and dry on that; just need to knock out my
own 0.9 bugs.)
On the Very Crappy Machine (133MHz Win98, but with more memory this time, 
160MB), results with mfcEmbed show little difference in page load time with 
jrgm's test; in fact, it looks like style sharing has a slight edge:

         AVG   MED   MAX   MIN  ---------- TIMES -----------
before  3653  3486 10512   643  4602  3369  3449  3425  3420
after   3520  3456 10538   644  3765  3471  3429  3460  3475
delta    133    30   -26    -1
%improv  3.6   0.8  -0.2  -0.1

I'll next try to repeat the seamonkey results to see if there's something 
happening with chrome that's introducing the 5% penalty that my previous run 
indicated.
It confirms what I have been seeing.  The fluctuations in performance seem to 
have increased in the past 2 weeks, and the difference between the before/after 
results are beyond the margin of error.  So why not apply the old mantra and keep 
recounting until we get the numbers we want?
I'm all for that!
Any follow-up on this? Is it in a branch somewhere?

[Reading the code, it looks like there could be room for futher tweaks to make 
it more space-time efficient in subsequent iterations.]
Nothing changed.  According to jrgm, the code I'm proposing is on average 1.7% 
slower than the original code.  It means that a page that takes 2 seconds to load 
would take 2.034 seconds.  That's 3.4 hundredth of a second more and this is 
totally unacceptable to drivers@mozilla.org.

I guess my vision of performance and memory improvement in software engineering 
was a bit outdated.  I always thought that to improve memory a lot, performance 
had to be compromised a little bit and vice versa, to improve performance a lot, 
one had to use a little bit more memory.  As a result by doing successive steps 
(- a lot of memory + a bit of cpu - a lot of cpu + a bit of memory) we still had 
in the end a lot less memory a lot less cpu.

Mozilla is different.  We don't compromise.

With my best smileys ;-) of course... :-)
Blocks: 47159
Would that be possible to have a branch with the latest? Continuous large diffs
are scary. With a branch, anyone could pull, and try, and perhaps add or refine.
And QA could easily pull and try from there too if some progress is made.
My first preference is that drivers@mozilla.org treat the bug fairly and allow 
the code to go in.  We can even use APIs with globals instead of stack variables 
and it will reduce the 1.7% slowdown a bit, or maybe even go faster than the 
current code as I measured on the Mac.

If it is not possible, my second preference is to checkin the change of APIs so 
that the entire changes would be limited to just a few files (nsStyleContext.cpp/
.h, nsIMutableStyleContext.h and maybe nsIStyleSet.h).

If it is not possible either, a branch is nice but it would rate fairly low on my 
list of priorities for moz0.9.1.  Anybody can do it anyhow.  I would be able to 
help, though, for instance to update the diffs.  Otherwise I'm leaving this 
Friday, I'll be back July 9th, and as Chris wrote: the patches aren't going 
anywhere.

Note that if I checkin this thing, the support for the aural stylesheets will 
move on top of my list.  Accessibility issues have a high priority in 6.5.
pierre, I don't think anyone is averse to this going in for mozilla-0.9.1, but 
maybe you are privvy to some email from drivers@ that I am not.
Not really: the only things I heard from drivers were the conclusions copied on 
this bug report.  What you are saying seems like good news.  If a checkin is ok, 
I can get in any solution: StyleData sharing with or without StyleContext 
sharing, APIs with or without variables on the stack.  As far as I can tell, with 
StyleData sharing only and with global variables, it uses less memory and it goes 
a tad faster than the current code base.
Pierre, I commented for drivers on 2001-04-17.  At 19:33, I wrote:

"...we will get them in after 0.9, with performance improvements too, I hope --
possibly even next week if you, waterson, et al. can pin down where the cycles
are going and optimize them away."

This by way of asking you to hold off until after 0.9 had branched, with an
apology for the request that you wait.

Previously that day, I had commented:

"Pierre, drivers@mozilla.org already decided to keep this change out of 0.9, so
we don't have too many confounding variables in trying to unregress performance
(we have regressions already, as well as stability bugs and other things to
fix).  waterson sees a 5-6% slowdown on a low-end machine running jrgm's tests,
but his results need to be confirmed.  I understand he's gonna help sort out the
performance effects with you.  Until they're well in hand, and especially right
now, please do not check this patch into the trunk."

It sounds like you guys have sorted things out, and that the performance effects
are well in hand.  Sorry to leave any doubt in your mind about checking into the
trunk, but really, I don't think you can say that drivers was clearly against
you checking in a 1.7% performance hit, since you didn't know that was the size
of the hit when I wrote my comments.  Please be fair here, and by all means
check in.

/be
Time to weigh in here.  I am of the opinion that all of the efforts taken so 
far to reduce the style system's footprint, although admirable, are moving in 
the wrong direction.  

None of the patches that have been produced have addressed the fundamental 
problem of the style system, namely that it eagerly resolves all possible 
values for every style context. 

The sharing strategies employed thus far continue to work under a fundamentally 
broken assumption: that all style data must be computed and only after a match 
is found can data be thrown away.  This patch adds even more complicated 
sharing logic, and although it achieves a runtime footprint reduction, it does 
nothing to reduce the obscene amount of churn that the style system generates 
as it resolves all of these property values.

I have been working on a proof of concept for a new system of rule matching 
that uses lazy resolution and that uses other tricks to reduce both footprint 
and performance costs.  I've gotten far enough with the code that I'm confident 
it will be not only a large performance win, but that it will also be a large 
footprint win.

I would like to work on developing this new rule-matching strategy for real, 
but the problem I have now is that I'm working off the current trunk, and have 
touched the same files (in particular, I've rewritten style contexts and the 
way rules are matched).  

If this patch goes in, I'll be in a world of hurt from a merge perspective.  
Given that I'd like to continue work on this new system, I'd prefer it if this 
patch did not go into the tree.  I don't think it moves in the right direction, 
and I don't think it addresses the fundamental performance problems that plague 
the style system.
Actually, if you want to check it in, I won't hold it up.  I'll just deal. :)
If it is a bigger performance and footprint win, as well as more simple and 
correct in logic and is within a timeframe of landing in, oh, say the next 2-3 
weeks without getting pushed back and back and back further for testing, then I 
say wait for it.  Perhaps Pierre should take a look at your work Doug and see 
if it warrants scrapping the landing coming late today?
I checked in.

Dave: As long as we have a StyleContext, you and everybody else will benefit from 
the change of API.  You can overwrite my version of nsStyleContext.cpp with yours 
and update the mutable APIs.  It should not take you more than a couple of hours 
to do so.  I did it a couple times when switching between the previous version 
and the new one.

About the process, I think it has always been clear that Marc, myself and now 
Daniel welcome and encourage anybody to help on the style system.  You should 
have contacted us way sooner, before doing any implementation.  We learned by 
chance last Sunday that you were in the midst of some serious rework in the style 
system by reading a chat on IRC:
    <hyatt> argh
    <hyatt> com sucks
    <hyatt> darnit
    <hyatt> the style system uses too much COM
    <hyatt> i'm trying not to comify my new rule nodes.
    <hyatt> and i can't get around it
    <bryner> hyatt: what are you rewriting?
    <hyatt> rule matching.
    <bryner> do you have a way to speed it up?
    <hyatt> hell yes
    <hyatt> and to make it take much less memory

I asked you what was going on...

>You mentioned in IRC that you were speeding up rule matching.  Is it internal
>to XUL only?  Or is it something like nsICSSPseudoComparator which changes a
>little bit the style system in order to make XUL much faster? Or something else?
>
>Just curious.  The nsICSSPseudoComparator is maybe not the most elegant thing
>one can come across in the style system but it is very localized and so much
>worth it!  Are you working on something like that?

And you replied...

> I'm doing what I talked about with you and Marc in an earlier email...
> namely replacing the mRules member of StyleContextImpl with a lexicographic
> tree.  Right now every style context holds on to all the rules it matched as
> an nsISupportsArray... I'm replacing that with a structure that enables
> sharing...

I, and apparently Marc too, don't recall this mail.  Or at least we were not 
aware that you had started an implementation.  Besides, the 'mRules' member is 
one of the things that Marc and I did not change.  It is fairly separate from the 
work we did when reducing the memory footprint of the computed style.

As I said, it will take you a couple of hours to update your tree.  I can give 
you some pointers if you want.  For the rest, you have a CVS account and you can 
checkin pretty much what you want as long as you have it approved by the drivers 
and go through the same kind of testing we did for StyleContext and StyleData 
sharing, but I think you are missing some opportunities by not collaborating with 
those who usually work on that code, not to mention the risk of redundant work.
Apparently the bloat went down 3 Mb on Linux and 4 Mb on other flavors of Unix.  
I'm stunned, I did not expect the numbers to be so high.  People say the 
application still works.  Let's wait for some more testing and performance 
metrics from QA.
Here is the email that I sent to you, brendan, attinasi, and waterson.  Maybe 
you don't recall it, but I sent it straight to you, and you even responded to 
it.

=======

I was thinking about the style system sharing stuff last night, and I 
came up with an idea on how to generalize techniques used in the 
outliner widget to make our style sharing code even faster.

What I would propose is that the style system have two levels of cache: 
the first line of defense would be a new cache that I'm going to outline 
momentarily, and the second line of defense would be the style sharing 
code that pierre and attinasi have done.  This new cache complements the 
existing cache, and both are necessary for maximal footprint reduction 
and allocation reduction.  I'll call the new cache the level 1 cache, 
and the existing cache, the level 2 cache.

Here's the basic idea for the level 1 cache.  The style system maintains 
a state machine that treats a sorted list of rules as an input word.  
This state machine transitions on the rules in the input word until all 
have been processed.  The resultant final state contains all of the 
style data for that input word.  On a miss, i.e., when no style data is 
found at the final state, you then check the level 2 cache.  You have to 
allocate the style data and peform t he CRC computation to find matching 
style data structs that can then be cached at the final state in the 
level 1 cache.

The first level cache will allow us to avoid even doing the allocations 
of temporary data that we then throw away, and we won't have to do a CRC 
computation (which gets expensive since there are many things to compare).  

Thoughts?  I would be very interested in translating the outliner state 
machine into the style system if you guys think it's a good idea.  I 
have all the data structures for this already implemented and tested.  I 
think this works really well, because it complements the code you guys 
have already written.  Level 1 (the new cache) makes sure the same set 
of rules gives you an immediate answer with no allocation.  Level 2 (the 
existing cache) ensures that different sets of rules that map to similar 
style data end up being shared.

Dave
(hyatt@netscape.com)

=====

I presented this idea to you on 4/4/2001, one month ago.  Both you and attinasi 
responded.  I describe the exact approach in this email of treating the sorted 
list of rules as an input word, fueling a state machine/lexicographic tree.

To claim that I have been working on this in a vacuum without mentioning my 
ideas is completely ridiculous.  

Attinasi's response to the input word paragraph.  Again, you are copied on the 
mail along with brendan and waterson.
==============
"This sounds like a very nice mechanism. I'm wondering if it is 
essentially similar to the sibling style sharing that happens now, 
before the style context data sharing phase [1]. Currently, the style 
system looks at all child style contexts to see if there are any with 
exactly the same rules (comparison by hash, but not a CRC) and if it 
finds such a context, it shares it immediatly, without having to fully 
resolve style for it. Your level 1 cache sounds like a more global 
version of this (not restricted to just sharing among siblings)."
==================



> Apparently the bloat went down 3 Mb on Linux and 4 Mb on other flavors of Unix.
> I'm stunned, I did not expect the numbers to be so high.  People say the 
> application still works.  Let's wait for some more testing and performance 
> metrics from QA.

If you're basing this on the bloat numbers from tinderbox, I think they went
down because you didn't add logging for the classes that are no longer part of
the StyleContextData object and you removed (#ifdef-ed out) the logging for
StyleContextData.
David (Baron), I may have been wrong but the calculation is as following: in 
StyleContextImpl::SizeOf(), we had the size of everything that belong to the 
style context, minus the style data which is counted separately (line 3970)

    // count the style data seperately
    if (mStyleData) {
      mStyleData->SizeOf(aSizeOfHandler,localSize);
    }

Then inside nsStyleContextData::SizeOf(), we had the size of all the style data 
structures that are not null (line 2344)

  // add the sizes of the individual style blobs
  if (mData.mBlobs.mFont)          aSize += sizeof(StyleFontBlob);
  if (mData.mBlobs.mColor)         aSize += sizeof(StyleColorBlob);
  if (mData.mBlobs.mList)          aSize += sizeof(StyleListBlob);

I did add logging for the blobs and I ifdef'd out counting the StyleContextData 
in the StyleContextImpl otherwise it would have been counted twice: once in line 
2342 in nsStyleContextData::SizeOf():

  // get the size of an empty instance and add to the sizeof handler
  aSize = sizeof(*this);

...and a second time in line 3955 in StyleContextImpl::SizeOf()

  // get the size of an empty instance and add to the sizeof handler
  aSize = sizeof(*this);


Where is the bug?
Read "we add" instead of "we had".  Twice.  Time to sleep.
StyleContextImpl::SizeOf has nothing to do with the bloat logging done by
nsTraceRefcnt.  That uses MOZ_COUNT_CTOR/MOZ_COUNT_DTOR or
NS_LOG_ADDREF/NS_LOG_RELEASE.  You commented out (I think) the
NS_LOG_ADDREF/NS_LOG_RELEASE pair for the StyleContextData, and you didn't add
any MOZ_COUNT_CTOR/MOZ_COUNT_DTOR for it or for any of the other structs that
are now separately allocated.
Gasp!  Correct.  But at least the StyleContextData is counted as part of the 
StyleContext (with a waste of 1 pointer per StyleContext).  The blobs however are 
*not* counted.
dbaron, please review.  If my estimates are correct, and if the Blobs indeed take 
3 to 4 Mb, the gains should be around 500K to 800K.  It will be much more when we 
address the performnce problems and re-enable StyleContext sharing.
David (Hyatt): I remember this discussion but you were then talking about 
complementary mechanisms (level 1 cache & level 2 cache) and yesterday, you 
described what Marc and I had done (the level 2 cache) as "moving in the wrong 
direction, not addressing the fundamental problem, and working under a 
fundamentally broken assumption".  The change of tone made me assume that you 
were off on something completely different.  This being said, reducing the memory 
footprint by 40% then by 60% doesn't seem to me as moving in the wrong direction.  
Ok, too bad the two things are not enabled at the same time yet, but what kind of 
savings do you think you will get, and how does it compare with the proposal I 
sent last month?

I propose we get together tomorrow, put down ideas, list problems, define a 
roadmap, and open a separate bug report or a thread on the newsgroups to allow 
other people to participate.  I really am excited about what you are working 
on, I just don't see the antagonism with the work that has been done by different 
persons over the past few years, and as often I believe that it is out of 
exchanges and compromises that the best solutions can be found.  Especially after 
a good fight.  Take your bat.
I fixed the nsTraceRefcnt but the tinderboxen still show a gain of 2Mb, sometimes 
3Mb.  I think all the structures are accounted for, if someone wants to double 
check.  Otherwise it's not 4Mb but I'll still take it.

An interesting test would be to get the bloat stats for StyleContext + StyleData 
sharing.  Any volunteer?  Enable SHARE_STYLECONTEXTS in nsIStyleSet.h line #47 
and recompile Content, Layout and Editor.
r=dbaron on attachment 33121 [details] [diff] [review], although technically you should have a
MOZ_DECL_CTOR_COUNTER for each one as well (but it's currently a no-op and I
don't think that's likely to change)
Blocks: 78961
I checked in the MOZ_COUNT_CTOR yesterday (my apologies for this crass ignorance) 
and as I wrote, the bloat seems to be 2 or 3Mb lower.  Here is the data I 
collected from the various tinderboxen:

The "before" numbers comes from before my checkin on 05/03 at 06:12.
The "after" numbers comes from after my checkin on 05/03 at 19:18.

-------------------------------------------------------
T-BOX      OS        BEFORE        AFTER        GAIN
-------------------------------------------------------
coffee     linux     26 - 27       24 -(25)     -2 (+)
shrike     linux     29 -(30)     (27)- 28      -2 (-)
speedracer sun       27 - 28       24 - 25      -3
bismark    sun       29            26 - 27      -2 (+)
cement     irix      31            29           -2
monkeypox  linux     29            26           -3
muerte     bsd       30            27           -3
nebiros    sun       29            26           -3
senna      linux     29            26           -3
-------------------------------------------------------

2Mb or 3Mb less bloat is enough of a step in the right direction.
Closed as Fixed.

The debate shall continue under bug 78961 and in the n.p.m.style newsgroup at 
news://news.mozilla.org/3AF314D7.4C896990%40netscape.com

I would like to thank all the 34 people copied on this bug for their interest and 
their participation, with special thanks to the QA folks for their dedication, 
jrgm for his kindness, Brendan for his wisdom, and Waterson for his volunterism.

If you want to continue to follow the debate, don't forget to add your email to 
the CC list under bug 78961.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I actualy does not see any mem usage improvements... For me it is now worse than
it was.

I can go very quickly to 70Mb mem used by mozilla after the check-in.

Can someone confirm?

PS On Win98.
Whiteboard: [whitebox]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: