Closed Bug 636039 Opened 9 years ago Closed 9 years ago

replace use of nsRuleData* structs with arrays of nsCSSValue

Categories

(Core :: CSS Parsing and Computation, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

(Keywords: perf)

Attachments

(22 files, 2 obsolete files)

17.61 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
2.15 KB, text/plain; charset=UTF-8
Details
912 bytes, patch
bzbarsky
: review+
Details | Diff | Splinter Review
1.16 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.87 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
1.32 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
80.77 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
9.52 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
2.28 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
1.35 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
140.34 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
23.61 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
7.58 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
55.01 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
9.82 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
9.78 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
31.17 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.60 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.40 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
11.30 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
4.31 KB, patch
Details | Diff | Splinter Review
2.47 KB, patch
Details | Diff | Splinter Review
We should replace use of nsRuleData* structs in the nsRuleNode::Get*Data methods with use of an array of nsCSSValue.  This will allow us to get rid of the distinction between all of these methods, which will reduce branching in nsRuleNode.
Depends on: 636040
So I have patches here.  I need to figure out how to benchmark them.

Then, I need to figure out whether, on top of that, we should also remove the nsRuleNode changes from bug 360870.

Furthermore, I should also decide whether to do similar work to nsCSSExpandedDataBlock, or just try to get rid of nsCSSExpandedDataBlock.
This becomes important later in this patch queue, when I use the domprop field for a set of convenience per-property methods.  (I'm not entirely convinced that was the best of choices, mainly because it scatters Moz prefixes internally where we don't need them... but it would be somewhat painful to redo at this point, and I don't really see a valid alternative anyway.)

But I'd like to land this ahead of the patches in bug 636029, since I wrote it ahead of them and it's a pain to merge.  (I've untangled the rest of the tangle in my patch queue, though.)
Attachment #514741 - Flags: review?(bzbarsky)
Actually, I could get around the internal prefixing by adding a CSS_PROP_ATTACH_PREFIX(prefix, domprop) macro that's normally defined to ## them together, but that callers could instead define to just drop the first argument.  The extra ugliness there might even be worth it...
Comment on attachment 514741 [details] [diff] [review]
patch 1: make sure all properties in nsCSSPropList.h have a useful domprop field

This is fine, I think.  The prefix thing seems orthogonal to this change.
Attachment #514741 - Flags: review?(bzbarsky) → review+
Another thing I realized I can do to make things faster:  reorder style struct IDs so the inherited and reset structs have separate ID ranges, and then use the IDs as indices into an array in ns{Inherited,Reset}StyleData.
(In reply to comment #3)
> Actually, I could get around the internal prefixing by adding a
> CSS_PROP_ATTACH_PREFIX(prefix, domprop) macro that's normally defined to ##
> them together, but that callers could instead define to just drop the first
> argument.  The extra ugliness there might even be worth it...

I've done (roughly) this in my patch queue.

(In reply to comment #5)
> Another thing I realized I can do to make things faster:  reorder style struct
> IDs so the inherited and reset structs have separate ID ranges, and then use
> the IDs as indices into an array in ns{Inherited,Reset}StyleData.

... and then the idea would be that once I've done this, I'd try getting rid of all the separate nsRuleNode::GetStyle* and nsStyleContext::GetStyle* methods.
For what it's worth, those methods are a good bit faster than computing the right reset-or-inherit pointer, when I measured....

You mean make them thin shims around GetInheritedData and GetResetData methods?  Or just get rid of those APIs altogether?
I mean make the nsStyleContext ones thin shims, and get rid of the rest.

The idea would be that the patches I have here (which get rid of all the branching that those functions avoid) plus comment 5 (which would turn the reset-or-inherit stuff into a subtraction and offset) should eliminate almost all of the benefits, and then the codesize and cache effects would make eliminating them a speedup.
Sounds like a plan.
Attached file benchmark results
I separated the SID separation part (comment 5) into bug 639231 rather than do it in this bug.

Then I benchmarked, separately:
 * this bug
 * bug 639231
 * the replacement of the separate per-struct methods (comment 6 / comment 8) with the use of inline methods for the generics, in particular, the patch:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/f029154a1bc5/remove-unrolled-SID-methods

This has the results of my individual measurements, which were done by loading www.mozilla.com and then repeatedly causing recomputation of all style data without rerunning of selector matching (at least, that's what it was intended to measure).

The results show that this bug and bug 639231 are both measureable improvements, but removing the unrolled per-SID methods is not.  That said, if I'd benchmarked all three together, it would still have been a net improvement, since this bug is about a 4% win on that test, bug 639231 is about a 2% win, and removing the unrolled methods is about a 2% loss.

So I'll plan to keep the unrolled methods for now.
The definition of these methods changes in later patches.
Attachment #517188 - Flags: review?(bzbarsky)
... uses removed in patch 11
Attachment #517197 - Flags: review?(bzbarsky)
See the added comments in nsCSSProps.h for further explanation.
Attachment #517199 - Flags: review?(bzbarsky)
For the record, I wrote patches 16 and 17 after doing the benchmarking.  I see no reason to expect they'd make any difference; they shouldn't even change the resulting binary at all.
Fix typo in comment, and also, more importantly, some test failures because I forgot to fix ListCSSProperties.
Attachment #517187 - Attachment is obsolete: true
Attachment #517187 - Flags: review?(bzbarsky)
Attachment #517213 - Flags: review?(bzbarsky)
again, forgot to fix ListCSSProperties
Attachment #517198 - Attachment is obsolete: true
Attachment #517198 - Flags: review?(bzbarsky)
...and so I don't miss test failures
Attachment #517215 - Flags: review?(bzbarsky)
Attachment #517214 - Flags: review?(bzbarsky)
Comment on attachment 517185 [details] [diff] [review]
patch 2: correct comment describing nsRuleData::ValueFor

r=me
Attachment #517185 - Flags: review?(bzbarsky) → review+
Comment on attachment 517186 [details] [diff] [review]
patch 3: move some members from nsCSSTable to nsRuleDataTable

r=me
Attachment #517186 - Flags: review?(bzbarsky) → review+
Comment on attachment 517188 [details] [diff] [review]
patch 5: Add method getters for each property to nsRuleData

r=me
Attachment #517188 - Flags: review?(bzbarsky) → review+
Comment on attachment 517213 [details] [diff] [review]
patch 4: allow nsCSSPropList's method field to be used with or without prefixes

r=me
Attachment #517213 - Flags: review?(bzbarsky) → review+
Assignee: nobody → dbaron
Comment on attachment 517189 [details] [diff] [review]
patch 6: fix error in CSS vs. HTML precedence for variable attribute of <pre>

r=me, but it seems that no one else implements this.... perhaps we should just remove it?  Followup bug.  If we do decide to keep it, we should add a test for this.
Attachment #517189 - Flags: review?(bzbarsky) → review+
Comment on attachment 517191 [details] [diff] [review]
patch 8: convert C++-implemented style rules to using nsRuleData getters

r=me
Attachment #517191 - Flags: review?(bzbarsky) → review+
Comment on attachment 517192 [details] [diff] [review]
patch 9: remove unused *AtOffset methods

r=me
Attachment #517192 - Flags: review?(bzbarsky) → review+
Comment on attachment 517194 [details] [diff] [review]
patch 10: correct checks that were for the wrong MathML-related pseudo-property

r=me
Attachment #517194 - Flags: review?(bzbarsky) → review+
Comment on attachment 517190 [details] [diff] [review]
patch 7: convert attribute mapping functions to using nsRuleData getters

r=me.  I tried to check over things to make sure nothing weird was happening, but I have to admit some eye-glazing happened.
Attachment #517190 - Flags: review?(bzbarsky) → review+
Comment on attachment 517197 [details] [diff] [review]
patch 12: remove unneeded params to COMPUTE_START_{INHERITED,RESET}

r=me
Attachment #517197 - Flags: review?(bzbarsky) → review+
Comment on attachment 517199 [details] [diff] [review]
patch 14: add index and count data for which properties need to be computed for each SID

I think the API should be PropertiesInStruct() and IndexInStruct(), and gIndexInSID should be named gIndexInStruct....

Als, eSIDCount_for_* should be called ePropertyCount_for_*, I think.  And eSIDIndex_for_* should be called ePropertyIndex_for_*.

r=me with those renames.
Attachment #517199 - Flags: review?(bzbarsky) → review+
(In reply to comment #42)
> I think the API should be PropertiesInStruct() and IndexInStruct(), and
> gIndexInSID should be named gIndexInStruct....

Actually, how about PropertyCountInStruct() and PropertyIndexInStruct() (with corresponding constant renames)?
> Actually, how about PropertyCountInStruct() and PropertyIndexInStruct()

Sounds good to me.
Comment on attachment 517200 [details] [diff] [review]
patch 15: stop stack-allocating nsRuleData* structs and branching to nsRuleNode::Get*Data to do so

Can't we just inline ValueFor() and implement ValueFor* by calling ValueFor?

And do the const version by const_cast and calling the non-const version?  Should reduce the code duplication...

Oh, and index_in_SID should probably be called index_in_struct or something; see comments on previous patch.

I do think it makes sense to at least poison mValueStorage.  A followup bug, perhaps?

> nsRuleNode::CheckSpecifiedProperties(const nsStyleStructID aSID,
>+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
>+                    "we assume the value offset is zero instead of adding it");

Why can we assume that?

> UnsetPropertiesWithoutFlags(const nsStyleStructID aSID,
>+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
>+                    "we assume the value offset is zero instead of adding it");

And here.

Some comments somewhere about when we might or might not have more than one struct's worth of nsCSSValues might be good.

>+struct AutoCSSValueArray {

So the only reason this doesn't allocate its own storage is because of SetGenericFont, right?  Is the loop there iterated over often enough that this matters?

>+  AutoCSSValueArray(void* aStorage, size_t aCount) {

Please assert that aStorage is aligned reasonably.  I'd think alloca() handles this automatically, but if it doesn't we want to know.

r=me with those nits dealt with.
Attachment #517200 - Flags: review?(bzbarsky) → review+
Comment on attachment 517206 [details] [diff] [review]
patch 16: remove Moz prefixes from names of subproperty tables

r=me
Attachment #517206 - Flags: review?(bzbarsky) → review+
Comment on attachment 517207 [details] [diff] [review]
patch 17: remove unused nsRuleData* structs

r=me
Attachment #517207 - Flags: review?(bzbarsky) → review+
Answering one of the comments now; I'll try to go through the rest tomorrow:

(In reply to comment #45)
> Comment on attachment 517200 [details] [diff] [review]
> patch 15: stop stack-allocating nsRuleData* structs and branching to
> nsRuleNode::Get*Data to do so
> 
> > nsRuleNode::CheckSpecifiedProperties(const nsStyleStructID aSID,
> >+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
> >+                    "we assume the value offset is zero instead of adding it");
> 
> Why can we assume that?
> 
> > UnsetPropertiesWithoutFlags(const nsStyleStructID aSID,
> >+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
> >+                    "we assume the value offset is zero instead of adding it");
> 
> And here.

We can assume these because everything that uses nsRuleData except for nsRuleNode::HasAuthorSpecifiedRules walks only one struct at once.  So I assume for everything outside HasAuthorSpecifiedRules that we don't need to generate the extra instruction (or more complex instruction) to add that offset.
Comment on attachment 517214 [details] [diff] [review]
patch 13: make the three non-CSS properties have normal longhand SIDs

Shouldn't you remove the CSS_PROP_INCLUDE_NOT_CSS from the one consumer?  Or is it gone already?

r=me with that explained.  I'm hoping you looked at all the places that include this file but don't exclude external and made sure they're all ok with this change.
Attachment #517214 - Flags: review?(bzbarsky) → review+
Comment on attachment 517215 [details] [diff] [review]
patch 18: fix test_property_database.html so it reports all errors

r=me
Attachment #517215 - Flags: review?(bzbarsky) → review+
Comment on attachment 517195 [details] [diff] [review]
patch 11: Convert nsRuleNode::Compute*Data to nsRuleData getters

r=me, though reading this makes me think that it would be nicer if ValueFor() and ValueFor* returned references, not pointers...
Attachment #517195 - Flags: review?(bzbarsky) → review+
(In reply to comment #51)
> Comment on attachment 517195 [details] [diff] [review]
> patch 11: Convert nsRuleNode::Compute*Data to nsRuleData getters
> 
> r=me, though reading this makes me think that it would be nicer if ValueFor()
> and ValueFor* returned references, not pointers...

I was worried about confusion regarding assignment semantics there...

(In reply to comment #49)
> Comment on attachment 517214 [details] [diff] [review]
> patch 13: make the three non-CSS properties have normal longhand SIDs
> 
> Shouldn't you remove the CSS_PROP_INCLUDE_NOT_CSS from the one consumer?  Or is
> it gone already?

Yes, I should remove it.  (There are actually two.)

> r=me with that explained.  I'm hoping you looked at all the places that include
> this file but don't exclude external and made sure they're all ok with this
> change.

I'm pretty sure I did... but I no longer remember the details.
(In reply to comment #45)
> Comment on attachment 517200 [details] [diff] [review]
> patch 15: stop stack-allocating nsRuleData* structs and branching to
> nsRuleNode::Get*Data to do so
> 
> Can't we just inline ValueFor() and implement ValueFor* by calling ValueFor?

I'll benchmark this and see what it does.

> And do the const version by const_cast and calling the non-const version? 
> Should reduce the code duplication...

Yeah.

> Oh, and index_in_SID should probably be called index_in_struct or something;
> see comments on previous patch.

Yep.

> I do think it makes sense to at least poison mValueStorage.  A followup bug,
> perhaps?

Er, I meant to deal with all the FIXMEs before uploading the patches, but I missed that one.  I'll add another patch to the queue and request review.

> > nsRuleNode::CheckSpecifiedProperties(const nsStyleStructID aSID,
> >+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
> >+                    "we assume the value offset is zero instead of adding it");
> 
> Why can we assume that?
> 
> > UnsetPropertiesWithoutFlags(const nsStyleStructID aSID,
> >+  NS_ABORT_IF_FALSE(aRuleData->mValueOffsets[aSID] == 0,
> >+                    "we assume the value offset is zero instead of adding it");
> 
> And here.

See comment 48.

> Some comments somewhere about when we might or might not have more than one
> struct's worth of nsCSSValues might be good.

OK.

> >+struct AutoCSSValueArray {
> 
> So the only reason this doesn't allocate its own storage is because of
> SetGenericFont, right?  Is the loop there iterated over often enough that this
> matters?

No.  If I used alloca inside the constructor, the allocation lifetime would only be for the constructor.  (Or am I misunderstanding your comment?)

> >+  AutoCSSValueArray(void* aStorage, size_t aCount) {
> 
> Please assert that aStorage is aligned reasonably.  I'd think alloca() handles
> this automatically, but if it doesn't we want to know.

will do, with NS_ALIGNMENT_OF.
(In reply to comment #36)
> Comment on attachment 517189 [details] [diff] [review]
> patch 6: fix error in CSS vs. HTML precedence for variable attribute of <pre>
> 
> r=me, but it seems that no one else implements this.... perhaps we should just
> remove it?  Followup bug.  If we do decide to keep it, we should add a test for
> this.

Filed bug 642227.
> (Or am I misunderstanding your comment?)

I think so.  If we didn't need the buffer and the object to have different lifetimes, we could just allocated the buffer inline (as |nsCSSValue mValues[n]| or whatever), without using alloca, right?
Not if |n| is a variable, no?
Oh, bah.  I thought that worked in C++, but seems like it doesn't have to in general.  OK, then.
And in particular, gcc supports that, but it might well be non-portable.
Variable-length arrays are a C99 feature, according to http://gcc.gnu.org/onlinedocs/gcc-3.0.2/gcc_5.html#SEC81 .  However, the C++0x draft, in [dcl.array] (8.3.4) says that the thing inside the [] is a constant-expression.

Variable-length arrays are supported by gcc outside of C99, but a bunch of stackoverflow.com hits suggest that MSVC does not support them (and some suggest alloca as a workaround), including a link to http://connect.microsoft.com/VisualStudio/feedback/details/333273/request-for-c99-vla-in-visual-studio .

So it sounds like alloca (which we definitely have in our tree) is more portable.  (Though it's hard to search the tree for variable-length arrays!)
Per comment 45 and comment 53, I wanted to benchmark the effects of more inlining.  This is the patch I used, benchmarking on top of:
http://hg.mozilla.org/projects/birch/file/61ce3fc46a82
http://hg.mozilla.org/users/dbaron_mozilla.com/patches/file/9a721bd410a7 applied through css-animations-properties

Interestingly, this additional inlining *reduced* the size of optimized (though not stripped, perhaps?) libxul.so on my 64-bit Linux (Ubuntu 10.10) machine, from 46,738,526 bytes to 46,738,470 bytes (reduction of 56 bytes).  (Perhaps the inlined function is exactly the same size as the function call overhead?)

Given that, it's not surprising that it also made things (the same benchmark as comment 11) faster:

with patch:    3904ms 3881ms 3889ms 3890ms 3883ms
without patch: 4035ms 3950ms 3949ms 3942ms 3905ms
with patch:    4070ms 3915ms 3909ms 3902ms 3930ms
without patch: 3925ms 3965ms 3950ms 3940ms 3971ms

So I'll incorporate this diff into patch 15 per your review comments (comment 45).  (It'll mean nsRuleData.cpp is empty from patch 15 until patch 19, but I think that's OK.)
Actually, though, I should also try inlining ValueFor *without* making ValueFor##method_ use ValueFor, since the above patch makes ValueFor##method_ hit nsCSSProps::kSIDTable, whereas that offset could really be compiled-in in that case.
Per comment 62.  This leads to libxul.so file size 46,738,475 bytes.

with patch:    3888ms 3913ms 3890ms 3879ms 3884ms
without patch: 3927ms 3953ms 3930ms 3952ms 3936ms

I'm going to take this approach instead, though I'm not sure if the difference is measurable (on 64-bit Linux, at least).
> whereas that offset could really be compiled-in in that case.

Yeah, that makes sense.  The plan in comment 63 sounds good.

Note that I've also considered templating WalkRuleTree on the SID...  I'm not sure how well that would work, given i-cache effects.
Comment on attachment 520015 [details] [diff] [review]
patch 19: poison value offsets

r=me
Attachment #520015 - Flags: review?(bzbarsky) → review+
And, to fix Windows bustage:
http://hg.mozilla.org/projects/birch/rev/75a891d0b104
since I only tryserver'd on one platform.
And a bustage fix for orange on 32-bit debug builds (not sure why it didn't hit 64-bit debug builds, but...):
http://hg.mozilla.org/projects/birch/rev/242b56318834

(I should have reaudited for patch 13, clearly!)
(In reply to comment #68)
> (I should have reaudited for patch 13, clearly!)

... which I've now done.

There's still Windows opt bustage, likely from patch 15.  I'm waiting on builds to confirm that.  If I can't find a fix quickly, I'll back out patches 15, 17, and 19 for now.

(Also, I noticed some changes I should have made in patch 14 when revising patch 13 -- nothing harmful, just more CSS_PROP_INCLUDE_NOT_CSS hackery to remove.)
(In reply to comment #69)
> There's still Windows opt bustage, likely from patch 15.  I'm waiting on builds
> to confirm that.  If I can't find a fix quickly, I'll back out patches 15, 17,
> and 19 for now.

It is from patch 15.
Anyway, landed two more changesets:

https://hg.mozilla.org/projects/birch/rev/be25286c3338
  to fix failing to follow up on the patch 13 comments in comment 52 that
  weren't actually introduced until patch 14.

https://hg.mozilla.org/projects/birch/rev/3766d8e25632
  fix the Windows opt bustage by avoiding placement new[], which is, as
  far as I can tell, always the wrong thing to use.
Depends on: 645423
You need to log in before you can comment on or make changes to this bug.