Closed Bug 661746 Opened 9 years ago Closed 2 years ago

Make selector matching work on const Element/Document/etc

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set

Tracking

()

RESOLVED INVALID

People

(Reporter: bzbarsky, Assigned: dzbarsky)

References

Details

Attachments

(5 files, 17 obsolete files)

9.43 KB, patch
Details | Diff | Splinter Review
34.58 KB, patch
Details | Diff | Splinter Review
11.56 KB, patch
Details | Diff | Splinter Review
4.62 KB, patch
Details | Diff | Splinter Review
13.46 KB, patch
Details | Diff | Splinter Review
This might end up being a metabug depending on how much work there is here, but to parallelize selector matching we need to make sure we're not mutating things as we go.

This definitely means doing something about the places where we add bits to elements in selector matching....  I'm just not sure yet what that "something" should be.
Blocks: 631527
Assignee: nobody → dzbarsky
Attachment #541033 - Flags: review?(bzbarsky)
Comment on attachment 541033 [details] [diff] [review]
Store slow selector flags and set them later

Might be worth doing this in a separate bug blocking this one... but anyway.

1)  No need to have the explicit PushUpdates() calls; just doing it in the
    destructor is good enough.  Then you can make it private as well.
2)  s/AddUpdateFlags/RecordFlagsToAdd/ perhaps?
3)  s/PushUpdate/AddRecordedFlags/ perhaps?
4)  s/UpdateStyleFlags/FlagsToAdd/.  Is there a reason this can't be an inner
    class of TreeMatchcontext?
5)  FlagsToAdd constructor should take const nsINode*, I would think. 
    AddRecordedFlags should const_cast as needed.
6)  You don't need the |flag| and |update| temporaries, right?
Attached patch Patch addressing comments (obsolete) — Splinter Review
Attachment #541033 - Attachment is obsolete: true
Attachment #541033 - Flags: review?(bzbarsky)
Attachment #541376 - Flags: review?(bzbarsky)
Attachment #541463 - Flags: review?(bzbarsky)
Comment on attachment 541376 [details] [diff] [review]
Patch addressing comments

s/mToUpdate/mFlagsToAdd

And you don't need the explicit Clear() call.

You also don't need the |public:| bit for a struct.

r=me with those nits.
Attachment #541376 - Flags: review?(bzbarsky) → review+
Comment on attachment 541463 [details] [diff] [review]
Part 2: Make selector matching const-correct

>+                       const nsPseudoClassList* pseudoClass,

Why that change?  It's probably fine, but if the idea is to make the selectors const too (which is a good idea), I suggest a separate patch for that.

IndexOf() is sadly not const-safe: it uses a shared mutable cache.  The fact that it claims to be a const method is a lie.  :(  Why are we calling IndexOf() in this code anyway?

Why can't you use const in the NthIndexCache?  For our purposes here const_cast is really bad, since it can let bugs through.

Did you go through and look for const_cast and mutable in node implementations?  We want to eliminate both or make sure we don't hit it...
Attachment #541463 - Flags: review?(bzbarsky) → review-
I checked node implementations for const_cast and mutable, and the only use I found was http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericElement.cpp#1250, but I don't think we end up hitting that from selector matching.
Attachment #541463 - Attachment is obsolete: true
Attachment #542145 - Flags: review?(bzbarsky)
Attached patch Part 3: Use const selectors (obsolete) — Splinter Review
Attachment #542146 - Flags: review?(bzbarsky)
Comment on attachment 542145 [details] [diff] [review]
Part 2: Make selector matching work with const Element, etc.

>--- a/layout/style/nsCSSRuleProcessor.cpp
>+++ b/layout/style/nsCSSRuleProcessor.cpp
>+static PRBool SelectorMatchesTree(const Element* aPrevElement,
>+        const nsIContent* prevSibling = nsnull;
>+        const nsIContent* curr = parent->GetFirstChild();
>+        while (curr && curr != prevElement) {
>+          if (curr->IsElement()) {
>+            prevSibling = curr;
>           }
>+          curr = curr->GetNextSibling();
>+        }
>+        
>+        if (prevSibling) {
>+          element = prevSibling->AsElement();
>         }

Seems like this can be written as

const nsIContent* cur = prevElement->GetPreviousSibling();
while (cur && !cur->IsElement()) {
  cur = cur->GetPreviousSibling();
}
if (cur) {
  element = cur->AsElement();
}

or am I missing something here?
OS: Mac OS X → All
Hardware: x86 → All
> I checked node implementations for const_cast and mutable, and the only use I
> found was

Link has a mutable mCachedURI.  Do your link changes take care of that?  What about editor state?  Not relevant?  

nsHTMLOptionElement has a bunch of const_cast.  So does nsHTMLInputElement, nsHTMLMediaElement, etc.  Did you look over all of those?
Comment on attachment 542145 [details] [diff] [review]
Part 2: Make selector matching work with const Element, etc.

For the SelectorMatchesTree code, Ms2ger is right.  You want to walk backwards from the current node, because usually you'll find another element pretty quickly.  How about this:

  for (const nsIContent* curr = prevElement->GetPreviousSibling();
       curr;
       curr = curr->GetPreviousSibling()) {
    if (curr->IsElement()) {
      element = curr->AsElement();
      break;
    }
  }

r=me with that.
Attachment #542145 - Flags: review?(bzbarsky) → review+
Comment on attachment 542146 [details] [diff] [review]
Part 3: Use const selectors

>+          const PRBool docIsRTL =

That part is silly; its just a local.

r=me with that taken out.
Attachment #542146 - Flags: review?(bzbarsky) → review+
Attached patch Part 1 for checkin r=bz (obsolete) — Splinter Review
Attached patch Part 2 for checking r=bz (obsolete) — Splinter Review
Attached patch Part 3 for checkin r=bz (obsolete) — Splinter Review
Keywords: checkin-needed
Attachment #541376 - Attachment is obsolete: true
Attachment #542145 - Attachment is obsolete: true
Attachment #542146 - Attachment is obsolete: true
I just realized something about the flags patch....  Say you have a <div> with N <p> children, and some selectors like "p + p" and "p:nth-child(even)" and "p:nth-child(odd)".  How many times is the <div> added to the array?  Seems like it would be N*(number of selectors), which seems unfortunate.  Would a hashtable make more sense there?
Keywords: checkin-needed
Attached patch Part 1 using a hashtable (obsolete) — Splinter Review
Attachment #542340 - Attachment is obsolete: true
Attachment #542543 - Flags: review?(bzbarsky)
David, have you done any performance testing with these patches?  In an opt build with and without, what do numbers for the last column of http://www.webkit.org/perf/slickspeed/ look like, for example?  You'll need to run it a few times, because of the noise.
Comment on attachment 542543 [details] [diff] [review]
Part 1 using a hashtable

This is identical to "Part 1 for checkin", as far as I can tell.
Attached patch Part 1 using a hashtable (obsolete) — Splinter Review
Timing results for http://www.webkit.org/perf/slickspeed/:

Unpatched: 300, 351, 345, 316, 339, 325, 344, 319, 351, 346, 336, 325
Average: 333.1

With these patches and bug 660959: 288, 373, 405, 346, 359, 363, 362, 364, 364, 363, 324, 401
Average: 359.3

This is a 7.9% slowdown, which is coming from the hashing, since the rest of the changes should make it faster.
Attachment #542543 - Attachment is obsolete: true
Attachment #542543 - Flags: review?(bzbarsky)
Attachment #542700 - Flags: review?(bzbarsky)
> This is a 7.9% slowdown, which is coming from the hashing

Does that mean it goes away if you take out just the hashing patch?
And if so, we can probably do better on the hashtable (e.g. you end up with two hashtable ops on every add right now, where we could use just one, if we use nsBaseHashtable or the jshashtable).  If you can confirm that the hashtable is the cause of the slowdown, can we try one of those solutions?
So some of the slowdown was coming from link stuff, but I've fixed that.  The rest of it is hashing overhead.

unpatched: 108, 107, 107, 106, 107, 107, 107, 107, 107, 106, 107, 107, 88, 106, 107, 107, 107, 107, 106, 107, 107, 107, 107, 88, 107, 107, 107, 107, 106, 107, 107, 107, 107, 107, 107, 89, 107, 107, 107, 107, 106, 107, 107, 107, 107, 106, 89, 107, 106, 107, 108, 107, 107, 107, 107, 107, 107, 106, 89, 107
Average: 105.35

with original patch for bug 660959: 120, 110, 110, 110, 110, 109, 110, 110, 110, 110, 110, 110, 87, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 86, 110, 110, 110, 111, 110, 110, 109, 110, 110, 110, 110, 110, 110, 85, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 111, 110, 110, 85, 110, 110, 110, 110, 111
Average: 108.56666666666666

with new patch for bug 660959: 109, 108, 107, 107, 106, 108, 107, 85, 107, 107, 107, 107, 107, 107, 106, 108, 107, 107, 107, 89, 107, 107, 107, 107, 107, 107, 106, 107, 107, 107, 108, 88, 107, 107, 108, 107, 107, 107, 107, 107, 108, 106, 107, 88, 107, 107, 107, 108, 107, 106, 107, 107, 107, 107, 107, 89, 107, 107, 106, 107
Average: 105.45

The following numbers are all with bug 660959 patch
with datahashtable: 114, 113, 112, 113, 113, 113, 113, 113, 112, 95, 112, 114, 113, 112, 114, 112, 113, 113, 112, 113, 113, 113, 112, 96, 113, 113, 113, 113, 113, 113, 113, 113, 96, 113, 113, 113, 113, 113, 113, 113, 112, 97, 113, 113, 113, 113, 112, 113, 113, 112, 113, 95, 112, 113, 112, 113, 113, 112, 112, 113
Average: 111.38333333333334

with datahashtable that avoid redundant puts: 112, 112, 111, 111, 112, 112, 94, 112, 111, 112, 111, 112, 112, 111, 112, 111, 111, 112, 111, 112, 111, 95, 111, 112, 111, 112, 112, 111, 112, 112, 111, 112, 111, 112, 112, 95, 113, 112, 111, 113, 111, 112, 112, 111, 112, 112, 111, 111, 96, 112, 111, 112, 111, 112, 112, 111, 112, 112, 112, 111
Average: 110.51666666666667

with nsTHashtable: 111, 111, 111, 111, 111, 111, 111, 111, 110, 111, 110, 111, 86, 111, 111, 111, 111, 111, 111, 111, 112, 110, 95, 112, 110, 111, 111, 111, 111, 112, 111, 95, 111, 111, 111, 111, 111, 111, 111, 111, 111, 95, 112, 111, 111, 111, 111, 111, 111, 112, 111, 93, 111, 112, 112, 110, 111, 112, 111, 111
Average: 109.53333333333333

with Part 2: 112, 111, 111, 112, 112, 111, 110, 112, 110, 111, 111, 111, 110, 112, 95, 111, 111, 110, 112, 111, 111, 110, 111, 111, 111, 94, 112, 111, 111, 111, 111, 112, 111, 111, 112, 112, 112, 112, 112, 113, 94, 111, 112, 112, 112, 112, 114, 112, 112, 112, 112, 113, 94, 111, 111, 112, 113, 112, 112, 112
Average: 110.36666666666666

with Part 3: 112, 112, 111, 111, 111, 112, 112, 112, 110, 112, 111, 110, 95, 111, 111, 112, 112, 112, 111, 111, 112, 112, 111, 111, 112, 111, 112, 95, 111, 111, 112, 111, 112, 110, 111, 112, 111, 93, 111, 111, 112, 112, 110, 112, 112, 112, 110, 112, 112, 111, 111, 111, 111, 111, 94, 111, 111, 112, 110, 112
Average: 110.2

This is now a 4.6% slowdown.
Attached patch Part 1 using nsTHashtable (obsolete) — Splinter Review
Attachment #542700 - Attachment is obsolete: true
Attachment #542700 - Flags: review?(bzbarsky)
Attachment #543005 - Flags: review?(bzbarsky)
Can you try jshashtable too, just so we have those numbers?
With jshashtable and parts 2 and 3 we have
112, 112, 111, 111, 112, 111, 112, 94, 112, 112, 111, 111, 112, 111, 112, 112, 111, 112, 111, 112, 111, 111, 111, 94, 111, 111, 111, 111, 111, 111, 111, 111, 111, 95, 111, 112, 111, 111, 112, 112, 111, 112, 111, 111, 112, 111, 112, 111, 111, 94, 112, 111, 111, 111, 112, 111, 112, 112, 111, 112
Average: 110.25

I guess we should just use the nsTHashtable?
> I guess we should just use the nsTHashtable?

Looks like it, yes.
Comment on attachment 543005 [details] [diff] [review]
Part 1 using nsTHashtable

>+++ b/layout/style/nsRuleProcessorData.h
>+  NodeFlagEntry(const nsINode* aNode) : nsPtrHashKey(aNode), mNode(aNode) {}
>+  
>+  const nsINode* mNode;

You don't need the mNode member.

On the other hand, you _do_ need to initialize mFlags to 0, I think.

>+  RecordFlags(NodeFlagEntry* aEntry, void* aData) {
>+    const_cast<nsINode*>(aEntry->mNode)->SetFlags(aEntry->mFlags);

Here you'd use aEntry->GetKey() instead of aEntry->mNode.

And maybe call this function AddFlagsToNode?

>+  void RecordFlagsToAdd(const nsINode* aNode, PRUint32 aFlag) {

aFlags, right?

r=me with that.  Thanks for doing the performance testing.
Attachment #543005 - Flags: review?(bzbarsky) → review+
Attached patch Part 1 for checkin r=bz (obsolete) — Splinter Review
Attachment #543005 - Attachment is obsolete: true
Attached patch Part 4: const nsIDocument (obsolete) — Splinter Review
with Part 4: 111, 111, 110, 110, 110, 111, 110, 110, 110, 110, 109, 110, 87, 110, 109, 110, 110, 109, 110, 110, 109, 112, 109, 90, 110, 110, 109, 109, 109, 110, 109, 110, 109, 93, 110, 109, 110, 110, 110, 110, 111, 109, 110, 109, 110, 89, 110, 109, 110, 110, 109, 110, 109, 110, 110, 110, 111, 111, 111, 112
Average: 108.56666666666666
Attachment #543169 - Flags: review?(bzbarsky)
Comment on attachment 543169 [details] [diff] [review]
Part 4: const nsIDocument

>-                   nsIDocument* aDocument)
>+                   const nsIDocument* aDocument)

This can stay non-const.  The initialization will happen on the main thread, after all.  That would remove the need for those const_cast too.

Speaking of which, could you add an assert to TreeMatchContext ctor/dtor that they happen on the main thread?  That should make it clear why the const_cast in the destructor is safe too.
Comment on attachment 543169 [details] [diff] [review]
Part 4: const nsIDocument

Oh, and r=me with those nits.
Attachment #543169 - Flags: review?(bzbarsky) → review+
Attached patch Part 4 for checkin r=bz (obsolete) — Splinter Review
Attachment #543169 - Attachment is obsolete: true
Attached patch Part 1 for checkin r=bz (obsolete) — Splinter Review
Previous patch didn't build on OSX.
Attachment #543168 - Attachment is obsolete: true
Attachment #547518 - Flags: review?(bzbarsky)
Comment on attachment 547518 [details] [diff] [review]
PArt 5 - avoid creating atoms during selector matching

>@@ -767,18 +772,20 @@ nsCSSSelector::AppendToStringWithoutComb
>+        nsAutoString string;
>+        list->u.mAtom->ToString(string);
>         nsStyleUtil::AppendEscapedCSSIdent(
>+          nsDependentString(string), aString);

You want to use nsDependentAtomString here.

>+++ b/layout/style/StyleRule.h
>+    nsIAtom*        mAtom;

Document that this is a strong ref, please?

That said, can the string-type pseudoclasses ever have a null atom here?  I guess it's not obvious as things stand, but worth a followup to figure that out and if so document it here and hoist the "is string" check in the dtor outside the mMemory null-check.  Just don't do that in this bug, please.  ;)

>+++ b/layout/style/nsCSSRuleProcessor.cpp
>+          nsAutoString langString;
>+          pseudoClass->u.mAtom->ToString(langString);

nsDependentAtomString.

r=me with those nits.
Attachment #547518 - Flags: review?(bzbarsky) → review+
Attached patch Part 5 for checkin r=bz (obsolete) — Splinter Review
Attachment #547518 - Attachment is obsolete: true
Filed bug 673510 for followup
Blocks: 673510
Attachment #543916 - Attachment is obsolete: true
Attachment #542341 - Attachment is obsolete: true
Attachment #542342 - Attachment is obsolete: true
Attachment #543219 - Attachment is obsolete: true
Attachment #548088 - Attachment is obsolete: true
Unfortunately PGO builds (and therefore obviously nightlies) on Windows are failing after this landed, with:
{
e:\builds\moz2_slave\m-in-w32-pgo\build\layout\style\nsruleprocessordata.h(172) : fatal error C1001: An internal error has occurred in the compiler.
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=7383696&tree=Mozilla-Inbound#error0
Backed out of inbound, at bz's request:
https://hg.mozilla.org/integration/mozilla-inbound/rev/add05f884131
Target Milestone: mozilla11 → ---
Nicholas, the patches here replace the mString in nsPseudoClassList with an mAtom.  How should I adjust nsPseudoClassList::SizeOfIncludingThis?  Just remove the part that measures the string, since now it's interned and can be shared and whatnot?
I tried pushing this to try with PGO on.  The result is at https://tbpl.mozilla.org/?tree=Try&rev=cd2dfa804461

The bad news is that now even with VS 2010 the PGO build ends up with an ICE.  :(

Kyle said he'd try to get together a bug report to Microsoft, now that we're at least on a supported compiler version...
(In reply to Boris Zbarsky (:bz) from comment #47)
> Nicholas, the patches here replace the mString in nsPseudoClassList with an
> mAtom.  How should I adjust nsPseudoClassList::SizeOfIncludingThis?  Just
> remove the part that measures the string, since now it's interned and can be
> shared and whatnot?

I think so?  If it ends up in the atom table implemented by nsAtomTable.{h,cpp} it'll be measured by the "explicit/atom-table" memory reporter.
Tes, that's where it ends up.  OK, good.  Now if only this compiled.... ;)
Comment on attachment 574223 [details] [diff] [review]
Part 2 updated to tip

Landed the const AsElement() for mozilla16 because I needed it:

https://hg.mozilla.org/mozilla-central/rev/b93768181dd0
Not relevant anymore, with stylo.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.