Incremental Rule destroyer

RESOLVED FIXED in Firefox 41



6 years ago
2 years ago


(Reporter: mccr8, Assigned: mccr8)


(Depends on: 1 bug, Blocks: 1 bug)

Dependency tree / graph

Firefox Tracking Flags

(firefox41 fixed)



(1 attachment, 9 obsolete attachments)

In bug 850065, I'm making the Unlinking() and Unroot() phases of the cycle collector incremental to reduce pause times.  I measured how long Unroot() is taking (which is going to often be the last reference, so this should mostly be the cost of the destructor), and on TechCrunch page teardown I see nsCSSStyleSheets taking a long time:

CC: Object 0x11cd98550 took 107.48us to Unroot.  <-- nsCSSStyleSheet
CC: Object 0x11cd98430 took 213.98us to Unroot.  <-- nsCSSStyleSheet
CC: Object 0x12b23b3a0 took 117.04us to Unroot.  <-- nsCSSStyleSheet
CC: Object 0x12b23afb0 took 4255.67us to Unroot.  <-- nsCSSStyleSheet
CC: Object 0x12b23b0d0 took 878.75us to Unroot.  <-- nsCSSStyleSheet

The 0.1ms ones aren't a problem, but the 4.2ms one and even the 0.8ms one are pretty bad, given that I'm trying to make slices take under 5ms.

I haven't investigated what is taking so long yet.  It could just be tearing down a giant non-cycle-collected data structure.
Disclaimer: this is in an opt debug build...
I also see nsDOMAttributeMap taking 0.7ms to tear down, which I understand may also be style sheet related.
Looks like RemoveSheet is the slow thing here:
  QQQQ: Clearing children took 0.051000us
  QQQQ: DropRuleCollection took 3.765000us
  QQQQ: DropMedia took 3.220000us
  QQQQ: RemoveSheet took 5926.785000us

(usually it is much faster, of course)
Digging further, this appears to be mostly time spent clearing mOrderedRules in ~nsCSSStyleSheetInner, which is an nsCOMArray.  Clearing each element doesn't appear to take all that long, so we should be able to swap it into a runnable, and incrementally destroy it.
Summary: Lazy nsCSSStyleSheet blaster? → Lazy Rule destroyer?
Assignee: nobody → continuation
Created attachment 742867 [details] [diff] [review]
All hail the lazy rule destroyer! WIP

This is based on the lazy content unbinder, and defers the destruction of nsCSSStyleSheetInner::mOrderedRules to a runner, which does so in 4ms chunks.  This knocks the max pause time on TechCrunch teardown from about 9 or 10ms to a little under 6ms.
Comment on attachment 742867 [details] [diff] [review]
All hail the lazy rule destroyer! WIP

Review of attachment 742867 [details] [diff] [review]:

::: layout/style/nsCSSStyleSheet.cpp
@@ +922,5 @@
> +  {
> +    MOZ_ASSERT(nsCCUncollectableMarker::sGeneration, "Don't call append during shutdown.");
> +    if (sRuleDestroyer) {
> +      sRuleDestroyer->mLast->mNext = new RuleDestroyer();
> +      sRuleDestroyer->mLast = sRuleDestroyer->mLast->mNext;

How about a LinkedList?
(In reply to :Ms2ger from comment #6)
> How about a LinkedList?

Yeah, that's a good point.

I was actually thinking that it might be good to add a general deferred release mechanism to the cycle collector, where things can register objects with two methods, IncrementalDestroy(int32 timeLeft) and DestroyAll(), with the CC, and then the CC could schedule them to run after it has done everything else it needs to do, instead of everybody who wants to do this having to implement their own thing.

In addition to simplifying the implementation, it would probably be less janky if a single point was scheduling all of these, rather than everybody firing off their own runnables.  When I was testing this, I noticed the Lazy Rule Destroyer was firing off while ICC was still going on.
Created attachment 8416776 [details] [diff] [review]
Lazy Rule destroyer

Rebased, plus updated to take into account nsCSSStyleSheet being CCed now.

With the large caveat that this was in an opt debug build, in my local measurement,
unlinking nsCSSStyleSheets when closing TechCrunch takes 4.5ms.  With this patch,
it only takes 0.125ms.

Other things that are worth looking at are GroupRule, taking 1.4ms, and FragmentOrElement,
at about 2ms.
Attachment #742867 - Attachment is obsolete: true
It looks like maybe the GroupRule unlink time is mostly spent in destroying its own list of CSS Rules.  It is easy enough to make it also use the lazy rule destroyer.
Blocks: 956068
Summary: Lazy Rule destroyer? → Lazy Rule destroyer
Created attachment 8416809 [details] [diff] [review]
Lazy Rule destroyer

Now with support for GroupRule
Attachment #8416776 - Attachment is obsolete: true
Oddly, this doesn't seem to be a problem on TechCrunch any more.  The longest list of rules I see is 73, whereas I'm pretty sure it was like a thousand before.  I wonder if this is still worth fixing.
Created attachment 8540340 [details] [diff] [review]
Incremental rule destroyer.

I rewrote the lazy rule destroyer in a less gross way.  Some debugging code is still present.
Attachment #8416809 - Attachment is obsolete: true
Oh, and you have to set beIncremental to true to have it actually do anything incrementally...
Created attachment 8540376 [details] [diff] [review]
Incremental rule destroyer.

Oops, it turns out I wasn't seeing any performance improvements because I forgot to comment out the eager clearing of the style sheet list.

This reduces unlink time by about 3ms, which is around what I was seeing before.

Of course, the runnable ends up running immediately, while the CC is still going, which is not ideal.
Attachment #8540340 - Attachment is obsolete: true
Created attachment 8540411 [details] [diff] [review]
Incremental rule destroyer.

It works a little better when the budget is set to 2ms rather than 2s...
Attachment #8540376 - Attachment is obsolete: true
I measured again with a newer version of the patch, and it is still pretty effective.  On TechCrunch page close, it reduced unlink time by 4.5ms, which also reduced byte max pause time from 10ms to 6ms.

with my patch:

cc: ScanIncrementalRoots::fix nodes took 1.4ms
cc: CollectWhite::Root took 1.1ms
cc: CollectWhite::Unlink took 3.3ms
cc: CleanupAfterCollection::mGraph.Clear() took 3.6ms
cc: total cycle collector time was 601ms in 16 slices
cc: visited 8692 ref counted and 62829 GCed objects, freed 8623 ref counted and 62783 GCed objects (99%).


cc: ScanIncrementalRoots::fix nodes took 1.3ms
cc: CollectWhite::Root took 1.0ms
cc: CollectWhite::Unlink took 7.8ms
cc: CleanupAfterCollection::mGraph.Clear() took 2.4ms
cc: total cycle collector time was 443ms in 12 slices
cc: visited 8378 ref counted and 51535 GCed objects, freed 8310 ref counted and 51491 GCed objects (99%).
Created attachment 8591147 [details] [diff] [review]
Deferred Rule destroyer. WIP
Attachment #8540411 - Attachment is obsolete: true
Created attachment 8593672 [details] [diff] [review]
Deferred Rule destroyer.

This version builds on DeferredFinalize, so it avoids creating its own
runnable.  It builds on the patch in bug 866681 that makes it usable
for CCed things.

It also encapsulates the deferred destruction behavior in a new subclass of nsCOMArray, that only replaces the behavior for Clear() and the dtor.  This is less error-prone, because you don't have to remember to fix up every use.  In a prior revision of this code, I accidentally lost the deferred behavior when rebasing because some code had been added in another place to do a clear.  It is still a little fragile in that you could remove large numbers of things from the array though some other method, but I think this is okay.

The deferred finalizer keeps an array of arrays of rules.  In each slice, it removes the last |aSlice| elements of the last array, then removing any newly-emptied arrays.

One little detail is that I had to ensure that we don't defer anything if we're in the middle of shutdown.  We synchronously clear the deferred finalizer stuff out at the end of the CC, which I thought would be enough, but it turns out (bug 1155454) that there's a cache that holds alive CSS stuff after the cycle collector is gone.
Attachment #8591147 - Attachment is obsolete: true
Summary: Lazy Rule destroyer → Incremental Rule destroyer
try run:

I'm going to give the content unbinder changes a few more days to settle in before I put this up for review.
I don't want to land this until content process telemetry is working.
Depends on: 1156857
Created attachment 8611288 [details] [diff] [review]
Incremental css::Rule destroyer.
Attachment #8593672 - Attachment is obsolete: true
Attachment #8611288 - Flags: review?(dbaron)
Comment on attachment 8611288 [details] [diff] [review]
Incremental css::Rule destroyer.

Given that you're putting it in namespace mozilla, please don't use
the "ns" prefix on the class name (or filename).

>+  while(aSliceBudget > 0 && newOuterLen > 0) {

"while (" with a space

>+    mozilla::DeferredFinalize(AppendRulesArrayPointer,

Doesn't seem like you need the "mozilla::" given the "using namespace
mozilla" above.

Regarding the callback functions -- I mildly prefer the convention of
making variables with their real types for all the void* parameters as
the very first thing in the function, then a blank line, and then
continuing with other things -- but it's ok as is if you don't want
to change that.  (Mixing with argument-validation assertions is fine.)
(That would mean having a variable for aObject in
AppendRulesArrayPointer, and adding a blank line after the first 2 lines
of DeferredFinalizeRulesArray.)

>+    delete rulesArray;

Perhaps the comments in DeferredFinalize.h should mention that you're
supposed to do this?

>+  uint32_t newOuterLen = rulesArray->Length();

rulesArray is an nsTArray, whose Length() returns size_t.  I think
you should thus use size_t for newOuterLen.

>+    aSliceBudget -= currSlice;

In theory DeferredFinalize.h says this should be protected by
if (aSliceBudget != UINT32_MAX).

r=dbaron with that
Attachment #8611288 - Flags: review?(dbaron) → review+
Created attachment 8622568 [details] [diff] [review]
Incremental css::Rule destroyer.
Attachment #8611288 - Attachment is obsolete: true
Comment on attachment 8622568 [details] [diff] [review]
Incremental css::Rule destroyer.

Carrying forward dbaron's r+.

(In reply to David Baron [:dbaron] ⏰UTC-7 (mostly busy, back June 29) from comment #22)
> Perhaps the comments in DeferredFinalize.h should mention that you're
> supposed to do this?
That's true. I filed bug 1174783 to fix up the comment.

> In theory DeferredFinalize.h says this should be protected by
> if (aSliceBudget != UINT32_MAX).
Good catch! In all the time I've spent looking over this code, I never noticed that. We do in fact rely on that when calling the callback, but none of the existing implementors actually follow that convention, which could be bad. I filed bug 1174796 for addressing that somehow.

I addressed the rest of the review comments (including the stuff about the void* casts). Thanks for the review.
Attachment #8622568 - Flags: review+
Last Resolved: 4 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Depends on: 1354987
You need to log in before you can comment on or make changes to this bug.