Closed Bug 926922 Opened 8 years ago Closed 8 years ago

Log reason for layout flushes in profiles

Categories

(Core :: Gecko Profiler, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: BenWa, Assigned: BenWa)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=profiling p= s=2014.05.23.t u=])

Attachments

(2 files, 3 obsolete files)

Attached file WIP Prototype (obsolete) —
Currently we get a lot of layout/style flushes in profiles but it's hard to identify the source site behind the original mutation. I'd like to record the what's responsible for the first change to layout/styles so that we can crack down on them when we aren't expecting them to happen.

All the profiler pre-req for this have landed, we just need to find the right triggers for this and add some UI to cleopatra.
(In reply to Benoit Girard (:BenWa) from comment #0)
> All the profiler pre-req for this have landed, we just need to find the
> right triggers for this

You mean things such as mutating the DOM or changing styles through CSS-OM?
Actually, finding all of those places may not be that hard.

* nsIPresShell::FrameNeedsReflow should cover all of the cases where we mark some frame as dirty inside layout code.
* nsINode::ReplaceOrInsertBefore should cover most of the cases where we mutate the DOM tree.  (Not sure if that covers things such as changing an attribute's value, you need to double check how we do that these days.)  nsCSSFrameConstructor::ContentAppended and friends should be called from there.
* Not sure where the right place to hook in for CSS-OM changes is.

Please make sure to double check this list with people who know more about this stuff...
Thanks for the feedback. I want to advance my WIP then I'll get further feedback. I believe dbaron had some opinion in taipei so I'll needinfo? him in a bit.
Attached patch WIP (obsolete) — Splinter Review
This may or may not be the final version. Most of the remaining work is in cleopatra to visualize this.
Assignee: nobody → bgirard
Attachment #817168 - Attachment is obsolete: true
Status: NEW → ASSIGNED
CCing aklotz since I'm lumping in the work to symbolicate and better display markers with stacks.
Comment on attachment 820630 [details] [diff] [review]
WIP

Review of attachment 820630 [details] [diff] [review]:
-----------------------------------------------------------------

Requesting dbaron since roc pointed out the code paths again. It's not exactly what you pointed out so perhaps you can let me know if it's appropriate. The profiler changes are trivial so I'm not worried about these.
Attachment #820630 - Flags: review?(dbaron)
Benoit, I don't think these markers capture what you want at all.  Have you tested this stuff or have your requirements changed?
To clarify what I mean here, based on our previous discussions, you were interested to find out which actions cause a reflow in the future (e.g., style mutations, DOM mutations, etc.)  That's what I was going for in comment 2.  However I think your current patch just tracks where a reflow occurs, etc, not what caused it.  I think that what you were after before is probably more interesting.  Or, it could be that I'm just missing something, in which case I'd appreciate being corrected!
I believe it does catch it properly but I'd love to know if it misses some causes. I've already used the results to diagnose bug 930587 so it appears to be working. Lets talk about this tomorrow if your at the office.
So I talked to Benoit and I think we convinced each other that this patch is missing the case for nsINode::ReplaceOrInsertBefore in comment 2.  Roc, does that make sense to you?  We used a simple test case which calls appendElement() to add something to the DOM and the profiler with this patch did not capture the callstack to appendElement().  With my suggested modification to ReplaceOrInsertBefore(), we managed to capture that callstack as well.
Flags: needinfo?(roc)
I don't think you actually want to track nsINode::ReplaceOrInsertBefore.

Appending an element to the DOM should trigger lazy frame construction (or possibly eager frame construction, but lazy ought to be the hard case here). That should happen via nsCSSFrameConstructor::ContentRangeInserted calling MaybeConstructLazily returning true, and that path calls RestyleManager::PostRestyleEventForLazyConstruction, which calls RestyleManager::PostRestyleEventInternal, missing Benoit's patch.

So I think in Benoit's patch we should move the profiler_tracing call into RestyleManager::PostRestyleEventInternal, in the "if" block that calls AddStyleFlushObserver. We don't need to check tracker.Count() there.
Flags: needinfo?(roc)
Ehsan, do you agree with roc?
Flags: needinfo?(ehsan)
Yes, that makes sense to me.
Flags: needinfo?(ehsan)
See Also: → 928447
Keywords: perf
Whiteboard: [c=profiling p= s= u=]
Comment on attachment 820630 [details] [diff] [review]
WIP

I'm not confident from looking at the patch that it'll give us the data we want, but it certainly seems worth adding and seeing what happens when people use it.  r=dbaron

I'm a little unsure about the "StartTimer" one, though.  It seems a little odd to attribute paints to the timer when it's the first tick of the timer, but not for other ticks of the timer.

Sorry for taking so long to get to this.
Attachment #820630 - Flags: review?(dbaron) → review+
Hi BenWa, anything holding you back from landing this now that dbaron's r+'d your patch?
Flags: needinfo?(bgirard)
I want to do a bit of more testing since dbaron raised some questions.
Flags: needinfo?(bgirard)
Duplicate of this bug: 789712
Paul did something similar in bug 926371.
See Also: → 926371
The previous patch was wrong because I assume there was a single refresh driver outputing to the screen and that there was only one list of pending reflow/restyles. This patch should handle this much better.

Now causes are tracked and associated with the execution explicitly.

NOTE: profiler_get_backtrace checks if the profiler is running and early returns if its not so it should be cheap enough to ship.
Attachment #820630 - Attachment is obsolete: true
Attachment #8414855 - Flags: review?(ehsan)
Attachment #8414855 - Flags: review?(dbaron)
I've left some improvements that we can do later if the information collected is insufficient:
- Expose which document we're handling (ex: browser.xul, gmail.com, gaia.app, addonpanel.xul)
- Expose which content node needs restyle (or which style node changed)
- Expose why the refresh driver is ticking
Comment on attachment 8414855 [details] [diff] [review]
patch v1 (Support multiple RefreshDriver)

Review of attachment 8414855 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsRefreshDriver.cpp
@@ +150,5 @@
>      mLastFireTime = now;
>  
>      LOG("[%p] ticking drivers...", this);
>      nsTArray<nsRefPtr<nsRefreshDriver> > drivers(mRefreshDrivers);
> +    profiler_tracing("Paint", "RD", TRACING_INTERVAL_START);

Add a comment explaining what "RD" is.

::: layout/base/nsRefreshDriver.h
@@ +151,5 @@
>     */
>    bool AddStyleFlushObserver(nsIPresShell* aShell) {
>      NS_ASSERTION(!mStyleFlushObservers.Contains(aShell),
>  		 "Double-adding style flush observer");
> +    if (!mStyleCause) {

Please add a comment explaining why you do this only when mStyleCause is null.  Same for below.

::: tools/profiler/ProfilerMarkers.cpp
@@ +48,5 @@
>    : mCategory(aCategory)
>    , mMetaData(aMetaData)
> +{
> +  if (aMetaData == TRACING_EVENT_BACKTRACE) {
> +    SetStack(profiler_get_backtrace());

It would be marginally better if we could do this somehow in the caller through some macro magic to avoid getting this frame in the backtrace.  Please file a follow-up if you care about that.
Attachment #8414855 - Flags: review?(ehsan) → review+
Changes to cleopatra are deployed. Here's a profile of a b2g homescreen swipe:
http://people.mozilla.org/~bgirard/cleopatra/#report=79755aab3d6d92b7bd8c8a89bf2d0d314cec0a3b&select=11077,11194

If you mouse over 'Styles' you can see we're incurring a 33ms style flush caused by grid.js.
Looks awesome!  One idea looking at this, though.  I wonder if it would make sense to highlight/select the associated stacks in the histogram when someone clicks a frame marker?  Perhaps a separate bug.
Yes, the tooltip is temporary (and frankly quite annoying).
Blocks: 1004899
I'm not convinced that nsRefreshDriver::Add{Style,Layout}FlushObserver is the right place to hook in; I suspect nsDocument::SetNeed{Style,Layout}Flush might be better (in that it will catch things that the refresh driver methods won't).  Curious what bz thinks, though.  (In fact, bz might be a better reviewer for this right now.)
Flags: needinfo?(bzbarsky)
... although I suppose it's possible we only want to catch things in the smaller set.
There are certainly cases in which we call SetNeedLayoutFlush when we don't call AddLayoutFlushObserver.  The most obvious is when we add a new reflow root for a presshell that has a pending reflow for another root already.  But also, ScrollContentIntoView calls, view manager flushes, suppressed interruptible reflow flushes.

Similarly, SetNeedStyleFlush will catch posting of restyle events when we're already observing the refresh driver, as well as some things like resizes.
Flags: needinfo?(bzbarsky)
Note in my case I want to track what is the -first- thing that caused the flush that is triggered from the refresh driver. Capturing the stack is a bit costly so restricting to once per refresh tick is a good trade off. If I understand correctly Add{Style,Layout}FlushObserver wont catch all subsequent call which is something that I don't need or support.
Comment on attachment 8414855 [details] [diff] [review]
patch v1 (Support multiple RefreshDriver)

Review of attachment 8414855 [details] [diff] [review]:
-----------------------------------------------------------------

dbaron suggests that you're a better reviewer for this.

Since right now I'm only interested in explaining style/reflow flushes in the refresh driver and showing the first thing that causes it, it sound like this might be appropriate to capture that.
Attachment #8414855 - Flags: review?(dbaron) → review?(bzbarsky)
Comment on attachment 8414855 [details] [diff] [review]
patch v1 (Support multiple RefreshDriver)

Assuming we're ok with flagging something in a random ad subframe just because it happened to activate the refresh driver first even though the expensive reflow was in the main document and had nothing to do with the subframe, this looks fine.
Attachment #8414855 - Flags: review?(bzbarsky) → review+
I think I'm going to land this for now because it's useful. But I'd like to know how I can improve it.

Boris do you have any concrete suggestion on how I can efficiently get a better source for the expensive reflow? Keeping in mind that capturing several backtrace is expensive?
Flags: needinfo?(bzbarsky)
The only way I've thought of so far is to do the stack traces on a per-presshell basis when presshells add themselves to the refresh driver...
Flags: needinfo?(bzbarsky)
Blocks: 1008062
Attached patch patch v2Splinter Review
Attachment #8414855 - Attachment is obsolete: true
Attachment #8420152 - Flags: review+
(In reply to Benoit Girard (:BenWa) from comment #35)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/cf1cacb85ddb

It broke the non-sps builds.
 3:40.58 ../..//dist/include/GeckoProfiler.h(97): error: identifier "ProfilerBacktrace" is undefined
https://hg.mozilla.org/mozilla-central/rev/cf1cacb85ddb
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Depends on: 1009291
Whiteboard: [c=profiling p= s= u=] → [c=profiling p= s=2014.05.23.t u=]
You need to log in before you can comment on or make changes to this bug.