Changing the transform on an element spends 1/3 of the time in style recomputation and comparison

RESOLVED FIXED in mozilla18

Status

()

defect
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: mattwoodrow, Assigned: dbaron)

Tracking

(Blocks 1 bug)

unspecified
mozilla18
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-basecamp:+)

Details

(Whiteboard: [WebAPI:P0], )

Attachments

(8 attachments, 1 obsolete attachment)

6.15 KB, patch
Details | Diff | Splinter Review
35.29 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.18 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
5.16 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
8.61 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
24.40 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
19.73 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
6.35 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
Profile: http://people.mozilla.com/~bgirard/cleopatra/?report=d88b214864b676913d00f8678828c91b3863755e

The gaia UI is changing the transform on 3 elements while side scrolling through the apps grid. The linked profile shows us spending 32% of the total profile time in RestyleTracker::DoProcessRestyles.

A couple of things seem potentially fixable here:

1) We call nsFrameManager::ReResolveStyleContext on all the descendant frames of the one that changed transform. Could we only walk the first level of children and bail out if none of them inherited our transform (which I assume is the most common case, it's certainly true for gaia).

2) We spend a lot of time inside nsStyle*::CalcDifference, while nsStyleDisplay::CalcDifference doesn't feature at all. Could we potentially skip calling the others if we retained some information about what was changed when we call PostRestyleEvent?

This is accounting for approximately 7.5ms per frame of our total 15ms needed to hit 60fps.
Should soft-block because this is the lowest hanging fruit for getting current homescreen codebase running at 60fps, and should benefit quite a bit of web content besides.  But if absolutely necessary, we can probably work around in gaia.
blocking-basecamp: --- → ?
(In reply to Matt Woodrow (:mattwoodrow) from comment #0)
> 2) We spend a lot of time inside nsStyle*::CalcDifference, while
> nsStyleDisplay::CalcDifference doesn't feature at all. Could we potentially
> skip calling the others if we retained some information about what was
> changed when we call PostRestyleEvent?

Is this dominated by values of * for which nsStyle*::ForceCompare returns true?

> This is accounting for approximately 7.5ms per frame of our total 15ms
> needed to hit 60fps.

Is "This" (2), or is it (1)+(2)?
> Could we only walk the first level of children and bail out if none of them inherited
> our transform

We have to make sure that the style contexts have the right parents, no?  There's stuff other than transform styles stored in style contexts.

Your real problem here, in some sense, is that the style system doesn't really have a way to say "X has changed"; it can only say "something in styles has changed; figure out what"...

> Could we potentially skip calling the others if we retained some information about
> what was changed when we call PostRestyleEvent?

Yes, if we retained more information.  See above.

But here's an interesting related question: why exactly are we ending up with so much calculation for the position and margin structs?  It sounds like we're ending up with structs that are parent-dependent but not fully inherited for those?  Is that the case, and if not, why are we doing a difference check at all?
(In reply to Matt Woodrow (:mattwoodrow) from comment #0)
> 1) We call nsFrameManager::ReResolveStyleContext on all the descendant
> frames of the one that changed transform. Could we only walk the first level
> of children and bail out if none of them inherited our transform (which I
> assume is the most common case, it's certainly true for gaia).

Depending on how the transform is changed, we'd hopefully only be calling it to reparent the style contexts rather than rerun selector matching.  Is this the case?

That said, there are cases for which we could prove the reparenting is unnecessary, though it's somewhat scary in terms of messing with style system invariants.
> Is this dominated by values of * for which nsStyle*::ForceCompare returns true?

Oho!  ForceCompare is true for Position and Margin.  Those two account for 50% of the CalcStyleDifference time.

Still wonder why the actual pointer identity is different, though; that should bypass even ForceCompare().
> Is this the case?

At first glance yes, though the weird flattening and renaming Cleopatra seems to be doing in that profile is a bit of a pain to make sense of....
And in particular, the flattening makes it hard to say what things are really attached to what.  :(

As a side note, we're talking a total of 133 samples under restyle processing in that profile, which is really pushing the boundaries of statistical significance on things like comparing 2% to no samples...
Will do my best to answer these questions with my limited knowledge of the style system.

 
> Is this dominated by values of * for which nsStyle*::ForceCompare returns
> true?

Unsure, these just show up in the profile under CalcStyleDifference(), where I am assuming on the nsStyleDisplay is relevant since only the transform property was changed.

> 
> Is "This" (2), or is it (1)+(2)?

Well, (1) encompasses (2). The total time to run (1), including recursing into descendants is around 7.5ms per frame. (2) makes up about half of that time.


> We have to make sure that the style contexts have the right parents, no? 
> There's stuff other than transform styles stored in style contexts.

Ah right, didn't think of that. But yes, I was working under the assumption that we could determine (cheaply) that the only change to the parent frame was a transform.

> 
> Your real problem here, in some sense, is that the style system doesn't
> really have a way to say "X has changed"; it can only say "something in
> styles has changed; figure out what"...

Agreed!

> 
> But here's an interesting related question: why exactly are we ending up
> with so much calculation for the position and margin structs?  It sounds
> like we're ending up with structs that are parent-dependent but not fully
> inherited for those?  Is that the case, and if not, why are we doing a
> difference check at all?

I have no idea. How would I be able to determine this?

> Depending on how the transform is changed, we'd hopefully only be calling it
> to reparent the style contexts rather than rerun selector matching.  Is this
> the case?

As above.

> That said, there are cases for which we could prove the reparenting is
> unnecessary, though it's somewhat scary in terms of messing with style
> system invariants.

It certainly feels like a worthwhile risk given the current data. But maybe we can find another solution.

> And in particular, the flattening makes it hard to say what things are
> really attached to what.  :(

Profiling on android doesn't have stack walking, so it just attributes symbols to the nearest marker that is on the stack. The naming of said markers definitely leaves a bit to be desired.

> 
> As a side note, we're talking a total of 133 samples under restyle
> processing in that profile, which is really pushing the boundaries of
> statistical significance on things like comparing 2% to no samples...

Since I was profiling this while attempting to scroll around manually, the profile had a lot of variation corresponding to me changing direction. The link is just for a small section of the total profile, but the approximate results are more or less consistent with the whole profile.
> Still wonder why the actual pointer identity is different, though; that
> should bypass even ForceCompare().

Didn't we create a new style context when the transform changed?

I've got the callstack for the creation of the nsStyleMargin, and there's no cached data on the nsStyleContext.

CaptureChange shows different nsStyleContext pointers too. Are we supposed to copy the pointers over to the new style context in some cases? Or not recreate the style context?
> I have no idea. How would I be able to determine this?

The simplest way is to produce a testcase with minimal styles that has a profile like this and then see in a debugger what happens during style data calculation...

A stand-in might be looking into why we're not caching those structs in the ruletree.

> As above.

With a decent profile with enough samples that actually shows stacks correctly, based on whether the samples for creating new style contexts are under nsStyleSet:ReparentStyleContext or nsStyleSet::ResolveStyleFor.

> Profiling on android doesn't have stack walking

Yeah, that's a pain.  :(  You could add more markers, I guess...

> Didn't we create a new style context when the transform changed?

Yes, but we share style structs across style contexts, and it's the struct pointer that matters: we check pointer identity before calling CalcDifference.

> Are we supposed to copy the pointers over to the new style context in some cases?

Exactly.  In many cases, in fact.
Longest profile I can record: http://people.mozilla.com/~bgirard/cleopatra/?report=ba904a35f4179f11f95d2ebf9a992e43ac35045a

I also added labels for ResolveStyleFor and ReparentStyleContext
OK, so some combination of Resolve and Reparent...  Odd.
Indeed.

What should I be looking at to determine why we have to regenerate the struct data instead of using the rule node cached values?

I tried to follow the calls and ended up deep in macro generated code.
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> What should I be looking at to determine why we have to regenerate the
> struct data instead of using the rule node cached values?

Well, except there are many cases where we can't cache the content in the rule tree (see all the things that set canStoreInRuleTree to false) because it depends on something other than the particular set of style rules matched (e.g., inheritance).

In the cases where we can, however, copy the style directly from the parent style context, we should have appropriate mNoneBits set in the rule tree.


Boris, were you thinking of cases other than (a) data cached in the rule tree and (b) mNoneBits being set for the struct where you thought we should, in the current code, copy data to the new style context?
Margin and Position are both reset structs, so if we're not getting pointer identity something is presumably setting canStoreInRuleTree to false for them.  The question is what (e.g. if it's some other Gaia styles, we might be able to just fix that part).

Hence my suggestion of a minimalish testcase and then figuring out why it's getting set to false...
Porting the gaia homescreen to a standalone testcase would be helpful here.

I've found out the Margin struct is being recomputed (at least one of the times) because of margin-top being set a RootEM value.

Position has the width value as -moz-calc.

Haven't figured out which elements these values correspond to yet though, will find that out tonight.
At least one of the elements with -moz-calc (it appears that there might be quite a few getting recalculated) looks to be an <ol> element. (parent chain is <div>, <div>, <body>). Attempting to retrieve the id attributes keeps crashing gdb, might have to try writing a c++ helper for this.

I can't find any <ol> elements anywhere near the homescreen, and definitely not inside the transformed grid. Not sure where these are coming from, and why we're restyling them.
Whiteboard: [blocked-on-input Chris Jones]
Processing restyles compromises the most samples while panning the homescreen (1/3).  Cutting this down is the best route to 60fps on the homescreen.

However, there are other routes.  We can set this to block for understanding the underlying issue causing us to spend so much time in this code, but fixing this doesn't necessarily block.
blocking-basecamp: ? → +
Whiteboard: [blocked-on-input Chris Jones]
Matt, can you be the owner here?
Assignee: nobody → matt.woodrow
I don't really know enough about the style system here to help this much further.

I've tried to create desktop testcases that show the issue, without success. It appears to be a fairly complex interaction of style properties that cause this to slow down so much.

I think we need someone to port the exact homescreen code to a standalone desktop testcase and confirm that it reproduces the performance issue and then assign this to one of the style peers.
Assignee: matt.woodrow → nobody
Matt: Porting any style/layout performance problem that we run into to desktop testcases doesn't seem like a good use of the gaia team's time. And doesn't help us with issues that only happen on the device. I know it's painful to set up a new dev environment, but is there anyone on the layout team that can help the B2G team in general.

Also cc'ing Jet to see if he knows of anyone that has the cycles to help us.
The point is that we need some way of creating a more reduced testcase.

I know how to do that given a web page.

I have no clue how to do that with the gaia UI on the device.  Any ideas?
The best way I can think of is to take whatever testcase was used to create the profile in comment 0 and reduce it, rebuilding gaia as needed.
Comment 22 is really quite rude given the context here.

 - We profiled the homescreen on otoro and found it was slow.
 - Matt has a b2g build environment set up.  On the otoro, he debugged the homescreen to the point where he was able to find approximately what in restyling is causing this to be slow.  He's not able to investigate any more from there and asked for help.
 - No one else who can help has a b2g build environment set up.  So a request was made for a desktop test case.  It doesn't have to reproduce the *performance* problem, in units of fps, it has to reproduce the *restyling* bug.
 - Matt doesn't understand the homescreen code well enough to port to desktop or reduce it.  (It's complicated!)  He asked for help with that.

We can unblock here by
 1. having someone help make the homescreen run standalone on desktop.  See if same restyling bug occurs there.
 2. setting up b2g dev environments for folks who could help debug homescreen as-is on device.
 3. having someone help reduce the homescreen and hope that Matt can debug on device
Given that we during triage have about 2 minutes for each bug I didn't have time to get all context. Apologies if I offended anyone, that was not my intent.

If anyone has ideas for how to move this bug forward, and have ideas for who can do that, please contact that person and ask if they can take this bug.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #25)
> Comment 22 is really quite rude given the context here.

We discussed this bug during triage and Jonas offered to step up to ask Matt if he knew who would be a better person to investigate.  He was definitely not intending to be rude but I'm sorry if you felt slighted in any way, Matt.  When interpreting others' comments, let's all try to remember that we're working towards the same goals here :)

> We can unblock here by

Thanks for these suggestions.  Who are the homescreen authors that we should ask for assistance in reducing the test case or porting to desktop?
Vivien, can you hep us find someone to get the homescreen code running on desktop, or a small "kernel" of it running on device?  (This bug is the restyling bug we discussed this morning, the biggest piece of work to getting homescreen back up to 60fps.)
(At a very minimum, there's an easier question to answer:  where *is* the code in question?)
Jet, can you please find someone to own this?  If that lucky individual needs a device, let us know.  Thanks.
Assignee: nobody → bugs
We've got Otoro devices on the Layout team now, and are setting up B2G/Gaia debug systems. Is there someone in SF who can get us jump-started?
+cc: padenot who will profile this on-device
We already have an on-device profile (otoro).  See comment 0.
Assignee: bugs → paul
I made a small-ish test case http://people.mozilla.com/~cjones/homescreen/homescreen.html .

On my extremely beefy laptop, it only gets ~42fps (with GL layers).  This is absurd, since we're just moving a few textures around on the GPU.

I can't run this test directly on device, but the invalidation characteristics are the same, as is the surprisingly bad perf for just two -moz-transform changes on each frame.
The Unknown.png icon needs to be manually added to $gaia/test_apps/template/.

I see some *very* interesting things with this test.
 - when it runs OOP
   * the framerate is a solid 60 fps when left alone
   * if I drag my finger around the screen, framerate drops to ~28fps
 - when it runs in-process
   * the framerate is a consistent 54fps when left alone
   * if I drag my finger around the screen, framerate drops to ~41fps

The fps numbers from dragging-around-finger pretty closely match what happens on the real homescreen.  The test doesn't create any animations or transitions so async-ification shouldn't be confounding anything.

In our current setup, the compositor thread runs with higher priority than content-process-main-threads.  It runs with equal priority to the master-process-main-thread.

Based on these results, I'd hypothesize that
 - the master-process-main-thread is competing too hard with the compositor thread.  If we prioritize the compositor thread above the master-process-main-thread, the in-process framerate for when this test is left alone should match the OOP framerate.  Will test that in a bit.

 - the new work introduced by dragging my finger around in the in-process case is
   * input_event poller thread wakes up, breaks master-process-main-thread out of epoll
   * build display list, do hit test
That extra work may cost us the ~10fps drop we see in the results here.  When this runs OOP, we do the extra above, plus
   * wake up IPC transport thread in master process to forward touch event
   * wake up IPC transport thread in content process to deliver touch event
   * wake up main thread to dispatch message
   * build another display list, do hit test
All this seems to cost another ~10fps.

I'll try to fix up this test case to process real touch events instead of moving divs off of timers, since event processing seems to be the interesting ingredient here.
Summary: Changing the transform on an element is slow → Changing the transform on an element spends 1/3 of the time in style recomputation and comparison
Here's a profile of the Template process while running the testcase here

http://people.mozilla.com/~bgirard/cleopatra/?report=bb40602d0c84974a2f620b92b61ab399786ac66e

I gathered the profile while moving my finger around in a circle near the bottom of the screen as quickly as I could.

The profile looks very very similar to the one on the real homescreen in bug 780341 comment 10, so I think I'm safe in saying that the testcase is representative.

Here's a profile of the same testcase gathered while *not* moving my finger around

http://people.mozilla.com/~bgirard/cleopatra/?report=d544b2db65c7fc1de7be3504106ed8fd773aca91

ProcessRestyles is 42% of the time, so I think just letting this test run on its own fairly accurately reproduces the specific problem this bug covers.
The previous version of this accidentally left Template running in-process.
Attachment #658418 - Attachment is obsolete: true
Whiteboard: [WebAPI:P0]
(In reply to Boris Zbarsky (:bz) from comment #3)
> But here's an interesting related question: why exactly are we ending up
> with so much calculation for the position and margin structs?  It sounds
> like we're ending up with structs that are parent-dependent but not fully
> inherited for those?  Is that the case, and if not, why are we doing a
> difference check at all?

(In reply to Boris Zbarsky (:bz) from comment #5)
> > Is this dominated by values of * for which nsStyle*::ForceCompare returns true?
> 
> Oho!  ForceCompare is true for Position and Margin.  Those two account for
> 50% of the CalcStyleDifference time.
> 
> Still wonder why the actual pointer identity is different, though; that
> should bypass even ForceCompare().

Paul and I just debugged this a bit.  The basic answer to this pair of questions is that the testcase is full of rem units.  If we compute all the rem units in the testcase out to px by editing the testcase, then we stop hitting nsStyleMargin::CalcDifference at all while it animates (though it's still pegging the CPU; then again, we're talking about trying to fix 1/3 of the time it's spending).


I think we probably need to think through the ForceCompare stuff.  I think it was too big a hammer for solving the problem it was for (bug 528038); we ought to be able to force comparisons in fewer cases somehow, though I've yet to work out exactly how.
I'm thinking that maybe we should pass a word with force-compare bits for each struct down (along with aMinChange), and we'd only set the force-compare bit if the parent's comparison actually produced a non-inherited hint.  Does that make sense?
Then again, that's basically bz's original solution in bug 528038; I need to figure out why it wasn't sufficient.
So, as far as possible workarounds in front-end code go, if you can avoid using rem (or em) units, particularly in the cases where there are large numbers of elements using the style, I think that's likely to improve the situation.  I think the worst rules in http://people.mozilla.com/~cjones/homescreen/homescreen.html are:


/* Container -> icon + label */
.apps ol > li > div {
/*  height: 100%;*/
  width: 100%;
	margin-top: .8rem;
	margin-bottom: .8rem;
  pointer-events: none; /* NO touchable area */
}

/* Handling the case where the status bar is double size */
/* XXX remove when the homescreen layout gets more dynamic */
@media screen and (max-height: 440px) {
  .apps ol > li > div {
    margin-top: .2rem;
  }
}

/* icon */
.apps ol > li > div > img {
  width: 6rem;
  height: 6rem;
}

since these affect large numbers of elements.


It looks like the whole reason this is using rem is a bit weird, so that you can have a single media query:

@media screen and (min-width: 480px) {
  html {
    font-size: 13.3px;
  }
}

that overrides the html (but not body) part of this rule:

html, body {
  ...
  font-size: 10px;
  ...
}

and then use the rem units to avoid having to put everything else in that same media query.
Gaia's rem-happiness was the result of a discussion I didn't participate in.  CC'ing some Gaia folks who may be able to comment.
(In reply to David Baron [:dbaron] from comment #42)
> Then again, that's basically bz's original solution in bug 528038; I need to
> figure out why it wasn't sufficient.

Not quite, actually; bz was passing bits in the hint rather than bits of structs.  That's necessary since with my approach we'd miss structs that we skipped because we'd already hit their maxHint.

But I still need to figure out why it didn't work.
The reason it didn't work is described in bug 528038 comment 14.

Basically, I was using the incoming hint to decide whether to force compares.  But the incoming hint is not propagated to kids as-is, which means that in table anonymous pseudo case where A has a pseudo child B which has a child C, if A does something that should force a compare on C we instead ended up with a forced compare on B, and by the time we got to C we lost the information about needing a forced compare.

The other problem I ran into while thinking about how to solve that is that say we have this DOM:

  <span>
    <div></div>
  </span>

and there is a style change on the <span> that should force compares on the <div>.  Given the frame structure in this case, with the ib split, how do we get that information to the <div>?
(In reply to Boris Zbarsky (:bz) from comment #46)
> The reason it didn't work is described in bug 528038 comment 14.
> 
> Basically, I was using the incoming hint to decide whether to force
> compares.  But the incoming hint is not propagated to kids as-is, which
> means that in table anonymous pseudo case where A has a pseudo child B which
> has a child C, if A does something that should force a compare on C we
> instead ended up with a forced compare on B, and by the time we got to C we
> lost the information about needing a forced compare.
> 
> The other problem I ran into while thinking about how to solve that is that
> say we have this DOM:
> 
>   <span>
>     <div></div>
>   </span>
> 
> and there is a style change on the <span> that should force compares on the
> <div>.  Given the frame structure in this case, with the ib split, how do we
> get that information to the <div>?

Could we handle both of these cases by just setting all the bits (i.e., do what we do now) if the style context parent of a child doesn't match the frame parent?
Sorry, hg bzexport insisted on putting patch 5 on the wrong bug the first time (and munging the commit message incorrectly for an attachment name the second time). :-(

In any case, I've verified that in gdb, I don't hit nsStyleMargin::CalcDifference when http://people.mozilla.com/~cjones/homescreen/homescreen.html is animating.  (It wasn't a perfect test, since I'd seen it this afternoon on padenot's machine, with a different setup; I didn't feel like rebuilding layout two more times to check for sure.)

Try push at https://tbpl.mozilla.org/?tree=Try&rev=c07cd90a2735


Also, taking bug.  As I discussed with Paul earlier, this probably is not the sort of bug I want to inflict on somebody learning the code, at least not with time pressure on top.
Assignee: paul → dbaron
Attachment #659161 - Attachment description: Refix in the new setup for forcing comparison in nsStyleContext::CalcDifference, since we can no longer rely on nsStyleBorder::ForceCompare. (Bug 779968, patch 5) → Refix bug 731521 in the new setup for forcing comparison in nsStyleContext::CalcDifference, since we can no longer rely on nsStyleBorder::ForceCompare. (patch 5)
(In reply to David Baron [:dbaron] from comment #56)
> Try push at https://tbpl.mozilla.org/?tree=Try&rev=c07cd90a2735

And another, with commas at end of enumerator lists twiddled:
https://tbpl.mozilla.org/?tree=Try&rev=3d6bec1ee989

not reposting the patches, but they're also the bottom 7 patches in https://hg.mozilla.org/users/dbaron_mozilla.com/patches/
Also, as a followup, we should probably restructure how ReResolveStyleContext reaches continuations so that things with a lot of continuations (in particular, deeply nested continuations) can benefit.
Comment on attachment 659153 [details] [diff] [review]
Move nsStyleStruct MaxDifference methods inline, and make them available unconditionally rather than DEBUG-only.  (, patch 1)

r=me
Attachment #659153 - Flags: review?(bzbarsky) → review+
Comment on attachment 659154 [details] [diff] [review]
Use nsStyleStruct MaxDifference methods instead of maxHint for hint handling in nsStyleContext::CalcStyleDifference.  (, patch 2)

r=me
Attachment #659154 - Flags: review?(bzbarsky) → review+
Comment on attachment 659155 [details] [diff] [review]
Remove maxHint from nsStyleContext::CalcStyleDifference.  (, patch 3)

r=me
Attachment #659155 - Flags: review?(bzbarsky) → review+
Comment on attachment 659156 [details] [diff] [review]
Abstract nsChangeHint_NonInherited_Hints into a function so that it accurately reports the reflow cases to all callers.  (, patch 4)

>+    // REVIEW: In the old code, we also reported ClearAncestorIntrinsics
>+    // and ClearDescendantIntrinsics here (and then skipped the next
>+    // check).

Hmm.

ClearDescendantIntrinsics is obviously always an inherited hint.

ClearAncestorIntrinsics should be inherited if ClearDescendantIntrinsics was set, independent of the dirty reflow stuff.

I just looked through the asserts in FrameNeedsReflow and StyleChangeReflow, and I think we should be ok on all of those.

So looks good.
Attachment #659156 - Flags: review?(bzbarsky) → review+
Comment on attachment 659161 [details] [diff] [review]
Refix bug 731521 in the new setup for forcing comparison in nsStyleContext::CalcDifference, since we can no longer rely on nsStyleBorder::ForceCompare.  (patch 5)

r=me
Attachment #659161 - Flags: review?(bzbarsky) → review+
Comment on attachment 659158 [details] [diff] [review]
Make nsStyleContext::CalcStyleDifference force comparison based on the parent's hint instead of the style struct ForceCompare methods.  (, patch 6, the main patch)

>REVIEW: Can we come up with a better name for non-inherited here?

NotHandledByParent?

r=me
Attachment #659158 - Flags: review?(bzbarsky) → review+
Comment on attachment 659159 [details] [diff] [review]
Remove ForceCompare methods from style structs.  (, patch 7)

r=me
Attachment #659159 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #64)
> Comment on attachment 659158 [details] [diff] [review]
> Make nsStyleContext::CalcStyleDifference force comparison based on the
> parent's hint instead of the style struct ForceCompare methods.  (, patch 6,
> the main patch)
> 
> >REVIEW: Can we come up with a better name for non-inherited here?
> 
> NotHandledByParent?


I think I mostly like it, but I'm inclined to flip it around (and also use descendants/ancestors rather than parent/child).  How about:

  nsChangeHint_Hints_NotHandledForDescendants

  NS_HintsNotHandledForDescendantsIn()
(In reply to Chris Jones [:cjones] [:warhammer] from comment #44)
> Gaia's rem-happiness was the result of a discussion I didn't participate in.
> CC'ing some Gaia folks who may be able to comment.

I proposed the use of rem at some point during our flexible ui discussion.  (If a rem is 4px on a 320px device and 6px on a 480px device, then UIs can be designed to be 80rem wide and they should work well on both screen sizes.) But flexible ui was tabled for v1, and I never started using rems in my own Gallery app. I don't know anything about how they are being used anywhere else.
(In reply to David Flanagan [:djf] from comment #68)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #44)
> > Gaia's rem-happiness was the result of a discussion I didn't participate in.
> > CC'ing some Gaia folks who may be able to comment.
> 
> I proposed the use of rem at some point during our flexible ui discussion. 
> (If a rem is 4px on a 320px device and 6px on a 480px device, then UIs can
> be designed to be 80rem wide and they should work well on both screen
> sizes.) But flexible ui was tabled for v1, and I never started using rems in
> my own Gallery app. I don't know anything about how they are being used
> anywhere else.

Sounds quite reasonable to me.
Here's a profile from after these patches

http://people.mozilla.com/~bgirard/cleopatra/?report=1e9659bc6f25c64f4791b0282442d9d3f1b58bfe

We're spending ~42% of the time in ProcessRestyles(), but symbolicating is currently broken and we don't have enough sample stack to break that time down further.
Yeah, so now the one comprehensible part of the original profile isn't there anymore, and it's just the incomprehensible parts left.  I think we need a way to get more accurate data; the stack walking doesn't seem to be producing data good enough to do something with.

(I'm hoping that there wasn't something in the fixes that landed that made something else slower to compensate for what it fixed.)
So I'm starting to get increasingly suspicious of these profiles, both the one in comment 0 and the one in comment 73.

First, there's clearly a lot of caller/callee relationships that don't make sense.  That could be a problem with the stack walking or the symbol lookup (though it seems more likely to be the stack walking).  Depending on how much doesn't make sense (i.e., what portion of stacks are accurate), the profile may or may not still be useful.  I think the profile in comment 0 might still be usable; the profile in comment 73 is almost certainly not (perhaps because the symbol lookup is wrong too).

But, since these profiles seem to combine stack walking with explicit labels (the SAMPLE_LABEL macros) placed in the code, they have a strong sense of being useful, because those labels lend credibility.  But I'm actually not even sure the same labels were used for the two profiles.  In particular, the profile in comment 0 seems to have labels called nsStyleContext::{Position,Margin,...}::CalcDifference.  This isn't an actual function name (though it's close to one, nsStyle{Position,Margin,...}::CalcDifference), and I don't see the code for those labels in the tree.

The thing about these CalcDifference functions, though, is that the functions themselves are really quick functions; they do a bunch of comparisons but don't actually call into any other code other than some mostly-trivial equality operators.  So then the question is:  how much *overhead* is there from the labeling that's being done for this profile.  And is that overhead enough to distort the results for functions that take more time?
There's no stack walking during profiling.  The only context information shown is derived from the pseudo, sample, stack.

Pushing/popping sample labels is relatively cheap (atomic store), but it's not cheap enough to use in hot loops.  If that's the case here, we need to move the label.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #76)
> There's no stack walking during profiling.  The only context information
> shown is derived from the pseudo, sample, stack.

Ah, so it's just instruction pointer grouped by the stack of labels that it's inside?

I still think something's broken in the profile in comment 73.  (Things that (obviously to me) don't belong inside CSS::ProcessRestyles are:  nsFind::nsFind, nsFindContentIterator::Reset, nsCSSRendering::PaintBoxShadowOuter.  (nsFind::nsFind is also trivial.)

> Pushing/popping sample labels is relatively cheap (atomic store), but it's
> not cheap enough to use in hot loops.  If that's the case here, we need to
> move the label.

I think the labels that were used in the struct CalcDifference methods might be enough to distort the profile a bit, but those labels aren't in the tree, but are in the profile in comment 0.
(In reply to David Baron [:dbaron] from comment #77)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #76)
> > There's no stack walking during profiling.  The only context information
> > shown is derived from the pseudo, sample, stack.
> 
> Ah, so it's just instruction pointer grouped by the stack of labels that
> it's inside?
> 

Correct.

> I still think something's broken in the profile in comment 73.  (Things that
> (obviously to me) don't belong inside CSS::ProcessRestyles are: 
> nsFind::nsFind, nsFindContentIterator::Reset,
> nsCSSRendering::PaintBoxShadowOuter.  (nsFind::nsFind is also trivial.)
> 
> > Pushing/popping sample labels is relatively cheap (atomic store), but it's
> > not cheap enough to use in hot loops.  If that's the case here, we need to
> > move the label.
> 
> I think the labels that were used in the struct CalcDifference methods might
> be enough to distort the profile a bit, but those labels aren't in the tree,
> but are in the profile in comment 0.

Matt added some locally.  He can comment.
Yes, those we just in a local patch, not sure if I have it any more.

I believe I just added a label line to the DO_STRUCT_DIFFERENCE macro here: http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleContext.cpp#411

Can't really comment on the possibility of it distorting the profile, haven't looked into the profiler itself much.

Note that the profiler is sampling at 1ms iirc, it's possible that the results are fairly inaccurate based on that.
Note: moving this discussion to bug 780341, since the original problem here is fixed.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #80)
> Note: moving this discussion to bug 780341, since the original problem here
> is fixed.

Though I haven't actually demonstrated that the work here improved anything.  (It ought to have, though, but it would be nice to have a test that shows it did.)
Depends on: 828312
You need to log in before you can comment on or make changes to this bug.