Closed Bug 1147143 Opened 9 years ago Closed 9 years ago

Inverted call trees self cost/time don't make sense

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect, P3)

37 Branch
x86
macOS
defect

Tracking

(firefox40 fixed, firefox41 fixed)

RESOLVED FIXED
Firefox 41
Tracking Status
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: jsantell, Assigned: jsantell)

References

Details

Attachments

(4 files, 4 obsolete files)

In the inverted call tree, we show leaf nodes, where usually most of the work occurs, up top. When digging down into it's children nodes, all of their self cost and time are equal to the parent (leaf node) cost/time, when they should be much smaller.

This probably isn't as simple as moving over the data from the non-inverted tree, as these children frame nodes could be children in several other frame nodes.

See attachment.
Attached image chrome inverted tree
On Chrome, it seems slightly better, but doesn't quite make sense to me either. I'd imagine all the childrens' self cost in a branch to add up to the total cost of the parent node there.
The self cost for a given function should always be calculated based on the regular call tree - which should be calculated as:

self_cost(frame) = total_cost(frame) -  sum(total_cost(child) for child in immediate_children(frame))

Also, from profiling, the inverted call tree should use self-cost to sort rather than the total-cost.  Given that we are looking at leaf frames, we really care about the frames where we spend the most time directly in, as opposed to the most time indirectly in.
(In reply to Kannan Vijayan [:djvj] from comment #2)
> The self cost for a given function should always be calculated based on the
> regular call tree - which should be calculated as:
> 
> self_cost(frame) = total_cost(frame) -  sum(total_cost(child) for child in
> immediate_children(frame))
> 
> Also, from profiling, the inverted call tree should use self-cost to sort
> rather than the total-cost.  Given that we are looking at leaf frames, we
> really care about the frames where we spend the most time directly in, as
> opposed to the most time indirectly in.

Alternatively, to simplify and avoid requiring creating the regular tree when we are inverted by default:

  self_cost(frame) = count(leaf(s) == frame for s in samples) / count(samples)

Done before any reversing of frames or anything for inverting.
P3
Assignee: nobody → jsantell
Status: NEW → ASSIGNED
Priority: -- → P3
Been playing around with OS X Instruments — it’s doing this the right way (all IMO obviously throughout this comment). To make things easier, it just has a "total" cost/time, and a sample count (only for frames that are leaf nodes!), which is nice (although "sample count" isn't labeled and maybe I'm guessing here). Anyways, their top-down tree is pretty much what we and chrome have. Straight forward. For inversion, they 

Take the following 3 samples in this case:

A->C
A->B
A->B->C
A->B->D

+--total--+--self--+--tree-------------+
|  100%   |   0    |  A
|   75%   |   25%  |  -> B
|   25%   |   25%  |     -> C
|   25%   |   25%  |     -> D
|   25%   |   25%  |  -> C
+---------+--------+-------------------+

I think we're all in agreement that this is the Right Way to do non-inverted trees. Now for inverted.

What we have currently; an inverted path from leaf to root: http://i.imgur.com/Hh7QqmJ.png
We have some errant values for total cost (9.54% thrown in there?) and the self costs surpass the total cost times. Guessing the RunNextBenchmark frame is getting samples from another "branch", causing this one to be weird.

What chrome has currently; an inverted path from leaf to root: http://i.imgur.com/su9zUME.png
self costs of the entire tree add up to well over 100% -- and they just seem to match total costs. These make absolutely no sense to me.

Anyways, what Instruments has is something like this: http://i.imgur.com/muO5tqA.png
I think we do something close, but something somewhere goes wrong. This is what Instruments does with the above samples inverted:

+--total--+--self--+--tree-------------+
|   50%   |   50%  |  C
|   25%   |   0    |  -> B
|   25%   |   0    |     -> A
|   25%   |   0    |  -> A

|   25%   |   25%  |  B
|   25%   |   0    |  -> A

|   25%   |   25%  |  D
|   25%   |   0    |  -> B
|   25%   |   0    |     -> A

Since they only count "samples" when it's a leaf node, it cleans up our self costs a lot. The Total costs are pretty much the self cost of a leaf node, and then divided by frequency of path as you approach the (root) node.

So in the first inverted branch of the "C" frame, the self cost is 50% -- quarter of the time we had A->C, and another quarter of the time we had A->B->C. So we add up the self cost, and since half of the time C was a leaf node had the stack A->B->C, these frames have 25% total cost, and same with the A->C stacks.

Unless there are objections, I think this is The Right Way to do this. On actually doing this and coming up with the values in a performant way? Less familiar with that.
As an aside, chrome has self/total and we have total/self -- we should make these in the same order, although I like Instruments' "total time/cost" and sample count, only counting leaf nodes. If this makes sense in any discussion here, lets open a new bug for that.
Assignee: jsantell → nobody
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #6)
> As an aside, chrome has self/total and we have total/self -- we should make
> these in the same order,

Agreed. Should show self first because its more important than total. Although, I guess you could make the argument that total is more important when not inverted and self is more important when inverted. We could flip the order depending on inversion, but I don't know if that's too unexpected UX-wise.
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #5)
Sounds good to me.
Some updates: I was wrong about Chrome's tree. It's total times inverted are correct (however, self-times do not make sense -- these should indeed add up to 100%).

Going back on the change from bug 1147604, where we add the (root) node to inverted trees to differentiate between (root)->B and (root)->A->B, as this way its implicit whether or not any time was spent calling B directly as opposed from A.

Attached is the new rendering (except we'll keep samples count how are they now, but would like to change that in a follow up, just "self samples" maybe, or have both?
Assignee: nobody → jsantell
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attached patch 1147143-inverted-trees.patch (obsolete) — Splinter Review
Still some issues to work out, and in real-world scenarios (octane), I'm not getting the results I'm expecting, but this is the general idea -- capturing and tallying stacks for each leaf frame node.

Arg, can't f? shu on here, reminder to myself to ping him on Monday
Attachment #8609778 - Flags: feedback?(vporof)
Attachment #8609778 - Flags: feedback?(nfitzgerald)
Comment on attachment 8609778 [details] [diff] [review]
1147143-inverted-trees.patch

Review of attachment 8609778 [details] [diff] [review]:
-----------------------------------------------------------------

This problem is simpler than you think. You don't need this uniq'ing array business at all, and already have all the data you need in the FrameNodes. This is my understanding of what you want to do (and I agree, it's sensible):

Given a tree T denoting an inverted call tree. For uniformity we assume that T is a tree and not a forest by synthesizing a root node R that is the ancestor of all nodes.

For each node in T, let's define the following:

  Let |count(f)| =
    when f != R, the number of times the frame f was encountered in the sample.
    otherwise, sum of count(f') for each direct child f' of R.

The total cost %:

  |totalcost(f)| =
     when f != R, count(f) / count(parent of f) * totalcost(parent of f)
     otherwise, 100

The total cost is always a function on the call tree structure you're computing, inverted or otherwise. The self cost in the inverted tree is the totalcost(f) if f is a root in the inverted tree, and 0 otherwise.


So, all that needs be done is to keep count of the frames as the call tree is constructed. And we already are! Each FrameNode already has a .samples property. The total cost computation can be done using that.

This computation probably should be done in tree-view and not in tree-model, but either place is fine.

::: browser/devtools/performance/modules/logic/tree-model.js
@@ +228,5 @@
> +        // will make it clear to differentiate (root)->B vs (root)->A->B
> +        // when a tree is inverted, a revert of bug 1147604
> +        if (invertTree && mutableFrameKeyOptions.isRoot) {
> +          continue;
> +        }

No need for this, you can change the loop to something like:

while (true) {
  ...
  let frameIndex = stackEntry[STACK_FRAME_SLOT];
  if (frameIndex === null) {
    break;
  }
  ...
}

Then get rid of that if (!invertTree) line in details-js-call-tree.js

@@ +239,4 @@
>          let frameKey = inflatedFrame.getFrameKey(mutableFrameKeyOptions);
>  
>          // Leaf frames are never skipped and require self count and duration
>          // bookkeeping.

You can get rid of the selfCount and selfDuration business if we're about to do something reasonable now.
Attachment #8609778 - Flags: feedback?(vporof)
Attachment #8609778 - Flags: feedback?(nfitzgerald)
Attachment #8609778 - Flags: feedback-
(In reply to Shu-yu Guo [:shu] from comment #11)
> Comment on attachment 8609778 [details] [diff] [review]
> 1147143-inverted-trees.patch
> 
> Review of attachment 8609778 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This problem is simpler than you think. You don't need this uniq'ing array
> business at all, and already have all the data you need in the FrameNodes.
> This is my understanding of what you want to do (and I agree, it's sensible):
> 
> Given a tree T denoting an inverted call tree. For uniformity we assume that
> T is a tree and not a forest by synthesizing a root node R that is the
> ancestor of all nodes.
> 
> For each node in T, let's define the following:
> 
>   Let |count(f)| =
>     when f != R, the number of times the frame f was encountered in the
> sample.
>     otherwise, sum of count(f') for each direct child f' of R.
> 
> The total cost %:
> 
>   |totalcost(f)| =
>      when f != R, count(f) / count(parent of f) * totalcost(parent of f)
>      otherwise, 100
> 
> The total cost is always a function on the call tree structure you're
> computing, inverted or otherwise. The self cost in the inverted tree is the
> totalcost(f) if f is a root in the inverted tree, and 0 otherwise.

This is incorrect, if I undestand it correctly. Take the following tree:

  /**
   * Samples
   *
   * A->C
   * A->B
   * A->B->C x4
   * A->B->D x4
   *
   * Expected Tree
   * +--total--+--self--+--tree-------------+
   * |   50%   |   50%  |  C
   * |   40%   |   0    |  -> B*
   * |   40%   |   0    |     -> A
   * |   10%   |   0    |  -> A
   *
   * |   10%   |   10%  |  B
   * |   10%   |   0    |  -> A
   *
   * |   40%   |   40%  |  D
   * |   40%   |   0    |  -> B
   * |   40%   |   0    |     -> A
   */

I want the total cost of B*.
|totalcost(f)| = count(f) / count(parent of f) * totalcost(parent of f)
|totalcost(B)| = count(B) / count(C) * totalcost(C);
.5 = 5 / 5 * .5;

This strategy does not differentiate when B is the leaf node, in the event of A()->B(). In this case, "Total Cost" is *not* the time of all of this function and ALL of it's callees -- only callee's that share the same path as this.

> 
> You can get rid of the selfCount and selfDuration business if we're about to
> do something reasonable now.

Probably still need duration info for flame charts, but should probably get rid of it otherwise just using totalSamples/totalDuration
That's not a counter example. B* and B are different FrameNodes in the call tree already. B*'s FrameNode has a sample count it 4, it seems fine to me.
I thought there was a one to one relationship with framenodes and function calls as the location is the key (in content frames)?
In this case B and B* are the same functions, just one is found as a leaf and the other in C's stack
the (In reply to Jordan Santell [:jsantell] [@jsantell] from comment #14)
> I thought there was a one to one relationship with framenodes and function
> calls as the location is the key (in content frames)?

a FrameNode is just a node in the call tree. It's true that they're keyed by function name, but it's not a key into a flat hash table. In this case there are 2 FrameNodes with the B key, with different counts, since there are 2 different stacks.
Oh OK then. If the key to the hash table is unique then this should  work
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #17)
> Oh OK then. If the key to the hash table is unique then this should  work

There is no hash table, it's a tree.
Thanks, Shu -- this is way easier and understandable. I think this is now accurate. Some durations don't quite match up, but we should convert to use samples for this rather than derive it from the sample timing like you mentioned, but should handle that in another bug.

In this view, also, we'll need to change the tooltip for total time/cost, as it should be inverse of what it is (callee vs caller). Another bug.
Attachment #8609775 - Attachment is obsolete: true
Attached patch 1147143-calltree.patch (obsolete) — Splinter Review
This is way more simple.
Attachment #8609778 - Attachment is obsolete: true
Attachment #8609795 - Flags: review?(shu)
Attachment #8609795 - Flags: review?(nfitzgerald)
Comment on attachment 8609795 [details] [diff] [review]
1147143-calltree.patch

Review of attachment 8609795 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, I guess. I haven't looked at this code since shu rewrote it, so I'm not sure what's really going on here.

FWIW, the tree in the screenshot looks super.
Attachment #8609795 - Flags: review?(nfitzgerald) → feedback+
No longer blocks: perf-40-uplifts
Comment on attachment 8609795 [details] [diff] [review]
1147143-calltree.patch

Review of attachment 8609795 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry about churn, I think this should be simplified even further.

::: browser/devtools/performance/modules/logic/tree-model.js
@@ +220,5 @@
>  
> +        // Do not include the (root) node in this sample, as the costs of each frame
> +        // will make it clear to differentiate (root)->B vs (root)->A->B
> +        // when a tree is inverted, a revert of bug 1147604
> +        if (stackIndex === null && inverted) {

You don't need to check for |inverted| here. Instead, you can remove this:

https://dxr.mozilla.org/mozilla-central/source/browser/devtools/performance/views/details-js-call-tree.js?from=details-js-call-tree.js&case=true#83-89

@@ +269,5 @@
>          isLeaf = mutableFrameKeyOptions.isLeaf = false;
> +
> +        if (mutableFrameKeyOptions.isRoot) {
> +          break;
> +        }

This is dead, right?

@@ +511,5 @@
>      if (!this._optimizations) {
>        return null;
>      }
>      return new JITOptimizations(this._optimizations, this._stringTable);
> +  },

Is this official style or something, to have trailing commas? Kinda weird IMO.

::: browser/devtools/performance/modules/widgets/tree-view.js
@@ +346,5 @@
>  
> +    /**
> +     * When inverting call tree, the costs and times are dependent on position
> +     * in the tree. We must only count leaf nodes with self cost, and total costs
> +     * dependent on how many times the leaf node was found with a full stack path.

ISTM the costs depend on the position of the node in the tree, inverted or otherwise. The only logic that differs between inverted and non-inverted is which nodes should have non-0 self costs. The inverted tree conceptually has multiple roots, but that's really a surface difference: you can always synthesize a fake root for inverted trees.

If this logic is unified, we can also remove the selfDuration and selfCount objects. Does that make sense?

@@ +378,5 @@
> +      }
> +
> +      // Self/total samples percentage.
> +      if (this.visibleCells.percentage) {
> +        data.totalPercentage = this.frame.samples / this.root.frame.samples * 100;

Isn't this supposed to be totalPercentage * 100? Why is totalPercentage only used to compute totalDuration? Am I missing something?
Attachment #8609795 - Flags: review?(shu)
Comment on attachment 8609795 [details] [diff] [review]
1147143-calltree.patch

Review of attachment 8609795 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/devtools/performance/modules/logic/tree-model.js
@@ +220,5 @@
>  
> +        // Do not include the (root) node in this sample, as the costs of each frame
> +        // will make it clear to differentiate (root)->B vs (root)->A->B
> +        // when a tree is inverted, a revert of bug 1147604
> +        if (stackIndex === null && inverted) {

that is true, and i've tried it, but requires changing all the tests to remove that, and has some issues with meta content nodes in non-inverted trees.

@@ +269,5 @@
>          isLeaf = mutableFrameKeyOptions.isLeaf = false;
> +
> +        if (mutableFrameKeyOptions.isRoot) {
> +          break;
> +        }

In non-inverted views, no, we have to break the loop at some point

@@ +511,5 @@
>      if (!this._optimizations) {
>        return null;
>      }
>      return new JITOptimizations(this._optimizations, this._stringTable);
> +  },

No real style one way or another, but when adding or moving methods, don't have to consider if this is last in a property list or not

::: browser/devtools/performance/modules/widgets/tree-view.js
@@ +346,5 @@
>  
> +    /**
> +     * When inverting call tree, the costs and times are dependent on position
> +     * in the tree. We must only count leaf nodes with self cost, and total costs
> +     * dependent on how many times the leaf node was found with a full stack path.

Sort of. Removing those things can happen when we no longer use both duration and samples to calculate these values, giving us slightly wrong numbers when comparing percent to time, like

100%   1000ms
50%     490ms

@@ +378,5 @@
> +      }
> +
> +      // Self/total samples percentage.
> +      if (this.visibleCells.percentage) {
> +        data.totalPercentage = this.frame.samples / this.root.frame.samples * 100;

`totalPercentage` is % of time this frame results in the leaf frame. So in the illustration on line 354, The first B and A both have 0.5 totalPercentage (this just happens to equal the 50% displayed because C is 100%). We can just use `totalPercentage * parentFrame.totalPercentage` I think
Attached patch 1147143-calltree.patch (obsolete) — Splinter Review
This just has the changed totalDuration calculation based on the samples/totalPercentage.

If we want to remove the calls[0].calls massaging that occurs in non-inverted trees (and agree, we should), we should handle that in another bug -- there are some meta content issues when just doing that in tests and the view.

W/R/T logic that differs between inverted/non-inverted, i'm not sure other than the non-0 costs and unifying the logic. In non-inverted we don't specify different "paths" to get to a leaf node; it's all aggregated.

Either way, I don't think it should be handled here, but this gets us closer to having tests on what we should have, because right now our inverted trees do not make any sense
Attachment #8609795 - Attachment is obsolete: true
Attachment #8611596 - Flags: review?(shu)
Comment on attachment 8611596 [details] [diff] [review]
1147143-calltree.patch

Review of attachment 8611596 [details] [diff] [review]:
-----------------------------------------------------------------

Went over stuff in detail on IRC.
Attachment #8611596 - Flags: review?(shu) → review+
Comment on attachment 8611596 [details] [diff] [review]
1147143-calltree.patch

Review of attachment 8611596 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/devtools/performance/modules/widgets/tree-view.js
@@ +352,5 @@
> +     *   50%   |   0%  |    50 |     ▼ A
> +     *   50%   |   0%  |    50 |   ▼ B
> +     *
> +     * Every instance of a `CallView` represents a row in the call tree. The same
> +     * parent node is used for all rows.

Nit: rename "parent" with "container", to avoid potential confusion about what "parent" means. Every tree item has a proper parent item, but every tree dom node has a single container dom node, to make things nice and easy to style & query.
Made the mentioned comment changes, use the word "container" and "callee/caller" where appropriate to differentiate between vague 'parent'

 https://treeherder.mozilla.org/#/jobs?repo=try&revision=704b719d21d7
Attachment #8611596 - Attachment is obsolete: true
Attachment #8612084 - Flags: review+
Adding this to uplifts if possible, there'll be some docs on inverted call trees for release
Flags: needinfo?(bgrinstead)
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #31)
> Adding this to uplifts if possible, there'll be some docs on inverted call
> trees for release

Seems like there have been a lot of eyes on it, so if you think it's safe to include I'll track it.  There's always the option of letting things bake for a few days and then getting them into aurora later next week, but sounds like this is going to be documented so we probably want it in from the start.
Flags: needinfo?(bgrinstead)
FWIW, this code is isolated and none of the other patches since May 12th or so touch this area, so should have 0 problems with that
https://hg.mozilla.org/mozilla-central/rev/55adbc962e51
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 41
Comment on attachment 8612084 [details] [diff] [review]
1147143-calltree.patch

Approval Request Comment
[Feature/regressing bug #]: 1167252, the new performance tool
[User impact if declined]: The inverted call tree costs in the performance tool will be wrong.  This feature is being documented for the release, so this needs to be fixed ASAP
[Describe test coverage new/current, TreeHerder]: Includes new test.  There are try pushes in Bug 1167252
[Risks and why]: Risk is contained within the cost calculation of the JS call tree within devtools performance panel.
[String/UUID change made/needed]: None
Attachment #8612084 - Flags: approval-mozilla-aurora?
Attachment #8612084 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Setting [qe-verify-] since this is already covered by automated testing.
Flags: qe-verify-
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: