[AWSY] Double counting of totals_heap in path_total

RESOLVED FIXED in Firefox 56

Status

Testing
AWSY
RESOLVED FIXED
4 months ago
a month ago

People

(Reporter: mccr8, Assigned: erahm)

Tracking

Version 3
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

4 months ago
for report in data["reports"]:
        # This totals up all of the heap allocated memory under explicit. Ok!
        if report["kind"] == 1 and report["path"].startswith("explicit/"):
            totals_heap[report["process"]] += report["amount"]

        if report["path"].startswith(path):
            ...
            # But, if path == 'explicit', then now we're adding
            # in amount a second time!
            if report["kind"] == 1:
                totals_heap[report["process"]] += report["amount"]
        ...

Later, the explicit/ case computes heap_unclassified as totals_heap_allocated - totals_heap, but in a local run, heap_unclassified is negative, presumably due to the double counting. I think the fix is just to delete that second accumulation into totals_heap. I don't know why we'd ever need it.

We only ever use totals_heap if the path is explicit/ or explicit/heap-unclassified. The latter doesn't seem to be included in the report, so we'll only use it in the former case, when it seems to be bogus.
(Reporter)

Comment 1

4 months ago
An assert that heap_unclassified is non-negative would have caught this. ;)
(Reporter)

Updated

4 months ago
Assignee: nobody → continuation
(Reporter)

Comment 2

4 months ago
Created attachment 8877265 [details] [diff] [review]
Only count heap memory once in totals_heap.

This passes local tests for me. With just the assertion, but not the removal, it fails. This is on top of your other patch, so I'll wait until that lands.
Attachment #8877265 - Flags: review?(erahm)
(In reply to Andrew McCreight [:mccr8] from comment #2)
> Created attachment 8877265 [details] [diff] [review]
> Only count heap memory once in totals_heap.
> 
> This passes local tests for me. With just the assertion, but not the
> removal, it fails. This is on top of your other patch, so I'll wait until
> that lands.

Oh sorry, I have a slightly more elaborate patch that makes that function less awful. I'll incorporate your assertion as well.
Created attachment 8877290 [details] [diff] [review]
Part 2: Clean up path_total

This attempts to make the path_total function more legible.

MozReview-Commit-ID: 4quRaQ4DV9j
Attachment #8877290 - Flags: review?(continuation)
Comment on attachment 8877265 [details] [diff] [review]
Only count heap memory once in totals_heap.

Review of attachment 8877265 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I have a follow-up that cleans up that function as well.
Attachment #8877265 - Flags: review?(erahm) → review+
(Reporter)

Comment 6

4 months ago
Comment on attachment 8877290 [details] [diff] [review]
Part 2: Clean up path_total

Review of attachment 8877290 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/awsy/awsy/parse_about_memory.py
@@ +37,5 @@
> +            return value == path
> +        else:
> +            return value.startswith(path)
> +
> +    def add_to_heap(report):

Comments are good, but I don't know how useful it is to split this and the next thing into tiny functions that are only called once.

@@ +56,4 @@
>  
> +    def heap_unclassified(process):
> +        """
> +        Calculates the heap-unclassifed value for the given process. This is

nit: unclassified.

@@ +68,5 @@
> +        unclassified = heap_allocated[process] - explicit_heap[process]
> +
> +        # Make sure the value is sane. A misbehaving reporter could lead to
> +        # negative values.
> +        assert unclassified >= 0

Hmm, yeah I forgot that bugs in memory reporters could cause this to fail. I guess causing orange for that isn't the worst thing.

@@ +78,2 @@
>      for report in data["reports"]:
> +        # Update the heap bookkeeping.

It might be nice to compute before the loop whether we even need this special processing, and not run these two things if we don't. It would make it a little clearer that we're not using it in most cases.
Attachment #8877290 - Flags: review?(continuation) → review+
Created attachment 8877771 [details] [diff] [review]
Part 2: Clean up path_total

Small tweaks from code review.
Attachment #8877771 - Flags: review?(continuation)
Attachment #8877290 - Attachment is obsolete: true
Assignee: continuation → erahm
Status: NEW → ASSIGNED
Looping in jmaher, just a heads up this is going to cause what seems like large regressions in the "explicit" AWSY measurement. It's just us fixing the calculation.
(Reporter)

Comment 9

4 months ago
Comment on attachment 8877771 [details] [diff] [review]
Part 2: Clean up path_total

Review of attachment 8877771 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/awsy/awsy/parse_about_memory.py
@@ +11,5 @@
>  from collections import defaultdict
>  import gzip
>  import json
>  
> +KIND_HEAP = 1

Please add a comment that says that this value comes from nsIMemoryReporter.idl.
Attachment #8877771 - Flags: review?(continuation) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a39710ce0565f7ea27bb5e256c113723774455a3
Bug 1372667 - Only count heap memory once in totals_heap. r=erahm

https://hg.mozilla.org/integration/mozilla-inbound/rev/d2217f433f94c07cd3ff5973e1999f352305acd9
Bug 1372667 - Part 2: Clean up path_total. r=mccr8
(Reporter)

Updated

4 months ago
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
https://hg.mozilla.org/mozilla-central/rev/a39710ce0565
https://hg.mozilla.org/mozilla-central/rev/d2217f433f94
Status: ASSIGNED → RESOLVED
Last Resolved: 4 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
some AWSY regressions and improvements:
== Change summary for alert #7326 (as of June 15 2017 03:00 UTC) ==

Regressions:

 95%  Explicit Memory summary linux64 opt      155,409,474.55 -> 303,305,627.07
 93%  Explicit Memory summary windows7-32-vm opt 120,162,997.27 -> 232,370,273.57
 89%  Explicit Memory summary windows10-64-vm opt 154,376,219.50 -> 291,609,015.68

Improvements:

 53%  Resident Memory summary linux64 opt      975,752,396.48 -> 456,641,808.56
 34%  JS summary linux64 opt                   172,822,921.10 -> 113,915,479.43
 34%  JS summary windows10-64-vm opt           179,022,387.62 -> 118,757,080.54
 32%  JS summary windows7-32-vm opt            129,098,763.51 -> 87,922,099.41
 30%  Resident Memory summary windows7-32-vm opt 397,155,034.25 -> 279,333,769.59
 29%  Resident Memory summary windows10-64-vm opt 642,938,075.92 -> 454,042,792.22

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7326

this looks expected, but want to double check.

:erahm, is this what was expected?
Flags: needinfo?(erahm)
(In reply to Joel Maher ( :jmaher) from comment #12)
> some AWSY regressions and improvements:
> == Change summary for alert #7326 (as of June 15 2017 03:00 UTC) ==
> 
> Regressions:
> 
>  95%  Explicit Memory summary linux64 opt      155,409,474.55 ->
> 303,305,627.07
>  93%  Explicit Memory summary windows7-32-vm opt 120,162,997.27 ->
> 232,370,273.57
>  89%  Explicit Memory summary windows10-64-vm opt 154,376,219.50 ->
> 291,609,015.68
> 
> Improvements:
> 
>  53%  Resident Memory summary linux64 opt      975,752,396.48 ->
> 456,641,808.56
>  34%  JS summary linux64 opt                   172,822,921.10 ->
> 113,915,479.43
>  34%  JS summary windows10-64-vm opt           179,022,387.62 ->
> 118,757,080.54
>  32%  JS summary windows7-32-vm opt            129,098,763.51 ->
> 87,922,099.41
>  30%  Resident Memory summary windows7-32-vm opt 397,155,034.25 ->
> 279,333,769.59
>  29%  Resident Memory summary windows10-64-vm opt 642,938,075.92 ->
> 454,042,792.22
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=7326
> 
> this looks expected, but want to double check.
> 
> :erahm, is this what was expected?

Yeah that fits, we expected 'explicit' to increase and 'resident' and 'JS' to decrease.
Flags: needinfo?(erahm)

Updated

a month ago
Component: General → AWSY
You need to log in before you can comment on or make changes to this bug.