The default bug view has changed. See this FAQ.

Memory usage reports via telemetry

RESOLVED FIXED in mozilla7

Status

()

Core
General
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

unspecified
mozilla7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P1][inbound])

Attachments

(4 attachments, 2 obsolete attachments)

(Assignee)

Description

6 years ago
Now that telemetry is working, it'd be great to get some kind of data gathering and analysis relating to memory usage.
(Assignee)

Updated

6 years ago
Blocks: 585196

Comment 1

6 years ago
We have *some* kind of reporting already. http://people.mozilla.com/~tglek/telemetry/log.html lists aggregate info on some of the about:memory stuff
(Assignee)

Comment 2

6 years ago
Great!  So this bug is a matter of working out what more we can do with that data, if we can automate any kind of tracking, etc.

Comment 3

6 years ago
(In reply to comment #2)
> Great!  So this bug is a matter of working out what more we can do with that
> data, if we can automate any kind of tracking, etc.

Well, I put those probes in as an example. I would be happy if you could suggest refinements to that or a better approach altogether.
(Assignee)

Updated

6 years ago
Blocks: 640791
Here's the list of currently-tracked memory reporters, from TelemtryPing.js:

  "explicit/js/gc-heap": [1024, 512 * 1024, 10],
  "resident": [32 * 1024, 1024 * 1024, 10],
  "explicit/layout/all": [1024, 64 * 1025, 10],
(Assignee)

Comment 5

6 years ago
We should do "explicit" as well, that's arguably the single most interesting number.  That'll require bug 660731.
Assignee: nobody → nnethercote
Depends on: 660731
(Assignee)

Comment 6

6 years ago
http://blog.mozilla.com/tglek/2011/06/22/developers-how-to-submit-telemetry-data/ is useful info for this bug.
(Assignee)

Comment 7

6 years ago
explicit/js/gc-heap-chunk-unused (from bug 661474) would also be interesting, for an indication of fragmentation on the JS heap.

Comment 8

6 years ago
(In reply to comment #7)
> explicit/js/gc-heap-chunk-unused (from bug 661474) would also be
> interesting, for an indication of fragmentation on the JS heap.

is there a reason you didn't include a telemetry probe in that bug?
(Assignee)

Comment 9

6 years ago
(In reply to comment #8)
> 
> is there a reason you didn't include a telemetry probe in that bug?

I am planning to do all the new telemetry additions in a single patch here :)
(Assignee)

Comment 10

6 years ago
Created attachment 543339 [details] [diff] [review]
patch 1

So the memory reporting in telemetry is completely busted.  This first patch fixes it.  Is this stuff tested anywhere?
Attachment #543339 - Flags: review?(tglek)
(Assignee)

Comment 11

6 years ago
Created attachment 543340 [details] [diff] [review]
patch 2

This patch: 
- Adds three extra memory measurements to telemetry.
- Removes |memReporters| from |gatherMemory|, because it is dead.

It relies on mgr.explicit which is added by the patch in bug 660731.
Attachment #543340 - Flags: review?(tglek)
(Assignee)

Comment 12

6 years ago
BTW, I didn't do explicit/js/gc-heap-chunk-unused because that's computed via a multi-reporter (bug 666075) and there's currently no way to extract a single number from a multi-reporter.

Updated

6 years ago
Attachment #543339 - Flags: review?(tglek) → review+

Updated

6 years ago
Attachment #543340 - Flags: review?(tglek) → review+

Comment 13

6 years ago
(In reply to comment #10)
> Created attachment 543339 [details] [diff] [review] [review]
> patch 1
> 
> So the memory reporting in telemetry is completely busted.  This first patch
> fixes it.  Is this stuff tested anywhere?

Not yet. I'm not sure of the best way to test it. I was thinking of adding a NS_WARNING to the for loop if every memory reporter wasn't found (or if some of them are reporting 0, which seems crazy). Turns out there isn't an NS_WARNING equivalent from JS.
(Assignee)

Comment 14

6 years ago
(In reply to comment #13)
> Turns out there isn't an NS_WARNING equivalent from JS.

JS_ASSERT!  (Yes, I'm serious;  if this is busted I want to know, not have a warning that gets lost in the 100s of other warnings :)

Comment 15

6 years ago
(In reply to comment #14)
> (In reply to comment #13)
> > Turns out there isn't an NS_WARNING equivalent from JS.
> 
> JS_ASSERT!  (Yes, I'm serious;  if this is busted I want to know, not have a
> warning that gets lost in the 100s of other warnings :)

I meant from .js

Comment 16

6 years ago
Also while we do this, can we check if val != 0 before adding it to a histogram, I don't know why MEMORY_RESIDENT would be 0 so often.

Comment 17

6 years ago
Created attachment 543528 [details] [diff] [review]
enforce mem reporters

The above patches use NS_ASSERT, which isn't available in this module(and is apparently dangerous too as it shows up as a modal dialog).

Turns out there is no good way to assert(that I can see) from js.
Attachment #543528 - Flags: review?(nnethercote)

Comment 18

6 years ago
Created attachment 543529 [details] [diff] [review]
patch 1(for checkin)

I don't want to take the risk of branching for aurora with broken memory reporters. Lets checkin the fix right away.
Attachment #543339 - Attachment is obsolete: true
Attachment #543529 - Flags: review?(justin.lebar+bug)
Comment on attachment 543529 [details] [diff] [review]
patch 1(for checkin)

>+      if (val)
>+        h.add(val);

Why are we excluding zero values?  It's perfectly valid for the number of hard page faults since the last sample to be 0, for instance.
(In reply to comment #19)
> Why are we excluding zero values?  It's perfectly valid for the number of
> hard page faults since the last sample to be 0, for instance.

Taras and I had a long argument about this on IRC.

He thinks that 0 values aren't interesting, because they're a non-event.  In the unlikely scenario that you want to compute the proportion of time spent not swapping versus swapping, you can infer the number of 0s that are missing by looking at some other memory reporter.  Also, since presumably the 0s bucket will be much larger than the other buckets, including it will make the histogram ugly.

I think the histogram makes no sense without the 0 values.  You *always* want to compare the number of swap events to the number of non-swap events.  Otherwise you're just learning a number which is strongly correlated with how long the browser has been running.  If the histogram can't display this well, then it's bugged.

Anyway, Taras needs this hack in there because some other reporters aren't working properly -- RSS reports 0 about half the time, for reasons which are unclear to me.  We agreed that the patch gets r=me if this check is called out as a hack and we commit to fix it in the next cycle.
Attachment #543529 - Flags: review?(justin.lebar+bug) → review+

Comment 21

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/e5de7146ac19

please don't close this bug on mc merge :)
> You *always* want to compare the number of swap events to the number of non-swap events.

For instance, I might want to know: "Is FF becoming less swappy for users?"  I'd compute this by comparing the number of events below a threshold (say less than 1000 faults / min) to those above a threshold.  The 0s are crucial here.

(This further shows why it doesn't make sense to exclude 0s: A value of 0 faults / min isn't a lot different from 2 faults / min.  Yet only one makes it into the histogram?)
Also, WRT the point that you can infer the number of 0s by examining another memory reporter: That only works if the memory reporters aren't randomly returning 0.  :)
(Assignee)

Comment 24

6 years ago
Comment on attachment 543528 [details] [diff] [review]
enforce mem reporters

Review of attachment 543528 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #543528 - Flags: review?(nnethercote) → review+
merged comment 21
http://hg.mozilla.org/mozilla-central/rev/e5de7146ac19
Target Milestone: --- → flash10

Updated

6 years ago
Target Milestone: flash10 → ---
Created attachment 543613 [details] [diff] [review]
UNITS_COUNT fix.

I tested this and didn't see 0s coming from anywhere except hard-page-faults, where it's expected.

If RSS is still reporting 0 for some unknown reason, I don't think we should hide it, as we currently do.  Since it's possible that the 0s are reported in place of certain values (as we don't know where they were coming from), taking out the 0s entirely kind of invalidates the RSS measure.

If you really feel strongly that 0 hard page faults in an interval shouldn't be reported, then let's exclude 0s for UNITS_COUNT, but not for other units.  Although I still feel like that's entirely the wrong thing to do.  Maybe njn can referee...
Attachment #543613 - Attachment description: UNITS_COUNTS fix. → UNITS_COUNT fix.
Comment on attachment 543613 [details] [diff] [review]
UNITS_COUNT fix.

Taras, do you think we can get this in before we branch?
Attachment #543613 - Flags: review?(tglek)

Comment 28

6 years ago
Comment on attachment 543613 [details] [diff] [review]
UNITS_COUNT fix.

>-      // hack to deal with some memory reporters returning 0
>-      if (val)
>-        h.add(val);
>+      h.add(val);
This is still needed for UNITS_BYTES.
Attachment #543613 - Flags: review?(tglek) → review-
Comment on attachment 543613 [details] [diff] [review]
UNITS_COUNT fix.

Maybe njn can arbitrate; would you be OK with that, Taras?
Attachment #543613 - Flags: feedback?(nnethercote)
(Assignee)

Comment 30

6 years ago
I'd like to include zero counts for byte measurements.  We should never get zero for them, AFAIK -- memory reporters return -1 if they fail -- so if we do that is interesting, and probably a sign something is buggy.
I think we all agree with including the zeros for byte measurements.  What about for counts?
(Assignee)

Comment 32

6 years ago
Comment on attachment 543613 [details] [diff] [review]
UNITS_COUNT fix.

Review of attachment 543613 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #543613 - Flags: feedback?(nnethercote) → feedback+

Comment 33

6 years ago
(In reply to comment #30)
> I'd like to include zero counts for byte measurements.  We should never get
> zero for them, AFAIK -- memory reporters return -1 if they fail -- so if we
> do that is interesting, and probably a sign something is buggy.

Ok, so r+ with a fix that checks for -1 and does a continue.

Math.round(-1/1024)==0 :)
(Assignee)

Comment 34

6 years ago
(In reply to comment #31)
> I think we all agree with including the zeros for byte measurements.  What
> about for counts?

I don't think we should throw away any valid measurements, ie. measurements where nothing went wrong when taking the measurement.

Do we need to check for -1 for page fault counts?
> Do we need to check for -1 for page fault counts?

Yes, they return -1 on failure.
Created attachment 543627 [details] [diff] [review]
Fix for UNITS_COUNT v2
Attachment #543627 - Flags: review?(tglek)
Attachment #543613 - Attachment is obsolete: true

Comment 37

6 years ago
Comment on attachment 543627 [details] [diff] [review]
Fix for UNITS_COUNT v2

Thanks, happy to see a proper fix for this.
Please triplecheck that this works as expected on your machine.
Attachment #543627 - Flags: review?(tglek) → review+
Checked one last time that it works on my machine, and pushed to inbound.

http://hg.mozilla.org/integration/mozilla-inbound/rev/016fcb800fc2
Whiteboard: [MemShrink:P1] → [MemShrink:P1][inbound]
(Assignee)

Comment 39

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/1ec0339ac9a1

That is "patch 2" with some minor rebasing.

This should be the last patch for this bug, so when it's merged to m-c this bug can be marked as RESOLVED FIXED.  Thanks.
http://hg.mozilla.org/mozilla-central/rev/016fcb800fc2
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla7
ehr, still missing part 2 from this merge.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
http://hg.mozilla.org/mozilla-central/rev/1ec0339ac9a1
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.