Closed Bug 748440 Opened 12 years ago Closed 12 years ago

Add telemetry for heap fragmentation

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(2 files, 4 obsolete files)

We don't currently have telemetry telling us how much memory we waste out of the malloc heap and the JS heap.

As a result, we don't have any idea how bad our fragmentation problem is.
Blocks: 746009
Assignee: nobody → justin.lebar+bug
njn, we don't have an assertion that percentage reporters are less than 100%, right?  I don't see one anywhere, but just to check...
Ah, I realized that I totally messed up telemetry for js-heap-allocated, because telemetry doesn't collect multi-reporters, anad I moved heap-allocated into a multi-reporter.  :(
Blocks: 748454
The main change here is adding a new memory reporter which contains a bunch of KIND_OTHER reporters from the "js" memory reporter.  "js" still reports the old values as before, but now "js-summary" reports those same values, under the path "js-summary/" and with KIND_SUMMARY.

I renamed "*fragmentation" to "*committed-unused-ratio" and changed it to report |waste / allocated|, which I think is a better metric than |waste / committed| -- what we should be getting at with this value is the relative overhead of the allocator.  I also added "*committed-unused" which reports the raw amount of committed, unused memory.
Attachment #618122 - Flags: review?(n.nethercote)
In addition to reporting these new values, this patch fixes js-gc-heap-committed reporting, which I broke in bug 741378.  (js-gc-heap-committed was in a memory reporter, so it was inaccessible to telemetry until this patch.)
Comment on attachment 618123 [details] [diff] [review]
Part 2: Support multi-reporters in telemetry, and report {heap,js}-committed-unused{,-ratio} in telemetry.

Taras, are you OK with Nathan reviewing this patch, or would you like to look at it yourself?
Attachment #618123 - Flags: review?(nfroyd)
Attachment #618123 - Flags: feedback?(taras.mozilla)
Comment on attachment 618123 [details] [diff] [review]
Part 2: Support multi-reporters in telemetry, and report {heap,js}-committed-unused{,-ratio} in telemetry.

Nathan is ok to review stuff like this, but since I'm here, have an r+. I would prefer if this had tests, I didn't put in any myself because about:memory stuff was not really useful from xpcshell tests. Would be nice to not regress these.

Optional nit: looks like the if, that you reused from my code, can become a switch() statement.
Attachment #618123 - Flags: review?(nfroyd)
Attachment #618123 - Flags: review+
Attachment #618123 - Flags: feedback?(taras.mozilla)
Attachment #618123 - Flags: feedback+
> Would be nice to not regress these.
...again.  Yes.  :)

So the way we test about:memory is we unregister all the real memory reporters, then register some fake ones which always return the same value.  At the end of the test, we fix things back up.

But it looks like the ping that test_TelemetryPing.js gets is a real ping -- that is, it includes data collected before the test started running.  If I unregister all the memory reporters at the beginning of the test, I still get a histogram for MEMORY_STORAGE_SQLITE, which is backed by a memory reporter.

I don't see a way to clear out this data from telemetry, either.

So it looks like I can test that the data is there, but not that it's correct.  Is that a test you think I should do?
(In reply to Justin Lebar [:jlebar] from comment #7)
> I don't see a way to clear out this data from telemetry, either.

We were just talking today about how it would be useful to have Telemetry.clearHistogram in a different context.  If it would be useful for test purposes, then so much the better.
Is part 1 missing some changes?  You've used KIND_SUMMARY in XPCJSRuntime.cpp but haven't defined it anywhere, AFAICT.
(In reply to Justin Lebar [:jlebar] from comment #7)
> > Would be nice to not regress these.
> ...again.  Yes.  :)
> 
> So the way we test about:memory is we unregister all the real memory
> reporters, then register some fake ones which always return the same value. 
> At the end of the test, we fix things back up.
> 
> But it looks like the ping that test_TelemetryPing.js gets is a real ping --
> that is, it includes data collected before the test started running.  If I
> unregister all the memory reporters at the beginning of the test, I still
> get a histogram for MEMORY_STORAGE_SQLITE, which is backed by a memory
> reporter.
> 
> I don't see a way to clear out this data from telemetry, either.
> 
> So it looks like I can test that the data is there, but not that it's
> correct.  Is that a test you think I should do?

Yes. As Nathan said, bug 748914 should get us a way to clear telemetry histograms
Since this fixes a regression, I don't want to block waiting on another bug.  I'll write a test once clearing is implemented.
Oh, I understand, you meant "yes, you should test that the data is there".  I can do that...
(In reply to Nicholas Nethercote [:njn] from comment #9)
> Is part 1 missing some changes?  You've used KIND_SUMMARY in
> XPCJSRuntime.cpp but haven't defined it anywhere, AFAICT.

We defined it for ghost windows some patches ago.
Attached patch Part 3: Add a test (obsolete) — Splinter Review
Attachment #618483 - Flags: review?(taras.mozilla)
Comment on attachment 618483 [details] [diff] [review]
Part 3: Add a test

this will do nicely
Attachment #618483 - Flags: review?(taras.mozilla) → review+
Comment on attachment 618123 [details] [diff] [review]
Part 2: Support multi-reporters in telemetry, and report {heap,js}-committed-unused{,-ratio} in telemetry.

Review of attachment 618123 [details] [diff] [review]:
-----------------------------------------------------------------

IIUC, the "js-summary" multi-reporter touches every live thing on the JS heap, and it's run on every telemetry ping.  This isn't acceptable -- we used to do exactly that, but I fixed that in bug 708159.

I think some of the numbers you want to report to telemetry aren't obtainable without touching every live thing on the JS heap... I don't know how to square that circle :(
Attachment #618123 - Flags: review-
Comment on attachment 618122 [details] [diff] [review]
Part 1: Add js-summary multi-reporter and tweak some memory reporters.

Review of attachment 618122 [details] [diff] [review]:
-----------------------------------------------------------------

Removing the review request because of my r- for the follow-up patch.
Attachment #618122 - Flags: review?(n.nethercote) → review-
(In reply to Nicholas Nethercote [:njn] from comment #16)
> IIUC, the "js-summary" multi-reporter touches every live thing on the JS
> heap, and it's run on every telemetry ping.  This isn't acceptable -- we
> used to do exactly that, but I fixed that in bug 708159.

Ah, you're totally right.

> I think some of the numbers you want to report to telemetry aren't obtainable without touching every 
> live thing on the JS heap... I don't know how to square that circle :(

I really want these heap fragmentation numbers, and this patch fixes a regression (js-gc-heap), so I'll split out the JS stuff and we can worry about it separately.
Summary: Add telemetry for heap and JS-heap fragmentation → Add telemetry for heap fragmentation
Anyone have telemetry they want from a memory multi-reporter?  I'd hate to leave unused code in the tree...
(In reply to Justin Lebar [:jlebar] from comment #19)
> Anyone have telemetry they want from a memory multi-reporter?  I'd hate to
> leave unused code in the tree...

Not that I know of.  Running multi-reporters for telemetry seems like a bad idea in general -- too easy for them to take a long time...
This is the same as parts 2-3 v1, except it doesn't include js-gc-committed-unused{,-ratio} and doesn't support multi-reporters.  The test is a bit weaker too -- we now test a reporter with UNITS_COUNT, and UNITS_BYTE, but not with UNITS_PERCENTAGE, because the only reporter with those units that's in telemetry is heap-committed-unused-ratio, and that reporter is not available without jemalloc (and therefore the test would fail in trace-malloc builds).

I don't think Taras needs to look at this again, since the changes are mostly code removal.
Attachment #618122 - Attachment is obsolete: true
Attachment #618123 - Attachment is obsolete: true
Attachment #618483 - Attachment is obsolete: true
Attachment #618862 - Flags: review?
Attachment #618862 - Flags: review? → review+
Wow, these patches do not even compile.   Let me try again.
Attachment #618860 - Flags: review?(n.nethercote)
Try run for 55757fb6909b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=55757fb6909b
Results (out of 15 total builds):
    failure: 15
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-55757fb6909b
Attached patch Part 1, v3Splinter Review
This one compiles.  :)
Attachment #619107 - Flags: review?(n.nethercote)
Attachment #618860 - Attachment is obsolete: true
Comment on attachment 619107 [details] [diff] [review]
Part 1, v3

Review of attachment 619107 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1792,5 @@
>  
> +        REPORT(NS_LITERAL_CSTRING("js-main-runtime-gc-heap-committed-unused-ratio"),
> +               nsIMemoryReporter::KIND_OTHER,
> +               nsIMemoryReporter::UNITS_PERCENTAGE,
> +               (PRInt64) 10000 * rtStats.gcHeapUnused /

Can you crib the removed comment from MemoryMetrics.cpp to explain why the 10000 is used here?  Thanks.
Attachment #619107 - Flags: review?(n.nethercote) → review+
Try run for 96e316210931 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=96e316210931
Results (out of 230 total builds):
    exception: 9
    success: 124
    warnings: 27
    failure: 70
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-96e316210931
https://hg.mozilla.org/mozilla-central/rev/f5a301fe9ba5
https://hg.mozilla.org/mozilla-central/rev/d7db4957bb75
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Depends on: 751509
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: