Last Comment Bug 664758 - Add page fault reporting to telemetry
: Add page fault reporting to telemetry
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: General (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: ---
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
Depends on: 585196 664486
Blocks: 664291
  Show dependency treegraph
 
Reported: 2011-06-16 09:53 PDT by Justin Lebar (not reading bugmail)
Modified: 2011-06-28 18:53 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 (6.99 KB, patch)
2011-06-23 09:15 PDT, Justin Lebar (not reading bugmail)
taras.mozilla: review-
Details | Diff | Review
Patch v2 (7.00 KB, patch)
2011-06-24 13:48 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Review
Patch v3 (8.00 KB, patch)
2011-06-24 14:34 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Review
Patch v3 (rebased) (8.00 KB, patch)
2011-06-24 14:50 PDT, Justin Lebar (not reading bugmail)
taras.mozilla: review+
Details | Diff | Review
Patch v4 (removes unneded double-conversion code) (7.59 KB, patch)
2011-06-27 09:54 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Review
Patch v4.1 (fixing stupid bug) (7.59 KB, patch)
2011-06-27 10:00 PDT, Justin Lebar (not reading bugmail)
taras.mozilla: review-
Details | Diff | Review

Description Justin Lebar (not reading bugmail) 2011-06-16 09:53:28 PDT
Having telemetry on page faults would be useful in general, but also for understanding whether the changes we make in bug 664291 are helpful.
Comment 1 (dormant account) 2011-06-16 10:46:43 PDT
(In reply to comment #0)
> Having telemetry on page faults would be useful in general, but also for
> understanding whether the changes we make in bug 664291 are helpful.

How do you intend to gather page fault stats?
Comment 2 Justin Lebar (not reading bugmail) 2011-06-16 10:47:53 PDT
Bug 664486.  I already have something working on Linux and Mac.
Comment 3 Justin Lebar (not reading bugmail) 2011-06-23 09:15:51 PDT
Created attachment 541399 [details] [diff] [review]
Patch v1
Comment 4 (dormant account) 2011-06-23 11:37:52 PDT
Comment on attachment 541399 [details] [diff] [review]
Patch v1

I don't think that's a useful way to measure paging traffic. As I understand it, you are recording the total amount of pagefaults that occured up till probe point..on every probe. This is wasteful.

so a) Decrease the max to 1000ish.
b) Record the current pagefault count in this._prev_pagefaults, then in the idle handler do

faults = current("hard-page-faults") - this._prev_pagefaults
histogram.Add(faults)
Comment 5 Justin Lebar (not reading bugmail) 2011-06-23 11:41:07 PDT
That's *much* better way of doing it.

How often are the probes fired?  What's the variance in their spacing?
Comment 6 (dormant account) 2011-06-23 11:44:09 PDT
(In reply to comment #5)
> That's *much* better way of doing it.
> 
> How often are the probes fired?  What's the variance in their spacing?

At most once a minute, but this only fired on idle. http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#52

May be worth tying that to some ui event in the future so we don't run down batteries for no reason.
Comment 7 Justin Lebar (not reading bugmail) 2011-06-24 13:48:03 PDT
Created attachment 541783 [details] [diff] [review]
Patch v2

Updating to be compatible with v2 of the page fault patch; one line change to use the new units field on memory reporters.
Comment 8 Justin Lebar (not reading bugmail) 2011-06-24 13:51:39 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > That's *much* better way of doing it.
> > 
> > How often are the probes fired?  What's the variance in their spacing?
> 
> At most once a minute, but this only fired on idle.
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryPing.js#52

Well, we could try to do better than this, but it's probably good as a first pass.

> May be worth tying that to some ui event in the future so we don't run down
> batteries for no reason.

Indeed!
Comment 9 Justin Lebar (not reading bugmail) 2011-06-24 14:22:12 PDT
> Decrease the max to 1000ish.

I've seen 40k page faults in a short period of time, so I'm going to set it to 64k, if that's OK with you.  If we have to page in more than (64 * 1024) pages * 4kb/page = 256mb, I'm happy to call that "massive paging".
Comment 10 (dormant account) 2011-06-24 14:29:26 PDT
(In reply to comment #9)
> > Decrease the max to 1000ish.
> 
> I've seen 40k page faults in a short period of time, so I'm going to set it
> to 64k, if that's OK with you.  If we have to page in more than (64 * 1024)
> pages * 4kb/page = 256mb, I'm happy to call that "massive paging".

ok
Comment 11 Justin Lebar (not reading bugmail) 2011-06-24 14:34:47 PDT
Created attachment 541803 [details] [diff] [review]
Patch v3
Comment 12 Justin Lebar (not reading bugmail) 2011-06-24 14:40:15 PDT
Comment on attachment 541803 [details] [diff] [review]
Patch v3

I have no idea if this actually works, but it passes the Telemetry test.  I'd be happy to write another test, if that's possible!
Comment 13 Justin Lebar (not reading bugmail) 2011-06-24 14:50:50 PDT
Created attachment 541812 [details] [diff] [review]
Patch v3 (rebased)
Comment 14 (dormant account) 2011-06-24 14:59:17 PDT
(In reply to comment #12)
> Comment on attachment 541803 [details] [diff] [review] [review]
> Patch v3
> 
> I have no idea if this actually works, but it passes the Telemetry test. 
> I'd be happy to write another test, if that's possible!

Modify the existing test to add a double in addition to an int
Comment 15 (dormant account) 2011-06-24 15:03:44 PDT
Comment on attachment 541812 [details] [diff] [review]
Patch v3 (rebased)

>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal)
>+          prevVal = 0;
This should be { prevVal = curVal; continue}

cold startup causes a bunch of pagefaults, you do not want to count those.

It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT. 
Add a comment that 
// this if block is measuring counting pagefaults per period
Comment 16 Justin Lebar (not reading bugmail) 2011-06-27 08:12:02 PDT
> It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT.
> Add a comment that 
> // this if block is measuring counting pagefaults per period

Were you commenting that it's confusing that I check UNITS_COUNT and need to add a comment, or that I should resolve the weirdness in some other way?
Comment 17 Justin Lebar (not reading bugmail) 2011-06-27 09:12:55 PDT
(In reply to comment #14)
> Modify the existing test to add a double in addition to an int

The reason I added that double conversion code in the first place was because the page fault count was coming in as an int64 (thus a js double) and was causing the test to fail.  I guess the other memory reporters didn't cause a failure because the code did some math on them before adding them to the histogram, so they got downcast to int32s.

Now that we're also doing some math on the UNITS_COUNT value (subtraction), I suspect the jsdouble branch isn't exercised.  But maybe we want to leave it in, since the old behavior was kind of weird.

In that case, it's not clear to me how to test that branch.  test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I even do that without modifying TelemetryHistograms.h?
Comment 18 (dormant account) 2011-06-27 09:21:23 PDT
(In reply to comment #17)

> In that case, it's not clear to me how to test that branch. 
> test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I
> even do that without modifying TelemetryHistograms.h?

Yes, test_nsITelemetry.js does it. nITelemetry::newHistogram lets you create adhoc histograms. If the code isn't exercising it, land this without conversion stuff I'll just add double conversion + testcase to bug 666309.
Comment 19 Justin Lebar (not reading bugmail) 2011-06-27 09:54:13 PDT
Created attachment 542180 [details] [diff] [review]
Patch v4 (removes unneded double-conversion code)
Comment 20 Justin Lebar (not reading bugmail) 2011-06-27 09:59:44 PDT
> Yes, test_nsITelemetry.js does it.

Ah, I see!

> If the code isn't exercising it, land this without conversion stuff I'll just 
> add double conversion + testcase to bug 666309.

I just checked, and it's indeed not exercising that path.  So your plan sounds good to me!
Comment 21 Justin Lebar (not reading bugmail) 2011-06-27 10:00:15 PDT
Created attachment 542185 [details] [diff] [review]
Patch v4.1 (fixing stupid bug)
Comment 22 (dormant account) 2011-06-27 15:18:02 PDT
(In reply to comment #21)
> Created attachment 542185 [details] [diff] [review] [review]
> Patch v4.1 (fixing stupid bug)

did you mean to r? this or carry over r+?
Comment 23 Justin Lebar (not reading bugmail) 2011-06-27 15:20:30 PDT
I was going to carry over the r+, but you're welcome to take another look if you'd like!  I just removed the double conversion code and fixed an accidental |curVal| instead of |val|.
Comment 24 (dormant account) 2011-06-27 16:34:01 PDT
Comment on attachment 542185 [details] [diff] [review]
Patch v4.1 (fixing stupid bug)

>+        name = "Memory:" + mr.path;
>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal) {
>+          // If this is the first time we're reading this reporter, store its
>+          // current value but don't report it in the telemetry ping, so we
>+          // ignore the effect startup had on the reporter.
>+          this._prevValues[mr.path] = curVal;
>+          continue;
>+        }
>+        val = curVal - prevVal;
>+        this._prevValues[mr.path] = val;
 isn't it supposed to be
this._prevValues[mr.path] = curVal?
Comment 25 Justin Lebar (not reading bugmail) 2011-06-28 06:22:18 PDT
>  isn't it supposed to be
> this._prevValues[mr.path] = curVal?

Ack, yes.  Thanks for noticing that.
Comment 26 Justin Lebar (not reading bugmail) 2011-06-28 09:39:33 PDT
Pushed to m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/3b0160b38c3d
Comment 27 Joe Drew (not getting mail) 2011-06-28 18:53:42 PDT
http://hg.mozilla.org/mozilla-central/rev/3b0160b38c3d

Note You need to log in before you can comment on or make changes to this bug.