Add page fault reporting to telemetry

RESOLVED FIXED

Status

()

Core
General
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Justin Lebar (not reading bugmail), Assigned: Justin Lebar (not reading bugmail))

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

6 years ago
Having telemetry on page faults would be useful in general, but also for understanding whether the changes we make in bug 664291 are helpful.

Comment 1

6 years ago
(In reply to comment #0)
> Having telemetry on page faults would be useful in general, but also for
> understanding whether the changes we make in bug 664291 are helpful.

How do you intend to gather page fault stats?
(Assignee)

Comment 2

6 years ago
Bug 664486.  I already have something working on Linux and Mac.
(Assignee)

Updated

6 years ago
URL: 664291
No longer depends on: 664291
(Assignee)

Updated

6 years ago
Blocks: 664291
URL: 664291
(Assignee)

Comment 3

6 years ago
Created attachment 541399 [details] [diff] [review]
Patch v1
(Assignee)

Updated

6 years ago
Attachment #541399 - Flags: review?(tglek)

Comment 4

6 years ago
Comment on attachment 541399 [details] [diff] [review]
Patch v1

I don't think that's a useful way to measure paging traffic. As I understand it, you are recording the total amount of pagefaults that occured up till probe point..on every probe. This is wasteful.

so a) Decrease the max to 1000ish.
b) Record the current pagefault count in this._prev_pagefaults, then in the idle handler do

faults = current("hard-page-faults") - this._prev_pagefaults
histogram.Add(faults)
Attachment #541399 - Flags: review?(tglek) → review-
(Assignee)

Comment 5

6 years ago
That's *much* better way of doing it.

How often are the probes fired?  What's the variance in their spacing?

Comment 6

6 years ago
(In reply to comment #5)
> That's *much* better way of doing it.
> 
> How often are the probes fired?  What's the variance in their spacing?

At most once a minute, but this only fired on idle. http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#52

May be worth tying that to some ui event in the future so we don't run down batteries for no reason.
(Assignee)

Comment 7

6 years ago
Created attachment 541783 [details] [diff] [review]
Patch v2

Updating to be compatible with v2 of the page fault patch; one line change to use the new units field on memory reporters.
(Assignee)

Updated

6 years ago
Attachment #541399 - Attachment is obsolete: true
(Assignee)

Comment 8

6 years ago
(In reply to comment #6)
> (In reply to comment #5)
> > That's *much* better way of doing it.
> > 
> > How often are the probes fired?  What's the variance in their spacing?
> 
> At most once a minute, but this only fired on idle.
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryPing.js#52

Well, we could try to do better than this, but it's probably good as a first pass.

> May be worth tying that to some ui event in the future so we don't run down
> batteries for no reason.

Indeed!
(Assignee)

Comment 9

6 years ago
> Decrease the max to 1000ish.

I've seen 40k page faults in a short period of time, so I'm going to set it to 64k, if that's OK with you.  If we have to page in more than (64 * 1024) pages * 4kb/page = 256mb, I'm happy to call that "massive paging".

Comment 10

6 years ago
(In reply to comment #9)
> > Decrease the max to 1000ish.
> 
> I've seen 40k page faults in a short period of time, so I'm going to set it
> to 64k, if that's OK with you.  If we have to page in more than (64 * 1024)
> pages * 4kb/page = 256mb, I'm happy to call that "massive paging".

ok
(Assignee)

Comment 11

6 years ago
Created attachment 541803 [details] [diff] [review]
Patch v3
Attachment #541803 - Flags: review?(tglek)
(Assignee)

Updated

6 years ago
Attachment #541783 - Attachment is obsolete: true
(Assignee)

Comment 12

6 years ago
Comment on attachment 541803 [details] [diff] [review]
Patch v3

I have no idea if this actually works, but it passes the Telemetry test.  I'd be happy to write another test, if that's possible!
(Assignee)

Comment 13

6 years ago
Created attachment 541812 [details] [diff] [review]
Patch v3 (rebased)
Attachment #541812 - Flags: review?(tglek)
(Assignee)

Updated

6 years ago
Attachment #541803 - Attachment is obsolete: true
Attachment #541803 - Flags: review?(tglek)

Comment 14

6 years ago
(In reply to comment #12)
> Comment on attachment 541803 [details] [diff] [review] [review]
> Patch v3
> 
> I have no idea if this actually works, but it passes the Telemetry test. 
> I'd be happy to write another test, if that's possible!

Modify the existing test to add a double in addition to an int

Comment 15

6 years ago
Comment on attachment 541812 [details] [diff] [review]
Patch v3 (rebased)

>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal)
>+          prevVal = 0;
This should be { prevVal = curVal; continue}

cold startup causes a bunch of pagefaults, you do not want to count those.

It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT. 
Add a comment that 
// this if block is measuring counting pagefaults per period
Attachment #541812 - Flags: review?(tglek) → review+
(Assignee)

Comment 16

6 years ago
> It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT.
> Add a comment that 
> // this if block is measuring counting pagefaults per period

Were you commenting that it's confusing that I check UNITS_COUNT and need to add a comment, or that I should resolve the weirdness in some other way?
(Assignee)

Updated

6 years ago
Assignee: nobody → justin.lebar+bug
(Assignee)

Comment 17

6 years ago
(In reply to comment #14)
> Modify the existing test to add a double in addition to an int

The reason I added that double conversion code in the first place was because the page fault count was coming in as an int64 (thus a js double) and was causing the test to fail.  I guess the other memory reporters didn't cause a failure because the code did some math on them before adding them to the histogram, so they got downcast to int32s.

Now that we're also doing some math on the UNITS_COUNT value (subtraction), I suspect the jsdouble branch isn't exercised.  But maybe we want to leave it in, since the old behavior was kind of weird.

In that case, it's not clear to me how to test that branch.  test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I even do that without modifying TelemetryHistograms.h?

Comment 18

6 years ago
(In reply to comment #17)

> In that case, it's not clear to me how to test that branch. 
> test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I
> even do that without modifying TelemetryHistograms.h?

Yes, test_nsITelemetry.js does it. nITelemetry::newHistogram lets you create adhoc histograms. If the code isn't exercising it, land this without conversion stuff I'll just add double conversion + testcase to bug 666309.
(Assignee)

Comment 19

6 years ago
Created attachment 542180 [details] [diff] [review]
Patch v4 (removes unneded double-conversion code)
(Assignee)

Updated

6 years ago
Attachment #541812 - Attachment is obsolete: true
(Assignee)

Comment 20

6 years ago
> Yes, test_nsITelemetry.js does it.

Ah, I see!

> If the code isn't exercising it, land this without conversion stuff I'll just 
> add double conversion + testcase to bug 666309.

I just checked, and it's indeed not exercising that path.  So your plan sounds good to me!
(Assignee)

Comment 21

6 years ago
Created attachment 542185 [details] [diff] [review]
Patch v4.1 (fixing stupid bug)
(Assignee)

Updated

6 years ago
Attachment #542180 - Attachment is obsolete: true

Comment 22

6 years ago
(In reply to comment #21)
> Created attachment 542185 [details] [diff] [review] [review]
> Patch v4.1 (fixing stupid bug)

did you mean to r? this or carry over r+?
(Assignee)

Comment 23

6 years ago
I was going to carry over the r+, but you're welcome to take another look if you'd like!  I just removed the double conversion code and fixed an accidental |curVal| instead of |val|.

Comment 24

6 years ago
Comment on attachment 542185 [details] [diff] [review]
Patch v4.1 (fixing stupid bug)

>+        name = "Memory:" + mr.path;
>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal) {
>+          // If this is the first time we're reading this reporter, store its
>+          // current value but don't report it in the telemetry ping, so we
>+          // ignore the effect startup had on the reporter.
>+          this._prevValues[mr.path] = curVal;
>+          continue;
>+        }
>+        val = curVal - prevVal;
>+        this._prevValues[mr.path] = val;
 isn't it supposed to be
this._prevValues[mr.path] = curVal?
Attachment #542185 - Flags: review-
(Assignee)

Comment 25

6 years ago
>  isn't it supposed to be
> this._prevValues[mr.path] = curVal?

Ack, yes.  Thanks for noticing that.
(Assignee)

Comment 26

6 years ago
Pushed to m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/3b0160b38c3d
Whiteboard: [inbound]
http://hg.mozilla.org/mozilla-central/rev/3b0160b38c3d
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
You need to log in before you can comment on or make changes to this bug.