Closed Bug 765351 Opened 10 years ago Closed 10 years ago

Add telemetry probe for startup I/O

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: nicolas, Assigned: nicolas)

References

Details

Attachments

(2 files, 7 obsolete files)

Add a telemetry probe that measures process-wide disk I/O bytes during startup.
Assignee: nobody → necheverria
- Make it more robust / better integration?
- Use native calls instead of js-ctypes?
Attachment #633643 - Flags: feedback?(vdjeric)
Comment on attachment 633643 [details] [diff] [review]
Draft for adding startup IO counters to telemetry data using js-ctypes.

Can we also add a listener for the before-first paint event in TelemetryPing? It might be good to have that number as well, since we'll be doing a lot of network and disk cache I/O during session restore.

I've made a few small notes about style & JS conventions.

>diff -r 00244ceddd42 toolkit/components/telemetry/TelemetryPing.js
>--- a/toolkit/components/telemetry/TelemetryPing.js	Wed Jun 13 12:14:31 2012 -0500
>+++ b/toolkit/components/telemetry/TelemetryPing.js	Fri Jun 15 15:17:53 2012 -0400
>+/**
>+ * Read current process I/O counters.
>+ */
>+function getProcessInformation() {
>+  if (Services.appinfo.OS == "WINNT")
>+    return getProcessInformation_Windows();
>+  return false;
>+}

Functions should always return the same data type, either always return bools or always return an object/null

>+
>+function getProcessInformation_Windows() {
>+  var me = getProcessInformation_Windows;

We prefer to use "let" for declaring variables in Firefox JS code because it's block-scope, check out https://developer.mozilla.org/en/JavaScript/Reference/Statements/let

>+function getProcessInformation_Windows() {
>+  var me = getProcessInformation_Windows;
>+  if (typeof me.init == 'undefined'){
>+    me.init = false; 

You could do this with a global JS object instead of a function, e.g.

let startupIO = {
   readOps: 0,
   writeOps: 0,
   fetchCounters: function () {
      let kernel32 = ...
      this.readOps = ...
   },
};

But as you said, it might be better to put this functionality into native code

>+    try {
>+      ...
>+      me.init = true;
>+    } catch (err) { }
>+  }
>+  if (!me.init) return false;

You can just return null or false in the catch block
Attachment #633643 - Flags: feedback?(vdjeric)
Thanks for the feedback, I'll work on it.

before-first-paint is fired repeatedly, and is first fired at 2163ms after startup in a test run. in the same test, about:startup reports firstPaint at 4646ms.

Is the first before-first-paint event what we want?
(In reply to Nicolás Chaim from comment #3)
> Thanks for the feedback, I'll work on it.
> 
> before-first-paint is fired repeatedly, and is first fired at 2163ms after
> startup in a test run. in the same test, about:startup reports firstPaint at
> 4646ms.
> 
> Is the first before-first-paint event what we want?

Depends, why is before-first paint firing so much?
Attachment #638548 - Flags: review?(vdjeric)
Comment on attachment 638548 [details] [diff] [review]
Improved patch including firstPaint listener

>+  getCounters_Windows: function() {
>+    if (this.init == 0){
>+      this.init = -1; 

You can just use a bool variable here (e.g. _initialized) instead of the -1/0/1 flags. I don't think there's much harm in trying to initialize the ctypes structures twice in the very rare case initialization fails for some reason.

if (!this._initialized) {
  ...
  this._initialized = true;
}

>+/**
>+ * Read current process I/O counters.
>+ */            
>+let processInfo = {
>+  init: 0,

Nitpicks: Change "init: 0," to "_initialized: false," and add _IO_COUNTERS = null, _GetCurrentProcess, etc

>+      try {
>+         ...
>+      } catch (err) { }

Nitpick: return null here

>+    if (this.init != 1) return null;

Nitpick: Make this check into 2 lines to fit with the code style
Attachment #638548 - Flags: review?(vdjeric)
Made improvements suggested by vladan
Attachment #638548 - Attachment is obsolete: true
Attachment #640230 - Flags: feedback?(vdjeric)
Attachment #640230 - Flags: review?(taras.mozilla)
Attachment #640230 - Flags: feedback?(vdjeric)
Attachment #640230 - Flags: feedback+
Comment on attachment 640230 [details] [diff] [review]
Improved patch including firstPaint listener

firstPaintObserver is misleading. Use _hasXulWindowVisibleObserver.
Do not use a multilevel structure here. If you use a flat one, metrics will be able to process it automagically.

so instead of  
+    payloadObj.simpleMeasurements.startupIO = this._startupIO;
Do 

+    payloadObj.simpleMeasurements.startup_KB_read.
+    payloadObj.simpleMeasurements.startup_KB_written.

I don't think we care about other measurements windows gives us.
Attachment #640230 - Flags: review?(taras.mozilla) → review-
Attached patch Patch (obsolete) — Splinter Review
Do we still want two different checkpoints for the I/O counters, xul-window-visible and sessionstore-windows-restored?
Attachment #640230 - Attachment is obsolete: true
Attachment #641504 - Flags: feedback?(vdjeric)
Attached patch Patch (obsolete) — Splinter Review
Attachment #641504 - Attachment is obsolete: true
Attachment #641504 - Flags: feedback?(vdjeric)
Attachment #644026 - Flags: review?(vdjeric)
Duplicate of this bug: 734737
Comment on attachment 644026 [details] [diff] [review]
Patch

>+    
>+    for (let p in this._startupIO)
>+      payloadObj.simpleMeasurements[p] = this._startupIO[p];

Nitpick: Rename "p" to something more descriptive like "property"

>+      [this._startupIO.startupWindowVisibleReadBytes, 
>+        this._startupIO.startupWindowVisibleWriteBytes] = processInfo.getCounters();

getCounters() will return NULL on non-Windows platforms & during error conditions. This assignment would then throw an exception. Check if the return value of getCounters() is NULL before assigning to these properties.

Taras, is it a problem for Metrics if the startup IO fields don't exist on non-Windows platforms?
Attachment #644026 - Flags: review?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #12)
> Comment on attachment 644026 [details] [diff] [review]
> Patch
> 
> >+    
> >+    for (let p in this._startupIO)
> >+      payloadObj.simpleMeasurements[p] = this._startupIO[p];
> 
> Nitpick: Rename "p" to something more descriptive like "property"
like ioCounter?
> 
> >+      [this._startupIO.startupWindowVisibleReadBytes, 
> >+        this._startupIO.startupWindowVisibleWriteBytes] = processInfo.getCounters();
> 
> getCounters() will return NULL on non-Windows platforms & during error
> conditions. This assignment would then throw an exception. Check if the
> return value of getCounters() is NULL before assigning to these properties.
> 
> Taras, is it a problem for Metrics if the startup IO fields don't exist on
> non-Windows platforms?

nope
Attached patch Patch (obsolete) — Splinter Review
Attachment #644026 - Attachment is obsolete: true
Attachment #644065 - Flags: review?(vdjeric)
Comment on attachment 644065 [details] [diff] [review]
Patch

Looks good, just a few style details

>+    
>+    for (let key in this._startupIO)
>+      payloadObj.simpleMeasurements[key] = this._startupIO[key];

Maybe use ioCounter as Taras suggested


>+      var counters = processInfo.getCounters();
>+      if (counters)  
>+        [this._startupIO.startupSessionRestoreReadBytes, 
>+          this._startupIO.startupSessionRestoreWriteBytes] = counters;

Nit: Add curly braces around this, it makes me nervous :P 

>+  if (isWindows) {
>+    do_check_true("startupSessionRestoreReadBytes" in payload.simpleMeasurements &&
>+      payload.simpleMeasurements.startupSessionRestoreReadBytes > 0);
>+    do_check_true("startupSessionRestoreWriteBytes" in payload.simpleMeasurements &&
>+      payload.simpleMeasurements.startupSessionRestoreWriteBytes > 0);
>+  }

Nit: Break this up into 4 separate do_check_true checks, so that it's more obvious what failed when it fails. Sorry for telling you to do it like this first :)
Attachment #644065 - Flags: review?(vdjeric) → review+
Attached patch old Patch (obsolete) — Splinter Review
Attachment #644065 - Attachment is obsolete: true
Attachment #644082 - Flags: review?(vdjeric)
Attachment #644082 - Flags: review?(taras.mozilla)
Attachment #644082 - Flags: review?(vdjeric) → review+
Comment on attachment 644082 [details] [diff] [review]
old Patch

+    do_check_true("startupSessionRestoreReadBytes" in payload.simpleMeasurements);

+    do_check_true("startupSessionRestoreWriteBytes" in payload.simpleMeasurements); 

I don't think you need above

+        {'readOps': ctypes.unsigned_long_long},
+        {'writeOps': ctypes.unsigned_long_long},
+        {'otherOps': ctypes.unsigned_long_long},
+        {'otherBytes': ctypes.unsigned_long_long} ]);

I don't see any benefit to above, please take them out. Rest looks good
Attachment #644082 - Flags: review?(taras.mozilla) → review+
(In reply to Taras Glek (:taras) from comment #17)

Those fields are part of the Windows IO_COUNTERS structure as required by the API, and need to remain in the same order. I could merge the first three fields into a dummy array of three items, but that might just make the code more confusing to anyone reading it.
Attached patch Patch (obsolete) — Splinter Review
(In reply to Taras Glek (:taras) from comment #17)

I have removed 
 do_check_true("startupSessionRestoreWriteBytes" in payload.simpleMeasurements);
since
 do_check_true(payload.simpleMeasurements.startupSessionRestoreReadBytes > 0);
will already fail if startupSessionRestoreWriteBytes is not defined.
Attachment #644082 - Attachment is obsolete: true
Attachment #644365 - Flags: review?(taras.mozilla)
Comment on attachment 644365 [details] [diff] [review]
Patch

I just realized that your testpath is duplicating code... that kind of defeat purpose of testing :)

+    case "xul-window-visible":
+      Services.obs.removeObserver(this, "xul-window-visible");
+      this._hasXulWindowVisibleObserver = false;   
+      var counters = processInfo.getCounters();
+      if (counters) {
+        [this._startupIO.startupWindowVisibleReadBytes, 
+          this._startupIO.startupWindowVisibleWriteBytes] = counters;
+      }
+      break;
     case "sessionstore-windows-restored":
       Services.obs.removeObserver(this, "sessionstore-windows-restored");
       this._hasWindowRestoredObserver = false;
       // fall through
     case "test-gather-startup":
-      this.gatherStartupInformation();
+      var counters = processInfo.getCounters();
+      if (counters) {  
+        [this._startupIO.startupSessionRestoreReadBytes, 
+          this._startupIO.startupSessionRestoreWriteBytes] = counters;
+      }
+      this.gatherStartupInformation(); 

Factor this out into a common function.

(In reply to Nicolás Chaim from comment #18)
> (In reply to Taras Glek (:taras) from comment #17)
> 
> Those fields are part of the Windows IO_COUNTERS structure as required by
> the API, and need to remain in the same order. I could merge the first three
> fields into a dummy array of three items, but that might just make the code
> more confusing to anyone reading it.

Right, sorry. I misread the code.
Attachment #644365 - Flags: review?(taras.mozilla) → review-
(In reply to Taras Glek (:taras) from comment #20)
> Comment on attachment 644365 [details] [diff] [review]
> Patch
> 
> I just realized that your testpath is duplicating code... that kind of
> defeat purpose of testing :)
> 
> [...]
> Factor this out into a common function.

test-gather-startup executes the same code path as sessionstore-windows-restored, skipping the removeObserver() call, which does not apply during testing. 

The code in the xul-window-visible differs in the name of the variables where the counters are stored. The test does not test this code.

I could refactor the code into something like the following:

saveCounters("startupSessionRestore");
[...]
saveCounters("startupWindowVisible");
[...]
this.saveCounters = function(name){
  let counters = processInfo.getCounters();
  if (counters) {  
    [this._startupIO[name+"ReadBytes"], this._startupIO[name+"WriteBytes"]] =
      counters; 
  }
}

Refactoring it this way for only two calls doesn't really make much difference - it doesn't really make the code more readable or compact.
Attachment #644365 - Flags: review- → review?(taras.mozilla)
Attachment #644082 - Attachment description: Patch → old Patch
Comment on attachment 644365 [details] [diff] [review]
Patch

Ok, I buy your argument that unifying this code will just make it uglier and that it atleast tests one real io-gathering codepath.
Attachment #644365 - Flags: review?(taras.mozilla) → review+
Attached patch PatchSplinter Review
Merge with latest changes in mozilla-central.
Attachment #644365 - Attachment is obsolete: true
Attachment #645912 - Attachment is patch: true
Attachment #645912 - Flags: checkin?(vdjeric)
You need to log in before you can comment on or make changes to this bug.