Closed Bug 1131544 Opened 5 years ago Closed 5 years ago

Change TelemetrySession.jsm to conform to the latest main ping specification.

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: Dexter, Assigned: Dexter)

References

Details

(Whiteboard: [ready])

Attachments

(2 files, 10 obsolete files)

16.53 KB, patch
Dexter
: review+
Details | Diff | Splinter Review
21.44 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
We need to change the format of the main ping based on the specification in bug 1120982.
Blocks: 1069869
Assignee: nobody → alessio.placitelli
Status: NEW → ASSIGNED
This patch stacks on bug 1120362 and updates TelemetrySession.jsm to send the correct data for the main ping.
Attached patch part 2 - Update the tests. (obsolete) — Splinter Review
This patch updates the tests to check for the correct main ping format.
Also adds a test to check that profileSsubsessionCounter is correctly serialised to disk.
Rebased.
Attachment #8565907 - Attachment is obsolete: true
Attached patch part 2 - Update the tests. (obsolete) — Splinter Review
Rebased
Attachment #8565913 - Attachment is obsolete: true
Attachment #8566001 - Flags: review?(gfritzsche)
Attachment #8566002 - Flags: review?(gfritzsche)
Apparently introduced by part 1 here:
gfritzsche> JavaScript error: resource://gre/modules/TelemetrySession.jsm, line 23: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIXPCComponents_Utils.import]
<gfritzsche> Hm, second one is: Cu.import("resource://services-common/utils.js");
I saw the following float by when test-driving - could this result from trying to load data in the content process?
We need to skip those parts there.

1424361529039	Toolkit.Telemetry	TRACE	TelemetrySession::observe - content-child-shutdown notified.
1424361529040	Toolkit.Telemetry	TRACE	TelemetrySession::sendContentProcessPing - Reason saved-session
1424361529041	Toolkit.Telemetry	TRACE	TelemetrySession::getSessionPayload - reason: saved-session, clearSubsession: true
1424361529044	Toolkit.Telemetry	TRACE	TelemetrySession::getSimpleMeasurements
1424361529055	Toolkit.Telemetry	TRACE	TelemetrySession::assemblePayloadWithMeasurements - reason: saved-session, submitting subsession data: false
1424361529055	Toolkit.Telemetry	TRACE	TelemetrySession::getHistograms - subsession: false, clearSubsession: true
1424361529118	Toolkit.Telemetry	TRACE	TelemetrySession::getKeyedHistograms - subsession: false, clearSubsession: true
1424361529124	Toolkit.Telemetry	TRACE	TelemetrySession::getThreadHangStats
1424361529131	Toolkit.Telemetry	TRACE	TelemetrySession::_scheduleDailyTimer - now: Thu Feb 19 2015 16:58:49 GMT+0100 (CET), scheduled: Fri Feb 20 2015 00:00:00 GMT+0100 (CET)
*************************
A coding exception was thrown and uncaught in a Task.

Full message: TypeError: invalid path component
Full stack: join@resource://gre/modules/osfile/ospath_unix.jsm:90:1
Impl._saveSessionData<@resource://gre/modules/TelemetrySession.jsm:1346:19
TaskImpl_run@resource://gre/modules/Task.jsm:314:40
TaskImpl@resource://gre/modules/Task.jsm:275:3
createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:249:14
getSessionPayload@resource://gre/modules/TelemetrySession.jsm:870:5
sendContentProcessPing@resource://gre/modules/TelemetrySession.jsm:1074:19
Impl.observe@resource://gre/modules/TelemetrySession.jsm:1186:9

*************************
1424361529145	Toolkit.Telemetry	TRACE	TelemetryPing::observe - content-child-shutdown notified.
Assertion failure: mUsedShmems.empty(), at /Users/gfritzsche/moz/mc1/gfx/layers/ipc/ISurfaceAllocator.cpp:53
1424361529154	Toolkit.Telemetry	TRACE	TelemetrySession::receiveMessage - Message name Telemetry:Payload
Comment on attachment 8566001 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs

Review of attachment 8566001 [details] [diff] [review]:
-----------------------------------------------------------------

Only a quick look because of the afore-mentioned issues.

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +613,4 @@
>      let subsessionStartDate = toLocalTimeISOString(truncateToDays(this._subsessionStartDate));
> +    // Compute the subsession length in milliseconds, then convert to seconds.
> +    let subsessionLength =
> +      (Date.now() - this._subsessionStartDate.getTime()) / 1000;

Math.floor?

@@ +842,5 @@
>  
> +  /**
> +   * Break the subsession.
> +   */
> +  _breakSubsession: function () {

This is named misleadingly - we're not "breaking" or "breaking up" subsessions here.
Maybe _startNewSubsession() or something?

@@ +857,5 @@
>      let measurements = this.getSimpleMeasurements(reason == "saved-session");
>      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
>      let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
>  
> +    this._breakSubsession();

Don't do this for the content process.

@@ +862,4 @@
>      this._scheduleDailyTimer();
>  
> +    // Persist session data to disk (don't wait until it completes).
> +    this._saveSessionData();

I'm not sure if we have a guaranteed ordering with this in case we quickly call _saveSessionData two times.
We need to find out - this has two interruption points / yields in _saveSessionData already and who knows how many else on the way through OS.File etc.

Also - don't do this for the content process.

@@ +1313,5 @@
> +   *                           can't be loaded.
> +   */
> +  _loadSessionData: function () {
> +    if (this._loadSessionDataTask) {
> +      return this._loadSessionDataTask;

Ok, i think the implementation is a little complex here.
We only need to load this once at startup, so we don't need to save tasks or anything.
Just skip the whole this._loadSessionDataTask thing, make it |_loadSessionData: Task.async(...| and we require it to be finished as part of the init task.
Attachment #8566001 - Flags: review?(gfritzsche)
Comment on attachment 8566001 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs

Review of attachment 8566001 [details] [diff] [review]:
-----------------------------------------------------------------

Heads-up for the previousBuildID -> previousBuildId change (only casing changing).
Attachment #8566001 - Flags: feedback?(vdjeric)
Component: Client: Desktop → Telemetry
Product: Firefox Health Report → Toolkit
Comment on attachment 8566001 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs

Review of attachment 8566001 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +857,5 @@
>      let measurements = this.getSimpleMeasurements(reason == "saved-session");
>      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
>      let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
>  
> +    this._breakSubsession();

Came up in bug 1120362, comment 14 and is probably best addressed here:
Let's do _breakSubsession() and _saveSessionData() only |if (clearSubsession)|.
Comment on attachment 8566002 [details] [diff] [review]
part 2 - Update the tests.

Review of attachment 8566002 [details] [diff] [review]:
-----------------------------------------------------------------

With fakeNow() and adding a getUUID() override etc. we should be able to test all these explicitly against expected values:
timezoneOffset
sessionId
subsessionId
previousSubsessionId
subsessionCounter
profileSubsessionCounter
sessionStartDate
subsessionStartDate
subsessionLength

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
@@ +257,5 @@
> +  for (let f in EXPECTED_INFO_FIELDS_TYPES) {
> +    Assert.ok(f in data, f + " must be available.");
> +
> +    let checkFunc = EXPECTED_INFO_FIELDS_TYPES[f];
> +    Assert.ok(checkFunc(data[f]), f + " must have the correct type.");

Add data[f] to the message so we can see what the invalid data is?

@@ +267,5 @@
> +  }
> +
> +  Assert.ok(ALLOWED_REASONS.indexOf(data.reason) >= 0,
> +            "Payload must contain an allowed reason.");
> +  Assert.ok(data.subsessionLength >= 0, "Must be a valid length.");

Also:
subsessionCounter >= 0
profileSubsessionCounter >= 0 and >= subsessionCounter
subsessionStartDate >= sessionStartDate
timezoneOffset in (-12*60, +12*60)
Attachment #8566002 - Flags: review?(gfritzsche)
Thanks Georg, this patch addresses your comments in the previous review.

I've introduced the SerialScheduler object which deals with write serialisation to the disk (this comes from storage.jsm !). This could be shared, ideally.

Please also note that starting a new subsession only |if (clearSubsession)| breaks part 2 as of now.
Attachment #8566001 - Attachment is obsolete: true
Attachment #8566001 - Flags: feedback?(vdjeric)
Attachment #8567167 - Flags: review?(gfritzsche)
Comment on attachment 8567167 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs (v2)

Review of attachment 8567167 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +346,5 @@
> +  _sessionId: generateUUID(),
> +  // Random subsession id.
> +  _subsessionId: null,
> +  // Subsession id of the previous subsession (even if it was in a different session),
> +  // null on first run.

So this needs persistence too - add _previousSubsessionId to load/save.

@@ +868,5 @@
>  
> +    if (!IS_CONTENT_PROCESS && clearSubsession) {
> +      this.startNewSubsession();
> +      // Persist session data to disk (don't wait until it completes).
> +      SerialScheduler.enqueueTask(this._saveSessionData.bind(this));

We should collect the data to persist here though.

@@ +1306,5 @@
>        };
>  
>        if (Telemetry.canSend || testing) {
> +        return this.savePendingPings()
> +                .then(() => SerialScheduler.enqueueTask(this._saveSessionData.bind(this)))

The subsession data didnt actually change here right?
So we dont need to save it?

@@ +1442,5 @@
>      };
>    },
>  };
> +
> +let SerialScheduler = {

This is a utility and should go before TelemetrySession and Impl.

@@ +1456,5 @@
> +   * @param {Function} aFunction The task function to
> +   * @return {Promise} A promise resolved when the enqueued task completes.
> +   */
> +  enqueueTask: function (aFunction) {
> +    let promise = new Promise((resolve, reject) => 

Nit: Trailing whitespace.

@@ +1474,5 @@
> +    if (!this._queuedOperations.length || this._queuedInProgress) {
> +      return;
> +    }
> +
> +    this._log.trace("Performing queued operation.");

trace("_popAndPerformQueuedOperation ...

@@ +1482,5 @@
> +    try {
> +      this._queuedInProgress = true;
> +      promise = func();
> +    } catch (ex) {
> +      this._log.warn("Queued operation threw during execution. ", ex);

warn("_popAndPerformQueuedOperation - ...

@@ +1491,5 @@
> +    }
> +
> +    if (!promise || typeof(promise.then) != "function") {
> +      let msg = "Queued operation did not return a promise: " + func;
> +      this._log.warn(msg);

warn("_popAndPerformQueuedOperation - " + msg)

@@ +1500,5 @@
> +      return;
> +    }
> +
> +    promise.then(result => {
> +        this._log.trace("Queued operation completed.");

trace("_popAndPerformQueuedOperation - ...

@@ +1506,5 @@
> +        resolve(result);
> +        this._popAndPerformQueuedOperation();
> +      },
> +      error => {
> +        this._log.warn("Failure when performing queued operation.", error);

warn("_popAndPerformQueuedOperation - ...
Attachment #8567167 - Flags: feedback+
Attachment #8567167 - Flags: review?(gfritzsche)
Attachment #8567809 - Attachment is obsolete: true
Attachment #8567853 - Flags: review?(gfritzsche)
Attached patch part 2 - Update the tests (v2) (obsolete) — Splinter Review
Thanks Georg for the review. I've added some more data sanity check as you suggested.
Attachment #8566002 - Attachment is obsolete: true
Attachment #8567880 - Flags: review?(gfritzsche)
Comment on attachment 8567853 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs (v3)

Review of attachment 8567853 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +219,5 @@
>      return [parseInt(io.readBytes), parseInt(io.writeBytes)];
>    }
>  };
>  
> +let SerialScheduler = {

Given that we only use it for one thing, lets name it explicitly, e.g. StateSaveSerializer.

@@ +220,5 @@
>    }
>  };
>  
> +let SerialScheduler = {
> +  // To prevent race conditions, we serialise the execution.

That comment should go before |let SerialScheduler ...| and could also be more descriptive:
* serialization of async tasks
* problem this is solving

@@ +229,5 @@
> +  /**
> +   * Enqueues an operation to a list to serialise their execution in order to prevent race
> +   * conditions. Useful to serialise access to disk.
> +   *
> +   * @param {Function} aFunction The task function to

"task function to"? What is the expected behavior of the function?
It definitely needs to return a promise.

@@ +422,5 @@
>    _hasXulWindowVisibleObserver: false,
>    _startupIO : {},
>    // The previous build ID, if this is the first run with a new build.
>    // Undefined if this is not the first run, or the previous build ID is unknown.
> +  _previousBuildId: undefined,

Can't we just set this to null too?

@@ +429,5 @@
>    // where source is a weak reference to the child process,
>    // and payload is the telemetry payload from that child process.
>    _childTelemetry: [],
> +  // Generate a unique id once per session so the server can cope with
> +  // duplicate submissions. This is shared by subsessions.

We can add a more general info about session & subsession ids here.
They are not only for duplicate submissions, but also to deal with orphaning and other oddities.

@@ +716,2 @@
>        revision: HISTOGRAMS_FILE_VERSION,
> +      timezoneOffset: this._sessionStartDate.getTimezoneOffset(),

Note that Date.getTimezoneOffset() unintuitively returns negative values if you're ahead of UTC and vice versa (e.g. -60 for UTC+1).
We should invert the sign here.

Also, the offset can change between subsessions, let's submit the _subsessionStartDate offset.

@@ +1056,5 @@
>      this._thirdPartyCookies.init();
>  
>      // Record old value and update build ID preference if this is the first
>      // run with a new build ID.
> +    let previousBuildId = Preferences.get(PREF_PREVIOUS_BUILDID, undefined);

Why not null?

@@ +1478,5 @@
> +          typeof(data.profileSubsessionCounter) == "number" &&
> +          "previousSubsessionId" in data) {
> +        this._previousSubsessionId = data.previousSubsessionId;
> +        // Add |_subsessionCounter| to the |_profileSubsessionCounter| to account for
> +        // new subsession while loading still takes place.

Side-note: That will always be exactly 1 - the current subsessions. Could be more explicit about that.

@@ +1485,5 @@
> +        return data;
> +      }
> +    } catch (e) {
> +      // fall through to next option
> +      this._log.info("_loadSessionData - Cannot load session data file.", e);

Nit: No trailing dot - |log.info("foo", e)| produces something like "foo: <error info>".

@@ +1491,5 @@
> +
> +    // Wait on persisting the data.
> +    let sessionData = this._getSessionDataObject();
> +    let writePromise = SerialScheduler.enqueueTask(() => this._saveSessionData(sessionData));
> +    yield writePromise;

It's confusing and unexpected to do this in a _load* function.
Let's do that explicitly in the delayed init task, right after _load*.

Be sure to gracefully handle rejection there as we might always run into oddities like AV software blocking access to the file.
Note that we don't need to go through the SerialScheduler when moving it there as we don't expect anything else there.

@@ +1514,5 @@
> +    let dataDir = OS.Path.join(OS.Constants.Path.profileDir, "datareporting");
> +    yield OS.File.makeDir(dataDir);
> +
> +    let filePath = OS.Path.join(dataDir, SESSION_STATE_FILE_NAME);
> +    yield CommonUtils.writeJSON(sessionData, filePath);

We should at least log on error here with an explicit mention of the file path.
Attachment #8567853 - Flags: review?(gfritzsche) → feedback+
* Renames SerialScheduler and adds more comments.
* Removes the fallback session data save from _loadSessionData.
* Adds and fixes some comments.
Attachment #8567853 - Attachment is obsolete: true
Attachment #8567946 - Flags: review?(gfritzsche)
Comment on attachment 8567946 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs (v4)

Review of attachment 8567946 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +222,5 @@
>  
> +/**
> + * This object allows the serialisation of asynchronous tasks. This is particularly
> + * useful to serialise write access to the disk in order to prevent race conditions
> + * to corrupt the data being written.

Add a line like:
We are using this to synchronize saving to the file that TelemetrySession persists its state in.

@@ +224,5 @@
> + * This object allows the serialisation of asynchronous tasks. This is particularly
> + * useful to serialise write access to the disk in order to prevent race conditions
> + * to corrupt the data being written.
> + */
> +let StateSaveSerializer = {

Sorry for missing this before, but this should be gStateSaveSerializer.

@@ +433,5 @@
>    // where source is a weak reference to the child process,
>    // and payload is the telemetry payload from that child process.
>    _childTelemetry: [],
> +  // Generate a unique id once per session so the server can cope with
> +  // duplicate submissions, orphaning and other oddities. This is shared by subsessions.

Nit: "The id is shared across subsessions."

@@ +1477,5 @@
>    },
>  
> +  /**
> +   * Loads session data from the session data file, set their values in TelemetrySession
> +   * and return an object containing the data.

Nit: "sets the values", "returns".
Why do we set the values & return them? We don't need the returned value now, lets skip the returning for now.

@@ +1480,5 @@
> +   * Loads session data from the session data file, set their values in TelemetrySession
> +   * and return an object containing the data.
> +   *
> +   * @return {Promise<Object>} A promise which is resolved when loading has completed. The
> +   *                           loaded data is returned. Returns null if data file can't be

Nit: "if the data file"
Attachment #8567946 - Flags: review?(gfritzsche) → review+
Comment on attachment 8567946 [details] [diff] [review]
part 1 - Change TelemetrySession.jsm and update the docs (v4)

Vladan, you might want to take a look here.
The most interesting part here is that we have to work with persistant data now, so we have to use save the TelemetrySession state to a disk and load it on (delayed) init.
Attachment #8567946 - Flags: feedback?(vdjeric)
Comment on attachment 8567880 [details] [diff] [review]
part 2 - Update the tests (v2)

Review of attachment 8567880 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good if we can get the below addressed.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
@@ +120,5 @@
>    let session = Cu.import("resource://gre/modules/TelemetrySession.jsm");
>    session.Policy.now = () => date;
>  }
>  
> +function fakeGeneraeUUID(generatorFunc) {

typo: fakeGenerateUUID

@@ +584,5 @@
> +  const baseUUID = "009fd1ad-b85e-4817-b3e5-00000000000";
> +  let uuidIndex = 0;
> +  fakeGeneraeUUID(() => baseUUID + uuidIndex++);
> +  // A function to extract an index out of a fake UUID.
> +  let getExpectedIndex = uuid => parseInt(uuid.replace(baseUUID, ""), 10);

getUUIDIndex? "expected" is used for "the values we expect the payload to have" in these tests, not for the actual payload values.

@@ +605,5 @@
> +  Assert.equal(getExpectedIndex(payload.info.sessionId), 0,
> +               "The fake session UUID must have an index of 0.");
> +  // The subsessionId is the second UUID requested
> +  Assert.equal(getExpectedIndex(payload.info.subsessionId), 1,
> +               "The fake subsession UUID must have an index of 1.");

Hm, that seems pretty fragile. I guess we could live with that for a bit, but we should really fix that.
How about just using different policy functions for session and subsession id?
That way we know exactly which ones we will end up with.

@@ +1017,5 @@
>    let ping = TelemetryFile.popPendingPings().next();
>    Assert.equal(ping.value.clientId, gDataReportingClientID);
>  });
>  
> +add_task(function* test_savedSessionData() {

Can we also test loading an invalid state file please?
Attachment #8567880 - Flags: review?(gfritzsche) → review+
Fixed the comments and changed the behaviour of |_loadSessionData|.
Attachment #8567946 - Attachment is obsolete: true
Attachment #8567946 - Flags: feedback?(vdjeric)
Attachment #8568039 - Flags: review+
Attachment #8568039 - Flags: feedback?(vdjeric)
Added test coverage for loading an invalid session data file. Changed |fakeGenerateUUID| to allow for less fragile tests.
Attachment #8567880 - Attachment is obsolete: true
Attachment #8568041 - Flags: review+
Whiteboard: [ready]
Rebase.
Attachment #8568039 - Attachment is obsolete: true
Attachment #8568039 - Flags: feedback?(vdjeric)
Attachment #8568528 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/a410838a0f98
https://hg.mozilla.org/mozilla-central/rev/f3d186d21ca1
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Depends on: 1145980
You need to log in before you can comment on or make changes to this bug.