Closed Bug 1120362 Opened 5 years ago Closed 5 years ago

Break up Telemetry sessions in regular intervals

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set

Tracking

(firefox39 fixed)

RESOLVED FIXED
Firefox 39
Tracking Status
firefox39 --- fixed

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

Details

(Whiteboard: [ready])

Attachments

(4 files, 12 obsolete files)

9.99 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
7.17 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
17.90 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
29.14 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
We want to break up Telemetry sessions in regular intervals.

The current plan is:
* 24h sessions
* break them up e.g. midnight local time, so its not in the middle of sessions for everyone
Blocks: 1120363
Depends on: 1122061
Depends on: 1122063
No longer blocks: 1069869
(In reply to Vladan Djeric (:vladan) -- please needinfo! from bug 1127914, comment #23)
> > > which code is going to call clear() on the subsession histograms?
> > 
> > This will happen in bug 1120362, with the session split.
> 
> Ok, you'll want the histogram clear() to happen as close as possible to the
> snapshotting.. the more time elapses between capturing the snapshot and
> resetting the histograms, the more data gets lost.  so  for example you
> wouldn't want to have any yields from a Task in between the snapshot & clear

(Vladan Djeric (:vladan) -- please needinfo! from bug 1127914, comment #24)
> .. a snapshotAndClear method would be ideal
Blocks: 1133536
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
Attachment #8565225 - Flags: review?(vdjeric)
Note:
The Policy object introduced here into TelemetrySession.jsm is for overriding behavior that would otherwise be hard to test.
Especially timers and expected dates would be crude and/or unpredictable without this.
Rebase.
Attachment #8565225 - Attachment is obsolete: true
Attachment #8565225 - Flags: review?(vdjeric)
Attachment #8565723 - Flags: review?(vdjeric)
Rebase.
Attachment #8565226 - Attachment is obsolete: true
Attachment #8565226 - Flags: review?(vdjeric)
Attachment #8565724 - Flags: review?(vdjeric)
Rebase.
Attachment #8565227 - Attachment is obsolete: true
Attachment #8565227 - Flags: review?(vdjeric)
Attachment #8565726 - Flags: review?(vdjeric)
Attachment #8565228 - Attachment is obsolete: true
Attachment #8565228 - Flags: review?(vdjeric)
Attachment #8565727 - Flags: review?(vdjeric)
Attachment #8565727 - Attachment is obsolete: true
Attachment #8565727 - Flags: review?(vdjeric)
Attachment #8565993 - Flags: review?(vdjeric)
Comment on attachment 8565723 [details] [diff] [review]
Part 1 - Enable snapshotting and clearing subsession histograms

Review of attachment 8565723 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +453,5 @@
>      this._log.trace("getHistograms");
>  
>      let registered =
>        Telemetry.registeredHistograms(Ci.nsITelemetry.DATASET_RELEASE_CHANNEL_OPTIN, []);
> +    let hls = subsession ? Telemetry.snapshotSubsessionHistograms(false)

What's the use case for calling getHistograms() to get subsession data but not resetting? Is it about:telemetry?

::: toolkit/components/telemetry/nsITelemetry.idl
@@ +62,3 @@
>     */
>    [implicit_jscontext]
> +  jsval snapshotSubsessionHistograms([optional] in boolean clear);

is clear guaranteed to be false when a parameter isn't passed?
Attachment #8565723 - Flags: review?(vdjeric) → review+
Comment on attachment 8565724 [details] [diff] [review]
Part 2 - Enable snapshotting and clearing keyed subsession histograms

Review of attachment 8565724 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Telemetry.cpp
@@ +1524,5 @@
>      return NS_ERROR_FAILURE;
>    if (!(JS_DefineFunction(cx, obj, "add", JSKeyedHistogram_Add, 2, 0)
>          && JS_DefineFunction(cx, obj, "snapshot", JSKeyedHistogram_Snapshot, 1, 0)
>          && JS_DefineFunction(cx, obj, "subsessionSnapshot", JSKeyedHistogram_SubsessionSnapshot, 1, 0)
> +        && JS_DefineFunction(cx, obj, "snapshotSubsessionAndClear", JSKeyedHistogram_SnapshotSubsessionAndClear, 0, 0)

snapshotSubsessionAndClear is ugly, but hopefully we'll be able to get rid of it soon
Attachment #8565724 - Flags: review?(vdjeric) → review+
Comment on attachment 8565726 [details] [diff] [review]
Part 3  - Reset subsession histograms on telemetry payload collections

Review of attachment 8565726 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +118,5 @@
> +function truncateToDays(date) {
> +  return new Date(date.getFullYear(),
> +                  date.getMonth(),
> +                  date.getDate(),
> +                  0, 0, 0, 0);

you could also do: new Date().setHours(0, 0, 0, 0)

@@ +567,5 @@
>        appUpdateChannel: UpdateChannel.get(), // TODO: In Environment, but needed for |submissionPath|
>        platformBuildID: ai.platformBuildID,
>        revision: HISTOGRAMS_FILE_VERSION,
> +
> +      subsessionStartDate: truncateToDays(this._subsessionStartDate).toISOString(),

nit: might as well have the function you're calling (truncateToDays) do all the formatting (days resolution + ISOString)

@@ +792,4 @@
>      return payloadObj;
>    },
>  
> +  getSessionPayload: function getSessionPayload(reason, clearSubsession = true) {

let's always use the same default value for the "clear" parameter (e.g. getPayload).. if the clear argument is not provided, it should be "false"

@@ +798,5 @@
>      let measurements = this.getSimpleMeasurements(reason == "saved-session");
>      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
> +    let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
> +
> +    this._subsessionStartDate = Policy.now();

- should we be declaring a new subsession even if clearSubsession = false?
- we're going to have some duplicate data in the content process ping.. it's ok for now though

@@ +878,5 @@
>  
>      this._log.trace("setupChromeProcess");
>  
> +    this._sessionStartDate = Policy.now();
> +    this._subsessionStartDate = this._sessionStartDate;

these are technically times, not dates :)
Attachment #8565726 - Flags: review?(vdjeric) → review+
Comment on attachment 8565993 [details] [diff] [review]
Part 4 - Start new telemetry subsessions on local midnight

Review of attachment 8565993 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +32,3 @@
>  const RETENTION_DAYS = 14;
>  
> +const REASON_DAILY = "daily";

make the other reasons into consts as well?

@@ +35,5 @@
> +
> +const SEC_IN_ONE_DAY  = 24 * 60 * 60;
> +const MS_IN_ONE_DAY   = SEC_IN_ONE_DAY * 1000;
> +
> +const MIN_SUBSESSION_LENGTH = 10 * 60 * 1000;

nit: MIN_SUBSESSION_LENGTH_MS

@@ +141,5 @@
> +    while (number.length < places) {
> +      number = "0" + number;
> +    }
> +    return number;
> +  }

there are nicer ways to pad, e.g. 

let num = String(number);
return Array(places - num.length + 1).join("0") + num;

@@ +840,5 @@
>      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
>      let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
>  
>      this._subsessionStartDate = Policy.now();
> +    this._scheduleDailyTimer();

should every call to getSessionPayload really reschedule the daily timer? would about:telemetry reach this code path?

@@ +966,5 @@
>        this.attachObservers();
>        this.gatherMemory();
>  
>        Telemetry.asyncFetchTelemetryData(function () {});
> +      this._scheduleDailyTimer();

can you document the scheduling of this timer in your docs?

@@ +1246,5 @@
>      }
>      return Promise.resolve();
>    },
> +
> +  _scheduleDailyTimer: function() {

maybe call it rescheduleDailyTimer

@@ +1248,5 @@
>    },
> +
> +  _scheduleDailyTimer: function() {
> +    if (this._dailyTimerId) {
> +      Policy.clearDailyTimeout(this._dailyTimerId);

please log this action

@@ +1272,5 @@
> +      retentionDays: RETENTION_DAYS,
> +      addClientId: true,
> +      addEnvironment: true,
> +    };
> +    let promise = TelemetryPing.send(PING_TYPE_MAIN, payload, options);

we're not waiting for idle to send pings anymore?

some of the payload collection does heavy operations.. it was ok because it happened on idle-daily until now
Attachment #8565993 - Flags: review?(vdjeric)
And we already have idle-daily implemented in our code base.. we're re-implementing it here (minus the idle) :(
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #12)
> Comment on attachment 8565723 [details] [diff] [review]
> Part 1 - Enable snapshotting and clearing subsession histograms
> 
> Review of attachment 8565723 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetrySession.jsm
> @@ +453,5 @@
> >      this._log.trace("getHistograms");
> >  
> >      let registered =
> >        Telemetry.registeredHistograms(Ci.nsITelemetry.DATASET_RELEASE_CHANNEL_OPTIN, []);
> > +    let hls = subsession ? Telemetry.snapshotSubsessionHistograms(false)
> 
> What's the use case for calling getHistograms() to get subsession data but
> not resetting? Is it about:telemetry?

Yes.

> ::: toolkit/components/telemetry/nsITelemetry.idl
> @@ +62,3 @@
> >     */
> >    [implicit_jscontext]
> > +  jsval snapshotSubsessionHistograms([optional] in boolean clear);
> 
> is clear guaranteed to be false when a parameter isn't passed?

Yes.
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #14)
> @@ +798,5 @@
> >      let measurements = this.getSimpleMeasurements(reason == "saved-session");
> >      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
> > +    let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
> > +
> > +    this._subsessionStartDate = Policy.now();
> 
> - should we be declaring a new subsession even if clearSubsession = false?

No, good point. We will be addressing that in bug 1131544.
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #15)
> @@ +141,5 @@
> > +    while (number.length < places) {
> > +      number = "0" + number;
> > +    }
> > +    return number;
> > +  }
> 
> there are nicer ways to pad, e.g. 
> 
> let num = String(number);
> return Array(places - num.length + 1).join("0") + num;

That looks nicer, but is apparently actually usually slower then the plain looping above.
Not that i didn't do real testing here - do you have a preferences for this?

> @@ +840,5 @@
> >      let info = !IS_CONTENT_PROCESS ? this.getMetadata(reason) : null;
> >      let payload = this.assemblePayloadWithMeasurements(measurements, info, reason, clearSubsession);
> >  
> >      this._subsessionStartDate = Policy.now();
> > +    this._scheduleDailyTimer();
> 
> should every call to getSessionPayload really reschedule the daily timer?
> would about:telemetry reach this code path?

No, getting fixed in bug 1131544.

> @@ +1272,5 @@
> > +      retentionDays: RETENTION_DAYS,
> > +      addClientId: true,
> > +      addEnvironment: true,
> > +    };
> > +    let promise = TelemetryPing.send(PING_TYPE_MAIN, payload, options);
> 
> we're not waiting for idle to send pings anymore?
> 
> some of the payload collection does heavy operations.. it was ok because it
> happened on idle-daily until now

Ok, we want to do the data-collection at regular intervals - i think we have conflicting goals there (turnaround vs. possible perf issues).
Can we move that to a follow-up bug for discussing next week?
(In reply to Georg Fritzsche [:gfritzsche] from comment #19)
> > @@ +1272,5 @@
> > > +      retentionDays: RETENTION_DAYS,
> > > +      addClientId: true,
> > > +      addEnvironment: true,
> > > +    };
> > > +    let promise = TelemetryPing.send(PING_TYPE_MAIN, payload, options);
> > 
> > we're not waiting for idle to send pings anymore?
> > 
> > some of the payload collection does heavy operations.. it was ok because it
> > happened on idle-daily until now
> 
> Ok, we want to do the data-collection at regular intervals - i think we have
> conflicting goals there (turnaround vs. possible perf issues).
> Can we move that to a follow-up bug for discussing next week?
Flags: needinfo?(vdjeric)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #16)
> And we already have idle-daily implemented in our code base.. we're
> re-implementing it here (minus the idle) :(

That is not giving us fixed times though, right?
Cleared up the above in a chat - we will go forward with this and consider better-performing approaches later after the first landing.
Flags: needinfo?(vdjeric)
Attachment #8565993 - Attachment is obsolete: true
Attachment #8567231 - Flags: review?(vdjeric)
> That looks nicer, but is apparently actually usually slower then the plain
> looping above.
> Not that i didn't do real testing here - do you have a preferences for this?

I guess it makes sense that it would be slower. I don't think performance is a consideration for such rarely-executed code though, so let's go with the shorter version. 

> Ok, we want to do the data-collection at regular intervals - i think we have
> conflicting goals there (turnaround vs. possible perf issues).
> Can we move that to a follow-up bug for discussing next week?

Sure. But let's not uplift to Aurora until we have all the big correctness & performance issues resolved.
Comment on attachment 8567231 [details] [diff] [review]
Part 4 - Start new telemetry subsessions on local midnight

Review of attachment 8567231 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +840,5 @@
>  
> +    if (clearSubsession) {
> +      this._subsessionStartDate = Policy.now();
> +    }
> +    this._rescheduleDailyTimer();

this line is getting fixed in bug 1131544?

@@ +1266,5 @@
> +                    + ", scheduled: " + new Date(now.getTime() + msUntilCollection));
> +    this._dailyTimerId = Policy.setDailyTimeout(() => this._onDailyTimer(), msUntilCollection);
> +  },
> +
> +  _onDailyTimer: function() {

check for shutdown?
Attachment #8567231 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #28)
> ::: toolkit/components/telemetry/TelemetrySession.jsm
> @@ +840,5 @@
> >  
> > +    if (clearSubsession) {
> > +      this._subsessionStartDate = Policy.now();
> > +    }
> > +    this._rescheduleDailyTimer();
> 
> this line is getting fixed in bug 1131544?

I though about moving it into the block above, but think it doesn't hurt to do a timer scheduling check.
I can move it up though if you prefer.
Adopted above changes.
Attachment #8567231 - Attachment is obsolete: true
Attachment #8567911 - Flags: review+
Whiteboard: [ready]
Rebase.
Attachment #8567911 - Attachment is obsolete: true
Attachment #8568526 - Flags: review+
Depends on: 1154228
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.