If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Try to collect data for Telemetry pings when the user is idle

RESOLVED FIXED in Firefox 39

Status

()

Toolkit
Telemetry
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: vladan, Assigned: gfritzsche)

Tracking

Trunk
mozilla40
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 unaffected, firefox39 fixed, firefox40 fixed)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

3 years ago
Telemetry can cause jank when collecting certain types of data (e.g. gatherMemory() afair), so we should try to do collect data when the user is idle.

So for example, instead of collecting the "daily" ping at midnight local time, we could schedule a timer for 11pm local time and then collect data when we get the next "user-idle" notification. If the user isn't idle for the entire hour, then we can collect it at midnight as originally intended.

Comment 1

3 years ago
gatherMemory is only for telemetry not FHR, correct? I don't think this blocks release.
Flags: needinfo?(vdjeric)
(Reporter)

Comment 2

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> gatherMemory is only for telemetry not FHR, correct? I don't think this
> blocks release.

Hmm, are you suggesting we don't call gatherMemory() for the "main" ping type? Right now the classic saved-session payload has all the same measurements (including gatherMemory) as any "main" payload (daily, shutdown, environment-changed)
Flags: needinfo?(vdjeric)

Comment 3

3 years ago
No I'm suggesting that we shouldn't be calling gatherMemory for release users, only those with telemetry enabled.
(Reporter)

Comment 4

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> No I'm suggesting that we shouldn't be calling gatherMemory for release
> users, only those with telemetry enabled.

Yeah we should classify all measurements (not just Histograms.json histograms) as optInOnRelease/optOutOnRelease.
Depends on bug 1133536 for the scheduler implementation.
Depends on: 1133536
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #4)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> > No I'm suggesting that we shouldn't be calling gatherMemory for release
> > users, only those with telemetry enabled.
> 
> Yeah we should classify all measurements (not just Histograms.json
> histograms) as optInOnRelease/optOutOnRelease.

That is bug 1122062.
Blocks: 1143796
Assignee: nobody → alessio.placitelli
Assignee: alessio.placitelli → nobody
Assignee: nobody → gfritzsche
Created attachment 8581923 [details] [diff] [review]
Collect daily ping on user idle
Attachment #8581923 - Flags: review?(vdjeric)
(Reporter)

Comment 8

3 years ago
Can you do a diff vs m-c?
Flags: needinfo?(gfritzsche)
(Reporter)

Comment 9

3 years ago
Oh I see, this stacks on top of bug 1140037. Nevermind
Flags: needinfo?(gfritzsche)
No longer blocks: 1143796
Status: NEW → ASSIGNED
Depends on: 1143796
No longer depends on: 1133536
Version: unspecified → Trunk
Created attachment 8582443 [details] [diff] [review]
Collect daily ping on user idle

Note that i moved the "daily due" logic refactoring over here.

This stacks on bug 1143796 (and therefore bug 1133536).
Attachment #8581923 - Attachment is obsolete: true
Attachment #8581923 - Flags: review?(vdjeric)
Attachment #8582443 - Flags: review?(vdjeric)
Created attachment 8583137 [details] [diff] [review]
Collect daily ping on user idle

Rebased, now part of this stack: https://pastebin.mozilla.org/8827100
Attachment #8582443 - Attachment is obsolete: true
Attachment #8582443 - Flags: review?(vdjeric)
Attachment #8583137 - Flags: review?(vdjeric)
(Reporter)

Comment 12

3 years ago
Comment on attachment 8583137 [details] [diff] [review]
Collect daily ping on user idle

Review of attachment 8583137 [details] [diff] [review]:
-----------------------------------------------------------------

What's the idle tick interval? If it's one hour (or gets changed to one hour at some point), and the user goes idle at 23:40, the next tick won't occur until 0:40. rescheduleTimeout should take into account next midnight when scheduling an idle tick

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +451,5 @@
>      if (this._schedulerTimer) {
>        Policy.clearSchedulerTickTimeout(this._schedulerTimer);
>      }
>  
> +    const interval = this._isUserIdle ? SCHEDULER_TICK_IDLE_INTERVAL_MS :

SCHEDULER_TICK_IDLE_INTERVAL_MS is defined in another bug?

@@ +468,5 @@
> +    // This is today's date and also the previous midnight (0:00).
> +    const todayDate = truncateToDays(nowDate);
> +    const nearestMidnight = getNearestMidnight(nowDate);
> +    // If we are close to midnight, we check against that, otherwise against the last midnight.
> +    const checkDate = !nearestMidnight ? todayDate : nearestMidnight;

don't negate the condition, makes it harder to read

@@ +478,5 @@
> +      this._log.trace("_isDailyPingDue - already sent one today");
> +      return false;
> +    }
> +
> +    if (!nearestMidnight) {

can you make this into: 

if (sentPingToday) {
} else if (!sentPingToday && !nearestMidnight) { <-- redundancy for readability
} else {
}

right now, it's hard to reason about what the conditions are when the idleness check is done at line 502

@@ +485,4 @@
>        return true;
>      }
>  
> +    if (nowDate.getTime() < (nearestMidnight.getTime() - SCHEDULER_MIDNIGHT_TOLERANCE_MS)) {

getNearestMidnight already does this check, it would return null if nowDate weren't within tolerance of midnight

@@ +548,5 @@
>     */
>    _onSchedulerTick: function() {
>      if (this._shuttingDown) {
>        this._log.warn("_onSchedulerTick - already shutdown.");
> +      return Promise.reject(new Error("Already shutdown."));

good catch

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
@@ +975,5 @@
>    fakeSchedulerTimer(callback => schedulerTickCallback = callback, () => {});
>    yield TelemetrySession.setup();
>  
> +  // Make sure the daily ping gets triggered at midnight.
> +  let firstDailyDue = new Date(2035, 1, 2, 0, 0, 0);

is this because of idleness detection? document it

@@ +996,5 @@
>      Assert.ok(false, "No more daily pings should be sent/received in this test.");
>    });
>  
>    // Set the current time to a bit after midnight.
> +  let secondDailyDue = futureDate(firstDailyDue, 2 * MS_IN_ONE_HOUR);

this is a different test than the old one.. the old one was checking that we don't send 2 daily pings for the same midnight
Attachment #8583137 - Flags: review?(vdjeric)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #12)
> Comment on attachment 8583137 [details] [diff] [review]
> Collect daily ping on user idle
> 
> Review of attachment 8583137 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> What's the idle tick interval? If it's one hour (or gets changed to one hour
> at some point), and the user goes idle at 23:40, the next tick won't occur
> until 0:40. rescheduleTimeout should take into account next midnight when
> scheduling an idle tick

Good point, fixed and test coverage added.

> ::: toolkit/components/telemetry/TelemetrySession.jsm
> @@ +451,5 @@
> >      if (this._schedulerTimer) {
> >        Policy.clearSchedulerTickTimeout(this._schedulerTimer);
> >      }
> >  
> > +    const interval = this._isUserIdle ? SCHEDULER_TICK_IDLE_INTERVAL_MS :
> 
> SCHEDULER_TICK_IDLE_INTERVAL_MS is defined in another bug?
> 
> @@ +468,5 @@
> > +    // This is today's date and also the previous midnight (0:00).
> > +    const todayDate = truncateToDays(nowDate);
> > +    const nearestMidnight = getNearestMidnight(nowDate);
> > +    // If we are close to midnight, we check against that, otherwise against the last midnight.
> > +    const checkDate = !nearestMidnight ? todayDate : nearestMidnight;
> 
> don't negate the condition, makes it harder to read

Fixed.

> @@ +478,5 @@
> > +      this._log.trace("_isDailyPingDue - already sent one today");
> > +      return false;
> > +    }
> > +
> > +    if (!nearestMidnight) {
> 
> can you make this into: 
> 
> if (sentPingToday) {
> } else if (!sentPingToday && !nearestMidnight) { <-- redundancy for
> readability
> } else {
> }
> 
> right now, it's hard to reason about what the conditions are when the
> idleness check is done at line 502

Fixed per IRC.

> @@ +485,4 @@
> >        return true;
> >      }
> >  
> > +    if (nowDate.getTime() < (nearestMidnight.getTime() - SCHEDULER_MIDNIGHT_TOLERANCE_MS)) {
> 
> getNearestMidnight already does this check, it would return null if nowDate
> weren't within tolerance of midnight

Fixed, just stripped that part.

> ::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js
> @@ +975,5 @@
> >    fakeSchedulerTimer(callback => schedulerTickCallback = callback, () => {});
> >    yield TelemetrySession.setup();
> >  
> > +  // Make sure the daily ping gets triggered at midnight.
> > +  let firstDailyDue = new Date(2035, 1, 2, 0, 0, 0);
> 
> is this because of idleness detection? document it

Yes, documented.

> @@ +996,5 @@
> >      Assert.ok(false, "No more daily pings should be sent/received in this test.");
> >    });
> >  
> >    // Set the current time to a bit after midnight.
> > +  let secondDailyDue = futureDate(firstDailyDue, 2 * MS_IN_ONE_HOUR);
> 
> this is a different test than the old one.. the old one was checking that we
> don't send 2 daily pings for the same midnight

Fixed.
Created attachment 8584453 [details] [diff] [review]
Collect daily ping on user idle
Attachment #8583137 - Attachment is obsolete: true
Attachment #8584453 - Flags: review?(vdjeric)
Created attachment 8584456 [details] [diff] [review]
Fix aborted-session race

This fixes an aborted-session race that i ended up chasing for the tests here.

The way _sendDailyPing() was written we could end up triggering _saveAbortedSessionPing() from the .then() *after* we cleared the aborted save tasks and removed the saved aborted-session from disk in shutdownChromeProcess().
Attachment #8584456 - Flags: review?(vdjeric)
(Reporter)

Comment 16

3 years ago
Comment on attachment 8584453 [details] [diff] [review]
Collect daily ping on user idle

Review of attachment 8584453 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +459,5 @@
> +      timeout = SCHEDULER_TICK_IDLE_INTERVAL_MS;
> +      // We need to check though that we don't miss sending pings around midnight
> +      // when we use the longer idle intervals.
> +      const now = Policy.now();
> +      if (!this._sentPingToday(now)) {

this is incorrect, we might have sent a ping today because a ping was overdue, we would still need to send a ping at midnight. i say just drop this check, it's better to err on the side of caution and schedule a tick for midnight even if we did send today, _schedulerTickLogic will take care of it. let's keep all the logic for dealing with the edge cases in _schedulerTickLogic

@@ +461,5 @@
> +      // when we use the longer idle intervals.
> +      const now = Policy.now();
> +      if (!this._sentPingToday(now)) {
> +        let nextMidnight = new Date(truncateToDays(now));
> +        nextMidnight.setDate(nextMidnight.getDate() + 1);

let's add a getNextMidnight function.. and then both this code and getNearestMidnight can call it

@@ +471,5 @@
>      this._schedulerTimer =
> +      Policy.setSchedulerTickTimeout(() => this._onSchedulerTick(), timeout);
> +  },
> +
> +  _sentPingToday: function(nowDate) {

how about _sentDailyPingToday

@@ +514,5 @@
> +    // within the tolerance interval.
> +    if (!this._isUserIdle && (nowDate.getTime() < nearestMidnight.getTime())) {
> +      this._log.trace("_isDailyPingDue - waiting for user idle period");
> +      return false;
> +    }

i'm thinking we could even drop this functionality.. the amount jank caused by telemetry is relatively small, and if it only happens at midnight it's going to be rare, and postponing it 15 minutes might make much of a difference. Your call.
Attachment #8584453 - Flags: review?(vdjeric) → review+
(Reporter)

Updated

3 years ago
Attachment #8584456 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #16)
> @@ +514,5 @@
> > +    // within the tolerance interval.
> > +    if (!this._isUserIdle && (nowDate.getTime() < nearestMidnight.getTime())) {
> > +      this._log.trace("_isDailyPingDue - waiting for user idle period");
> > +      return false;
> > +    }
> 
> i'm thinking we could even drop this functionality.. the amount jank caused
> by telemetry is relatively small, and if it only happens at midnight it's
> going to be rare, and postponing it 15 minutes might make much of a
> difference. Your call.

I think we should leave as it addresses the original point of the bug.
We want to collect at/around midnight, so we can optimistically just try to collect a bit before it if we happen to find an idle time.

Comment 18

3 years ago
I just want to check something here. It's important that no matter whether the ping is started before or after midnight, that the subsessionStartDate is for the correct day, so e.g.:

Monday       Midnight      Tuesday
A          *                   B
A                     *        B

no matter exactly when we split the subsession, "A" should have subsessionStartDate=Monday and "B" should have subsessionStartDate=Tuesday.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #18)
> I just want to check something here. It's important that no matter whether
> the ping is started before or after midnight, that the subsessionStartDate
> is for the correct day, so e.g.:
> 
> Monday       Midnight      Tuesday
> A          *                   B
> A                     *        B
> 
> no matter exactly when we split the subsession, "A" should have
> subsessionStartDate=Monday and "B" should have subsessionStartDate=Tuesday.

This is not guaranteed right now, for this bug and the TelemetryScheduler coalescing.
If this is a hard requirement i'll file a phase 3 bug to deal with it.
Flags: needinfo?(benjamin)

Comment 20

3 years ago
The analysis requirement is that we associate most of the data that is collected on Monday with Monday and the data that is collected on Tuesday with Tuesday. So if we have a session that starts right before midnight but gets coalesced to last most of Tuesday, we still need to associate that with Tuesday's stats.
Flags: needinfo?(benjamin)
https://hg.mozilla.org/integration/fx-team/rev/e0ed9413cc17
https://hg.mozilla.org/integration/fx-team/rev/4d163f1939b5
Backed out for shutdown crashes:
https://hg.mozilla.org/integration/fx-team/rev/b482ca2ea198
https://hg.mozilla.org/integration/fx-team/rev/a0fae9f2db9b
Flags: needinfo?(gfritzsche)
Being handled in bug 1139460.
Flags: needinfo?(gfritzsche)
try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=256df1f0f2f6
fxteam push: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=d2567d89cda3

https://hg.mozilla.org/integration/fx-team/rev/a8edee74d07f
https://hg.mozilla.org/integration/fx-team/rev/d2567d89cda3
This and everything else from that push is backed out in https://hg.mozilla.org/integration/fx-team/rev/6901a267f856 for Windows PGO XPCShell failures:

https://treeherder.mozilla.org/logviewer.html#?job_id=2520751&repo=fx-team
Flags: needinfo?(gfritzsche)
Bug 1139460.
Flags: needinfo?(gfritzsche)
fx-team push: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=cda9f6c087a6
pgo try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cb8ce2e0436
previous full try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=256df1f0f2f6

https://hg.mozilla.org/integration/fx-team/rev/cb4d1c3b55a2
https://hg.mozilla.org/integration/fx-team/rev/cda9f6c087a6
https://hg.mozilla.org/mozilla-central/rev/cb4d1c3b55a2
https://hg.mozilla.org/mozilla-central/rev/cda9f6c087a6
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox40: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
status-firefox38: --- → unaffected
status-firefox39: --- → affected
Comment on attachment 8584453 [details] [diff] [review]
Collect daily ping on user idle

Approved for Aurora. For approval request see bug 1139460 comment 42. For approval comments see bug 1139460 comment 43.
Attachment #8584453 - Flags: approval-mozilla-aurora+
Attachment #8584456 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/221c6a458a01
https://hg.mozilla.org/releases/mozilla-aurora/rev/daeddcbe4e41
status-firefox39: affected → fixed
You need to log in before you can comment on or make changes to this bug.