Closed Bug 1405299 Opened 3 years ago Closed 3 years ago

TelemetryStopwatch errors (and others) if user accepts Data Submission Policy before TelemetrySend::setup

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

Details

Attachments

(3 files)

Attached file tracelog of failure
From bug 1404332 comment 10:
The steps to reproduce are:
1. Open FF with a new profile.
2. Set extensions.shield-recipe-client.user_id pref to the string "pioneer-quick-timer-2".
3. Set toolkit.telemetry.log.level - Trace
4. Set "extensions.shield-recipe-client.dev_mode" pref to true
5. Restart FF.
6. Open Browser console.

See attached tracelog for the result.

Basically it seems as though 

TelemetryReportingPolicy._userNotified() calls
  TelemetrySend.notifyCanUpload() which calls
    Scheduler.triggerSendingPings(true) directly which spins
      _doSendTask which goes ahead and tries to send 3 pending pings, re-entrantly

Then TelemetryStopwatch has a fit (because of the re-entrant calls to TelemetryStorage::loadPendingPing)
Later we get an XHR error because TelemetrySend hasn't gotten around to pulling the server host from the prefs yet.

It seems as though, once the server is read from prefs, it quiets itself down and resumes normal behaviour... But we should still test and fix this behaviour.
(In reply to Chris H-C :chutten from comment #0)
> Basically it seems as though 
> 
> TelemetryReportingPolicy._userNotified() calls
>   TelemetrySend.notifyCanUpload() which calls
>     Scheduler.triggerSendingPings(true) directly which spins
>       _doSendTask which goes ahead and tries to send 3 pending pings,
> re-entrantly
>
> ...
> 
> It seems as though, once the server is read from prefs, it quiets itself
> down and resumes normal behaviour... But we should still test and fix this
> behaviour.

Good catch with this! First, I think this is not a Big Deal (tm). It looks like the shield study is submitting pings before Telemetry is fully initialized (and that's ok) and before the reporting policy is displayed (ok again).

Once the policy is notified to the user, we're good to send pings, and |notifyCanUpload()| is doing exactly that: kicking off sending the pings. If, however, TelemetrySend has not completed init yet [1], then we end up with no valid value for |_server|. As a consequence, we see the errors there: not a big deal, since once init completes, the real server gets there and pings get sent.

As you mentioned we should still test and fix this issue. There may be different ways:

- We could move reading the _server pref to |earlyInit|. We're already reading the pref in |annotatCrashReport| anyway.
- Change |notifyCanUpload| so that nothing happens if we're still initializing [2]. We could check the |_sendingEnabled| pref, that could be initialized at the end of the |setup|.

As for the stopwatch, I think this issue uncovered a bug :-D The error we're seeing is due to calling |TelemetryStopwatch.start()| on the same histogram without a corresponding |cancel| or |stop| call.

By looking at the trace it seems that we're loading 3 pending pings one after the other and |loadPendingPing| is async. This means that the stopwatch is internally referring to the "same timer" for the different calls. We could fix that by calling |start|/|stop|/|cancel| by passing the ping id as the |aObj| parameter [4].

[1] - http://searchfox.org/mozilla-central/rev/a4702203522745baff21e519035b6c946b7d710d/toolkit/components/telemetry/TelemetrySend.jsm#661
[2] - http://searchfox.org/mozilla-central/rev/a4702203522745baff21e519035b6c946b7d710d/toolkit/components/telemetry/TelemetrySend.jsm#802
[3] - http://searchfox.org/mozilla-central/rev/a4702203522745baff21e519035b6c946b7d710d/toolkit/components/telemetry/TelemetryStorage.jsm#1343
[4] - http://searchfox.org/mozilla-central/rev/a4702203522745baff21e519035b6c946b7d710d/toolkit/components/telemetry/TelemetryStopwatch.jsm#127
Assignee: nobody → chutten
Comment on attachment 8917544 [details]
bug 1405299 - Key loadArchivedPing's stopwatches for re-entrancy

https://reviewboard.mozilla.org/r/188512/#review193872

::: toolkit/components/telemetry/TelemetryStorage.jsm:706
(Diff revision 1)
>     *
>     * @param {string} id The pings id.
>     * @return {promise<object>} Promise that is resolved with the ping data.
>     */
>    async loadArchivedPing(id) {
> -    TelemetryStopwatch.start("TELEMETRY_ARCHIVE_LOAD_MS");
> +    let objId = new String(id);

nit: what do you think about changing `objId` to `idAsObject` or similar? I don't mind too much, so feel free to ignore this if it's too nitpicky
Attachment #8917544 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8917545 [details]
bug 1405299 - Do not attempt to upload pings if sending is not yet enabled

https://reviewboard.mozilla.org/r/188514/#review193876

::: toolkit/components/telemetry/TelemetrySend.jsm:800
(Diff revision 1)
>  
>    /**
>     * Notify that we can start submitting data to the servers.
>     */
>    notifyCanUpload() {
> +    if (!this._sendingEnabled) {

`_sendingEnabled` is still flipped before the server pref is set (see [here](http://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/toolkit/components/telemetry/TelemetrySend.jsm#656,661)). Maybe we should make sure it only gets set to `true` when appropriate.

::: toolkit/components/telemetry/TelemetrySend.jsm:801
(Diff revision 1)
>    /**
>     * Notify that we can start submitting data to the servers.
>     */
>    notifyCanUpload() {
> +    if (!this._sendingEnabled) {
> +      return Promise.resolve();

Let's add a log line here.
Comment on attachment 8917545 [details]
bug 1405299 - Do not attempt to upload pings if sending is not yet enabled

https://reviewboard.mozilla.org/r/188514/#review193876

> `_sendingEnabled` is still flipped before the server pref is set (see [here](http://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/toolkit/components/telemetry/TelemetrySend.jsm#656,661)). Maybe we should make sure it only gets set to `true` when appropriate.

There's no async/wait between _sendingEnabled and the server being set, so it can be treated atomically.

By that same argument, I have no problems moving _sendingEnabled = true down a few lines.
Comment on attachment 8917544 [details]
bug 1405299 - Key loadArchivedPing's stopwatches for re-entrancy

https://reviewboard.mozilla.org/r/188512/#review193872

> nit: what do you think about changing `objId` to `idAsObject` or similar? I don't mind too much, so feel free to ignore this if it's too nitpicky

I like yours better, so I fixed it.
Comment on attachment 8917545 [details]
bug 1405299 - Do not attempt to upload pings if sending is not yet enabled

https://reviewboard.mozilla.org/r/188514/#review194026
Attachment #8917545 - Flags: review?(alessio.placitelli) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e993235fbdf3
Key loadArchivedPing's stopwatches for re-entrancy r=Dexter
https://hg.mozilla.org/integration/autoland/rev/506d96e0b240
Do not attempt to upload pings if sending is not yet enabled r=Dexter
But I _have_ to. :(

I'll come up with something.
Flags: needinfo?(chutten)
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2390464f1ef
Key loadArchivedPing's stopwatches for re-entrancy r=Dexter
https://hg.mozilla.org/integration/autoland/rev/0fd360d7ba33
Do not attempt to upload pings if sending is not yet enabled r=Dexter
https://hg.mozilla.org/mozilla-central/rev/f2390464f1ef
https://hg.mozilla.org/mozilla-central/rev/0fd360d7ba33
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.