Add stable user id to telemetry ping

RESOLVED FIXED in Firefox 34

Status

defect
RESOLVED FIXED
5 years ago
8 months ago

People

(Reporter: gfritzsche, Assigned: gfritzsche)

Tracking

Trunk
Firefox 36
Dependency tree / graph
Bug Flags:
firefox-backlog +
qe-verify -

Firefox Tracking Flags

(firefox33 wontfix, firefox34 fixed, firefox35 fixed, firefox36 fixed, firefox-esr31 wontfix)

Details

Attachments

(6 attachments, 12 obsolete attachments)

23.30 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
10.55 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
8.09 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
1.24 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
10.52 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
8.05 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
No description provided.
Flags: firefox-backlog+
Flags: qe-verify?
Assignee

Updated

5 years ago
Assignee: nobody → georg.fritzsche
Assignee

Updated

5 years ago
Iteration: --- → 35.1
Iteration: 35.1 → 35.2
Status: NEW → ASSIGNED
Flags: qe-verify? → qe-verify-
Assignee

Comment 1

5 years ago
First pass, how does that look?

This adds a stable client id to the telemetry payload. The FHR client id is used, if FHR is enabled.
To keep this (and upcoming later state) around between sessions, i introduced a TelemetryState module, which persists state in a JSON file in $profdir/telemetry/state.json.
If we keep it that way, we could move the saved-pings directory into that dir as well later.

I noticed that logging is mostly absent from Telemetry, is there a specific reason?
I added logging here because proper logging facilities turned out to be very helpful with the telemetry experiments feature.

I ran into test-setup issues with the fhr/datareporting part, so left this part out for this pass.
Attachment #8491599 - Flags: review?(vdjeric)
Assignee

Comment 2

5 years ago
Separated out the FHR client id part.
Attachment #8491599 - Attachment is obsolete: true
Attachment #8491599 - Flags: review?(vdjeric)
Attachment #8492130 - Flags: review?(vdjeric)
Assignee

Comment 3

5 years ago
Posted patch Expose FHR client id (obsolete) — Splinter Review
Tiny review that might be best with you Richard?
Attachment #8492132 - Flags: review?(rnewman)
Assignee

Comment 4

5 years ago
Hm, the Telemetry patch triggers this too early in tests:
Cc["@mozilla.org/datareporting/service;1"].getService(Ci.nsISupports).wrappedJSObject.healthReporter;

... via e.g.:

DataReportingService.prototype<.healthReporter@file:///Users/gfritzsche/moz/mc2/obj/dist/NightlyDebug.app/Contents/MacOS/components/DataReportingService.js:212:46
this.TelemetryState._doInit<@resource://gre/modules/TelemetryState.jsm:121:22
TaskImpl_run@resource://gre/modules/Task.jsm:314:40
TaskImpl@resource://gre/modules/Task.jsm:275:3
createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:249:14
this.TelemetryState._init@resource://gre/modules/TelemetryState.jsm:105:22
this.TelemetryState.clientID@resource://gre/modules/TelemetryState.jsm:101:12
setup/timerCallback/<@resource://gre/modules/TelemetryPing.jsm:955:9
TaskImpl_run@resource://gre/modules/Task.jsm:314:40
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:865:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:744:7
_do_main@/Users/gfritzsche/moz/mc2/testing/xpcshell/head.js:191:5
_execute_test@/Users/gfritzsche/moz/mc2/testing/xpcshell/head.js:405:5
@-e:1:1

... as seen in:
https://tbpl.mozilla.org/?tree=Try&rev=5932b7baeb3b

I'd still like review/feedback for that patch in the mean-time to see if there's bigger changes needed.
Comment on attachment 8492130 [details] [diff] [review]
Add stable id to telemetry payload

Review of attachment 8492130 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +703,5 @@
>        addonDetails: AddonManagerPrivate.getTelemetryDetails(),
>        UIMeasurements: UITelemetry.getUIMeasurements(),
>        log: TelemetryLog.entries(),
> +      info: info,
> +      clientID: this._clientID,

The call to TelemetryState.clientId happens 60 seconds after the "profile-after-change" startup event, so that leaves a period of time during which this method (assemblePayloadWithMeasurements) could be called and "this._clientID" will be null

@@ +928,2 @@
>          }
> +        yield TelemetryState._uninit();

Are you using the underscore prefix to indicate private methods?

::: toolkit/components/telemetry/TelemetryState.jsm
@@ +20,5 @@
> +const PREF_LOGGING              = "toolkit.telemetry.logging";
> +const PREF_LOGGING_LEVEL        = PREF_LOGGING + ".level";
> +const PREF_LOGGING_DUMP         = PREF_LOGGING + ".dump";
> +const STATE_DIR                 = "telemetry";
> +const STATE_FILE_NAME           = "state.json";

Is this json file eventually going to be used for both FHR + Telemetry?
I'd prefer a more meaningful name for it, like, reportingState.json, userID.json, etc

@@ +34,5 @@
> +  return OS.File.read(file, { encoding: "utf-8" }).then(data => JSON.parse(data));
> +}
> +
> +function writeJSONAsync(object, path) {
> +  let encoder = new TextEncoder();

you can let OS.File do the encoding, https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread#OS.File.writeAtomic%28%29

@@ +77,5 @@
> + * TelemetryState holds telemetry-related state information and takes care
> + * of storing it over sessions.
> + *
> + * At the moment this is not much information, but we expect this to grow
> + * soon.

mention that the currently kept state is only the FHR/Telemetry user ID

@@ +92,5 @@
> +   * Returns a Promise<String>, which is the client id for telemetry.
> +   * While we prepare for moving FHR data over to Telemetry we re-use
> +   * the FHR client id for correlation purposes.
> +   */
> +  get clientID() {

Just a thought: I think once we add more state fields to TelemetryState, we'll want to explicitly call TelemetryState.init() from TelemetryPing and simplify the getters

@@ +119,5 @@
> +    if ("@mozilla.org/datareporting/service;1" in Cc &&
> +        gPrefs.get(PREF_HEALTHREPORT_ENABLED, true)) {
> +      let reporter = Cc["@mozilla.org/datareporting/service;1"]
> +                       .getService(Ci.nsISupports).wrappedJSObject.healthReporter;
> +      yield reporter.onInit()

nit: missing semicolon

@@ +124,5 @@
> +      fhrClientId = reporter.clientID;
> +      this._log.trace("_init - FHR active, will reuse fhr client id " + fhrClientId);
> +    }
> +
> +    yield OS.File.makeDir(this._stateDir, { ignoreExisting: true });

So if Telemetry is enabled but FHR is disabled, Telemetry will generate the unique user ID. How do you make sure FHR will use the same ID as Telemetry if the user turns on FHR at a later point?

@@ +126,5 @@
> +    }
> +
> +    yield OS.File.makeDir(this._stateDir, { ignoreExisting: true });
> +
> +    let dirty = true;

"isStateDirty" maybe?

@@ +136,5 @@
> +    }
> +
> +    if (!state || state.stateVersion != STATE_FILE_VERSION) {
> +      this._state = {
> +        clientID: generateUUID(),

why bother generating another ID if we know we have fhrClientId?

@@ +147,5 @@
> +      dirty = false;
> +      this._log.trace("_init - loaded state with client id " + state.clientID);
> +    }
> +
> +    if (fhrClientId && fhrClientId != this._state.clientID) {

I would move this block into the "else" above.. it's going to be confusing to get a message that the "client ID is being overriden with FHR client ID" if there was no state.json file to begin with

::: toolkit/components/telemetry/tests/unit/test_TelemetryState.js
@@ +4,5 @@
> +
> +const Cc = Components.classes;
> +const Ci = Components.interfaces;
> +const Cu = Components.utils;
> +const Cr = Components.results;

You don't actually use Cr in this file

@@ +35,5 @@
> +  gPrefs.set(PREF_LOGGING_DUMP, true);
> +  gPrefs.set(PREF_HEALTHREPORT_ENABLED, false);
> +});
> +
> +add_task(function* test_serialization() {

Can you also add a test the FHR "import" functionality
Attachment #8492130 - Flags: review?(vdjeric)
Assignee

Comment 6

5 years ago
(In reply to Vladan Djeric (:vladan) from comment #5)
> ::: toolkit/components/telemetry/TelemetryPing.jsm
> @@ +703,5 @@
> >        addonDetails: AddonManagerPrivate.getTelemetryDetails(),
> >        UIMeasurements: UITelemetry.getUIMeasurements(),
> >        log: TelemetryLog.entries(),
> > +      info: info,
> > +      clientID: this._clientID,
> 
> The call to TelemetryState.clientId happens 60 seconds after the
> "profile-after-change" startup event, so that leaves a period of time during
> which this method (assemblePayloadWithMeasurements) could be called and
> "this._clientID" will be null
> 
> @@ +928,2 @@
> >          }
> > +        yield TelemetryState._uninit();
> 
> Are you using the underscore prefix to indicate private methods?

Yes, this is at least common in browser/ and some other parts. Happy to use different a style if you prefer.

> ::: toolkit/components/telemetry/TelemetryState.jsm
> @@ +20,5 @@
> > +const PREF_LOGGING              = "toolkit.telemetry.logging";
> > +const PREF_LOGGING_LEVEL        = PREF_LOGGING + ".level";
> > +const PREF_LOGGING_DUMP         = PREF_LOGGING + ".dump";
> > +const STATE_DIR                 = "telemetry";
> > +const STATE_FILE_NAME           = "state.json";
> 
> Is this json file eventually going to be used for both FHR + Telemetry?
> I'd prefer a more meaningful name for it, like, reportingState.json,
> userID.json, etc

Currently it seems like we're moving FHR into Telemetry - unless there is a better plan/option - so renaming things didn't seem worth the effort now.
If we want new and less ambiguous naming, this probably works better after moving FHR over?

> @@ +124,5 @@
> > +      fhrClientId = reporter.clientID;
> > +      this._log.trace("_init - FHR active, will reuse fhr client id " + fhrClientId);
> > +    }
> > +
> > +    yield OS.File.makeDir(this._stateDir, { ignoreExisting: true });
> 
> So if Telemetry is enabled but FHR is disabled, Telemetry will generate the
> unique user ID. How do you make sure FHR will use the same ID as Telemetry
> if the user turns on FHR at a later point?

So, currently the main goal here is to correlate Telemetry to FHR for cross-checking, so losing the ID isn't problematic at this point. Hence we should be fine for now with this approach (and can avoid complications).
Comment on attachment 8492132 [details] [diff] [review]
Expose FHR client id

Review of attachment 8492132 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/healthreport/healthreporter.jsm
@@ +1116,5 @@
>      return out;
>    },
> +
> +  get clientID() {
> +    return this._state.clientID;

Without the context for how this is called: note that this won't work unless the instance has been initialized. You'll need to do something like:


    service.healthReporter.onInit().then(() => {
      let client = service.healthReporter.clientID;
      ...
    })
Attachment #8492132 - Flags: review?(rnewman) → review+
Iteration: 35.2 → ---
Assignee

Updated

5 years ago
Depends on: 1079341
Assignee

Comment 8

5 years ago
Per talks in #fhr with gps and rnewman, this doesnt work properly in regards to early initialization synchronization and we should do it differently.
We want to move things to a common datareporting client id now, that properly imports the client id from the FHR state.json, if any.
Assignee

Comment 9

5 years ago
https://tbpl.mozilla.org/?tree=Try&rev=94905bac74e2
Attachment #8492130 - Attachment is obsolete: true
Attachment #8492132 - Attachment is obsolete: true
Attachment #8501173 - Flags: review?(gps)
Assignee

Comment 10

5 years ago
Vladan is (or will soon be) away, so moving this over to you Nathan, hope you dont mind.
Attachment #8501201 - Flags: review?(nfroyd)
Comment on attachment 8501201 [details] [diff] [review]
Add datareporting client id to Telemetry ping

Review of attachment 8501201 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the change below and an answer to my question.

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +960,5 @@
>          this.gatherMemory();
>  
> +        let drs = Cc["@mozilla.org/datareporting/service;1"]
> +                    .getService(Ci.nsISupports)
> +                    .wrappedJSObject;

What is the .wrappedJSObject here for?  Do we not have an IDL for the datareporting service and therefore we need the actual JS object, rather than getService'ing a real interface?

::: toolkit/components/telemetry/tests/unit/test_TelemetryPing.js
@@ +386,5 @@
>    // Addon manager needs a profile directory
>    do_get_profile();
>    createAppInfo("xpcshell@tests.mozilla.org", "XPCShell", "1", "1.9.2");
>  
> +  Services.prefs.setBoolPref(PREF_ENABLED, true);

I'm not entirely sure that there's no pref leakage between xpcshell tests, but it'd be good practice to reset this pref anyway.
Attachment #8501201 - Flags: review?(nfroyd) → review+
Assignee

Comment 13

5 years ago
(In reply to Nathan Froyd (:froydnj) from comment #12)
> ::: toolkit/components/telemetry/TelemetryPing.jsm
> @@ +960,5 @@
> >          this.gatherMemory();
> >  
> > +        let drs = Cc["@mozilla.org/datareporting/service;1"]
> > +                    .getService(Ci.nsISupports)
> > +                    .wrappedJSObject;
> 
> What is the .wrappedJSObject here for?  Do we not have an IDL for the
> datareporting service and therefore we need the actual JS object, rather
> than getService'ing a real interface?

Yes, the docs in DataReportingService.js show the intended usage.

> ::: toolkit/components/telemetry/tests/unit/test_TelemetryPing.js
> @@ +386,5 @@
> >    // Addon manager needs a profile directory
> >    do_get_profile();
> >    createAppInfo("xpcshell@tests.mozilla.org", "XPCShell", "1", "1.9.2");
> >  
> > +  Services.prefs.setBoolPref(PREF_ENABLED, true);
> 
> I'm not entirely sure that there's no pref leakage between xpcshell tests,
> but it'd be good practice to reset this pref anyway.

Every single xpcshell test runs on a fresh profile, so there is no need to worry about cleanup for changes inside the profile.
Comment on attachment 8501173 [details] [diff] [review]
Migrate FHR client id to the datareporting service

This patch is empty!
Attachment #8501173 - Flags: review?(gps)
Assignee

Comment 15

5 years ago
Hm, weird, not sure what happened there. Lets try this again.
Attachment #8501173 - Attachment is obsolete: true
Attachment #8501861 - Flags: review?(gps)
Assignee

Comment 16

5 years ago
Minor fix.
Attachment #8501861 - Attachment is obsolete: true
Attachment #8501861 - Flags: review?(gps)
Attachment #8501865 - Flags: review?(gps)
Comment on attachment 8501865 [details] [diff] [review]
Migrate FHR client id to the datareporting service

Review of attachment 8501865 [details] [diff] [review]:
-----------------------------------------------------------------

Aside from some robustness issues, this is mostly good.

I am a bit concerned about the drop in test coverage. We should add unit tests for client ID handling in DRS.

::: services/common/utils.js
@@ +398,5 @@
>     * @return a promise, as produced by OS.File.writeAtomic.
>     */
>    writeJSON: function(contents, path) {
> +    let data = JSON.stringify(contents);
> +    return OS.File.writeAtomic(path, data, {encoding: "utf-8", tmpPath: path + ".tmp"});

Oooh - when did support for this land?

::: services/datareporting/DataReportingService.js
@@ +304,5 @@
> +    if (this._loadClientIdTask) {
> +      return this._loadClientIdTask;
> +    }
> +
> +    // Try to load the client id from the DRS state file

You should add some comments here that explain the history of the state files and client IDs. Pretend rnewman, you, and me all get hit by a bus. The pour soul left with this doesn't want to be haunted by our ghosts.

@@ +318,5 @@
> +    // If we dont have DRS state yet, try to import from the FHR state.
> +    try {
> +      let fhrStatePath = OS.Path.join(OS.Constants.Path.profileDir, "healthreport", "state.json");
> +      let state = yield CommonUtils.readJSON(fhrStatePath);
> +      this._clientID = state.clientID;

If the state file is empty, state.clientID will be undefined. Please add code and a test for this.

You also dropped a branch from FHR that checks the type is a string. Please also port that. (Normally I'm not this paranoid, but you know the pain that's been caused by orphans.)

@@ +324,5 @@
> +      this._saveClientID();
> +      return this._clientID;
> +    } catch (e) {
> +      // fall through to next option
> +    }

There's an interesting failure case here. But I'm not sure how realistic it is and whether we should pay any attention. But given the implications on orphan creation and the immense pain that's caused us, it's tempting to be safe rather than sorry.

Scenario:

1) Client is running FHR as written today and has client ID in FHR's state.json file.
2) Client upgrades to this code. Client ID is copied to DRS state.json.
3) For whatever reason we generate a new client ID. It gets stored in DRS state.
4) DRS state.json somehow gets corrupted. We throw an exception trying to read it.
5) We fall back to reading the client ID from FHR's state.json.
6) We start using the old FHR client ID and our metrics get screwy.

Now, we can't simply delete the client ID from the FHR state because if Firefox downgrades, a new client ID will be generated. (This is probably rare, but it is a source of orphans.)

We shouldn't fall back to using the FHR client ID if we've ever created a new ID in DRS because that will look like a client came back from the dead.

The only bulletproof solution is to write the new client ID to both DRS and FHR state.json when DRS generates a new client ID. Oy.

Of course, if DRS's state.json becomes corrupted, we've created an orphan. So, uh, does that mean we don't need to worry about this scenario?

Making this even more complicated, if you fall back to reading FHR state.json if DRS is corrupted and the client ID hasn't changed, you are in a *better* position than if you didn't fall back because you'll be using the proper ID instead of generating a new one.

When in doubt, measure. I'd really like to see a counter in FHR reporting on how many times the DRS state.json is present but corrupted. If data says it is a problem, we can address it.

What to do? If we don't plan on regenerating client IDs, fall back to FHR. If we do, consider touching a file whose presence indicates "client ID migrated to DRS *and* regenerated" and don't fall back to reading FHR state.json if that file is present.

These are hard problems.

@@ +330,5 @@
> +    // We dont have an id from FHR yet, generate a new ID.
> +    this._clientID = CommonUtils.generateUUID();
> +    this._loadClientIdTask = null;
> +    this._saveClientID();
> +    return this._clientID;

This pattern of setting the load task to null *then* saving the file asynchronously and then returning is racey. A subsequent caller can get a new task that generates a new ID before the first caller has finished saving the file. Since a few components will access this client ID during startup, I'd say this race is more than theoretical, especially on machines with slow I/O.

Let's add a yield for _saveClientID() and let's not set _loadClientIdTask to null until immediately before the return. Consider putting this in a closure since the code is repeated and prone to one-off bugs.

@@ +339,5 @@
> +    yield OS.File.makeDir(this._stateDir);
> +    yield CommonUtils.writeJSON(obj, this._stateFilePath);
> +  }),
> +
> +  get clientID() {

Document this!

I find it a bit weird this is a property and not a function. I think anything involving sufficient magic should be a function.

::: services/healthreport/healthreporter.jsm
@@ +131,5 @@
>        yield OS.File.makeDir(this._stateDir);
>  
> +      let drs = Cc["@mozilla.org/datareporting/service;1"]
> +                  .getService(Ci.nsISupports)
> +                  .wrappedJSObject;

It might be time to create a proper IDL interface for DRS.

@@ -1527,5 @@
> -        // This is done for privacy reasons. Why preserve the ID if we
> -        // don't need to?
> -        if (!this.haveRemoteData()) {
> -          yield this._state.resetClientID();
> -        }

Please get bsmedberg's sign-off on nuking this behavior.
Attachment #8501865 - Flags: review?(gps) → feedback+
Assignee

Comment 18

5 years ago
(In reply to Gregory Szorc [:gps] from comment #17)
> @@ +324,5 @@
> > +      this._saveClientID();
> > +      return this._clientID;
> > +    } catch (e) {
> > +      // fall through to next option
> > +    }
> 
> There's an interesting failure case here. But I'm not sure how realistic it
> is and whether we should pay any attention. But given the implications on
> orphan creation and the immense pain that's caused us, it's tempting to be
> safe rather than sorry.
> 
> Scenario:
> 
> 1) Client is running FHR as written today and has client ID in FHR's
> state.json file.
> 2) Client upgrades to this code. Client ID is copied to DRS state.json.
> 3) For whatever reason we generate a new client ID. It gets stored in DRS
> state.
> 4) DRS state.json somehow gets corrupted. We throw an exception trying to
> read it.
> 5) We fall back to reading the client ID from FHR's state.json.
> 6) We start using the old FHR client ID and our metrics get screwy.

Let's do metrics for state.json corruption in a follow-up?
I'd rather not build fallback solutions on a system part that we want to move away from.

> @@ +330,5 @@
> > +    // We dont have an id from FHR yet, generate a new ID.
> > +    this._clientID = CommonUtils.generateUUID();
> > +    this._loadClientIdTask = null;
> > +    this._saveClientID();
> > +    return this._clientID;
> 
> This pattern of setting the load task to null *then* saving the file
> asynchronously and then returning is racey. A subsequent caller can get a
> new task that generates a new ID before the first caller has finished saving
> the file. Since a few components will access this client ID during startup,
> I'd say this race is more than theoretical, especially on machines with slow
> I/O.
> 
> Let's add a yield for _saveClientID() and let's not set _loadClientIdTask to
> null until immediately before the return. Consider putting this in a closure
> since the code is repeated and prone to one-off bugs.

There are no repeated calls to _loadClientID(), so i don't see a race here.
State is loaded once, changing the ID is not supported here. I'd rather not block callers waiting on _save() as well if there is no urgent reason to do so.
I only see a potential issue with tests having to wait for _saveClientID(), which i may have to account for.

> ::: services/healthreport/healthreporter.jsm
> @@ +131,5 @@
> >        yield OS.File.makeDir(this._stateDir);
> >  
> > +      let drs = Cc["@mozilla.org/datareporting/service;1"]
> > +                  .getService(Ci.nsISupports)
> > +                  .wrappedJSObject;
> 
> It might be time to create a proper IDL interface for DRS.

Now or follow-up?
Assignee

Comment 19

5 years ago
(In reply to Gregory Szorc [:gps] from comment #17)
> @@ -1527,5 @@
> > -        // This is done for privacy reasons. Why preserve the ID if we
> > -        // don't need to?
> > -        if (!this.haveRemoteData()) {
> > -          yield this._state.resetClientID();
> > -        }
> 
> Please get bsmedberg's sign-off on nuking this behavior.

Context: Previously we would reset the client ID here after we deleted all of FHRs remote data.
Now we use the stable client id for more than FHR, so we can't just nuke it there.
I propose that we take care of clearing it later, after we made FHR & Telemetry activation interdependent.
Acceptable?
Flags: needinfo?(benjamin)
Assignee

Comment 20

5 years ago
Attachment #8501865 - Attachment is obsolete: true
Attachment #8502749 - Flags: review?(gps)
Assignee

Comment 21

5 years ago
Just a minor adoption from |drs.clientID| to |drs.getClientID()|.
Attachment #8501201 - Attachment is obsolete: true
Attachment #8502750 - Flags: review+
Long-term, the behavior we want is:

* turning off FHR data collection should delete all the pings associated with the user ID (telemetry and FHR). The user ID should be removed from the client. Currently telemetry doesn't have an endpoint for this, so part of the breakdown for this should include filing both client and server-side bugs to make this possible.
* If/when FHR is re-enabled, we generate a new ID.

Short-term:
* when FHR is disabled we should delete the FHR records and remove the client ID.
* since it's currently possible to have FHR off and telemetry on, in that case telemetry should not send any client ID. In the future this won't be possible (bug 1075055)
Flags: needinfo?(benjamin)
Comment on attachment 8502749 [details] [diff] [review]
Migrate FHR client id to the datareporting service

Review of attachment 8502749 [details] [diff] [review]:
-----------------------------------------------------------------

This is mostly good. But according to bsmedberg's comment, we need to handle the case of resetting the client ID when FHR upload is turned off. That and a few comments are reasons for withholding r+.

Please ping me on IRC if you want prompt review.

::: services/datareporting/DataReportingService.js
@@ +134,5 @@
>            this.policy = new DataReportingPolicy(policyPrefs, this._prefs, this);
>  
>            this._os.addObserver(this, "sessionstore-windows-restored", true);
> +
> +          this._loadClientIdTask = this._loadClientId();

This is going to incur extra I/O at start-up. Is it strictly necessary to perform I/O and generate an ID before any client requests one? I'm going to argue it isn't.

@@ +298,5 @@
> +    return OS.Path.join(OS.Constants.Path.profileDir, "datareporting");
> +  },
> +
> +  get _stateFilePath() {
> +    return OS.Path.join(this._stateDir, "state.json");

Consider inlining these into assignments during the profile-after-startup observer.

@@ +347,5 @@
> +  }),
> +
> +  _saveClientID: Task.async(function* () {
> +    let obj = { clientID: this._clientID };
> +    yield OS.File.makeDir(this._stateDir);

I thought OS.File.makeDir() fails if the directory exists (unless ignoreExisting is in the options). Either OS.File.makeDir's docs are wrong, the behavior is different on your-platform, or we're lacking test coverage here.

Since no "production" code yields _saveClientID(), we could be throwing and not realizing it.

While you were correct about there not being a race between _saveClientID() and getClientID(), this is another reason why skipping yields can be dangerous. I think your argument about performance is a bit flaky. Do we really want to return a client ID to consumers if that client ID hasn't been persisted to disk yet? That could be yet another source of orphaning. Let's yield on _saveClientID during _loadClientID.

::: services/datareporting/tests/xpcshell/test_client_id.js
@@ +40,5 @@
> +  yield OS.File.remove(drsPath);
> +  yield CommonUtils.writeJSON({clientID: -1}, fhrPath);
> +  clientID = yield datareportingService.getClientID();
> +  Assert.equal(typeof(clientID), 'string');
> +  Assert.ok(uuidRegex.test(clientID));

Let's add a test for invalid JSON in fhrPath.
Attachment #8502749 - Flags: review?(gps) → feedback+
Assignee

Comment 25

5 years ago
Hm, i should have the reset part in fine... except that this has an annoying testing issue.

We yield on reporter.requestDeleteRemoteData():
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/healthreport/tests/xpcshell/test_healthreporter.js#l741

... which comes down to yielding on the chained promise here:
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/datareporting/policy.jsm#l829

... which gets resolved due to the call here:
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/healthreport/healthreporter.jsm#l1521
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/healthreport/healthreporter.jsm#l1409
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/datareporting/policy.jsm#l136

... which means it gets resolved before we reset the client id around here:
http://hg.mozilla.org/mozilla-central/annotate/f547cf19d104/services/healthreport/healthreporter.jsm#l1530

So, currently there is no good way to wait on this in testing (and the tests only worked fine by accident before).
Not quite sure yet how to fix that.
Assignee

Comment 26

5 years ago
(In reply to Gregory Szorc [:gps] from comment #24)
> ::: services/datareporting/DataReportingService.js
> @@ +134,5 @@
> >            this.policy = new DataReportingPolicy(policyPrefs, this._prefs, this);
> >  
> >            this._os.addObserver(this, "sessionstore-windows-restored", true);
> > +
> > +          this._loadClientIdTask = this._loadClientId();
> 
> This is going to incur extra I/O at start-up. Is it strictly necessary to
> perform I/O and generate an ID before any client requests one? I'm going to
> argue it isn't.

Fair point, removed.

> @@ +298,5 @@
> > +    return OS.Path.join(OS.Constants.Path.profileDir, "datareporting");
> > +  },
> > +
> > +  get _stateFilePath() {
> > +    return OS.Path.join(this._stateDir, "state.json");
> 
> Consider inlining these into assignments during the profile-after-startup
> observer.

Done.

> @@ +347,5 @@
> > +  }),
> > +
> > +  _saveClientID: Task.async(function* () {
> > +    let obj = { clientID: this._clientID };
> > +    yield OS.File.makeDir(this._stateDir);
> 
> I thought OS.File.makeDir() fails if the directory exists (unless
> ignoreExisting is in the options). Either OS.File.makeDir's docs are wrong,
> the behavior is different on your-platform, or we're lacking test coverage
> here.

ignoreExisting=true is the default behavior. I talked with Yoric last week and clarified the docs:
https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread#OS.File.makeDir%28%29


> While you were correct about there not being a race between _saveClientID()
> and getClientID(), this is another reason why skipping yields can be
> dangerous. I think your argument about performance is a bit flaky. Do we
> really want to return a client ID to consumers if that client ID hasn't been
> persisted to disk yet? That could be yet another source of orphaning. Let's
> yield on _saveClientID during _loadClientID.

"Flaky" is a bit dismissive, please be more specific. I also dont care about it too strongly and want this landed, so changing it.
Assignee

Comment 27

5 years ago
Attachment #8502749 - Attachment is obsolete: true
Attachment #8504769 - Flags: review?(gps)
Assignee

Comment 28

5 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #27)
> Created attachment 8504769 [details] [diff] [review]
> Migrate FHR client id to the datareporting service

Note that the changes in utils.jsm and policy.jsm are needed to address comment 25, i.e. waiting on the whole "delete remote data operation" and not just on the server requests etc. being finished.
Assignee

Comment 29

5 years ago
Small change to address comment 23. Now we only submit the client id in the Telemetry ping when FHR upload is enabled.

Bonus: check for the datareporting service being available in Cc.
Attachment #8504795 - Flags: review?(nfroyd)
Attachment #8504795 - Flags: review?(nfroyd) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #25)
> So, currently there is no good way to wait on this in testing (and the tests
> only worked fine by accident before).
> Not quite sure yet how to fix that.

Yikes. That's ugly.

So, sometimes you need to introduce ugly hacks to facilitate testing. This might be one of those times.

The correct approach is to move the "initiate FHR upload" APIs into DRS. healthreport.jsm is kinda/sorta laid out in such a way to facilitate this. There is an inline comment about that somewhere...
Comment on attachment 8504769 [details] [diff] [review]
Migrate FHR client id to the datareporting service

Review of attachment 8504769 [details] [diff] [review]:
-----------------------------------------------------------------

This LGTM modulo the unsafe OS.Constants.Path.profileDir usage, which I trust you to fix without re-review.

::: services/datareporting/DataReportingService.js
@@ +68,5 @@
> +  this._clientID = null;
> +  this._loadClientIdTask = null;
> +  this._saveClientIdTask = null;
> +
> +  this._stateDir = OS.Path.join(OS.Constants.Path.profileDir, "datareporting");

IIRC it isn't safe to access OS.Constants.Path.profileDir until during or after the profile-after-change observer. I think this only works because xpcshell does funky things with profile init.

@@ +335,5 @@
> +    // We dont have an id from FHR yet, generate a new ID.
> +    this._clientID = CommonUtils.generateUUID();
> +    this._loadClientIdTask = null;
> +    this._saveClientIdTask = this._saveClientID();
> +    yield this._saveClientIdTask;

Please document the importance of this yield:

We wait on persisting the generated ID to disk because failure to save the ID could result in the client creating and subsequently sending multiple IDs to the server. This would appear as multiple clients submitting similar data - orphans.

@@ +376,5 @@
> +    yield this._saveClientIdTask;
> +
> +    this._clientID = CommonUtils.generateUUID();
> +    this._saveClientIdTask = this._saveClientID();
> +    yield this._saveClientIdTask;

Let's return the new ID just to round out the API.
Attachment #8504769 - Flags: review?(gps) → review+
Assignee

Comment 33

5 years ago
(In reply to Gregory Szorc [:gps] from comment #32)
> ::: services/datareporting/DataReportingService.js
> @@ +68,5 @@
> > +  this._clientID = null;
> > +  this._loadClientIdTask = null;
> > +  this._saveClientIdTask = null;
> > +
> > +  this._stateDir = OS.Path.join(OS.Constants.Path.profileDir, "datareporting");
> 
> IIRC it isn't safe to access OS.Constants.Path.profileDir until during or
> after the profile-after-change observer. I think this only works because
> xpcshell does funky things with profile init.

Ah, indeed. This worked because we only trigger DRS init after do_get_profile().
Fixed now, which means updating some tests to fake "app-startup" and "profile-after-change" for the DRS.
Assignee

Comment 34

5 years ago
Still waiting on inbound to re-open.
Attachment #8504769 - Attachment is obsolete: true
Attachment #8505625 - Flags: review+
Assignee

Comment 35

5 years ago
Attachment #8502750 - Attachment is obsolete: true
Attachment #8505626 - Flags: review+
Assignee

Comment 36

5 years ago
Attachment #8504795 - Attachment is obsolete: true
Attachment #8505627 - Flags: review+
Assignee

Updated

5 years ago
Keywords: checkin-needed
Assignee

Comment 43

5 years ago
Looking into it, apparently i was too tired and looked at the wrong try push yesterday.
Flags: needinfo?(georg.fritzsche)
Assignee

Updated

5 years ago
Depends on: 1085146
Assignee

Comment 45

5 years ago
Turns out we are skipping the TelemetryPing setup when MOZILLA_OFFICIAL is set, which means we're skipping it on at least debug try builds as well.

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=bbd9e0788c49
https://tbpl.mozilla.org/?tree=Try&rev=bbd9e0788c49
Attachment #8507559 - Flags: review?(nfroyd)
Comment on attachment 8507559 [details] [diff] [review]
Init TelemetryPing in tests even if MOZILLA_OFFICIAL is not set

Review of attachment 8507559 [details] [diff] [review]:
-----------------------------------------------------------------

Light grepping shows me only one test where we call TelemetryPing.setup(); do we need to call it in more places?
Attachment #8507559 - Flags: review?(nfroyd) → review+
Assignee

Comment 47

5 years ago
As of the patches here we have two places - it is required in test_TelemetryPing.js too, which tests the client id.

I'm not sure if pre-emptively adding them to more tests is useful right now.
(In reply to Georg Fritzsche [:gfritzsche] from comment #47)
> As of the patches here we have two places - it is required in
> test_TelemetryPing.js too, which tests the client id.

Ah, very good.

> I'm not sure if pre-emptively adding them to more tests is useful right now.

Let's leave well enough alone, then.
Could you also update about:telemetry to report the Telemetry ID? It should be a very small change, maybe put it in the "System Information" section
Assignee

Updated

5 years ago
Blocks: 1086252
Assignee

Comment 52

5 years ago
Weird, both bugs pushed were fine on try. Investigating :-/
Iteration: --- → 36.1
Assignee

Updated

5 years ago
Blocks: 1087318
Assignee

Updated

5 years ago
No longer depends on: 1085146
Assignee

Comment 55

5 years ago
Comment on attachment 8505626 [details] [diff] [review]
Add datareporting client id to Telemetry ping

Approval Request Comment
[Feature/regressing bug #]: Search telemetry.
[User impact if declined]: No search telemetry.
[Describe test coverage new/current, TBPL]: Fine on Nightly & data verification of FHR & Telemetry server-side.
[Risks and why]: Low risk, we verified things staying sane.
[String/UUID change made/needed]: None.
Attachment #8505626 - Flags: approval-mozilla-beta?
Attachment #8505626 - Flags: approval-mozilla-aurora?
Assignee

Comment 56

5 years ago
Comment on attachment 8505625 [details] [diff] [review]
Migrate FHR client id to the datareporting service

Approval Request Comment:
See comment 55.
Attachment #8505625 - Flags: approval-mozilla-beta?
Attachment #8505625 - Flags: approval-mozilla-aurora?
Assignee

Comment 57

5 years ago
Comment on attachment 8507557 [details] [diff] [review]
Don't send client id in ping when FHR upload is disabled

Approval Request Comment
See comment 55.
Attachment #8507557 - Flags: approval-mozilla-beta?
Attachment #8507557 - Flags: approval-mozilla-aurora?
Assignee

Comment 58

5 years ago
Comment on attachment 8507559 [details] [diff] [review]
Init TelemetryPing in tests even if MOZILLA_OFFICIAL is not set

Approval Request Comment
See comment 55.
Attachment #8507559 - Flags: approval-mozilla-beta?
Attachment #8507559 - Flags: approval-mozilla-aurora?
Assignee

Updated

5 years ago
Attachment #8515307 - Flags: review+
Attachment #8505626 - Flags: approval-mozilla-beta?
Attachment #8505626 - Flags: approval-mozilla-aurora?
Attachment #8505626 - Flags: approval-mozilla-aurora+
Comment on attachment 8515306 [details] [diff] [review]
Beta rebase - Add datareporting client id to Telemetry ping

In future, please try to put beta approvals on the rebased patches in order to make it clear which patches should be uplifted to which branches. (I took care of putting the approvals on the right patches this time.)
Attachment #8515306 - Flags: approval-mozilla-beta+
Attachment #8507557 - Flags: approval-mozilla-beta?
Attachment #8507557 - Flags: approval-mozilla-aurora?
Attachment #8507557 - Flags: approval-mozilla-aurora+
Comment on attachment 8515307 [details] [diff] [review]
Beta rebase - Don't send client id in ping when FHR upload is disabled

[Triage Comment]
Attachment #8515307 - Flags: approval-mozilla-beta+
Attachment #8505625 - Flags: approval-mozilla-beta?
Attachment #8505625 - Flags: approval-mozilla-beta+
Attachment #8505625 - Flags: approval-mozilla-aurora?
Attachment #8505625 - Flags: approval-mozilla-aurora+
Attachment #8507559 - Flags: approval-mozilla-beta?
Attachment #8507559 - Flags: approval-mozilla-beta+
Attachment #8507559 - Flags: approval-mozilla-aurora?
Attachment #8507559 - Flags: approval-mozilla-aurora+
Can someone point me at the privacy review or analysis which was done for this change?

Thanks,

Gerv
Assignee

Comment 66

5 years ago
(In reply to Gervase Markham [:gerv] from comment #65)
> Can someone point me at the privacy review or analysis which was done for
> this change?

There were some privacy discussions and privacy policy changes etc.
I can't find the relevant one off-hand, Benjamin, can you?
Assignee

Updated

5 years ago
Flags: needinfo?(benjamin)
As Firefox data collection steward/owner, I'm ultimately responsible for this decision. I did post about this publicly to fhr-dev and consult with John Jensen, Alex Fowler (while he was still at Mozilla) and Geoff Piper for the privacy notice.

The first public note about this was https://mail.mozilla.org/pipermail/fhr-dev/2014-March/000186.html
https://mail.mozilla.org/pipermail/fhr-dev/2014-October/000341.html has a link to the current plan.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #67)
> The first public note about this was
> https://mail.mozilla.org/pipermail/fhr-dev/2014-March/000186.html
> https://mail.mozilla.org/pipermail/fhr-dev/2014-October/000341.html has a
> link to the current plan.

Forgive me if I've missed it, but neither of these emails nor the linked document indicate that we were planning to change the released build of Firefox to send back a persistent per-user identifier, something we explicitly disclaimed doing when Telemetry initially shipped IIRC.

(If that's not quite what's happened, please do clarify - which channels send this identifier, how persistent is it, and how does it depend on the telemetry and FHR prefs?)

Gerv
My understanding is that FHR is enabled by default on all channels, and Telemetry is enabled by default on all channels except release. Is that right?

Gerv
Assignee

Comment 71

5 years ago
Yes:
* FHR upload is enabled by default on all channels, but no upload happens until we should the data notification
* Telemetry is off by default on release, on by default on pre-release channels

The client id is only submitted if both of these are on, which matches previous FHR behavior.
Given that we are moving the FHR data into Telemetry, we will end up submitting data that was previously in FHR via Telemetry in different "buckets" (pre-release, release and some different pings for data that should be separate from the ping... that is detailed in the referenced document) to end up with matching behavior re privacy.

Updated

8 months ago
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.