Implement telemetry experiment annotations to TelemetryEnvironment.jsm

RESOLVED FIXED in Firefox 53

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Dexter, Assigned: Dexter)

Tracking

(Blocks 1 bug)

Trunk
mozilla55
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox53 fixed, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: [measurement:client])

Attachments

(2 attachments)

Motivated by the ongoing Shield experiment work for Quantum, we want to add a standard way to annotate any experiment a Firefox client is participating in.

For annotating the experiment data, the following new methods will be exposed from TelemetryEnvironment.jsm:

TelemetryEnvironment.setExperimentActive(id, branch)
> This adds the given experiment data to the experiment annotations.
> It will also trigger a new subsession, except when throttling is in effect:
> After starting a new subsession, we throttle for 5 minutes. Any changes in that period don't start another subsession.
> The same throttling is in effect for 5 minutes after startup.
> Clients should call only this after successful experiment activation, to avoid noisy state when activation fails.
> Calling this function repeatedly with the same id will overwrite the state and trigger new subsessions (subject to throttling).

TelemetryEnvironment.setExperimentInactive(id)
> This removes the experiment data from the experiment annotations.
> It will also trigger a new subsession, except when throttling is in effect.

TelemetryEnvironment.getActiveExperiments()
> This (synchronously) returns the currently active experiments.
> It returns an object of the form outlined in Environment data above:
> {
>   "<experiment id>": { branch: "<branch>" },
>   // … 
> }

See [0] for the details about the client API: https://docs.google.com/document/d/11kDCK9W6KaPiS_pGcJc8mogc4imSO_ufkn6tJBGUEoM/edit#
Assignee: nobody → alessio.placitelli
Priority: -- → P1
Whiteboard: [measurement:client]
This is awesome.  Will it be possible to uplift this patch once done?   EKR wants to use it for some pref-testing work of his own (for feature rollout).
This is in fact awesome. The further it is uplifted the better. Otherwise I will have to do some pretty hacky stuff on the back-end.
If this lands the current week and nothing breaks on Nightly, I would feel ok with uplifting this to 54. I'm not sure if it's ok to uplift it to Beta, we'll see once it lands!
Duplicate of this bug: 1337081
See the duped bug 1337081 for prior discussions leading up to this.
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

https://reviewboard.mozilla.org/r/121910/#review124440

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:90
(Diff revision 1)
> +  },
> +
> +  setExperimentInactive(id) {
> +    return getGlobal().setExperimentInactive(id);
> +  },
> +

I'm missing `getActiveExperiments()`.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:751
(Diff revision 1)
>  
>    this._shutdown = false;
>    this._delayedInitFinished = false;
>  
> +  // A map containing the current experiment annotations.
> +  this._activeExperiments = new Map();

Do we actually need this cache here?
Can't we just directly store this in `_currentEnvironment`?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:865
(Diff revision 1)
>        return;
>      }
>      this._changeListeners.delete(name);
>    },
>  
> +  /**

Lets put the doc comments on the "public interface" further up.

::: toolkit/components/telemetry/docs/data/environment.rst:387
(Diff revision 1)
> +Experiment annotations can be added to this section through the API exposed in ``TelemetryEnvironment.jsm``:
> +
> +- ``TelemetryEnvironment.setExperimentActive(id, branch)``, adds an annotation to the environment for the provided ``id`` and ``branch. This triggers a new subsession.
> +- ``TelemetryEnvironment.setExperimentInactive(id)``, removes the annotation for the experiment with the provided ``id``. This triggers a new subsession.

This documentation is about data collection, it should live in `docs/collection/`.
I'm ok with a follow-up bug to fix this though.

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:795
(Diff revision 1)
>  }
>  
> +function checkExperimentsSection(data) {
> +  // We don't expect the experiments section to be always available.
> +  let experiments = data.experiments || {};
> +  if (Object.keys(experiments).length === 0) {

Nit: Usually we just do "==".

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1610
(Diff revision 1)
> +            "The old environment must not contain the new experiment annotation.");
> +
> +  TelemetryEnvironment.unregisterChangeListener("test_experimentsAPI2");
> +
> +  // Check that removing an unknown experiment annotation does not trigger subsession.
> +  TelemetryEnvironment.registerChangeListener("test_experimentsAPI3", () => {

This comment needs fixing - it does not test subsession splits, just that a change notification was triggered.
Proper subsession & throttling tests would happen in in test_TelemetrySession.js.
Attachment #8849050 - Flags: review?(gfritzsche)
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

https://reviewboard.mozilla.org/r/121910/#review124440

> Do we actually need this cache here?
> Can't we just directly store this in `_currentEnvironment`?

Good point, done.

> This documentation is about data collection, it should live in `docs/collection/`.
> I'm ok with a follow-up bug to fix this though.

Good point. I've moved the API docs to the index page in data/collection. If you think that should have a separate page, I can file a new bug and move it.
(In reply to Alessio Placitelli [:Dexter] from comment #9)
> > This documentation is about data collection, it should live in `docs/collection/`.
> > I'm ok with a follow-up bug to fix this though.
> 
> Good point. I've moved the API docs to the index page in data/collection. If
> you think that should have a separate page, I can file a new bug and move it.

Yes, this should live in a separate page. The index page should only be an overview of existing collection mechanisms.
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

https://reviewboard.mozilla.org/r/121910/#review124782

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:84
(Diff revision 2)
> +   * Add an experiment annotation to the environment or replace the current
> +   * annotation if one with the same id is already present. This triggers a
> +   * new subsession.

Slight reformatting?
"Add an experiment annotation to the environment.
If an annotation with the same id already exists, it will be overwritten.
This triggers a new subsession, subject to throttling."

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:88
(Diff revision 2)
> +   * @param {String} id The id of the active experiment.
> +   * @param {String} branch The experiment branch.

We need to enforce length limits on the passed data and document them.
We should probably truncate and warn.

Lets use the max addon id string length as a guide for the id limit.
For branch, how about a string length limit of 100?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:96
(Diff revision 2)
> +   * Remove an experiment annotation from the environment. This triggers a
> +   * new subsession if the annotation is found and successfully removed.

"...
If the annotation exists, a new subsession will triggered."

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:107
(Diff revision 2)
> +   * In the long term, this data will superseed the one returned from
> +   * |_getActiveExperiment()|.

I think the superseding comment doesn't belong in the API doc.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:110
(Diff revision 2)
> +  /**
> +   * Returns an object containing the data for the active experiments.
> +   * In the long term, this data will superseed the one returned from
> +   * |_getActiveExperiment()|.
> +   *
> +   * Format:

"The returned object is of the format:"?

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1602
(Diff revision 2)
> +  data = TelemetryEnvironment.currentEnvironment;
> +  checkEnvironmentData(data);
> +  checkExperiment(EXPERIMENT1, EXPERIMENT1_BRANCH, data);
> +  checkExperiment(EXPERIMENT2, EXPERIMENT2_BRANCH, data);
> +
> +  // The previous environment should only contain the first one.

"... the first experiment."

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1616
(Diff revision 2)
> +    Assert.ok(false, "Removing an unknown experiment annotation must not trigger a change.");
> +  });
> +  TelemetryEnvironment.setExperimentInactive("unknown-experiment-id");
> +  TelemetryEnvironment.unregisterChangeListener("test_experimentsAPI3");
> +
> +  // Check tha tremoving a known experiment leaves the other in place and triggers

"... that ..."

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1625
(Diff revision 2)
> +    deferred.resolve(data);
> +  });
> +  TelemetryEnvironment.setExperimentInactive(EXPERIMENT1);
> +  eventEnvironmentData = yield deferred.promise;
> +
> +  // Check that the current environment contains both the experiment.

That comment seems wrong?

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js:1302
(Diff revision 2)
> +  gMonotonicNow = fakeMonotonicNow(gMonotonicNow + 10 * MILLISECONDS_PER_MINUTE);
> +  now = fakeNow(futureDate(now, 10 * MILLISECONDS_PER_MINUTE));
> +  TelemetryEnvironment.setExperimentActive(EXPERIMENT1, EXPERIMENT1_BRANCH);
> +
> +  let ping = yield PingServer.promiseNextPing();
> +  Assert.ok(!!ping);

This test function is missing some assert messages.
Attachment #8849050 - Flags: review?(gfritzsche)
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

https://reviewboard.mozilla.org/r/121910/#review124894

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:47
(Diff revisions 2 - 3)
>  // The maximum length of a string (e.g. description) in the addons section.
>  const MAX_ADDON_STRING_LENGTH = 100;
>  // The maximum length of a string value in the settings.attribution object.
>  const MAX_ATTRIBUTION_STRING_LENGTH = 100;
> +// The maximum lengths for the experiment id and branch in the experiments section.
> +const MAX_EXPERIMENT_ID_LENGTH = 100;

Please double check that this is >= what the addon team uses for maximum addon id lengths.

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:1546
(Diff revisions 2 - 3)
>  });
>  
>  add_task(function* test_experimentsAPI() {
>    const EXPERIMENT1 = "experiment-1";
>    const EXPERIMENT1_BRANCH = "nice-branch";
> -  const EXPERIMENT2 = "experiment-2";
> +  // Create a 116 character long test id. This will get truncated.

Is it more readable to test the truncation separately?

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js:1309
(Diff revisions 2 - 3)
> -  Assert.equal(ping.type, PING_TYPE_MAIN);
> -  Assert.equal(ping.payload.info.reason, REASON_ENVIRONMENT_CHANGE);
> +  Assert.equal(ping.type, PING_TYPE_MAIN, "The received ping must be a 'main' ping.");
> +  Assert.equal(ping.payload.info.reason, REASON_ENVIRONMENT_CHANGE,
> +               "The 'main' ping must be triggered by a change in the environment.");
>    // We expect the current experiments to be reported in the next ping, not this
>    // one.
> -  Assert.equal(ping.environment.experiments, undefined);
> +  Assert.equal(ping.environment.experiments, undefined,

Assert.ok(!("experiments" in ping.environment), ...
Attachment #8849050 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #13)
> Comment on attachment 8849050 [details]
> Bug 1348748 - Implement telemetry experiment annotations to
> TelemetryEnvironment.jsm.
> 
> https://reviewboard.mozilla.org/r/121910/#review124894
> 
> ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:47
> (Diff revisions 2 - 3)
> >  // The maximum length of a string (e.g. description) in the addons section.
> >  const MAX_ADDON_STRING_LENGTH = 100;
> >  // The maximum length of a string value in the settings.attribution object.
> >  const MAX_ATTRIBUTION_STRING_LENGTH = 100;
> > +// The maximum lengths for the experiment id and branch in the experiments section.
> > +const MAX_EXPERIMENT_ID_LENGTH = 100;
> 
> Please double check that this is >= what the addon team uses for maximum
> addon id lengths.

After discussing this on #amo, it looks like 100 is a sane limit for the addon ids and enough for our case too.

Gregg, is that limit ok for the ids of your studies too?
Flags: needinfo?(glind)
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ca7a7a8800ef
Implement telemetry experiment annotations to TelemetryEnvironment.jsm. r=gfritzsche
Rebecca, this patch adds a mechanism to add experiment annotations to the Telemetry Environment. This is not adding any new point, just the mechanism. Does this need a data review?
data-r=me (also, if you don't set NEEDINFO we probably won't ever see this except by accident)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #19)
> data-r=me (also, if you don't set NEEDINFO we probably won't ever see this
> except by accident)

Whoops, sorry, I thought I did that.
Liz, we need to let this high priority patch aggressively ride the trains. I know we're in the middle of the Beta 53 cycle already, but is there any chance this could land in Beta?

It adds a mechanism to annotate Shield experiments in Telemetry. If this somehow breaks/doesn't work, we could simply not use the new API on Beta, so there shouldn't be too many problems there.

As soon as it hits the tree, I'll flag for uplifts.
Flags: needinfo?(lhenry)
Blocks: 1349629
(Clearing NI)

I have no idea if 100 is plenty.  I will accept that limit, and the first time we hit 100, we can re-discuss if needful!  I am not sure of the best way of notifying / erroring though.  It seems like an unexpected limit.
Flags: needinfo?(glind)
Sounds OK, you have data/privacy review, just request uplift as soon as you can. 

Can you make a plan for some sort of testing? Past problems with late beta telemetry changes have often been around unexpected performance issues. 

Will it interfere with existing tools/dashboards (mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1337081#c3) 

If 100 characters isn't enough for addon ids (or something is longer than that) what happens?
Flags: needinfo?(lhenry) → needinfo?(alessio.placitelli)
https://hg.mozilla.org/mozilla-central/rev/ca7a7a8800ef
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #23)
> Sounds OK, you have data/privacy review, just request uplift as soon as you
> can. 
> 
> Can you make a plan for some sort of testing? Past problems with late beta
> telemetry changes have often been around unexpected performance issues. 

There shouldn't be performance problems, as this isn't landing any data point and is basically just adding a new empty "experiments" section in the Environment of the main ping. We currently have no experiment annotating to that section, so the code is never hit outside of the test coverage.

However, to test this on Beta, in addition to manual QA, we'll discuss with Gregg the possibility of deploying a test experiment to a limited subset of the Beta population.

> Will it interfere with existing tools/dashboards (mentioned in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1337081#c3) 

It shouldn't, as this patch introduces a new section in the ping which wasn't being used. No data point/section will be removed. This will eventually supersede the old experiment sections in the ping, but not now and not as part of this patch.

The mentioned dashboards seem to use the "log"/"activeExperiment" sections of the main ping, so we should be safe.

> If 100 characters isn't enough for addon ids (or something is longer than
> that) what happens?

In general, it looks like we're not allowing addon ids with more than ~70bytes when signing on the server, so we should be safe.
In general, if a longer string is passed as the "experiment id", we truncate it to 100 characters and show a warning in the browser console. Keep in mind that we're the ones controlling the experiments, so if Gregg is fine with having less than 100 characters and a longer string is attempted, it should be an unexpected event.

If we decide that, for some reason, we need to relax the length limits, then we could patch & uplift. But I don't think that would be necessary.
Flags: needinfo?(alessio.placitelli)
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

Approval Request Comment
[Feature/Bug causing the regression]: None
[User impact if declined]: The SHIELD team won't be able to add experiment annotations to Telemetry (this might be used for Quantum)
[Is this code covered by automated tests?]: Yes, added in test_telemetryEnvironment.js and test_TelemetrySession.js xpcshell tests
[Has the fix been verified in Nightly?]: this
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: There's currently no experiment running that makes use of this feature, so the API is never invoked. When it will be used, it will populate a new data section in the Telemetry Environment, which is not yet being used, so there shouldn't be risk of regressing other data points.
[String changes made/needed]: None
Attachment #8849050 - Flags: approval-mozilla-beta?
Attachment #8849050 - Flags: approval-mozilla-aurora?
(I agree to launching (small local) things as soon as a channel is ready to accept annotation.  Thank you for the quick/aggressive uplift plan).
Blocks: 1350907
Comment on attachment 8849050 [details]
Bug 1348748 - Implement telemetry experiment annotations to TelemetryEnvironment.jsm.

OK to uplift to aurora and beta from discussion in comments.
Attachment #8849050 - Flags: approval-mozilla-beta?
Attachment #8849050 - Flags: approval-mozilla-beta+
Attachment #8849050 - Flags: approval-mozilla-aurora?
Attachment #8849050 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.