Telemetry experiment to measure TLS 1.3 success/performance

Status: NEW
Assignee: Unassigned
Product: Firefox Health Report
Component: Client: Desktop
Opened: 10 months ago
Last modified: 5 days ago

People
(Reporter: ekr, Unassigned)

Tracking
Version: unspecified
Points: ---
Firefox Tracking Flags: (firefox52+ wontfix)

Attachments
(1 attachment, 1 obsolete attachment)

(Reporter)

Description

10 months ago
We want to do a telemetry experiment to measure TLS 1.3 success against a known-good server. This gives us two things:

1. Sanity checking that it works reliably under ideal conditions.
2. A measure of adverse network conditions (middleboxes, etc.) which might cause problems.

The initial experiment is very simple and will be run after we enable TLS 1.3 by default in Nightly:

- Connect to two hosts, one with TLS 1.3 and one without, both operated by Cloudflare on our behalf.
- Measure success rates, timing, etc.
- Report the results in a custom telemetry ping.
- Disable the experiment.

Ideally we would run it on the entire (not very large) Nightly population.
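
Roughly, the flow in bootstrap.js looks like this (a sketch, not the actual patch; makeRequest, report and disable are the helpers reviewed below):

function runExperiment(urls) {
  // Fetch every test URL, submit a single custom ping with the per-host
  // results, then uninstall the experiment so it never runs again.
  let todo = urls.map((url, i) => makeRequest(i, url, null));
  return Promise.all(todo)
    .then(results => report(results))  // submits the "tls-13-study-v1" ping
    .then(() => disable());            // experiment removes itself
}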
(Reporter)

Comment 1

10 months ago
Created attachment 8801356 [details] [diff] [review]
0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch
(Reporter)

Comment 2

10 months ago
Comment on attachment 8801356 [details] [diff] [review]
0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch

Review of attachment 8801356 [details] [diff] [review]:
-----------------------------------------------------------------

Note: I know that the start/stop times are wrong. I will fix them before landing, but I don't have a firm timeline yet.
Attachment #8801356 - Flags: review?(benjamin)

Comment 3

10 months ago
Comment on attachment 8801356 [details] [diff] [review]
0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch

>diff --git a/experiments/tls13-compat-nightly52/code/bootstrap.js b/experiments/tls13-compat-nightly52/code/bootstrap.js

>+const kUNKNOWN_ERROR = 0x80040111; // NS_ERROR_NOT_AVAILABLE

This doesn't appear to be used anywhere else. There's a kPR_UNKNOWN_ERROR below though? Can we use Components.results.NS_ERROR_NOT_AVAILABLE as a constant here?
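
For example, a one-line sketch of that change:

const kUNKNOWN_ERROR = Components.results.NS_ERROR_NOT_AVAILABLE;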

>+function report(result) {
>+  console.log("Result");
>+  console.log(result);
>+  
>+  return TelemetryController.submitExternalPing(
>+    "tls-13-study-v1",
>+    {
>+      results: result
>+    }, {});

Somewhere, we need to have documented the format of this ping for data review. I think an .md or .rst file checked in here would be fine. It's really hard for me reading this code to know exactly what "results" will contain, and we want the data docs to be explicit.

Are you recording anything about HTTP proxy config, offline state, or captive portal state? Are you worried about any of those significantly skewing the results? Do you want to/can you record the resolved IP address of the request to help diagnose the frequency of direct connections versus captive portal?

>+function disable() {
>+  Experiments.instance().disableExperiment("FROM_API");
>+}

So the desired behavior here is for the experiment to install, immediately do the network requests, send the ping, and then uninstall? It might be worth documenting that flow, and the edge cases of what should happen in errors.

>+function startup(data, reason) {
>+  let todo = [];
>+  let shuffled = shuffleArray(kURLs);
>+  
>+  if (reason != ADDON_ENABLE) {
>+    return Promise.resolve(true);
>+  }

I'm a little worried about whether we send ADDON_ENABLE notifications reliably for experiments. Felipe or Georg, do you know if we have either manual or automated testing that verifies the sequence of startup events for experiments?

>+  
>+  for (var i in shuffled) {
>+    todo.push(makeRequest(i, shuffled[i], null ));
>+  }
>+
>+  return Promise.all(todo)
>+    .then(result => report(result))
>+    .catch(e => Cu.reportError(e))

You don't need any telemetry for the error case? I figured you'd want to send an error ping.

Also, in case there are bugs, recording the telemetry clientId with this ping can help with deduping. Is there a reason this is sensitive enough that we wouldn't want to record the clientId?
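
For reference, a minimal sketch of report() with the client id attached (assuming the standard TelemetryController options; not the actual patch):

function report(result) {
  // addClientId/addEnvironment ask TelemetryController to attach the
  // telemetry client id and environment block to this custom ping.
  return TelemetryController.submitExternalPing(
    "tls-13-study-v1",
    { results: result },
    { addClientId: true, addEnvironment: true });
}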
Flags: needinfo?(gfritzsche)
Flags: needinfo?(felipc)
Flags: needinfo?(ekr)
Attachment #8801356 - Flags: review?(benjamin) → review-

Comment 4

10 months ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #3)
> >+function startup(data, reason) {
> >+  let todo = [];
> >+  let shuffled = shuffleArray(kURLs);
> >+  
> >+  if (reason != ADDON_ENABLE) {
> >+    return Promise.resolve(true);
> >+  }
> 
> I'm a little worried about whether we send ADDON_ENABLE notifications
> reliably for experiments. Felipe or Georg, do you know if we have either
> manual or automated testing that verifies the sequence of startup events for
> experiments?

No, actually the startup events for experiments are known to be buggy (i.e., different from the convention). See here for some code that we use in every experiment:
http://hg.mozilla.org/webtools/telemetry-experiment-server/file/tip/experiments/e10s-beta46-noapz/code/bootstrap.js#l25

Since you only need to run this once, I recommend using the install() function, which will work correctly.
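
For example (a sketch based on the patch above, with the one-shot work moved out of startup()):

function install(data, reason) {
  // install() fires reliably when the experiment add-on is installed,
  // unlike ADDON_ENABLE in startup(), so no reason check is needed here.
  let todo = [];
  let shuffled = shuffleArray(kURLs);
  for (var i in shuffled) {
    todo.push(makeRequest(i, shuffled[i], null));
  }
  return Promise.all(todo)
    .then(result => report(result))
    .then(() => disable())              // uninstall immediately afterwards
    .catch(e => Cu.reportError(e));
}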
Flags: needinfo?(felipc)
Flags: needinfo?(gfritzsche)
(Reporter)

Comment 5

10 months ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #3)
> Comment on attachment 8801356 [details] [diff] [review]
> 0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch
> 
> >diff --git a/experiments/tls13-compat-nightly52/code/bootstrap.js b/experiments/tls13-compat-nightly52/code/bootstrap.js
> 
> >+const kUNKNOWN_ERROR = 0x80040111; // NS_ERROR_NOT_AVAILABLE
> 
> This doesn't appear to be used anywhere else. There's a kPR_UNKNOWN_ERROR
> below though? Can we use Components.results.NS_ERROR_NOT_AVAILABLE as a
> constant here?

Sure, that should be fine. Do I need to pull in Components.results specifically?


> 
> >+function report(result) {
> >+  console.log("Result");
> >+  console.log(result);
> >+  
> >+  return TelemetryController.submitExternalPing(
> >+    "tls-13-study-v1",
> >+    {
> >+      results: result
> >+    }, {});
> 
> Somewhere, we need to have documented the format of this ping for data
> review. I think an .md or .rst file checked in here would be fine. It's
> really hard for me reading this code to know exactly what "results" will
> contain, and we want the data docs to be explicit.

I will add something.


> Are you recording anything about HTTP proxy config, offline state, or
> captive portal state? Are you worried about any of those significantly
> skewing the results? Do you want to/can you record the resolved IP address
> of the request to help diagnose the frequency of direct connections versus
> captive portal?

I don't think that's necessary. Given that we know we just installed
the experiment, I think it's reasonably likely we have a clear connection.
With that said, if we see a big delta we may need to revisit.



> >+function disable() {
> >+  Experiments.instance().disableExperiment("FROM_API");
> >+}
> 
> So the desired behavior here is for the experiment to install, immediately
> do the network requests, send the ping, and then uninstall? It might be
> worth documenting that flow, and the edge cases of what should happen in
> errors.

I added some text. Not sure what you mean about edge cases.


> >+function startup(data, reason) {
> >+  let todo = [];
> >+  let shuffled = shuffleArray(kURLs);
> >+  
> >+  if (reason != ADDON_ENABLE) {
> >+    return Promise.resolve(true);
> >+  }
> 
> I'm a little worried about whether we send ADDON_ENABLE notifications
> reliably for experiments. Felipe or Georg, do you know if we have either
> manual or automated testing that verifies the sequence of startup events for
> experiments?

I will use "install" as suggested by Georg. Should I still be calling disable()
> 
> >+  
> >+  for (var i in shuffled) {
> >+    todo.push(makeRequest(i, shuffled[i], null ));
> >+  }
> >+
> >+  return Promise.all(todo)
> >+    .then(result => report(result))
> >+    .catch(e => Cu.reportError(e))
> 
> You don't need any telemetry for the error case? I figured you'd want to
> send an error ping.

We should record XHR errors as errors in the results struct. And if the report
function fails, then that suggests we can't report.
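
Something along these lines (a sketch only; the HEAD request, timeout value, and field names are illustrative, not the actual makeRequest):

// bootstrap.js would need: Cu.importGlobalProperties(["XMLHttpRequest"]);
function makeRequest(index, url, unused) {
  return new Promise(resolve => {
    let req = new XMLHttpRequest();
    let start = Date.now();
    req.open("HEAD", url);
    req.timeout = 30000;  // illustrative 30s timeout
    // Resolve on failure too, so errors land in the results struct instead
    // of rejecting the Promise.all() that drives the experiment.
    req.onload = () => resolve({ index, url, status: req.status,
                                 elapsed: Date.now() - start });
    req.onerror = () => resolve({ index, url, error: req.channel ?
                                  req.channel.status : kPR_UNKNOWN_ERROR });
    req.ontimeout = () => resolve({ index, url, error: "timeout" });
    req.send();
  });
}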


> Also, in case there are bugs, recording the telemetry clientId with this
> ping can help with deduping. Is there a reason this is sensitive enough that
> we wouldn't want to record the clientId?

No, I just didn't think I needed it. Happy to do it.
Flags: needinfo?(ekr)

Comment 6

10 months ago
> Sure, that should be fine. Do I need to pull in Components.results
> specifically?

No.

> > >+function disable() {
> > >+  Experiments.instance().disableExperiment("FROM_API");
> > >+}
> > 
> > So the desired behavior here is for the experiment to install, immediately
> > do the network requests, send the ping, and then uninstall? It might be
> > worth documenting that flow, and the edge cases of what should happen in
> > errors.
> 
> I added some text. Not sure what you mean about edge cases.

I mean if one or more of the XHRs failed to connect at all, or timed out, or some other error caused those promises to reject, would we want to repeat the request again at next startup?

Mainly I want to make sure that the implemented behavior here matches what you actually want.

> I will use "install" as suggested by Georg. Should I still be calling
> disable()

Probably, since you want the experiment to immediately uninstall.
(Reporter)

Comment 7

10 months ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #6)
> > Sure, that should be fine. Do I need to pull in Components.results
> > specifically?
> 
> No.
> 
> > > >+function disable() {
> > > >+  Experiments.instance().disableExperiment("FROM_API");
> > > >+}
> > > 
> > > So the desired behavior here is for the experiment to install, immediately
> > > do the network requests, send the ping, and then uninstall? It might be
> > > worth documenting that flow, and the edge cases of what should happen in
> > > errors.
> > 
> > I added some text. Not sure what you mean about edge cases.
> 
> I mean if one or more of the XHRs failed to connect at all, or timed out, or
> some other error caused those promises to reject, would we want to repeat
> the request again at next startup?
> 
> Mainly I want to make sure that the implemented behavior here matches what
> you actually want.

Good point, but I think on balance that will cause more trouble than it's worth.

I actually specifically want to catch XHR failures.


> 
> > I will use "install" as suggested by Georg. Should I still be calling
> > disable()
> 
> Probably, since you want the experiment to immediately uninstall.
(Reporter)

Comment 8

10 months ago
Created attachment 8802704 [details] [diff] [review]
0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch
Attachment #8801356 - Attachment is obsolete: true
(Reporter)

Comment 9

10 months ago
Comment on attachment 8802704 [details] [diff] [review]
0001-Bug-1310338-TLS-1.3-telemetry-experiment.patch

Review of attachment 8802704 [details] [diff] [review]:
-----------------------------------------------------------------

See the revised patch.
Attachment #8802704 - Flags: review?(benjamin)
(Reporter)

Comment 10

10 months ago
Question:
We may want to update this experiment later with more probes. What does that look like?
(Reporter)

Comment 11

10 months ago
Ping for re-review.
Flags: needinfo?(benjamin)

Comment 12

10 months ago
While the experiment is installed, you can update it by deploying a new version to the experiment server. You are likely to get a new install() invocation on upgrade. Once a particular experiment has been uninstalled, it won't ever be reinstalled/rerun for that user.
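
For example (a sketch; the reason constants are the standard bootstrap ones):

function install(data, reason) {
  if (reason == ADDON_UPGRADE) {
    // A redeployed version of the experiment with extra probes would
    // presumably land here.
  }
  // reason == ADDON_INSTALL on the first install for this profile.
}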
Flags: needinfo?(benjamin)

Updated

10 months ago
Attachment #8802704 - Flags: review?(benjamin) → review+

Comment 13

10 months ago
Eric, if you need QA assistance with getting this tested please ping RyanVM and he can help find a QA lead.
(Reporter)

Comment 14

10 months ago
Landed as: https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/2efaab39bba4114ffc3f6f257e9f11f900c68701
(Reporter)

Comment 15

10 months ago
Bsmedberg,

I accidentally left this at 10% of the population, but I'd like to actually run it on the whole Nightly population (we need a big sample to get significance). Any objections?
(Reporter)

Updated

10 months ago
Flags: needinfo?(benjamin)

Comment 16

10 months ago
I sent email. I figured this was being deployed to nightly to see whether the experiment worked correctly: I don't think it is possible to get real product insights from nightly, and so in terms of answering real-world questions you need to deploy to a fraction of beta or maybe even release.
Flags: needinfo?(benjamin)
(Reporter)

Comment 17

10 months ago
I think it's still possible to get real insights from Nightly. It represents a biased sample, but so do all of our samples, and whether or not we see a difference, that's relevant. [Note that it's standard practice in clinical trials for drugs, for instance, to use totally unrepresentative subject samples, so it's not like there's something unscientific about this, though you do have to be careful about the conclusions you draw if your demographics are way out of whack.] In any case, this version of TLS 1.3 is only in 52, so while we might eventually extend the experiment to 52 Beta or release, that's going to take a lot longer.

Hope that clarifies things. Any objections to my replacing this number with 1.0?

Comment 18

10 months ago
PASS
Please consider this the requested QA sign-off. Experiment installation and data collection work as expected.
Eric, thanks a lot for all your assistance.
Please reference this link for details on install and data collection:
https://docs.google.com/spreadsheets/d/1jfpx0kT9N_eOpNO-fKv2EzC_WCp_WnQDhRabM7H1LuY/edit#gid=1087822068
Flags: needinfo?(lhenry)
Flags: needinfo?(ekr)
Flags: needinfo?(benjamin)

Comment 19

10 months ago
As long as Michelle has verified that the experiment uninstalls correctly, a 100% "sample" should be fine, because it uninstalls automatically and won't block other potential experiments. It would be good to set a maximum runtime, though: perhaps 2-3 days, so that if there is a bug it doesn't consume the entire Nightly population.
Flags: needinfo?(benjamin)

Comment 20

10 months ago
Sounds like a good idea to put a hard limit on this as Benjamin suggests. Thanks Michelle!
status-firefox52: --- → affected
tracking-firefox52: --- → +
Flags: needinfo?(lhenry)
(Reporter)

Comment 21

10 months ago
bsmedberg: which value are you talking about? "maxActiveSeconds": 604800?
Flags: needinfo?(ekr) → needinfo?(benjamin)

Comment 22

10 months ago
Yes, that's currently 7 days; can we reduce that to 2 or 3? I'm sorry, for some reason I misread and thought that wasn't set at all.
Flags: needinfo?(benjamin)
(Reporter)

Comment 23

10 months ago
I assume this is just the limit on how long it runs on a given device?

If so, then I'll cut it down to 3600s, which should be more than plenty.
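
In the experiment manifest that is a one-field change, something like the following (other manifest fields omitted; 3600 s = 1 hour, down from 604800 s = 7 days):

  "maxActiveSeconds": 3600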

Comment 24

10 months ago
Correct.
Sylvestre, FYI for 52 since you are the release owner.
Flags: needinfo?(sledru)
I'll go ahead and give relman approval here since Sylvestre is out, so this can ship to Nightly.
Matt, is it your team who would deploy this? Can you find an owner for it? Thanks.
Flags: needinfo?(mgrimes)

Comment 28

10 months ago
We may do this as a Shield study in the future, but this iteration is going out via Telemetry Experiments. I do not know who the owner would be in that case.
Flags: needinfo?(mgrimes)
Looks like it needs signing (https://wiki.mozilla.org/QA/Telemetry/Developing_a_Telemetry_Experiment#Signing_the_XPI). Then someone from the Ops team should be contacted to deploy it.
Flags: needinfo?(sledru)
This bug seems to have gone stale. Is this experiment done, or is there still some action needed here?
Flags: needinfo?(ekr)
(Reporter)

Comment 31

8 months ago
Thanks. I need to analyze the data and then revise it to run on 51 Beta per the long discussion elsewhere. I'll be doing that soon.
Flags: needinfo?(ekr)
Too late for firefox 52, mass-wontfix.
status-firefox52: affected → wontfix
(Reporter)

Comment 33

5 months ago
The experiment actually is done on 53. I am analyzing the data now.