Closed Bug 1261387 Opened 7 years ago Closed 7 years ago

Analyse data for e10sCohort distribution

Categories

(Toolkit :: Telemetry, defect)

RESOLVED FIXED

People

(Reporter: Felipe, Assigned: chutten)

I'd like to analyse the value recorded in the telemetry environment as "e10sCohort" (it's in the same place where e10sEnabled is recorded). It's a string value, and I'd like to know its distribution.

What to analyse:
 - Beta 46 only
 - Ignore any pings where that value is not set

Then see what the distribution of the other values is.


Is it possible to analyse it per-user instead of per-submission? That would be a lot better. There can be users for whom that value has changed over time (e.g., going from "test" to "disqualified" if they install an add-on), so you could either take the last value submitted by each user, or ignore users whose value has changed.
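
A minimal sketch of the "last value submitted by a user" idea, in plain Python. The flat ping layout and the field names `clientId` and `timestamp` are assumptions made for illustration only; real telemetry pings are nested differently, and an actual analysis would run on a cluster rather than on an in-memory list.

```python
from collections import Counter


def last_cohort_per_client(pings):
    """Return {client_id: cohort}, keeping each client's most recent cohort."""
    latest = {}
    for ping in pings:
        cohort = ping.get("e10sCohort")
        if cohort is None:
            continue  # ignore pings where the value is not set
        client = ping["clientId"]  # hypothetical field name
        stamp = ping["timestamp"]  # hypothetical field name
        if client not in latest or stamp >= latest[client][0]:
            latest[client] = (stamp, cohort)
    return {client: cohort for client, (_, cohort) in latest.items()}


# Made-up sample pings, only to show the shape of the computation.
pings = [
    {"clientId": "a", "timestamp": 1, "e10sCohort": "test"},
    {"clientId": "a", "timestamp": 2, "e10sCohort": "disqualified"},
    {"clientId": "b", "timestamp": 1, "e10sCohort": "control"},
    {"clientId": "c", "timestamp": 1, "e10sCohort": None},
]

per_client = last_cohort_per_client(pings)
distribution = Counter(per_client.values())
```

Client "a" moved from "test" to "disqualified", so only the later value is counted; client "c" never reported a cohort and is dropped.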


Another thing I'd like to see is the mapping between e10sCohort x e10sEnabled. Basically, for every possible e10sCohort value, what is the distribution of e10sEnabled for this value. This one doesn't need to be per user.
For my first attempt, I tried querying the prestodb instance at sql.telemetry.mozilla.org, but e10s_cohort isn't in the schema. So no joy.

I guess I'll spin up a cluster at analysis.t.m.o and get my pings from there :)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Please note that the spelling is e10sCohort. Not sure if that's what you looked for.

See here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/docs/environment.rst#46
Out of a 10% sample of Beta 46 clients, we get the following proportions (all the rest have None):

{u'control': '8.82%',
 u'disqualified': '21.77%',
 u'optedIn': '0.01%',
 u'optedOut': '0.31%',
 u'test': '8.89%',
 u'unknown': '0.05%',
 u'unsupportedChannel': '0.00%'}

Now, this is imprecise, but gives us a "within an order of magnitude" idea at least.
Here is the analysis, if you'd like to look at it in its entirety: https://gist.github.com/chutten/cd0d60e1419ff20e3cefe2d12a9c2c93
Thanks Chris. I looked at the notebook, and the numbers don't add up to 100% because cached_count includes users who didn't have any e10sCohort in the data, which was the case before this second phase started.
Counting only from when the experiment started, I got the following:

control             159669  22.13%
test                160911  22.30%
disqualified        394204  54.63%
optedIn             269     0.04%
optedOut            5693    0.79%
unknown             884     0.12%
unsupportedChannel  1       0.00%

which is exactly what was expected from the 50% sampling that we used!
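
The percentages in the table above follow directly from the counts. As a quick sanity check (a sketch only, with the counts copied from the table), they can be recomputed like this:

```python
# Per-cohort client counts, copied from the table above.
counts = {
    "control": 159669,
    "test": 160911,
    "disqualified": 394204,
    "optedIn": 269,
    "optedOut": 5693,
    "unknown": 884,
    "unsupportedChannel": 1,
}

total = sum(counts.values())

# Share of each cohort among clients that reported any cohort at all.
shares = {cohort: 100.0 * n / total for cohort, n in counts.items()}

for cohort, n in counts.items():
    print("%-20s%-8d%.2f%%" % (cohort, n, shares[cohort]))
```

The control and test shares come out within 0.2 percentage points of each other, which is what an even split of the eligible population would produce.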
(In reply to :Felipe Gomes (needinfo me!) from comment #0)
> Another thing I'd like to see is the mapping between e10sCohort x
> e10sEnabled. Basically, for every possible e10sCohort value, what is the
> distribution of e10sEnabled for this value. This one doesn't need to be per
> user.

btw, did you get a chance to look at this other part?
Sorry, I assumed you wanted the numbers as a proportion of the whole user population, not just of the "people who have non-None values".

I must admit I completely missed that second part. Let me spin up a new cluster to look at that.
Here it is: https://gist.github.com/chutten/cd0d60e1419ff20e3cefe2d12a9c2c93

defaultdict(int,
            {(u'control', False): 33665,
             (u'disqualified', False): 88126,
             (u'disqualified', True): 1,
             (u'optedIn', False): 46,
             (u'optedIn', True): 13,
             (u'optedOut', False): 1403,
             (u'optedOut', True): 1,
             (u'test', False): 190,
             (u'test', True): 33993,
             (u'unknown', False): 152,
             (u'unknown', True): 1})

So,
control is entirely !e10s. Good.
Test is not entirely e10s. Hm.

Opted in is strange to me with only 25% being e10s.
Opted-out is, as expected, predominantly !e10s.

disqualified is almost completely !e10s, which makes sense to me.
unknown is also almost completely !e10s.

No unsupportedChannel this time.
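
To put numbers on the observations above, here is a small sketch that turns the (cohort, e10sEnabled) counts into a per-cohort share of e10s-enabled pings. The counts are copied from the output above; the variable names are only illustrative.

```python
from collections import defaultdict

# (e10sCohort, e10sEnabled) -> number of pings, copied from the output above.
counts = {
    ("control", False): 33665,
    ("disqualified", False): 88126,
    ("disqualified", True): 1,
    ("optedIn", False): 46,
    ("optedIn", True): 13,
    ("optedOut", False): 1403,
    ("optedOut", True): 1,
    ("test", False): 190,
    ("test", True): 33993,
    ("unknown", False): 152,
    ("unknown", True): 1,
}

totals = defaultdict(int)   # pings per cohort
enabled = defaultdict(int)  # pings per cohort with e10sEnabled == True
for (cohort, is_e10s), n in counts.items():
    totals[cohort] += n
    if is_e10s:
        enabled[cohort] += n

# Percentage of each cohort's pings that had e10s enabled.
enabled_share = {c: 100.0 * enabled[c] / totals[c] for c in totals}

for cohort in sorted(enabled_share):
    print("%-14s%6.2f%% e10s" % (cohort, enabled_share[cohort]))
```

For instance, optedIn works out to 13 of 59 pings with e10s enabled, and test to 33993 of 34183.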
(In reply to Chris H-C :chutten from comment #8)
> Here it is: https://gist.github.com/chutten/cd0d60e1419ff20e3cefe2d12a9c2c93
> Opted in is strange to me with only 25% being e10s.

That is OK: "optedIn" is assigned to users who have opted in, but before the other disqualifying filters run. So there are opted-in users who might still get e10s blocked.

>             (u'test', False): 190,
>             (u'test', True): 33993,

The e10s activation might take one restart cycle to happen (depending on how late the add-on code runs), so this probably explains the test users on !e10s. Same thing for the existence of the "unknown" group.

Overall these look pretty good. Thanks for running it! I'll probably ask for a similar analysis when we get to 47.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED