further validate Telemetry probe data against loop-server (and other?) data

RESOLVED WORKSFORME

Status

Hello (Loop)
General
P2
normal
Rank:
25
RESOLVED WORKSFORME
3 years ago
2 years ago

People

(Reporter: dmose, Assigned: standard8)

Tracking

unspecified
Points:
---
Bug Flags:
firefox-backlog +
qe-verify -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [metrics])

Attachments

(1 attachment)

3.87 KB, application/x-javascript
Details
(Reporter)

Description

3 years ago
In bug 1182031 comment 6, Vladan observed that we are collecting very little data from the Loop "opt-out" Telemetry probes.  The 39/release probe can be explained by the fact that opt-out is disabled on that channel.  However, even the 40/41/42 probes are collecting tiny very small amounts of data.

Through the course of bug 1182031 and then IRC, logged at http://logs.glob.uno/?c=mozilla%23loop&s=23+Jul+2015&e=24+Jul+2015#c43267 , Erik and I compared the number of LOOP_TWO_WAY_MEDIA_CONNECTION samples from July 20th with the number of samples of the loop-server sendrecv logging to try and figure out whether there was a problem with that telemetry specifically.

FF 38/release: sendrecv=250, LOOP_TWO_WAY_MEDIA_CONNECTION opt-out unavailable
FF 39/release: sendrecv=1983, telemetry: opt-out unavailable
FF 40/beta: sendrecv=90, telemetry: 27
FF 41/aurora: sendrecv=73, telemetry: (need to check)
FF 42/nightly: sendrecv=41, telemetry: 9

Then we cross-checked a room creation number and a room deletion number, and all followed the same trend: telemetry was between 1/4 and 2/3 of the loop-server log numbers.  So within the same order of magnitude, but nonetheless substantially less.

So the current hypothesis is that this can be largely explained some combination of the tiny sample size being more vulnerable to edge-case skew as well as by non-release cohorts of users being more aware of and interested in opting-out of data collection. 

FF 40/release will also have opt-out numbers disabled, but 41/release will (hopefully!) have them enabled.  So between now and the 41 -> release uplift, it may be worth doing a more in-depth validation of our numbers so that we have a higher level of confidence that the first opt-out release will get us sufficiently useful numbers.

Adam, can you give an opinion (and ideally some cycles, give our existing onboarding & feature schedule craziness?) to how much validation we want to do?

FWIW, http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#7820 has the list of probes that have landed on Nightly/42.
Flags: qe-verify-
Flags: firefox-backlog+

Updated

3 years ago
Rank: 25
Whiteboard: [metrics]
(Assignee)

Updated

3 years ago
Depends on: 1182031

Updated

3 years ago
Flags: needinfo?(adam)

Comment 1

3 years ago
Just taking some notes here, trying to replicate results and untangle some of the discrepancies.

From the IRC logs: "Okay, so 1983 for FF39 was correct"

I can't get exactly 1983 for July 20th, but I can get close with a couple of queries.

If I simply look for records with this query and then search for user_agent_version == 39, I get 1989:

  {
    "query": {
      "filtered": {
        "query": {
          "match_all": {}
        },
        "filter": {
          "and": [
            {
              "range" : {
                "Timestamp": {
                  "gte": date + "T00:00:00.000Z",
                  "lt":  date + "T00:00:00.000Z||+1d",
                }
              }
            },
            { "term": { "state": "sendrecv" } },
          ]
        }
      }
    }
  };

If I further limit my search to user_agent_browser == 'Firefox', I get 1985.

But that's not really the number we want to look at, since it catches all status logs where we are in sendrecv, regardless of whether we were in sendrecv before.

If I instead look for connections == 2 (so that we don't double-count sessions with screen shares), and only include those records where Session.streamCreated is either 'Publisher.streamCreated' or 'Session.streamCreated', then we knock the count for Firefox 39 down to 1458.

Applying these same filters for all Firefox versions logged at the server for 2015-07-20, I get:

  '31': 16,
  '33': 6,
  '34': 3,
  '35': 15,
  '36': 18,
  '37': 33,
  '38': 196,
  '39': 1458,
  '40': 72,
  '41': 65,
  '42': 36

I'll poke around the telemetry side of things as soon as I untangle what y'all were looking at. There does not appear to be a metric called "LOOP_TWO_WAY_MEDIA_CONNECTION"
Flags: needinfo?(adam)

Comment 2

3 years ago
Okay, this feels like we're getting close. If I eliminate link-clicker users from the logs for this 2015-07-20, I get:

  '40': 38,
  '41': 58,
  '42': 20 

Comparing that to LOOP_TWO_WAY_MEDIA_CONN_LENGTH_1 in telemetry -- which will be all non-link-clickers -- I see:

  beta/40: 27
  aurora/41: 15
  nightly/42: 18


It's not perfect -- and the gap for Aurora is particularly large -- but the difference is at least plausibly attributable to telemetry opt-out. I see a similar pattern for, e.g., 2015-08-10 (a date picked out of thin air):

Log stats for 2015-08-10:
  { '40': 41, '41': 37, '42': 32 }
Getting LOOP_TWO_WAY_MEDIA_CONN_LENGTH_1 for versions ["40","41","42"] on 2015-08-10
  beta/40: 44
  aurora/41: 20
  nightly/42: 22

These are again close, with the telemetry numbers being a bit smaller than the log numbers. I note that beta is slightly higher, but I think telemetry might timestamp when the telemetry upload happens, not when the call itself was made (which means that we'll be looking at slightly different sets of data).

Dan, Erik: Does this explanation sound plausible? If so, I'm willing to put the issue to bed as simply being a little confused about what data you were looking at.

(Can't ni dmose right now -- I'm ni'ing myself to remind me to come back and tag him when he's back from vacation)
Flags: needinfo?(erik)
Flags: needinfo?(adam)

Comment 3

3 years ago
Created attachment 8653088 [details]
connection-count.js

If either of you would like to double-check my analysis here, I've attached the script that I used to generate my numbers (you'll need a node.js version that supports promises; 11.{recent} or 12.0 should work). I think Erik has elasticsearch access -- Dan: ping me if you need to be put in touch with the people who can get you credentials.

Comment 4

3 years ago
Running numbers for room creation (ignoring beta and release, which were not apparently instrumented for telemetry on this front):

Log stats for 2015-07-20:
  '41': 49,
  '42': 19,

Getting LOOP_ROOM_CREATE for versions ["41","42"] on 2015-07-20
  aurora/41: 54
  nightly/42: 31

That seems reasonable-ish, given the same assumptions I document above regarding date skew. So, really, do deletes:

logs:
  '41': 30,
  '42': 23

versus telmetry:
  aurora/41: 9
  nightly/42: 18

Updated

3 years ago
Flags: needinfo?(adam) → needinfo?(dmose)
Status: NEW → ASSIGNED
(Assignee)

Comment 5

2 years ago
Stealing to look at and move forward
Flags: needinfo?(standard8)
Flags: needinfo?(erik)
Flags: needinfo?(dmose)

Comment 6

2 years ago
(In reply to Mark Banner (:standard8) from comment #5)
> Stealing to look at and move forward

Reassigning accordingly. :)
Assignee: adam → standard8
(Assignee)

Comment 7

2 years ago
Looking back at comment 2 and comment 4, I think the numbers are close enough. There's various things we do know about telemetry limitations, and I think we don't need to look further into these at this time.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(standard8)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.