1272395 - Test Pilot UT Pings under reporting compared to other sources

Reporter

Description

•

8 years ago

The Activity Stream active users count numbers in the thousands: https://sql.telemetry.mozilla.org/dashboard/activity-stream-current-active-users

This is roughly the same as what is tracked by GA for downloads.

However, the UT TxP pings show a significantly lower number of active users (whether MAU or DAU). This script (https://gist.github.com/rjweiss/1193b079c3bfaa7038c41ca4c2ceadff) suggests only a few hundred users.

It appears there is underreporting, either via the client or the pipeline.

rweiss@mozilla.com

Reporter

Updated

•

8 years ago

Flags: needinfo?(wclouser)

Flags: needinfo?(mreid)

Cory Price [:ckprice] (bugmail disabled, NI me!)

Updated

•

8 years ago

Blocks: 1257690

Thomas Huelbert

Updated

•

8 years ago

Points: --- → 1

Priority: -- → P1

Thomas Huelbert

Updated

•

8 years ago

Assignee: nobody → mreid

Wil Clouser [:clouserw]

Comment 1

•

8 years ago

This is also being tracked at https://github.com/mozilla/testpilot/issues/815. There is a function that wakes up every 10 minutes, checks whether a ping was submitted in the last 24 hours and, if not, submits one: https://github.com/mozilla/testpilot/blob/master/addon/lib/metrics.js#L72
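For anyone following along without the add-on source open, here is a rough sketch of the check described above. It is a Python illustration only, not the add-on's actual JavaScript; the names `should_submit`, `simulate`, `CHECK_INTERVAL`, and `PING_INTERVAL` are hypothetical, and the assumption is that a ping goes out as soon as 24 hours have passed since the last one.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed values, taken from the description above (not from the add-on code).
CHECK_INTERVAL = timedelta(minutes=10)  # how often the check wakes up
PING_INTERVAL = timedelta(hours=24)     # longest allowed gap between pings


def should_submit(last_submitted: Optional[datetime], now: datetime) -> bool:
    """Return True when no ping has been submitted in the last 24 hours."""
    return last_submitted is None or now - last_submitted >= PING_INTERVAL


def simulate(start: datetime, wakeups: int) -> List[datetime]:
    """Replay `wakeups` timer ticks and return the times a ping would be sent."""
    last_submitted = None
    sent = []
    for i in range(wakeups):
        now = start + i * CHECK_INTERVAL
        if should_submit(last_submitted, now):
            sent.append(now)
            last_submitted = now
    return sent
```

Under these assumptions a client that stays running sends one ping per day, and a client that is only open briefly still sends one as soon as the first check fires after a 24-hour gap.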

Flags: needinfo?(wclouser)

Mark Reid [:mreid]

Assignee

Comment 2

•

8 years ago

Per email discussion, to shed light on how to improve the latency we're seeing, I was thinking of something like this:

- For each testpilot ping, grab testpilot install date, ping creation date, submission date, and clientid
- Find the earliest install date (or creation date) per clientid
- Compute the delta between the install/creation date and the submission date
- Look at the distribution in submission latency for clientids we *did* see.

Do the same for testpilottest pings to see how the latency distribution compares.
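The steps above can be sketched roughly as follows. This is a minimal plain-Python illustration, not the code that will end up in the notebook: the field names (`client_id`, `install_date`, `submission_date`) and the function names are hypothetical, and it assumes the delta is measured from each client's earliest install date to its earliest submission date.

```python
from collections import Counter
from datetime import date


def first_seen_latency(pings):
    """Map each client_id to the days between earliest install and earliest submission."""
    earliest_install = {}
    earliest_submission = {}
    for ping in pings:
        cid = ping["client_id"]
        install = ping["install_date"]
        submitted = ping["submission_date"]
        # Keep the earliest value seen so far for each client.
        if cid not in earliest_install or install < earliest_install[cid]:
            earliest_install[cid] = install
        if cid not in earliest_submission or submitted < earliest_submission[cid]:
            earliest_submission[cid] = submitted
    return {
        cid: (earliest_submission[cid] - earliest_install[cid]).days
        for cid in earliest_install
    }


def latency_distribution(latencies):
    """Count how many clients fall into each latency bucket (in whole days)."""
    return Counter(latencies.values())


# Made-up sample records, purely for illustration.
sample_pings = [
    {"client_id": "a", "install_date": date(2016, 5, 1), "submission_date": date(2016, 5, 5)},
    {"client_id": "a", "install_date": date(2016, 5, 1), "submission_date": date(2016, 5, 3)},
    {"client_id": "b", "install_date": date(2016, 5, 2), "submission_date": date(2016, 5, 2)},
    {"client_id": "c", "install_date": date(2016, 5, 1), "submission_date": date(2016, 5, 3)},
]
sample_latencies = first_seen_latency(sample_pings)
sample_distribution = latency_distribution(sample_latencies)
```

Running the same two functions over the testpilottest pings would give the second distribution to compare against.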

We can't efficiently filter the entire Telemetry corpus for "has testpilot enabled", but we can efficiently use the set of clientids in the union of both sets above. Using the main_summary dataset, we can then see what the latency looks like for main pings from those same clientids and compare it to the background latency for all main pings.

Further, we should check how many testpilottest clientids were not found in the main pings during the same interval.
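The clientid comparisons in the two paragraphs above boil down to set operations. A small sketch, assuming each ping type has already been reduced to a collection of clientids for the same interval; the function name and the result keys are hypothetical.

```python
def compare_client_sets(testpilot_ids, testpilottest_ids, main_ids):
    """Summarise the overlap between clientids seen in each ping type."""
    testpilot = set(testpilot_ids)
    testpilottest = set(testpilottest_ids)
    main = set(main_ids)

    # Every client known to have Test Pilot, from either ping type.
    tracked = testpilot | testpilottest

    return {
        "tracked_clients": len(tracked),
        # testpilottest clients that sent no main ping in the interval.
        "testpilottest_missing_from_main": len(testpilottest - main),
        # testpilottest clients that sent no testpilot ping in the interval.
        "testpilottest_missing_from_testpilot": len(testpilottest - testpilot),
        # Tracked clients that did show up in the main pings.
        "tracked_in_main": len(tracked & main),
    }
```

The `tracked` set is also what would be used to select rows from main_summary when comparing latency against the background rate.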

Some predictions:

If we find that testpilottest contains many clientids that did not report main pings, or that the latency for testpilottest clientids in the main dataset is significantly higher than the background latency, we are probably running up against the throttling behaviour on the client. Follow-up: how many testpilottest pings are reported per clientid per day? Actions here would be to decrease the number of testpilottest pings or to ease up on client throttling for testpilottest pings.

If we find that significantly more testpilottest clientids are present in the main pings than the testpilot pings (and that the latency is not significantly worse than the background rate), it should be safe to increase the frequency of testpilot submission, and that should improve latency.

Flags: needinfo?(mreid)

Mark Reid [:mreid]

Assignee

Comment 3

•

8 years ago

I ran the above analysis. Here is the notebook with the code and results:
https://gist.github.com/mreid-moz/e007487a0b03f2ee40ad3ccd6b21f44a

From cells 25 and 26, it looks like the lower submission rate for testpilot pings is not reflected in the main telemetry pings for the same set of clientids, so I'm fairly confident we're not hitting client throttling behaviour. We should be safe to fix / increase frequency of the testpilot submissions without having a negative impact on other data reporting.

A client-side fix went in for https://github.com/mozilla/testpilot/issues/815, so I'll re-run the notebook in a few days and see if the symptoms improve.

Mark Reid [:mreid]

Assignee

Comment 4

•

8 years ago

I've updated the notebook. You can compare the differences with the last run in the "revisions" pane of the gist in comment 3.

The summary is that the number of unique clientids sending 'testpilot' pings has increased dramatically. Looks like things are on the right track!

The interesting changes are in cells #5, 6, 26, and 27.

The "latency since install" graph (cell 29) also changed significantly, presumably due to previously-unreported clients sending in data after the above client fix.

Mark Reid [:mreid]

Assignee

Comment 5

•

8 years ago

I've checked again and the number of clientids reporting testpilot pings is fairly stable at the new and improved rate. It would appear that the client-side fix did the trick!

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Wil Clouser [:clouserw]

Comment 6

•

8 years ago

Thanks for the help, everyone.

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

People

(Reporter: rweiss, Assigned: mreid)

Security

(public)
