Closed
Bug 1272395
Opened 8 years ago
Closed 8 years ago
Test Pilot UT Pings under reporting compared to other sources
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rweiss, Assigned: mreid)
References
Details
The Activity Stream active user count is in the thousands (https://sql.telemetry.mozilla.org/dashboard/activity-stream-current-active-users). This is roughly in line with what GA tracks for downloads. However, the UT TxP pings show a significantly lower number of active users (by either MAU or DAU). This script (https://gist.github.com/rjweiss/1193b079c3bfaa7038c41ca4c2ceadff) suggests only a few hundred users. It appears there is underreporting, either on the client or in the pipeline.
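For reference when comparing the two sources, here is a minimal sketch of how DAU and MAU are commonly derived from ping records: count the distinct client IDs whose pings were submitted on a given day, or within a trailing window ending on that day. The record layout (`client_id`, `submission_date`) and the 28-day MAU window are assumptions for illustration, not necessarily the definitions used by the dashboard or the linked script.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple


class Ping(NamedTuple):
    # Hypothetical minimal ping record; real pings carry many more fields.
    client_id: str
    submission_date: date


def dau(pings: Iterable[Ping], day: date) -> int:
    """Distinct clients that submitted at least one ping on `day`."""
    return len({p.client_id for p in pings if p.submission_date == day})


def mau(pings: Iterable[Ping], day: date, window_days: int = 28) -> int:
    """Distinct clients active in the `window_days` days ending on `day` (inclusive).

    The 28-day default is an assumption, not a confirmed definition.
    """
    start = day - timedelta(days=window_days - 1)
    return len({p.client_id for p in pings if start <= p.submission_date <= day})


if __name__ == "__main__":
    d = date(2016, 5, 20)
    pings = [
        Ping("a", d),
        Ping("a", d),                       # duplicate pings count once
        Ping("b", d - timedelta(days=3)),
        Ping("c", d - timedelta(days=40)),  # outside the 28-day window
    ]
    print(dau(pings, d), mau(pings, d))  # 1 2
```

Because both metrics count distinct client IDs, a client that fails to submit on some days lowers DAU before it affects MAU.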
Reporter
Updated•8 years ago
Flags: needinfo?(wclouser)
Flags: needinfo?(mreid)
Updated•8 years ago
Points: --- → 1
Priority: -- → P1
Updated•8 years ago
|
Assignee: nobody → mreid
Comment 1•8 years ago
This is also being tracked in GitHub issue https://github.com/mozilla/testpilot/issues/815. There is a function that wakes up every 10 minutes, checks whether a ping was submitted in the last 24 hours, and, if not, submits one: https://github.com/mozilla/testpilot/blob/master/addon/lib/metrics.js#L72
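A minimal sketch of the check described above, written in Python for illustration only. The real add-on code is JavaScript in `metrics.js`, and the class and method names here (`DailyPinger`, `on_wake`) are hypothetical. On every 10-minute wake-up it submits only if no ping has been sent in the last 24 hours. Remembering the timestamp of the last submission is what keeps repeated wake-ups from producing duplicate pings.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

CHECK_INTERVAL = timedelta(minutes=10)  # how often the timer wakes up
SUBMIT_INTERVAL = timedelta(hours=24)   # minimum gap between submitted pings


class DailyPinger:
    """Hypothetical sketch of a 'submit at most once per 24 hours' timer callback."""

    def __init__(self, submit: Callable[[], None], last_submitted: Optional[datetime] = None):
        self._submit = submit
        self.last_submitted = last_submitted  # None means "never submitted"

    def on_wake(self, now: datetime) -> bool:
        """Called every CHECK_INTERVAL; returns True if a ping was submitted."""
        if self.last_submitted is None or now - self.last_submitted >= SUBMIT_INTERVAL:
            self._submit()
            self.last_submitted = now
            return True
        return False


if __name__ == "__main__":
    sent = []
    pinger = DailyPinger(submit=lambda: sent.append(1))
    t = datetime(2016, 5, 20)
    # Simulate 48 hours of wake-ups (0:00 through 47:50).
    for _ in range(int(timedelta(hours=48) / CHECK_INTERVAL)):
        pinger.on_wake(t)
        t += CHECK_INTERVAL
    print(len(sent))  # 2
```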
Flags: needinfo?(wclouser)
Assignee
Comment 2•8 years ago
Per email discussion, and to shed light on how to improve the latency we're seeing, I was thinking of something like this:
- For each testpilot ping, grab the testpilot install date, ping creation date, submission date, and clientid.
- Find the earliest install date (or creation date) per clientid.
- Compute the delta between the install/creation date and the submission date.
- Look at the distribution of submission latency for the clientids we *did* see.

Do the same for testpilottest pings to see how the latency distribution compares.

We can't efficiently filter the entire Telemetry corpus for "has testpilot enabled", but we can efficiently take the union of the clientids from both sets above and, using the main_summary dataset, look at the latency of main pings from those clientids (compared with the background latency for all main pings). We should also check how many testpilottest clientids were not found in the main pings during the same interval.

Some predictions:
- If we find that testpilottest contains many clientids that did not report main pings, or that the latency for testpilottest clientids in the main dataset is significantly higher than the background latency, we are probably running up against the throttling behaviour on the client. Follow-up: how many testpilottest pings are reported per clientid per day? Possible actions would be to decrease the number of testpilottest pings or to ease up on client throttling for tpt pings.
- If we find that significantly more testpilottest clientids are present in the main pings than in the testpilot pings (and that the latency is not significantly worse than the background rate), it should be safe to increase the frequency of testpilot submission, which should improve latency.
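The steps above can be sketched in plain Python. This is a hypothetical illustration, not the notebook's code: the record fields (`client_id`, `install_date`, `creation_date`, `submission_date`) and helper names are assumptions, and the real analysis ran over the Telemetry datasets and main_summary, whose schemas are not reproduced here. The sketch computes each client's latency from its earliest install (or creation) date to its first submission, counts clients seen in one ping type but missing from another, and tallies pings per client per day for the follow-up question.

```python
from collections import Counter
from datetime import date
from statistics import median
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class PingRecord(NamedTuple):
    # Hypothetical flattened ping; install_date may be missing on some pings.
    client_id: str
    install_date: Optional[date]
    creation_date: date
    submission_date: date


def earliest_start_per_client(pings: Iterable[PingRecord]) -> Dict[str, date]:
    """Earliest install date (falling back to creation date) seen for each client."""
    start: Dict[str, date] = {}
    for p in pings:
        d = p.install_date if p.install_date is not None else p.creation_date
        if p.client_id not in start or d < start[p.client_id]:
            start[p.client_id] = d
    return start


def first_submission_latency(pings: List[PingRecord]) -> Dict[str, int]:
    """Days from each client's earliest install/creation date to its first submission."""
    start = earliest_start_per_client(pings)
    first_sub: Dict[str, date] = {}
    for p in pings:
        if p.client_id not in first_sub or p.submission_date < first_sub[p.client_id]:
            first_sub[p.client_id] = p.submission_date
    return {cid: (first_sub[cid] - start[cid]).days for cid in start}


def missing_from(reference: Set[str], other: Set[str]) -> Set[str]:
    """Client IDs present in `reference` but never seen in `other`."""
    return reference - other


def pings_per_client_per_day(pings: Iterable[PingRecord]) -> Counter:
    """Number of pings each (client_id, submission_date) pair produced."""
    return Counter((p.client_id, p.submission_date) for p in pings)


if __name__ == "__main__":
    tp = [
        PingRecord("a", date(2016, 5, 1), date(2016, 5, 2), date(2016, 5, 3)),
        PingRecord("b", None, date(2016, 5, 1), date(2016, 5, 8)),
    ]
    lat = first_submission_latency(tp)
    print(lat, median(lat.values()))  # {'a': 2, 'b': 7} 4.5
    print(missing_from({p.client_id for p in tp}, {"a"}))  # {'b'}
```

The median is only one convenient summary. Comparing full histograms of the latency values for testpilot, testpilottest, and background main pings is closer to what the comment proposes.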
Flags: needinfo?(mreid)
Assignee
Comment 3•8 years ago
I ran the above analysis; here is the notebook with the code and results: https://gist.github.com/mreid-moz/e007487a0b03f2ee40ad3ccd6b21f44a

From cells 25 and 26, it looks like the lower submission rate for testpilot pings is not reflected in the main Telemetry pings for the same set of clientids, so I'm fairly confident we're not hitting the client throttling behaviour. It should be safe to fix or increase the frequency of testpilot submissions without negatively affecting other data reporting. A client-side fix went in for https://github.com/mozilla/testpilot/issues/815, so I'll re-run the notebook in a few days to see whether the symptoms improve.
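One way to express the comparison made in cells 25 and 26, as a hypothetical sketch rather than what those cells actually compute: for the same set of client IDs, measure the share of possible (client, day) slots in a window that saw at least one ping of a given type. If client throttling were suppressing data, the main-ping rate for those clients would be expected to drop as well; a testpilot rate well below an otherwise healthy main-ping rate points at the testpilot submission logic instead. The function name and `(client_id, submission_date)` tuple layout are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable, Set, Tuple

ClientDay = Tuple[str, date]  # (client_id, submission_date)


def active_day_rate(
    client_days: Iterable[ClientDay],
    clients: Set[str],
    start: date,
    num_days: int,
) -> float:
    """Fraction of possible (client, day) slots in [start, start + num_days) with a ping."""
    end = start + timedelta(days=num_days)
    seen = {(c, d) for c, d in client_days if c in clients and start <= d < end}
    possible = len(clients) * num_days
    return len(seen) / possible if possible else 0.0


if __name__ == "__main__":
    s = date(2016, 5, 1)
    clients = {"a", "b"}
    main = [("a", s), ("a", s + timedelta(days=1)), ("b", s), ("b", s + timedelta(days=1))]
    testpilot = [("a", s)]
    print(active_day_rate(main, clients, s, 2))       # 1.0
    print(active_day_rate(testpilot, clients, s, 2))  # 0.25
```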
Assignee
Comment 4•8 years ago
I've updated the notebook; you can compare the differences with the last run in the "revisions" pane of the gist in comment 3. In summary, the number of unique clientids sending 'testpilot' pings has increased dramatically. Looks like things are on the right track! The interesting changes are in cells 5, 6, 26, and 27. The "latency since install" graph (cell 29) also changed significantly, presumably because previously unreported clients started sending data after the client fix mentioned above.
Assignee
Comment 5•8 years ago
I've checked again, and the number of clientids reporting testpilot pings is holding steady at the new, improved rate. It would appear that the client-side fix did the trick!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 6•8 years ago
Thanks for the help, everyone.
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard