Closed Bug 1402492 Opened 7 years ago Closed 7 years ago

Validate experiments daily aggregation logic

Categories

(Data Platform and Tools :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: spenrose, Unassigned)

Details

Attachments

(1 file)

The attached notebook identifies a client who submitted three pings with subsession_start_dates of 8-21-2017 and a main_summary.experiments value of {u'clicktoplay-rollout': u'test', u'e10sCohort': u'multiBucket4', u'pref-flip-searchcomp1-pref1-1390584': u'treatment', u'pref-flip-searchcomp1-pref2-1390584': u'control-ten', u'pref-flip-searchcomp1-pref3-1390584': u'gen1ser3gen5'}. The subsession_length values (in hours) were [4.106388888888889, 0.9661111111111111, 0.060833333333333336], which sum to 5.133333333333334, but the corresponding row in experiments-daily has a subsession_hours_sum of 13.467777. So that's ... a problem.
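The arithmetic in the report above can be re-checked with a few lines of Python. This is only a sanity check on the numbers quoted in the comment; the variable names are illustrative and do not come from the attached notebook.

```python
import math

# Subsession lengths (in hours) of the three pings quoted in the report.
subsession_hours = [4.106388888888889, 0.9661111111111111, 0.060833333333333336]

# subsession_hours_sum reported for the matching experiments-daily row.
reported_hours_sum = 13.467777

# fsum avoids accumulating floating-point rounding error across the terms.
expected_hours_sum = math.fsum(subsession_hours)

# How much larger the experiments-daily value is than the per-ping total.
inflation = reported_hours_sum / expected_hours_sum

print(f"expected sum: {expected_hours_sum:.6f} h")
print(f"reported sum: {reported_hours_sum:.6f} h")
print(f"reported / expected: {inflation:.3f}")
```

For this particular client the reported value is between 2x and 3x the per-ping total, so it is not a clean integer multiple.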
I've dug into this a bit more and I have some additional light to shed.

The v1 dataset (s3://net-mozaws-prod-us-west-2-pipeline-analysis/spenrose/experiments-daily/bug1390584/v1/) contains 1 row per client/date for profiles enrolled in all 3 pref-flip search experiments listed in comment 0, covering only the active days when profiles were enrolled. It spans a 3-week period.
- As described in comment 0, this dataset exhibits inconsistencies in aggregated activity measures relative to main_summary. I compared it against an ad hoc client/date aggregation of main_summary for the corresponding profiles, looking at subsession_hours_sum, active_hours_sum, search_count_all_sum, and scalar_parent_browser_engagement_total_uri_count_sum.
- Most client/days were aggregated over the same number of pings, but had different values. Many experiments-daily activity values were specifically 3x higher.

The v2 dataset (s3://net-mozaws-prod-us-west-2-pipeline-analysis/spenrose/experiments-daily/bug1390584/v2/) contains the same rows as v1, and additionally includes prior data for each of these profiles, spanning up to a month before each profile entered the experiment.
- However, for each v1 row included in the v2 dataset, there is also a second v2 row for the same client and date for which the branch identifier is null.
- This null-branch row (compared on the same measures listed above) almost always matches main_summary (aside from cases where the later run would have pulled more late-submitted pings into the aggregation).

My conclusions based on this investigation are:
- the v2 null-branch rows contain the good data (both prior to and during the experiment). My plan is to extract these and use them for the analysis.
- something funky happened during aggregation for the with-branch rows. Given the 3x inflation factor, my guess is that this is related to the fact that these profiles were all enrolled in 3 experiments.
AIUI, for each client/date, 3 rows are selected from base experiments-daily (which has 1 row per client/date/experiment), which should all contain identical data, and 1 row is retained out of those. Full details are in this rather long-winded notebook: https://metrics.mozilla.com/protected/dzeber/tmp/unified-search-v3-pull-data.html
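To make the guess above concrete, here is a minimal sketch in plain Python of how a 3x factor could appear. It is not the actual experiments-daily job, and the bug never confirms this mechanism. The toy pings, the experiment names `exp-a`/`exp-b`/`exp-c`, and all function names are hypothetical. It builds one row per client/date/experiment, then collapses back to client/date in two ways: keeping one of the identical rows (the intended behaviour as described above) and summing them (one possible way to get the inflation).

```python
from collections import defaultdict

# Hypothetical toy pings: (client_id, date, subsession_hours, experiments).
pings = [
    ("c1", "2017-08-21", 4.0, ("exp-a", "exp-b", "exp-c")),
    ("c1", "2017-08-21", 1.0, ("exp-a", "exp-b", "exp-c")),
]

def aggregate_by_client_date(pings):
    """Reference aggregation: one total per client/date, as from main_summary."""
    totals = defaultdict(float)
    for client_id, date, hours, _experiments in pings:
        totals[(client_id, date)] += hours
    return dict(totals)

def aggregate_by_client_date_experiment(pings):
    """One total per client/date/experiment; every ping counts once per experiment."""
    totals = defaultdict(float)
    for client_id, date, hours, experiments in pings:
        for experiment in experiments:
            totals[(client_id, date, experiment)] += hours
    return dict(totals)

def collapse_keep_one(per_experiment):
    """Keep a single row per client/date (the rows should be identical)."""
    collapsed = {}
    for (client_id, date, _experiment), hours in sorted(per_experiment.items()):
        collapsed.setdefault((client_id, date), hours)
    return collapsed

def collapse_by_summing(per_experiment):
    """Assumed failure mode: sum the per-experiment rows instead of keeping one."""
    collapsed = defaultdict(float)
    for (client_id, date, _experiment), hours in per_experiment.items():
        collapsed[(client_id, date)] += hours
    return dict(collapsed)

def ratios(candidate, reference):
    """Ratio of candidate to reference for each client/date with a nonzero reference."""
    return {key: candidate[key] / reference[key] for key in reference if reference[key]}

reference = aggregate_by_client_date(pings)
per_experiment = aggregate_by_client_date_experiment(pings)
kept_one = collapse_keep_one(per_experiment)
summed = collapse_by_summing(per_experiment)

print("keep-one ratios:", ratios(kept_one, reference))
print("summed ratios:  ", ratios(summed, reference))
```

Computing the same ratio between real experiments-daily rows and an ad hoc main_summary aggregation would show whether the inflation is consistently an integer multiple equal to the number of enrolled experiments.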
We no longer plan to maintain experiments_daily as a separate dataset; instead, we're adding experiments data to the very similar clients_daily dataset. See Bug 1431777.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Component: Datasets: Experiments → General