Bug 1482924
Opened 7 years ago
Closed 7 years ago
Investigate backfilling Savant data to correct active_ticks values
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bugzilla, Assigned: klukas)
Attachments
(3 files)
No description provided.
The Savant subsession-split meta-event uses the active_ticks value from simpleMeasurements, which is incorrect in ~8% of pings per bug 1482466. Thankfully, since this study ran entirely on FF 61, we can use the scalar value, which is correct.
I'll leave the investigation re: the best way to go about this to Jeff, but since even a full backfill would not take too long, this seems like a clear-cut case where a full mitigation makes sense.
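A minimal sketch of the value selection described above. The ping paths (`payload.processes.parent.scalars`, the scalar name `browser.engagement.active_ticks`, and `payload.simpleMeasurements.activeTicks`) and the helper name `pick_active_ticks` are assumptions for illustration, not taken from the telemetry-streaming code. The fallback to simpleMeasurements is also an assumption; for a study that ran entirely on FF 61 the scalar should always be present.

```python
from typing import Any, Mapping, Optional

# Assumed scalar name; verify against the actual main-ping schema.
SCALAR_KEY = "browser.engagement.active_ticks"


def pick_active_ticks(ping: Mapping[str, Any]) -> Optional[int]:
    """Prefer the scalar active_ticks; fall back to simpleMeasurements."""
    payload = ping.get("payload", {})
    scalars = payload.get("processes", {}).get("parent", {}).get("scalars", {})
    if SCALAR_KEY in scalars:
        return scalars[SCALAR_KEY]
    # Legacy value, reported as wrong in ~8% of pings (bug 1482466).
    return payload.get("simpleMeasurements", {}).get("activeTicks")
```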
Blocks: 1482466
Updated•7 years ago (Assignee)
Priority: -- → P1
Comment 2•7 years ago
Comment 3•7 years ago (Assignee)
Discussed with Sunah, Josephine, and folks on the Amplitude side.
Looks like there's not really a concept of deleting events in Amplitude, so folks on the Amplitude side were recommending we start with a fresh project and backfill everything. That would require some coordination to update names of projects, API keys, etc., and would also mean blowing through another ~1 billion events of our annual quota (35 billion).
I think I'm going to pursue a solution for a "partial" backfill of just session split events, since those are the only ones with active_ticks populated, which is the affected quantity. There are only ~15 million of these events, so this will have much less effect on the quota.
The current events are called "Meta - session split" and in the backfill, I'll plan to change the name to "Meta - session split v2". Once the code is updated for the new name, I'll run the backfill. After that, we can set the old event to be inactive and not visible, delete the old event type, or set up a "data filter" to hide it.
See https://amplitude.zendesk.com/hc/en-us/articles/235649848-Settings
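For a rough sense of scale, the quota impact of the two options can be worked out from the figures quoted above (35 billion annual quota, ~1 billion events for a full backfill, ~15 million for the partial one). The constant and function names below are illustrative only.

```python
ANNUAL_QUOTA = 35_000_000_000   # events per year, as quoted above
FULL_BACKFILL = 1_000_000_000   # approximate events for a fresh project
PARTIAL_BACKFILL = 15_000_000   # approximate session split events


def quota_share(events: int, quota: int = ANNUAL_QUOTA) -> float:
    """Return the percentage of the annual quota consumed by `events`."""
    return 100.0 * events / quota


print(f"full:    {quota_share(FULL_BACKFILL):.2f}%")     # full:    2.86%
print(f"partial: {quota_share(PARTIAL_BACKFILL):.3f}%")  # partial: 0.043%
```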
Comment 4•7 years ago
Comment 5•7 years ago (Assignee)
Kicked off on ATMO:
spark-submit --class com.mozilla.telemetry.streaming.EventsToAmplitude telemetry-streaming.jar --config-file-path savant2.json --url https://api.amplitude.com/httpapi --from 20180626 --to 20180814 --max-parallel-requests 40
Here, savant2.json has all events stripped out except for the session split event.
Savant_Prod is showing 704,698,214 events right now, and I expect that to increase to ~720M once this completes.
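The ~720M estimate follows from adding the ~15 million session split events mentioned in comment 3 to the current count. A quick check, with illustrative variable names:

```python
CURRENT_TOTAL = 704_698_214   # Savant_Prod count quoted above
BACKFILL_EVENTS = 15_000_000  # approximate session split events (comment 3)

expected_total = CURRENT_TOTAL + BACKFILL_EVENTS
print(f"{expected_total:,}")  # 719,698,214, i.e. ~720M
```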
Comment 6•7 years ago
Comment 7•7 years ago (Assignee)
Ran into issues with deduplication, so had to make another logic change. Trying another backfill in increments.
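One way to run a backfill in increments is to split the original date range into smaller windows and issue one spark-submit per window. The sketch below only builds the command strings, reusing the flags from comment 5. The 7-day window size, the helper names, and the assumption that --from and --to are both inclusive are hypothetical and not confirmed by this bug.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[str, str]]:
    """Yield inclusive (from, to) pairs in YYYYMMDD form covering start..end."""
    cursor = start
    while cursor <= end:
        last = min(cursor + timedelta(days=days - 1), end)
        yield cursor.strftime("%Y%m%d"), last.strftime("%Y%m%d")
        cursor = last + timedelta(days=1)


def backfill_commands(start: date, end: date, days: int = 7) -> List[str]:
    """Build one spark-submit command line per date window."""
    template = (
        "spark-submit --class com.mozilla.telemetry.streaming.EventsToAmplitude "
        "telemetry-streaming.jar --config-file-path savant2.json "
        "--url https://api.amplitude.com/httpapi "
        "--from {frm} --to {to} --max-parallel-requests 40"
    )
    return [template.format(frm=f, to=t) for f, t in date_windows(start, end, days)]


for cmd in backfill_commands(date(2018, 6, 26), date(2018, 8, 14)):
    print(cmd)
```

Smaller windows make it easier to check event counts after each step and to stop early if deduplication misbehaves again.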
Comment 8•7 years ago (Assignee)
The v3 event backfill looks good. I've changed names in the UI so that the v1 and v2 events are hidden and the v3 event shows up as "Meta - session split".
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•3 years ago
Component: Datasets: Events → General