Closed Bug 1482924 Opened 7 years ago Closed 7 years ago

Investigate backfilling Savant data to correct active_ticks values

Categories

(Data Platform and Tools :: General, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: klukas)

Attachments

(3 files)

No description provided.
The Savant subsession-split meta-event uses the active_ticks value from simpleMeasurements, which is incorrect in ~8% of pings per bug 1482466. Thankfully, since this study ran entirely on Firefox 61, we can use the scalar value instead, which is correct. I'll leave the investigation into the best way to go about this to Jeff, but since even a full backfill would not take too long, this seems like a clear-cut case where a full mitigation makes sense.
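As a rough illustration of the difference between the two sources, here is a minimal Python sketch that reads active ticks from the scalar rather than from simpleMeasurements. The field paths (`payload.processes.parent.scalars`, the scalar name `browser.engagement.active_ticks`, and `payload.simpleMeasurements.activeTicks`) are assumptions about the main ping layout, and the function names are hypothetical; this is not the telemetry-streaming code.

```python
from typing import Any, Dict, Optional

# Assumed scalar name; not stated in this bug.
SCALAR_NAME = "browser.engagement.active_ticks"


def active_ticks_from_scalar(ping: Dict[str, Any]) -> Optional[int]:
    """Return active ticks from the parent-process scalar, or None if absent."""
    scalars = (
        ping.get("payload", {})
        .get("processes", {})
        .get("parent", {})
        .get("scalars", {})
    )
    value = scalars.get(SCALAR_NAME)
    return int(value) if value is not None else None


def active_ticks_from_simple_measurements(ping: Dict[str, Any]) -> Optional[int]:
    """Return the simpleMeasurements value (the source this bug moves away from)."""
    value = ping.get("payload", {}).get("simpleMeasurements", {}).get("activeTicks")
    return int(value) if value is not None else None


# Hypothetical ping in which the two sources disagree.
example_ping = {
    "payload": {
        "simpleMeasurements": {"activeTicks": 3},
        "processes": {"parent": {"scalars": {SCALAR_NAME: 12}}},
    }
}
```

With a ping like `example_ping`, the scalar reader and the simpleMeasurements reader return different values, which is the kind of disagreement the backfill is meant to correct.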
Blocks: 1482466
Priority: -- → P1
Discussed with Sunah, Josephine, and folks on the Amplitude side. It looks like there's not really a concept of deleting events in Amplitude, so the Amplitude folks recommended that we start with a fresh project and backfill everything. That would require some coordination to update project names, API keys, etc., and would also mean blowing through another ~1 billion events of our annual quota (35 billion). I think I'm going to pursue a "partial" backfill of just the session split events, since those are the only ones with active_ticks populated, which is the affected quantity. There are only ~15 million of these events, so this will have much less effect on the quota. The current events are called "Meta - session split", and in the backfill I plan to change the name to "Meta - session split v2". Once the code is updated for the new name, I'll backfill, and then we can set the old event to be inactive and not visible, delete the old event type, or set up a "data filter" to hide it. See https://amplitude.zendesk.com/hc/en-us/articles/235649848-Settings
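To put the two options side by side, the quota impact can be worked out from the figures in the comment above (35 billion annual quota, ~1 billion events for a full backfill, ~15 million for the partial one). A minimal Python sketch; the constant and function names are illustrative only:

```python
ANNUAL_QUOTA = 35_000_000_000    # annual event quota quoted above
FULL_BACKFILL = 1_000_000_000    # approximate events in a full backfill
PARTIAL_BACKFILL = 15_000_000    # approximate session split events


def quota_share(events: int, quota: int = ANNUAL_QUOTA) -> float:
    """Percentage of the annual quota consumed by `events`."""
    return 100.0 * events / quota


print(f"full backfill:    {quota_share(FULL_BACKFILL):.2f}% of quota")     # 2.86%
print(f"partial backfill: {quota_share(PARTIAL_BACKFILL):.3f}% of quota")  # 0.043%
```

On these approximate numbers, the partial backfill costs roughly 1/67 of what a full backfill would.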
Kicked off on ATMO:
spark-submit --class com.mozilla.telemetry.streaming.EventsToAmplitude telemetry-streaming.jar --config-file-path savant2.json --url https://api.amplitude.com/httpapi --from 20180626 --to 20180814 --max-parallel-requests 40
Here savant2.json has all events stripped out except for the session split event. Savant_Prod is showing 704,698,214 events right now, and I expect that to increase to ~720M once this completes.
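The ~720M expectation follows from adding the ~15 million session split events mentioned earlier to the current Savant_Prod count. A small Python check of that arithmetic, with the 15 million figure taken as approximate:

```python
CURRENT_TOTAL = 704_698_214        # Savant_Prod count before the backfill
SESSION_SPLIT_EVENTS = 15_000_000  # approximate number of events to be added


def expected_total(current: int, added: int) -> int:
    """Projected event count once the backfill completes."""
    return current + added


total = expected_total(CURRENT_TOTAL, SESSION_SPLIT_EVENTS)
print(f"{total:,} (~{round(total / 1e6, -1):.0f}M)")  # 719,698,214 (~720M)
```

A final count well above this projection would suggest events were sent more than once.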
Ran into issues with deduplication, so had to make another logic change. Trying another backfill in increments.
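One way to run a backfill in increments is to split the original `--from`/`--to` range into smaller windows and submit one job per window. The Python sketch below generates such windows in the same YYYYMMDD format. It assumes both bounds are inclusive, which this bug does not state; if `--to` is exclusive, the chunk boundaries would need adjusting. The function name and the 7-day window are illustrative choices, not what was actually run.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

DATE_FMT = "%Y%m%d"


def date_chunks(start: str, end: str, days: int) -> List[Tuple[str, str]]:
    """Split [start, end] (inclusive, YYYYMMDD) into windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = datetime.strptime(start, DATE_FMT).date()
    last = datetime.strptime(end, DATE_FMT).date()
    chunks = []
    while current <= last:
        # Clamp the window end so the final chunk never runs past `end`.
        chunk_end = min(current + timedelta(days=days - 1), last)
        chunks.append((current.strftime(DATE_FMT), chunk_end.strftime(DATE_FMT)))
        current = chunk_end + timedelta(days=1)
    return chunks


# Print one pair of arguments per increment for the range used above.
for chunk_from, chunk_to in date_chunks("20180626", "20180814", 7):
    print(f"--from {chunk_from} --to {chunk_to}")
```

Smaller windows make it easier to check the event count after each step and to rerun only the window that failed.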
The v3 event backfill looks good. I've changed names in the UI so that the v1 and v2 events are hidden and the v3 event shows up as "Meta - session split".
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Datasets: Events → General