Devtools Amplitude is missing events from 4/19 - 4/21
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: frank, Assigned: akomar)
Details
Attachments
(1 file)
There are clearly some events missing. This query shows the few days with fewer events. It looks like we're missing events from a large cohort of users.
Comment 1•5 years ago
This query looks good; there is some variability, but that is related to weekends.
The problem lies in the readiness check [1] we use before starting the export, where we check whether there is any data in the source view. However, the devtools view merges events from the telemetry.main (produced in the main_summary DAG) and telemetry.event (produced in the copy_deduplicate DAG) tables, so the export starts as soon as just one of them has data.
We had not noticed this earlier because “event” events make up the majority of our event volume and their task had been finishing before the one for “main” events. This was not the case on April 19-21, when the copy_deduplicate DAG started later than usual.
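To make the failure mode concrete, here is a minimal sketch of the difference between an "any data in the merged view" check and a check that requires every source table to have landed. It is not the code from [1]; the function names and the row counts are hypothetical and only illustrate the logic.

```python
from typing import Dict


def ready_if_any(row_counts: Dict[str, int]) -> bool:
    """Mimics a check on the merged view: passes once any source has rows."""
    return sum(row_counts.values()) > 0


def ready_if_all(row_counts: Dict[str, int]) -> bool:
    """Stricter check: every source table must have rows for the day."""
    return bool(row_counts) and all(count > 0 for count in row_counts.values())


# Hypothetical counts for one day: telemetry.main has landed,
# telemetry.event has not (the situation described for April 19-21).
counts = {"telemetry.main": 1200, "telemetry.event": 0}

print(ready_if_any(counts))  # True: the export would start with partial data
print(ready_if_all(counts))  # False: the export would wait
```

With the first check the export proceeds with whichever source finished first, which matches the missing events seen here.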
To avoid this problem in the future, we should use ExternalTaskSensors to wait for both the main_summary and copy_deduplicate DAGs before starting this export (I will file a PR).
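A rough sketch of what such sensors could look like, assuming Airflow 1.10 (where the class lives in `airflow.sensors.external_task_sensor`). Only the two DAG ids come from this bug; the helper names, the `wait_for_*` task ids, and the choice to wait on whole DAG runs rather than specific tasks are assumptions for illustration, not the contents of the eventual PR.

```python
# DAG ids are taken from this bug; everything else here is illustrative.
UPSTREAM_DAG_IDS = ["main_summary", "copy_deduplicate"]


def upstream_sensor_kwargs(dag_id):
    """Build keyword arguments for one ExternalTaskSensor (hypothetical helper)."""
    return {
        "task_id": "wait_for_{}".format(dag_id),
        "external_dag_id": dag_id,
        # None makes the sensor wait for the whole upstream DAG run
        # instead of a single task inside it.
        "external_task_id": None,
        # Free the worker slot between checks.
        "mode": "reschedule",
    }


def add_upstream_sensors(dag, export_task):
    """Put one sensor per upstream DAG in front of the export task."""
    # Imported here so the helper above can be used without Airflow installed.
    # Airflow 1.10 path; in Airflow 2 the module is airflow.sensors.external_task.
    from airflow.sensors.external_task_sensor import ExternalTaskSensor

    sensors = [
        ExternalTaskSensor(dag=dag, **upstream_sensor_kwargs(dag_id))
        for dag_id in UPSTREAM_DAG_IDS
    ]
    for sensor in sensors:
        # The export runs only after every sensor has succeeded.
        sensor >> export_task
    return sensors
```

By default the sensor looks for an upstream run with the same execution date, so if the export DAG is scheduled at a different time than the upstream DAGs, an `execution_delta` would also need to be set.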
As for the data already exported to Amplitude:
- we can think about backfilling just the “event” events from April 19-21 - :digitarald - would this be useful?
- so far we have in fact been omitting “main” events from the daily exports, which will be corrected by the fix mentioned above. I would lean towards not backfilling them in Amplitude, as they are only ~0.1% of the total volume.
[1] https://github.com/mozilla/telemetry-airflow/blob/master/dags/utils/amplitude.py#L56-L59
Comment 2•5 years ago
> we can think about backfilling just the “event” events from April 19-21 - :digitarald - would this be useful?
Since we have these dips in data we are currently trying to answer around covid, backfilling would be appreciated if it doesn't add too much work.
Comment 3•5 years ago
Comment 4•5 years ago
(In reply to :Harald Kirschner :digitarald from comment #2)
> Since we have these dips in data we are currently trying to answer around covid, backfilling would be appreciated if it doesn't add too much work.
The backfill is now running; I'll update this bug when it finishes.
Comment 5•5 years ago
Backfill is completed.