Open Bug 1605470 Opened 6 years ago Updated 5 years ago

deprecate the taskclusteretl.timing table

Categories

(Data Platform and Tools :: General, task, P2)

task
Points:
3

Tracking

(Not tracked)

People

(Reporter: trink, Unassigned)

References

(Blocks 1 open bug)

Details

Initially, this will just look like a rename (timing -> derived_timing). The main difference will be the update schedule, which becomes once daily; the existing queries will be updated as necessary.

As we add more data sources and artifacts to the ETL process, the raw data ingestion will move closer to mirroring the original data sources (task_definition, task_resolution, worker_metrics, etc.). Most of the timing table consists of data parsed from live_backing.log, combined with the task definition data (before the load process). In the future, it will be created by de-duplicating and joining the original raw tables in BigQuery (task_definition, the new live_log, and any additional information we decide to add). This should simplify parts of the ETL process and make it easier to evolve and experiment with new schemas as needs change.
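A minimal sketch of the planned de-duplicate-and-join step, written in Python over in-memory rows rather than as BigQuery SQL. All field names (task_id, run_id, step, duration_s, worker_type, ingested_at) and the "keep the most recently ingested copy" dedup rule are hypothetical, chosen only for illustration; the real table schemas and dedup keys are not specified in this bug. In BigQuery itself, the same idea would typically be expressed with ROW_NUMBER() OVER (PARTITION BY ...) to keep one row per key, followed by a JOIN.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shapes; the actual task_definition / live_log schemas are not given here.
@dataclass(frozen=True)
class TaskDefinition:
    task_id: str
    worker_type: str
    ingested_at: int  # hypothetical load timestamp used to choose among duplicates

@dataclass(frozen=True)
class LiveLogTiming:
    task_id: str
    run_id: int
    step: str
    duration_s: float
    ingested_at: int

@dataclass(frozen=True)
class DerivedTiming:
    task_id: str
    run_id: int
    worker_type: str
    step: str
    duration_s: float

def dedupe_definitions(rows: Iterable[TaskDefinition]) -> Dict[str, TaskDefinition]:
    """Keep only the most recently ingested definition per task_id."""
    latest: Dict[str, TaskDefinition] = {}
    for row in rows:
        current = latest.get(row.task_id)
        if current is None or row.ingested_at > current.ingested_at:
            latest[row.task_id] = row
    return latest

def dedupe_live_log(rows: Iterable[LiveLogTiming]) -> List[LiveLogTiming]:
    """Keep only the most recently ingested timing per (task_id, run_id, step)."""
    latest: Dict[Tuple[str, int, str], LiveLogTiming] = {}
    for row in rows:
        key = (row.task_id, row.run_id, row.step)
        current = latest.get(key)
        if current is None or row.ingested_at > current.ingested_at:
            latest[key] = row
    return list(latest.values())

def build_derived_timing(
    definitions: Iterable[TaskDefinition], live_log: Iterable[LiveLogTiming]
) -> List[DerivedTiming]:
    """Inner-join de-duplicated live_log timings with de-duplicated task definitions."""
    defs = dedupe_definitions(definitions)
    out: List[DerivedTiming] = []
    for timing in dedupe_live_log(live_log):
        definition = defs.get(timing.task_id)
        if definition is None:
            continue  # no matching definition: dropped, as in an inner join
        out.append(
            DerivedTiming(timing.task_id, timing.run_id, definition.worker_type,
                          timing.step, timing.duration_s)
        )
    # Deterministic ordering for readability; a SQL table has no inherent order.
    return sorted(out, key=lambda r: (r.task_id, r.run_id, r.step))
```

For example, if task "A" was ingested twice with different worker types and one of its log steps was also loaded twice, the result contains one row per step for "A", using the later definition; timings for a task with no definition row are dropped. Doing the dedup inside the warehouse, rather than before the load, is what lets the raw tables mirror their sources as-is.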

Points: --- → 3
Priority: -- → P2
Blocks: 1605994