deprecate the taskclusteretl.timing table
Categories: Data Platform and Tools :: General, task, P2
Tracking: Not tracked
People: Reporter: trink, Unassigned
Details
Initially this will just look like a rename, timing -> derived_timing. The main difference is the update schedule: the derived table will be refreshed once daily (existing queries will be updated as necessary).
As we add more data sources and artifacts to the ETL process, raw data ingestion will move closer to mirroring the original data sources (task_definition, task_resolution, worker_metrics, etc.). Most of the timing table currently comes from parsing live_backing.log, combined with the task definition data before the load process. In the future it will be created by de-duplicating and joining the original raw tables in BigQuery (task_definition, the new live_log, and any additional information we decide to add). This should simplify parts of the ETL process and make it easier to evolve and experiment with new schemas as needs change.
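The de-duplicate-then-join step described above can be sketched in Python to show the intended logic. In practice this would presumably be a BigQuery query over the raw tables; the sketch only models it in memory. The table names task_definition and live_log come from the text, but the key columns (task_id, run_id), the ingested_at tiebreaker, and the output columns (worker_type, duration_s) are hypothetical assumptions, not the actual schema.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_latest(rows: Iterable[dict], key_fields: Tuple[str, ...],
                  order_field: str) -> Dict[tuple, dict]:
    """Keep one row per key, preferring the row with the largest order_field.

    Assumption: duplicates come from re-ingesting the same record, and the
    most recently ingested copy (hypothetical `ingested_at`) wins.
    """
    latest: Dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = latest.get(key)
        if current is None or row[order_field] > current[order_field]:
            latest[key] = row
    return latest


def build_derived_timing(task_definition: Iterable[dict],
                         live_log: Iterable[dict]) -> List[dict]:
    """Hypothetical derived_timing build: de-dupe both raw tables, then join."""
    # task_definition: one row per task (hypothetical key: task_id)
    defs = dedupe_latest(task_definition, ("task_id",), "ingested_at")
    # live_log: one row per task run (hypothetical key: task_id, run_id)
    logs = dedupe_latest(live_log, ("task_id", "run_id"), "ingested_at")

    out: List[dict] = []
    for (task_id, run_id), log in sorted(logs.items()):
        definition = defs.get((task_id,))
        if definition is None:
            continue  # inner join: drop log rows with no matching definition
        out.append({
            "task_id": task_id,
            "run_id": run_id,
            "worker_type": definition["worker_type"],  # from the definition
            "duration_s": log["duration_s"],           # parsed from the log
        })
    return out
```

Keeping de-duplication separate from the join mirrors the idea in the text: the raw tables stay close to their sources, and the derived table is rebuilt from them on its own (daily) schedule, so changing the derived schema does not require changing ingestion.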
Reporter, updated 6 years ago

Comment 1, 1 day ago
Hello,
The Mozilla Data Engineering organization is currently going through our extensive backlog, consisting of hundreds of issues stretching back for nearly 10 years. We've done a pass through all of the open bugzilla bugs and have identified and tagged the ones that we think are relevant enough to still need attention. The rest, including the bug with which this comment is associated, we are closing as "WONTFIX" in a single bulk operation.
If you feel we have closed this (or any) issue in error, please feel free to take the following actions:
- Reopen the bug.
- Edit the bug to add the string [dataplatform] (including the brackets) to the Whiteboard field. (Note that you must edit the Whiteboard, not the similarly named QA Whiteboard.)
Doing this will ensure that we see the bug in our weekly triage process, where we will decide how to proceed.
Thank you.