Bug 1294702 (Closed)
Opened 8 years ago · Closed 6 years ago
Overwrite partial datasets after a job has been retriggered
Categories: Data Platform and Tools :: General, defect, P2
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rvitillo; Assigned: amiyaguchi
User Story
A job scheduled with Airflow might fail after it has started copying data on S3. When Airflow retriggers it, the job should delete the partial data before it actually starts. We need to make sure that this is the case for all jobs currently being run on Airflow.
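The clean-up the user story describes can be sketched as follows. This is a minimal, hypothetical illustration: `bucket` is an in-memory dict standing in for an S3 bucket (a real job would use something like boto3's list/delete calls), and `delete_prefix`/`run_job` are made-up names, not part of any actual Airflow job here.

```python
def delete_prefix(bucket, prefix):
    """Remove every object under `prefix`, mimicking clearing partial S3 output."""
    for key in [k for k in bucket if k.startswith(prefix)]:
        del bucket[key]

def run_job(bucket, prefix, rows):
    """Idempotent job: wipe any partial output from a failed run, then write."""
    delete_prefix(bucket, prefix)  # makes a retriggered run safe to re-run
    for i, row in enumerate(rows):
        bucket[f"{prefix}/part-{i:05d}"] = row

# Simulate a failed run that left partial data behind...
bucket = {"output/v1/part-00099": "stale"}
# ...then a retriggered run that overwrites the prefix cleanly.
run_job(bucket, "output/v1", ["a", "b"])
```

Because the job deletes its output prefix before writing, retriggering it any number of times yields the same final dataset, with no stale partial objects left over.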
No description provided.
Updated • 8 years ago
Points: --- → 2
Priority: -- → P2
Reporter
Updated • 7 years ago
Component: Metrics: Pipeline → Scheduling
Product: Cloud Services → Data Platform and Tools
Assignee
Updated • 6 years ago
Assignee: nobody → amiyaguchi
Assignee
Comment 1 • 6 years ago
This needs double checking to make sure Longitudinal handles this case. Otherwise, I don't think our jobs care about partial data.
I added this recently in Longitudinal: https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L172-L173 But, yes, otherwise we should be using the Spark Parquet writer in atomic mode.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated • 2 years ago
Component: Scheduling → General