Closed Bug 1516014 Opened 7 years ago Closed 6 years ago

Productionize duplicate job id on disk check

Categories

(Data Platform and Tools :: General, enhancement, P2)

Points: 2

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bugzilla, Unassigned)

Details

(Whiteboard: [DataPlatform])

I hacked up a script for Bug 1515730 that checks parquet file names on disk for duplicate job IDs. It works for most of our cases, but the parquet output file names don't seem to be consistent everywhere. We need to:
- Understand why parquet job IDs differ in some cases in normal runs
- Clean up the script (needs some refactoring, etc.)
- Run this over historical data
- Schedule this to run periodically
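
For illustration only, here is a minimal sketch of what a file-name check along these lines might look like. It is not the script from Bug 1515730, which is not attached here. It assumes part files are named like `part-<task>-<job UUID>[-c<NNN>][.<codec>].parquet`, and it assumes that a "duplicate" shows up as one directory holding part files from more than one job ID; both are guesses. The names `PART_FILE`, `job_ids_by_directory` and `directories_with_multiple_jobs` are hypothetical. Parquet files whose names do not fit the assumed pattern are returned separately, since inconsistent naming is one of the open questions above.

```python
import posixpath
import re
from collections import defaultdict

# ASSUMPTION: part files look like
#   part-<task>-<job id as UUID>[-c<NNN>][.<codec>].parquet
# The real naming scheme may differ between jobs.
PART_FILE = re.compile(
    r"^part-\d+-"
    r"(?P<job_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:-c\d+)?(?:\.\w+)?\.parquet$"
)


def job_ids_by_directory(paths):
    """Group job IDs by directory.

    Returns (mapping of directory -> set of job IDs,
             list of .parquet paths whose names did not match PART_FILE).
    """
    found = defaultdict(set)
    unparsed = []
    for path in paths:
        directory, name = posixpath.split(path)
        match = PART_FILE.match(name)
        if match is not None:
            found[directory].add(match.group("job_id"))
        elif name.endswith(".parquet"):
            # Keep track of names we could not interpret instead of
            # silently skipping them.
            unparsed.append(path)
    return dict(found), unparsed


def directories_with_multiple_jobs(paths):
    """Return only the directories that contain more than one job ID."""
    by_directory, _ = job_ids_by_directory(paths)
    return {d: ids for d, ids in by_directory.items() if len(ids) > 1}
```

The functions take a plain list of path strings, so the listing step (local disk, S3, or anything else) stays outside the check and the same logic could be reused for a one-off historical run or a scheduled one.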
Points: --- → 2
Priority: -- → P2
Whiteboard: [DataPlatform]
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Datasets: General → General