Closed Bug 1296302 Opened 8 years ago Closed 6 years ago

Increase speed of parquet2hive when --success-only flag is set

Categories

(Data Platform and Tools Graveyard :: Operations, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: frank, Unassigned)

References

Details

parquet2hive currently checks every partition it encounters for a _SUCCESS file, which makes it slow for datasets partitioned on multiple dimensions. This could be improved by reducing client/server communication and by using multiprocessing.
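A rough sketch of both ideas, not parquet2hive's actual code: instead of one request per partition, take a flat listing of keys (for example from one recursive, paginated listing of the dataset prefix) and find the _SUCCESS markers locally, then optionally fan the listing out over top-level prefixes. The function names, the `list_keys` callable, and the key layout are hypothetical. Threads are used here rather than the multiprocessing mentioned above, on the assumption that listing is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set

SUCCESS_MARKER = "_SUCCESS"


def successful_partitions(keys: Iterable[str]) -> Set[str]:
    """Return the partition prefixes that contain a _SUCCESS marker.

    `keys` is a flat listing of object keys, so no extra request is
    made per partition.
    """
    found = set()
    for key in keys:
        # Split "a/b/c/_SUCCESS" into prefix "a/b/c" and name "_SUCCESS".
        prefix, _, name = key.rpartition("/")
        if name == SUCCESS_MARKER:
            found.add(prefix)
    return found


def successful_partitions_parallel(
    top_level_prefixes: List[str],
    list_keys: Callable[[str], Iterable[str]],  # hypothetical lister, supplied by the caller
    max_workers: int = 8,
) -> Set[str]:
    """List each top-level prefix concurrently and merge the results."""
    result: Set[str] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for keys in pool.map(list_keys, top_level_prefixes):
            result |= successful_partitions(keys)
    return result
```

With this shape the number of requests depends on the number of listing pages rather than on the number of partitions, which is where the per-partition check becomes expensive for deeply partitioned datasets.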
Points: --- → 2
Priority: -- → P3
Component: Metrics: Pipeline → Presto
Product: Cloud Services → Data Platform and Tools
We aren't using --success-only with any of our datasets. Is this still a concern, :frank?
Flags: needinfo?(fbertsch)
Component: Presto → Operations
QA Contact: moconnor
Not if we're not using it.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(fbertsch)
Resolution: --- → WONTFIX
Product: Data Platform and Tools → Data Platform and Tools Graveyard