Increase speed of parquet2hive when --success-only flag is set

NEW
Unassigned

Status

Product: Data Platform and Tools
Component: Presto
Priority: P3
Severity: minor
Opened: a year ago
Last updated: 6 months ago

People

Reporter: frank
Assignee: Unassigned

Description

a year ago
parquet2hive currently checks every partition it encounters for a _SUCCESS file, which makes it slow for datasets partitioned on multiple dimensions. This could be improved by reducing client/server communication and by using multiprocessing.
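A rough sketch of the two ideas, not parquet2hive's actual code. It assumes the object keys of a dataset can be fetched in one recursive listing and then scanned locally for _SUCCESS markers, instead of issuing one request per partition. The function names, the `has_success` callable, and the key layout are all hypothetical. A thread pool from the `multiprocessing` package stands in for "multiprocessing" here, on the assumption that the per-partition checks are I/O-bound.

```python
from multiprocessing.pool import ThreadPool

SUCCESS_MARKER = "_SUCCESS"


def successful_partitions(keys):
    """Return the partition prefixes that directly contain a _SUCCESS marker.

    `keys` is a flat iterable of object keys, e.g. the result of a single
    recursive listing of the dataset prefix (hypothetical input).
    """
    done = set()
    for key in keys:
        # Split "a/b/_SUCCESS" into prefix "a/b" and file name "_SUCCESS".
        prefix, _, name = key.rpartition("/")
        if name == SUCCESS_MARKER:
            done.add(prefix)
    return sorted(done)


def check_partitions_parallel(partitions, has_success, workers=8):
    """Check partitions concurrently and keep those that succeeded.

    `has_success` is a hypothetical callable that performs one remote
    lookup for a partition prefix and returns True or False.
    """
    with ThreadPool(workers) as pool:
        flags = pool.map(has_success, partitions)
    return [p for p, ok in zip(partitions, flags) if ok]
```

With keys such as `ds/v1/date=20160101/channel=nightly/_SUCCESS` and `ds/v1/date=20160101/channel=beta/part-0.parquet`, `successful_partitions` would keep only the `channel=nightly` prefix. The first function trades many small requests for one larger listing; the second keeps the per-partition check but overlaps the waiting time.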
Blocks: 1255752

Updated

a year ago
Points: --- → 2
Priority: -- → P3

Updated

6 months ago
Component: Metrics: Pipeline → Presto
Product: Cloud Services → Data Platform and Tools