Closed Bug 1290150 Opened 8 years ago Closed 8 years ago

Support Parquet schema evolution in Spark and Presto

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: rvitillo)

Details

User Story

Changing the schema of a dataset is painful; let's fix that.
Points: --- → 3
Priority: -- → P2
Assignee: nobody → rvitillo
Priority: P2 → P1
With the latest changes to our infrastructure, both Spark and Presto handle evolving schemas well. Parquet2hive picks the latest available schema for each (dataset, version) combination. As long as that schema is backward compatible with the schemas used to write older files (e.g. it only adds a new nullable column), both Spark and Presto can read the older files using it.
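The selection and compatibility rule described above can be sketched in plain Python. This is a minimal, hypothetical model, not parquet2hive's actual code: schemas are represented as dicts mapping column names to a hypothetical `Column(type, nullable)`, each (dataset, version) combination is assumed to have a list of (timestamp, schema) pairs, and the compatibility check is deliberately conservative, accepting only the addition of nullable columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical column description: a type name plus nullability."""
    type: str
    nullable: bool


# Hypothetical schema representation: column name -> Column.
Schema = Dict[str, Column]


def latest_schema(versions: List[Tuple[int, Schema]]) -> Schema:
    """Pick the schema with the highest timestamp among those written
    for a single (dataset, version) combination."""
    return max(versions, key=lambda entry: entry[0])[1]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Conservative check: True only if `new` keeps every column of `old`
    unchanged and any extra columns are nullable."""
    for name, col in old.items():
        new_col = new.get(name)
        # This sketch rejects dropped or retyped columns outright.
        if new_col is None or new_col.type != col.type:
            return False
        # Older files may hold nulls in a column the new schema marks required.
        if col.nullable and not new_col.nullable:
            return False
    # Added columns must be nullable: older files have no values for them.
    return all(new[name].nullable for name in new.keys() - old.keys())


def read_with_schema(row: Dict[str, object], schema: Schema) -> Dict[str, Optional[object]]:
    """Project a row from an older file onto the newer schema,
    filling columns the file does not contain with None."""
    return {name: row.get(name) for name in schema}


if __name__ == "__main__":
    v1 = {"client_id": Column("string", False)}
    v2 = {"client_id": Column("string", False), "os": Column("string", True)}

    current = latest_schema([(1, v1), (2, v2)])
    print(current == v2)                                  # True
    print(is_backward_compatible(v1, current))            # True
    print(read_with_schema({"client_id": "abc"}, current))
    # {'client_id': 'abc', 'os': None}
```

Filling missing columns with `None` mirrors why only nullable additions are safe in this model: a row from an older file has no value for the new column, so the reader can only report it as null.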
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard