Bug 1290150
Opened 9 years ago
Closed 9 years ago
Support Parquet schema evolution in Spark and Presto
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rvitillo, Assigned: rvitillo)
Details
User Story
Changing the schema of a dataset is painful; let's fix that.
No description provided.
Updated•9 years ago
Points: --- → 3
Priority: -- → P2
Updated•9 years ago
Assignee: nobody → rvitillo
Priority: P2 → P1
Comment 1•9 years ago
With the latest changes to our infrastructure, both Spark and Presto handle evolving schemas gracefully. Parquet2hive picks the latest available schema for a (dataset, version) combination. As long as that schema is backward compatible with the schemas used to generate older files (e.g. a new nullable column has been added), both Spark and Presto can read the older files alongside the newer ones.
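To make "backward compatible" concrete, here is a minimal Python sketch of one possible reading of the rule above. It is not Parquet2hive's actual logic: the `Column` type, `is_backward_compatible`, `read_with_latest`, and the column names are all hypothetical, and the check is deliberately simplified (flat schemas, exact type match, no type widening).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical, flattened stand-in for a Parquet column definition.
    name: str
    dtype: str
    nullable: bool


def is_backward_compatible(old, new):
    """Return True if files written with `old` can be read using `new`."""
    new_by_name = {c.name: c for c in new}
    old_names = {c.name for c in old}
    for col in old:
        match = new_by_name.get(col.name)
        # Every existing column must survive with the same type.
        if match is None or match.dtype != col.dtype:
            return False
        # Tightening nullable -> required would reject nulls in older files.
        if col.nullable and not match.nullable:
            return False
    # Added columns must be nullable: older files hold no values for them.
    return all(c.nullable for c in new if c.name not in old_names)


def read_with_latest(row, latest):
    """Project a row from an older file onto the latest schema.

    Columns missing from the older row are filled with None.
    """
    return {c.name: row.get(c.name) for c in latest}


# Assumed example schemas: v2 adds one nullable column to v1.
v1 = [Column("client_id", "string", False)]
v2 = v1 + [Column("channel", "string", True)]
v2_required = v1 + [Column("channel", "string", False)]
```

Under these assumptions, `is_backward_compatible(v1, v2)` holds, while `is_backward_compatible(v1, v2_required)` does not, because rows from older files would have nothing to put in the new required column.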
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard