Support Parquet schema evolution in Spark and Presto

RESOLVED FIXED

Status

Priority: P1
Severity: normal
Status: RESOLVED FIXED
Opened: 2 years ago
Last modified: 6 days ago

People

(Reporter: rvitillo, Assigned: rvitillo)

Details

User Story

Changing the schema of a dataset is painful; let's fix that.

Updated

2 years ago
Points: --- → 3
Priority: -- → P2
(Assignee)

Updated

2 years ago
Assignee: nobody → rvitillo
Priority: P2 → P1
(Assignee)

Comment 1

2 years ago
With the latest changes to our infrastructure, both Spark and Presto handle evolving schemas well. Parquet2hive picks the latest available schema for each (dataset, version) combination. As long as that schema is backward compatible with the schemas used to write older files (e.g. it only adds a new nullable column), both Spark and Presto can read the older files with it.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
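The compatibility rule in Comment 1 can be sketched as a small, purely illustrative Python model. It does not call Parquet2hive, Spark, or Presto. `Column`, `is_backward_compatible`, `pick_schema`, `read_row`, and the `client_id`/`os` columns are hypothetical names. The model assumes that readers match Parquet columns by name and fill any column missing from an older file with null.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical schema model (not Parquet2hive's API): column name -> type + nullability.
@dataclass(frozen=True)
class Column:
    type: str
    nullable: bool


Schema = Dict[str, Column]


def is_backward_compatible(new: Schema, old: Schema) -> bool:
    """True if files written with `old` can be read through `new`."""
    for name, col in new.items():
        if name in old:
            # Columns present in both schemas must keep the same type.
            if old[name].type != col.type:
                return False
        elif not col.nullable:
            # Older files lack this column, so it can only be filled with null.
            return False
    # Columns that exist only in `old` are simply ignored when reading.
    return True


def pick_schema(history: List[Schema]) -> Schema:
    """Return the latest schema, checking that it can read every older file."""
    latest = history[-1]
    for old in history[:-1]:
        if not is_backward_compatible(latest, old):
            raise ValueError("latest schema cannot read older files")
    return latest


def read_row(row: Dict[str, object], schema: Schema) -> Dict[str, Optional[object]]:
    """Project a record from an older file onto the chosen schema."""
    # Missing (nullable) columns become None; columns not in the schema are dropped.
    return {name: row.get(name) for name in schema}


v1 = {"client_id": Column("string", False)}
v2 = {"client_id": Column("string", False), "os": Column("string", True)}
print(read_row({"client_id": "abc"}, pick_schema([v1, v2])))
# {'client_id': 'abc', 'os': None}
```

Adding `os` as a non-nullable column instead would make `pick_schema` raise, because rows in older files have no value to put there.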

Updated

6 days ago
Product: Cloud Services → Cloud Services Graveyard