Bug 1290150 (Closed): opened 8 years ago, closed 8 years ago

Support Parquet schema evolution in Spark and Presto

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: rvitillo)

Details

User Story

Changing the schema of a dataset is painful; let's fix that.

Roberto Agostino Vitillo (:rvitillo)

Assignee

Description

8 years ago

No description provided.

Thomas Huelbert

Updated

8 years ago

Points: --- → 3

Priority: -- → P2

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

8 years ago

Assignee: nobody → rvitillo

Priority: P2 → P1

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 1

8 years ago

With the latest changes to our infrastructure, both Spark and Presto handle evolving schemas well. Parquet2hive picks the latest available schema for each (dataset, version) combination. As long as that schema is backward compatible with the schemas used to write the older files (e.g., it only adds a new nullable column), both Spark and Presto know how to read the older files through it.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED
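The backward-compatibility rule in comment 1 can be illustrated with a small check. This is a hypothetical sketch, not parquet2hive's actual code: schemas are modeled as plain dicts mapping a column name to a made-up `Column(dtype, nullable)` record, and the check is deliberately conservative, accepting only the addition of nullable columns. The nullable requirement follows from the older files having no values for an added column, so a reader resolving them against the newer schema has to return nulls there.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    # Hypothetical column description; dtype is compared as a plain label.
    dtype: str
    nullable: bool


Schema = Dict[str, Column]


def compatibility_problems(old: Schema, new: Schema) -> List[str]:
    """Conservatively list reasons why `new` may not safely replace `old`.

    An empty list means `new` keeps every old column unchanged and only
    adds nullable columns. Removed columns and changed types or nullability
    are flagged for review, even if a given reader might tolerate them.
    """
    problems = []
    for name, col in old.items():
        if name not in new:
            problems.append(f"column {name!r} was removed")
        elif new[name] != col:
            problems.append(f"column {name!r} changed from {col} to {new[name]}")
    for name, col in new.items():
        # Older files lack this column, so readers must be able to fill in nulls.
        if name not in old and not col.nullable:
            problems.append(f"new column {name!r} is not nullable")
    return problems


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    return not compatibility_problems(old, new)


# Usage: v2 adds a nullable column (accepted), v3 adds a required one (flagged).
v1 = {"client_id": Column("string", False), "count": Column("int64", False)}
v2 = {**v1, "os": Column("string", True)}
v3 = {**v1, "os": Column("string", False)}
print(is_backward_compatible(v1, v2))  # True
print(compatibility_problems(v1, v3))  # ["new column 'os' is not nullable"]
```

Treating only nullable additions as safe mirrors the example given in the comment; a real pipeline could relax the check (for instance, allowing a required column to become optional) once the behavior of each reader has been confirmed.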

BMO Automation

Updated

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Security

(public)