Closed Bug 1246408 Opened 9 years ago Closed 9 years ago

Update EMR release to 4.3.0

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: whd)

References

Details

No description provided.
Once bug 1248336 lands, Parquet datasets will be accessible from both Spark and Presto. Packing more profiles per row group seems to trigger a Spark bug that causes the "take(N)" operation to require a full scan of the dataset. The bug can be avoided by converting the dataset to an RDD, but that hurts performance. Spark 1.6 doesn't suffer from this issue, so we should upgrade as soon as possible.
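The RDD workaround described in this comment can be sketched in PySpark roughly as follows. This is a minimal illustration and not the pipeline's actual code: the helper names `take_direct` and `take_via_rdd` are hypothetical, and `df` is assumed to be a DataFrame already loaded from the Parquet dataset.

```python
def take_direct(df, n):
    """Take n rows through the DataFrame API.

    According to the report, this is the call that can end up
    scanning the whole dataset on the affected Spark version.
    """
    return df.take(n)


def take_via_rdd(df, n):
    """Workaround: drop down to the underlying RDD, then take n rows.

    This sidesteps the DataFrame code path, at the performance cost
    the comment mentions.
    """
    return df.rdd.take(n)
```

Both helpers return the same rows; they differ only in which Spark code path performs the read.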
Flags: needinfo?(whd)
Blocks: 1242039
Priority: -- → P1
Note that Hive has to be deployed as well for Spark 1.6 to be able to read Parquet datasets.
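One possible reading of this note, sketched as cluster settings: request both Spark and Hive alongside the new release label. The helper `emr_cluster_settings` is hypothetical, and the assumption that the cluster is described with EMR's `ReleaseLabel` and `Applications` request fields (as accepted by boto3's `run_job_flow`) is not confirmed by this bug.

```python
def emr_cluster_settings(release_label="emr-4.3.0"):
    """Return the release-related part of an EMR cluster request.

    Assumption: the remaining required fields (cluster name,
    instances, roles) are supplied elsewhere by the deployment code.
    """
    return {
        "ReleaseLabel": release_label,
        # Hive is listed next to Spark, per the note above.
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
    }
```

The returned dictionary would be merged into the full cluster request rather than used on its own.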
Blocks: 1251580
Points: --- → 1
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(whd)
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard