Closed Bug 1467874 Opened 7 years ago Closed 5 years ago

Add Pioneer R analysis environment to standard EMR clusters

Categories

(Data Platform and Tools Graveyard :: Operations, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whd, Unassigned)

Details

The CloudFormation (CFN) template we currently use to provision pioneer clusters lives at: https://github.com/mozilla-services/cloudops-deployment/blob/pioneer/projects/data/ansible/templates/pioneer/pioneer-r.yaml. It contains bootstrap logic beyond what we use on ATMO, installing R, RStudio, and some R packages that were considered useful in bug #1457151. We should consider supporting this environment on our general ATMO clusters. However, the bootstrapping as currently structured is very time-consuming, since it compiles many packages from source. We would need to mirror a number of packages, create a custom EMR image, or make a similar optimization before I would be comfortable adding this to the standard bootstrap. This bug tracks the work to optimize the changes and upstream them to the standard ATMO clusters (if desired). Filed under the ops component for lack of a better one; note that part of this work would be for data engineering to support the use of RStudio as part of the platform.
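As a rough illustration of the package-mirroring option mentioned above, the sketch below builds an `Rscript` command that installs packages from a single repository URL. It is not taken from the pioneer-r.yaml template: the mirror URL, the package list, and the helper name `build_install_command` are all hypothetical, and it only saves time if the mirror serves prebuilt binaries for the cluster's platform.

```python
import shlex

# Hypothetical values: neither the mirror URL nor the package list comes from this bug.
MIRROR_URL = "https://r-mirror.example.internal/cran"
PACKAGES = ["data.table", "ggplot2"]


def build_install_command(packages, repo_url):
    """Return a shell command that installs R packages from one repository."""
    # Build the R character vector, e.g. c("data.table", "ggplot2").
    pkg_vector = ", ".join('"{}"'.format(p) for p in packages)
    # install.packages() accepts a `repos` argument naming the repository to use.
    r_expr = 'install.packages(c({}), repos="{}")'.format(pkg_vector, repo_url)
    # Quote the R expression so the shell passes it to Rscript as one argument.
    return "Rscript -e " + shlex.quote(r_expr)


if __name__ == "__main__":
    print(build_install_command(PACKAGES, MIRROR_URL))
```

A bootstrap script could run the resulting command in place of per-package source builds; whether that is faster than a custom EMR image would need to be measured.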
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Product: Data Platform and Tools → Data Platform and Tools Graveyard