Closed
Bug 1467874
Opened 7 years ago
Closed 5 years ago
Add Pioneer R analysis environment to standard EMR clusters
Categories
(Data Platform and Tools Graveyard :: Operations, enhancement, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: whd, Unassigned)
Details
The CloudFormation (CFN) template we currently use to provision Pioneer clusters lives at: https://github.com/mozilla-services/cloudops-deployment/blob/pioneer/projects/data/ansible/templates/pioneer/pioneer-r.yaml. It contains bootstrap logic beyond what we use on ATMO, which installs R, RStudio, and some R packages that were considered useful in bug #1457151.
We should consider supporting this environment on our general ATMO clusters. However, the bootstrapping as currently structured is very time-consuming, since it compiles many packages from source. We would need to mirror a number of packages, create a custom EMR image, or make a similar optimization before I would be comfortable adding this to the standard bootstrap. This bug tracks the work to optimize these changes and upstream them to the standard ATMO clusters (if desired).
Filed under the ops component for lack of a better one; note that part of this work would be having data engineering support the use of RStudio as part of the platform.
Reporter
Updated • 7 years ago
Priority: -- → P3
Reporter
Updated • 5 years ago
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated • 2 years ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard