Support EMR clusters with Spark.

RESOLVED FIXED in Future

Status

Webtools
Telemetry Dashboard
RESOLVED FIXED
Opened 3 years ago; resolved 3 years ago

People

(Reporter: rvitillo, Assigned: rvitillo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

We should support different instance types for different needs, e.g. c4.3x for Mozilla map-reduce, r4.3 and r4.2 for Spark.
(Assignee)

Updated

3 years ago
Component: Telemetry → Telemetry Dashboard
Product: Toolkit → Webtools
Target Milestone: --- → Future
Version: unspecified → Trunk
(Assignee)

Comment 1

3 years ago
Per a conversation on IRC with mreid, we agreed to use Amazon EMR for Spark clusters. The first step is therefore to adapt the Spark bootstrap scripts (https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/) to our infrastructure.

Once this is done, the self-service dashboard has to be changed to allow launching EMR clusters.
(Assignee)

Updated

3 years ago
Assignee: nobody → rvitillo
(Assignee)

Updated

3 years ago
Summary: Add option to select instance type for self service jobs → Support EMR clusters with Spark.
(Assignee)

Comment 2

3 years ago
Habemus EMR Spark cluster. To run a cluster using the AWS command line interface:

aws emr create-cluster --name SparkCluster --ami-version 3.3 \
  --instance-type r3.2xlarge --instance-count 5 \
  --applications Name=Hive --service-role EMR_DefaultRole \
  --ec2-attributes KeyName=YOUR_KEY,InstanceProfile=telemetry-spark-emr \
  --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark Path=s3://telemetry-spark-emr/telemetry.sh

The parameters you will most likely want to change are --instance-type, --instance-count and KeyName. Once the cluster is running, you can reach the IPython interface by forwarding port 8888 from the master node, e.g.:

ssh -L 8888:localhost:8888 hadoop@ec2-54-188-187-154.us-west-2.compute.amazonaws.com

and then visiting http://localhost:8888 in Firefox. The cluster comes preconfigured with an API to access Telemetry data; I am planning to write a simple example showing how to use it.
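To make the parameters called out above explicit, here is a minimal Python sketch that assembles the same `aws emr create-cluster` invocation and the SSH tunnel command as argument lists. The helper names (`create_cluster_args`, `tunnel_args`) are hypothetical; all flag values are copied from the command in this comment, and the master DNS name is whatever EMR reports for your cluster.

```python
from typing import List

# Bootstrap scripts copied verbatim from the create-cluster command above.
BOOTSTRAP_ACTIONS = [
    "Path=s3://support.elasticmapreduce/spark/install-spark",
    "Path=s3://telemetry-spark-emr/telemetry.sh",
]


def create_cluster_args(instance_type: str, instance_count: int, key_name: str) -> List[str]:
    """Hypothetical helper: build argv for `aws emr create-cluster`.

    Only the three parameters that usually change are exposed; everything
    else mirrors the command shown above.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return [
        "aws", "emr", "create-cluster",
        "--name", "SparkCluster",
        "--ami-version", "3.3",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--applications", "Name=Hive",
        "--service-role", "EMR_DefaultRole",
        "--ec2-attributes", f"KeyName={key_name},InstanceProfile=telemetry-spark-emr",
        "--bootstrap-actions", *BOOTSTRAP_ACTIONS,
    ]


def tunnel_args(master_dns: str, port: int = 8888) -> List[str]:
    """Hypothetical helper: forward a local port to the IPython server on the master node."""
    return ["ssh", "-L", f"{port}:localhost:{port}", f"hadoop@{master_dns}"]


if __name__ == "__main__":
    import shlex

    # Print shell-ready command lines; paste them into a terminal to run.
    print(shlex.join(create_cluster_args("r3.2xlarge", 5, "YOUR_KEY")))
    print(shlex.join(tunnel_args("ec2-54-188-187-154.us-west-2.compute.amazonaws.com")))
```

Keeping the command as a list rather than one string avoids shell-quoting surprises if the values ever come from a web form.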

Mark, now it's your turn :) We should integrate cluster launching into the self-service dashboard. It would also be great if you could have a look at the policy I defined (telemetry-spark-emr), thanks!
Flags: needinfo?(mreid)
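For the dashboard integration, one possible shape is a function that runs the CLI command and reads back the new cluster's ID. This is only a sketch: it assumes the dashboard can shell out to the AWS CLI with suitable credentials, and that `aws emr create-cluster` prints a JSON object with a `ClusterId` field. The function name and the injectable `runner` parameter are hypothetical; `runner` exists so the logic can be exercised without touching AWS.

```python
import json
import subprocess
from typing import Callable, List, Sequence


def launch_cluster(
    argv: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Hypothetical dashboard helper: run `aws emr create-cluster` and return the cluster ID."""
    # Force JSON output so parsing works regardless of the user's CLI config.
    cmd: List[str] = list(argv) + ["--output", "json"]
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
    result = runner(cmd, check=True, capture_output=True, text=True)
    response = json.loads(result.stdout)
    cluster_id = response.get("ClusterId")
    if not cluster_id:
        raise RuntimeError(f"unexpected create-cluster response: {result.stdout!r}")
    return cluster_id
```

In the dashboard, `argv` would be the create-cluster command filled in with the instance type, instance count and key pair the user picked; the returned ID can then be used to poll the cluster's state.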

Comment 3

3 years ago
I had a look at the telemetry-spark-emr role, and it looks overly permissive to me. The first statement:
> {
>   "Action": [
>     "cloudwatch:*", 
>     "dynamodb:*", 
>     "ec2:Describe*", 
>     "elasticmapreduce:Describe*", 
>     "rds:Describe*", 
>     "s3:*", 
>     "sdb:*", 
>     "sns:*", 
>     "sqs:*"
>   ], 
>   "Resource": [
>     "*"
>   ], 
>   "Effect": "Allow"
> }
grants full permissions on a broad range of services. Are all of these required to get Spark up and running?
Flags: needinfo?(mreid)
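The concern can be made mechanical: flag every action in an Allow statement that grants a whole service (or a wildcard verb) on all resources, treating only `Describe*` actions as read-only. A minimal Python sketch, run against the statement quoted above; the function name is hypothetical and this is no substitute for an actual IAM review.

```python
import json
from typing import List

# The first statement of the telemetry-spark-emr role, as quoted in this comment.
POLICY_STATEMENT = json.loads("""
{
  "Action": ["cloudwatch:*", "dynamodb:*", "ec2:Describe*",
             "elasticmapreduce:Describe*", "rds:Describe*", "s3:*",
             "sdb:*", "sns:*", "sqs:*"],
  "Resource": ["*"],
  "Effect": "Allow"
}
""")


def overly_broad_actions(statement: dict) -> List[str]:
    """Return wildcard actions that an Allow statement grants on every resource."""
    if statement.get("Effect") != "Allow":
        return []
    actions = statement.get("Action", [])
    resources = statement.get("Resource", [])
    # IAM accepts a bare string in place of a one-element list.
    if isinstance(actions, str):
        actions = [actions]
    if isinstance(resources, str):
        resources = [resources]
    if "*" not in resources:
        return []  # Scoped to specific resources; review those separately.
    flagged = []
    for action in actions:
        _, _, verb = action.partition(":")
        # Only Describe* is treated as read-only here; Get*/List* would be flagged too.
        if action == "*" or (verb.endswith("*") and not verb.startswith("Describe")):
            flagged.append(action)
    return flagged


if __name__ == "__main__":
    print(overly_broad_actions(POLICY_STATEMENT))
    # → ['cloudwatch:*', 'dynamodb:*', 's3:*', 'sdb:*', 'sns:*', 'sqs:*']
```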

Comment 4

3 years ago
Benson, can you review this role (or pick someone appropriate to check it out)?
Flags: needinfo?(bwong)

Comment 5

The IAM permissions do look too permissive for us to run like that in production. In dev IAM, anything goes.

I think the *:Describe* permissions are fine, as they're read-only.

The s3:*, sdb:*, etc. permissions need their Resources to be specified. For example, s3:* should have a specific bucket listed under Resource.

The danger of s3:* is that if a server gets compromised, it will have full permissions on all of our S3 buckets.
Flags: needinfo?(bwong)
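Following the suggestion to name a bucket under Resource, a scoped S3 statement might be built as below. S3 needs two ARNs: the bucket itself for bucket-level actions such as `s3:ListBucket`, and `bucket/*` for object-level actions such as `s3:GetObject`. The helper, the default action list and the bucket name are assumptions for illustration, not the role that was eventually approved.

```python
import json
from typing import Dict, List, Sequence


def scoped_s3_statement(
    bucket: str,
    actions: Sequence[str] = ("s3:GetObject", "s3:ListBucket"),
) -> Dict:
    """Hypothetical helper: build an Allow statement limited to a single bucket."""
    if not bucket or "/" in bucket:
        raise ValueError("expected a bare bucket name")
    resources: List[str] = [
        f"arn:aws:s3:::{bucket}",    # bucket-level actions, e.g. s3:ListBucket
        f"arn:aws:s3:::{bucket}/*",  # object-level actions, e.g. s3:GetObject
    ]
    return {"Effect": "Allow", "Action": list(actions), "Resource": resources}


if __name__ == "__main__":
    # "example-telemetry-bucket" is a placeholder, not a real bucket.
    print(json.dumps(scoped_s3_statement("example-telemetry-bucket"), indent=2))
```

A read-only action list is used as the default because a compromised node then cannot modify or delete data; write actions would have to be added deliberately.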
(Assignee)

Comment 6

3 years ago
Benson, is the role fine now?
Flags: needinfo?(bwong)

Comment 7

Can you please post the whole role and I'll help review it.
Flags: needinfo?(bwong)
(Assignee)

Comment 8

3 years ago
Created attachment 8545137
telemetry-spark-emr
Attachment #8545137 - Flags: review?(bwong)
Attachment #8545137 - Flags: review?(bwong)
(Assignee)

Comment 9

3 years ago
Benson, why did you remove the r? flag? Did you review the role or should I ask somebody else to review it?
Flags: needinfo?(bwong)

Comment 10

I forgot to comment. The role looks good to me.
Flags: needinfo?(bwong)
(Assignee)

Updated

3 years ago
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED