Closed Bug 1279268 Opened 8 years ago Closed 7 years ago

Deploy Scala notebook on Spark clusters

Categories

(Data Platform and Tools :: General, defect, P2)

Points:
1

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: rvitillo, Unassigned)

References

Details

User Story

As we rely more on Scala for our ETL jobs, there should be an easy way to experiment with it. The spark-notebook [1] project provides the features we want.

To run spark-notebook on our Spark clusters, we need a script similar to [2] that instead retrieves the right build for EMR 4.3 (scala-2.10.5, spark-1.6.0, hadoop-2.7.1-with-hive-with-parquet). Furthermore, the configuration [3] needs to be adapted to ours, which basically means using the parameters specified in /etc/spark/conf/spark-defaults.conf.
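
As a rough illustration of the configuration step, the sketch below reads properties in the whitespace-separated "key value" format used by spark-defaults.conf into a dictionary, from which a notebook configuration could then be filled in. The function name parse_spark_defaults and the sample values are hypothetical, and the sketch assumes whitespace-separated entries only. How the values map onto the spark-notebook configuration in [3] is not shown.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict of property -> value."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Key and value are separated by whitespace; the value may contain spaces.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


# Hypothetical excerpt for illustration, not copied from a real cluster.
sample = """\
# comment lines are ignored
spark.master                     yarn
spark.executor.memory            4g
spark.driver.extraJavaOptions    -XX:+UseG1GC -Dfoo=bar
"""

props = parse_spark_defaults(sample)
for key in sorted(props):
    print(key, "=", props[key])
```

On a cluster, the text would come from reading /etc/spark/conf/spark-defaults.conf rather than from an inline sample.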

Finally, to access Heka files on S3, a small library has to be extracted from telemetry-batch-view and made available from the spark packages repository.

[1] https://github.com/andypetrella/spark-notebook
[2] https://s3-us-west-1.amazonaws.com/spark-notebook-emr/4.6/emr-4.6.sh
[3] https://s3-us-west-1.amazonaws.com/spark-notebook-emr/4.6/emr-4.6-snb.conf

See also https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/
Summary: Deploy Spark notebook on Spark clusters → Deploy Scala notebook on Spark clusters
Assignee: nobody → rvitillo
Points: --- → 1
Priority: -- → P3
User Story: (updated)
User Story: (updated)
Depends on: 1283446
Blocks: 1283447
Assignee: rvitillo → nobody
Priority: P3 → P2
User Story: (updated)
Component: Metrics: Pipeline → Spark
Product: Cloud Services → Data Platform and Tools
We now deploy Zeppelin notebooks on our clusters, which covers the Scala use case.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
Component: Spark → General