Closed Bug 1279268 Opened 8 years ago Closed 7 years ago

Deploy Scala notebook on Spark clusters

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: rvitillo, Unassigned)

References

Details

User Story

As we rely more on Scala for our ETL jobs, there should be an easy way to experiment with it. The spark-notebook [1] project provides the features we want.

To run spark-notebook on our Spark clusters, we need a script similar to [2], but one that retrieves the right version for EMR 4.3 (scala-2.10.5, spark-1.6.0, hadoop-2.7.1-with-hive-with-parquet). Furthermore, the configuration [3] needs to be adapted to ours, which basically means using the parameters specified in /etc/spark/conf/spark-defaults.conf.
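As a rough illustration of the configuration step, the cluster's Spark defaults file could be read into key/value pairs before being copied into the notebook configuration. This is only a sketch: the function name `parse_spark_defaults` and the sample properties are hypothetical, and it assumes the common layout of one property per line, with the key and value separated by whitespace and `#` starting a comment.

```python
def parse_spark_defaults(text):
    """Parse Spark defaults-style text into a dict of property -> value.

    Assumption: one property per line, key and value separated by
    whitespace; blank lines and lines starting with '#' are skipped.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only, so values that
        # contain spaces (e.g. JVM option strings) stay intact.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        props[key] = value
    return props


# Illustrative contents only; not taken from a real cluster.
SAMPLE = """\
# Spark defaults (example)
spark.master                     yarn
spark.executor.memory            4g
spark.driver.extraJavaOptions    -XX:+UseG1GC -Dfoo=bar
"""

props = parse_spark_defaults(SAMPLE)
for key in sorted(props):
    print(f"{key} = {props[key]}")
```

On a cluster, the text would come from reading the file under /etc/spark/conf instead of the inline sample; how the resulting pairs map onto the options in [3] is left open here.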

Finally, to access Heka files on S3, a small library has to be extracted from telemetry-batch-view and made available from the Spark Packages repository.

[1] https://github.com/andypetrella/spark-notebook
[2] https://s3-us-west-1.amazonaws.com/spark-notebook-emr/4.6/emr-4.6.sh
[3] https://s3-us-west-1.amazonaws.com/spark-notebook-emr/4.6/emr-4.6-snb.conf

See also https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/

Roberto Agostino Vitillo (:rvitillo)

Reporter

Description

•

8 years ago

No description provided.

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Summary: Deploy Spark notebook on Spark clusters → Deploy Scala notebook on Spark clusters

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Assignee: nobody → rvitillo

Points: --- → 1

Priority: -- → P3

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

User Story: (updated)

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

User Story: (updated)

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Depends on: 1283446

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Blocks: 1283447

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Assignee: rvitillo → nobody

Katie Parlante

Updated

•

8 years ago

Priority: P3 → P2

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

7 years ago

User Story: (updated)

Jason Thomas [:jason]

Updated

•

7 years ago

Component: Metrics: Pipeline → Spark

Product: Cloud Services → Data Platform and Tools

Mark Reid [:mreid]

Comment 1

•

7 years ago

We now deploy Zeppelin notebooks on our clusters, which covers the Scala use case.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INVALID

Nobody; OK to take it and work on it

Assignee

Updated

•

2 years ago

Component: Spark → General

Categories

(Data Platform and Tools :: General, defect, P2)

