Closed Bug 1225080 Opened 9 years ago Closed 8 years ago

a.t.m.o should support Spark 1.5

Categories: Cloud Services Graveyard :: Metrics: Pipeline (defect, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rvitillo; Assigned: rvitillo
The spark1.5 branch of emr-bootstrap-spark contains a working set of scripts to launch an interactive job. Mark, could you add the required changes to a.t.m.o to allow users to launch Spark jobs from the dashboard?

I am going to add support for batch jobs in the next few days.
Flags: needinfo?(mreid)
Flags: needinfo?(mreid) → needinfo?(whd)
I think :whd is going to be looking at this soon.

This amounts to updating the launcher scripts to use a command similar to the one in the "Interactive job" section here:
https://github.com/mozilla/emr-bootstrap-spark/tree/spark1.5#interactive-job
Assignee: nobody → rvitillo
Mark, Wesley, who is supposed to review the patch?
Flags: needinfo?(whd)
Flags: needinfo?(mreid)
I'm reviewing this presently, which includes setting up a staging environment to test current jobs.
Flags: needinfo?(whd)
Flags: needinfo?(mreid)
In the interest of expedition I'm testing the current scheduled spark jobs via a database dump and standalone emr cluster first instead of setting up a proper staging environment.

The first job I tested completed (Addon Analysis), but had very different results from the current output (90 entries on a.t.m.o vs. 16 on spark 1.5.2). I'll look into this more presently.
(In reply to Wesley Dawson [:whd] from comment #6)
> In the interest of expedition I'm testing the current scheduled spark jobs
> via a database dump and standalone emr cluster first instead of setting up a
> proper staging environment.
> 
> The first job I tested completed (Addon Analysis), but had very different
> results from the current output (90 entries on a.t.m.o vs. 16 on spark
> 1.5.2). I'll look into this more presently.

Do you have a log of the job by chance?
I re-ran the addon analysis and found that the mismatched results were PEBKAC. I had futzed with the get_pings parameters because the query for "yesterday" was returning an empty RDD, and while diagnosing this I changed the 1.3.0 notebook to look at release pings instead of nightly, which caused the discrepancy.
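For reference, a minimal sketch of how a "yesterday" submission date might be computed before being passed to get_pings. The get_pings call and its channel/submission_date parameters shown in the comment are assumptions based on the Telemetry analysis API, not taken from this bug; channel="nightly" vs. "release" is what caused the discrepancy described above.

```python
from datetime import datetime, timedelta

def yesterday_yyyymmdd(today=None):
    """Return yesterday's date as a yyyymmdd string (the format
    Telemetry submission dates are commonly keyed by)."""
    today = today or datetime.utcnow()
    return (today - timedelta(days=1)).strftime("%Y%m%d")

# Hypothetical notebook usage (get_pings and its parameters are
# assumptions for illustration only):
# pings = get_pings(sc, channel="nightly",
#                   submission_date=yesterday_yyyymmdd())
```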

Looking at the other scheduled jobs, everything else I've tested so far is working as expected. Spark 1.5.2 seems to lose some precision in some calculations (e.g. 8.62309073992e-05 vs. 8.623090739921546e-05 in 1.3.0), but that's well within the noise.
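The precision difference quoted above can be confirmed as noise with a relative-tolerance comparison; a small sketch using the two values from this comment:

```python
import math

# The two values observed for the same calculation under Spark 1.5.2
# and Spark 1.3.0 (quoted from the comment above).
v_152 = 8.62309073992e-05
v_130 = 8.623090739921546e-05

# The relative difference is on the order of 1e-13, far below any
# threshold that would matter for these analyses.
assert math.isclose(v_152, v_130, rel_tol=1e-9)
```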

I'll finish up testing the remaining jobs and then deploy to a.t.m.o.
Priority: -- → P1
Another issue I noticed and forgot to mention about Spark 1.5: the Spark web UI now does some HTTP redirection, which makes it harder to access via port forwarding. Where before I could simply forward port 4040 to the local host, that now results in a redirect to a different port on the host's internal hostname, something like:

http://ip-172-31-10-159.us-west-2.compute.internal:20888/proxy/application_1449014891980_0011/

If you know what you are doing this can be surmounted, but it's certainly an inconvenience.
Unfortunately the interface never really worked with simple port forwarding; you should use a SOCKS proxy instead.
An update here: the code has been merged and I deployed it. However, during a final round of testing, launching a Spark cluster resulted in a bootstrap failure, so I rolled back. I have not been able to reproduce the bootstrap failure when running things manually, and the logs captured before the EMR cluster is terminated (as an aside, we should enable S3 logging of EMR jobs) don't show anything fatal. I do see the emacs build failing:

make: *** [install-emacs] Error 255
/mnt/var/lib/bootstrap-actions/1/telemetry.sh: line 107: Submodule: command not found

but the final command (ipython) is succeeding. EMR says the bootstrap script is exiting with status code 2.

I've got a shadow copy of a.t.m.o running at ec2-54-213-222-151.us-west-2.compute.amazonaws.com (all scheduled jobs disabled) which I will continue to test with until I figure out what the problem is.
Looks like this was an issue with overriding the system python. https://github.com/mozilla/emr-bootstrap-spark/pull/11 has the fix. As for why it only happens when running via telemetry-dash, I haven't the faintest.
Landed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard