Closed Bug 1318762 · Opened 8 years ago · Closed 8 years ago
Add the ability to run more complex python + spark jobs
Categories: Cloud Services Graveyard :: Metrics: Pipeline (defect, P2)
Tracking: Not tracked
Status: RESOLVED DUPLICATE of bug 1340595
People: Reporter mreid, Unassigned
We have the ability to run Jupyter Notebooks on both ATMO and Airflow, but they are not conducive to writing modular, well-tested, reusable code. They are great for simple tasks but as the notebook gets more complex, the convenience factor becomes less compelling than stability and correctness.
It would be nice to have something for Python jobs similar to what we have with Scala jobs in telemetry-batch-view[1] where we can structure the code and tests nicely, but still easily run a job on an ATMO-launched cluster or via Airflow.
We should provide a straightforward path from a functional Jupyter notebook to a more robust Python job.
It could live in the existing telemetry-batch-view repo, in python_moztelemetry, in one of the various other existing repos, or in a new repo dedicated to Python analyses.
The important parts, in my mind, are:
- a test harness that runs on push (and includes test coverage info)
- code can be structured as more than one Python source file
- code can easily be reused across analyses
- code can easily be run on ATMO and by Airflow
[1] https://github.com/mozilla/telemetry-batch-view
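For illustration only, one way to meet these requirements is to keep transformation logic in pure functions, separate from Spark I/O, so the same code can be unit-tested on plain lists in CI and applied to RDD partitions on a cluster. The module and function names below are invented for this sketch, not taken from any existing repo:

```python
# Hypothetical module (e.g. analyses/ping_counts.py): transformation
# logic kept as a pure function so tests need no Spark cluster.
from collections import Counter


def count_pings_per_client(pings):
    """Count pings per clientId from an iterable of ping dicts.

    Pure function over any iterable: in production it could be applied
    to a partition (e.g. via rdd.mapPartitions), while unit tests feed
    it plain lists.
    """
    counts = Counter()
    for ping in pings:
        client_id = ping.get("clientId")
        if client_id is not None:
            counts[client_id] += 1
    return dict(counts)


# A plain unit test that a push-triggered CI harness could collect:
def test_count_pings_per_client():
    pings = [{"clientId": "a"}, {"clientId": "a"}, {"clientId": "b"}, {}]
    assert count_pings_per_client(pings) == {"a": 2, "b": 1}
```

With this split, the notebook (or an Airflow/ATMO entry point) only handles cluster setup and data loading, and the tested function does the actual work.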
Updated 8 years ago
Priority: -- → P2
Comment 1 • 8 years ago
This is covered by bug 1340595.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Updated 6 years ago
Product: Cloud Services → Cloud Services Graveyard