Bug 1318762 (Closed) - Opened 8 years ago, Closed 8 years ago

Add the ability to run more complex python + spark jobs

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1340595

People

(Reporter: mreid, Unassigned)

References

Details

We have the ability to run Jupyter Notebooks on both ATMO and Airflow, but they are not conducive to writing modular, well-tested, reusable code. They are great for simple tasks, but as a notebook gets more complex, the convenience factor becomes less compelling than stability and correctness.

It would be nice to have something for Python jobs similar to what we have for Scala jobs in telemetry-batch-view [1], where we can structure the code and tests nicely but still easily run a job on an ATMO-launched cluster or via Airflow. We should provide a straightforward path from a functional Jupyter Notebook to a more robust Python job. It could be done within the existing telemetry-batch-view repo, python_moztelemetry, one of the various other existing repos, or a new repo created for Python analyses.

The important parts, in my mind, are:
- a test harness that runs on push (and includes test coverage info)
- code can be structured as more than one Python source file
- code can easily be reused across analyses
- code can easily be run on ATMO and by Airflow

[1] https://github.com/mozilla/telemetry-batch-view
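To illustrate the kind of structure this is asking for, here is a minimal sketch (not from any existing repo; the module name, function names, and S3 paths are hypothetical) of a PySpark job split into a pure transformation and a thin entry point that either an ATMO cluster or an Airflow-triggered spark-submit could invoke:

    # crash_rate_job.py -- hypothetical example module
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def compute_crash_rates(pings_df):
        # Pure transformation: no I/O, so it can be unit-tested locally.
        return (
            pings_df
            .groupBy("channel")
            .agg(
                F.sum("crashes").alias("crashes"),
                F.sum("usage_hours").alias("usage_hours"),
            )
            .withColumn("crash_rate", F.col("crashes") / F.col("usage_hours"))
        )

    def main():
        # Thin entry point, e.g. invoked via spark-submit from ATMO or Airflow.
        spark = SparkSession.builder.appName("crash-rate-example").getOrCreate()
        pings = spark.read.parquet("s3://example-bucket/main_summary/")  # placeholder path
        compute_crash_rates(pings).write.mode("overwrite").parquet(
            "s3://example-bucket/crash_rates/"  # placeholder path
        )

    if __name__ == "__main__":
        main()

And a pytest-style unit test for the transformation, the sort of thing a test harness could run on every push (again, names are illustrative):

    # test_crash_rate_job.py -- hypothetical example test
    import pytest
    from pyspark.sql import SparkSession

    from crash_rate_job import compute_crash_rates

    @pytest.fixture(scope="module")
    def spark():
        # Local Spark session; no cluster required to run the test.
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def test_crash_rate(spark):
        df = spark.createDataFrame(
            [("release", 2, 10.0), ("release", 3, 10.0)],
            ["channel", "crashes", "usage_hours"],
        )
        result = compute_crash_rates(df).collect()
        assert result[0]["crash_rate"] == pytest.approx(0.25)

Keeping the transformation free of I/O is what makes it cheap to test on push and easy to reuse across analyses, while the entry point stays small enough to wire into either ATMO or Airflow.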
Priority: -- → P2
This is covered by bug 1340595
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Product: Cloud Services → Cloud Services Graveyard