Closed Bug 1369149 Opened 8 years ago Closed 5 years ago

Develop Deploy Mechanism for Dataset Creation Code

Categories

(Data Platform and Tools :: General, enhancement, P3)

x86
macOS
Points: 3

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: frank, Assigned: amiyaguchi)

Details

For telemetry-batch-view and python_mozetl, we have one deploy method: merge to master. The Airflow scripts pull down those repos, build the projects (where applicable), and launch the Spark jobs. Obviously this has some downsides: we can't have multiple versions of these dataset creation scripts, there is no separation between stage and prod, and we can't test out new releases. We should integrate our Airflow scripts with tagged telemetry-batch-view and python_mozetl releases. This could involve building JARs and deploying them to S3 alongside git releases, then pulling those JARs down in the Airflow jobs; see the sketch below. There are probably other, easier, or better approaches we can investigate. python_mozetl could simply be published to PyPI, like our other Python projects.
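To make that concrete, here is a minimal sketch of what the JAR approach could look like from the Airflow side. This is not the actual job definition: the bucket, key layout, version, and `--date` flag are hypothetical, and the class name is just an example.

```python
# A minimal sketch (assumptions noted above) of running a pinned
# telemetry-batch-view release from S3 instead of building master at run time.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

TBV_VERSION = "1.2.3"  # hypothetical tagged release
TBV_JAR = (
    "s3://example-artifacts/telemetry-batch-view/"
    f"{TBV_VERSION}/telemetry-batch-view.jar"
)

dag = DAG(
    "main_summary",
    start_date=datetime(2017, 6, 1),
    schedule_interval="@daily",
)

main_summary = BashOperator(
    task_id="main_summary",
    # Fetch the pinned JAR and hand it to spark-submit; pointing stage and
    # prod at different TBV_VERSION values separates the two environments.
    bash_command=(
        f"aws s3 cp {TBV_JAR} /tmp/telemetry-batch-view.jar && "
        "spark-submit --class com.mozilla.telemetry.views.MainSummaryView "
        "/tmp/telemetry-batch-view.jar --date {{ ds_nodash }}"
    ),
    dag=dag,
)
```

Rolling back a bad release then becomes a one-line change to the pinned version rather than a revert on master.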
Assignee: nobody → amiyaguchi
Points: --- → 3
Priority: -- → P3
Bug 1385232 added a common entry point for mozetl jobs. The repository should still become a proper package with tagged releases, but only after the common submission script is adopted more widely: currently 10 of 20 jobs in mozetl use `mozetl-submit.sh` with the Airflow wrapper.

The submission script has an option for using alternate git paths and branches, which was primarily for my development workflow. On my local instance of Airflow, I can edit the environment to read the package from my PR branch; I used this to pin versions of a dataset creation script in bug 1404502.

The Click environment convention is fairly powerful, and it could unify the two repositories under a single command-line API (a sketch follows below). A wrapper script like [1] could abstract away the details of building and submitting jobs from Airflow.

[1] https://github.com/acmiyaguchi/telemetry-airflow/blob/c75f08a260c956181801de3ddbffcdcdfe18b5d6/jobs/retention.sh
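For reference, a minimal sketch of the Click environment convention; the command and option names here are hypothetical, not mozetl's actual jobs. With `auto_envvar_prefix`, Click resolves any option not given on the command line from a `MOZETL_<COMMAND>_<OPTION>` environment variable, so Airflow only needs to set the environment before invoking the entry point.

```python
# Hypothetical unified entry point; job and option names are illustrative.
import click


@click.group()
def cli():
    """Unified command-line API for dataset creation jobs."""


@cli.command()
@click.option("--date", required=True, help="Submission date to process.")
@click.option("--bucket", default="example-output", help="Output S3 bucket.")
def example_job(date, bucket):
    # A real job would kick off its ETL here; this just echoes the config.
    click.echo(f"running example_job for {date}, writing to {bucket}")


if __name__ == "__main__":
    # e.g. MOZETL_EXAMPLE_JOB_DATE=20171101 python cli.py example-job
    cli(auto_envvar_prefix="MOZETL")
```

This is what makes the Airflow wrapper thin: it only has to export environment variables and call one script, regardless of which job is being run.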
Whiteboard: [SvcOps] → [DataOps]

We now do the equivalent of this in GCP through the Docker deployments of bigquery-etl.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Component: Datasets: General → General