Closed Bug 1362229 Opened 8 years ago Closed 8 years ago

Determine WTMO deployment procedure

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: whd, Assigned: hwoo)

Attachments

(1 file)

Link to GitHub pull-request: https://github.com/mozilla/telemetry-airflow/pull/198 (added 8 years ago by GitHub Bugzilla PR Linker; 53 bytes, text/x-github-pull-request)

Wesley Dawson [:whd]

Reporter

Description


8 years ago

Now that WTMO has been migrated to dockerflow we need to determine the deployment cadence for it. There's been an email thread with some competing notions on how to proceed, so I'm filing this bug to determine a resolution. The models being considered are essentially fully automated deployment or two-step production deployment.

I started to enumerate some technical details around them, but decided I wanted to spend a tractable amount of time writing this bug so I'm just going to point at https://github.com/mozilla-services/cloudops-deployment/pull/664, https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/data/puppet/yaml/app/data.prod.wtmo.yaml#L10-L12, and https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/data/puppet/yaml/app/data.stage.wtmo.yaml#L10-L12, which contain, in my opinion, the operative parameters for discussion.

My original understanding of deployment requirements was that dag modifications (the majority of changes to our airflow container) should be automatically deployed to both staging and production. There is notably no or very little testing in this case, but as I understand it the majority of prior issues related to airflow were not around dag changes, but rather operational issues with the service that should now be resolved. In this model issues related to a dag deploy can be quickly addressed by merging the fix to master, which is then auto-deployed.

There's a technical wrinkle around worker/scheduler replacement that could be resolved in at least four ways: social convention (don't merge to master while jobs are running or are about to run), using the new EMR operator / sensor mechanisms that are the subject of bug #1325393, some kind of operational instrumentation (e.g. of the worker queue) that exposes to the deployment pipeline whether and when it is safe to redeploy, or some mechanism like mounting an external volume to the docker container that facilitates dag "deploys" without requiring a rebuild of the container.

Two-step deployment can be accomplished in myriad ways using the aforementioned parameters, depending on where we want to put the verification procedures. In the current configuration we have a full staging environment at https://data-wtmo.stage.mozaws.net/admin/ that has the same permissions as the production environment, but has all dags paused and is configured to dump analysis data to telemetry-test-bucket instead of our production data buckets.

I have no particular preference on how we proceed, as I believe whatever method we decide can be implemented in such a way that there is no operator involvement.
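To make the "operational instrumentation" option above a little more concrete, here is a minimal sketch of the kind of check a deployment pipeline could run before replacing workers. Everything in it is hypothetical: the `QueueSnapshot` fields, the `safe_to_redeploy` name, and the 15-minute quiet window are assumptions for illustration, and how the snapshot would actually be collected from the worker queue is left open.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hypothetical summary of worker state exposed to the deploy pipeline."""
    running_tasks: int
    queued_tasks: int
    seconds_until_next_scheduled_run: float


def safe_to_redeploy(snapshot: QueueSnapshot,
                     quiet_window_seconds: float = 900.0) -> bool:
    """Return True only if nothing is running or queued and no scheduled
    run is due within the quiet window (an assumed 15 minutes by default)."""
    if snapshot.running_tasks > 0 or snapshot.queued_tasks > 0:
        return False
    return snapshot.seconds_until_next_scheduled_run >= quiet_window_seconds
```

A pipeline step could poll a check like this and hold the worker/scheduler replacement until it returns True, which is roughly the "don't merge while jobs are running or about to run" convention enforced by tooling rather than by people.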

Jason Thomas [:jason]

Updated


8 years ago

Assignee: nobody → whd

Points: --- → 2

Priority: -- → P2

Wesley Dawson [:whd]

Reporter

Comment 1


8 years ago

From email discussion, we're going to move forward with the following:

1) Split Airflow between the web service and the DAGs.
2) Use two-step deployment for new Airflow versions and operator changes.
3) Continuously deploy individual DAGs and their tasks.

I'll work out the implementation of this next sprint.
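The split above implies that the build has to decide, per change, which deployment path applies. A rough sketch of that decision follows, assuming a hypothetical repository layout in which DAGs live under `dags/` and operators under `dags/operators/`; the directory names, the `deployment_path` function, and its return labels are illustrative and not taken from the actual pipeline.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed layout, for illustration only.
DAG_DIR = ("dags",)
OPERATOR_DIR = ("dags", "operators")


def deployment_path(changed_files: Iterable[str]) -> str:
    """Classify a change set as 'none', 'continuous' (DAG-only changes)
    or 'two-step' (anything touching operators or the service itself)."""
    files = list(changed_files)
    if not files:
        return "none"
    for name in files:
        parts = PurePosixPath(name).parts
        is_dag = parts[:1] == DAG_DIR
        is_operator = parts[:2] == OPERATOR_DIR
        # Operator changes and anything outside the DAG folder need the
        # slower, verified path.
        if not is_dag or is_operator:
            return "two-step"
    return "continuous"
```

The conservative choice here is that a single non-DAG file in a change set sends the whole change down the two-step path.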

Harold Woo

Assignee

Comment 2


8 years ago

For my own reference:

- Cfn/Ansible template for creating EFS in staging/prod
- Modify app.yml (https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/data/ansible/templates/wtmo/app.yml) userdata to mount EFS volume to the webapp/worker instances
- Mount EFS folder to containers
- Modify telemetry-airflow/airflow.cfg to point to EFS mounted folder? This may break current deployment?
- Add cronjob on webapps/worker to keep EFS and github in sync (git pull) in userdata as well?
- Add circleci tests for DAG syntax on telemetry-airflow repo
- Modify circleci build so that changes to dag folder do not create new containers and deploy
- Airflow scheduler on worker instance needs log rotation (https://bugzilla.mozilla.org/show_bug.cgi?id=1392310)
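For the "tests for DAG syntax" item, one possible shape of such a check is sketched below. It only parses every `.py` file under a directory and reports syntax errors; it does not import the DAGs or validate them against Airflow. The function name `find_syntax_errors` is made up for this sketch, and the check would need to run under the same Python version as the Airflow deployment to be meaningful.

```python
import ast
from pathlib import Path
from typing import Dict


def find_syntax_errors(dag_dir: str) -> Dict[str, str]:
    """Parse every .py file under dag_dir and return a mapping of
    file path to a short description of the first syntax error found."""
    errors: Dict[str, str] = {}
    for path in sorted(Path(dag_dir).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

A CI step could fail the build whenever the returned mapping is non-empty, so that a broken DAG file never reaches the continuously deployed folder.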

Harold Woo

Assignee

Updated


8 years ago

Assignee: whd → hwoo

GitHub Bugzilla PR Linker

Comment 3


8 years ago

Attached file: Link to GitHub pull-request: https://github.com/mozilla/telemetry-airflow/pull/198

Harold Woo

Assignee

Updated


8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated


3 years ago

Component: Scheduling → General


Categories

(Data Platform and Tools :: General, enhancement, P2)


Security

(public)
