Closed Bug 1307826 Opened 3 years ago Closed 3 years ago

Deploy PushApkWorker on its own production machine

Categories

(Release Engineering :: Release Automation: Other, defect)

Unspecified
Android
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: jlorenzo)

Attachments

(3 files)

Bug 1306307 is making the staging instance live. This bug tracks the work related to the production environment.
Attached file Build cloud tool PR
:mtabara told me you were the expert on that matter :) Feel free to redirect.
Attachment #8809077 - Flags: review?(rail)
Assignee: nobody → jlorenzo
Comment on attachment 8809077 [details] [review]
Build cloud tool PR

commented in the PR
Attachment #8809077 - Flags: review?(rail) → review+
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

Here are the puppet changes for pushapkworker production. Staging was already reviewed by Aki in bug 1306307.

As of now, the production instance is already live (via my personal environment). I followed the steps you mentioned in bug 1308042 comment 33:
1. Packages with the correct version (0.1.3) exist at [1]
2. Hiera secrets are fed in releng-puppet2:/etc/hiera/secrets.eyaml
3. 700 on /build/pushapkworker[2]
4. no nuke needed, for now
5. verbose is currently on[3]
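
Step 3 above can be double-checked by hand. A minimal sketch, assuming a Python 3 interpreter is available on the worker; `has_mode` is a hypothetical helper, not part of the patch, and the demo runs on a throwaway directory rather than on /build/pushapkworker itself:

```python
import os
import stat
import tempfile


def has_mode(path, expected=0o700):
    """Return True if the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# Demonstration on a temporary directory (stand-in for the real worker dir).
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)
print(has_mode(demo_dir))   # owner-only access
os.chmod(demo_dir, 0o755)
print(has_mode(demo_dir))   # group/other can now read, so the check fails
os.rmdir(demo_dir)
```

On the real machine the call would be `has_mode('/build/pushapkworker')`.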

Regarding your other points:
a) production instance was built against a clean ec2
e) PR in build-cloud-tools already landed (see other attachments of this bug)
f) fqdn updated
g) the needed scopes are held here[4]
j) I'm waiting on your review before pushing to hg.m.o :) 

[1] https://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages-3.5/
[2] https://reviewboard.mozilla.org/r/92024/diff/3#38
[3] https://reviewboard.mozilla.org/r/92024/diff/3#0
[4] https://tools.taskcluster.net/auth/clients/#project%252freleng%252fscriptworker%252fpushapk%252fproduction
Attachment #8809433 - Flags: review?(mtabara)
Attachment #8812754 - Flags: review?(rail) → review+
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

https://reviewboard.mozilla.org/r/92024/#review94970

All in all it looks good ;) However, I'd advise rebasing against the current head of puppet: a lot has changed there in terms of signing/scriptworker, so there may be numerous conflicts to tackle. Also, please push these changes to a temporary PR on https://github.com/mozilla/build-puppet to check the linters. You can close the PR once the tests pass. It saves lots of trouble later on.

I'll mark this as r- for now since I'm not sure if all the changes are successfully merge-able in the main repo.
Please correct me if I'm wrong.

::: manifests/moco-config.pp:406
(Diff revision 3)
>      # TC signing workers
>      $signingworker_exchange = "exchange/taskcluster-queue/v1/task-pending"
>      $signingworker_worker_type = "signing-worker-v1"
>  
> -    # TC signing scriptworkers
> -    $signing_scriptworker_provisioner_id = "scriptworker-prov-v1"
> +    # TC Scriptworkers
> +    $scriptworker_provisioner_id = "scriptworker-prov-v1"

Might need to rebase. This var no longer exists.

::: manifests/moco-config.pp
(Diff revision 3)
>  
>      # TC beetmover scriptworkers
>      $beetmover_scriptworker_task_max_timeout = 2400
>      $beetmover_scriptworker_artifact_expiration_hours = 336
>      $beetmover_scriptworker_artifact_upload_timeout = 600
> -    $beetmover_scriptworker_verbose_logging = false

Slippery fingers? :P Please leave this be, it's used in the beetmoverworker templates :)

::: manifests/moco-config.pp:465
(Diff revision 3)
>              beetmover_aws_s3_fennec_bucket => "net-mozaws-stage-delivery-archive",
>          }
>      }
>  
> +    ## TC pushapk scriptworkers
> +    $pushapk_scriptworker_taskcluster_client_id = secret("pushapk_scriptworker_taskcluster_client_id")

Don't forget to add these two to hiera before merging default to production, or else it will fail in a very ugly way.

::: manifests/moco-nodes.pp:1189
(Diff revision 3)
> +    $timezone = "UTC"
> +    include toplevel::server::pushapkworker
> +}
> +
>  ## Loaners
> +

All good here, but I'm wondering if this should land on production branch. Might need to double-check with @rail or @aki on this one.

The way I did this was to take the diff and add the ## Loaners section as a patch on my puppet environment. You get the same result, absent having it deployed on production. But again, I could be completely wrong, please double check with more experienced puppet folks from our team ;)

::: modules/pushapkworker/manifests/init.pp:102
(Diff revision 3)
> +            group       => "${users::signer::group}",
> +            content     => secret('pushapk_scriptworker_release_google_play_certificate'),
> +            show_diff   => false;
> +    }
> +
> +    service {

This is no longer needed: since https://hg.mozilla.org/build/puppet/rev/f295d0822bd4 it's been a common disabled service.

::: modules/pushapkworker/templates/config.json.erb:21
(Diff revision 3)
> +    "verify_chain_of_trust": false,
> +    "sign_chain_of_trust": false,
> +
> +    "credentials": {
> +        "clientId": "<%= scope.function_secret(["pushapk_scriptworker_taskcluster_client_id"]) %>",
> +        "accessToken": "<%= scope.function_secret(["pushapk_scriptworker_taskcluster_access_token"]) %>"

@rail discouraged me from using scope.function_secret directly in the templates, as it hides the secrets from the main moco-config.pp file. Instead, you can use the same mechanism used for worker_id above ^

::: modules/pushapkworker/templates/script_config.json.erb:8
(Diff revision 3)
> +    "schema_file": "<%= scope.lookupvar("config::pushapk_scriptworker_root") %>/lib/python3.5/site-packages/pushapkscript/data/pushapk_task_schema.json",
> +    "verbose": <%= @env_config["pushapk_scriptworker_verbose_logging"] %>,
> +
> +    "google_play_accounts": {
> +        "aurora": {
> +            "service_account": "<%= scope.function_secret(["pushapk_scriptworker_aurora_google_play_service_account"]) %>",

More or less a nit: @rail discouraged me from using scope.function_secret directly in the templates, as it hides the secrets from the main moco-config.pp file. Instead, you can use the same mechanism used for verbose above ^

::: modules/pushapkworker/templates/script_config.json.erb:12
(Diff revision 3)
> +        "aurora": {
> +            "service_account": "<%= scope.function_secret(["pushapk_scriptworker_aurora_google_play_service_account"]) %>",
> +            "certificate": "<%= scope.lookupvar("config::pushapk_scriptworker_aurora_google_play_certificate") %>"
> +        },
> +        "beta": {
> +            "service_account": "<%= scope.function_secret(["pushapk_scriptworker_beta_google_play_service_account"]) %>",

@rail discouraged me from using scope.function_secret directly in the templates, as it hides the secrets from the main moco-config.pp file. Instead, you can use the same mechanism used for verbose above ^

::: modules/pushapkworker/templates/script_config.json.erb:16
(Diff revision 3)
> +        "beta": {
> +            "service_account": "<%= scope.function_secret(["pushapk_scriptworker_beta_google_play_service_account"]) %>",
> +            "certificate": "<%= scope.lookupvar("config::pushapk_scriptworker_beta_google_play_certificate") %>"
> +        },
> +        "release": {
> +            "service_account": "<%= scope.function_secret(["pushapk_scriptworker_release_google_play_service_account"]) %>",

same here.
Attachment #8809433 - Flags: review?(mtabara) → review-
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

Thanks for the review Mihai! Here's a new revision that addresses the following points:
* Rebased on top of the latest tip[1]
* `function_secret()` is not used anymore in templates. Vars are now retrieved in moco-config.pp. This allows tc_credentials to be defined in the same hiera file
* PushApkScript has been upgraded to 0.1.4[2] (to avoid debug logs in oauth2)
* Single quotes are now used when a string doesn't need to be evaluated
* The loaner entry has disappeared
* So has the rpcbind one
* The signer user is not the owner anymore. cltbld is.
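
For anyone wanting to check that no template still calls `function_secret()` directly, here is a rough sketch. It is not part of the patch: `find_direct_secret_calls` is a hypothetical helper, and the `new_template` string (using `scope.lookupvar` on a `config::` variable) is only an assumption about what the updated template looks like.

```python
import re

# Matches ERB calls such as scope.function_secret(["some_key"]) and captures the key.
SECRET_CALL = re.compile(r'scope\.function_secret\(\[\s*["\']([^"\']+)["\']\s*\]\)')


def find_direct_secret_calls(template_text):
    """Return (line_number, secret_name) pairs for each direct secret lookup."""
    hits = []
    for number, line in enumerate(template_text.splitlines(), start=1):
        for match in SECRET_CALL.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Excerpt in the style of diff revision 3 (direct lookups in the template).
old_template = '''"credentials": {
    "clientId": "<%= scope.function_secret(["pushapk_scriptworker_taskcluster_client_id"]) %>",
    "accessToken": "<%= scope.function_secret(["pushapk_scriptworker_taskcluster_access_token"]) %>"
}'''

# Assumed shape after the change: the value comes from a moco-config.pp variable.
new_template = '''"credentials": {
    "clientId": "<%= scope.lookupvar("config::pushapk_scriptworker_taskcluster_client_id") %>"
}'''

print(find_direct_secret_calls(old_template))
print(find_direct_secret_calls(new_template))
```

The first call reports the two direct lookups with their line numbers; the second returns an empty list.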

This patch has been tested against
* the linter running in Github[3]
* the staging instance[4]
* the production instance[5] (still pinned to my personal env)

Requesting a new review from Rail, as Mihai is on PTO this week.

[1] https://hg.mozilla.org/build/puppet/rev/43cd21057086
[2] https://github.com/mozilla-releng/pushapkscript/releases/tag/0.1.4
[3] https://github.com/mozilla/build-puppet/pull/20
[4] https://tools.taskcluster.net/task-inspector/#J9WN95hvQOOw20tqaFfXyA/0
[5] https://tools.taskcluster.net/task-inspector/#JTWEANzkSBSZBL_ZPxfT7A/0
Attachment #8809433 - Flags: review?(rail)
No longer blocks: 1320672
Depends on: 1320672
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

Now that bug 1320672 is fixed, I changed the PR so we now have different certificates for dev and prod. This has been tested against staging[1] and against prod[2]. When puppet agent applied the changes, nothing happened in production (as expected) and new certs were applied to staging. I am sure the new certs were picked up, as [3] complained about insufficient permissions (this has been fixed since).

As you can see, [1] and [2] are marked as failed. This is a limitation of Google Play, which doesn't accept the same APKs being uploaded twice. Today's APKs had already been uploaded earlier today, in the runs from my previous comment.

r? Rail

[1] https://tools.taskcluster.net/task-inspector/#CIYaqzgdQc6GHIeVeN050w/0
[2] https://tools.taskcluster.net/task-inspector/#IixzRzW8TYuo3jMcVnk78g/0
[3] https://tools.taskcluster.net/task-inspector/#AD9kUpfNQyW43OhkKALY-Q/0
Attachment #8809433 - Flags: review?(rail)
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

https://reviewboard.mozilla.org/r/92024/#review96122

::: modules/pushapkworker/manifests/mime_types.pp:7
(Diff revision 8)
> +
> +    case $::operatingsystem {
> +        CentOS: {
> +            file { '/etc/mime.types':
> +                mode        => '0644',
> +                content     => 'application/vnd.android.package-archive     apk',

This one is a bit brutal. :) I hope we don't use this file anywhere else.
Attachment #8809433 - Flags: review?(rail) → review+
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

https://reviewboard.mozilla.org/r/92024/#review96122

> This one is a bit brutal. :) I hope we don't use this file anywhere else.

This file is used by google-api-python-client to make sure it's pushing an APK. It relies on https://docs.python.org/3/library/mimetypes.html, which needs this file no matter what distro we're on. Without this file, google-api-python-client just errors out, saying it can't handle the given file. I'll add a comment in the code about that.
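
To illustrate, here is a small sketch of how the standard `mimetypes` module resolves `.apk` from a mime.types-style file. It uses a temporary file instead of the real /etc/mime.types, and `guess_type_with` is a hypothetical helper; how exactly google-api-python-client calls into `mimetypes` is not shown here.

```python
import mimetypes
import os
import tempfile

# The single line the puppet manifest writes to /etc/mime.types.
MIME_TYPES_CONTENT = 'application/vnd.android.package-archive     apk\n'


def guess_type_with(mime_types_content, filename):
    """Guess the MIME type of `filename` using a mime.types-style mapping."""
    with tempfile.NamedTemporaryFile('w', suffix='.types', delete=False) as handle:
        handle.write(mime_types_content)
    try:
        database = mimetypes.MimeTypes()
        # Entries read here override any existing mapping for the same extension.
        database.read(handle.name)
        return database.guess_type(filename)[0]
    finally:
        os.unlink(handle.name)


print(guess_type_with(MIME_TYPES_CONTENT, 'fennec.apk'))
# prints "application/vnd.android.package-archive"
```

When no mapping for `.apk` is available at all, `guess_type` can return `None` for the type, which is the situation the manifest avoids.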
Comment on attachment 8809433 [details]
Bug 1307826 - Deploy PushApkWorker on its own production machine

Carrying over r+. I just added a comment to explain why /etc/mime.types was necessary https://reviewboard.mozilla.org/r/92024/diff/8-9/

Landed on default branch at https://hg.mozilla.org/build/puppet/rev/94d76acb93cf
Attachment #8809433 - Flags: review+
I manually ran `sudo puppet agent --test`, which gave:
> Info: Retrieving pluginfacts
> Info: Retrieving plugin
> Info: Loading facts
> Info: Caching catalog for pushapkworker-1.srv.releng.use1.mozilla.com
> Info: Applying configuration version '190761744079'

The process still seems to be up and running, even after a `sudo supervisorctl restart pushapkworker`. I'll wait until the next aurora comes to see if the whole pipeline works.
It worked:
* Job appearing in Treeherder: https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=96503957841c8c7617a416719c89a06778de396a&filter-tier=3&selectedJob=4340617
* TC task: https://tools.taskcluster.net/task-inspector/#Lpu-6VuHT1ekFOP8FnlqTg/0

Due to yesterday's aurora bustage, I discovered a minor bug that prevents some results from being seen in Treeherder: https://github.com/mozilla-releng/fennec-aurora-task-creator/issues/9. As it's not related to the production deployment per se, I'll fix it there.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Good job on finishing this, jlorenzo++ ;)