
Make fennec aurora deploy updates automatically to Google Play Store

RESOLVED FIXED

Status

Release Engineering
Release Automation
a year ago
11 months ago

People

(Reporter: jlorenzo, Assigned: jlorenzo)

As aurora isn't based on release tasks like beta and release, the solution is different from the one which will solve bug 1306311.
Created attachment 8799451 [details]
Nightlies Watcher first commit

Following up on Friday's chat, I started a small watcher that we can host on Heroku. Its role is to watch TC index for new auroras (nightly builds) and create publishing tasks (handled by pushapkworker[1]).


Alternatives to watcher:
* Add another step after [2] in buildbase. I discovered this doesn't work, because we have to provide the URLs of all the APKs before giving them to pushapkworker.
* Create a downstream job in Buildbot. One big drawback is that we'd have to port it once Fennec builds are all done in TC.
* Create a TC hook which runs a few hours after the nightly one[3]. This doesn't sound rock solid if one build takes longer than it should.


Speaking of drawbacks, here are the ones of the watcher:
* We don't wait until tests are green. The current deployment process made by Sylvestre already has this flaw.
* That's one more machine to monitor


Mitigation of risks:
* If a bad build has already been deployed, Google Play allows us to unpublish it.
* If we have to stop publishing updates for a moment, we can either stop the worker, or temporarily revoke the aurora account on Google Play Store. This is done by restricting the projects the account has access to.
* Both of these processes have to be done manually (hence, documented), but Sylvestre told me trees have to be stopped only once every quarter.


Enhancements:
* Check if Aurora Desktop is already published before doing the same on Android. 
* Automate the account revocation alongside the other kill switch


Requirements:
* Client with these scopes[4]


:Callek, if going with that watcher sounds good to you, I'll iron it out with tests and deploy it. Otherwise, I'll be glad to discuss another solution :)


[1] https://github.com/mozilla-releng/pushapkworker
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#1420
[3] https://tools.taskcluster.net/hooks/#project-releng/nightly-fennec-dev
[4] https://tools.taskcluster.net/auth/clients/#mozilla-ldap%252fjlorenzo@mozilla.com%252faurora-watcher-test
Attachment #8799451 - Flags: feedback?(bugspam.Callek)
> * We don't wait until tests are green. The current deployment process made by Sylvestre already has this flaw.
Tbh, I wasn't aware of this :p
Comment on attachment 8799451 [details]
Nightlies Watcher first commit

Your explanation and overall plan sound good to me, at first glance. I'm far from aware of the pros and cons of Heroku, or of how stable we need to make this. So I'm going to redirect the feedback request to :rail, who I suspect has better insight.
Attachment #8799451 - Flags: feedback?(bugspam.Callek) → feedback?(rail)
After more investigation, there are some issues not called out above:
* The current watcher stores, in a file, the taskIds of the last submitted APKs. Heroku won't keep that file, due to its filesystem[1]. A way to fix it would be to store that in a database (which seems overkill, tbh)
* The publishing task is not linked to any task group. It doesn't report back to Treeherder either. Plugging it into Treeherder would be easy enough. However, I don't think this is the right place to expose that information.


On the bright side:
* We can monitor what version has been uploaded by configuring Google Play. We can ask them to email a given address every time new APKs are successfully uploaded.


[1] https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem
Discussed offline with Rail:
* Heroku is okay. Rail proposed to look into Kinto for data storage. [1] seems promising.
* Publishing on Treeherder is not a bad idea. Rail mentioned the result can be easily reported by adding a route like: tc-treeherder.v2.mozilla-aurora.{revision}.{push_id}. push_id can be found at [2]

[1] https://github.com/Kinto/kinto-heroku
[2] https://hg.mozilla.org/releases/mozilla-aurora/json-pushes
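
To make the route suggestion concrete, here is a minimal sketch of how the Treeherder route could be assembled from a revision and its push_id. The function names are hypothetical, and the shape of the json-pushes response (a JSON object keyed by push ID, each entry holding a "changesets" list) is an assumption that should be checked against [2]. The sample data is made up.

```python
import json

ROUTE_TEMPLATE = "tc-treeherder.v2.mozilla-aurora.{revision}.{push_id}"


def find_push_id(json_pushes_body, revision):
    """Return the ID of the push that contains `revision`.

    Assumes the body is a JSON object mapping push IDs to objects with a
    "changesets" list. This shape is an assumption, not verified here.
    """
    pushes = json.loads(json_pushes_body)
    for push_id, push in pushes.items():
        if revision in push.get("changesets", []):
            return int(push_id)
    raise LookupError("no push found for revision {}".format(revision))


def build_treeherder_route(revision, push_id):
    """Fill in the route pattern quoted in the comment above."""
    return ROUTE_TEMPLATE.format(revision=revision, push_id=push_id)


# Made-up sample data, for illustration only.
sample_body = json.dumps({"1234": {"changesets": ["abc123"]}})
route = build_treeherder_route("abc123", find_push_id(sample_body, "abc123"))
print(route)
```

The resulting string would then be added to the routes of the publishing task, so that the task result shows up next to the push it belongs to.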
Comment on attachment 8799451 [details]
Nightlies Watcher first commit

Overall it LGTM, with some minor comments and a joke in the push. :D
Attachment #8799451 - Flags: feedback?(rail) → feedback+
Feedback comments addressed in https://github.com/JohanLorenzo/nightlies-watcher/compare/3b9f954c2eeff497d435d7159a60e5d9e4dc36a7...84fde806d22086320eec6601ad6f7ad0901aa052
Created attachment 8800217 [details] [review]
PR: Report publication results to Treeherder

After looking more into Treeherder, we can actually use it as a database that stores the latest published task. Kinto is no longer necessary.
Assignee: nobody → jlorenzo
Comment on attachment 8800217 [details] [review]
PR: Report publication results to Treeherder

Pulse is not yet used, but the PR to publish on Treeherder is already big enough.

As discussed offline yesterday, we won't trigger a new task if one is already listed in Treeherder.

As an example, here's a (failing) task[1] reported on Treeherder[2]

[1] https://tools.taskcluster.net/task-inspector/#P--WFX6HRIm-zU25WkqLIw/ Failing, because the build has already been uploaded to Aurora.
[2] https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=2cebb2efe185a3a7593a738af84b7d70e327de77&filter-tier=3&exclusion_profile=false&selectedJob=3833576 You'll see more than one job. These were my different tries in putting the task definition in config.json
Attachment #8800217 - Flags: review?(rail)
Tests will also come next, once the Pulse architecture is in place.
Attachment #8800217 - Flags: review?(rail) → review+
Created attachment 8802045 [details] [review]
PR: Use pulse events instead of TC polling
Comment on attachment 8802045 [details] [review]
PR: Use pulse events instead of TC polling

Here's a version that relies on Pulse to trigger new builds. I apologize in advance for the big review. Most of the changes are due to the inclusion of tests. We're now at 100% coverage.

In the Pulse workflow, we verify whether all the builds are present each time we receive a message (like you suggested). We still check whether a push has already been performed by looking at Treeherder. This covers the case where the second build finishes at the same time as the first Pulse message is processed: when the second message comes, we won't trigger a new build.
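
The decision described above can be sketched as a small pure function. This is only an illustration, not the code from the PR: the function name, its parameters, and the build names are all hypothetical.

```python
def should_publish(completed_builds, expected_builds, already_published):
    """Decide whether a publishing task should be created for a revision.

    completed_builds:  set of build names seen so far (hypothetical names)
    expected_builds:   set of build names required before publishing
    already_published: True if Treeherder already lists a publishing job
    """
    if already_published:
        # A task was already triggered, e.g. when the first message arrived.
        return False
    # Publish only once every expected build is present.
    return expected_builds <= completed_builds


# Hypothetical build names, for illustration only.
expected = {"android-api-15", "android-x86"}

after_first_message = should_publish({"android-api-15"}, expected, False)
after_both_builds = should_publish(expected, expected, False)
after_duplicate_message = should_publish(expected, expected, True)
```

With these inputs, only the second call asks for a publishing task; the first is missing a build and the third is skipped because Treeherder already reports one.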

Btw, the message queue is handled with aioamqp. I use some code from pulse-notify. The main reason is that aioamqp made it easier to write non-object-oriented modules. It also brings async, but the performance gain doesn't really matter for this module. It'll deal with 2 messages per day, at most :)
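
For readers unfamiliar with aioamqp, here is a rough sketch of what a function-based consumer callback can look like. It follows the four-argument coroutine shape (channel, body, envelope, properties) that aioamqp passes to consumers, but it is not the code from the PR: the helper name, the fake channel and envelope, and the message payload are all made up so that the sketch runs without a broker.

```python
import asyncio
import json


def make_consumer_callback(on_message):
    """Wrap `on_message` in a coroutine usable as an aioamqp consumer callback."""

    async def callback(channel, body, envelope, properties):
        # The body arrives as bytes; this sketch assumes a JSON payload.
        message = json.loads(body.decode("utf-8"))
        on_message(message)
        # Acknowledge only after the message has been handled.
        await channel.basic_client_ack(delivery_tag=envelope.delivery_tag)

    return callback


class FakeChannel:
    """Stand-in for an aioamqp channel (illustration only)."""

    def __init__(self):
        self.acked = []

    async def basic_client_ack(self, delivery_tag):
        self.acked.append(delivery_tag)


class FakeEnvelope:
    """Stand-in for the envelope object; only delivery_tag is used here."""

    delivery_tag = 1


received = []
channel = FakeChannel()
callback = make_consumer_callback(received.append)
# Made-up payload; real Pulse messages carry more fields.
asyncio.run(callback(channel, b'{"taskId": "example-task-id"}', FakeEnvelope(), None))
```

With a real connection, a callback like this would be registered through the channel's basic_consume call instead of being invoked by hand.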

There's still some work related to fetching configuration from ENV, so we'll be able to configure the Heroku instances. I propose to do that in a follow-up bug.
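
As a rough idea of what fetching configuration from ENV could look like, here is a minimal sketch. The variable names are hypothetical and not taken from the project; the real list would depend on which credentials the watcher needs.

```python
import os

# Hypothetical variable names, for illustration only.
REQUIRED_VARIABLES = (
    "PULSE_USER",
    "PULSE_PASSWORD",
    "TASKCLUSTER_CLIENT_ID",
    "TASKCLUSTER_ACCESS_TOKEN",
)


def load_config(environ=None):
    """Build a config dict from environment variables.

    Raises KeyError listing every missing variable, so a misconfigured
    instance fails at startup rather than on the first message.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if name not in environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name.lower(): environ[name] for name in REQUIRED_VARIABLES}


# Passing a plain dict keeps the example independent of the real environment.
example_config = load_config({
    "PULSE_USER": "user",
    "PULSE_PASSWORD": "secret",
    "TASKCLUSTER_CLIENT_ID": "client",
    "TASKCLUSTER_ACCESS_TOKEN": "token",
})
```

Accepting the mapping as a parameter also makes the loader easy to cover in tests.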
Attachment #8802045 - Flags: review?(rail)
Blocks: 1311624
Attachment #8802045 - Flags: review?(rail) → review+
Thank you for the reviews, Rail! We now have a version that works locally. Bug 1311624 will cover deploying it to Heroku.
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → FIXED