Closed Bug 1409091 Opened 3 years ago Closed 2 years ago

Implement Focus/Klar release pipeline on Taskcluster

Categories

(Release Engineering :: Release Automation: Other, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: jlorenzo)

References

Details

(Whiteboard: [releaseduty])

Attachments

(7 files)

The Focus team would like to automate APK deployment on Google Play. Automation requires the whole process to be trusted. Then, I think we ultimately want to implement this automated process: https://docs.google.com/drawings/d/1SbuKD_4KWgxGCJTbLYxNqUiyYC6h2XBtIZ8iZDBYOyw/edit?usp=sharing


That's basically a subset of what Fennec implements. Every dark box is a task in the process. Every medium-lighted rounded box is a worker on which a task runs. Each task depends on the previous one.

The whole process is guarded against unknown origins by ChainOfTrust[1]. To sum CoT up, each task GPG-signs a specific artifact[2] to prove it ran on a trusted machine. Then, each downstream task verifies the identity of the upstream ones.
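
As a rough illustration of the idea (not how scriptworker implements it), the sketch below has an upstream task record the SHA-256 of its artifacts in a small JSON document and sign it, and a downstream task verify both the signature and the hashes. HMAC with a shared key stands in for GPG purely to keep the example standard-library only; the function names, the key, and the document layout are assumptions.

```python
import hashlib
import hmac
import json

# Stand-in for the worker's private GPG key. Real CoT uses GPG signatures;
# HMAC is used here only so the sketch needs nothing but the standard library.
WORKER_KEY = b"hypothetical-worker-key"


def make_signed_cot(task_id, artifacts, key=WORKER_KEY):
    """Upstream side: describe the artifacts and sign the description."""
    body = json.dumps(
        {
            "taskId": task_id,
            "artifacts": {
                name: {"sha256": hashlib.sha256(data).hexdigest()}
                for name, data in artifacts.items()
            },
        },
        sort_keys=True,
    ).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_upstream(body, signature, downloaded, key=WORKER_KEY):
    """Downstream side: check the signature, then every artifact hash."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False
    recorded = json.loads(body)["artifacts"]
    return all(
        name in downloaded
        and hashlib.sha256(downloaded[name]).hexdigest() == info["sha256"]
        for name, info in recorded.items()
    )
```

A downstream task would refuse to continue as soon as `verify_upstream` returns False, whether because the signature does not match or because a downloaded artifact differs from what the upstream task recorded.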

1. Trusted repo: Aki told me CoT was designed to support several projects (including non-in-tree repos). However, Focus is the first project asking for it, so some modifications will be needed. At first, trusting the repo is a matter of adding a new entry here[3].

2. Decision: That's the task in charge of creating the others. It basically submits task definitions to TC's queue. A script on a docker-worker is enough. In the context of ChainOfTrust, the decision task must also sign the specific artifact. Per the CoT docs[4], the private GPG key of docker-workers lives on a trusted VM image (AMI), hosted on AWS EC2. You need to reach out to the TC team to gather the details.

3. Build: Crafts the APK with the release flags. In order to avoid confusion, the build should remain unsigned. Zipaligning is not necessary. This task needs to run on a trusted AMI. The token you need to pull can probably be stored in TC secrets[5].

4. Sign: We can use the existing signing_scriptworker and signing servers. Releng has to add support for Focus/Klar.

5. Generate screenshots: Like build and decision, you need a trusted AMI. You can also chunk jobs per locale or set of locales. Chunking has to be defined in the decision task.

6. Upload: We can use the existing beetmover_scriptworker. We just have to add support for Focus/Klar. This can easily be done by adding a similar template to the fennec one[6]. Same thing for screenshots. It can be done by the Focus team and reviewed by Releng.

7. Signoff: This is a dummy task that does nothing except wait for somebody to complete it. After the build is tested and the release stakeholders agree to ship, someone from the Focus team will use the taskcluster-cli[7] tool to manually resolve this task.

8. Publish: As with the other scriptworkers, we can use the existing instance. The current implementation is fennec-specific[8], so we have to add support there too. pushapk_scriptworker mainly delegates checks and uploads to mozapkpublisher. That tool needs Focus support. Working on mozapkpublisher[9] doesn't require any Taskcluster knowledge.
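
The ordering described in the steps above can be sketched as a decision-task helper that builds one definition per step, each depending on the previous one. This is only an outline: the worker type names and the task ID generator are placeholders, screenshots are left out for brevity, and the definitions carry far fewer fields than the Taskcluster queue requires.

```python
import uuid

# Placeholder worker types; the real names would come from the TC/Releng setup.
STEPS = [
    ("build", "hypothetical-focus-build"),
    ("sign", "hypothetical-focus-signing"),
    ("upload", "hypothetical-focus-beetmover"),
    ("signoff", "hypothetical-focus-signoff"),
    ("publish", "hypothetical-focus-pushapk"),
]


def build_linear_graph(decision_task_id, steps=STEPS):
    """Return {task_id: definition}, each task depending on the one before it."""
    graph = {}
    previous_id = decision_task_id
    for name, worker_type in steps:
        task_id = uuid.uuid4().hex  # stand-in for a Taskcluster slugid
        graph[task_id] = {
            "workerType": worker_type,
            "dependencies": [previous_id],
            "metadata": {"name": name},
        }
        previous_id = task_id
    return graph
```

The decision task would then submit each definition to the queue. Because every task lists its predecessor in `dependencies`, nothing downstream of the signoff task can start until someone resolves it.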



Draft of a TODO list. This is just a rough idea of which tasks can be done by which team; it's not an actual plan:

    Focus team?
        Decision: script the decision task to schedule dummy tasks.
        Decision: modify the decision task to chunk screenshots.
        Decision: modify the decision task to support ChainOfTrust
        Decision: modify the decision task to implement the real graph.
        Decision/Build: reach out to the TC team to get dedicated AMI. (They might allow you to use the same AMIs)
        Build: Verify the external token can be rotated.
        Build: Reach out to the TC team to validate TC secrets are the right way to store such secrets.
        Build: Configure the task to get an actual release build.
        Screenshot: You may want to look into TC hooks to craft screenshots every night, instead of every release.
        Upload: Add S3 template for Focus/Klar.
        Upload: You may want to test out by uploading unsigned APKs on a staging S3 bucket. Reach out to cloudops to get a staging bucket.
        Publish: Add Focus/Klar support on mozapkpublisher
        Publish: Add Focus/Klar support on pushapk_scriptworker

    Releng?
        Decision: Review and deploy scriptworker to trust Focus' repo.
        Signing: Add support for Focus/Klar
        Signing: Deploy updated signing toolchain
        Upload: Review and deploy beetmover
        Publish: Review and deploy mozapkpublisher & pushapk_scriptworker


Due to the amount of work, this bug won't be finished by 2017Q4.


[1] http://scriptworker.readthedocs.io/en/latest/chain_of_trust.html
[2] https://public-artifacts.taskcluster.net/SXkpGlgbRGunmNAb4s7BhQ/0/public/chainOfTrust.json.asc
[3] https://github.com/mozilla-releng/scriptworker/blob/06cccc1502fb11c9254a4d0e586afac16e7c8c8f/scriptworker/constants.py#L189-L204
[4] http://scriptworker.readthedocs.io/en/latest/chain_of_trust.html#embedded-gpg-keys
[5] https://tools.taskcluster.net/secrets/
[6] https://github.com/mozilla-releng/beetmoverscript/blob/master/beetmoverscript/templates/fennec_nightly.yml
[7] https://github.com/taskcluster/taskcluster-cli
[8] https://github.com/mozilla-releng/pushapkscript
[9] https://github.com/mozilla-releng/mozapkpublisher
Duplicate of this bug: 1394486
(In reply to Johan Lorenzo [:jlorenzo] from comment #0)
> 5. Generate screenshots: Like build and decision, you need a trusted AMI.
> You can also chunk jobs per locale or set of locales. Chunking has to be
> defined in the decision task.

Nit: Screenshots aren't a part of the release graph. Hence, please ignore step 5.
Whiteboard: [releaseduty]
Setting P2 until we prioritize this better.
Priority: -- → P2
(In reply to Johan Lorenzo [:jlorenzo] from comment #0)
> The Focus team would like to automate APK deployment on Google Play.
> Automation requires the whole process to be trusted. Then, I think we
> ultimately want to implement this automated process:
> https://docs.google.com/drawings/d/1SbuKD_4KWgxGCJTbLYxNqUiyYC6h2XBtIZ8iZDBYOyw/edit?usp=sharing

I just signed+zipaligned a focus apk and noticed that the focus repo has a
.taskcluster.yml and release builds in taskcluster, and thought we should
CoT-enable it. GMTA, yay!

I had some thoughts, especially since many things have changed in the past 6
months. LMK if you want to chat about this. I'm happy to see this bug exists :)

- I think the largest outstanding task is taking gecko's `taskcluster/taskgraph`
  code and making a generic python module out of it. We should chat with the
  taskcluster team about this.

- We may or may not need beetmover, depending on whether we want to save the
  binaries+logs on S3. You mentioned it in the comment, but it's not in the
  drawing.

- We've moved away from the push-apk-breakpoint; we can allow for a 2nd phase
  to ship the release build if desired. I'm not sure if we want a separate
  Google Play strings task like we have for Fennec, but that also seems
  straightforward to replicate here.

- We probably want to have a separate set of cot keys and workerTypes than
  gecko.

- If we want extra verification, we could require the release tag be gpg
  signed by an allowlisted gpg key or something like that. We could also just
  rely on a hook + 2FA; that's probably easier to support.

- We may want to add autograph support in signingscript or elsewhere, since
  we're moving to autograph-signing for apks.

- We probably want to consider what we're doing about toolchains and
  docker-images. It's possible we curate these manually, or we could use the
  approach we're using in the gecko taskgraph.

- We want to make sure that the decision+action tasks are reproducible from
  .taskcluster.yml, and that they publish the artifacts important to cot
  (task-graph.json, label-to-taskid.json, actions.json, chainOfTrust.json.asc)
  Ideally this would be part of the generic taskgraph python module.
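
A check along those lines could be as small as the sketch below. The `public/` prefix and the function name are assumptions; only the four file names come from the list above.

```python
# File names taken from the list above; the "public/" prefix is an assumption.
REQUIRED_COT_ARTIFACTS = frozenset(
    "public/" + name
    for name in (
        "task-graph.json",
        "label-to-taskid.json",
        "actions.json",
        "chainOfTrust.json.asc",
    )
)


def missing_cot_artifacts(published):
    """Return the required artifact names absent from `published`, sorted."""
    return sorted(REQUIRED_COT_ARTIFACTS - set(published))
```

An empty list means the decision (or action) task published everything a downstream verifier would look for.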

>     Focus team?
>         Decision: script the decision task to schedule dummy tasks.
>         Decision: modify the decision task to chunk screenshots.
>         Decision: modify the decision task to support ChainOfTrust
>         Decision: modify the decision task to implement the real graph.

We (releng? taskcluster team?) may want to genericize the taskgraph code; then
we could move the existing Focus builds from .taskcluster.yml to the graph.
Ideally we'll get some of the graph-specific things relatively easily, since we
can borrow liberally from the gecko transforms and configs.

This will be the model for supporting other smaller projects on Github - the
next one (Rocket?) should be a lot faster to implement!
Depends on: 1455290
(In reply to Aki Sasaki [:aki] from comment #4)
> I just signed+zipaligned a focus apk and noticed that the focus repo has a
> .taskcluster.yml and release builds in taskcluster, and thought we should
> CoT-enable it. GMTA, yay!
> 


> - I think the largest outstanding task is taking gecko's `taskcluster/taskgraph`
>   code and making a generic python module out of it. We should chat with the
>   taskcluster team about this.

> - We want to make sure that the decision+action tasks are reproducible from
>   .taskcluster.yml, and that they publish the artifacts important to cot
>   (task-graph.json, label-to-taskid.json, actions.json, chainOfTrust.json.asc)

At the moment, the Focus release task graph is simple enough to not need `taskcluster/taskgraph`. They have their own decision task creation. Like you said on IRC, this decision task must create the right CoT artifacts.


> - We may or may not need beetmover, depending on whether we want to save the
>   binaries+logs on S3. You mentioned it in the comment, but it's not in the
>   drawing.
Yeah, beetmover would be needed. That's a request from the Focus team. However, this is out of scope, for now. IIUC, they want Google Play publication first.


> - We've moved away from the push-apk-breakpoint

Good point. We might be okay to skip it thanks to the use of the GP Alpha track. Sebastian, what's the current release process? Do you want to QA builds before they go on the Alpha track?


> - We probably want to have a separate set of cot keys and workerTypes than
>   gecko.
Yes! Discussed in bug 1455290.


> - If we want extra verification, we could require the release tag be gpg
>   signed by an allowlisted gpg key or something like that. We could also just
>   rely on a hook + 2FA; that's probably easier to support.
Good idea on the tag. I'll put it on the Trello board: https://trello.com/b/BIy6spbX/pushing-focus-to-google-play



> - We may want to add autograph support in signingscript or elsewhere, since
>   we're moving to autograph-signing for apks.
Agreed. This can be done in the future. For now, let's stick with the signing scriptworker/servers


> - We probably want to consider what we're doing about toolchains and
>   docker-images. It's possible we curate these manually, or we could use the
>   approach we're using in the gecko taskgraph.
What's the toolchains docker image?


> We (releng? taskcluster team?) may want to genericize the taskgraph code;
> then we could move the existing Focus builds from .taskcluster.yml to the graph.
> Ideally we'll get some of the graph-specific things relatively easily, since
> we can borrow liberally from the gecko transforms and configs.

Okay, that seems like a lot of work for a single work week. Let's keep that in mind for a follow-up.


> This will be the model for supporting other smaller projects on Github - the
> next one (Rocket?) should be a lot faster to implement!
That would be awesome. At the end of the work week, I'll list what we have to replicate for Rocket or another project. That's something the Focus team is interested in.
Flags: needinfo?(s.kaspari)
Flags: needinfo?(aki)
Toolchain tasks are taskgraph tasks that build things like compilers; docker-image tasks build docker images. Once they're built, future graphs can use those toolchains and docker-images. If we're not using taskcluster/taskgraph, these may be a bit too much to add in; we can allowlist the docker image shas. If we figure out how to support multiple cot products in a single instance, we could also piggyback off gecko docker-image tasks.
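
Allowlisting docker image shas could look roughly like the following. The digest value and the payload layout are made up for illustration; the point is only that a task is rejected unless its image is pinned to a known sha256.

```python
# Hypothetical allowlist; real entries would be the digests of vetted images.
ALLOWED_IMAGE_SHAS = frozenset({"sha256:" + "a" * 64})


def image_is_allowed(payload, allowed=ALLOWED_IMAGE_SHAS):
    """Accept only images pinned as `name@sha256:<digest>` with a known digest."""
    image = payload.get("image", "")
    if not isinstance(image, str) or "@" not in image:
        return False  # unpinned references such as `name:latest` are rejected
    _, digest = image.rsplit("@", 1)
    return digest in allowed
```

Pinning by digest rather than by tag matters here because a tag can be moved to a different image after it has been reviewed.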

I wrote comment 4 without knowing that the focus work week was dedicated to this, and yes, I mentioned a lot of tasks that might not be able to be finished within that week. Let's narrow down the scope tomorrow and move the rest to followups.
Flags: needinfo?(aki)
Depends on: 1456109
Assignee: nobody → jlorenzo
Attachment #8970167 - Flags: feedback?(aki)
I'm adding support of focus in the signing servers. I haven't tested it out yet. Is there a staging server I can use?
Flags: needinfo?(aki)
(In reply to Johan Lorenzo [:jlorenzo] from comment #11)
> I'm adding support of focus in the signing servers. I haven't tested it out
> yet. Is there a staging server I can use?

We aren't using the depsigning servers at all yet. I imagine you'd want to use a dep key rather than the real one, but that would allow you to test the flow, at least.
Flags: needinfo?(aki)
Comment on attachment 8970167 [details]
[build/puppet] Add Mozilla mobile signing and push apk instances

https://reviewboard.mozilla.org/r/238984/#review244664

::: manifests/moco-nodes.pp:931
(Diff revision 3)
>      include toplevel::server::signingscriptworker
>  }
>  
> +# https://github.com/mozilla-mobile workers. The "e" in mobile was stripped out
> +# in order to leave up to 100 workers instead of 10.
> +node /^mobil-signing-linux-\d*\.srv\.releng\..*\.mozilla\.com$/ {

This is getting close to the 22 char limit. We're probably fine here, but we could also do signing-focus-\d+
If we foresee the need for >9 of these, then renaming might make sense. I'm guessing we may revisit how scriptworkers are deployed before we need more than 9, though.
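
For reference, the naming arithmetic can be checked with a few lines of Python. The node pattern is copied from the diff; applying the 22 character limit to the short host name (the part before the first dot) is an assumption about what is being counted.

```python
import re

# Pattern copied from the node definition in the diff above.
NODE_RE = re.compile(r"^mobil-signing-linux-\d*\.srv\.releng\..*\.mozilla\.com$")
NAME_LIMIT = 22  # limit mentioned in the review; what it applies to is assumed


def short_name(fqdn):
    """Return the host name up to the first dot."""
    return fqdn.split(".", 1)[0]


def fits(fqdn):
    """True if the node pattern matches and the short name is within the limit."""
    return bool(NODE_RE.match(fqdn)) and len(short_name(fqdn)) <= NAME_LIMIT
```

`mobil-signing-linux-1` is 21 characters, so a two-digit suffix still fits within 22, whereas `mobile-signing-linux-10` would be 23. `signing-focus-1` is 15 characters and leaves more headroom.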
Attachment #8970167 - Flags: review+
Attachment #8970167 - Flags: feedback?(aki) → feedback+
Comment on attachment 8970188 [details]
[build/tools] Signing servers: Support focus-jar

https://reviewboard.mozilla.org/r/238998/#review244666

::: release/signing/signscript.py:199
(Diff revision 1)
> +        if not keystore:
> +            parser.error("%s required when format is %s" % (keystore_config_name, format_))
> +        if not keyname:
> +            parser.error("%s required when format is %s" % (keyname_config_name, format_))
>          copyfile(inputfile, tmpfile)
> -        jar_signfile(tmpfile, options.jar_keystore,
> +        jar_signfile(tmpfile, keystore, keyname, digestalg, sigalg, options.fake, passphrase)

Does this mean we have to have the same passphrase for both keys?
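
One way to read this hunk is sketched below: the keystore and key name are looked up per format, while a single passphrase is passed through for whichever key is selected. `jar_keystore` appears in the diff; the other option names are placeholders, not the actual signscript.py options.

```python
import argparse

# Option names per format. Only `jar_keystore` is visible in the diff;
# the rest are placeholders.
KEY_OPTIONS = {
    "jar": ("jar_keystore", "jar_keyname"),
    "focus-jar": ("focus_jar_keystore", "focus_jar_keyname"),
}


def resolve_key(parser, options, format_):
    """Return (keystore, keyname) for `format_`, or exit via parser.error()."""
    keystore_config_name, keyname_config_name = KEY_OPTIONS[format_]
    keystore = getattr(options, keystore_config_name, None)
    keyname = getattr(options, keyname_config_name, None)
    if not keystore:
        parser.error("%s required when format is %s" % (keystore_config_name, format_))
    if not keyname:
        parser.error("%s required when format is %s" % (keyname_config_name, format_))
    return keystore, keyname
```

With this shape, nothing selects a passphrase per format, which is what the question above is getting at.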

::: release/signing/signtool.py:46
(Diff revision 1)
>  
>  def main():
>      allowed_formats = ("sha2signcode", "sha2signcodestub", "signcode",
>                         "osslsigncode", "gpg", "mar", "mar_sha384", "dmg",
> -                       "dmgv2", "macapp", "jar", "emevoucher",
> +                       # "jar" alone is to sign Fennec
> +                       "dmgv2", "macapp", "jar", "focus-jar", "emevoucher",

tools/release/signing/signtool.py is used for tb and esr52. signingscript uses https://github.com/mozilla-releng/signtool .

We probably need a patch for that. We probably also need a patch for https://github.com/mozilla-releng/signingscript/blob/master/signingscript/task.py#L29-L40 . It's best practice to update the config in https://hg.mozilla.org/build/tools/file/tip/release/signing/signing.ini.template#l62 and https://hg.mozilla.org/build/tools/file/tip/release/signing/signing.ini.template#l81 - you may have missed this file in the puppet patch as well.
Attachment #8970188 - Flags: review?(aki) → review+
Attachment #8970491 - Flags: review?(aki)
Attachment #8970512 - Flags: review?(aki) → review+
Attachment #8970491 - Flags: review?(aki) → review+
Attachment #8970896 - Flags: review?(mozilla) → review+
Comment on attachment 8970896 [details] [diff] [review]
[build/puppet] Allow mobil-signing-linux-1.srv.releng.use1.m.c to reach signing-linux servers

Review of attachment 8970896 [details] [diff] [review]:
-----------------------------------------------------------------

Landed on default: https://hg.mozilla.org/build/puppet/rev/8ebc7493401b0ef90b1cb6070c19cd327a87650d
and production: https://hg.mozilla.org/build/puppet/rev/9e250776a77bfffe078f8b7c8f7e7a0f78faa8e2. I checked with :tomprince if we could push bug 1443588 to production too. He was fine with it.
Attachment #8970896 - Flags: checked-in+
Flags: needinfo?(s.kaspari)
I rebased the build/{puppet,tools} patches, and re-tested the whole chain at: https://tools.taskcluster.net/groups/Jhfgd0O-RleTMOkoRNW5XA. Please ignore the duplicated tasks, I inadvertently reran the decision task. I think this is ready to land.
Attachment #8976212 - Flags: review?(aki) → review+
Comment on attachment 8970188 [details]
[build/tools] Signing servers: Support focus-jar

Landed on default at: https://hg.mozilla.org/build/tools/rev/366c73410d4e68f4c752b87a2daf31f0804abe31

New SIGNING_SERVER tag at https://hg.mozilla.org/build/tools/rev/99d439f85098145a0da5f87d3e7edbf8997c2d59
Attachment #8970188 - Flags: checked-in+
Attachment #8970188 - Attachment description: Bug 1409091 - Signing servers: Support focus-jar → [build/tools] Signing servers: Support focus-jar
Keywords: leave-open
Pushed by jlorenzo@mozilla.com:
https://hg.mozilla.org/build/puppet/rev/1319f9406ee2
Add Mozilla mobile signing and push apk instances r=aki
Comment on attachment 8970167 [details]
[build/puppet] Add Mozilla mobile signing and push apk instances

Was r+'d by Aki on review board.
Landed on production at https://hg.mozilla.org/build/puppet/rev/69750b069145fd7d8fd2c06036f1decccf3cefaf
Attachment #8970167 - Attachment description: Bug 1409091 - Add Mozilla mobile signing and push apk instances → [build/puppet] Add Mozilla mobile signing and push apk instances
Attachment #8970167 - Flags: review+
Attachment #8970167 - Flags: checked-in+
Attachment #8970896 - Attachment description: Bug 1409091 - Allow mobil-signing-linux-1.srv.releng.use1.m.c to reach signing-linux servers → [build/puppet] Allow mobil-signing-linux-1.srv.releng.use1.m.c to reach signing-linux servers
This is now all deployed to production. There is no server pinned to my environment anymore. I retested signing and pushapk as part of [1]. The pushapk task failed at the second run because I tried to upload the same APK twice. This was an expected error. Tomorrow's nightly won't show it.

We can now close this bug. Please note comments 4, 5, and 6, which show how to improve the situation.

[1] https://tools.taskcluster.net/groups/Bf8nl1UIToKTM3IsiE6UuQ
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
See Also: → 1459181
Blocks: 1462534
See Also: → 1480085
Blocks: 1484950