Closed Bug 1317747 Opened 3 years ago Closed 3 years ago

enable chain of trust verification in beetmoverworker

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aki, Assigned: jlund)

Attachments

(3 files, 1 obsolete file)

Looks like beetmover scriptworker is ready to be tier-1 enabled.
We'll have to make these changes:

a. scopes

For signing, we have 3 levels of permissions, guarded by scopes:

* project:releng:signing:cert:release-signing [1], which we only allow on release-capable branches;
* project:releng:signing:cert:nightly-signing [2], which we only allow on nightly-capable branches;
* project:releng:signing:cert:dep-signing, which can be used anywhere.

Scriptworker uses the above-linked data structures to verify that a privileged scope is only used on an appropriate branch.  Signingscript determines which level of access to grant based on those scopes [3].

We need to follow this model for the other *scripts.  For beetmover, I think this would be bucket credentials.  Release bucket creds, nightly bucket creds, and staging bucket creds.  That would allow chain of trust verification to make sure we're not pushing to a privileged bucket from a non-privileged branch.

If we can't separate creds at first, let's file a followup bug to do so, and verify our upload bucket location matches release, nightly, or staging based on these scopes.

[1] https://github.com/mozilla-releng/scriptworker/blob/121c474f5b21084a4a3742f21c3f30c018e5c766/scriptworker/constants.py#L219
[2] https://github.com/mozilla-releng/scriptworker/blob/121c474f5b21084a4a3742f21c3f30c018e5c766/scriptworker/constants.py#L232
[3] https://github.com/mozilla-releng/signingscript/blob/master/signingscript/task.py#L19
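
A minimal sketch of how beetmoverscript could pick a bucket level from the task scopes, mirroring the signing model described above. The scope strings and the function name below are hypothetical (this bug does not define the beetmover scopes); only the release/nightly/staging split comes from the description.

```python
# Hypothetical beetmover scope names; only the three levels come from the bug.
BUCKET_SCOPES = {
    "project:releng:beetmover:bucket:release": "release",
    "project:releng:beetmover:bucket:nightly": "nightly",
    "project:releng:beetmover:bucket:staging": "staging",
}


def bucket_level_from_scopes(scopes):
    """Return the single bucket level granted by the task's scopes.

    Raises ValueError if the task carries zero or several bucket scopes,
    so an ambiguous task never silently gets privileged credentials.
    """
    levels = {BUCKET_SCOPES[scope] for scope in scopes if scope in BUCKET_SCOPES}
    if len(levels) != 1:
        raise ValueError(
            "expected exactly one bucket scope, found: {}".format(sorted(levels))
        )
    return levels.pop()
```

Failing on zero or multiple matches keeps the decision explicit; the branch check itself would still live in scriptworker's chain of trust verification, not in this helper.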

b. downloads / upstreamArtifacts

For signing, we used to have task.payload.unsignedArtifacts, which was a list of URLs.  Now we have task.payload.upstreamArtifacts [4], which is a list of dictionaries that look like

    "upstreamArtifacts": [
        {
            "paths": [
                "public/build/target.tar.bz2",
                "public/build/target.checksums"
            ],
            "formats": [
                "gpg"
            ],
            "taskId": "GFPKeLbAQN2fytGOXgatIg",
            "taskType": "build"
        },
        { ... }
    ]

The taskId is the taskId of the task we're downloading from.  The paths are the artifact paths we're downloading.  I don't know if you need to use the "formats" key or need to embed any additional information; we can play with this schema.  "taskType" is there for chain of trust verification.  Currently we only support "build", "l10n", "decision", "docker-image", but we can add more.

Scriptworker will pre-download these artifacts into $artifact_dir/public/cot/$task_id/$path , and verify their SHAs before calling the script.  Beetmoverscript no longer needs to download these artifacts; it can and should use the pre-downloaded artifacts on disk.

I don't know how many artifacts we download, and how large this is going to get.  If we don't want to upload all of these upstreamArtifacts at the end of the beetmover task, we can move them to $work_dir or otherwise remove from $artifact_dir before the end of the task, or change where scriptworker downloads them to.

[4] https://queue.taskcluster.net/v1/task/M81unWcDQje2XEwhtmDXrw
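
As a rough sketch, the on-disk locations of the pre-downloaded artifacts can be derived from upstreamArtifacts using the $artifact_dir/public/cot/$task_id/$path layout described above. The function name is hypothetical and this is not taken from scriptworker or beetmoverscript.

```python
import os


def upstream_artifact_paths(artifact_dir, upstream_artifacts):
    """Map each upstream artifact to where scriptworker pre-downloaded it.

    Layout assumed from the description above:
    $artifact_dir/public/cot/$task_id/$path
    """
    local_paths = []
    for entry in upstream_artifacts:
        for path in entry["paths"]:
            local_paths.append(
                os.path.join(artifact_dir, "public", "cot", entry["taskId"], path)
            )
    return local_paths
```

With this, the script only needs the task payload and its artifact directory to find its inputs, and never has to fetch them again.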

c. scriptworker.cot.verify will need to support beetmover type workers

We'll also have to support any other new task types that we depend on.

d. upstream tasks will need to point at the right deps and have chain of trust generation enabled.

To enable chain of trust generation in a non-scriptworker task, set task.payload.features.chainOfTrust to true.
When there are additional tasks we need to set as chain of trust dependencies in non-scriptworker tasks, we add them to task.extra.chainOfTrust.inputs, which looks like

    "inputs": {
        "docker-image": "taskId",
        ...
    }


For upstream scriptworker tasks, we have sign_chain_of_trust [5] and upstreamArtifacts.  We can also follow the same task.extra.chainOfTrust.inputs model if that's easiest.

This may prevent us from fully enabling chain of trust verification on beetmover if we depend directly on other non-signing scriptworker tasks that don't yet have chain of trust enabled, but we have some prefs [6] we can use until it's all enabled end-to-end.

[5] https://github.com/mozilla-releng/scriptworker/blob/121c474f5b21084a4a3742f21c3f30c018e5c766/scriptworker/constants.py#L58
[6] https://github.com/mozilla-releng/scriptworker/blob/121c474f5b21084a4a3742f21c3f30c018e5c766/scriptworker/constants.py#L57-L60
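
For illustration, the two task-definition changes described in this section could be applied to a task dictionary as below. This is a sketch only: the helper name is hypothetical, and the exact key casing is an assumption that should match what the worker expects.

```python
import copy

# Assumed key name; adjust the casing to whatever the worker actually reads.
COT_KEY = "chainOfTrust"


def enable_chain_of_trust(task, inputs=None):
    """Return a copy of the task with chain of trust generation enabled.

    `inputs` maps a dependency name (e.g. "docker-image") to its taskId.
    """
    task = copy.deepcopy(task)
    task.setdefault("payload", {}).setdefault("features", {})[COT_KEY] = True
    if inputs:
        extra = task.setdefault("extra", {}).setdefault(COT_KEY, {})
        extra.setdefault("inputs", {}).update(inputs)
    return task
```

Working on a deep copy leaves the original task definition untouched, which makes it easier to compare the before and after payloads.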

e. puppet

With bug 1316702, we now have a shared scriptworker puppet module.  Let's use that.

* There are updated dependencies, all pushed to the python3.5 pypi location.
* We now use a scriptworker.yaml which is much larger than our previous config.json.  This is populated in the scriptworker module, using variables you pass [7].  I still have the supervisord settings in the signing scriptworker area, because the watch file list can be different per instance type.
* gpg keys - we'll need to create new gpg keys per scriptworker instance, and make sure they're signed by an appropriate key.  The trusted keys are in scriptworker/trusted and the worker keys go into scriptworker/valid in the cot-gpg-keys repo [8].

[7] https://hg.mozilla.org/build/puppet/file/tip/modules/signing_scriptworker/manifests/init.pp#l54
[8] https://github.com/mozilla-releng/cot-gpg-keys
Blocks: 1317789
Summary: enable chain of trust verification in beetmover → enable chain of trust verification in beetmoverworker
I imagine we're going to hit similar issues, and you may have more context around balrog scriptworker, so let's work closely on these bugs.  And thank you!
Assignee: nobody → jlund
I have a wip date patch in 1317800 that partially addresses beetmover, and started addressing the jsonschema in https://github.com/escapewindow/beetmoverscript/commits/cot ... I'm hoping those are helpful; if not, we don't have to use them.
https://github.com/escapewindow/scriptworker/commit/914e5c7b3e8604fd3cf7aacfd9649f4e7638f803 should check the restricted scopes against the tree in scriptworker.  Once beetmoverscript determines which bucket/creds to use based on scopes (and uses the latest scriptworker with that patch), that will complete the scopes circuit.
* Since I'm already testing the balrogworker patch, I thought it'd be a good idea to tweak the beetmoverworker side while all the puppet knowledge is still fresh.
* First iteration of the beetmoverworker puppet refactoring using the shared scriptworker module. I haven't tested it yet; it will likely need a few more tweaks before it is production-ready.
* Won't add a reviewer yet; will follow up with more tweaks later.
See Also: → 1328873
Dropping here, for later use, the PR used in puppet to pin the loaner environment and prepare a CoT-enabled beetmoverworker for the production switch. I won't request feedback or review, as it's for testing purposes only and we're going to rework this diff before going to production, to get rid of all the staging-environment-dependent variables.
Attachment #8823182 - Attachment is obsolete: true
* Used a hello-world dummy task https://tools.taskcluster.net/task-inspector/#RHF9KDIGRaCZzuh73RsIpg/0 to make sure we reach the point where the task script runs. I'm ready to push a new version of beetmoverscript (most likely 0.1.0, as jlund pointed out in his last PR) and adapt the task accordingly to see if the new CoT changes work as expected.

* Created gpg keys for the existing beetmoverworker-1; PR: https://github.com/mozilla-releng/cot-gpg-keys/pull/12

* Added the corresponding gpg keys to hiera.
Attachment #8825057 - Flags: review?(aki)
Comment on attachment 8825057 [details] [review]
Add beetmoverworker-1 key in cot-gpg-keys.

Merged.
Attachment #8825057 - Flags: review?(aki) → review+
Prepping build-cloud-tools for ramping up new instances for both {beetmover,balrog}workers.
Attachment #8825097 - Flags: review?(rail)
Attachment #8825097 - Flags: review?(rail) → review+
Brought up another instance to prepare for the cut-over: beetmoverworker-2.srv.releng.usw2.mozilla.com
Depends on: 1330476
https://tools.taskcluster.net/task-inspector/#C44A3KawSoyXFt_xP2mnlQ/0 -

2017-01-13 00:30:16,502 - beetmoverscript.utils - INFO - {'mapping': {'en-US': {'target.complete.mar': {'s3_key': 'firefox-53.0a1.en-US.linux-i686.complete.mar',
                                               'update_balrog_manifest': True},
                       'target.tar.bz2': {'s3_key': 'firefox-53.0a1.en-US.linux-i686.tar.bz2'}}},
 'metadata': {'description': 'Maps Firefox Nightly artifacts to pretty names '
                             'for en-US',
              'name': 'Beet Mover Manifest',
              'owner': 'release@mozilla.com'},
 's3_prefix_dated': 'pub/firefox/nightly/2017/01/2017-01-12-22-01-43-date/',
 's3_prefix_latest': 'pub/firefox/nightly/latest-date/'}
Traceback (most recent call last):
  File "/builds/beetmoverworker/bin/beetmoverscript", line 9, in <module>
    load_entry_point('beetmoverscript==0.1.2', 'console_scripts', 'beetmoverscript')()
  File "/builds/beetmoverworker/lib/python3.5/site-packages/beetmoverscript/script.py", line 217, in main
    loop.run_until_complete(async_main(context))
  File "/tools/python35/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
    return future.result()
  File "/tools/python35/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/tools/python35/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/builds/beetmoverworker/lib/python3.5/site-packages/beetmoverscript/script.py", line 52, in async_main
    await move_beets(context, context.artifacts_to_beetmove, mapping_manifest)
  File "/builds/beetmoverworker/lib/python3.5/site-packages/beetmoverscript/script.py", line 65, in move_beets
    manifest['mapping'][locale][artifact]['s3_key'])
KeyError: 'target.checksums.asc'
https://tools.taskcluster.net/task-inspector/#XpvkNZbqTSeylnuxVYJAOg/0
scriptworker.exceptions.CoTError: 'path public/build/sv-SE/update/target.complete.mar not in beetmover:signing DsvGCksVRWupNNEDy8izXw chain of trust artifacts!'

It looks like we have an errant update/ (should be public/build/sv-SE/target.complete.mar )
https://hg.mozilla.org/projects/date/rev/3a13e245a272e5e58ff13c3ef4e3f5769b5c29a6
bug 1317747 - remove target.checksums{,.asc} references in beetmover. r=bustage
https://public-artifacts.taskcluster.net/IFJ90KFDRZSzm_UQqjr4uQ/0/public/logs/task_error.log
and https://public-artifacts.taskcluster.net/VzI_ye6ASvSmL3bQNGG9cg/0/public/logs/task_error.log
  File "/builds/beetmoverworker/lib/python3.5/site-packages/beetmoverscript/script.py", line 65, in move_beets
    manifest['mapping'][locale][artifact]['s3_key'])
KeyError: 'target.tar.bz2.asc'
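
Both tracebacks come from indexing manifest['mapping'][locale][artifact] with an artifact name that has no mapping entry. A minimal sketch of a pre-flight check that would report every unmapped artifact up front, instead of failing on the first one; the function name is hypothetical and this is not what beetmoverscript does:

```python
def find_unmapped_artifacts(manifest, locale, artifacts):
    """Return the artifact names that have no entry in the manifest mapping."""
    mapping = manifest["mapping"].get(locale, {})
    return sorted(name for name in artifacts if name not in mapping)
```

For the manifest logged above, passing target.tar.bz2 and target.tar.bz2.asc for en-US would report only the .asc file as unmapped.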


https://public-artifacts.taskcluster.net/J6oqzvchT9yAcOy5OqsX9g/0/public/logs/task_error.log
uploaded 34 artifacts to s3 in parallel; we got
2017-01-13 04:41:51,867 - beetmoverscript.script - INFO - 400
2017-01-13 04:41:51,912 - beetmoverscript.script - INFO - <?xml version="1.0" encoding="UTF-8"?><Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>662ECD237729A92F</RequestId><HostId>kX6DO

I'm thinking we should limit the parallel uploads to... 20? 10?
I'm doing this in scriptworker here: https://github.com/mozilla-releng/scriptworker/blob/master/scriptworker/worker.py#L104
We could do that here: https://github.com/mozilla-releng/beetmoverscript/blob/master/beetmoverscript/script.py#L213
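
One common way to cap parallel uploads with asyncio is a semaphore. The sketch below is an assumption-laden illustration, not the code behind either link: upload_all, bounded_upload and the upload callable are hypothetical names, and the limit of 10 is just one of the values floated above.

```python
import asyncio


async def upload_all(artifacts, upload, max_concurrent=10):
    """Run upload(artifact) for every artifact, at most max_concurrent at once.

    `upload` is any coroutine function taking one artifact (hypothetical here).
    Results are returned in the same order as `artifacts`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_upload(artifact):
        # Wait for a free slot before starting this upload.
        async with semaphore:
            return await upload(artifact)

    return await asyncio.gather(*(bounded_upload(a) for a in artifacts))
```

All 34 uploads are still scheduled at once, but only max_concurrent of them hold a connection at any moment, which should reduce idle sockets timing out while waiting for bandwidth.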


Hoping those are the last 2 errors, but we'll see.
Green BM and BM-S!

I still see a number of retries for the upload at 20. We can try 15 or 10 to see if that improves things; I'd aim for 100% successful as the general case, and use retries for actual errors, rather than always relying on retries and not having many left if there's a real hiccup.
No longer depends on: 1314596
Done on date.  Please comment+reopen if that's not the case.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Blocks: 1326419
Component: General Automation → General