Closed Bug 1231320 Opened 8 years ago Closed 8 years ago

Taskcluster builds don't contain necessary API keys

Categories: (Taskcluster :: General, defect)
Severity: normal
Priority: Not set
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: glandium, Assigned: dustin)
Attachments: (3 files, 1 obsolete file)

This appears when comparing buildbot builds and taskcluster builds on m-i. nsURLFormatter.js differs in the values it receives for MOZILLA_API_KEY and GOOGLE_API_KEY: taskcluster builds get the default "no-<something>-api-key" instead of an actual key. IIRC try builds aren't supposed to get the API keys, but m-i builds are (and, obviously, so are nightlies and releases). There are more API keys, too, but I'm not sure which builds receive which. In fact, that m-i builds don't get them all seems like a (separate) bug to me.

Anyway, the way they are given to builds is through .key files in /builds on the buildbot slaves. The paths for these files are hardcoded in the various mozconfigs, essentially as:

ac_add_options --with-google-api-keyfile=/builds/gapi.data
ac_add_options --with-google-oauth-api-keyfile=/builds/google-oauth-api.key
ac_add_options --with-mozilla-api-keyfile=/builds/mozilla-desktop-geoloc-api.key
Note this only affects opt builds; debug builds don't use those.
It would be great to add some checks for these missing keys that we enable in automation.
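To make the proposed check concrete, here is a minimal sketch in Python (hedged: the real fix would land in configure, not Python, and the helper name `check_keyfiles` is invented for illustration). The paths are the ones hardcoded in the mozconfig options above:

```python
import os

# Keyfile paths hardcoded in the mozconfigs (the options quoted above).
KEYFILES = {
    "--with-google-api-keyfile": "/builds/gapi.data",
    "--with-google-oauth-api-keyfile": "/builds/google-oauth-api.key",
    "--with-mozilla-api-keyfile": "/builds/mozilla-desktop-geoloc-api.key",
}

def check_keyfiles(in_automation, keyfiles=KEYFILES):
    """Return configure-style error strings for any missing keyfiles.

    Only complains in automation, since local developer builds aren't
    expected to have these files.
    """
    if not in_automation:
        return []
    return ["configure: error: %s file %s not found" % (opt, path)
            for opt, path in sorted(keyfiles.items())
            if not os.path.exists(path)]
```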
Assignee: nobody → dustin
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> It would be great to add some checks for these missing keys that we enable
> in automation.

Yeah, we should make configure hard-fail when passed nonexistent files for --with-*-api-keyfile. That would, however, turn TC builds perma-orange until this bug is fixed.
Yeah, I think we can fix this with the secrets API.
16:30:47     INFO -  cat: /builds/mozilla-desktop-geoloc-api.key: No such file or directory
16:30:47     INFO -  cat: /builds/gapi.data: No such file or directory
16:30:47     INFO -  cat: /builds/google-oauth-api.key: No such file or directory
16:30:47     INFO -  cat: /builds/google-oauth-api.key: No such file or directory

I think if we just make those fatal (along with the bing key) that will at least prevent this occurring mysteriously!
OK, I have a patch that fails out with

19:19:00     INFO -  configure: error: --with-mozilla-api-keyfile file /builds/mozilla-desktop-geoloc-api.key not found
19:19:00     INFO -  *** Fix above errors and then restart with\
19:19:00     INFO -                 "/usr/bin/gmake -f client.mk build"

now to fix it :)
Depends on: 1230165
"there's a whole rabbit city down here!"
Depends on: 1234929
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

https://reviewboard.mozilla.org/r/28995/#review25875
Attachment #8701603 - Flags: review?(mh+mozilla) → review+
Still waiting on being able to get secrets from a task (the rabbit city in comment 7).  But thankfully that's not blocking making these builds tier-2.
I'd like to find a fairly generic way to accomplish the download of these secrets.  I don't see a smooth way to make this easy for developers trying to run the tasks at home, since http://taskcluster/secrets won't be available.  Ideally that could just be skipped for non-TC builds in the docker image, but then the patch above will cause those builds to fail.  I suspect that for the moment I shouldn't worry about it, as we need to come up with a more general solution to supporting docker runs at home.
I'm going to operate on the assumption that devs generally won't have --with-mozilla-api-keyfile in their mozconfigs, so attachment 8701603 [details] won't be a major PITA.  For the less common case of builds at home in the docker image, we'll be using nightly mozconfigs by default so the files will need to exist.

In that case, I think I can write a bit of scripty magic that tries to resolve `taskcluster` to determine whether it's running in a task container (Greg: or would it be better to look for TASK_ID?).  If so, it will get the secrets and write out the files.  If not, it will put up a nice error message instructing the user to put those secrets into env vars, which it will write out.

Because the secrets files are in /builds, they aren't in a docker-worker cache, and anyway often people running the docker images at home don't set up the same caches, so I think the env var input is the easiest.

Objections to this plan?
Flags: needinfo?(garndt)
We currently inject task_id and run_id, so I think checking for the presence of those would be OK.  I don't think our run-locally scripts include those env variables, and I doubt devs would inject them locally, so it's probably safe to assume that if those exist, it's running within taskcluster.
Flags: needinfo?(garndt)
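Greg's suggestion boils down to a check like the following (a sketch under the assumption that the worker injects TASK_ID and RUN_ID as environment variables; the helper names are hypothetical, not the script that landed):

```python
import os

def running_in_taskcluster(environ=None):
    """True if this process appears to be inside a TaskCluster task.

    docker-worker injects TASK_ID and RUN_ID into task containers; the
    run-locally scripts don't, so their presence is a reasonably safe
    signal (per the comment above).
    """
    environ = os.environ if environ is None else environ
    return "TASK_ID" in environ and "RUN_ID" in environ

def secret_source(environ=None):
    # Hypothetical helper: pick where secret values should come from.
    if running_in_taskcluster(environ):
        return "secrets-api"  # fetch via http://taskcluster/secrets
    return "env-vars"         # error out, asking the user to set env vars
```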
Random thought: why not set up a public equivalent of the service giving out secrets, one that would give out valid but useless data?
I think that might cause more confusion than help.
I added `secrets:get:project/releng/gecko/build/api-keys` to the `moz-tree:level:1` role, but forgot to add it to the task definitions, so that try job will fail.  But that's OK -- it will be good to test the failure modes.
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28995/diff/1-2/
Attachment #8701603 - Attachment description: MozReview Request: Bug 1231320: fail configure when api keyfiles do not exist; r?glandium → MozReview Request: Bug 1231320: fail configure when api keyfiles do not exist; r=glandium
The last try job was successful in downloading the API files, and beginning to build, but I've used fake keys.  Hopefully it has the key - how can I verify that?

Remaining to do here:
 * set this up so that it *doesn't* run on try pushes
 * move the secret permission to moz-tree:level:2
 * install actual secrets
Comment on attachment 8718904 [details]
MozReview Request: Bug 1231320: ensure exit status is correct when cleaning up; r?armenzg

https://reviewboard.mozilla.org/r/34793/#review31417

::: testing/taskcluster/scripts/builder/build-linux.sh:52
(Diff revision 1)
> +    exit $rv

Good catch! I just dealt with something like this.
Attachment #8718904 - Flags: review?(armenzg) → review+
Attachment #8718903 - Flags: review?(mh+mozilla)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review31577

I'm not the right person to review this, but I also don't know who is.
Attachment #8718903 - Attachment description: MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r?glandium → MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r?pmoore
Attachment #8718903 - Flags: review?(pmoore)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/1-2/
Comment on attachment 8718904 [details]
MozReview Request: Bug 1231320: ensure exit status is correct when cleaning up; r?armenzg

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34793/diff/1-2/
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review31715

This seems sane to me. I like it that the secrets don't get burned into the worker type, that the secrets are protected behind scopes, and that it can even survive running locally versus running in automation. So I say it is good to go.

Some thoughts:

1) Should we add support in docker engine of taskcluster-worker / docker-worker for declaring secret files as part of the payload body - i.e. `"secrets": {"secret":<secret_name>,"path":<file_system_path>,"encoding":"base64"/"plain", "compression":"none"/"zip"/"tar.gz"}`
2) Is checking for `TASK_ID` env var the preferred way of detecting if we are running inside TaskCluster? Should we introduce some absolutely-safe way (e.g. might involve worker update) to guarantee this (although `TASK_ID` seems like a reasonably safe bet)?
3) Should we add docs somewhere that explain what the environment variables are that are needed, and what devs should set them to, if they don't have access to the originals?
4) Should we also support downloading from TOOLTOOL, e.g. for devs running locally, outside of taskcluster?

We can spin off separate bugs about these things if you think any of them have merit - just wanted to capture them here.
Attachment #8718903 - Flags: review?(pmoore) → review+
Spoke to :jonasfj and :garndt quickly about this - we like the idea of adding a plugin at some point for the new worker to support declarative downloading of secrets to files - we'll create a bug for that, and implement it at some point in the future when taskcluster-worker is up on its feet.
(In reply to Pete Moore [:pmoore][:pete] from comment #34)
> 1) Should we add support in docker engine of taskcluster-worker /
> docker-worker for declaring secret files as part of the payload body - i.e.
> `"secrets":
> {"secret":<secret_name>,"path":<file_system_path>,"encoding":"base64"/
> "plain", "compression":"none"/"zip"/"tar.gz"}`

Yep, I think this is already planned.

> 2) Is checking for `TASK_ID` env var the preferred way of detecting if we
> are running inside TaskCluster? Should we introduce some absolutely-safe way
> (e.g. might involve worker update) to guarantee this (although `TASK_ID`
> seems like a reasonably safe bet)?

Per Greg, this is probably the best technique

> 3) Should we add docs somewhere that explain what the environment variables
> are that are needed, and what devs should set them to, if they don't have
> access to the originals?

I think the error message is sufficient for that.

> 4) Should we also support downloading from TOOLTOOL, e.g. for devs running
> locally, outside of taskcluster?

No, absolutely not - secrets don't go on tooltool.
Reminding myself (since I had 'hg push' already typed out before I remembered, and now I'm going into another dark period of meetings):

Remaining to do here:
 * set this up so that it *doesn't* run on try pushes
 * move the secret permission to moz-tree:level:2
 * install actual secrets
Hm, I'm not sure how to conditionalize this on try/non-try at the task level.  I think we'll need two things: (1) don't download the secrets in try (because we won't have the scopes to do so) and (2) don't pass --with-google-api-keyfile et al. on try.  I think the latter would involve a mozconfig change, which is tricky since try uses the same mozconfigs (nightly) as production.

Open to suggestions!
It looks like what we do in Buildbot is to include the options:

07:16:25     INFO -  Adding configure options from /builds/slave/try-lx-00000000000000000000000/build/src/.mozconfig
07:16:25     INFO -    --enable-update-channel=
07:16:25     INFO -    --enable-update-packaging
07:16:25     INFO -    --with-google-api-keyfile=/builds/gapi.data
07:16:25     INFO -    --with-google-oauth-api-keyfile=/builds/google-oauth-api.key
07:16:25     INFO -    --with-mozilla-api-keyfile=/builds/mozilla-desktop-geoloc-api.key

and, in my investigation, those contain production keys on try machines.  So, that's a neat factoid for you.
Looking at https://github.com/mozilla/build-puppet/blob/master/modules/slave_secrets/manifests/init.pp that seems pretty intentional -- the secrets are installed regardless of the $slave_trustlevel.

So I'm going to replicate that functionality for now.  If we want to try to do better someday, we can.  /ni catlee for a measure of the urgency.
Flags: needinfo?(catlee)
Production secrets installed in the secrets API (woo, first real secret!)
That failed on *Buildbot* with

09:08:21     INFO -  configure: error: --with-google-oauth-api-keyfile file /builds/google-oauth-api.key not found

not because it's not on the machine, but because it's cleverly not mounted into the mock environment.  Yet I see it listed in mock_copyin_files for every linux builder in https://dxr.mozilla.org/build-central/source/buildbot-configs/mozilla/config.py

So I need a little help here -- how is this working on Buildbot, and given the desire to land "Bug 1231320: fail configure when api keyfiles do not exist; r=glandium", what's the best way to fix things in Buildbot?
Flags: needinfo?(bugspam.Callek)
(In reply to Dustin J. Mitchell [:dustin] from comment #43)
> That failed on *Buildbot* with
> 
> 09:08:21     INFO -  configure: error: --with-google-oauth-api-keyfile file
> /builds/google-oauth-api.key not found
> 
> not because it's not on the machine, but because it's cleverly not mounted
> into the mock environment.  Yet I see it listed in mock_copyin_files for
> every linux builder in
> https://dxr.mozilla.org/build-central/source/buildbot-configs/mozilla/config.
> py
> 
> So I need a little help here -- how is this working on Buildbot, and given
> the desire to land "Bug 1231320: fail configure when api keyfiles do not
> exist; r=glandium", what's the best way to fix things in Buildbot?

This job uses mozharness for mock setup/teardown (the stuff you see in buildbot-configs is mainly for the buildbot builders, currently only used on beta/release)

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/builds/releng_base_linux_64_builds.py#36 is what you're looking for here.

Which makes me think it's not properly working in buildbot at the moment.
Flags: needinfo?(bugspam.Callek)
I'm not going to concern myself with Buildbot or Mock - that way lies madness, I think.

I think the key here is that the mozconfigs need to know whether the automation run is for try or not, and only include the `ac_add_options` commands in comment 0 when *not* running in try.  :glandium, is that the right approach?  If so, what is the most appropriate indication of a try run?
Flags: needinfo?(catlee) → needinfo?(mh+mozilla)
Ah, mike helped me out:

 <mshal> dustin: we do that in mozconfig.cache by reading buildprops.json: https://dxr.mozilla.org/mozilla-central/source/build/mozconfig.cache?from=mozconfig.cache
Flags: needinfo?(mh+mozilla)
It looks like opt builds use all three keys, while debug builds only use google-oauth-api.key.  None of these keys are available for level-1 repositories (well, at least mozharness doesn't mount them into the mock environment -- that could easily be changed in a try push, obviously).  I can imagine we'll have further specializations of what secrets are available in what sorts of builds, with corresponding `ac_add_options` modifications, in the future.

There are three places that need to be configured:

 * secrets / scopes
 * the in-tree script to fetch from secrets
 * mozconfigs

I want to separate those out a little: the first two are security-related, and should go together; that is, the Python script should flexibly dump the available secrets into the docker container without erroring out.  The mozconfigs should correctly reflect exactly the cases where the API keys are to be used (they are incorrect right now).
So the new plan is:

 * build-linux.sh will take input from environment variables to get secret values and stick them into files; e.g.,

  GET_SECRET_GOOGLE_OAUTH_API_KEY="/builds/google-oauth-api.key project/releng/gecko/build/level-{{level}}/{{project}}/google-oauth-api-key"
 
with the secrets fetch skipped if GOOGLE_OAUTH_API_KEY is set, and an error if not running in a task.  This places the task inputs squarely where they should be, in the task definition.

 * I'll add MOZ_TREE and MOZ_TREE_LEVEL variables to taskcluster jobs and to Buildbot configs, then use those when MOZ_AUTOMATION is set to decide whether to add the appropriate --with-xxx-key options.
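The GET_SECRET_* convention described above could be parsed along these lines (an illustrative sketch, not the script that landed; `parse_get_secret_vars` is a made-up name, and the concrete secret path in the example has the {{level}} placeholder already substituted):

```python
def parse_get_secret_vars(environ):
    """Parse GET_SECRET_* environment variables of the form
    "FILENAME SECRETNAME [PROPERTY ...]" into
    (filename, secret_name, properties) tuples."""
    specs = []
    for var in sorted(environ):
        if not var.startswith("GET_SECRET_"):
            continue  # unrelated environment variable
        parts = environ[var].split()
        if len(parts) < 2:
            raise ValueError(
                "${} must have the format "
                "'FILENAME SECRETNAME [PROPERTY ..]'".format(var))
        specs.append((parts[0], parts[1], parts[2:]))
    return specs
```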
Slight correction: no change to BB (it's too hard), and I'll just set MOZ_TREE_LEVEL.
After chatting with mshal, I don't think conditionalizing on the tree level in mozconfigs is the right approach.  Instead, I'm going to supply fake secrets ("no-foo-api-key") at level 1 and real secrets at levels 2 and 3.  That will be way easier!
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28995/diff/2-3/
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/2-3/
Attachment #8718903 - Attachment description: MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r?pmoore → MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r=pmoore
Attachment #8718904 - Attachment is obsolete: true
https://reviewboard.mozilla.org/r/28993/#review35685

As-is, this patch only affects TaskCluster, so there's no risk to Buildbot.

Here's what it looks like in TC:
  https://tools.taskcluster.net/task-inspector/#TGaohxNvSUeXdLz02qJBFw/
  
The secrets are set up with fake secrets for level 1, and real secrets for the remaining levels:

```
{
  "google-api": "no-google-api-key",
  "mozilla-api": "no-mozilla-api-key",
  "bing-api": "no-bing-api-clientid no-bing-api-key",
  "google-oauth": "no-google-oauth-api-clientid no-google-oauth-api-key",
  "adjust-sdk": "no-adjust-sdk-app-token"
}
```
Attachment #8718903 - Flags: review?(mshal)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

I thought about this more overnight, and I want to make some changes.  It turns out that we have multiple filenames for some of these secrets, selected by different mozconfigs.  So:

* refactor to use individual secrets with a "secret" property, rather than one JSON blob

* name secrets after the /builds/<..> filename and install all of them (for now; we can limit that to particular tasks later)
Attachment #8718903 - Flags: review?(mshal)
Attachment #8718903 - Flags: review+
OK, the new draft is ready and has already been run through Try.  I also logged in with a one-click loaner and verified that the secrets files contain the string "no-secrets-for-try", since this is a try job.  That's the value in the secrets API for "level-1", so I'm confident that on m-c it will get production secrets.
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/3-4/
Attachment #8718903 - Attachment description: MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r=pmoore → MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r?mshal r?garndt
Attachment #8718903 - Flags: review?(pmoore)
Attachment #8718903 - Flags: review?(mshal)
Attachment #8718903 - Flags: review?(garndt)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review35889

::: testing/taskcluster/scripts/builder/build-linux.sh:82
(Diff revision 4)
> +python2.7 $GECKO_DIR/testing/taskcluster/scripts/builder/get-secrets.py $var

Is there any value in writing errors from this call prefixed with "[taskcluster:error]" or "[task:error]" so they appear in treeherder?

::: testing/taskcluster/scripts/builder/build-linux.sh:83
(Diff revision 4)
> +

It doesn't appear that $var is set to anything, and get-secrets.py doesn't do anything with it.

::: testing/taskcluster/scripts/builder/get-secrets.py:33
(Diff revision 4)
> +    for prop in properties:

I might be reading this wrong, but after the JSON response is parsed, we iterate over all the properties that were passed in...and the last property is what 'rv' is set to?

::: testing/taskcluster/scripts/builder/get-secrets.py:34
(Diff revision 4)
> +        rv = rv[prop]

If the key doesn't exist, this will throw an exception, right? Would we want to check if it exists and give a better error message than an exception being thrown?

::: testing/taskcluster/scripts/builder/get-secrets.py:54
(Diff revision 4)
> +            "${} must have the format 'FILENAME SECRETNAME [PROEPRTY ..]'".format(var))

PROPERTY
Attachment #8718903 - Flags: review?(garndt)
> I might be reading this wrong, but after the JSON response is parsed, we iterate over all the properties that were passed in...and the last property is what 'rv' is set to?
..
> If the key doesn't exist, this will throw an exception, right? Would we want to check if it exists and give a better error message than an exception being thrown?

Yes, it's a really dumbed-down version of JSONPath, since the secrets API only stores JSON blobs.  I could just as well make `content` the default, so that every secret fed to get-secrets.py has to have the form `{"content": ".."}`.
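For clarity, the traversal under discussion is essentially this (a sketch, with the friendlier error message the review asked about; `extract_property` is a hypothetical name):

```python
def extract_property(blob, properties):
    """Walk a parsed JSON blob down a list of property names.

    This is the dumbed-down JSONPath described above: the secrets API
    only stores JSON blobs, so the requested value is reached by
    indexing one property at a time.
    """
    rv = blob
    for prop in properties:
        if prop not in rv:
            # friendlier than a bare KeyError traceback
            raise KeyError("secret has no property %r (wanted %r)"
                           % (prop, properties))
        rv = rv[prop]
    return rv
```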
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/4-5/
Attachment #8718903 - Flags: review?(garndt)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review35907
Attachment #8718903 - Flags: review?(garndt) → review+
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review35947

I didn't perform a detailed review of get-secrets.py, since it may change depending on the outcome of the issue.

::: testing/taskcluster/tasks/builds/firefox_base.yml:22
(Diff revision 5)
> +      GET_SECRET_GOOGLE_API: "/builds/gapi.data project/releng/gecko/build/level-{{level}}/gapi.data"

The GET_SECRET variables and looping through the environment to find them seems like it would be better served as a data structure in a configuration file. We already have an almost identical list in mozharness configs for mock to copy files: https://dxr.mozilla.org/mozilla-central/rev/af7c0cb0798f5425d5d344cbaf0ac0ecb1a72a86/testing/mozharness/configs/builds/releng_base_linux_32_builds.py#40

Is it possible to enhance the mozharness configs to include the filename/secretname pairs as a list and add get-secrets.py as an action in there? Then we don't have to force structured data into the environment.
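A hypothetical shape for the config-driven structure being suggested (the key names here are illustrative of the proposal, not the exact mozharness config keys that landed):

```python
# Hypothetical mozharness-style config fragment: pair each destination
# filename with the secret that should populate it, instead of packing
# that data into GET_SECRET_* environment variables.
config = {
    "secret_files": [
        {"filename": "/builds/gapi.data",
         "secret_name":
             "project/releng/gecko/build/level-%(scm-level)s/gapi.data"},
        {"filename": "/builds/google-oauth-api.key",
         "secret_name":
             "project/releng/gecko/build/level-%(scm-level)s/google-oauth-api.key"},
    ],
}

# A get_secrets action would substitute the tree's level before fetching:
resolved = [s["secret_name"] % {"scm-level": 3}
            for s in config["secret_files"]]
```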
Attachment #8718903 - Flags: review?(mshal)
Those particular mozharness configs run *outside* the mock environment in Buildbot, so in a separate run of mozharness from the in-mock run.  So, I think that would be a difficult proposition.  My plan was just to delete the support for mock in mozharness once we're not running this within Buildbot anymore, so the duplication will go away.

I've deliberately put this at the outermost layer of abstraction for a few reasons:

 1. The secrets constitute an input to the build process, so it makes sense to have them exposed at that level
 2. This is actually a replication of the slave_secrets functionality in PuppetAgain, rather than mozharness
 3. Down the road, the worker may grow the ability to fetch secrets automatically based on the task definition, in which case these would move from environment variables to some other portion of the payload.
 4. With this structure, someone trying to run the docker container locally can simply omit the `-e GET_SECRET_..=..` to avoid failing to fetch the secrets; if the data is in an in-tree file, no such modification is possible.

I could see putting the list of secrets and filenames in a JSON file in-tree, and referring to that in an environment variable, but that feels like an unnecessary extra level of indirection, and then we're inventing filenames for files that contain a lot of replicated information (secrets_with_adjust_beta_key_and_fennec_mozilla_key.json, secrets_with_adjust_key_and_desktop_mozilla_key.json, etc.) both of which probably contain the same google keys.

What do you think?
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review35945

This looks pretty good, as a solution until the worker handles this natively.
Attachment #8718903 - Flags: review?(pmoore) → review+
Mike: your r? flag was removed, but I don't want to ship without your approval per comment 71.
Flags: needinfo?(mshal)
(In reply to Dustin J. Mitchell [:dustin] from comment #71)
> Those particular mozharness configs run *outside* the mock environment in
> Buildbot, so in a separate run of mozharness from the in-mock run.  So, I
> think that would be a difficult proposition.  My plan was just to delete the
> support for mock in mozharness once we're not running this within Buildbot
> anymore, so the duplication will go away.

What are the separate mozharness runs that you are referring to? As I understand it, there is only a single mozharness invocation at the end of build-linux.sh. Inside mozharness, there are actions that run both before and after mock is set up.

I'm glad to see the use of mock inside mozharness will eventually go away, but that doesn't address my concern that the data being placed inside environment variables, coupled with the loop over the environment, would be better served by a data structure defined in the python configuration files.

> I've deliberately put this at the outermost layer of abstraction for a few
> reasons:
> 
>  1. The secrets constitute an input to the build process, so it makes sense
> to have them exposed at that level

There are many things that constitute inputs to the build process, and we define many of them in the mozharness configs. I think the secrets variables could be defined as a data structure in those configs, and then have a "get_secrets" action wherever it makes sense in fx_desktop_build.py (likely one of the first steps).

By making build-linux.sh do more things than just the bare minimum to get Taskcluster to run mozharness, we're effectively blurring the lines of responsibility between mozharness and the task definitions.

>  3. Down the road, the worker may grow the ability to fetch secrets
> automatically based on the task definition, in which case these would move
> from environment variables to some other portion of the payload.

Can you elaborate on what this would entail?

>  4. With this structure, someone trying to run the docker container locally
> can simply omit the `-e GET_SECRET_..=..` to avoid failing to fetch the
> secrets; if the data is in an in-tree file, no such modification is possible.

What exactly would they be omitting? I don't see a '-e GET_SECRET_..' in the patch under review. Would they just be commenting out these lines?

   GET_SECRET_GOOGLE_API: "/builds/gapi.data project/releng/gecko/build/level-{{level}}/gapi.data"

If they need to modify something regardless, they could also simply add a --no-get-secrets flag to the mozharness invocation. Ideally of course a user could run the same thing locally as is running on try, without having to change anything - they would just have different (no-op) secrets. Is such a thing possible?
Flags: needinfo?(mshal)
Per our irc conversation earlier, I'm going to give it a shot.  I'm not convinced this isn't a huge hairball, but probably trying is the best way to find out.  The idea is to create a mozharness get-secrets action that only runs in taskcluster (not on devs' desktops, and not in buildbot, and easily disabled when running in docker containers outside of taskcluster).  That will take from the mozharness config a list of secrets and their sources.
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

r- from mshal
Attachment #8718903 - Flags: review-
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28995/diff/3-4/
Attachment #8718903 - Attachment description: MozReview Request: Bug 1231320: download API keys from secrets API or environment variables; r?mshal r?garndt → MozReview Request: Bug 1231320: pull from secrets API in TaskCluster
Attachment #8718903 - Flags: review- → review?(mshal)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/5-6/
Attachment #8718903 - Flags: review+
If run in a docker container outside of taskcluster, this will currently fail, and it's a little difficult to fix that -- you'd have to make an additional commit removing "--get-secrets" from build-linux.sh, and then update GECKO_HEAD_REV to point to that revision.  Is that worth fixing?

I could make the secrets handling a little more complex, too, with additional parameters in the dictionary:

 'min_level': 2,  # the minimum SCM level at which this secret should be fetched (so, not for try)
 'default': 'no-google-api-key',  # default value to write when below min_level (otherwise nothing is written)

try jobs have no secrets, so it would be `'min_level': 2` all around, which means try jobs would run outside of taskcluster with no modification.

I'll leave this patch for your review for the moment, but I welcome your thoughts on those improvements.
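The proposed min_level/default parameters could behave roughly like this (a sketch under stated assumptions; `resolve_secret` and the spelling `min_scm_level` are illustrative, and the fetch callable stands in for the secrets API):

```python
def resolve_secret(spec, scm_level, fetch):
    """Return the content to write for one secret spec, or None to skip.

    spec: dict with 'secret_name', plus optional 'min_scm_level' (the
          minimum SCM level at which to fetch, so e.g. 2 excludes try)
          and 'default' (placeholder value to write below that level).
    fetch: callable mapping a secret name to its content, e.g. a thin
           wrapper around the secrets service.
    """
    if scm_level < spec.get("min_scm_level", 1):
        # Below the required level: write the default if one is given,
        # otherwise write nothing at all.
        return spec.get("default")
    return fetch(spec["secret_name"])
```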
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review36583

All nits and/or followups. Feel free to land and address any issues later if they're relevant.

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:552
(Diff revision 6)
>                  )}],
> +    [['--scm-level'], {
> +        "action": "store",
> +        "type": "int",
> +        "dest": "scm_level",
> +        "default": 1,

Similar question about default scm-level here.

::: testing/mozharness/mozharness/mozilla/secrets.py:36
(Diff revision 6)
> +        secret should be written.
> +        """
> +        secret_files = self.config.get('secret_files', [])
> +
> +        subst = {
> +            'scm-level': self.config.get('scm-level', 1),

As in the other two places, I'm wondering if this default should be 0.

::: testing/taskcluster/scripts/builder/build-linux.sh:23
(Diff revision 6)
>  : NEED_XVFB                     ${NEED_XVFB:=false}
>  
>  : MH_CUSTOM_BUILD_VARIANT_CFG   ${MH_CUSTOM_BUILD_VARIANT_CFG}
>  : MH_BRANCH                     ${MH_BRANCH:=mozilla-central}
>  : MH_BUILD_POOL                 ${MH_BUILD_POOL:=staging}
> +: MOZ_SCM_LEVEL                 ${MOZ_SCM_LEVEL:=1}

If a user runs this locally in docker, do they get the default level specified here? If so, maybe the default should be 0, since try, m-c, etc will override with the real levels, correct? Then get_secrets() could just skip getting secrets if level == 0.

::: testing/taskcluster/scripts/builder/build-linux.sh:128
(Diff revision 6)
> -  --no-upload-files \
> -  --no-sendchange \
>    --log-level=debug \
> +  --scm-level=$MOZ_SCM_LEVEL \
>    --work-dir=$WORKSPACE/build \
>    --no-action=generate-build-stats \

nit: I think this line is no longer necessary, similar to the other --no-* flags.

::: testing/taskcluster/tasks/builds/firefox_base.yml:18
(Diff revision 6)
>      image:
>        type: 'task-image'
>        path: 'public/image.tar'
>        taskId: '{{#task_id_for_image}}desktop-build{{/task_id_for_image}}'
> +    features:
> +      taskclusterProxy: true

What does taskclusterProxy do? I don't see anything else in the tree that matches it - was it intended to be part of another patch?
Attachment #8718903 - Flags: review?(mshal) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #87)
> If run in a docker container outside of taskcluster, this will currently
> fail, and it's a little difficult to fix that -- you'd have to make an
> additional commit removing "--get-secrets" from build-linux.sh, and then
> update GECKO_HEAD_REV to point to that revision.  Is that worth fixing?

Is this part still relevant if you go with the min_level / default key approach below?

> I could make the secrets handling a little more complex, too, with
> additional parameters in the dictionary:
> 
>  'min_level': 2,  # the minimum SCM level at which this secret should be
> fetched (so, not for try)
>  'default': 'no-google-api-key',  # default value to write when below
> min_level (otherwise nothing is written)
> 
> try jobs have no secrets, so it would be `'min_level': 2` all around, which
> means try jobs would run outside of taskcluster with no modification.

Ahh, I thought try still had some level of secrets, so you can ignore my comments about the default level being 0 in the review. I think the min_level + default parameters are a good solution, but it can be done in a followup.
The TaskCluster Proxy is what allows get_secrets to use 'http://taskcluster' without passing any credentials, and to make TaskCluster API calls with the scopes defined by the task (which is why I also add the `secrets:get:..` scope; note that it's the {{level}} in that scope that prevents level-1 jobs from getting level-2 or level-3 secrets, based on the scopes afforded to level-1 trees by https://tools.taskcluster.net/auth/roles/#moz-tree:level:1)

The min_level/default thing would, I think, allow devs to run in a docker container outside of taskcluster.  I don't totally understand all of the permutations that devs try, though, so I'm not sure.

There is no level 0, and at least in this case there's no reason to invent one.  Anything in automation will at least be level 1.

You're right about --no-generate-build-stats (in fact, in bug 1251713 we want --generate-build-stats).

I'll fix the nits and add an additional commit with the min_level / default support.
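The min_level / default handling could look roughly like this (a sketch, not the actual patch: the spec field names `min_scm_level`, `default`, `filename`, and `secret_name` are assumptions based on this discussion, and `urllib.request` stands in for the `urllib2` that mozharness used under Python 2):

```python
import json
from urllib.request import urlopen  # mozharness itself used urllib2 (Python 2)

def fetch_secrets(secret_files, scm_level):
    """Write each secret in secret_files to disk, honoring min_scm_level."""
    for spec in secret_files:
        if scm_level < spec.get('min_scm_level', 1):
            # Below the required level (e.g. try at level 1): write the
            # default value if one is given, otherwise write nothing.
            if 'default' in spec:
                with open(spec['filename'], 'w') as f:
                    f.write(spec['default'])
            continue
        # Inside a task, http://taskcluster is the taskcluster-proxy, which
        # attaches the task's scopes to the request -- no credentials needed.
        name = spec['secret_name'].format(level=scm_level)
        res = urlopen('http://taskcluster/secrets/v1/secret/' + name)
        if res.getcode() != 200:
            raise RuntimeError('Error fetching from secrets API')
        secret = json.load(res)['secret']
        with open(spec['filename'], 'w') as f:
            f.write(secret['content'])
```

With this shape, a try job (level 1) asked for a `min_scm_level: 2` secret just writes the default (e.g. "no-google-api-key") and never touches the network, which is what lets it run outside of taskcluster unmodified.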
Blocks: 1256730
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/6-7/
Attachment #8718903 - Flags: review?(pmoore)
Attachment #8718903 - Flags: review?(garndt)
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28995/diff/4-5/
Attachment #8701603 - Attachment description: MozReview Request: Bug 1231320: fail configure when api keyfiles do not exist; r=glandium → MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r?mshal
Attachment #8701603 - Flags: review?(mshal)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review36713

::: testing/mozharness/mozharness/mozilla/secrets.py:50
(Diff revision 7)
> +            # within a taskcluster task.  Outside of that environment, do not
> +            # use this action.
> +            url = "http://taskcluster/secrets/v1/secret/" + secret_name
> +            res = urllib2.urlopen(url)
> +            if res.getcode() != 200:
> +                self.fatal("Error fetching from secrets API:" + res.read())

I believe in previous revisions of this patch, [task:error] was added in places.  Do we not want to do that now?
Attachment #8718903 - Flags: review?(garndt) → review+
Mozharness's fatal() causes logging in a format that the log parser recognizes as an error, so that's not necessary.
Attachment #8701603 - Flags: review?(mshal) → review+
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

https://reviewboard.mozilla.org/r/28995/#review36715

::: testing/mozharness/mozharness/mozilla/secrets.py:25
(Diff revision 5)
> +        self.info("fetching secret {} from API".format(secret_name))
> +        # fetch from http://taskcluster, which points to the taskcluster proxy
> +        # within a taskcluster task.  Outside of that environment, do not
> +        # use this action.
> +        url = "http://taskcluster/secrets/v1/secret/" + secret_name
> +        res = urllib2.urlopen(url)

We might want to add retries here.

(In reply to Dustin J. Mitchell [:dustin] from comment #97)
> Mozharness's fatal() causes logging in a format that the log parser
> recognizes as an error, so that's not necessary.

Perfect, I didn't realize that.  Thanks!
(In reply to Michael Shal [:mshal] from comment #98)
> Comment on attachment 8701603 [details]
> MozReview Request: Bug 1231320: support min_scm_level and default secret
> specs; r?mshal
> 
> https://reviewboard.mozilla.org/r/28995/#review36715
> 
> ::: testing/mozharness/mozharness/mozilla/secrets.py:25
> (Diff revision 5)
> > +        self.info("fetching secret {} from API".format(secret_name))
> > +        # fetch from http://taskcluster, which points to the taskcluster proxy
> > +        # within a taskcluster task.  Outside of that environment, do not
> > +        # use this action.
> > +        url = "http://taskcluster/secrets/v1/secret/" + secret_name
> > +        res = urllib2.urlopen(url)
> 
> We might want to add retries here.

TaskCluster Proxy implements retries to the TaskCluster endpoint, so we may already be covered. Hopefully we won't drop connections to the proxy, since it runs on the same host inside a linked docker container. The risk of adding retries here is retries-of-retries, with both layers performing them. That said, maybe we should disable retries in the proxy instead. I'm not sure what is sanest, as this risk of double-layered retries now exists everywhere the proxy is used...
We should definitely keep the retries in taskcluster-proxy, as this kind of simple HTTP request in a shell/python script is common and should be very easy and reliable.
(In reply to Pete Moore [:pmoore][:pete] from comment #100)
> (In reply to Michael Shal [:mshal] from comment #98)
> > We might want to add retries here.
> 
> TaskCluster Proxy implements retries to the TaskCluster endpoint, so maybe
> we are already covered with retries. Hopefully we shouldn't drop connections
> to the proxy as it is running on the same host inside a linked docker
> container. The risk of doing retries here would be having retries of retries
> due to both layers performing them. That said, maybe we should disable
> retries in the proxy. I'm not sure what is sanest - as this risk of double
> layered retries is possible everywhere now where the proxy is used...

Ahh, I wasn't aware of that. Carry on!
Attachment #8718903 - Flags: review?(pmoore) → review+
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

https://reviewboard.mozilla.org/r/34791/#review36949

Otherwise looks good to me! Thanks!

::: testing/taskcluster/scripts/builder/build-linux.sh:118
(Diff revision 7)
>  done
>  
> -# Mozharness would ordinarily do the checkouts itself, but they are disabled
> -# here (--no-checkout-sources, --no-clone-tools) as the checkout is performed above.
> +# Mozharness would ordinarily do a whole mess of buildbot-specific steps, but those
> +# are overridden by this list of steps.  The get-secrets step is unique to TC tasks
> +# and not run in Buildbot
> +steps="--get-secrets --build --check-test"

Should this be conditional on whether running in TaskCluster automation, or locally (e.g. via "Run Locally" commands)? For example, by checking for the presence of `TASK_ID` env variable?

It feels a bit dirty to check for a `TASK_ID` environment variable, but other solutions have major challenges too. For example, taskcluster credentials and proxy settings could be stored in e.g. a ~/.taskcluster/credentials file (0400 file permissions), and instead of hitting the proxy, "Run Locally" would download the secrets directly based on your taskcluster creds, assuming you have the required scopes. In automation, the taskcluster config/credentials file could have auth disabled and the proxy set to http://taskcluster. Lastly, the config file could be volume-mounted, so the config in automation would differ from what a user runs locally. But this.... is a lot of work. So maybe the `TASK_ID` env variable check is a good enough hack...
Of course, in my "alternative" solution above, we'd use the taskcluster client from mozharness rather than making HTTP requests against the proxy. That way, with auth disabled and a proxy enabled, the client could decide whether to inject an Authorization header and hit the endpoint directly, or to rewrite the URL to point at the proxy and skip Authorization header generation....

But maybe checking `TASK_ID` env variable is enough.
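The check suggested above might look like this in build-linux.sh (a hypothetical sketch: `TASK_ID` is the environment variable the worker sets inside a real task, and the step list mirrors the one in this patch):

```shell
#!/bin/sh
# Sketch: only request the get-secrets action when running inside a
# TaskCluster task, detected via the TASK_ID environment variable.
if [ -n "${TASK_ID:-}" ]; then
    steps="--get-secrets --build --check-test"
else
    steps="--build --check-test"
fi
echo "steps: $steps"
```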
We currently have a plethora of "run locally" options (mach on laptop, mozharness on laptop, docker on laptop, interactive) and each brings its own challenges.  At some point we (TC) need to pick one of those as the supported option and really dig into supporting it.  At that point, we can consider adding additional in-tree support, noting that any in-tree behavior that differs between "run locally" and automation will be suspected of causing intermittents.

Until then, I think this should be sufficient: running a try job locally works without any need for secrets, so there's no need to distinguish running within a task.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a37a5bccf978&selectedJob=23927226

So it turns out, not every mozharness job we run takes the same set of mozharness actions -- several, in particular, do not support the `check-test` action.

I'm not sure how best to solve this.  Should the list of actions be a parameter to the task, along with MOZHARNESS_CONFIG and MOZHARNESS_SCRIPT?  Is there a way to just add one action to default_actions?
Flags: needinfo?(jlund)
(In reply to Dustin J. Mitchell [:dustin] from comment #108)
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=a37a5bccf978&selectedJob=23927226
> 
> So it turns out, not every mozharness job we run takes the same set of
> mozharness actions -- several, in particular, do not support the
> `check-test` action.
> 
> I'm not sure how best to solve this.  Should the list of actions be a
> parameter to the task, along with MOZHARNESS_CONFIG and MOZHARNESS_SCRIPT? 


I think consistency and explicitness would be best, i.e. passing each action you want to run via --action[1] in the TC scheduling logic. Once we no longer run the build equivalent in buildbot, however, you can just change the default_actions config item in the mozharness config itself and not pass anything via the command line.

I feel this approach is better than having a mix of --no-action[2] flags popping actions off each build variant config's default_actions list.

> Is there a way to just add one action to default_actions?

Yes, you can pass --add-action[3], but IIRC that will just append the action to the end of the default_actions list. And since actions are run in order, that may or may not be what you want.


[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/config.py#528
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/config.py#534
[3] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/config.py#531
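The explicit-actions approach described above ends up looking something like this in the task wrapper script (a sketch based on the build-linux.sh change in this patch; the specific action list is illustrative):

```shell
#!/bin/sh
# Sketch: translate a space-separated action list from the task definition
# into explicit --<action> flags for mozharness.  Once any explicit action
# flag is passed, default_actions is ignored, so the old --no-* flags
# become unnecessary.
MOZHARNESS_ACTIONS="${MOZHARNESS_ACTIONS:-get-secrets build check-test}"
actions=""
for action in $MOZHARNESS_ACTIONS; do
    actions="$actions --$action"
done
echo "actions:$actions"
```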
Flags: needinfo?(jlund)
Comment on attachment 8718903 [details]
MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34791/diff/7-8/
Attachment #8718903 - Attachment description: MozReview Request: Bug 1231320: pull from secrets API in TaskCluster → MozReview Request: Bug 1231320: pull from secrets API in TaskCluster r=mshal r=pmoore r=garndt
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28995/diff/5-6/
Attachment #8701603 - Attachment description: MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r?mshal → MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore
Attachment #8701603 - Flags: review?(pmoore)
Attachment #8701603 - Flags: review?(garndt)
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

https://reviewboard.mozilla.org/r/28995/#review37289

Nothing changed since the last review I made, but will r+ since mozreview thinks I need to review again.
Attachment #8701603 - Flags: review?(garndt) → review+
Comment on attachment 8731475 [details]
MozReview Request: Bug 1231320: supply actions to mozharness from the task definition; r?jlund

https://reviewboard.mozilla.org/r/40625/#review37343

looks good. couple comments to discuss in line before stamping.

::: testing/taskcluster/scripts/builder/build-linux.sh:136
(Diff revision 1)
>  
>  python2.7 $WORKSPACE/build/src/testing/${MOZHARNESS_SCRIPT} ${config_cmds} \
>    $debug_flag \
>    $custom_build_variant_cfg_flag \
>    --disable-mock \
> -  $steps \
> +  $actions \

IIUC, this won't quite work, because you have things in this cmd line call like:

`--update --no-update`

which will likely break or cause spurious results.

but taking a step back, if every variant has been updated to explicitly dictate actions, then we can remove all the --no-action args here: https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/builder/build-linux.sh?from=build-linux.sh#128

as soon as you use >= 1 --action flag, default_actions is completely ignored and --no-action calls become no-ops.

make sense?

::: testing/taskcluster/tasks/builds/android_api_15_b2gdroid.yml:39
(Diff revision 1)
>        # TODO: make these additional configuration files go away
>        MOZHARNESS_CONFIG: >
>            builds/releng_base_android_64_builds.py
>            disable_signing.py
>            platform_supports_post_upload_to_latest.py
> +      MOZHARNESS_ACTIONS: 'get-secrets build multi-l10n update'

I think you will need to add 'get-secrets' here: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/fx_desktop_build.py#30

also, where is `def get_secrets()` defined? I don't see it in https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py
Attachment #8731475 - Flags: review?(jlund)
https://reviewboard.mozilla.org/r/40625/#review37343

> iiuc, this won't quite work because you have things in this cmd line call like:
> 
> `--update --no-update`
> 
> which will likely break or cause spurious results.
> 
> but taking a step back, if every variant has been updated to explicitly dictate actions, then we can remove all the --no-action args here: https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/builder/build-linux.sh?from=build-linux.sh#128
> 
> as soon as you use >= 1 --action flag, default_actions is completely ignored and --no-action calls become no-ops.
> 
> make sense?

Where would `--no-update` come from, unless MOZHARNESS_ACTIONS had `no-update` (which it doesn't anywhere in this patch)?  Are you suggesting that I ensure that somehow?  Like, list all the acceptable mozharness actions in build-linux.sh, and fail with an error if MOZHARNESS_ACTIONS contains an unknown action?

> I think you will need to add 'get-secrets' here: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/fx_desktop_build.py#30
> 
> also, where is `def get_secrets()` defined? I don't see it in https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py

Hm, it seems to work without that, but yes, it should be in all_actions.

get_secrets is defined in the SecretsMixin.
https://reviewboard.mozilla.org/r/40625/#review37343

> Hm, it seems to work without that, but yes, it should be in all_actions.
> 
> get_secrets is defined in the SecretsMixin.

Oh, sorry, get-secrets is already added there.  It's not in dxr since this patch hasn't been landed and dxr shows the contents of mozilla-central.  So, nothing to do here.
https://reviewboard.mozilla.org/r/40625/#review37343

> Oh, sorry, get-secrets is already added there.  It's not in dxr since this patch hasn't been landed and dxr shows the contents of mozilla-central.  So, nothing to do here.

Where is 'there'? Is this patch based on a project gecko branch? I was just going by the patch and the m-c tip, because this mozreview patch wouldn't let me expand the context lines down to lines like this: http://hg.mozilla.org/mozilla-central/file/3e04659fdf6a/testing/taskcluster/scripts/builder/build-linux.sh#l132
The changes you were looking for are in a different, earlier diff in the same review request.  Given that, r?
Comment on attachment 8731475 [details]
MozReview Request: Bug 1231320: supply actions to mozharness from the task definition; r?jlund

https://reviewboard.mozilla.org/r/40625/#review37429

given https://bugzilla.mozilla.org/show_bug.cgi?id=1231320#c121 r+ :shipit:
Attachment #8731475 - Flags: review+
/me doesn't understand mozreview, I thought I'd r+'d twice already! Looks like for some reason comment 116 didn't trigger the r+ though ... :/

Anyway, r+ ! :)
Comment on attachment 8701603 [details]
MozReview Request: Bug 1231320: support min_scm_level and default secret specs; r=mshal r=garndt r=pmoore

https://reviewboard.mozilla.org/r/28995/#review37543

Thunderbirds are Go!
Attachment #8701603 - Flags: review?(pmoore) → review+
Yeah, this bug has been a case study in mozreview failing, sorry about that and thanks for the r+