taskcluster-base: base.config should be nice to use...

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: jonasfj, Unassigned)

Tracking

Details

Attachments

(2 attachments)

(Reporter)

Description

4 years ago
base.config which we use everywhere to load configuration from files and environment variables should be modified to also validate the loaded config against a JSON schema.

This way, if you don't supply test credentials, test won't run. If you deploy without sufficient config/credentials, it just won't even try to run. It'll be faster to figure out what is missing get the config right.

When doing this we should also:
 - Add YAML support
 - Remove JSON support (or at least consider it)

Each component should be configured in following way:
config/
  default.yml
  <profile>.yml
taskcluster-<name>.yaml (added to gitignore, only used locally)
taskcluster-<name>.yaml.enc
taskcluster-<name>.yaml.example

In addition to these selected config variables should be loadable by environment variables, by specifying them as is done now...

Using YAML we can add a lot more inline documentation comments.
Using JSON schema for validation of loaded config, we can show configuration documentation in our documentation...
(Reporter)

Comment 1

4 years ago
When we get around to refactoring this we should load config in the following order:

 1) config/default.yml
 2) config/<profile>.yml
 3) taskcluster-<name>-default.yml
 4) taskcluster-<name>-<profile>.yml
 5) ~/.config/taskcluster-<name>-default.yml
 6) ~/.config/taskcluster-<name>-<profile>.yml
 7) /etc/taskcluster-<name>-default.yml
 8) /etc/taskcluster-<name>-<profile>.yml
 9) <environment variables>

Note:
 - (5) through (8) would rarely be used, maybe /etc/ is useful for docker-worker.
 - Magic is needed for (9) if we want to support anything but strings, ie. object, lists, integers, booleans.
   (Either we do something smart, or we continue to ignore the problem)
 - We have the following profiles:
   A) default         (default configuration in most/any cases, commonly used as basis for another profile)
   B) test            (used when running npm test)
   C) localhost       (used when running locally, for example if you want to test against local queue)
   D) production      (used in production, contains all the non-secret production config options)
 - Future profiles
   E) staging         (maybe one day we'll have staging site)
   F) test-localhost  (used when running npm test against services on localhost, e.g. local queue)

For "test" and "localhost", it might be nice to have a taskcluster-<name>-<profile>.yml.example file.
We should also have a file: taskcluster-<name>-test.yml.enc with encrypted test credentials for use on travis.


Note, most of the files doesn't exist. Usually you'll have config for <test> and maybe <localhost> and <test-localhost> if you want to do that.
The idea with <localhost> being that say the queue runs on localhost, and <test-locahost> being that npm test
for scheduler can run against queue started with "node bin/server.js localhost".
(Reporter)

Comment 2

4 years ago
Note,
In discussion with jlal yesterday we talked a bit about step (9).

It was suggested that we keep default.js as JS and employ process.env to load from environment variables here.
The downside to is that:
 (A) We can't overwrite any configuration key with environment variables
 (B) Environment variables are overwritten by profile (2), local folder (3), etc...

I suggest that we instead give an imperative project-specific function that converts environment variables to a configuration object that can then overwrite config-keys from (1)-(8).
Ie. source (9) is a function that converts process.env to a config object.

This seems stupid, but it means that the name of a configuration key is unrelated to the name of the environment variable specifying it. And that we can parse JSON, numbers, etc. from environment variables.. So that we don't have to do that when reading the configuration keys.

--- 
I still don't see the perfect solution here. But we definitely need something that better, than what we have.
(Reporter)

Comment 3

4 years ago
@jlal,
You're welcome to voice any suggestions you have here :)
I am not completely happy here so NI some other folks to provide feedback... The stuff I like:

 - schema for config (but rather use Joi that turns into json schema then write it directly)
 - overrides in the form of <name>-<profile>

I strongly dislike the project.xfoo.bar environment variable thing (other projects do use it) primarily I dislike it because it does not make anything easier for me. In particular I have some environment variables set for github/aws/etc... and these get picked up in various tc/non-tc projects.

In addition to documentation I still like the idea of prompting for new config values (on npm install or something which would only be run when installing locally) this thing could potentially be the thing that fetches environment variables (and then sticks them in the file for config).

Few other things:

 - we should aim at getting this thing out of tc-base
 - we should research more projects (we use very few of nconf features and generally I am not huge fan)
Flags: needinfo?(jhford)
Flags: needinfo?(garndt)
(Reporter)

Comment 5

4 years ago
I generally agree... it would be neat if we could pick up environment variables like:
TASKCLUSTER_CLIENT_ID, TASKCLUSTER_ACCESS_TOKEN, AWS_ACCESS_KEY_ID and AWS_SCERET_ACCESS_KEY.
For that I fear we'll end up with something imperative. Which, as we'll be able to parse JSON too, might be okay.

Regarding, prompting on "npm postinstall", I think it would be sane to just check if config is correct, and then ask people to provide it, if missing while exiting non-zero. That way heroku will likely be noisy, and you'll have an error response from joi. Blocking on postinstall and waiting for input seems like bad idea, if/when using systems where we deploy automatically.

Anyways, we need to come up with a good way to handle configuration, that's all.

Comment 6

4 years ago
Anything we can do to prevent running something that can unexpectedly blow up later when a condition is met is awesome.

Seems that there are a few things going on in this thread:

1) yml vs json vs js for configuration file
2) joi vs json schema for source schema/rules
3) pre/post install config setup
4) Environment variables
5) profile overrides
6) move out of tc-base

1) Anything that can allow inline comments for configuration files is awesome.  The config file itself can also serve as a supplement to the more in depth project documentation. Yaml definitely makes dealing with configuration files a bit easier.  yml > js > json
2) +1 for using Joi.  Anything that helps not writing json schema directly is nice, but do we need to actually convert this to a json schema at all if we're only talking about hierarchy of configuration files?  Perhaps I'm overlooking what value having a generated json schema provides.
3) I would prefer configuration not be tied to installation but rather at time of execution then have something that can assist with populating the configuration (like docker-worker does).
4) For credentials, it makes sense not having some kind of project naming hierarchy included in the key name, but what about settings that share key names across projects but might have different values?  Do we just ensure that this doesn't happen or not have them as white listed variables?
5) +1 for test/local/prod profile overrides
6) Definitely support this idea.  Seems taskcluster-base is doing quite a bit.
Flags: needinfo?(garndt)
(Reporter)

Comment 7

4 years ago
Okay, so I'm just adding some pro/cons here :)

### Issue: 1)
json:
 + easy to load as data
 - no comments
 - hard to write
js:
 - hard to load from data file, require("/etc/myconfigfile") isn't good
 + can embed environment variables (using process.env.MY_ENV_VAR)
yaml:
 + easy to read/write
 + easy to load as data
 + can have comments
 + it's so pretty :)
 - can't embed environment variables natively

### Issue 2)
For this purpose I don't think it matters much.
But it might depend on whether or not we do joi for request validation.

### 3)
If anything I think validation on post-install + validation on startup is sufficient.
(post install validation would prevent us from deploying broken configs on heroku).
I don't see a need for an interactive guide, on a platform like heroku that could be annoying.
IMO comments in our default configuration file ought to be enough to help you configure something.

### 4)
In bug 1128931 I suggested that we add a config/envs.js file which looks like this:
module.exports = {
   taskcluster: {
     credentials: {
        clientId:          process.env.TASKCLUSTER_CLIENT_ID,
        accessToken:       process.env.TASKCLUSTER_ACCESS_TOKEN
     }
   }
}
Then load and merge all config files, before we load and merge the envs.js config file.
Note, config/envs.js could be a YAML file in which we after loading substitute all strings on the form "$<...>"
with the environment variable <...>, before merging with other config files.

Anyway, with an approach like this it would very clean which environment variables affects the application.
And we would still be able to load credentials from their commonly used names, that we have all in .bashrc

### 6)
taskcluster-base was never meant to be pretty. It's a great hack to avoid coming up with names for a lot of different npm modules. Anyways, some of it can definitely be replaced by existing npm modules, and the size of tc-base can be reduced.
Things like entities etc. can be moved out. But we'll still have somewhere with all the common glue code.
My vote would be for JSON/Yaml instead of JS.  While we're currently limiting our js to a subset of the language, we're leaving ourselves open to having the entire imperative nature of the language exposed to config writers.  I would like to build as many roadblocks to imperative configuration as possible for these kinds of things.  If you need an imperative structure, your configuration logic or your thing being configured is probably over complicated and needing fixing.

That said, I don't know if JSON Schema is really the tool for the job here.  It's a lot of overhead for something that's an internal matter that can be tested really quick.  I see json-schema as a great tool for validating input from the wild many times over, and that input being in a well defined format.  I just think that configurations can usually be tested quicker and easier just being a set of asserts (or Joi usage) in the constructors.  Testing a constructor should be pretty trivial.

For environment variables using JSON/YAML, we could have an internal format for these,

{
   "secretAccessKey": "${env.SECRET_ACCESS_KEY}"
}

and something that post-processes the config file for "${env[.](.*)}" and takes the value of the environment variable named by the first group.  To handle the case of an environment variable which is not defined, we could say that when we merge a given configuration file into another that we ignore keys which have values that resolve to undefined.
Flags: needinfo?(jhford)
(Reporter)

Comment 9

4 years ago
@jhford,
He he, I'll actually agree with you on avoiding anything complicated in our config file.
So maybe instead of having config/envs.js we have config/envs.yaml and use custom tags.

myConfigKey: !ENV MY_ENVIRONMENT_VARIABLE_NAME

In particular I want support things like this:

myConfigList:   !list-from-env MY_ENVIRONMENT_VARIABLE_CONTAINING_A_LIST
myConfigJSON:   !json-from-env MY_ENVIRONMENT_VARIABLE_CONTAINING_JSON
myConfigFlag:   !bool-from-env MY_BOOLEAN_ENVIRONMENT_VARIABLE
myConfigNumber: !number-from-env MY_ENV_VAR_WITH_NUMBER

Most of the environment variable we load are strings.
But we have a few common cases like `PORT`, `PUBLISH_REFERNCES` where we wish to load
numbers and booleans. We also have a few cases where we load lists or JSON objects from
from environment variables. We rarely do this, but in some rare cases it is necessary.
And parsing config entries from JSON in place were we use them is really painful.

Ideally, I would specify lists as:
  ENV_VAR_LIST_1 = 'entry 1'
  ENV_VAR_LIST_2 = 'entry 2'
  ENV_VAR_LIST_3 = 'entry 3'
ie. using multiple environment variables. And something similar for objects.
I guess that technically we could also do that with the pattern above. We just say that in
  myConfigList:   !list-from-env ENV_VAR_LIST_
And then `ENV_VAR_LIST_` is the prefix that our custom logic appends numbers to in-order to
get list entries.

Anyways, the !<custom-tag> syntax is actually part of the YAML spec. And js-yaml supports
custom tags. At the very least we can make a tag to load string from env vars. Granted it
would be nice to have logic for number, boolean, object and array.

See:
  https://github.com/nodeca/js-yaml
Example:
  https://github.com/nodeca/js-yaml/blob/master/examples/custom_types.yml
  https://github.com/nodeca/js-yaml/blob/master/examples/custom_types.js
Component: TaskCluster → General
Product: Testing → Taskcluster
(Reporter)

Updated

3 years ago
Component: General → Platform Libraries
(Reporter)

Comment 10

3 years ago
Created attachment 8639055 [details] [review]
Github PR for feedback

Hi, what do you guys think of this.
Then we'll only have "config.yml" in the repository, and it'll specify:
  1) Defaults (loaded from environment if secret)
  2) Profiles overwriting default configuration
In our local clones we can have user-config.yml overwriting config.yml
so we don't have to specify things by environment variables. And it'll even
support multiple profiles when running locally.

I imagine, profiles:
  production
  local-test
  integration-test
  load-test

Most notably, the coupling between environment variable names and which config
key is defined by an env var will be explicit. And we won't have to define a
list of environment variables to load from.

Additionally, it'll be easy and reliable to load numbers/lists/json from environment variables.
Attachment #8639055 - Flags: feedback?(jhford)
Attachment #8639055 - Flags: feedback?(garndt)
Comment on attachment 8639055 [details] [review]
Github PR for feedback

I like this a lot better than the current nconf system.  One question that I did have is whether you were intending to have a json schema as in the title of the bug or if I'm missing something.
Attachment #8639055 - Flags: feedback?(jhford) → feedback+
(Reporter)

Comment 12

3 years ago
Comment on attachment 8639055 [details] [review]
Github PR for feedback

@amiyaguchi,
You showed interest in improving how we load config.
Please take a look at this, let me know how awesome/crazy this is? :)
 - And add any suggestions you have, if see anything missing...

Anyways, this is why I'm not going to investing in fixing the old stuff.
Attachment #8639055 - Flags: feedback?(amiyaguchi)
(Reporter)

Comment 13

3 years ago
Comment on attachment 8639055 [details] [review]
Github PR for feedback

I've freshed up the PR renaming base.config to base.legacyConfig, so we can
keep using the old pattern, but at least we have to do so intentionally.
(ie. slow migration, later version we'll remove support for the old stuff)

The idea with this is:
  config.yml        <-- check into repo
  user-config.yml   <-- listed in .gitignore

Both on the form:
  Options:
    hostname:     localhost
    port:         8080
  Profiles:
    production:
      hostname:   example.com
      port:       !env:number PORT

So you can specify defaults and profiles in both files.
A profile overwrites defaults, and user-config.yml overwrites config.yml.

In repo we have:
  config.yml
Which contains:
  - default options
  - profile specific defaults
  - declaration of how to load secrets from env vars

Locally you have, either:
 1) env vars with secrets needed for tests
 2) user-config.yml with secrets needed for tests (in the test profile)
    (this way we can also have a "localhost" profile for running locally)

Obviously, option (2) is the most convenient, but (1) works great for
circle-ci/travis etc.

We should probably also have user-config-example.yml in repo serving as a
template for user-config.yml, showing which secrets to fill in.
Attachment #8639055 - Flags: review?(winter2718)
Comment on attachment 8639055 [details] [review]
Github PR for feedback

worked with Jonas on IRC about this.

My only comment (other than trying to understand how this will be used) is changing the label "Options" to "Defaults" to indicate what it is and the intended behavior.
Attachment #8639055 - Flags: feedback?(garndt) → feedback+
(Reporter)

Comment 15

3 years ago
Renamed "Options" -> "Defaults"
And support for parsing a list of quoted strings from env var, using !env:list

Also renaming bug, I think having a good config loader is better than having explicit JSON schema.
We won't have many people writing config files, so schema is overkill, comments and example config
files will do much better.

Feel free to disagree, I'm not oppose to introducing an optional "Schema" property to validate the
merged configuration against. It certainly has its merits. Misconfiguration is a common cause
of downtime when we rollout (though very quickly fixed). But with comments and decent loaders it seems
a bit overkill.
Summary: taskcluster-base: base.config should validate config against a JSON schema → taskcluster-base: base.config should be nice to use...
Attachment #8639055 - Flags: review?(winter2718) → review+
(Reporter)

Comment 16

3 years ago
Will land this in tc-base 0.9 when I get a few other things with it too...

Thanks for the review...
(Reporter)

Comment 17

3 years ago
Created attachment 8672702 [details] [review]
Github PR to new tc-lib-config

This carries with r+ from before... so really just a quick look to see if it's integrated correctly would be great...
Attachment #8672702 - Flags: review?(jhford)
Attachment #8639055 - Flags: feedback?(acmiyaguchi)
(Reporter)

Comment 18

3 years ago
This was fixed a long time ago...
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Attachment #8672702 - Flags: review?(jhford) → review+
You need to log in before you can comment on or make changes to this bug.