Closed Bug 1308054 Opened 8 years ago Closed 6 years ago

Implement fall-back gpg key pair to be used for chainOfTrust when no bespoke keypair is required

Categories

(Taskcluster :: Workers, defect)

Unspecified
Windows
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: grenade, Unassigned)

References

Details

Test worker types were updated to 6.0.0 today but failed to take jobs. The last log messages from g-w in pt read:

Oct 06 01:21:07 win7-i-0b21838e012d95e0d generic-worker:  Creating file generic-worker.config... 
Oct 06 01:21:07 win7-i-0b21838e012d95e0d generic-worker:  Error loading configuration from file 'generic-worker.config': 
Oct 06 01:21:07 win7-i-0b21838e012d95e0d generic-worker:  Config setting "signingKeyLocation" must be defined in file "generic-worker.config".
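The error above comes from a start-up check that rejects a configuration missing a required setting. Below is a minimal sketch of that kind of check. It assumes a JSON config file, and `load_config` and `REQUIRED_SETTINGS` are hypothetical names. generic-worker itself is written in Go and may not work this way. The sketch shows why a dummy value was enough to get past the error: the check only tests that the setting is present.

```python
import json

# Hypothetical list of settings that must be present; the real worker's list differs.
REQUIRED_SETTINGS = ("signingKeyLocation",)


class ConfigError(Exception):
    """Raised when a required setting is missing from the config."""


def load_config(text, filename="generic-worker.config"):
    """Parse JSON config text and fail fast if a required setting is absent."""
    config = json.loads(text)
    for key in REQUIRED_SETTINGS:
        # Only presence is checked, so any non-empty value (even a dummy) passes.
        if not config.get(key):
            raise ConfigError(
                f'Config setting "{key}" must be defined in file "{filename}".'
            )
    return config
```

With this sketch, `load_config('{"signingKeyLocation": "dummy"}')` succeeds, while `load_config('{}')` raises an error whose message matches the last log line above.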

I have terminated running instances manually and limited worker types using g-w 6.0.0 to max-instances=1 until this is resolved.
Added a dummy signingKeyLocation config to the gecko win workertypes. Now waiting to see that gecko-t-win instances take jobs...
Is the bug here that the worker should not require that location to be specified or that the deployment of the worker didn't specify it in the config?
It turned out that g-w was terminating because the config setting was required, even if it only contains a dummy value. We resolved the problem by putting a dummy value in the config. I believe Pete plans to remove the requirement for the setting, but in any case we have a workaround (dummy config) that resolves the issue. I'm happy for the bug to be closed unless it's useful for tracking a change to the dummy config requirement.
I believe this bug's title can be updated to describe what we really need to change, and the bug left open for g-w. I'll let pmoore update it accordingly.
So I'm in two minds about this. Currently we are not using chainOfTrust on all worker types, so technically we don't need a gpg signing key on all worker types. However, as a feature, it seems sensible to allow it on any worker type; otherwise we'll have to track which worker types support which features, and this gets complicated (especially with respect to payload schemas). Since the cost of generating a key is low (it is a single generic-worker command to create a keypair), I think it makes sense to keep things simple and say that the worker requires a keypair, even if you don't currently intend to use this feature. Otherwise, if somebody attempts to use the feature, they'll hit a runtime error and face a complicated process to enable it.

In other words, since the cost is low to create a key, and it avoids runtime problems of discovering that a particular feature isn't available on a particular worker type only when a task is submitted, I'd say let's leave it as it is, as a requirement that the signingKeyLocation is provided.

I'd prefer us to actually create an arbitrary key, and point to that, rather than set a dummy value, so that if someone uses the feature, it works - even if they don't have a way to validate the signatures because there is no official public key to refer to.
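One way to turn this preference into a check is to confirm that the configured path points at an actual ASCII-armored OpenPGP private key rather than a placeholder. Below is a minimal sketch under that assumption. `validate_signing_key` is a hypothetical helper. Checking the armor header only confirms that the file looks like a key, not that the key is usable for signing.

```python
import os

# Header line of an ASCII-armored OpenPGP private key block (RFC 4880, section 6.2).
ARMOR_HEADER = "-----BEGIN PGP PRIVATE KEY BLOCK-----"


def validate_signing_key(path):
    """Return None if path looks like an armored private key, else a reason string."""
    if not os.path.isfile(path):
        return f"{path}: no such file"
    with open(path, encoding="ascii", errors="replace") as f:
        first_line = f.readline().strip()
    # A dummy value or placeholder file will not start with the armor header.
    if first_line != ARMOR_HEADER:
        return f"{path}: not an ASCII-armored OpenPGP private key"
    return None
```

A check like this would catch the dummy-value workaround at start-up instead of letting it fail later, when a task first asks for a signed artifact.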

Even better, maybe it makes sense for us to have a fallback shared key for worker types where we don't manage a bespoke key pair for that worker type, and put the public key on e.g. gpg.mozilla.org.
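Below is a minimal sketch of the fallback idea: use a worker type's bespoke key if one is managed, and otherwise use the shared fallback key. The paths, the `BESPOKE_KEYS` mapping and `signing_key_location` are all hypothetical.

```python
# Hypothetical mapping of worker types to bespoke private key paths.
BESPOKE_KEYS = {
    "gecko-1-decision": "/keys/gecko-1-decision.asc",
}

# Hypothetical shared fallback key; its public half would be published on a keyserver.
FALLBACK_KEY = "/keys/fallback.asc"


def signing_key_location(worker_type):
    """Return (path, is_fallback) for the key a worker type should sign with."""
    if worker_type in BESPOKE_KEYS:
        return BESPOKE_KEYS[worker_type], False
    return FALLBACK_KEY, True
```

Returning the `is_fallback` flag lets callers see which worker types would sign with the shared key. This matters because signatures made with that key cannot be tied to one specific worker type.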

I think the complexity of making a feature optional, tracking where it is enabled, and handling runtime discovery that a feature is not available comes at a higher cost than enabling it globally.

We can discuss, of course. I think this isn't blocking anything though.
Summary: Generic Worker 6.0.0 doesn't take jobs → Implement fall-back gpg key pair to be used for chainOfTrust when no bespoke keypair is required
Hey Aki, what are your thoughts on this?
Flags: needinfo?(aki)
No longer blocks: 1306988, 1307803
I imagine we want some sort of automation or pre-testing here.  We want to make sure that both a) all workerTypes are able to spin up, and b) we don't spin up specific workerTypes (decision, docker-image, build) without new gpg keys.

Using fallback keys will address (a), but if we roll out new AMIs of the wrong workerType with the default keys, then we won't discover that (b) is broken until we try to perform a task that relies on the signed chain of trust artifact.
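Below is a minimal sketch of a pre-deployment check for case (b). It assumes that each worker type's configured key fingerprint is known before rollout. The worker-type kinds come from the comment above. The function name, the fingerprint values, and the convention that a worker type's name contains its kind are hypothetical.

```python
# Worker type kinds that must never sign with the shared default key.
SIGNING_CRITICAL = ("decision", "docker-image", "build")


def check_rollout(worker_keys, default_fingerprint):
    """Return worker types that would ship with the default key but need their own.

    worker_keys maps worker type name -> fingerprint of its configured signing key.
    A worker type counts as signing-critical if its name contains one of the
    SIGNING_CRITICAL kinds (a naming convention assumed for this sketch).
    """
    problems = []
    for worker_type, fingerprint in sorted(worker_keys.items()):
        critical = any(kind in worker_type for kind in SIGNING_CRITICAL)
        if critical and fingerprint == default_fingerprint:
            problems.append(worker_type)
    return problems
```

If this runs against the full set of worker types before new AMIs are published, a non-empty result would flag case (b) at deploy time. Without it, the problem only shows up when a task first relies on the signed chain of trust artifact.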

We currently aren't relying on generic-worker chain of trust at all, so dealing with (a) should help avoid having to deal with this until the generic-worker builds are closer to tier1 capable.  We should keep in mind that we will need a solution for (b) at that point.
Flags: needinfo?(aki)
Right now within docker-worker deployments, we generate the key no matter what and will only update the gpg key repo when an ami is generated for a worker type we want to be doing signing with.
QA Contact: pmoore
(In reply to Greg Arndt [:garndt] from comment #8)
> Right now within docker-worker deployments, we generate the key no matter
> what and will only update the gpg key repo when an ami is generated for a
> worker type we want to be doing signing with.

This is also what we do with generic-worker.

It is safer to require that a key is always provided, rather than assuming that if one isn't, it can be anything. Our deployment procedure takes care of this and the docs are quite explicit about the designed behaviour.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
Component: Generic-Worker → Workers