open C:\generic-worker\ed25519-private.key: The system cannot find the file specified.
Categories
(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)
Tracking
(Not tracked)
People
(Reporter: pmoore, Unassigned)
Details
The C:\generic-worker\ed25519-private.key file is getting deleted on instances of these worker types:
- gecko-t-win10-64-beta
- gecko-t-win10-64-cu
Could this be related to this commit?
Comment 1•6 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #0)
> Could this be related to this commit?
that is quite likely. however, there is more than one way to look at this and possibly a bigger picture.
the problem evidenced by the papertrail log (generic-worker UTC open C:\generic-worker\ed25519-private.key: The system cannot find the file specified.) and the subsequent panic and machine shutdown is that gw is choosing to panic at what i believe to be an inopportune moment.
we currently only have a use case for cot keys on level 3 worker types. as far as i am aware, no other worker type makes use of these keys and there are no trust relationships in place that would make it meaningful to have keys at all, on any worker type that is not a level 3 gecko builder.
if there were any use case whatsoever for keys to exist on worker types that are not level 3 builders, i would not even bring this up, but so far, to the best of my knowledge, there is not.
there are, however, numerous examples of a missing key on a worker that never uses it causing problems with the maintenance of our worker types and creating busy work for engineers and troubleshooters. today's problems are a great example.
i could just go and write some code to generate this key and close this bug, but that would only postpone the next issue caused by requiring the existence of a never-used key that serves no purpose and solves no problem, and with it the next waste of time and money.
on level 3 builders we do require this key and the process to create it is manual for security reasons. a happy side effect of this manual process is that we have very few problems with missing keys in places where they are actually needed and frequently used.
we have many problems managing and ensuring the existence of this key in the places where we don't need it and never use it. we write lots of code in the occ and ronin-puppet repositories to make sure this unused, unneeded and massively useless key exists in the place where generic-worker expects to find it, on pain of a system shutdown when it isn't found.
on this occasion and to make this specific problem go away, i will of course write some more code to make sure this unneeded, unused, useless key exists, but i would be much happier if the logic for requiring that we panic when this key isn't found was reexamined.
Comment 2•6 years ago
perhaps a more opportune moment for a panic could be when gw detects it is running a task that makes use of the key but is unable to confirm its existence (i'm assuming that there is something in task configuration to say that the key will be required). to my perhaps simple logic, causing a panic and shutdown in those circumstances would be appropriate. a panic and shutdown of a worker type that never uses the key seems less appropriate to me.
Comment 3•6 years ago
i've had a look and i don't see how the commit referenced in comment 0 can be responsible for the problem. that change was only to remove generation of the deprecated openpgp key type required by older versions of cot and gw.
i've also seen that a commit from today triggered a gecko-t-win10-64-beta ami build, and that build did run Set-ChainOfTrustKey and successfully created the ed25519 key that will never be used and is not needed on worker types that are not level 3 builders.
so in summary, i have changed nothing since this bug was created, but the last creation of a gecko-t-win10-64-beta ami did include key generation as can be seen in the papertrail logs.