Closed Bug 1461913 Opened 7 years ago Closed 6 years ago

Don't continue running generic-worker if it returned with exit code 69

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: dragrom)

Attachments

(1 file, 1 obsolete file)

If generic-worker detects that it is in a bad state (e.g. it can't run tasks due to I/O problems or corrupt data), it exits with exit code 69. If this happens, we should disable the worker and ideally raise a bug in Bugzilla for someone to look into it.

Recently we hit an issue with t-yosemite-r7-449 boot-looping. I've created this bug so that if a worker gets into a bad state in future, it doesn't eat through tasks or repeatedly crash and send alerts.

The generic-worker --help command explains what the 64-72 exit codes mean:

> 69 Worker panic - either a worker bug, or the environment is not suitable for running
> a task, e.g. a file cannot be written to the file system, or something else did
> not work that was required in order to execute a task. See config setting
> shutdownMachineOnInternalError.
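The behaviour requested above could be sketched roughly as follows. This is only an illustration in Python, not the change that landed in build-puppet: the function name run_until_panic and the max_runs parameter are hypothetical, and the only value taken from this bug is exit code 69.

```python
import subprocess
import sys

# Exit code that generic-worker documents as "Worker panic" (from its --help text).
WORKER_PANIC_EXIT_CODE = 69


def run_until_panic(command, max_runs=None):
    """Re-run `command` until it exits with code 69.

    Hypothetical helper: `max_runs` only exists so the loop can be bounded
    in tests. Returns (number_of_runs, last_exit_code).
    """
    runs = 0
    exit_code = None
    while max_runs is None or runs < max_runs:
        exit_code = subprocess.call(command)
        runs += 1
        if exit_code == WORKER_PANIC_EXIT_CODE:
            # Bad state detected: stop restarting the worker instead of
            # letting it eat through tasks or crash repeatedly.
            break
    return runs, exit_code


if __name__ == "__main__":
    # Stand-in for the real worker process: a child that exits with 69.
    fake_worker = [sys.executable, "-c", "import sys; sys.exit(69)"]
    print(run_until_panic(fake_worker))
```

The key design point is that a 69 exit breaks the restart loop, so any follow-up (disabling the worker, filing a bug) happens once rather than on every crash.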
See Also: → 1461914
See Also: → 1460446
Assignee: relops → dcrisan
Created a RelOps Bugzilla user for filing bugs via the API. Added all of the information to the passwords/bugzilla.gpg file stored in the ssh://gitolite3@git-internal.mozilla.org/relops/gpg.git repository.

dhouse: Can you please make changes to your code and remove the existing API key?
Status: NEW → ASSIGNED
(In reply to Dragos Crisan [:dragrom] from comment #1)
> Created RelOps bugzilla user for filing bugs via API. Added all information
> to passwords/bugzilla.gpg file stored in
> ssh://gitolite3@git-internal.mozilla.org/relops/gpg.git repository
>
> dhouse: Can you please make changes to your code and remove the existing api
> key?

Thank you for setting the user up. I changed roller prod to use the relops@ bugzilla user's apikey from the gpg repo. So roller and the quarantineworker will be using the same apikey; I don't think bugzilla provides different permissions for different apikeys for the same user, and so I think it is fine to not make separate keys.

The key is stored, for roller prod to use, in releng-puppet's hiera as roller_bugzilla_api_key!prod. Roller dev is set up with a separate apikey for the dev bugzilla instance.
Attached file: GitHub Pull Request for build-puppet (obsolete)
A Pull Request from Dragos for implementing this in build-puppet repo.
Check the status code returned by generic-worker.
If the status code is 69, quarantine the worker and create the bug.
If the worker gets back to normal, remove the worker from quarantine and close the bug.
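The three steps above can be read as a small decision table. The sketch below is a hypothetical Python rendering of it, not the code from the pull request: WorkerState, plan_actions and the action names are invented for illustration, and treating any exit code other than 69 as "back to normal" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Exit code that generic-worker documents as "Worker panic".
WORKER_PANIC_EXIT_CODE = 69


@dataclass
class WorkerState:
    """Hypothetical record of what has already been done for a worker."""
    quarantined: bool = False
    open_bug: Optional[int] = None  # id of the bug filed for this worker, if any


def plan_actions(exit_code: int, state: WorkerState) -> List[str]:
    """Return the actions to take for this exit code, skipping ones already done."""
    actions: List[str] = []
    if exit_code == WORKER_PANIC_EXIT_CODE:
        if not state.quarantined:
            actions.append("quarantine")
        if state.open_bug is None:
            actions.append("file_bug")
    else:
        # Assumption: any other exit code means the worker is healthy again.
        if state.quarantined:
            actions.append("unquarantine")
        if state.open_bug is not None:
            actions.append("close_bug")
    return actions


if __name__ == "__main__":
    print(plan_actions(69, WorkerState()))
    print(plan_actions(0, WorkerState(quarantined=True, open_bug=1)))
```

Keeping the decision separate from the side effects (the quarantine call and the Bugzilla API request) means a worker that keeps returning 69 does not get quarantined twice or have duplicate bugs filed.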
Attachment #8989487 - Attachment is obsolete: true
Attachment #8991612 - Flags: review?(pmoore)
Attachment #8991612 - Flags: review?(jwatkins)
Attachment #8991612 - Flags: review?(dhouse)
To reproduce exit code 69, just change the ownership of the /Users/cltbld/generic-worker.openpgp.key file.
Attachment #8991612 - Flags: review?(dhouse) → review+
Attachment #8991612 - Flags: review?(pmoore) → review+
Attachment #8991612 - Flags: review?(jwatkins) → checked-in+
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
