Closed
Bug 1507408
Opened 7 years ago
Closed 6 years ago
Automated Bug Generator changes required
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: dlabici, Assigned: dragrom)
References
Details
Attachments
(1 file)
With the changes to the workflow that are coming very soon, the automated bug generator will need some modifications, which I covered with :dragrom in a meeting.
To outline the points discussed:
- Bug title should use the following format: [$(MDC)] $(HOSTNAME) Generic Worker CODE 69.
-- End result should look like: [MDC1] t-yosemite-r7-XXX Generic Worker CODE 69
- Add an alias of $(HOSTNAME) to ensure we don't have duplicate bugs for the same machine. If that alias already exists, use the existing Problem Tracking bug.
- If possible, automatically add CiDuty's team account ( https://bugzilla.mozilla.org/user_profile?login=ciduty%40mozilla.com ) to NeedInfo (everyone on the team has visibility and will receive the email from the NI?).
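The title format and the alias check above can be sketched in Python as follows. This is only an illustration under stated assumptions: `build_bug_title`, `find_tracking_bug`, the `bugs_by_alias` dictionary, and the bug ID 1000001 are hypothetical names and values, and the dictionary stands in for whatever Bugzilla lookup the real generator would perform.

```python
from typing import Dict, Optional


def build_bug_title(mdc: str, hostname: str, exit_code: int = 69) -> str:
    """Format the title as [$(MDC)] $(HOSTNAME) Generic Worker CODE <n>."""
    return f"[{mdc}] {hostname} Generic Worker CODE {exit_code}"


def find_tracking_bug(hostname: str, bugs_by_alias: Dict[str, int]) -> Optional[int]:
    """Return the ID of the bug whose alias is the hostname, or None.

    bugs_by_alias is a hypothetical in-memory stand-in for an alias lookup
    against Bugzilla; it maps alias -> bug ID.
    """
    return bugs_by_alias.get(hostname)


if __name__ == "__main__":
    # Hypothetical data: one machine already has a Problem Tracking bug.
    known = {"t-yosemite-r7-XXX": 1000001}
    host = "t-yosemite-r7-XXX"
    existing = find_tracking_bug(host, known)
    if existing is None:
        print("file new bug:", build_bug_title("MDC1", host))
    else:
        print("reuse existing bug", existing)
```

Checking the alias before filing is what keeps a machine from accumulating duplicate bugs.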
Updated (Assignee) • 7 years ago
Status: NEW → ASSIGNED
Comment 1 (Assignee) • 7 years ago
The alias will be $(HOSTNAME)_code69, and the bug will be linked to the tracker bug returned by the $(HOSTNAME) alias.
Comment 2 • 7 years ago
The worker is now in a working state.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated (Assignee) • 7 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated (Assignee) • 7 years ago
Status: REOPENED → ASSIGNED
Updated (Assignee) • 7 years ago
Flags: needinfo?(dcrisan)
Updated • 7 years ago
Flags: needinfo?(dcrisan) → needinfo?
Updated (Assignee) • 7 years ago
Flags: needinfo?
Updated (Assignee) • 7 years ago
Flags: needinfo?(dcrisan)
Updated • 7 years ago
Flags: needinfo?(dcrisan) → needinfo?
Updated (Assignee) • 7 years ago
Flags: needinfo?
Updated • 7 years ago
Flags: needinfo?
Updated (Assignee) • 7 years ago
Flags: needinfo?
Comment 3 (Assignee) • 7 years ago
As part of this bug, I'll also convert run-generic-worker.sh.erb from a Bash script to Python.
These are the possible exit codes of generic-worker:
Exit Codes:
  0   Tasks completed successfully; no more tasks to run (see config setting
      numberOfTasksToRun).
  64  Not able to load generic-worker config. This could be a problem reading the
      generic-worker config file on the filesystem, a problem talking to AWS/GCP
      metadata service, or a problem retrieving config/files from the taskcluster
      secrets service.
  65  Not able to install generic-worker on the system.
  66  Not able to create an OpenPGP key pair.
  67  A task user has been created, and the generic-worker needs to reboot in order
      to log on as the new task user. Note, the reboot happens automatically unless
      config setting disableReboots is set to true - in either case this exit code will
      be issued.
  68  The generic-worker hit its idle timeout limit (see config settings idleTimeoutSecs
      and shutdownMachineOnIdle).
  69  Worker panic - either a worker bug, or the environment is not suitable for running
      a task, e.g. a file cannot be written to the file system, or something else did
      not work that was required in order to execute a task. See config setting
      shutdownMachineOnInternalError.
  70  A new deploymentId has been issued in the AWS worker type configuration, meaning
      this worker environment is no longer up-to-date. Typically workers should
      terminate.
  71  The worker was terminated via an interrupt signal (e.g. Ctrl-C pressed).
  72  The worker is running on spot infrastructure in AWS EC2 and has been served a
      spot termination notice, and therefore has shut down.
  73  The config provided to the worker is invalid.
  74  Could not grant provided SID full control of interactive windows stations and
      desktop.
  75  Not able to create an ed25519 key pair.
  76  Not able to save generic-worker config file after fetching it from AWS provisioner
      or Google Cloud metadata.
  77  Not able to apply required file access permissions to the generic-worker config
      file so that task users can't read from or write to it.
I would recommend that the following exit codes should NOT cause the worker to be quarantined; all other exit codes should cause an automatic quarantine:
Exit Codes:
  0   Tasks completed successfully; no more tasks to run (see config setting
      numberOfTasksToRun).
  67  A task user has been created, and the generic-worker needs to reboot in order
      to log on as the new task user. Note, the reboot happens automatically unless
      config setting disableReboots is set to true - in either case this exit code will
      be issued.
  68  The generic-worker hit its idle timeout limit (see config settings idleTimeoutSecs
      and shutdownMachineOnIdle).
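The recommendation above amounts to an allow-list check on the exit code. A minimal Python sketch, assuming the wrapper script already has the generic-worker exit code in hand; the names `NON_QUARANTINE_EXIT_CODES` and `should_quarantine` are hypothetical and not taken from the actual script:

```python
# Exit codes that should NOT trigger a quarantine, per the recommendation above:
# 0 (tasks completed), 67 (reboot for new task user), 68 (idle timeout).
NON_QUARANTINE_EXIT_CODES = frozenset({0, 67, 68})


def should_quarantine(exit_code: int) -> bool:
    """Return True when the generic-worker exit code should quarantine the worker."""
    return exit_code not in NON_QUARANTINE_EXIT_CODES


if __name__ == "__main__":
    for code in (0, 67, 68, 69, 73):
        action = "quarantine" if should_quarantine(code) else "leave in service"
        print(f"exit code {code}: {action}")
```

Listing the codes that are safe, rather than the ones that are not, means any exit code added to generic-worker later is quarantined by default.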
Comment 5 (Reporter) • 7 years ago
Had a meeting with :dragrom today and we discussed how to approach the new requirements. Here is an overview of the work that is going to be done:
- Only use the existing bugs that we have for the machine; we will no longer use "_code69" in the alias.
- Whenever an exit code from comment 4 is hit, we will reopen the bug (if needed) and comment with the exit code and its description.
- If possible via the API, we will also update the whiteboard of the bug with the current exit codes (and remove them when they are fixed).
As for work distribution, :dragrom will do the implementation and I'll do the review plus comment content/style.
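The whiteboard bookkeeping described in the list above could look roughly like this in Python. It is a sketch under assumptions: the `[exitcode:N]` tag format and the function names are invented for illustration, and the code only computes the new whiteboard string; sending it to Bugzilla is left out.

```python
import re
from typing import Set

# Hypothetical tag format for exit codes recorded on the whiteboard.
TAG_RE = re.compile(r"\[exitcode:(\d+)\]")


def whiteboard_codes(whiteboard: str) -> Set[int]:
    """Collect the exit codes currently tagged on the whiteboard."""
    return {int(code) for code in TAG_RE.findall(whiteboard)}


def set_exit_code(whiteboard: str, code: int, present: bool) -> str:
    """Return the whiteboard text with the tag for `code` added or removed."""
    codes = whiteboard_codes(whiteboard)
    if present:
        codes.add(code)
    else:
        codes.discard(code)
    # Keep any unrelated whiteboard text and re-append the tags in sorted order.
    rest = " ".join(TAG_RE.sub(" ", whiteboard).split())
    tags = "".join(f"[exitcode:{c}]" for c in sorted(codes))
    return " ".join(part for part in (rest, tags) if part)


if __name__ == "__main__":
    board = set_exit_code("", 69, True)      # exit code 69 was hit
    board = set_exit_code(board, 73, True)   # exit code 73 was hit later
    board = set_exit_code(board, 69, False)  # exit code 69 was fixed
    print(board)
```

Rebuilding the tag list on every update keeps the operation idempotent, so reporting the same exit code twice does not duplicate the tag.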
Comment 6 (Assignee) • 7 years ago
Refactoring automatic bug generation
Comment 7 (Assignee) • 7 years ago
Updated (Assignee) • 6 years ago
Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → INVALID