Closed
Bug 1375514
Opened 7 years ago
Closed 7 years ago
generic-worker doesn't exit when numberOfTasksToRun is reached
Categories: Taskcluster :: Workers (defect)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: wcosta, Assignee: wcosta
Details
Attachments (1 file, 1 obsolete file): 1.48 KB patch, pmoore: review+
When generic-worker (g-w) has run numberOfTasksToRun tasks, it should exit, but since version 10 it keeps running.
Comment 1•7 years ago
That is not true. It does exit, but if you run it again without resetting the counter, it will run indefinitely. The semantics are not well documented, but they are coherent and work. The fix is trivial: add a line to the puppet file run-generic-worker.sh that deletes the file tasks-resolved-count.txt before running the worker, to reset the count. Using rm -f tasks-resolved-count.txt keeps this backward compatible, since rm -f does not fail when the file does not exist. We can change the semantics in a future release, but I see no reason to block the current rollout, which fixes other issues.
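A minimal sketch of the wrapper change described above, assuming the script runs under `set -e` and starts the worker from the same working directory in which it deletes the file. The function name `start_worker`, and passing the worker command as arguments (the test uses `true` as a stand-in for the real generic-worker invocation), are hypothetical and for illustration only. The `-f` flag is what makes the change backward compatible: it succeeds whether or not a count file from a previous run exists, whereas a plain `rm` would abort a `set -e` script on a machine that has never run the worker.

```shell
#!/bin/sh
set -e

# Hypothetical wrapper: reset the persisted task counter, then start the worker.
# "$@" is the worker command line (generic-worker in the real script).
start_worker() {
    # Reset the total task count persisted by previous runs.
    # -f: do not fail if the file does not exist (e.g. first boot).
    rm -f tasks-resolved-count.txt
    "$@"
}
```

Calling `start_worker true` in a fresh directory succeeds, and calling it again after a count file has been written removes that file before the (stand-in) worker starts.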
Comment 2•7 years ago
My proposal would be to add `rm -f tasks-resolved-count.txt` to `run-generic-worker.sh` in puppet, with a comment saying that it resets the total count, so that the worker exits after one task. We can then follow up with a separate release that changes the semantics if we decide it is necessary, or at least improves the documentation to make clear that the total count is persisted between runs.
Comment 3•7 years ago
We're going to meet on vidyo to discuss.
Assignee
Comment 4•7 years ago
Due to Windows semantics, we are going to fix this in the puppet configs and discuss a proper solution during the All Hands.
No longer blocks: 1375015
Assignee
Comment 5•7 years ago
If the file exists when the worker starts up and its content is equal to numberOfTasksToRun, we never see a reboot.
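The failure mode can be reproduced with a small simulation. This is a hypothetical model, not the generic-worker source: it assumes the worker reads the persisted count on startup, increments it after each task, and exits only when the count is exactly equal to numberOfTasksToRun. Under that assumption, a count that already equals the limit at startup is incremented past it and never matches again. The function `simulate_worker` and its safety cap (standing in for "runs indefinitely") are invented for illustration.

```shell
#!/bin/sh

COUNT_FILE=tasks-resolved-count.txt

# Hypothetical model of the worker's exit check.
# $1 = numberOfTasksToRun, $2 = safety cap on tasks (stands in for "forever").
# Prints the number of tasks run in this session, then "exited" or "still running".
simulate_worker() {
    limit=$1
    cap=$2
    count=0
    # Resume from the persisted total, as the worker is assumed to do on startup.
    if [ -f "$COUNT_FILE" ]; then
        count=$(cat "$COUNT_FILE")
    fi
    ran=0
    while [ "$ran" -lt "$cap" ]; do
        ran=$((ran + 1))                 # "run" one task
        count=$((count + 1))
        echo "$count" > "$COUNT_FILE"    # persist the running total
        if [ "$count" -eq "$limit" ]; then
            echo "$ran exited"
            return 0
        fi
    done
    echo "$ran still running"
}
```

With a limit of 1, the first session prints `1 exited`; a second session without deleting the file starts at 1, moves to 2, and prints `100 still running` when it hits the cap; running `rm -f tasks-resolved-count.txt` first restores the single-task behaviour.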
Assignee
Updated•7 years ago
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Comment 6•7 years ago
Comment on attachment 8880467
Remove task count persisted file. r=pmoore

Review of attachment 8880467:
-----------------------------------------------------------------

r+ if you agree with applying the requested change. :)

::: modules/generic_worker/templates/run-generic-worker.sh.erb
@@ +8,5 @@
>
> +# If this file exists when the worker starts up, and its content is equal
> +# to numberOfTasksToRun, the worker won't exit and machine won't reboot.
> +rm -f /Users/cltbld/tasks-resolved-count.txt
> +

Unfortunately, this form is not protected against the working directory changing. This alternative form, however, is safe:

rm -f tasks-resolved-count.txt

Explanation
===========

When generic-worker starts up on line 15, it determines the current task count by looking for the file tasks-resolved-count.txt in *the current working directory*. Therefore, if we delete the file in the current directory, whatever that directory is, then as long as we don't change directory before line 15, it is the same file the worker will look for.

If you really want to, you could add a comment explaining this, so that nobody introduces a "cd" command between lines 11 and 15. But I'll let you decide. :)
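The reviewer's point can be checked directly: the worker resolves the relative name tasks-resolved-count.txt against its working directory, so only a relative `rm` run in that same directory is guaranteed to hit the file the worker will read. The sketch below uses two temporary directories as hypothetical stand-ins for /Users/cltbld and for the worker's actual working directory, assuming the two differ.

```shell
#!/bin/sh
set -e

# Two hypothetical directories: where the script assumes the file lives,
# and where the worker is actually started.
assumed_home=$(mktemp -d)     # stands in for /Users/cltbld
worker_dir=$(mktemp -d)       # the real current working directory

cd "$worker_dir"
echo 1 > tasks-resolved-count.txt   # count left over from the previous run

# Absolute path: targets $assumed_home, not the worker's directory,
# so the leftover count survives.
rm -f "$assumed_home/tasks-resolved-count.txt"
absolute_result=$([ -e tasks-resolved-count.txt ] && echo survived || echo deleted)

# Relative path: resolved against the same directory the worker will use,
# provided there is no "cd" between this line and the worker start.
rm -f tasks-resolved-count.txt
relative_result=$([ -e tasks-resolved-count.txt ] && echo survived || echo deleted)

echo "absolute: $absolute_result, relative: $relative_result"
# prints "absolute: survived, relative: deleted"
```

This also shows why a later `cd` would be harmful: it would make the relative `rm` and the worker's own lookup resolve against different directories.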
Attachment #8880467 -
Flags: review+
Assignee
Updated•7 years ago
Attachment #8880467 -
Attachment is obsolete: true
Assignee
Comment 7•7 years ago
If the file exists when the worker starts up and its content is equal to numberOfTasksToRun, we never see a reboot.
Assignee
Comment 8•7 years ago
Comment on attachment 8880483
Remove task count persisted file. r=pmoore

Delete file in current directory.
Attachment #8880483 -
Flags: review?(pmoore)
Comment 9•7 years ago
Comment on attachment 8880483
Remove task count persisted file. r=pmoore

Review of attachment 8880483:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #8880483 -
Flags: review?(pmoore) → review+
Assignee
Comment 10•7 years ago
https://hg.mozilla.org/build/puppet/rev/7cd4e03689ef
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•5 years ago
Component: Generic-Worker → Workers