Closed Bug 1375514 Opened 7 years ago Closed 7 years ago

generic-worker doesn't exit when numberofTasksToRun is reached

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wcosta, Assigned: wcosta)

Details

Attachments

(1 file, 1 obsolete file)

When generic-worker (g-w) has run "numberOfTasksToRun" tasks, it should exit, but since version 10 it keeps running.
That is not true. It exits, but if you run it again, without resetting the counter, it will run indefinitely.

The semantics are not well documented, but they are coherent and work.
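
To make the described behaviour concrete, here is a minimal shell model of it. This is not generic-worker's code. It assumes the worker increments a persisted count after each task and exits only when the count is exactly equal to the limit, which is what the symptoms in this bug suggest. `simulate_worker` and its `max_tasks` safety cap are hypothetical names introduced only for the illustration.

```shell
#!/bin/sh
# Hypothetical model of the behaviour described in this bug, NOT generic-worker code.
# Assumption: the persisted count is incremented after every task and the worker
# exits only when the count is exactly equal to the limit.
simulate_worker() {
    count_file=$1   # persisted counter, e.g. tasks-resolved-count.txt
    limit=$2        # stands in for numberOfTasksToRun
    max_tasks=$3    # safety cap so the simulation always terminates
    count=0
    if [ -f "$count_file" ]; then
        count=$(cat "$count_file")
    fi
    ran=0
    while [ "$ran" -lt "$max_tasks" ]; do
        count=$((count + 1))
        ran=$((ran + 1))
        echo "$count" > "$count_file"
        if [ "$count" -eq "$limit" ]; then
            echo "exited after $ran task(s)"
            return 0
        fi
    done
    echo "still running after $ran task(s)"
    return 1
}
```

Under that assumption, a first run with a limit of 1 exits after one task. A second run that starts with the stale count of 1 goes straight past the limit and never exits, until the count file is deleted.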

The fix is trivial: add a line to the puppet file run-generic-worker.sh that deletes the file tasks-resolved-count.txt before running the worker, which resets the count.

This can easily be made backward compatible by using rm -f tasks-resolved-count.txt, which succeeds even if the file does not exist.

We can change the semantics in a future release, but I see no reason to block the current rollout, which fixes other issues.
My proposal is to add `rm -f tasks-resolved-count.txt` to `run-generic-worker.sh` in puppet, with a comment saying that it resets the total count, so that the worker will exit after one task. We can follow up with a separate release that changes the semantics if we decide that is necessary, or that at least improves the documentation so it is clear that the total count is persisted between runs.
We're going to meet on Vidyo to discuss.
Due to Windows semantics, we are going to fix it in the puppet configs and discuss a proper solution during the All Hands.
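
A minimal sketch of what that proposal could look like in the wrapper script. The real run-generic-worker.sh in puppet contains more than this, and the `reset_task_count` function name is hypothetical, used here only to give the step a label.

```shell
#!/bin/sh
# Sketch only: not the actual contents of run-generic-worker.sh.

# Reset the persisted total task count so that the worker exits once it has
# run numberOfTasksToRun tasks in this run. With -f, rm returns success even
# when the file does not exist, so this is safe on a first run.
reset_task_count() {
    rm -f tasks-resolved-count.txt
}

reset_task_count
# ... start generic-worker here, as the existing script already does ...
```

Because of `-f`, the same line works both on machines that already have a stale count file and on machines that never wrote one.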
No longer blocks: 1375015
If the file exists when the worker starts up and its content is equal to
numberOfTasksToRun, we never see a reboot.
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Comment on attachment 8880467 [details] [diff] [review]
Remove task count persisted file. r=pmoore

Review of attachment 8880467 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you agree with applying the requested change. :)

::: modules/generic_worker/templates/run-generic-worker.sh.erb
@@ +8,5 @@
>  
> +# If this file exists when the worker starts up, and its content is equal
> +# to numberOfTasksToRun, the worker won't exit and machine won't reboot.
> +rm -f /Users/cltbld/tasks-resolved-count.txt
> +

Unfortunately, this form is not protected against the working directory changing. This alternative form, however, is safe:

rm -f tasks-resolved-count.txt

Explanation
===========
When the generic-worker starts up on line 15, it determines the current task count by looking in *the current working directory* for the file tasks-resolved-count.txt. Therefore, if we delete the file in the current directory, whatever that directory is, we delete it from the same directory the worker will look in, so long as we don't change directory before line 15.

If you really want to, you could add a comment explaining this, so nobody introduces a "cd" command between lines 11 and 15. But I'll let you decide. :)
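
The difference between the two forms can be reproduced with throwaway directories. This is an illustration only: `home_dir` and `work_dir` are hypothetical stand-ins for the hard-coded location and for whatever directory the script actually runs in.

```shell
#!/bin/sh
# Illustration with temporary directories; none of these paths come from puppet.
home_dir=$(mktemp -d)   # stands in for the hard-coded absolute location
work_dir=$(mktemp -d)   # stands in for the directory the script really runs in

echo 1 > "$work_dir/tasks-resolved-count.txt"   # stale count from a previous run
cd "$work_dir" || exit 1

# Absolute form: targets home_dir, so the stale file in the working directory survives.
rm -f "$home_dir/tasks-resolved-count.txt"
[ -f tasks-resolved-count.txt ] && echo "absolute path: stale count still present"

# Relative form: targets the working directory, the same place the worker looks.
rm -f tasks-resolved-count.txt
[ -f tasks-resolved-count.txt ] || echo "relative path: stale count removed"
```

The relative form keeps working only as long as nothing changes directory between the `rm` and the worker start-up, which is the caveat raised above.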
Attachment #8880467 - Flags: review+
Attachment #8880467 - Attachment is obsolete: true
If the file exists when the worker starts up and its content is equal to
numberOfTasksToRun, we never see a reboot.
Comment on attachment 8880483 [details] [diff] [review]
Remove task count persisted file. r=pmoore

Delete file in current directory.
Attachment #8880483 - Flags: review?(pmoore)
Comment on attachment 8880483 [details] [diff] [review]
Remove task count persisted file. r=pmoore

Review of attachment 8880483 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #8880483 - Flags: review?(pmoore) → review+
https://hg.mozilla.org/build/puppet/rev/7cd4e03689ef
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Generic-Worker → Workers