Closed Bug 1372817 Opened 7 years ago Closed 7 years ago

Cancelling jobs from Treeherder before the decision task finishes triggers a task exception email, but other jobs are still triggered

Categories

(Taskcluster :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1372892

People

(Reporter: xidorn, Unassigned)

Details

Steps to reproduce:
1. Push a try run.
2. Open Treeherder for this try run.
3. Click the "Cancel all jobs" button before the decision task finishes.

Expected result:
The decision task is canceled, and no other jobs start.

Actual result:
The decision task is canceled, but an email titled "Task exception: Gecko Decision Task ..." is sent to the push user, and other jobs may continue to start.
Can you link to the case where this occurred?
This one: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a14987c29caae29f09c9caa4bdd7fc533332b71

I manually canceled other tasks again.

And I received the following email:
> Task N3cl5jmaSpCJllh6IEg3hg in task-group N3cl5jmaSpCJllh6IEg3hg is complete.
> 
> Status: exception (in 1 run)
> Name: Gecko Decision Task
> Description: The task that creates all of the other tasks in the task graph
> 
> Owner: mozilla-taskcluster-maintenance@mozilla.com
> Source: https://hg.mozilla.org/try/raw-file/3a14987c29caae29f09c9caa4bdd7fc533332b71/.taskcluster.yml
Looking at the logs from the decision task, it seems to have finished anyway.  Cancellation is based on polling (the next time the worker checks in with the queue, it finds out the task it's been running is cancelled), so this was basically the failure case of a race condition, where the task was marked cancelled but completed all the same.  We have designed things such that if the decision task does not complete successfully, then its downstream tasks don't run -- but there are some issues where some tasks are scheduled anyway (bug 1372892).  If that were fixed then none of the tasks you saw would have started, as expected.

Thanks for bringing this up!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
If this is just a race condition, the window of the race is pretty large I suppose, because I've hit this twice recently.
That's true!  Maybe "race condition" wasn't the right word, but I wanted to indicate that it's expected that a cancelled task may keep executing for some time and even finish, although it will still be marked as "exception" even in that case.
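
To make the polling behaviour described above concrete, here is a minimal simulation sketch. It is not Taskcluster code: `Queue`, `run_task`, the tick-based timing and the `poll_every` parameter are all hypothetical names invented for illustration, and the rule that a cancelled task stays "exception" is modelled only as the comments above describe it.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical stand-in for the task queue (not the real Taskcluster API)."""
    cancelled: set = field(default_factory=set)
    resolution: dict = field(default_factory=dict)

    def cancel(self, task_id):
        # Cancelling resolves the task as "exception" immediately on the queue side.
        self.cancelled.add(task_id)
        self.resolution[task_id] = "exception"

    def is_cancelled(self, task_id):
        return task_id in self.cancelled

    def report_completed(self, task_id):
        # An already-cancelled task keeps its "exception" resolution,
        # even though the worker ran it to the end.
        self.resolution.setdefault(task_id, "completed")
        return self.resolution[task_id]


def run_task(queue, task_id, work_ticks, poll_every, cancel_at_tick=None):
    """Simulate a worker. Returns (ticks_of_work_done, final_resolution)."""
    done = 0
    for tick in range(1, work_ticks + 1):
        if tick == cancel_at_tick:
            queue.cancel(task_id)  # the user clicks "Cancel all jobs"
        # The worker only learns about the cancellation when it polls.
        if tick % poll_every == 0 and queue.is_cancelled(task_id):
            return done, queue.resolution[task_id]
        done += 1
    return done, queue.report_completed(task_id)


# Short task: it finishes before the next poll, so all work is done
# but the task is still resolved as "exception".
short = run_task(Queue(), "decision", work_ticks=5, poll_every=10, cancel_at_tick=2)

# Long task: the worker notices the cancellation at the first poll and stops early.
long = run_task(Queue(), "decision", work_ticks=30, poll_every=10, cancel_at_tick=2)
```

In this sketch the short task performs all 5 ticks of work yet ends as "exception", which matches the email in this bug. Any side effects of that work, such as creating downstream tasks, would already have happened by then. The longer the gap between polls relative to the task's remaining run time, the more likely this outcome becomes.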