Closed Bug 1346404 Opened 7 years ago Closed 6 years ago

Warn when attempting to retrigger a taskcluster job when the task doesn't exist

Categories

(Tree Management :: Treeherder, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1420482

People

(Reporter: botond, Unassigned)

Details

STR:
  1. Load https://treeherder.mozilla.org/#/jobs?repo=try&revision=73e7b2d116a6
  2. Ensure you're logged in
  3. Select a build job; I was testing with "Windows 2012 opt".
  4. Press "Retrigger"
  5. Observe that you get a green "Retrigger request sent" notification

ER:

  The job is retriggered.

AR:

  The job is not retriggered.
I'm able to retrigger buildbot jobs on your try push, but not the taskcluster jobs. I requested new Windows 2012 build jobs by using the Add New Jobs feature, and that appears to be working.


As far as I can tell, the network request sent when you retrigger a taskcluster job returns a successful 200 status, although I'm not sure that the response JSON is what it should be:
{"next":null,"previous":null,"results":[]}

TC-based builds and jobs both seem to be affected.
Summary: Retrigger request results in "Retrigger request sent", but job is never retriggered → Retrigger request results in "Retrigger request sent", but taskcluster job is never retriggered
Bstack suggested pinging you, Greg :)
Flags: needinfo?(garndt)
The issue here is that the try push was done on January 20th. Taskcluster tasks for try pushes expire after 28 days (non-try tasks are kept for ~1 year, IIRC), so acting on the push after its tasks have expired (retriggering, for example) won't work.

The real issue here is that none of the responses to the retrigger request indicate that anything went wrong (Ed had to dig through server logs to find that the task was missing).

We can either close this bug as a wontfix and wait for the retrigger action to be switched from using pulse to calling the taskcluster APIs directly, or we can morph the bug into digging out a more helpful error message when this happens.
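
For illustration only, here is a rough sketch of the kind of pre-check that could produce a more helpful message. It is not Treeherder code: the function names, the message text, and the exact retention constants are assumptions (28 days for try as stated above, and 365 days standing in for the "~1 year" figure).

```python
from datetime import datetime, timedelta, timezone

# Assumed retention periods, taken from the figures quoted in this bug.
TRY_RETENTION = timedelta(days=28)
DEFAULT_RETENTION = timedelta(days=365)  # stand-in for "~1 year"


def task_has_expired(push_time, repo, now=None):
    """Return True if tasks for a push made at push_time have likely expired."""
    now = now or datetime.now(timezone.utc)
    retention = TRY_RETENTION if repo == "try" else DEFAULT_RETENTION
    return now >= push_time + retention


def retrigger_warning(push_time, repo, now=None):
    """Return a warning string for the UI, or None if a retrigger may work."""
    if task_has_expired(push_time, repo, now):
        return "Cannot retrigger: the taskcluster task has probably expired."
    return None
```

With a try push from January 20th and a retrigger attempt in March, `retrigger_warning` would return the warning instead of letting a silent no-op through.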
Flags: needinfo?(garndt)
Summary: Retrigger request results in "Retrigger request sent", but taskcluster job is never retriggered → Warn when attempting to retrigger a taskcluster job when the task doesn't exist
I feel like we should probably be expiring the treeherder jobs when the taskcluster ones do. Taskcluster jobs already set an expiry time internally, so it should be trivial to pass this to treeherder, store it as part of the job model, and then make cycle data expire all jobs that have expired (in addition to the ones submitted more than 4 months ago, which we expire currently).
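
A minimal sketch of that selection rule, assuming a hypothetical `task_expires` field on the job and approximating "4 months" as 120 days. The `Job` class and `jobs_to_cycle` function are made up for illustration and do not mirror the real Treeherder job model or cycle data command.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "4 months" approximated as 120 days.
CYCLE_INTERVAL = timedelta(days=120)


@dataclass
class Job:
    id: int
    submit_time: datetime
    # Hypothetical field: expiry time passed along by taskcluster, if any.
    task_expires: Optional[datetime] = None


def jobs_to_cycle(jobs, now):
    """Select jobs that are older than the cycle interval or whose task has expired."""
    cutoff = now - CYCLE_INTERVAL
    return [
        job for job in jobs
        if job.submit_time < cutoff
        or (job.task_expires is not None and job.task_expires <= now)
    ]
```

Jobs without an expiry (buildbot jobs, say) would still only be removed by the existing age rule.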
I can't decide whether we should do that or not.

Even if the tasks/logs have expired, there's still potentially useful information in Treeherder that people can use. Also, if some jobs disappear (due to being deleted) and others remain (e.g. buildbot jobs), might that be confusing?

Open to thoughts! :-)
Perhaps we should have an in-between state in treeherder where we can indicate that a task has expired and logs/retries won't be available, but the pass/fail information or whatever is stored in treeherder itself remains available?
Component: Treeherder → Treeherder: Job Triggering & Cancellation
So part of the problem here is that currently Taskcluster retriggers are performed by making a request to the Treeherder API, which then generates a pulse message, which is picked up by mozilla-taskcluster, that then attempts the retrigger. With this approach there's no easy way to provide any feedback as to whether the retrigger succeeded.

Bug 1420482 aims to make the Treeherder retrigger perform taskcluster retriggers client side, as already happens for backfill/custom actions etc.

At that point, I'm pretty sure the error message will be returned by the API call made in the client and shown in the UI, reducing the potential for confusion.
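
Roughly, the difference is that a direct call can fail visibly. The sketch below shows the idea in Python for brevity, although the real client is JavaScript. `RetriggerError`, `retrigger`, and the `call_api` parameter are hypothetical names, not part of any Treeherder or taskcluster API.

```python
class RetriggerError(Exception):
    """Hypothetical error raised when the direct API call fails."""


def retrigger(task_id, call_api):
    """Call the API directly and turn the outcome into a (level, message) notification."""
    try:
        call_api(task_id)
    except RetriggerError as exc:
        # The failure reason reaches the user instead of being lost in server logs.
        return ("error", f"Retrigger failed: {exc}")
    return ("success", "Retrigger request sent")
```

With the pulse-based flow there is no equivalent of the `except` branch, which is why the green notification appears even when the task no longer exists.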
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Component: Treeherder: Job Triggering & Cancellation → TreeHerder