1346404 - Warn when attempting to retrigger a taskcluster job when the task doesn't exist

botond (Reporter)

Description

7 years ago

STR:
  1. Load https://treeherder.mozilla.org/#/jobs?repo=try&revision=73e7b2d116a6
  2. Ensure you're logged in
  3. Select a build job; I was testing with "Windows 2012 opt".
  4. Press "Retrigger"
  5. Observe that you get a green "Retrigger request sent" notification

ER:

  The job is retriggered.

AR:

  The job is not retriggered.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 1

7 years ago

I'm able to retrigger buildbot jobs on your try push, but not the taskcluster jobs. I requested new Windows 2012 build jobs by using the Add New Jobs feature, and that appears to be working.

As far as I can tell, the network request made when you retrigger a taskcluster job returns a successful 200 status, although I'm not sure the response JSON is what it should be:
{"next":null,"previous":null,"results":[]}

TC-based builds and jobs both seem to be affected.

Summary: Retrigger request results in "Retrigger request sent", but job is never retriggered → Retrigger request results in "Retrigger request sent", but taskcluster job is never retriggered

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 2

7 years ago

Bstack suggested pinging you, Greg :)

Flags: needinfo?(garndt)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 3

7 years ago

The issue here is that the try push was done on January 20th. Try pushes have taskcluster tasks that expire after 28 days (non-try tasks are kept for ~1 year, IIRC), so any action on the push after its tasks expire won't work.
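A minimal sketch of the date check this implies, assuming the retention periods quoted above (28 days for try, roughly 365 days otherwise) and a hypothetical `retrigger_warning` helper; real tasks carry their own expiry timestamp, so this only estimates it from the push date:

```python
from datetime import date, timedelta
from typing import Optional

# Retention periods taken from the comment above; the ~1 year figure for
# non-try branches was given as "IIRC", so both values are assumptions.
TRY_TASK_RETENTION = timedelta(days=28)
NON_TRY_TASK_RETENTION = timedelta(days=365)


def task_expiry(push_date: date, is_try: bool) -> date:
    """Estimate the date on which a push's Taskcluster tasks expire."""
    retention = TRY_TASK_RETENTION if is_try else NON_TRY_TASK_RETENTION
    return push_date + retention


def retrigger_warning(push_date: date, today: date, is_try: bool) -> Optional[str]:
    """Return a warning string if the tasks have probably expired, else None."""
    expires = task_expiry(push_date, is_try)
    # Treat the expiry day itself as already expired, erring on the side of warning.
    if today >= expires:
        return (f"Tasks for this push probably expired on {expires.isoformat()}; "
                "a retrigger will not find them.")
    return None


if __name__ == "__main__":
    # The year is assumed; the comment only says the push was on January 20th.
    push = date(2017, 1, 20)
    print(task_expiry(push, is_try=True))  # 2017-02-17
    print(retrigger_warning(push, date(2017, 3, 1), is_try=True))
```

With a January 20th try push, 28 days of retention puts the expiry on February 17th, so any retrigger attempted after that date would hit a missing task.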

The real issue here is that none of the responses to the retrigger request indicate that anything went wrong (Ed had to dig through server logs to find that the task was missing).

We can either close this bug as WONTFIX and wait for the retrigger action to be switched from using Pulse to calling the taskcluster APIs directly, or we can morph this bug into surfacing a more helpful error message when this happens.
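One way to surface such an error is to check that the task still exists before requesting the retrigger. The sketch below is hypothetical: `lookup_status` and `submit` are injected stand-ins, and it assumes that fetching an expired task's definition from the Taskcluster queue returns HTTP 404.

```python
from typing import Callable


class TaskMissingError(Exception):
    """Raised when the task to retrigger can no longer be found."""


def retrigger(task_id: str,
              lookup_status: Callable[[str], int],
              submit: Callable[[str], None]) -> None:
    """Hypothetical retrigger wrapper that verifies the task exists first.

    lookup_status: returns the HTTP status code of fetching the task
        definition (assumed here: 200 if it exists, 404 once it has expired).
    submit: performs the actual retrigger request.
    """
    status = lookup_status(task_id)
    if status == 404:
        # Raise a clear, user-facing error instead of a silent "request sent".
        raise TaskMissingError(
            f"Task {task_id} does not exist (it may have expired); not retriggering.")
    if status != 200:
        raise RuntimeError(f"Unexpected status {status} looking up task {task_id}")
    submit(task_id)
```

A UI built on this could catch `TaskMissingError` and show it as a warning notification in place of the green "Retrigger request sent" one.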

Flags: needinfo?(garndt)

Summary: Retrigger request results in "Retrigger request sent", but taskcluster job is never retriggered → Warn when attempting to retrigger a taskcluster job when the task doesn't exist

William Lachance (:wlach)

Comment 4

7 years ago

I feel like we should probably expire the treeherder jobs when the taskcluster tasks expire. Taskcluster tasks already set an expiry time internally, so it should be trivial to pass this to treeherder, store it as part of the job model, and then make cycle data expire all jobs that have expired (in addition to those submitted more than 4 months ago, which we do currently).
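A minimal sketch of the proposed cycle-data rule, assuming a hypothetical `expires` field copied from the task onto the job (left empty for non-taskcluster jobs such as buildbot ones) and approximating "4 months" as 120 days:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# "More than 4 months ago" approximated as 120 days for this sketch.
MAX_JOB_AGE = timedelta(days=120)


@dataclass
class Job:
    job_id: int
    submit_time: datetime
    # Hypothetical field: the expiry time taken from the Taskcluster task,
    # or None for jobs that have no task (e.g. buildbot jobs).
    expires: Optional[datetime] = None


def jobs_to_cycle(jobs: List[Job], now: datetime) -> List[Job]:
    """Select jobs that are too old OR whose underlying task has expired."""
    cutoff = now - MAX_JOB_AGE
    return [
        job for job in jobs
        if job.submit_time < cutoff
        or (job.expires is not None and job.expires <= now)
    ]
```

Keeping the age cutoff as a separate condition means jobs without an expiry time continue to be cycled exactly as they are now.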

Ed Morley [:emorley]

Comment 5

7 years ago

I can't decide whether we should do that or not.

Even if the tasks/logs have expired, there's still potentially useful information in Treeherder that people can use. Also, if some jobs disappear (due to being deleted) and others remain (e.g. buildbot jobs), might that be confusing?

Open to thoughts! :-)

Brian Stack [:bstack]

Comment 6

7 years ago

Perhaps we should have an in-between state in treeherder where we can indicate that a task has expired and logs/retries won't be available, but the pass/fail information or whatever is stored in treeherder itself remains available?
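The in-between state could be modelled roughly as below. This is a sketch under assumptions: `JobAvailability`, `available_actions`, and the action names are all hypothetical, and the expiry time is assumed to be stored on the job as suggested in Comment 4.

```python
from datetime import datetime
from enum import Enum
from typing import Optional, Set


class JobAvailability(Enum):
    LIVE = "live"        # task still exists: logs and retriggers work
    EXPIRED = "expired"  # task gone: only the result stored in Treeherder remains


def availability(task_expires: Optional[datetime], now: datetime) -> JobAvailability:
    """Jobs without a recorded expiry (e.g. buildbot jobs) are treated as live."""
    if task_expires is not None and task_expires <= now:
        return JobAvailability.EXPIRED
    return JobAvailability.LIVE


def available_actions(task_expires: Optional[datetime], now: datetime) -> Set[str]:
    """Return the UI actions that still make sense for a job."""
    actions = {"view_result"}  # pass/fail data lives in Treeherder itself
    if availability(task_expires, now) is JobAvailability.LIVE:
        actions |= {"view_logs", "retrigger"}
    return actions
```

Unlike deleting expired jobs outright, this keeps every job visible and only hides the actions that would fail.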

Ed Morley [:emorley]

Updated

7 years ago

Component: Treeherder → Treeherder: Job Triggering & Cancellation

Ed Morley [:emorley]

Comment 7

6 years ago

So part of the problem here is that Taskcluster retriggers are currently performed by making a request to the Treeherder API, which generates a Pulse message that mozilla-taskcluster picks up before attempting the retrigger itself. With this approach there's no easy way to report back whether the retrigger succeeded.

Bug 1420482 aims to make the Treeherder retrigger perform taskcluster retriggers client side, like already occurs for backfill/custom actions etc.

At that point, I'm pretty sure the error message will be returned by the API call made in the client and shown in the UI, reducing the potential for confusion.
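The difference between the two designs can be sketched as follows. `publish` and `call_api` are hypothetical stand-ins for publishing the Pulse message and for the client calling Taskcluster directly; neither reflects the actual Treeherder or mozilla-taskcluster code.

```python
from typing import Callable


def retrigger_via_pulse(task_id: str, publish: Callable[[str], None]) -> str:
    """Fire-and-forget: the caller only learns that a message was published."""
    publish(task_id)
    # Whatever the consumer does later (including failing on a missing task)
    # is invisible here, so the UI can only say the request was sent.
    return "Retrigger request sent"


def retrigger_directly(task_id: str, call_api: Callable[[str], None]) -> str:
    """Client-side call: any error from the API propagates back to the UI."""
    try:
        call_api(task_id)
    except Exception as exc:  # e.g. a "task not found" error for an expired task
        return f"Retrigger failed: {exc}"
    return "Retrigger succeeded"
```

The Pulse path reports success even when the task has expired, while the direct path can show the failure to the user.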

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → DUPLICATE

Nobody; OK to take it and work on it (Assignee)

Updated

2 years ago

Component: Treeherder: Job Triggering & Cancellation → TreeHerder

Categories: Tree Management :: Treeherder, defect

Tracking: Not tracked

People: Reporter: botond; Assignee: Unassigned

Security: public
