Open Bug 1548781 Opened 6 years ago Updated 4 years ago

Report github status as "failure" if any task in group has failed

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

People

(Reporter: mhentges, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Attached image pending-github.png

While working on Fenix, I've noticed that the commit builds and PR builds are frequently shown with the yellow circle ("still processing"), yet when I follow the links to Taskcluster, I can see that at least one of the tasks has already failed.

I'm guessing that this is because taskcluster-github sees that there are still pending builds, so it reports that the overall build is pending (even though we already know that there are failures).

For example:

  1. (See attachment pending-github.png)
  2. (See attachment on-taskcluster.png)

Preferred behaviour:

  • If all tasks in group are green, report build as "succeeded"
  • If all tasks in group are either green or pending, report build as "pending"
  • If any task is failed/exception'd/etc, report build as "failed"
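
The three rules above can be sketched as a small aggregation function. This is only an illustration of the requested behaviour, not the taskcluster-github implementation: `groupStatus` and `FAILED_STATES` are hypothetical names, the input is assumed to be a list of Taskcluster task states (such as `completed`, `failed`, `exception`, `pending`, `running`), and the return values are GitHub commit status states.

```javascript
// Hypothetical helper, not part of taskcluster-github.
// Task states that should turn the whole group red immediately.
const FAILED_STATES = new Set(['failed', 'exception']);

// Collapse the states of all tasks in a group into one GitHub status state.
function groupStatus(taskStates) {
  // Any failed task decides the result, even if other tasks are still pending.
  if (taskStates.some((state) => FAILED_STATES.has(state))) {
    return 'failure';
  }
  // Report success only when there is at least one task and all are green.
  if (taskStates.length > 0 && taskStates.every((state) => state === 'completed')) {
    return 'success';
  }
  // Otherwise something is still unscheduled, pending, or running.
  return 'pending';
}
```

Checking for failures first is what makes a group with both failed and pending tasks report "failure" right away instead of staying yellow.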
Attached image on-taskcluster.png
Flags: needinfo?(bugzeeeeee)

Dustin, I believe it's the same problem with the binding: queueEvents.taskFailed('route.${this.context.cfg.app.statusTaskRoute}')
because they have a decision task.

To solve the immediate need, I would roll back Checks altogether (we are the only people using them anyways, and I am in a different project at the moment so no work is being done), use only Statuses everywhere and return to the old schedulerId bindings.

Flags: needinfo?(bugzeeeeee)
Depends on: 1533235

This issue might be minor enough to just wait until we more fully support checks, rather than rolling back. Is that OK with you, Mitchell?

Actually, another thing to do on Fenix's end would be to edit the decision task so that it adds the route field, with the value statuses, to the task definitions. It's not how it's supposed to be (editing things manually, I mean), but it should do for now.

Yeah, I'm sure we'll be okay without rolling back.
Just to make sure I understand correctly: owlish, you're referring to each task in the group having:

route:
  - statuses
  ...

Is that correct?

Flags: needinfo?(bugzeeeeee)

Correct!
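
A rough sketch of what that work-around might look like inside a decision task that builds task definitions as plain objects. Everything here is an assumption for illustration: `withStatusRoute` is a hypothetical helper, the route value `statuses` is taken from this thread, and the field is written as a `routes` array, which should be checked against the task schema actually in use (the thread above spells it `route`).

```javascript
// Hypothetical helper for a decision task, not an official Taskcluster API.
// Returns a copy of a task definition with the status route added once.
function withStatusRoute(taskDefinition, route = 'statuses') {
  // Assumption: routes live in a `routes` array on the task definition.
  const routes = taskDefinition.routes || [];
  return {
    ...taskDefinition,
    // Avoid duplicating the route if the task already carries it.
    routes: routes.includes(route) ? routes : [...routes, route],
  };
}

// Example: apply the helper to every task the decision task is about to create.
const tasks = [
  { metadata: { name: 'build' } },
  { metadata: { name: 'test' }, routes: ['index.example'] },
].map((task) => withStatusRoute(task));
```

Keeping the route name in one helper limits how much of the decision task has to change if the Checks work later alters the expected value.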

Flags: needinfo?(bugzeeeeee)

I’m also seeing this behavior in Servo (for example in https://community-tc.services.mozilla.com/tasks/groups/dBc6k0i9S_65B08EOsNXLw), and would also prefer the behavior proposed in the bug’s description.

Component: General → Services

I think checks sort of "automatically" behave this way. Maybe the answer here is to just move everyone to checks?

Checks is not viable yet for Servo, because of bug 1533235.

Homu tests PRs serially, so the time until the Status is resolved matters a lot. This bug is really hurting that cycle time. Worse, manually cancelling tasks seems to make the Status never be resolved.

Reading the code, it seems it already behaves as expected:

https://github.com/taskcluster/taskcluster/blob/5e6c2b635b916f495873e9b64fd07a69fb090946/services/github/src/handlers.js#L387-L389

Then I read comment 2 again and realized this is what happens: tc-gh is not getting task-specific messages for tasks created by Servo’s decision task. Owlish, are there downsides to the work-around you suggested in comment 4?

Flags: needinfo?(bugzeeeeee)

> Then I read comment 2 again and realized this is what happens: tc-gh is not getting task-specific messages for tasks created by Servo’s decision task. Owlish, are there downsides to the work-around you suggested in comment 4?

You have to manually edit the files... and also, when the work on checks is resumed and any changes made, you'd (likely) have to manually edit them again. In other words, your decision task code gets coupled a bit with our checks implementation. Other than that, I don't see any other downsides.

Flags: needinfo?(bugzeeeeee)

Note that this issue is about Statuses, but yeah the work-around is the same as with Checks.

The situation with Checks + decision tasks is tracked in bug 1533235. It is arguably worse, since we don’t get the benefit of taskGroupResolved messages: other tasks are ignored entirely.

> when the work on checks is resumed and any changes made,

When the time comes, could you please comment in bug 1533235 about those changes?

Blocks: github-bugs
No longer depends on: 1533235