Closed Bug 1213520 Opened 7 years ago Closed 5 years ago

Cancelling Try push marked all builds as cancelled, actually cancelled none and ran tests

Categories

(Tree Management :: Treeherder, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jesup, Unassigned)

Details

I cancelled a Try build (early on - <20 min), and it failed to cancel.

It shows them pink (cancelled), but they finished anyway, then started and ran all their tests.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bc6d86fa493

I cancelled the builds no more than 20-25 minutes in. I'm pretty sure (very sure) they weren't really cancelled; philor commented that the request may have been lost somewhere in the steps between treeherder and whatever actually cancels a job.

Per KWierso, see bug 1168148 (implement the pulse listener stuff) and bug 1212967 (pulse listener has been having problems lately)
According to https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/9bc6d86fa493, the Mac builds were cancelled, at 21:44:22, 20-60 minutes before the Linux builds finished. So it's not the case that Pulse was down, or buildapi was broken, or whatever sits between them was down; it's just the case that treeherder's cancellation is now "okay, I'll open the window and shout out of it 'cancel these builds!' and then close the window and mark them as having been cancelled."
The "cancel all" API is not managed by pulse_actions. This is why bug 1168148 is not closed.
Bug 1212967 is likewise not relevant here.

I'm completely puzzled as to why only two of the builds got cancelled instead of all of them.

jesup: did you see an LDAP prompt? buildapi does not let me see beyond 100 requests [1]

[1] https://secure.pub.build.mozilla.org/buildapi/self-serve/jobs
I don't remember if I saw an LDAP prompt on that cancel, but I was logged in and had been doing a series of trys and retriggers and such.  I'm sure I saw the "do you want to" prompt, which it does before a cancel.

I note it's occasionally annoying that I have to log in twice with treeherder to do things.
Hi Ed,
Job cancelling is not working properly.

I cancelled Linux buildbot jobs in here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=30eb0ab890a2
https://treeherder.mozilla.org/#/jobs?repo=try&revision=be8e9ae01fba
However test jobs were scheduled.

We're being told that jobs are being cancelled but the call to BuildAPI is not going through.

This is not super urgent but developers are trying to save resources and it is not working.
I don't know if this issue happens on other repos as well.
Flags: needinfo?(emorley)
The buildapi request happens in the client, direct to buildapi.

I just requested two retriggers on https://treeherder.mozilla.org/#/jobs?repo=try&revision=30eb0ab890a2 and then once they appeared, cancelled both - and it worked.

If this occurs again, can you please capture the request and response between the UI and buildapi?

At a guess, I would say there was some issue either with the requests to buildapi (for example, timeouts when making them on your connection) or with buildapi actioning them.
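Since the request is made by the browser, the browser's network panel is the most direct place to capture it. For reproducing a failing call outside the browser, a minimal sketch along the following lines might help. It is not Treeherder or buildapi code: `capture_exchange` is a hypothetical helper, and the URL in the usage note is a placeholder rather than a real buildapi endpoint. It records both sides of one HTTP exchange and distinguishes "the server answered with an error" from "no answer at all" (such as a timeout).

```python
import urllib.error
import urllib.request


def capture_exchange(request, opener=urllib.request.urlopen, timeout=30):
    """Send `request` and return a dict describing both sides of the exchange.

    `opener` is injectable so the helper can be exercised without a network.
    """
    record = {
        "method": request.get_method(),
        "url": request.full_url,
        "request_headers": dict(request.header_items()),
        "status": None,
        "response_body": None,
        "error": None,
    }
    try:
        with opener(request, timeout=timeout) as response:
            record["status"] = response.status
            record["response_body"] = response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401 or 500).
        record["status"] = exc.code
        record["response_body"] = exc.read().decode("utf-8", "replace")
    except OSError as exc:
        # No usable answer: DNS failure, refused connection, timeout, etc.
        record["error"] = repr(exc)
    return record
```

Usage would look something like `capture_exchange(urllib.request.Request("https://example.invalid/cancel", method="POST"))`. A record with `status` set points at how buildapi handled the request, while a record with only `error` set points at the connection.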

(In reply to Randell Jesup [:jesup] from comment #3)
> I note it's occasionally annoying that I have to log in twice with
> treeherder to do things.

This is due to the way buildapi (the buildbot side) works. With bug 1032163 and bug 1168148 this would go away.
Flags: needinfo?(emorley)
Note bug 1163802 would likely make for a clearer UX - since at the moment jobs are marked pre-emptively as cancelled in the UI even if buildapi didn't cancel them.
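To illustrate the difference between marking a job as cancelled pre-emptively and marking it only once the backend confirms, here is a rough sketch. It is purely illustrative and does not reflect Treeherder's actual code or what bug 1163802 proposes: `Job`, `cancel_job`, the state names, and the `send_cancel` callback are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    job_id: int
    state: str = "running"


def cancel_job(job: Job, send_cancel: Callable[[int], bool]) -> Job:
    """Mark `job` cancelled only if the backend confirms the cancellation.

    `send_cancel` is a hypothetical callback that returns True when the
    backend accepted the cancel request and False when it refused it.
    """
    # Show an intermediate state instead of claiming success up front.
    job.state = "cancel_pending"
    try:
        confirmed = send_cancel(job.job_id)
    except OSError:
        # Timeouts and connection failures count as "not cancelled".
        confirmed = False
    job.state = "cancelled" if confirmed else "cancel_failed"
    return job
```

With this shape, a lost or rejected request leaves the job visibly in `cancel_failed` rather than pink, so the user knows to retry.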
Component: Treeherder → Treeherder: Job Triggering & Cancellation
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE
Component: Treeherder: Job Triggering & Cancellation → TreeHerder