Closed Bug 1174240 Opened 9 years ago Closed 9 years ago

Jobs stuck as pending on ash a87d5c89f249

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1059909

People

(Reporter: armenzg, Unassigned)

Details

Did you cancel pending jobs at all? If so, via self-serve directly, Treeherder's "cancel all", or ...?

(In reply to Armen Zambrano G. (:armenzg - Toronto) from comment #0)
> Could we check buildapi's complete jobs api and clear it?

We don't use that API at all - and I don't think we should start consuming yet another legacy API. This is likely a dupe of some of the already filed bugs around cancellation.

tl;dr buildbot not putting cancelled _pending_ jobs in builds-4hr is really unhelpful for Treeherder.
Flags: needinfo?(armenzg)
Component: Treeherder → Treeherder: Data Ingestion
Priority: -- → P2
Summary: Some jobs show as pending → Jobs stuck as pending on ash a87d5c89f249
I did not cancel any of those jobs.
Some of them seem to have been requested back in May.

What is the issue? People cancelling jobs from within buildapi without going through Treeherder?
Flags: needinfo?(armenzg)
And what is the root bug filed against buildbot, so I can feed my curiosity?
If they were not cancelled, I'm not sure what happened (other than buildbot DB issues, or the scheduler DB being manually pruned).

The complications are due to this...

For running jobs:
If the job is cancelled, it stops showing in builds-running.js and eventually appears in builds-4hr.js as completed but 'usercancel'.

For pending jobs:
If the job is cancelled, it stops showing in builds-pending.js but never appears in builds-4hr.js.

Since Treeherder actually tracks pending/running jobs (unlike TBPL which just pretended to, by overlaying them on the pushes on the client only), if we don't see the job appear in builds-4hr.js it will be stuck as pending forever.
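
The behaviour described above can be sketched as a small reconciliation loop. This is a toy model only, not Treeherder's ingestion code: the function name `ingest`, the dict used as a job store, and the plain sets of job ids standing in for the real builds-*.js payloads are all assumptions made for illustration.

```python
# Toy model of ingestion; every name here is hypothetical.
def ingest(store, pending, running, completed):
    """Update job states in `store` from one poll of the three feeds.

    pending / running: sets of job ids seen in builds-pending / builds-running.
    completed: dict of job id -> result seen in builds-4hr.
    """
    for job_id in pending:
        store.setdefault(job_id, "pending")
    for job_id in running:
        store[job_id] = "running"
    for job_id, result in completed.items():
        store[job_id] = result
    # A job that disappears from every feed is never touched again.
    return store


store = {}
# Poll 1: job A is running, job B is pending.
ingest(store, pending={"B"}, running={"A"}, completed={})
# Both get cancelled. Poll 2: A shows up in builds-4hr as 'usercancel',
# while B vanishes from builds-pending and appears nowhere else.
ingest(store, pending=set(), running=set(), completed={"A": "usercancel"})
print(store)  # {'B': 'pending', 'A': 'usercancel'}
```

Job B keeps its last known state because no feed ever reports a final result for it, which is the "stuck as pending forever" case.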

It's my understanding that this is a limitation of how the buildbot scheduler DB works (or has been implemented at Mozilla). There is no bug filed afaik, since I seem to remember being told this wasn't easily fixable.

We currently try to manage this by actively marking the job as cancelled in Treeherder's DB, when someone uses the cancel button. 

However:
1) Using the cancel-all button (vs cancel one at a time) is broken iirc
2) People can circumvent this by going straight to self-serve (or say if jobs are manually pruned from the scheduler DB)
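
A rough sketch of the workaround and of how it gets bypassed, under stated assumptions: the function names, the dict-based job store, and the `send_cancel_request` callback are hypothetical stand-ins, not Treeherder's or self-serve's real interfaces. Restricting the local update to pending jobs is also an assumption, based on running jobs eventually being reported through builds-4hr anyway.

```python
# Hypothetical sketch of the cancel-button workaround; not Treeherder's real code.
def cancel_via_treeherder(store, job_id, send_cancel_request):
    """Ask the backend to cancel, then record the cancellation locally."""
    send_cancel_request(job_id)
    if store.get(job_id) == "pending":
        # builds-4hr will never report this job, so record the result ourselves.
        store[job_id] = "usercancel"


def cancel_via_self_serve(job_id, send_cancel_request):
    """Cancellation that bypasses Treeherder: the local store is never updated."""
    send_cancel_request(job_id)


sent = []  # stands in for the requests sent to the backend
jobs = {"B": "pending", "C": "pending"}
cancel_via_treeherder(jobs, "B", sent.append)
cancel_via_self_serve("C", sent.append)
print(jobs)  # {'B': 'usercancel', 'C': 'pending'}
```

Both cancellations reach the backend, but only the one made through Treeherder updates the local state; job C is left pending.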

Now bug 1059909 is filed for improving our handling of this, but it's a case of finding time to fix it. It's also pretty wearying, since it's "yet another annoying issue we're having to work around due to the awfulness that is builds-{pending,running,4hr}". Also, once we start using buildbot less and less this problem will go away on its own, so it's hard to justify spending too much time on it really.
I think we could dupe this against bug 1059909.

I will continue chatting there.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE