Open Bug 1066272 Opened 10 years ago Updated 3 years ago

Display TaskCluster jobs that are expected to run but haven't yet been queued ("unscheduled" jobs)

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

People

(Reporter: lmandel, Unassigned)

As a release manager, I want to see what has completed and what is still outstanding in the build. tbpl lists the completed work but continues to add new outstanding work as the build progresses. I would highly prefer to see the entire list of work (all builds/tests) up front when the build kicks off.
This isn't something treeherder can easily solve without things changing in buildbot first. In the past there was some discussion about wanting to have a graph of all jobs and how they chain up, checked into mozilla-central, which is the basis for job scheduling. I don't know if there is a bug filed for it, or if it's something that is still on releng's roadmap. Chris, do you know if there was a bug filed for that?
Flags: needinfo?(catlee)
We recently discussed this again in one of our planning meetings. I'm hoping this is a project we can get started on soon. I filed this as bug 1067592.
Flags: needinfo?(catlee)
That's great, thank you :-)
Depends on: 1067592
Priority: -- → P5
Summary: Display all builds/tests when the build starts → Add way to display outstanding jobs for a push even before they are scheduled
There is a subtle difference for this bug. The term "unscheduled" needs to be clarified.
In the Buildbot implementation we have, a sendchange will schedule an undetermined number of test jobs. Even though those jobs are not in the database, we're expecting them to be scheduled. With TaskCluster, this is not a problem since the expectation of what will be scheduled is determined from the beginning.
We could make this bug focus on "give Treeherder the ability to show the *dependent* jobs we're expecting to schedule".
I want to make this clarification since Treeherder now gives you the button "Add new jobs", which will show you what has not been scheduled *and* we knew was not going to be scheduled (e.g. platforms missed in try syntax). There are some subtleties in this last statement, but I won't explain all of them as they're tangential to my main point.
In short:
* give TH the ability to show jobs that are marked to be run in the task graph (should this be an "expected jobs" state?)
* do not invest in making this work for Buildbot and instead move to in-tree scheduling through TC/BBB (which would give us the ability for Buildbot jobs since we're fixing it for TC in the point above)
* file a separate bug to make truly "unscheduled jobs" another state (jobs that were not going to run)
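To make the distinction in the comment above concrete, here is a minimal sketch of how the three groups could be separated. It is not Treeherder code: the function name, the group names ("queued", "expected", "not_requested") and the job labels are all hypothetical, and it assumes the full task graph for a push is known up front, as described for TaskCluster.

```python
# Hypothetical sketch: none of these names come from Treeherder or TaskCluster.

def classify_jobs(task_graph_labels, queued_labels, all_known_labels):
    """Split job labels into the three groups discussed in the comment above.

    task_graph_labels: jobs the task graph says will run for this push.
    queued_labels:     jobs that already exist (queued, running or done).
    all_known_labels:  every job that could be requested on this repo.
    """
    graph = set(task_graph_labels)
    queued = set(queued_labels) & graph
    return {
        # Already visible today.
        "queued": sorted(queued),
        # In the task graph but not queued yet: the "expected jobs" state.
        "expected": sorted(graph - queued),
        # Never part of the graph: what "Add new jobs" offers.
        "not_requested": sorted(set(all_known_labels) - graph),
    }


result = classify_jobs(
    task_graph_labels=["toolchain-clang", "build-linux64", "test-linux64-xpcshell"],
    queued_labels=["toolchain-clang"],
    all_known_labels=[
        "toolchain-clang",
        "build-linux64",
        "test-linux64-xpcshell",
        "build-android",
    ],
)
print(result)
```

In this example only the toolchain job is queued, the build and test jobs fall into "expected", and the Android build (never requested) falls into "not_requested".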
Summary: Add way to display outstanding jobs for a push even before they are scheduled → Add way to display jobs expected to be run (different than unscheduled jobs)
> Using the term unscheduled needs to be clarified.
Interestingly, in taskcluster, jobs that the decision task created but that are waiting for dependencies to be finished are "unscheduled", making the summary still confusing, because I was actually about to file the same bug as "Treeherder should display unscheduled jobs".
This is increasingly becoming a problem, because when we now land toolchain changes, mostly only the toolchains show up on treeherder until they're finished and the normal firefox builds can fire up. This is doubly confusing with the percentage of completeness that treeherder displays. And bug 1383880 will make things worse, because it will be even harder to figure out if jobs are not showing up because they're never going to happen or because they're eventually going to happen.
Could this be reprioritized?
Does Taskcluster send pulse messages for those "unscheduled" jobs at the moment? If not, that's a necessary first step before we'd be able to display them.
Jonas, do you know the answer to comment 7?
Flags: needinfo?(jopsen)
When a task is defined, a message is published to:
exchange/taskcluster-queue/v1/task-defined
source: https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/events
Note: There are two ways to get this data:
A) Tweak taskcluster-treeherder to also send messages to treeherder when tasks are defined. I think this was accidentally left out; it was in the original design proposal, and the ingestion schema did at one point define a state for unscheduled tasks. I assume it was removed from tc-th because at the time of implementation TH probably failed to ignore such messages and, hence, they created problems (just guessing).
B) Delete taskcluster-treeherder and ingest messages from taskcluster directly in treeherder. taskcluster-treeherder is effectively just rewriting messages specifically for treeherder, and TC is probably the primary source of input for TH these days. Given that TC exchanges are stable and all messages are verified against documented schemas, this wouldn't create tons of bugs.
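As a rough illustration of either option, the sketch below turns a task-defined style message into a job record carrying an "unscheduled" state. The message shape (a `status` object with `taskId`, `taskGroupId` and `state`) is an assumption to check against the documented queue schemas; the function name, the output fields, the state mapping and the example identifiers are hypothetical and do not reflect taskcluster-treeherder or Treeherder code.

```python
# Assumed mapping from Taskcluster queue task states to the job states a
# Treeherder-like UI might display. "unscheduled" is the state this bug asks
# to surface; the mapping is illustrative only.
TC_STATE_TO_JOB_STATE = {
    "unscheduled": "unscheduled",
    "pending": "pending",
    "running": "running",
    "completed": "completed",
    "failed": "completed",
    "exception": "completed",
}


def job_from_task_defined(message):
    """Build a minimal job record from a task-defined style message.

    `message` is assumed to carry a `status` object; verify the exact
    fields against the published exchange schema before relying on them.
    """
    status = message["status"]
    return {
        "task_id": status["taskId"],
        "task_group_id": status["taskGroupId"],
        "state": TC_STATE_TO_JOB_STATE[status["state"]],
    }


# Placeholder identifiers, not real task IDs.
example_message = {
    "status": {
        "taskId": "example-task-id",
        "taskGroupId": "example-group-id",
        "state": "unscheduled",
        "runs": [],
    }
}
job = job_from_task_defined(example_message)
print(job)
```

Whether this translation lives in taskcluster-treeherder (option A) or inside Treeherder itself (option B) does not change the shape of the step; only where the message is consumed.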
Flags: needinfo?(jopsen)
Depends on: 1395254
Morphing summary to be about TaskCluster specifically -- this isn't something we can fix for buildbot, and for people submitting via our REST API (eg AWFY, autophone), we can possibly handle this later but in another bug.
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #9)
> Note: There are two ways to get this data:
...
> B) Delete taskcluster-treeherder and ingest messages from taskcluster directly in treeherder.
We should do this - it would be great to remove another layer of abstraction. Filed bug 1395254. That will need to be fixed before progress can be made here. Though we only have ~2 people working on Treeherder at the moment (including myself), so it might be a while.
Component: Treeherder → Treeherder: Data Ingestion
Priority: P5 → P3
Summary: Add way to display jobs expected to be run (different than unscheduled jobs) → Display TaskCluster jobs that are expected to run but haven't yet been queued ("unscheduled" jobs)
<nalexander> It's very hard to interpret whatever TC + TH are doing with dependent toolchain tasks :( I guess this is cause TH doesn't show pending tasks, which I think will need to change as we get more and more complex dependency graphs out of the TC graph mechanism.
<nalexander> As an example, try to figure out what the jobs shown in https://treeherder.mozilla.org/#/jobs?repo=try&revision=d96ee5e6c403e048e1fccc81ac1ca3a2179e1c41&selectedJob=134963456 have to do with the jobs requested by the try build.
<nalexander> I asked for a build-android-gradle job, which exists in TC (I think?).
<nalexander> but its dependent job chain is not complete, so it's not scheduled yet, and that means it doesn't appear in TH.
I don't honestly know which side of the issue is responsible, but I think we need to improve it. It's very hard to interpret the try pushes.
I've added this bug to the shortlist of things being considered for Q4. Thank you for the example use-case for where this bug would improve the workflow :-)
Priority: P3 → P2
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #9)
> When a task is defined a message is published to:
> exchange/taskcluster-queue/v1/task-defined
> source:
> https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/events
>
> Note: There are two ways to get this data:
> A) Tweak taskcluster-treeherder to also send messages to treeherder when
> tasks are defined.
> I think this was accidentally left out, it was in the original design
> proposal, and the ingestion schema did at
> one point define a state for unscheduled tasks. I assume it was removed
> from tc-th because at the time of
> implementation TH probably failed to ignore such messages and, hence,
> they created problems (just guessing).
The schema still has unscheduled as a status. It was intentionally left out of taskcluster-treeherder because showing unscheduled tasks in the TH UI created a lot of confusion about why these things were sitting out there. At the time, it was not important, and leaving it out was consistent with the behavior before pulse ingestion was introduced.
> B) Delete taskcluster-treeherder and ingest messages from taskcluster
> directly in treeherder.
> Since taskcluster-treeherder is effectively just rewriting messages
> specifically for treeherder,
> and TC is probably the primary source of input for TH these days.
> Given that TC exchanges are stable and all messages are verified against
> documented schemas,
> this wouldn't create tons of bugs.
This is in the works as far as I understand it, but if this is causing too much confusion, and we're still a ways out from this being in TH, we could certainly have taskcluster-treeherder send the messages as long as TH did something to filter them out like we do for superseded. This should probably be done either way.
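The "filter them out like we do for superseded" idea could look roughly like the sketch below. It is a sketch under stated assumptions, not Treeherder's actual filter: the job fields (`state`, `result`), the helper names and the default-hidden set are hypothetical.

```python
# Hypothetical default: hide both superseded and unscheduled jobs unless the
# user opts in, mirroring the behaviour described for superseded jobs.
HIDDEN_BY_DEFAULT = frozenset({"superseded", "unscheduled"})


def display_status(job):
    """Use the result for finished jobs and the state for everything else."""
    return job["result"] if job["state"] == "completed" else job["state"]


def visible_jobs(jobs, also_show=()):
    """Return the jobs to display; `also_show` re-enables hidden statuses."""
    hidden = HIDDEN_BY_DEFAULT - set(also_show)
    return [job for job in jobs if display_status(job) not in hidden]


jobs = [
    {"name": "toolchain-clang", "state": "running", "result": "unknown"},
    {"name": "build-linux64", "state": "unscheduled", "result": "unknown"},
    {"name": "old-build", "state": "completed", "result": "superseded"},
]

default_view = [job["name"] for job in visible_jobs(jobs)]
expanded_view = [job["name"] for job in visible_jobs(jobs, also_show=["unscheduled"])]
print(default_view)
print(expanded_view)
```

With a filter like this in place, sending unscheduled jobs to TH would not change the default view, which addresses the earlier concern about confusing entries sitting in the UI.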
See Also: → 1062827
Priority: P2 → P3

:sclements, is this something we can consider adding to our P2 list?

Flags: needinfo?(sclements)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13)

> :sclements, is this something we can consider adding to our P2 list?

Do you mean prioritize it for this quarter or next? I'll need to look into what this will entail.

Has any information changed from the last comments from 3 years ago?

Flags: needinfo?(sclements)

One major difference is we don't use buildbot at all anymore. It's all in taskcluster.

Another important fact is that this was not possible until last summer, when we moved the taskcluster-treeherder service into Treeherder.
Before that, we could not subscribe to the exchange for it without first modifying that service.

Here are some docs about unscheduled tasks:
https://docs.taskcluster.net/docs/reference/platform/queue/task-life-cycle#unscheduled-tasks

Does anyone know whether the task-defined exchange is the one that covers unscheduled tasks?
https://docs.taskcluster.net/docs/reference/platform/queue/exchanges

Bug 1653050 made things significantly worse.

(In reply to Mike Hommey [:glandium] from comment #17)

> Bug 1653050 made things significantly worse.

Example:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=Q9wOrHkVSne9yXHUXSkojg.1&revision=f6c322cd0e96fbf9436edfa67d473027aa4cb321
This push was supposed to trigger builds, but it looks like it didn't. In fact, it did, but they depend on a task that has failed... in another push. Here it's obvious, but in many cases, if you don't pay close attention, it's easy to miss such things happening.
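To illustrate the situation in the example, here is a minimal sketch of how a UI could flag unscheduled tasks whose direct dependency has failed. The data layout (a dict of label to `state` and `dependencies`), the function name and the labels are hypothetical; only direct dependencies are checked, so tasks blocked transitively would need a graph walk on top of this.

```python
def blocked_tasks(tasks):
    """Map each unscheduled task to its failed direct dependencies.

    `tasks` maps a label to {"state": str, "dependencies": [labels]}.
    Dependencies may belong to another push, as in the example above.
    """
    bad_states = {"failed", "exception"}
    blocked = {}
    for label, task in tasks.items():
        if task["state"] != "unscheduled":
            continue
        culprits = [
            dep
            for dep in task["dependencies"]
            if tasks.get(dep, {}).get("state") in bad_states
        ]
        if culprits:
            blocked[label] = culprits
    return blocked


tasks = {
    # Imagine this toolchain task ran (and failed) in a different push.
    "toolchain-clang": {"state": "failed", "dependencies": []},
    "build-linux64": {"state": "unscheduled", "dependencies": ["toolchain-clang"]},
    "test-linux64-xpcshell": {"state": "unscheduled", "dependencies": ["build-linux64"]},
}

blocked = blocked_tasks(tasks)
print(blocked)
```

Here the build is reported as blocked by the failed toolchain task, while the test job is not listed because its only direct dependency is itself still unscheduled.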
