Closed Bug 1052397 Opened 10 years ago Closed 7 years ago

Display a rough "estimated time to completion" for each push

Categories

(Tree Management :: Treeherder, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: Sylvestre, Unassigned)

References

()

Details

We, the release team, use TBPL/Treeherder to decide which changeset to use when starting a beta build.

However, neither product shows any progress indicator, so we have to guess when the whole process has completed.

The perfect solution would be:
* ETA: 2 hours, 10 minutes

But we would be happy with a basic "42% completed"
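A basic percentage like this could be derived from job counts alone. The sketch below is hypothetical: the `Job` record and its `state` values are illustrative stand-ins, not Treeherder's data model. It also shows the limitation of the approach, since jobs that have not been scheduled yet are not counted.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    state: str  # hypothetical: "pending", "running" or "completed"


def percent_completed(jobs):
    """Share of a push's *known* jobs that have completed, rounded down.

    Jobs that have not been scheduled yet (e.g. tests a build will
    trigger later) are invisible here, so the figure can drop as new
    jobs appear.
    """
    if not jobs:
        return 0
    done = sum(1 for job in jobs if job.state == "completed")
    return done * 100 // len(jobs)


jobs = [
    Job("build", "completed"),
    Job("mochitest-1", "running"),
    Job("mochitest-2", "pending"),
    Job("xpcshell", "completed"),
]
print(f"{percent_completed(jobs)}% completed")  # 50% completed
```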

FYI, at some point we plan to automatically retrieve the last correct changeset (cf. bug 1040759).
This would also help the sheriffs, since we have to answer almost the same list of questions when deciding which changeset to merge from an integration repo to mozilla-central. That said, an ETA is not the only thing that determines which changeset to pick. We need to know:
* Has the push completed?
* If not, the ETA of the push being complete.
* Was the push successful? (Even if the push has not completed, it would be useful to show any failures so far, rather than waiting for all jobs to finish only to discover later that the push was bad anyway.)
* If the push had failures, were they classified as categories that can be safely ignored (i.e. "intermittent" vs "fix-by-commit")?
* Did the push run all of the necessary jobs? I.e. a "completed, all-green push" is useless if half the jobs were coalesced away. There are also the periodic jobs (PGO, non-unified, etc.).
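As a rough illustration, the non-ETA questions in that list could be answered per push from job data. This is a minimal sketch under assumed inputs: the `JobResult` fields, the result strings, and the idea of passing in a list of required job names are all hypothetical, not Treeherder's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional

# Classifications treated as safe to ignore; "fix-by-commit" and
# unclassified failures are not.
IGNORABLE = {"intermittent"}


@dataclass
class JobResult:
    # Hypothetical fields for illustration, not Treeherder's schema.
    name: str
    state: str                            # "pending", "running", "completed"
    result: Optional[str] = None          # e.g. "success", "testfailed"
    classification: Optional[str] = None  # e.g. "intermittent"
    coalesced: bool = False


def push_verdict(jobs, required_names):
    """Answer the completion/failure/coverage questions for one push."""
    completed = all(j.state == "completed" for j in jobs)
    # Failures so far, even if other jobs are still running.
    bad = [j.name for j in jobs
           if j.result not in (None, "success")
           and j.classification not in IGNORABLE]
    # Required jobs that never actually ran (e.g. coalesced away).
    ran = {j.name for j in jobs if not j.coalesced}
    missing = sorted(set(required_names) - ran)
    return {"completed": completed, "bad_jobs": bad, "missing": missing}


jobs = [
    JobResult("build", "completed", "success"),
    JobResult("mochitest", "completed", "testfailed", "intermittent"),
    JobResult("xpcshell", "completed", "testfailed"),
    JobResult("pgo", "completed", "success", coalesced=True),
]
print(push_verdict(jobs, ["build", "mochitest", "xpcshell", "pgo"]))
# {'completed': True, 'bad_jobs': ['xpcshell'], 'missing': ['pgo']}
```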

Given the sheer amount of information above, I'm wondering whether we should surface _some_ of the above in the main UI, but also create a separate "last good push" API endpoint/page that can (a) be used by automation and (b) surface more data than we'd be comfortable showing in the main UI.

Having said that, the above is quite a bit of scope creep from comment 0, and given that an ETA on its own would be useful even for things like try pushes, perhaps we should just surface a "push ETA" in the UI as a first step.

Even for the ETA alone, generating a number is harder than just "take the longest ETA of each individual job", since:

1) For pending jobs (jobs that are queued and waiting for a machine to take them), we currently have no way to know how long it will be until the job even starts running. The N jobs ahead of this one in the queue will all take varying amounts of time, and each repo has a different priority, so if a new push is made to a higher-priority repo, its jobs jump ahead in the queue. Add to that AWS instance numbers changing constantly, the jacuzzi bottleneck, etc., and the wait is quite variable. One way around this might be to say "the oldest pending job of this platform/repo was scheduled X mins/hours ago and still isn't running, so the wait will be at least that long".

2) Some jobs result in other jobs being scheduled (e.g. builds that kick off tests). There isn't currently a graph of the jobs to be run (though there has been talk of having one in the future), so it's hard to know what is left to be scheduled.
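The lower-bound workaround suggested in point 1 amounts to "now minus the submit time of the oldest still-pending job" for a given platform/repo queue. A minimal sketch, assuming the caller has already collected those submit times (the function name and inputs are hypothetical):

```python
from datetime import datetime, timedelta


def min_pending_wait(pending_submit_times, now):
    """Lower bound on the queue wait for one platform/repo.

    If the oldest still-pending job was submitted X ago, the wait for
    jobs in that queue is at least X. Returns timedelta(0) if the
    queue is empty.
    """
    if not pending_submit_times:
        return timedelta(0)
    return now - min(pending_submit_times)


now = datetime(2014, 8, 13, 12, 0)
queue = [now - timedelta(minutes=10), now - timedelta(minutes=95)]
print(min_pending_wait(queue, now))  # 1:35:00
```

It only gives a floor, not an estimate: priority changes and instance churn can still make the real wait longer.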

Both #1 and #2 likely require work from releng.
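If a job graph like the one mentioned in point 2 did exist, the remaining time for a push could be taken as the longest (critical) path through unfinished jobs and their follow-ons. The sketch below assumes such a graph as a plain dict, which Treeherder does not currently have; it also assumes follow-on jobs start as soon as their parent finishes, ignoring pending time.

```python
from functools import lru_cache


def remaining_minutes(children, remaining, roots):
    """Critical-path estimate over a (hypothetical) job graph.

    children:  job -> follow-on jobs it schedules (e.g. build -> tests)
    remaining: job -> estimated minutes left for that job alone
    roots:     jobs already scheduled for the push
    Assumes the graph is acyclic.
    """
    @lru_cache(maxsize=None)
    def finish(job):
        # Time left for this job plus its slowest chain of follow-ons.
        tail = max((finish(c) for c in children.get(job, ())), default=0)
        return remaining[job] + tail

    return max((finish(r) for r in roots), default=0)


children = {"build-linux": ["mochitest", "xpcshell"], "build-win": ["reftest"]}
remaining = {"build-linux": 30, "build-win": 50,
             "mochitest": 40, "xpcshell": 20, "reftest": 15}
print(remaining_minutes(children, remaining, ["build-linux", "build-win"]))  # 70
```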
Priority: P2 → P3
Treeherder currently computes and stores a job's ETA (both pending to running and running to completed) based on past runs of jobs with the same reference data, i.e. repository, job type, options, etc. A periodic task updates this expected value every 6h so that it can adapt to the load on our infrastructure. In my opinion it wouldn't be too hard to compute max(ETA(running)) + max(ETA(completed)) over all the jobs of a result set and show it in the UI.
Feel free to needinfo me or :jeads if you want more details about that.
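A literal reading of that proposal is sketched below: average historical pending and running durations per reference-data key, then take the largest pending ETA plus the largest running ETA across the result set's jobs. The key tuples and input format are assumptions for illustration, not Treeherder's stored schema, and the sketch ignores time already elapsed.

```python
from collections import defaultdict
from statistics import mean


def expected_durations(history):
    """Average (pending, running) seconds per reference-data key.

    history: iterable of (key, pending_secs, running_secs) from past jobs,
    where key stands in for (repository, job type, options, ...).
    """
    buckets = defaultdict(lambda: ([], []))
    for key, pending, running in history:
        buckets[key][0].append(pending)
        buckets[key][1].append(running)
    return {k: (mean(p), mean(r)) for k, (p, r) in buckets.items()}


def push_eta(job_keys, expected):
    """max(ETA(running)) + max(ETA(completed)) over a result set's jobs."""
    pending_eta = max(expected[k][0] for k in job_keys)
    running_eta = max(expected[k][1] for k in job_keys)
    return pending_eta + running_eta


linux = ("mozilla-central", "linux64", "mochitest-1")
win = ("mozilla-central", "win7", "reftest")
history = [(linux, 300, 1800), (linux, 500, 2200), (win, 900, 1200)]
expected = expected_durations(history)
print(push_eta([linux, win], expected))  # 2900
```

Summing the two maxima can pair the slowest queue with the slowest job even when they belong to different jobs, so it leans towards overestimating; that seems acceptable for a "rough" ETA.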
That's great; that sounds like it would give us a reasonable rough figure for #1 in comment 0. The only remaining problem is #2: knowing which jobs have yet to be scheduled and will follow on from the builds currently running.

I guess we could just look at all the other jobs that have run in the past and calculate from that; however, not all of them will necessarily run on this push.
I think we can figure something out, given that we have all the historical data. Maybe we can add the time to pending (request time - push_timestamp) to the list of ETAs we keep here? https://github.com/mozilla/treeherder-service/blob/master/treeherder/model/sql/template_schema/project_jobs_1.sql.tmpl#L161-L198
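With a historical "request time - push_timestamp" offset per job type, even jobs not yet scheduled (e.g. tests a build will trigger) get an estimated finish time: push time + offset + expected pending + expected running. A minimal sketch; the function names, the per-job-type grouping, and the use of the median are assumptions, and it presumes the job type will actually run on this push.

```python
from datetime import datetime, timedelta
from statistics import median


def request_offsets(past_jobs):
    """Median seconds from push to job request, per job type.

    past_jobs: iterable of (job_type, push_timestamp, request_time).
    """
    offsets = {}
    for job_type, pushed, requested in past_jobs:
        offsets.setdefault(job_type, []).append((requested - pushed).total_seconds())
    return {t: median(v) for t, v in offsets.items()}


def estimated_finish(push_time, job_type, offsets, expected):
    """Push time + time to request + expected pending + expected running.

    expected: job_type -> (pending_secs, running_secs), standing in for
    the historical ETAs already stored.
    """
    pending, running = expected[job_type]
    total = offsets[job_type] + pending + running
    return push_time + timedelta(seconds=total)


p1 = datetime(2014, 9, 1, 10, 0)
p2 = datetime(2014, 9, 2, 10, 0)
history = [("mochitest-1", p1, p1 + timedelta(minutes=20)),
           ("mochitest-1", p2, p2 + timedelta(minutes=30))]
offsets = request_offsets(history)
push = datetime(2014, 9, 3, 9, 0)
print(estimated_finish(push, "mochitest-1", offsets, {"mochitest-1": (600, 2400)}))
# 2014-09-03 10:15:00
```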
Priority: P3 → P4
I think it's perfectly fine to start with a rough estimate based on historical data and refine that estimate as more jobs are kicked off and completed. This will certainly be useful for relman.
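Refining the estimate as jobs progress could look like the sketch below: completed jobs drop out, running jobs count only their expected remaining run time, and pending jobs count any remaining expected queue time plus their full expected run time. The `LiveJob` record is hypothetical; its expected durations stand in for the historical ETAs discussed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LiveJob:
    # Hypothetical record for illustration.
    state: str                 # "pending", "running" or "completed"
    since: Optional[datetime]  # submit time if pending, start time if running
    exp_pending: float         # expected seconds pending -> running
    exp_running: float         # expected seconds running -> completed


def remaining_seconds(jobs, now):
    """Re-estimate the time left whenever job states change."""
    left = [0.0]
    for j in jobs:
        if j.state == "completed":
            continue
        elapsed = (now - j.since).total_seconds()
        if j.state == "running":
            left.append(max(0.0, j.exp_running - elapsed))
        else:
            left.append(max(0.0, j.exp_pending - elapsed) + j.exp_running)
    return max(left)


now = datetime(2015, 2, 1, 12, 0)
jobs = [
    LiveJob("completed", None, 300, 1800),
    LiveJob("running", datetime(2015, 2, 1, 11, 30), 300, 2400),
    LiveJob("pending", datetime(2015, 2, 1, 11, 50), 300, 1200),
]
print(remaining_seconds(jobs, now) / 60)  # 20.0
```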
See Also: → 1072890
Bug 1127454 will add a progress percentage, e.g. as in attachment 8560184.

Are ETAs something you would still like (given that they are going to be fairly inaccurate)?
If so, I'll leave this bug open, but morph it to be about the ETAs specifically :-)
Depends on: 1127454
Flags: needinfo?(sledru)
See Also: 1072890
We were hoping more for an ETA in terms of minutes/hours. The number of remaining jobs doesn't tell us much :/
Flags: needinfo?(sledru)
I was only going from:

(In reply to Sylvestre Ledru [:sylvestre] from comment #0)
> But we would be happy with a basic "42% completed"

:-)

But will morph the bug to be about ETA.
Summary: treeherder's main UI should show an ETA or a progress percentage → Display a rough "estimated time to completion" for each push
Priority: P4 → P5
Setting as incomplete, as this is not something we can fix anytime soon! Ideally, Taskcluster would tell us this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE