Closed Bug 1127454 Opened 9 years ago Closed 9 years ago

Replace counts of job statuses for each push with a percentage and count of in progress jobs

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

References

Details

Attachments

(4 files)

This feature is generally not used/wanted.  It is also confusing people because the counts can be off.
Assignee: nobody → cdawson
Severity: normal → major
Priority: -- → P1
Blocks: 1125264
CCing sheriffs.

I would very much like to see us remove the counts (or at least the vast majority of them), since they don't really add much apart from visual noise to the UI.

However thinking about this, perhaps the count of running/pending jobs is still useful?

Or could this be a global count for the whole page? (Use case: "are there hundreds of pending jobs because releng infra is backlogged, that I can't see because I'm in only-unclassified 'u' mode?")
Or given that it's now possible to show running/pending + unclassified in the same view, is this redundant?

Plus once we add back the infrastructure menu, the counts across all repos will be visible too.
I do like the per-push coalesced count.
What's the use-case for it - in case there's a better way it can be solved? :-)
(Removing these counts will also have a perf win)
My use-case (to the limited extent that it can be trusted) is "have I loaded enough mozilla-inbound pushes yet?" Something, usually a "failure summary is empty," forces me to to reload the tab, and then I need to know whether the last three pushes have no more running/pending so I don't need to even add 10, or whether the last push of an added 10 is still running so I need to add another 10.
It gives a good quick indicator while in unstarred mode that lots of things got pushed in a short span of time without having to look at push timestamps.
I'd wouldn't be devastated to lose it, though.
My use case is basically the same as philor's - trying to determine when a push is completely finished running jobs. Onlyunstarred+pending+running is a useful view, but it's still pretty darn janky under regular sheriff workload.

Of course, we still have enough issues with orphaned jobs that TBH, neither the per-push numbers nor the visible view of pending/running jobs really effectively answers that question anyway. Ultimately, I end up having to use buildapi to be sure that a push is really finished running jobs.
Thank you, this is really useful :-)

So to summarise so far...

Are these counts ok to remove?
* Total jobs
* success
* failed
* busted
* exception
* retry

These TBD:
* pending
* running
* coalesced

Possible alternative alternative features to solve some use-cases:
* A "jump to last incomplete push" button that loads just as many pushes as is needed to see the last running job.
* Display a global count for the page on the toolbar, to give an idea of pending/running/coalesced.

Now for visual complexity, removing even some of them is a win.

@camd: how much of a perf win if we only remove some? (I haven't really ever looked at the counts code)
I actually do have a use for retry - when it's 5-10, that's just the normal Android crap that isn't ever worth looking at, but when it's more, that usually means something like the one it just told me about, t-w864-ix-080, is taking jobs and setting retry every 10 seconds and needs a reboot.
This is a before/after screenshot of what it would look like, even if we only removed some of the result counts - and then also fixed bug 1127500 and bug 1127479.

I think it's much cleaner.
(and excuse the awful jpeg quality from Paint)
(In reply to Phil Ringnalda (:philor) from comment #9)
> I actually do have a use for retry - when it's 5-10, that's just the normal
> Android crap that isn't ever worth looking at, but when it's more, that
> usually means something like the one it just told me about, t-w864-ix-080,
> is taking jobs and setting retry every 10 seconds and needs a reboot.

Ah yeah good point. I think perhaps another (or even better) way of handling this is if there are more than N retries for the same job consecutively, then we override "only unclassified" mode and show the retries anyway.

Alternatively this really is something that should be fixed on the releng side - or even just by the "machine view" tab that we'd talked about adding to Treeherder.

Given that the counts are wrong currently - I'm wondering if we can remove more than we would otherwise, right now, and then look to add them back (or some of the other features) in the future. Just thinking "no count" is better than "wrong count" (eg say in the case for coalesced).

One other thing we could do (from a UI simplification point of view, even though it won't help perf), is to only show the counts if they are greater than zero (eg don't shown "pending 0 running 0" on a completed push). 

We could also combine pending and running under "incomplete: 5".
(In reply to Ed Morley [:edmorley] from comment #12)
> One other thing we could do (from a UI simplification point of view, even
> though it won't help perf), is to only show the counts if they are greater
> than zero (eg don't shown "pending 0 running 0" on a completed push). 

I agree. Then any push which has no visible pending/running/retry/coalesced, is "done" for quick recognition for sheriffs.

Or potentially you could render an explicit "complete" once, in that same container if all counts are 0. ie. just for safety in case something breaks and >1 counts are all getting suppressed when they shouldn't.
(In reply to Ed Morley [:edmorley] from comment #8)
> Thank you, this is really useful :-)
> 
> So to summarise so far...
> 
> Are these counts ok to remove?
> * Total jobs
> * success
> * failed
> * busted
> * exception
> * retry
> 
> These TBD:
> * pending
> * running
> * coalesced
> 
> Possible alternative alternative features to solve some use-cases:
> * A "jump to last incomplete push" button that loads just as many pushes as
> is needed to see the last running job.
> * Display a global count for the page on the toolbar, to give an idea of
> pending/running/coalesced.
> 
> Now for visual complexity, removing even some of them is a win.
> 
> @camd: how much of a perf win if we only remove some? (I haven't really ever
> looked at the counts code)

Ed: the perf win would be minimized by not removing all counting.  We would get SOME still.  But not the bulk of the perf win.  Though if we really only want a count of the "incomplete" jobs, I could probably find an easier way to  calculate that that's less costly.  Your point of removing more now, and adding back in where people feel they really want it is a good one, I think.
Attached file ui PR 354
I'll attach a screenshot of where that PR currently is.  In it, I removed all the counts other than a % complete and count of "in progress" (pending and running together).

I haven't yet removed the logic for all the counts from the code.  This is just the UI pass so far.  But I can optimize this further if the UI looks good.
Attached image draft 2
The screenshot of draft 2 reflects a few things:

* The specific counts, as in the other screenshots are replaced by % done and count in-progress.
* when a resultset has no more in-progress jobs this line displays just "complete"
* The navbar button for collapsing all resultsets and dropdown with other options is replaced by one button to just hide/show the revisions for all resultsets.
* no more button to collapse revisions/jobs for each resultset
* buttons are gathered on the right side for each resultset
* borders around buttons removed to give a cleaner display
* removal of dead code from ``clonejobs.js``
Attachment #8558776 - Flags: review?(emorley)
Really sorry I haven't gotten to this review yet - I'm having to pop out in a bit but will take a look later this evening when I get back. Just needs a solid block of time to switch into thinking-about-UI/UX-mode :-)
Comment on attachment 8558776 [details] [review]
ui PR 354

Looking very good! Couple of tweaks and we should be there - have commented on the PR :-)
Attachment #8558776 - Flags: review?(emorley) → feedback+
Summary: Remove counts of jobs for each resultset → Replace counts of job statuses for each push with a percentage and count of in progress jobs
Blocks: 1052397
Attachment #8558776 - Flags: review+
Blocks: 1111084
The PR here fixes so many other bugs - it's awesome :-)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: