Closed Bug 1163064 Opened 9 years ago Closed 9 years ago

Add a view on tests that doesn't expose chunking to the user directly

Categories

(Tree Management :: Treeherder, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chmanchester, Assigned: camd)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

From the standpoint of a developer, chunking is an implementation detail in many respects. Of course, to really dig into how jobs are run and figure out what's going on if something goes wrong you need to understand chunks, but to glance at the tree and see if everything's basically ok it's not necessary.

I'm imagining an optional ui that would collapse chunks within a job symbol and just display the worst status among the chunks (and maybe a light green to indicate "some chunks are done" if people want to see the logs).
Whiteboard: ateam-summer-of-contribution
Whiteboard: ateam-summer-of-contribution
Attached image chunkMock_r1
I think camd was looking parts of this over the work week, I knocked out a quick concept this evening in pShop on my windows box after I got back. The colors for busted and testfailed are just approximate.

Just wondering if this look might help identify them as failed chunk 'summaries' in a job group, per Chris's description? We could render them the same height as other regular job buttons.

We had discussed clicking on a job group possibly presenting a horizontal strip of some kind between the pinboard and the job details pane, in which the expanded job chunks would exist for examination. But maybe we could just expand them where they exist in the main job table.

I could well have missed the gist of the UX problem though :)
Attachment #8627074 - Flags: feedback?(cmanchester)
Attachment #8627074 - Flags: feedback?(cdawson)
Comment on attachment 8627074 [details]
chunkMock_r1

I like the mock up, but it's a little hard to tell how this will work without seeing more. I think we need to move towards a single colored indicator for all the chunks (with an aggregate color that's the worst of the colors in the individual chunks) and a simple way to expand to view details of failed chunks to help users connect the new UI to the information they currently get.
Attachment #8627074 - Flags: feedback?(cmanchester) → feedback+
As an additional requirement, I think we need to have a way to track the number of pending/running jobs (if any).  We could possibly use light/dark gray indicators for this that would expand to the chunk numbers that are pending/running.
(In reply to Chris Manchester [:chmanchester] from comment #2)
>
> (with an aggregate color that's the worst of the colors in the individual chunks)

I have a couple other ideas I could mock. Should we figure out an ordered ranking from best to worst? Our job types are in the notation field.
https://treeherder.allizom.org/help.html
I think I agree with comment 2.  I like the idea of just showing the worst of the errors, if there are any.  We'll have to have some kind of priority key which holds this rule.  What if no errors?  Show passing of some are passed? Or running if some are running?  Not sure which is best.  Probably the latter.
No idea about TaskCluster, which may well not actually have any statuses other than success, retry, warnings, exception, I don't think I've ever seen it do failure, but buildbot actually defines worst_status, http://mxr.mozilla.org/build/source/buildbot/master/buildbot/status/builder.py#29. That's what it will do for color when a single job has steps with more than one status, so there's no reason not to do the same for a color when mashing together multiple jobs into one mangle. When you include running and pending, that's SUCCESS > RUNNING > PENDING > SKIPPED > WARNINGS > FAILURE > EXCEPTION > RETRY > CANCELLED, it's pending until everything is running, then running until one fails or everything succeeds. The one iffy part is RETRY, which... you... that's... WTF would you do to show the fact, which does indeed happen, that one chunk has managed to cause infinite RETRY?
I think there are some statuses that we'll need to show independently from the "worst status", including pending, running, and retry.  I think we could use jfrench's idea, but possibly overlay an integer on them which indicates the number of chunks represented by that block.

In this design, mochitest might have an orange or green block (representing all the passed/failed chunks), a light gray block (for the pending chunks), a dark gray block (for the running chunks), and a blue block if any chunks have been retried.  I think this is a good compromise between visual noise and the need to get a certain level of detail at a glance without clicking through everything.
I don't mean to complicate things further, but wanted to verify what constitutes a chunk.  For instance, if Mochitest has these entries:

M( 1 2 3 JP bc1 bc2 dt1 dt2 dt3 gl oth p ) 

I know, of course that the 1,2,3 are chunks.  And bc1,bc2.  And dt1,dt2,dt3.  But are gl, oth and p also just considered chunks?

What I mean is, can we collapse all of those in the group?  Or do some need to be NOT collapsed?  Don't get me wrong: I would prefer to collapse and summarize everything within the group and expand it as requested by the user.
I think also, if we try to show counts within the group (count testfailed, busted, running, pending, etc..) then we are in danger of confusing chunks with counts.  Do we need to change all the chunk names that are just digits to have an accompanying alpha character?  "M( 1 2 3 )" becomes "M( M1 M2 M3 )" ?
I'd be okay with either counting every job within a group as one thing or only collapsing jobs whose only difference is the number at the end, whichever is easier to implement.
Assignee: nobody → cdawson
Comment on attachment 8627074 [details]
chunkMock_r1

Looks pretty cool.  I wonder how that would look with pending and running, also.  And what if we wanted to put counts in there?
Attachment #8627074 - Flags: feedback?(cdawson) → feedback+
Attached image chunkMock_r3a
Here's another mock which doesn't add any new button styling. It prefixes a > in front of any chunked job groups, and acts as a collapse/expand button.

Like rev1 this will work from an a11y standpoint in the sense it doesn't matter if you are color impaired.

The first row in the mock just shows the concept applied to a variety of jobs, failed 'B', running 'Be', successful 'Be'. The second and third rows show the collapsed and expand appearance with a retry reftest.

Perhaps we'd draw the < and > as a separate button at all times (eg. not within the failed 'B' orange btn per first row example. I think this will allow you to collapse the group versus allowing you to still click on the first expanded chunk.

We might want to consider painting a subtle pale background when the chunks are expanded, to visually unify the group. Though when selected each button would draw as it currently does, with a white background.

I would expect < and > to be the same mouse target size as a '1' or other single character job.
Simple but effective!
Blocks: 1175482
Nice, I like it! The plus/minus are more intuitive than the arrows for me.
Attached image railChunkingExample
:rail has provided a nice Funsize jobs example of the scale of chunking we should anticipate (or several magnitudes larger). Here's a screen grab for posterity and a link for the duration it exists on treeherder
https://treeherder.allizom.org/#/jobs?repo=mozilla-aurora&revision=1f4185f18537
(In reply to Jonathan French (:jfrench) from comment #16)
> Created attachment 8633488 [details]
> railChunkingExample
> 
> :rail has provided a nice Funsize jobs example of the scale of chunking we
> should anticipate (or several magnitudes larger). Here's a screen grab for
> posterity and a link for the duration it exists on treeherder
> https://treeherder.allizom.org/#/jobs?repo=mozilla-
> aurora&revision=1f4185f18537

Golly.  That's a lot of chunks...  :)
Attached file PR (obsolete) —
Please feel free to take a look at my prototype and provide feedback.  The PR has screenshots, but running the branch UI pointed to production may be more elucidating.  

Anyway, this was just the easiest first attempt.  I think with some polish, we can get there pretty soon.

To do: The n/p buttons need to expand a group that contains an unclassified failure and select it.
What does (-6- -2-) mean? Is that 6 chunks passed and 2 chunks failed? Or does that mean that chunk 2 failed? This might cause some confusion.
I noticed this bug was showing up as needing triage in the bug tables, so setting a Priority value.
Status: NEW → ASSIGNED
Priority: -- → P2
(In reply to Andrew Halberstadt [:ahal] from comment #20)
> What does (-6- -2-) mean? Is that 6 chunks passed and 2 chunks failed? Or
> does that mean that chunk 2 failed? This might cause some confusion.

It's the former.  the - delimiters denote a number of chunks, rather than the number of a specific chunk.  Do you think a different delimiter would help here?

No matter what the UI, we'll definitely need to do some education, and people will probably be a bit confused until they get used to the changes.
(In reply to Jonathan Griffin (:jgriffin) from comment #22)
> It's the former.  the - delimiters denote a number of chunks, rather than
> the number of a specific chunk.  Do you think a different delimiter would
> help here?

I guess I'm not convinced there's much value in keeping the passed and failed jobs separate vs having a single collapsed list with the worst color. But it's not a big deal, the - delimiters work.
Another option here might be to collapse the passed chunks but listed failed chunks individually.  In many cases, the failing jobs are the interesting ones and someone will end up expanding those anyway.  The passed jobs are rarely interesting.

I'm not sure what the use cases are for "I only want to know about the number of failed jobs but I don't care to expand them".
I think there are two similar but distinct goals here:

1. Make the UI useable for massive amounts of chunks.
2. Make chunking an implementation detail developers don't need to care about.

The distant utopia that Chris and I are imagining, is one where it isn't even possible to see how many chunks there are because it doesn't matter. E.g there might only be a single 'M' for all of mochitest, and clicking on it displays all the relevant errors from the different "chunks" ("chunks" in quotes because there is only one mochitest job in this fantasy land, the fact that that one job was split up across different machines is an implementation detail).
I agree, but I think for an initial implementation, dealing with goal #1 is probably sufficient.  I think there are other pieces that need to be in place on the harness side to make chunking a true implementation detail that developers don't need to care about, possibly including:  always applying run-by-dir at the directory level so people don't have to figure out which chunk a particular test was run in, and making chunking on try more dynamic and smarter so we only run chunks that make sense.
(In reply to Andrew Halberstadt [:ahal] from comment #20)
> What does (-6- -2-) mean? Is that 6 chunks passed and 2 chunks failed? Or
> does that mean that chunk 2 failed? This might cause some confusion.

Yeah, those are counts, not chunk numbers.  I share your concern about confusion there.  I've been experimenting with ways to visualize that they are counts vs. chunks.  Ideas welcome!  :)
(In reply to Jonathan Griffin (:jgriffin) from comment #24)
> Another option here might be to collapse the passed chunks but listed failed
> chunks individually.  In many cases, the failing jobs are the interesting
> ones and someone will end up expanding those anyway.  The passed jobs are
> rarely interesting.
> 
> I'm not sure what the use cases are for "I only want to know about the
> number of failed jobs but I don't care to expand them".

This is a really good point.  We could definitely do that.  And then, if the user clicks a count or the group name itself, it would expand it out so they can see everything.
(In reply to Cameron Dawson [:camd] from comment #27)
> Yeah, those are counts, not chunk numbers.  I share your concern about
> confusion there.  I've been experimenting with ways to visualize that they
> are counts vs. chunks.  Ideas welcome!  :)

In the collapsed state I wonder if we could render the job type as the first letter for its type (success, testfailed, etc), in its color, either with a trailing superscript count number - or an adjacent paren if superscript is too small.

eg. for Mochitest, 5 successes, 2 testfailed, 7 busted, using the adjacent paren approach:

M( S(5) T(2) B(7) )

And when expanded we paint the background per https://bug1163064.bugzilla.mozilla.org/attachment.cgi?id=8629681 to visually differentiate them from other non-chunked groups.

The only job types which share the first leading letter are 'running' and 'restarted' so that might need to be reconciled.
Comment on attachment 8633716 [details] [review]
PR

This is now ready for review.  I've tested it a lot.  But any additional testing very welcome.  :)
Attachment #8633716 - Attachment description: WIP PR → PR
Attachment #8633716 - Flags: review?
Attachment #8633716 - Flags: feedback?(tojonmz)
Thanks camd, I'll do some exploratory testing on it this morning and report back.
Attachment #8633716 - Flags: review? → review?(emorley)
I did about 3.5hrs of testing, we're going to isolate the relevant issues (fixed, tbd, followup bug) and I'll likely include the remaining tbd's in the PR in reviewable :)
Attachment #8633716 - Flags: feedback?(tojonmz) → feedback+
Blocks: 1190528
Blocks: 1190512
Attachment #8633716 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/87efb1dea15f83bb9ec4ee61cfc9f68fc338b9a8
Bug 1163064 - Collapse test group chunks down to counts by result and state

This branch aggregates non-failed jobs within groups and shows the counts of each result/state within.
A ``+`` precedes each count to indicate that's what it is.

A few points of interest:
* Clicking any group or count will expand or collapse the group.  This choice is
remembered as jobs are updated and filters are changed.  Page reload resets it.
* Clicking on the ``( + )`` and ``( - )`` buttons will toggle groups expanded/collapsed
globally.  This is persisted in the URL.  Clicking this button will reset any group
expand/collapse state that was set individually.
* Failed Unclassified jobs are never put into "counts".  They show as top-level
jobs and are still hit with the ``n`` and ``p`` hot keys, etc.
Blocks: 1191454, 1191487
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/2e6b7b56c669c519f9ebe5ddcf65f895dabdfa41
Revert "Bug 1163064 - Collapse test group chunks down to counts by result and state"

This reverts commit 87efb1dea15f83bb9ec4ee61cfc9f68fc338b9a8.
I added a proposed appearance for a selected job chunk inside a collapsed group, if we elect to do that.
https://github.com/mozilla/treeherder/pull/759#issuecomment-128211100
Just want to add that this looks great! I tested it today and I'm happy that there is already work going on, and I don't have to file a bug about.

One thing I noticed is that if you expand one group (maybe accidentally), you cannot easily collapse it again. You have to do a reload of the page, or click the toolbar +/- button twice. Will there be a way to collapse a specific group only?
(In reply to Henrik Skupin (:whimboo) from comment #36) 
> One thing I noticed is that if you expand one group (maybe accidentally),
> you cannot easily collapse it again. You have to do a reload of the page, or
> click the toolbar +/- button twice. Will there be a way to collapse a
> specific group only?

Cool! The job group letter acts as both an expand/collapse button on the group. eg. click on the 'B' in a B(+7 +2) to either expand it or collapse it (on top of clicking on the +n to expand). We're hoping the identical blue hover color over the job group letter will be a sufficient initial cue for users, but Cameron was looking at potentially styling the 'B' when in its expanded state, perhaps as a follow up.
Depends on: 1192375
Attached file PR take-2
Ed: adding you for feedback since you reviewed it originally.  But Jon worked with me closely in the follow-up testing.
Attachment #8633716 - Attachment is obsolete: true
Attachment #8647815 - Flags: review?(tojonmz)
Attachment #8647815 - Flags: feedback?(emorley)
Comment on attachment 8647815 [details] [review]
PR take-2

Some minor nits, but I defer to emorley on the substance of the changes :)

Apologies for not using reviewable, I started and realized after several comments. Camd is this latest revision available for testing on your gh-pages, or perhaps stage?
Attachment #8647815 - Flags: review?(tojonmz) → review+
Comment on attachment 8647815 [details] [review]
PR take-2

This is more of a rubber stamp than anything else, I'm not very familiar with this code and my review was 9 days ago, so I've paged a lot of it out by now. But looks ok at a skim read anyway :-)

You were probably going to do this anyway, but when this lands can you update the commit message of the "revert revert ..." commit to be of form "Bug NNNN - Foo" (with a mention that it's a relanding in the multi-line commit message, if you wish) :-)
Attachment #8647815 - Flags: feedback?(emorley) → feedback+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/e633a67640f4806e77d0ec158daa586621913069
Bug 1163064 - Collapse test group chunks down to counts by result and state

This branch aggregates non-failed jobs within groups and shows the counts of each result/state within.
A ``+`` precedes each count to indicate that's what it is.

A few points of interest:
* Clicking any group or count will expand or collapse the group.  This choice is
remembered as jobs are updated and filters are changed.  Page reload resets it.
* Clicking on the ``( + )`` and ``( - )`` buttons will toggle groups expanded/collapsed
globally.  This is persisted in the URL.  Clicking this button will reset any group
expand/collapse state that was set individually.
* Failed Unclassified jobs are never put into "counts".  They show as top-level
jobs and are still hit with the ``n`` and ``p`` hot keys, etc.

Also fixes:
* Bug 1191454 - classifying 3 jobs, one selected, then clicking "n" for next will go to the first
* Bug 1191487 - Re-scroll selected job into view during chunk expand/collapse

This reverts commit 2e6b7b56c669c519f9ebe5ddcf65f895dabdfa41.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 1197258
Depends on: 1199139
You need to log in before you can comment on or make changes to this bug.