make it easier to see the full list of test-unexpected-fail messages for failures

RESOLVED FIXED

Status

Tree Management
Intermittent Failures View
a year ago
a month ago

People

(Reporter: jmaher, Assigned: sclements)

Tracking

Details

Attachments

(1 attachment)

Currently we summarize OrangeFactor data in bugs, so we click through to OrangeFactor to see more data. Once there, we have to click through to individual log files before we can see what is really going on.

I would like to reduce the effort this takes while investigating bugs, either with something inside OrangeFactor or with a tool hooked into mach.
Do you mean including example log failure lines in the bugzilla comment, or making that information visible in the OrangeFactor UI without having to click through?

At whatever point OrangeFactor v2 is built on/into Treeherder these would be much easier to implement, however in the meantime I'd suggest either:
a) the OrangeFactor UI could use the job_id property in each ES record to fetch the error summary from Treeherder (bonus: the job_id exists for existing data many months back)
b) the payload sent to Elasticsearch by Treeherder could include the error summary (albeit which lines? first line, all the lines? at most 5 lines?) Bonus: fewer requests to Treeherder and faster when loading the OrangeFactor UI.

For example on:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1335801&startday=2017-02-08&endday=2017-02-15&tree=trunk

There's:
8 Feb 2017, 12:10
mozilla-inbound
c5b88e4e70f48955661dc7900b43a95c3c785836
OS X 10.10
opt
mochitest-e10s-browser-chrome-5
t-yosemite-r7-0308
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=75476137

...i.e. it has a job_id of 75476137.
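
As a rough illustration, the repo and job_id can be pulled out of a logviewer link like the one above using only the Python standard library. The helper name is made up for this sketch, and it assumes the parameters live in the URL fragment, as they do in the link quoted here.

```python
from urllib.parse import urlparse, parse_qs


def parse_logviewer_url(url):
    """Return (repo, job_id) from a Treeherder logviewer link.

    Hypothetical helper: assumes the parameters sit in the URL fragment
    ("#?repo=...&job_id=..."), as in the link quoted above.
    """
    fragment = urlparse(url).fragment          # everything after the '#'
    params = parse_qs(fragment.lstrip("?"))    # drop the leading '?' before parsing
    return params["repo"][0], int(params["job_id"][0])


repo, job_id = parse_logviewer_url(
    "https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=75476137"
)
```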

For approach (a), OrangeFactor would fetch the error summary using either of:
https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/75476137/text_log_errors/
https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/75476137/text_log_steps/
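
A minimal sketch of what approach (a) might look like on the client side: build the two endpoint URLs from a repo and job_id, then keep only the TEST-UNEXPECTED-FAIL lines from whatever comes back. The function names are illustrative, and the response shape (a list of objects with a "line" field) is an assumption rather than something confirmed here; the sketch makes no network request.

```python
TREEHERDER = "https://treeherder.mozilla.org"


def error_summary_urls(repo, job_id):
    """Build the two candidate endpoint URLs listed above for one job."""
    base = f"{TREEHERDER}/api/project/{repo}/jobs/{job_id}"
    return [f"{base}/text_log_errors/", f"{base}/text_log_steps/"]


def unexpected_fail_lines(errors):
    """Keep only the TEST-UNEXPECTED-FAIL lines.

    Assumption: `errors` is the decoded JSON response, a list of dicts
    that each carry the log text under a "line" key.
    """
    return [
        entry["line"]
        for entry in errors
        if "TEST-UNEXPECTED-FAIL" in entry.get("line", "")
    ]


urls = error_summary_urls("mozilla-inbound", 75476137)
```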

For approach (b), the payload sent to ES would be adjusted here:
https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/classification_mirroring.py#L36-L54
...and then exposed in the OrangeFactor API here:
https://hg.mozilla.org/automation/orangefactor/file/8c0c1bf00ac0/server/handlers.py#l363
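
For approach (b), one possible shape of the change is to attach a capped list of error lines to the record before it is sent. This is only a sketch: the "error_lines" field name, the helper, and the default cap of 5 are placeholders (the cap picks up the "at most 5 lines?" question above), not what the linked code does.

```python
def with_error_summary(payload, error_lines, max_lines=5):
    """Return a copy of `payload` with up to `max_lines` error lines added.

    Hypothetical: "error_lines" is an illustrative field name, not an
    existing field in the Elasticsearch records.
    """
    enriched = dict(payload)                      # leave the caller's dict untouched
    enriched["error_lines"] = list(error_lines[:max_lines])
    return enriched


record = with_error_summary(
    {"job_id": 75476137, "tree": "mozilla-inbound"},
    [f"TEST-UNEXPECTED-FAIL | example line {i}" for i in range(8)],
)
```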

For both (a) and (b), the results would be made use of here:
https://hg.mozilla.org/automation/orangefactor/file/8c0c1bf00ac0/html/scripts/woo.bugs.js#l237
What I meant, in https://groups.google.com/d/msg/mozilla.dev.platform/8mXiVo8kwE4/PHhSuMbIBgAJ which triggered Joel filing this, was indeed that I'd like to see the TEST-UNEXPECTED-FAIL lines in an easier-to-find way.

This would have made it much easier to determine that while orangefactor thinks bug 1285461 has happened 54 times, it's actually happened 2-3 times, since:
 * 2 of the stars were clearly correct
 * 1 log is unavailable
 * 49 of them were mis-stars that were actually bug 1159532 (in the same file, and that bug's failure suggests the two bugs)
 * 1 was a different failure in the same file
 * 1 was a different failure in a different file.

It would have been great if I'd been able to determine that without clicking through to 54 logs.  And when the underlying data are sometimes that bad, I feel like I do, in fact, have to do so.


It's also important because the range of the TEST-UNEXPECTED-FAIL messages can help make it clear what the actual problem is.  For example, the fact that every reported time in bug 1159532 was exactly 8s (when the times are usually not round) was what made me realize what the problem was.
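
To illustrate why seeing the messages side by side helps, here is a small sketch that tallies the distinct TEST-UNEXPECTED-FAIL messages across the jobs starred against one bug. The input structure (job id mapped to log lines) and the sample lines are invented for the example; a count dominated by a message that belongs to another bug would point at mis-stars without opening each log.

```python
from collections import Counter

MARKER = "TEST-UNEXPECTED-FAIL"


def failure_message(line):
    """Return the part of a log line from the marker onward, or None."""
    idx = line.find(MARKER)
    return line[idx:].strip() if idx != -1 else None


def tally_failures(lines_by_job):
    """Count how many jobs contain each distinct failure message.

    Assumption: `lines_by_job` maps a job id to that job's log lines.
    """
    counts = Counter()
    for lines in lines_by_job.values():
        # A set, so a message repeated within one job counts once.
        messages = {failure_message(line) for line in lines}
        messages.discard(None)
        counts.update(messages)
    return counts


# Invented sample data, not taken from any real log.
sample = {
    1: ["12:00 INFO " + MARKER + " | a.js | timed out", "unrelated noise"],
    2: [MARKER + " | a.js | timed out"],
    3: [MARKER + " | b.js | assertion failed"],
}
for message, jobs in tally_failures(sample).most_common():
    print(jobs, message)
```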

This can also make it clear when failures that the maintainer of the code/tests would expect to be reported as separate bugs are actually being starred by sheriffs as the same bug, something that's basically unobservable today (since starring stopped making Bugzilla comments).
:gbrown, this seems to be in the same general category of your test-info work, would you be interested in hacking on this?
Flags: needinfo?(gbrown)
Sure, I'll take it. I don't have a clear vision for this, and I have some higher priorities right now... it might take me a while to get around to it, and I don't mind if someone wants to steal it. ;)
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
I've never made any progress here.

sclements - Any interest?
Assignee: gbrown → nobody
Flags: needinfo?(sclements313)
(Assignee)

Comment 6

2 months ago
Sure, I'll look into it.
Flags: needinfo?(sclements313)
(Assignee)

Updated

2 months ago
Assignee: nobody → sclements313
Component: OrangeFactor → Intermittent Failures View
(Assignee)

Updated

2 months ago
Attachment #8983592 - Flags: review?(emorley)
Attachment #8983592 - Flags: review?(cdawson)

Updated

2 months ago
Attachment #8983592 - Flags: review?(cdawson) → review+
Comment on attachment 8983592 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/3620

Deferring this review to George, since I think he'll have some ideas as to how to tweak the Django ORM parts.
Attachment #8983592 - Flags: review?(emorley) → review?(ghickman)

Updated

2 months ago
Attachment #8983592 - Flags: review?(ghickman) → review+

Comment 9

a month ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/b1585c2ac119611943a0ba09f3cc7e9398ee92d1
Bug 1339937 - IFV show unexpected fails (#3620)

modify failuresByBug api to include test-unexpected-fail lines per job; modify bugdetails UI to include failure counts and tooltip with lines
Status: NEW → RESOLVED
Last Resolved: a month ago
Resolution: --- → FIXED