Open Bug 1794046 Opened 4 months ago Updated 1 month ago

add method to collect metrics of "new" failures and how they get annotated


(Tree Management :: Treeherder: Frontend, task, P2)


(Not tracked)


(Reporter: jmaher, Assigned: jmaher)



Every time we display a "new" button for the sheriffs, indicating that the failure line has not been seen in the last 21 days, we would like to track it so we know the button is not misleading or missing too many things.

In addition, we should track:

  • a new button -> a new bug (includes classification::intermittent)
  • a new button -> classification (infra, intermittent, fixed by commit, expected fail)
Assignee: nobody → jmaher
Severity: -- → S4
Priority: -- → P2
Depends on: 1799785

I was focused on Glean and have a working PR up that is almost all good except for data opt-out/opt-in. Given the low frequency of data points we have, having an opt-out option could skew our results too much, so I am exploring other options.

Right now I am thinking of adding a new_failure: boolean field to the job_note table. Then, when saving a classification, I would just add the new_failure data point.

There are 207K job_notes, so adding the field and dealing with the extra data in storage and queries would be pretty low impact.
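As a rough illustration of the idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the job_id and classification columns, and the save_classification helper are all assumptions made up for the example; the real job_note table has a different schema and would be changed through a migration rather than raw SQL.

```python
import sqlite3

# Hypothetical, simplified stand-in for the job_note table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_note ("
    " id INTEGER PRIMARY KEY,"
    " job_id INTEGER NOT NULL,"
    " classification TEXT NOT NULL)"
)

# Add the proposed field; existing rows default to 0 (NEW button not shown).
conn.execute(
    "ALTER TABLE job_note ADD COLUMN new_failure BOOLEAN NOT NULL DEFAULT 0"
)


def save_classification(job_id, classification, new_failure):
    """Store a classification plus whether the NEW button was displayed."""
    conn.execute(
        "INSERT INTO job_note (job_id, classification, new_failure)"
        " VALUES (?, ?, ?)",
        (job_id, classification, int(new_failure)),
    )


save_classification(1, "intermittent", True)
save_classification(2, "fixed by commit", False)

# Count how many saved classifications had the NEW button displayed.
new_failure_count = conn.execute(
    "SELECT COUNT(*) FROM job_note WHERE new_failure = 1"
).fetchone()[0]
```

With a column like this, the later accuracy questions become plain filtered counts over job_note.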

That is a simplistic view of a solution; there might be more interesting cases.

Here is what to track:
For every failing task that has a match (matching == <error_tag> | <test_name> | <failure_message>), we would look at the data from the bug_suggestion, specifically data from the 21-day cache:

  • counter (1 == first time seeing the line, etc.)
  • new_failure_in_rev (this failure might have counter>1, but we want to show that all of the same failures in the rev are new and expected)
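One possible reading of these two values, sketched in Python. The FailureLineCache class and its record method are hypothetical names for illustration, and the 21-day expiry of the cache is left out to keep the sketch short. Here new_failure_in_rev is interpreted as "every sighting of this line so far came from the same revision".

```python
from collections import defaultdict


class FailureLineCache:
    """Hypothetical in-memory stand-in for the 21-day failure line cache."""

    def __init__(self):
        # failure line -> list of revisions where the line was seen
        self._revisions_by_line = defaultdict(list)

    def record(self, line, revision):
        """Record a sighting and return (counter, new_failure_in_rev)."""
        seen = self._revisions_by_line[line]
        seen.append(revision)
        # counter == 1 means this is the first time the line was seen.
        counter = len(seen)
        # True when all sightings so far belong to this one revision.
        new_failure_in_rev = all(rev == revision for rev in seen)
        return counter, new_failure_in_rev


cache = FailureLineCache()
first = cache.record("TEST-UNEXPECTED-FAIL | test_a | timeout", "rev1")
second = cache.record("TEST-UNEXPECTED-FAIL | test_a | timeout", "rev1")
third = cache.record("TEST-UNEXPECTED-FAIL | test_a | timeout", "rev2")
```

In this sketch the second sighting has counter>1 but is still flagged as new within the revision, while the third sighting on a different revision is not.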

And to answer the question of how accurate the NEW failure notification is, we could compare:

  • newBug vs newBugNewFailure (ideally all new bugs with a matching summary should have a NEW failure notification, or we are missing the NEW notification)
  • fixedByCommit vs fixedByCommitNewFailure (same as above: with a matching summary these should be equal, or we are missing NEW notifications)
  • otherNewFailure: any new failure that isn't related to a newBug or a fixedByCommit (FBC) will be associated with an existing intermittent bug; these would be false positives and we shouldn't be showing them.
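The comparisons above could be turned into numbers roughly as follows. This is only a sketch of one interpretation: the NewFailureCounts class, the summarize function, and the names of the derived ratios are assumptions, and the input counts at the bottom are invented purely to exercise the code, not real data.

```python
from dataclasses import dataclass


@dataclass
class NewFailureCounts:
    new_bug: int                      # classifications that filed a new bug
    new_bug_new_failure: int          # ...of those, NEW button was shown
    fixed_by_commit: int              # classifications marked fixed by commit
    fixed_by_commit_new_failure: int  # ...of those, NEW button was shown
    other_new_failure: int            # NEW shown, but existing intermittent


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when there is no data."""
    return numerator / denominator if denominator else None


def summarize(counts):
    expected = counts.new_bug + counts.fixed_by_commit
    correctly_flagged = (
        counts.new_bug_new_failure + counts.fixed_by_commit_new_failure
    )
    shown = correctly_flagged + counts.other_new_failure
    return {
        # How often a new bug / FBC actually had the NEW notification.
        "new_bug_coverage": ratio(counts.new_bug_new_failure, counts.new_bug),
        "fixed_by_commit_coverage": ratio(
            counts.fixed_by_commit_new_failure, counts.fixed_by_commit
        ),
        # Cases where the NEW notification should have appeared but did not.
        "missed_notifications": expected - correctly_flagged,
        # Share of displayed NEW notifications that were false positives.
        "false_positive_rate": ratio(counts.other_new_failure, shown),
    }


# Invented example inputs, for illustration only.
summary = summarize(NewFailureCounts(200, 190, 75, 75, 15))
```

Keeping coverage and the false positive rate separate shows whether the notification is missing real new failures or showing too many that are not new.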

Given ~200 new bugs/week and probably 75+ fixed_by_commits/week, I imagine that 30 days will give us enough information. I think anything >90% accurate is OK; >95% accuracy is good and we should consider opening this up to try; >99% accuracy and big bonuses for all!

The advantage of adding a field in the database is that for the otherNewFailure cases we can look at them and dig into what was showing. Honestly, this field should be an int, or we should have a second boolean, so that we can filter out failures that don't qualify (such as infra errors).
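A small sketch of the "int instead of boolean" variant, assuming three states. The NewFailureState enum, its member names, and the dictionary-shaped notes are hypothetical and only illustrate how non-qualifying failures could be filtered out.

```python
from enum import IntEnum


class NewFailureState(IntEnum):
    """Hypothetical values for an integer new_failure field."""
    NOT_NEW = 0             # NEW button was not shown
    NEW = 1                 # NEW button shown and the failure qualifies
    NEW_NOT_QUALIFYING = 2  # NEW button shown, but excluded (e.g. infra)


def qualifying_new_failures(notes):
    """Keep only notes whose new_failure value counts toward accuracy."""
    return [n for n in notes if n["new_failure"] == NewFailureState.NEW]


notes = [
    {"job_id": 1, "new_failure": NewFailureState.NEW},
    {"job_id": 2, "new_failure": NewFailureState.NEW_NOT_QUALIFYING},
    {"job_id": 3, "new_failure": NewFailureState.NOT_NEW},
]
qualifying = qualifying_new_failures(notes)
```

A second boolean would work just as well; the int form simply leaves room for further states later.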

Depends on: 1805793

The GitHub PR was just deployed to production this morning.
