Closed Bug 1473099 Opened 5 years ago Closed 3 years ago

determine if we are getting value from triaging intermittent failures


(Testing :: General, enhancement, P3)



(Not tracked)



(Reporter: jmaher, Unassigned)


For the last 1.5 years we have been actively triaging high-frequency intermittent bugs as stockwell bugs.  While this has helped us fix and disable many tests, it is a lot of manual work that often ends with the bug mysteriously becoming less frequent, or with the test getting disabled.

What is our engagement rate on stockwell bugs that are triaged?

There are a few things to consider:
* we might have engagement, but bugs still end up disabled
* we needinfo the triage owner and assume that is the right person
* we have many bugs that are owner triaged; those should be the ideal situation
* how do we track when we get responses?

What I propose is that we look at recent data for 4 weeks to determine the effectiveness of triaging bugs.  We then stop triaging bugs, but continue our work on fresh oranges and on disabling tests that reach 200 instances in 30 days.  After 4 weeks of no triage, we can measure the effect triage is having.

HFIF = high frequency intermittent failure bugs (i.e. [stockwell *] whiteboard tag)
How to measure this:
1) we could look at total disabled/unknown/fixed bugs compared to total population of HFIF for a given time window
2) we could analyze bug comments for all bugs and determine response times and resolution.  We could bucket response times to needinfo by 24 hours, 72 hours, 1 week, 1+ week; likewise we could count the number of needinfo requests.  Taking this data and viewing it in a table with the corresponding resolution, we could find a pattern of engagement based on resolution of the bug.
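
As a rough illustration of both options, here is a minimal Python sketch. It assumes the bug data has already been exported into plain dicts; the field names ("resolution", "needinfos"), the function names, and the extra "no response" bucket are hypothetical and are not taken from Bugzilla or any existing stockwell tooling. Only the bucket edges (24 hours, 72 hours, 1 week, 1+ week) come from the proposal above.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional

# Bucket edges from the proposal: 24 hours, 72 hours, 1 week, 1+ week.
BUCKETS = [
    (timedelta(hours=24), "<=24h"),
    (timedelta(hours=72), "<=72h"),
    (timedelta(weeks=1), "<=1w"),
]


def bucket_response(requested: datetime, answered: Optional[datetime]) -> str:
    """Return the response-time bucket for a single needinfo request."""
    if answered is None:
        # Assumption: unanswered requests get their own bucket.
        return "no response"
    delay = answered - requested
    for limit, label in BUCKETS:
        if delay <= limit:
            return label
    return ">1w"


def engagement_table(bugs):
    """Option 2: count (resolution, bucket) pairs over all needinfo requests.

    Each bug is a dict with a "resolution" string and a "needinfos" list of
    (requested, answered) datetime pairs; this layout is hypothetical.
    """
    table = Counter()
    for bug in bugs:
        for requested, answered in bug["needinfos"]:
            bucket = bucket_response(requested, answered)
            table[(bug["resolution"], bucket)] += 1
    return table


def resolution_share(bugs):
    """Option 1: fraction of the HFIF population in each resolution."""
    counts = Counter(bug["resolution"] for bug in bugs)
    total = sum(counts.values())
    return {resolution: n / total for resolution, n in counts.items()}
```

Running resolution_share and engagement_table over the same HFIF population for the baseline window and for the no-triage window would give two sets of numbers to compare side by side.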

I see option #1 as being easy to determine and track; #2 will require a bit of work, but is possibly more enlightening.
:gbrown, do you have additional thoughts on this or ideas worth considering?
Flags: needinfo?(gbrown)
Before disabling tests, we have a duty to try to notify interested parties. Triage could be seen as part of that notification process. I don't think it is a big deal, but before changing the triage routine - even temporarily - I think we need to announce/discuss on dev.platform or similar.

Let's be careful that neither our baseline nor experimental measurements include irregularities like All Hands or significant holidays.

4 weeks sounds reasonable to me, but consider a longer time frame to get more data / higher confidence.

Regarding options 1 and 2, my first thought is that triage is intended to reduce response time, whereas bug resolution is also dependent on other factors, like how difficult the bug is to fix and whether people are available to work on it. On the other hand, if it turns out triage is not improving bug outcomes, for whatever reason, why are we doing it?
Flags: needinfo?(gbrown)
Thanks for your insight, :gbrown.  I agree that #2 would be a more meaningful measurement.  Ideally teams should be triaging bugs in their own components, but we know that many components have no teams and many components ignore intermittent-failure bugs.

Ideally I see a few outcomes:
1) we find triage is engaging and useful
2) we find triage provides no value
3) we find patterns in this that show we could simplify our time spent triaging and either push harder for self triage or adjust the frequency and criteria we use for triage
Priority: -- → P3

I would like to consider resolving this as wontfix. There are many routes to take in solving intermittents and we need to revisit this in Q2. Overall, we know the existing triage isn't very effective at preventing tests from being disabled (although many bugs may get resolved).

A few possible solutions are:

  • find the root cause and back out or disable, using robots as much as possible
  • stop tracking intermittents; determine whether a failure is an intermittent or a regression and only show regression failures
  • have a test classification system that runs tests at different frequencies (candidate for per commit, m-c only, once/week, never)

None of the above solutions depends on the outcome of this bug; in fact, I think the work spent getting answers for this bug and then acting on them would be better spent elsewhere.


Flags: needinfo?(gbrown)

Possibly there were too many big ideas here and not enough specifics; I don't know how to proceed. wontfix is fine with me.

Flags: needinfo?(gbrown)

If we decide in the future that a renewed effort in triage is needed, this work should be a prerequisite.

Closed: 3 years ago
Resolution: --- → WONTFIX