Closed Bug 1035979 Opened 10 years ago Closed 9 years ago

dzAlerts - Too many Alerts!

Categories

(Datazilla Graveyard :: Metrics, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ekyle, Unassigned)

References

Details

The volume of alerts is still a bit too high for all of them to be actionable, so I would like to propose some adjustments to the thresholds for what gets sent via email (a rough sketch of these rules follows the list):
1) a single test (page) for a revision shows a >15% regression
2) more than one page OR more than one platform for a revision shows a >10% regression
3) the geometric mean of all tests (pages) in a suite (e.g. tsvgr) for a revision shows a >5% regression
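
To make the proposed rules concrete, here is a minimal sketch of how they could be evaluated; the Alert record, field names, and geometric_mean helper below are hypothetical, not the actual dzAlerts data model:

  from dataclasses import dataclass
  from math import prod


  @dataclass
  class Alert:
      # Hypothetical alert record; the real dzAlerts schema differs.
      revision: str
      suite: str         # e.g. "tsvgr"
      page: str          # individual test page
      platform: str
      regression: float  # fractional regression, e.g. 0.12 means 12% slower


  def geometric_mean(values):
      return prod(values) ** (1.0 / len(values))


  def should_email(alerts_for_revision):
      """Apply the three proposed rules to all alerts for a single revision."""
      # Rule 1: a single test (page) regressed by more than 15%
      if any(a.regression > 0.15 for a in alerts_for_revision):
          return True

      # Rule 2: more than one page OR more than one platform regressed by >10%
      big = [a for a in alerts_for_revision if a.regression > 0.10]
      if len({a.page for a in big}) > 1 or len({a.platform for a in big}) > 1:
          return True

      # Rule 3: geometric mean of all pages in a suite regressed by more than 5%
      by_suite = {}
      for a in alerts_for_revision:
          by_suite.setdefault(a.suite, []).append(1.0 + a.regression)
      return any(geometric_mean(r) > 1.05 for r in by_suite.values())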
How many alerts/day does it generate right now?

How many alerts/day does the graph-server-based system generate?

(would be useful to compare the numbers from both systems over the same time period).

What's our definition (a number) of "reasonable numbers of alerts/day"?

Also, as kyle noted via email, grouping alerts by revision/changeset doesn't work optimally right now: a changeset appears as a different revision on different branches, and sometimes a change is independently detected on adjacent revisions even though both detections may come from a single changeset.

Does the graphserver-based system do this better? Anything to learn from it?
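
For illustration, the grouping being described could look roughly like the sketch below; the resolve_changeset lookup is hypothetical and would in practice need to consult hg/pushlog to map a branch-specific revision to its canonical changeset:

  from collections import defaultdict


  def group_alerts_by_changeset(alerts, resolve_changeset):
      # Collapse alerts detected on different branches (or on adjacent revisions)
      # that actually originate from the same changeset.
      # resolve_changeset(branch, revision) -> canonical changeset id
      # (hypothetical helper, not something dzAlerts has today).
      groups = defaultdict(list)
      for alert in alerts:
          key = resolve_changeset(alert["branch"], alert["revision"])
          groups[key].append(alert)
      return groups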
I would say a 5%-15% regression on a test is generally an unacceptable regression and must be fixed. We currently use a threshold of 4.5%, no? So why do you say they are not actionable?
Flags: needinfo?(klahnakoski)
(In reply to Vladan Djeric (:vladan) from comment #2)
> I would say a 5%-15% regression on a test is generally an unacceptable
> regression and must be fixed. We currently use a threshold of 4.5%, no? So
> why do you say they are not actionable?

No, we're still at the ~2% threshold, because increasing it would have made us miss a large regression on a single test whenever the overall suite regression stayed below the threshold. That's when the "high resolution alerts" effort started: allow a higher average threshold (4.5%) while still detecting a significant single-test regression even when it only amounts to a sub-4.5% regression on average. And then we found out dzAlerts already works this way.
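
A contrived example of the effect (all numbers made up): one page regressing badly can stay invisible at the suite-average level while a per-page check still catches it:

  # Hypothetical per-page regressions for one suite on one revision:
  # one page regresses 30%, the other nine are flat.
  page_regressions = [0.30] + [0.0] * 9

  suite_average = sum(page_regressions) / len(page_regressions)
  print(suite_average)                             # 0.03 -> only a 3% suite regression
  print(any(r > 0.045 for r in page_regressions))  # True -> per-page check still fires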
Vladan, we are also talking about single pages vs test suites.  I think seeing some data that shows the difference in alert volume will be useful to show how unmanageable this is.  Also keep in mind the fix rate for current talos regressions is ~30%, so filing even more bugs will just be wasted effort.

All alerts after July 1, we have 108 alerts from graph server.
* 28 regression alerts
All alerts after June 1, we have 569 alerts from graph server.
* 271 regression alerts
All alerts after May 1, we have 1444 alerts from graph server.
* 604 regression alerts
dzAlerts was still playing catchup yesterday.  Today is the only day with only fresh alerts:

Here are the alert counts for the last 24 hours:
Tp5o = 7 (threshold= 5%)
Dromaeo = 14 (threshold=10%)
All others = 8 (threshold= 5%)
For now I have increased the thresholds for dromaeo and tp5o to 15%.  Geometric mean thresholds and other aggregate thresholds require some coding.

https://github.com/klahnakoski/datazilla-alerts/blob/dev/resources/settings/talos_settings.json#L46
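
As a simplified, hypothetical sketch of what such per-suite settings amount to (the real talos_settings.json schema may differ):

  # Hypothetical, simplified per-suite thresholds; the actual
  # talos_settings.json structure in the repo may differ.
  THRESHOLDS = {
      "default": {"min_regression": 0.05},
      "tp5o":    {"min_regression": 0.15},  # raised from 5%
      "dromaeo": {"min_regression": 0.15},  # raised from 10%
  }


  def passes_threshold(suite, regression):
      limit = THRESHOLDS.get(suite, THRESHOLDS["default"])["min_regression"]
      return regression > limit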
Flags: needinfo?(klahnakoski)
(In reply to Joel Maher (:jmaher) from comment #4)
> Also keep in mind the fix rate for current
> talos regressions is ~30%, so filing even more bugs will just be wasted
> effort.

The resolve rate is 70% though*, so even if not everything gets fixed, it's good to inform developers of significant regressions so they can decide what needs to be fixed, and we and RelMan can then track the regressions for each release. Maybe filing bugs is the wrong mechanism though.

* http://elvis314.files.wordpress.com/2014/05/resolution_status.png

> All alerts after June 1, we have 569 alerts from graph server.
> * 271 regression alerts

This is indeed massive. How many distinct changesets are responsible for these 271 alerts?

(In reply to Kyle Lahnakoski [:ekyle] from comment #6)
> For now I have increased the thresholds for dromaeo and tp5o to 15%. 
> Geometric mean thresholds and other aggregate thresholds require some
> coding.

Why are these two tests causing so many regressions? Are they noisy, or are these real regressions? Can you link me to the graphserver graphs for these two measures?

I'm also disappointed that people are landing regressions in the tree without pushing to try first :(
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #4)
> ...
> 
> All alerts after July 1, we have 108 alerts from graph server.
> * 28 regression alerts
> All alerts after June 1, we have 569 alerts from graph server.
> * 271 regression alerts
> All alerts after May 1, we have 1444 alerts from graph server.
> * 604 regression alerts

If I interpret these numbers correctly, then during the past 70 days or so (since May 1st) we had roughly 600+ alerts a month (improvements + regressions), where regressions are less than half of those (I'm assuming the rest are improvements). So maybe 270 regressions per month?
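
Spelling out that arithmetic, and assuming Joel's three counts in comment #4 are cumulative snapshots of the same alert stream:

  # (total alerts, regression alerts) from graph server, per comment #4
  since_may_1  = (1444, 604)
  since_june_1 = (569, 271)
  since_july_1 = (108, 28)

  may_only  = tuple(a - b for a, b in zip(since_may_1, since_june_1))
  june_only = tuple(a - b for a, b in zip(since_june_1, since_july_1))

  print(may_only)   # (875, 333): ~875 alerts, ~333 regressions during May
  print(june_only)  # (461, 243): ~461 alerts, ~243 regressions during June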


(In reply to Kyle Lahnakoski [:ekyle] from comment #5)
> dzAlerts was still playing catchup yesterday.  Today is the only day with
> only fresh alerts:
> 
> Here are the alert counts for the last 24 hours:
> Tp5o = 7 (threshold= 5%)
> Dromaeo = 14 (threshold=10%)
> All others = 8 (threshold= 5%)

If we assume that this relatively small sample is representative, then it's about 650 regressions a month.

Considering that currently we can (can we?) handle, say, 300 regressions a month, is the goal for dzAlerts then to generate roughly this amount per month? Or else, how would we know when it's no longer "too many", which the bug suggests is our current state?

Also, considering that according to Joel's numbers we have more improvements than regressions per month (in July we had 28 regressions and 80(!) improvements), are we chasing ghosts when we only follow up on regressions while completely ignoring improvements?
(In reply to Avi Halachmi (:avih) from comment #8)
> If we assume that this relatively small sample is representative, then it's
> about 650 regressions a month.

Oops. 850.
bugs filed (most likely related to changesets):
since May 1: 51 (604 regression alerts)
since June 1: 17 (271 regression alerts)
since July 1: 2 (28 regression alerts)

Most of the reduction from regression alerts to bugs is duplication between platforms and branches.
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #10)
> bugs filed (most likely related to changesets):
> since May 1: 51 (604 regression alerts)
> ...
> 
> Most of the reduction from regression alerts to bugs is duplication between
> platforms and branches.

Seems like Joel is able to reduce the alerts by a factor of 12 by correlating a single patch to roughly 12 different alert messages.

_This_ is something which would be highly useful to do automatically.
I do that semi-automatically now in alert manager (about 4:1 is automated); it is priority 2 for me right now, behind finishing my punch list of new tests to get running.  Kyle has looked into adopting similar tooling as part of dzAlerts - it just takes a bit of time.
(In reply to Joel Maher (:jmaher) from comment #12)
> I do that semi-automatically now in alert manager (about 4:1 is automated);
> it is priority 2 for me right now, behind finishing my punch list of new
> tests to get running.  Kyle has looked into adopting similar tooling as
> part of dzAlerts - it just takes a bit of time.

If it can be done automatically, wouldn't it be better for dzAlerts to do it instead? Voila: 4x fewer alerts, which will surely make it not "too many". And that's even before the further reduction which is not yet automated.

Sounds like a classic goal for dzAlerts, and one which would also hit another goal: not generating "too many" alerts.
Summary: Too many Alerts! → dzAlerts - Too many Alerts!
We should make it a goal to reduce the alerts by a factor of 12 via automation - unfortunately that takes time :(

I think it is realistic to get 4:1, but reducing the pgo/non-pgo duplication will take time, and sorting out platform redundancy will take time as well.

Really we need to be adjusting our algorithms for detection to be generating alerts that are as actionable as possible.  All duplication between platforms/branches should be automatically filtered (bundled up).

Can we assert that by looking at raw pages we should only have twice the volume of alerts that we would normally see from graph server?  Of course we don't want to miss anything, but we do need to be realistic as well.
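
A rough sketch of that bundling (alert fields hypothetical): collapse alerts that share a changeset and suite, so platform/branch/pgo variants of the same regression become one actionable item:

  from collections import defaultdict


  def bundle_alerts(alerts):
      # Each alert is assumed to be a dict with 'changeset', 'suite',
      # 'platform', 'branch', and 'pgo' keys (hypothetical shape).
      bundles = defaultdict(list)
      for alert in alerts:
          # One actionable item per (changeset, suite); platform/branch/pgo
          # variants of the same regression become detail within the bundle.
          bundles[(alert["changeset"], alert["suite"])].append(alert)
      return bundles


  # The reduction factor described above is then roughly
  # len(alerts) / len(bundle_alerts(alerts)).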
dzAlerts is a dead project
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX