Bug 1285322
Opened 9 years ago
Closed 6 years ago
Implement auto-classification of "downstream" alerts
Categories: Tree Management :: Perfherder, enhancement, P3
Tracking: Not tracked
Status: RESOLVED WONTFIX
People: Reporter: wlach; Assignee: Unassigned
Attachments: 1 file
A frequent situation in Perfherder is that a regression gets merged to another integration branch and we generate a new "alert" based on that merge commit. To trace these alerts back to their root cause, we have a status called "downstream" that can be assigned to them. Right now we rely on human beings to mark alerts with this status, but since these alerts follow a very regular pattern (the % change is similar to that of a previously filed alert, and the alert's result set corresponds to a merge commit), we should be able to detect this situation automatically.
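As a minimal sketch of that pattern (not Perfherder code), the check could be a predicate over a candidate alert and an earlier alert on the same test. The function name `looks_downstream`, its inputs, and the 20% relative tolerance are hypothetical choices for illustration:

```python
def looks_downstream(alert_pct_change: float,
                     upstream_pct_change: float,
                     push_is_merge: bool,
                     rel_tolerance: float = 0.2) -> bool:
    """Hypothetical heuristic: is this alert a downstream copy of an upstream one?

    Assumptions (not taken from Perfherder): both percentages are signed
    changes for the same test, and rel_tolerance is the allowed relative
    difference between their magnitudes.
    """
    if not push_is_merge:
        # Downstream alerts are expected to land on merge pushes.
        return False
    if upstream_pct_change == 0:
        return False
    # Never match a regression with an improvement.
    if (alert_pct_change > 0) != (upstream_pct_change > 0):
        return False
    diff = abs(abs(alert_pct_change) - abs(upstream_pct_change))
    return diff <= rel_tolerance * abs(upstream_pct_change)
```

For instance, `looks_downstream(5.1, 4.8, True)` is true, while the same magnitudes on a non-merge push are not. A relative tolerance is used so that the same rule works for both small and large regressions.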
Here's an example of an alert summary with two alerts marked as downstream:
https://treeherder.mozilla.org/perf.html#/alerts?id=1701
Let's start by seeing whether we can write a script that automatically detects this type of situation. I originally thought you might need to download the performance data to be sure, but on further reflection I think the alerts themselves might contain sufficient information (when correlated with result set info).
We should be able to grab all the alerts (summaries) with a query like this:
[mozilla-inbound]
https://treeherder.mozilla.org/api/performance/alertsummary/?alerts__series_signature__signature_hash=5ec2754e398f6f440316bd82ff738cb21ba9ff70&repository=2
[fx-team]
https://treeherder.mozilla.org/api/performance/alertsummary/?alerts__series_signature__signature_hash=5ec2754e398f6f440316bd82ff738cb21ba9ff70&repository=14
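A small sketch of building these queries programmatically, assuming the endpoint and parameters exactly as they appear in the two URLs above (the API may have changed since this bug was filed). The repository-id table only covers the two ids shown here, and `alert_summary_url` / `fetch_json` are hypothetical helper names; `fetch_json` decodes the body without assuming its structure:

```python
import json
import urllib.request
from urllib.parse import urlencode

TREEHERDER = "https://treeherder.mozilla.org"

# Repository ids taken from the two example queries above.
REPOSITORY_IDS = {"mozilla-inbound": 2, "fx-team": 14}


def alert_summary_url(signature_hash: str, repository: str) -> str:
    """Build the alertsummary query for one series signature on one repository."""
    query = urlencode({
        "alerts__series_signature__signature_hash": signature_hash,
        "repository": REPOSITORY_IDS[repository],
    })
    return f"{TREEHERDER}/api/performance/alertsummary/?{query}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling `fetch_json(alert_summary_url("5ec2754e398f6f440316bd82ff738cb21ba9ff70", "fx-team"))` would issue the second query above.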
You can get result set information (to determine whether something is a merge commit) by using a query like this (note that alerts/summaries may correspond to a range of result sets, so you may need to call this multiple times):
https://treeherder.mozilla.org/api/project/mozilla-inbound/resultset/33974/
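Since a summary can span several result sets, one option is to generate one URL per id in the range and then inspect each payload. The merge check below is a hypothetical heuristic: it assumes the decoded payload has a `revisions` list whose entries carry the commit message in a `comments` field, and the keyword match and `min_revisions` threshold are guesses, not Treeherder logic:

```python
TREEHERDER = "https://treeherder.mozilla.org"


def resultset_urls(project: str, first_id: int, last_id: int) -> list[str]:
    """One resultset URL per id in the inclusive range [first_id, last_id]."""
    base = f"{TREEHERDER}/api/project/{project}/resultset"
    return [f"{base}/{rs_id}/" for rs_id in range(first_id, last_id + 1)]


def looks_like_merge(resultset: dict, min_revisions: int = 2) -> bool:
    """Guess whether a decoded resultset payload is a merge push.

    Assumed shape: {"revisions": [{"comments": "<commit message>"}, ...]}.
    Merge pushes usually bundle many revisions, and the merge commit's
    message conventionally mentions "merge"; both are heuristics.
    """
    revisions = resultset.get("revisions", [])
    if len(revisions) < min_revisions:
        return False
    return any("merge" in rev.get("comments", "").lower() for rev in revisions)
```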
Let me know if this is enough to get started, or if you need more info!
Comment 1•9 years ago
You could use the existing manually annotated alerts to verify that this script is accurate (or to find errors in our manual process)!
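One way to do that check, sketched here as an assumption rather than an agreed method, is to score the script's flags against the manual labels with precision and recall. The input shape (alert id mapped to a downstream yes/no flag) is hypothetical. False positives are worth reviewing by hand, since some may be alerts humans forgot to mark:

```python
def compare_with_manual(predicted: dict[int, bool],
                        manual: dict[int, bool]) -> dict[str, float]:
    """Score automatic downstream flags against manually assigned ones.

    Only alerts that have a manual label are scored; an alert missing from
    `predicted` counts as "not downstream".
    """
    tp = fp = fn = tn = 0
    for alert_id, is_downstream in manual.items():
        guess = predicted.get(alert_id, False)
        if guess and is_downstream:
            tp += 1
        elif guess:
            fp += 1  # script says downstream, humans did not
        elif is_downstream:
            fn += 1  # humans said downstream, script missed it
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```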
There are a few advanced tricks to consider once the original description is implemented:
1) Don't assume the specified revision range is accurate; in fact, we often need to extend the range by ±1 push.
2) Often we don't see alerts on all branches.
3) PGO vs. opt: these are different signatures with different magnitudes.
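To keep PGO and opt results from being compared with each other (trick 3) while tolerating branches with no alert at all (trick 2), candidate matching could be restricted to alerts with identical series keys. The `test`, `platform` and `option` fields are hypothetical stand-ins for whatever identifies a series:

```python
from collections import defaultdict


def series_key(alert: dict) -> tuple:
    # Hypothetical fields; "option" is e.g. "pgo" or "opt", so those never mix.
    return (alert["test"], alert["platform"], alert["option"])


def candidate_pairs(downstream_alerts: list[dict], upstream_alerts: list[dict]):
    """Yield (downstream, upstream) pairs whose series keys match exactly.

    A downstream alert with no matching upstream alert simply yields no
    pair, since alerts do not show up on every branch.
    """
    by_key = defaultdict(list)
    for up in upstream_alerts:
        by_key[series_key(up)].append(up)
    for down in downstream_alerts:
        for up in by_key.get(series_key(down), []):
            yield down, up
```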
:wlach, :jmaher
I have a very crude script running now, and I have some questions I'd like to clarify:
1. Does the API limit the number of revisions it returns?
Downstream: https://treeherder.mozilla.org/perf.html#/alerts?id=1904
Downstream AlertSummary: https://treeherder.mozilla.org/api/performance/alertsummary/1904/
Downstream resultset: https://treeherder.mozilla.org/api/project/mozilla-inbound/resultset/?count=2&full=true&id__in=34470,34471&offset=0
Upstream: https://treeherder.mozilla.org/perf.html#/alerts?id=1884
Upstream AlertSummary: https://treeherder.mozilla.org/api/performance/alertsummary/1884/
Upstream resultset: https://treeherder.mozilla.org/api/project/autoland/resultset/?count=2&full=true&id__in=604,605&offset=0
The commits that could be responsible for the alerts are 803d0028289a74df1be4220a61dd88802a3563a9, 328df07cbfc206d3be093e422f06a6b21c1f1c53 and a86bfdfc575dd02b02d78f0f6f1ebbcfe45ea6f3.
When I query the upstream resultset, I don't see these commits. I have tried expanding the range by ±1, to no avail :(
2. How can I tell whether the specified revision range is accurate? Under what circumstances should I expand the search?
Thanks for all the help!
Comment 3•9 years ago
Pretty much all downstream regressions will have a large commit range as the suspected root cause, since we normally merge 39-150 commits at a time. I am not sure whether the API is limiting the revisions; maybe wlach could help uncover that.
Automatic alerts are only a suggestion; I find that in most cases they are off by up to ±2 pushes (-2 because of noise, +2 because we don't run every job on every commit). There are some exceptions where the offset is larger. But if you stick to the rule of "suspect range ±2 pushes" (a push can contain more than one revision), most will be identified successfully.
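The "suspect range ±2 pushes" rule can be sketched as widening the range and collecting every revision in it. This treats push ids as consecutive integers within one repository, which is an assumption; `push_revisions` is a hypothetical mapping from push id to revision hashes that you would fill from the API:

```python
def widened_push_ids(first_push_id: int, last_push_id: int, margin: int = 2) -> range:
    """Push ids from first_push_id - margin to last_push_id + margin, inclusive."""
    return range(first_push_id - margin, last_push_id + margin + 1)


def revisions_in_window(push_revisions: dict[int, list[str]],
                        first_push_id: int, last_push_id: int,
                        margin: int = 2) -> set[str]:
    """Union of the revisions of every known push in the widened range.

    Ids missing from push_revisions (unknown or non-existent pushes) are
    skipped rather than treated as errors.
    """
    revisions: set[str] = set()
    for push_id in widened_push_ids(first_push_id, last_push_id, margin):
        revisions.update(push_revisions.get(push_id, []))
    return revisions
```

An upstream alert would then count as a candidate if any of its revisions appear in the downstream window; `margin=1` gives the narrower ±1 variant from comment 1.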
A few caveats here:
* PGO: this is a periodic build, so regressions always span a range (and we might not build PGO on the merge itself, so the alert could appear a few pushes later). It is probably OK to miss a lot of PGO alerts.
* Landings/backouts, or more than one regression in a small window: often a test will change more than once in a small window; those changes then get merged and we see the downstream effects. Here our algorithm might mark the wrong alert as the original root cause; I think that is OK.
Comment 4•9 years ago (Reporter)
Hmm, I think Treeherder might limit the number of revisions ingested per push. You might need to use the json-pushes API to get the complete set of revisions in a push. For the above example, you could try something like this to get the changesets associated with the mozilla-inbound merge:
https://hg.mozilla.org/integration/mozilla-inbound/json-pushes/?full=1&version=2&changeset=711963e8daa312ae06409f8ab5c06612cb0b8f7b
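A sketch of building that query and flattening the response into a list of changeset hashes. It assumes the `version=2` layout `{"lastpushid": ..., "pushes": {"<push id>": {"changesets": [...]}}}`, where with `full=1` each changeset is an object with a `node` key (plain hash strings are accepted too, in case `full` is omitted). The helper names are hypothetical:

```python
import json
import urllib.request
from urllib.parse import urlencode

HG = "https://hg.mozilla.org"


def json_pushes_url(repo_path: str, changeset: str) -> str:
    """json-pushes query for the push containing `changeset`."""
    query = urlencode({"full": 1, "version": 2, "changeset": changeset})
    return f"{HG}/{repo_path}/json-pushes/?{query}"


def changesets_from_pushes(payload: dict) -> list[str]:
    """Flatten an (assumed) version=2 payload into changeset hashes, oldest push first."""
    nodes = []
    for push_id in sorted(payload.get("pushes", {}), key=int):
        for cset in payload["pushes"][push_id].get("changesets", []):
            nodes.append(cset["node"] if isinstance(cset, dict) else cset)
    return nodes


def fetch_changesets(repo_path: str, changeset: str) -> list[str]:
    """Fetch and flatten in one step (requires network access)."""
    with urllib.request.urlopen(json_pushes_url(repo_path, changeset)) as resp:
        return changesets_from_pushes(json.load(resp))
```

`fetch_changesets("integration/mozilla-inbound", "711963e8daa312ae06409f8ab5c06612cb0b8f7b")` would issue the query above.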
Comment 5•9 years ago
:wlach, :jmaher Here is the result of the latest classification!
https://pastebin.mozilla.org/8885792
``Narrowed upstream`` is a dictionary mapping each alert id to its potential upstream alert summary ids.
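A mapping of that shape could be triaged into alerts with exactly one candidate (candidates for automatic marking), several candidates (needing a human decision) and none. This triage policy is a hypothetical illustration, not what the classification script does:

```python
def split_narrowed(narrowed: dict[int, list[int]]):
    """Split {alert id: [candidate upstream alert summary ids]}.

    Returns (unambiguous, ambiguous, unmatched): alerts with exactly one
    distinct candidate, with several, and with none.
    """
    unambiguous: dict[int, int] = {}
    ambiguous: dict[int, list[int]] = {}
    unmatched: list[int] = []
    for alert_id, candidates in narrowed.items():
        unique = sorted(set(candidates))  # drop duplicate candidates
        if len(unique) == 1:
            unambiguous[alert_id] = unique[0]
        elif unique:
            ambiguous[alert_id] = unique
        else:
            unmatched.append(alert_id)
    return unambiguous, ambiguous, unmatched
```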
Comment 7•9 years ago (Reporter)
Hey Roy, do you think you would have time to pick this up again soon? Otherwise we might pick this up where you left off. :)
Flags: needinfo?(crosscent)
(In reply to William Lachance (:wlach) from comment #7)
> Hey Roy, do you think you would have time to pick this up again soon?
> Otherwise we might pick this up where you left off. :)
Hey Will! Sorry for putting it off for this long. I've pushed the newest commit. I am trying to make the database dump you gave me work with the new system so I can test with actual results. It has tests for most of the functions. I will find you on IRC on Monday.
Flags: needinfo?(crosscent)
Comment 9•8 years ago
Hi Roy! Do you know if you'll be returning to this soon? :-)
Flags: needinfo?(crosscent)
Comment 10•8 years ago (Reporter)
FWIW I think this would be worth picking up even if Roy isn't able to work on it, as it has the potential to save a lot of time/effort with performance sheriffing.
Updated•8 years ago
Assignee: crosscent → nobody
Updated•8 years ago
Priority: -- → P3
Updated•8 years ago
Flags: needinfo?(crosscent)
Comment 11•6 years ago
mozilla-inbound isn't running perf tests any more.
If there's any need to automate downstream classification, that would involve the mozilla-beta tests. But those have their own peculiarities, and I don't think the effort there would justify the benefits.
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
Type: defect → enhancement