Closed Bug 1150152 Opened 10 years ago Closed 10 years ago

Generate and persist performance series alerts in perfherder

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

References

Details

Attachments

(3 files, 3 obsolete files)

perfalerts.csv 10 years ago William Lachance (:wlach) 236.89 KB, text/csv		Details
alerts from alert manager 10 years ago Joel Maher ( :jmaher ) (UTC -8) 48.89 KB, text/csv		Details
perfalerts.csv 10 years ago William Lachance (:wlach) 247.21 KB, text/csv		Details
alerts-jul14-2015.csv 10 years ago William Lachance (:wlach) 226.33 KB, text/csv		Details
alerts-jul17.csv 10 years ago William Lachance (:wlach) 218.65 KB, text/csv		Details
PR for persistence 10 years ago William Lachance (:wlach) 1.77 KB, text/plain	jmaher : feedback+	Details

William Lachance (:wlach)

Assignee

Description

•

10 years ago

We should generate summary alerts in treeherder/perfherder. This will allow us to tightly integrate the alerts with various views of the performance data (annotated graphs, etc.) and paves the way towards eventually integrating the alertmanager interface inside treeherder/perfherder itself. My current plan is to more or less reuse the current graphserver algorithm. For a first iteration, I would be happy providing some kind of JSON feed of alerts, that can be compared against what's currently being produced by graphserver. That will cover the scope of this bug. After that's finished, we can file followups to reach full feature parity with graphserver alerts.

William Lachance (:wlach)

Assignee

Updated

•

10 years ago

Depends on: 1174877

Treeherder GitHub Bugbot

Comment 1

•

10 years ago

Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/8baa73b9b4dfa679056516bdfb6b8dabcbd97ed1 Bug 1150152 - Management command for generating perf regression lists in CSV This is an interim test script to allow us to prove out that we're still generating the same performance regressions as graphserver (roughly). I suspect it'll still be useful once we have proper alerts inserted into the database, however.

William Lachance (:wlach)

Assignee

Comment 2

•

10 years ago

I just pushed a testing script that allows us to generate a list of alerts in CSV format. I didn't bother asking for review as it doesn't change any production functionality (and wanted to be able to use it in production to produce testing data for :jmaher asap)

William Lachance (:wlach)

Assignee

Comment 3

•

10 years ago

Attached file perfalerts.csv (obsolete) — Details

Hey Joel, here's the set of summary alerts that are detected by the graphserver code that I imported into the tree. I believe it is using the same settings as graphserver. It's a CSV file, header information on the fields is on the top. Should be easy to import into a spreadsheet or whatever.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

10 years ago

Attached file alerts from alert manager — Details

these are the alerts from alert manager. I know if I am on us holiday and a weekend, I might not get to this before next Monday. Of course this could wait until then. Ideally we need to match the fields up as much as possible.

Flags: needinfo?(jmaher)

William Lachance (:wlach)

Assignee

Comment 5

•

10 years ago

Attached file perfalerts.csv (obsolete) — Details

perfalerts with platform info!! LOL

Attachment #8627810 - Attachment is obsolete: true

Treeherder GitHub Bugbot

Comment 6

•

10 years ago

Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/2fdcc14e3765bd741d951a83f9798596802a5a5d Bug 1150152 - Include platform info in generated alert CSV

Joel Maher ( :jmaher ) (UTC -8)

Comment 7

•

10 years ago

a couple things: 1) how to deal with reverse tests? what is a + or a - in the percent field? I am not sure the best way to query this. I am thinking something like this: for type in [regression, improvement]: for branch in branches: for revision in branch[revisions]: phAlerts = phQueryAlerts(type, branch, revision) gsAlerts = gsQueryAlerts(type, branch, revision) compareAlerts(phAlerts, gsAlerts) compareAlerts(phAlerts, gsAlerts): for phTest in phAlerts: for gsTest in gsAlerts: if len(gsTest) != len(phTest): print '-- gsTest != phTest' else: print '++ gsTest' essentially comparing on the revision axis vs per test, etc.

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

•

10 years ago

waiting for perfherder data to be less duplicated- next week lets regenerate our data and try again.

William Lachance (:wlach)

Assignee

Comment 9

•

10 years ago

Attached file alerts-jul14-2015.csv (obsolete) — Details

Alerts from July 14 with de-duplicated data

Attachment #8628949 - Attachment is obsolete: true

William Lachance (:wlach)

Assignee

Comment 10

•

10 years ago

Attached file alerts-jul17.csv — Details

There was yet another case where we could have duplicated datapoints due to the result entries being in a different order (see the followups in bug 1182282). Here's a regenerated set with that fixed.

Attachment #8633530 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Comment 11

•

10 years ago

looking over this we have: * half of the alerts are 100% match with graph server * 30% are just slightly off (in % or revisions) * 20% are duplicates handled differently or tests which we need to work on some next steps: 1) verify test by test that we have valid data (look at the summary data). For example pageloader style tests where we calculate the geomean are not 100% identical- lets understand it 2) fix v8, kraken, canvasmark to use a proper summarization of the subtests 3) resolve xperf and related metrics 4) look at counters- verify we can generate alerts for those I think once we have done these 4 things, we can mark this bug as done and assume that our data and alert generation code is in parity with graph server.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Depends on: 1184966

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Depends on: 1184968

Joel Maher ( :jmaher ) (UTC -8)

Comment 12

•

10 years ago

for the xperf data and counters, we will need to do a small database refactor. This is on the wlach queue for next week. The main work here is to define what a subtest is vs a summary. Currently we only alert on summaries, but counters would either need to have a summary or we need to redefine the data to be alertable.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Depends on: 1186569

William Lachance (:wlach)

Assignee

Updated

•

10 years ago

Blocks: 1184966

No longer depends on: 1184966

William Lachance (:wlach)

Assignee

Updated

•

10 years ago

No longer blocks: 1184966

William Lachance (:wlach)

Assignee

Updated

•

10 years ago

Summary: Generate series summary alerts in perfherder → Generate and persist performance series alerts in perfherder

William Lachance (:wlach)

Assignee

Updated

•

10 years ago

Blocks: 1201154

William Lachance (:wlach)

Assignee

Comment 13

•

10 years ago

Attached file PR for persistence — Details

Attachment #8658918 - Flags: feedback?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

•

10 years ago

Comment on attachment 8658918 [details] PR for persistence I would like to know if the amount_pct is +-? Also how do we know if it is reverse or not (e.g. improvement or regression?) does result_set_id give us the ability to query the revision, branch, platform, buildtype? lastly will counters fall into the same bucket with suite_name: "tp5o_main_RSS" or something like that? overall this is a great starting point!

Attachment #8658918 - Flags: feedback?(jmaher) → feedback+

William Lachance (:wlach)

Assignee

Comment 15

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #14) > Comment on attachment 8658918 [details] > PR for persistence > > I would like to know if the amount_pct is +-? Also how do we know if it is > reverse or not (e.g. improvement or regression?) Its value is taken directly from the alerts code. We currently don't keep any metadata on whether higher or lower values are better per performance series, so you need to do client side detection of that. We should fix this, see bug 1158233. > does result_set_id give us the ability to query the revision, branch, > platform, buildtype? Revision, yes. Branch is implied by using the project endpoint. For platform and buildtype, you'd currently need to look that up via the appropriate endpoints (performance signature and option collection) We could potentially put this information directly in the alerts, at the cost of a bit of extra bandwidth and CPU on the server side. > lastly will counters fall into the same bucket with suite_name: > "tp5o_main_RSS" or something like that? Yeah, we'll just whatever suite they belong to in the alert.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Depends on: 1204633

Joel Maher ( :jmaher ) (UTC -8)

Comment 16

•

10 years ago

what are the next steps here?

William Lachance (:wlach)

Assignee

Comment 17

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #16) > what are the next steps here? There are two ways forward that have been proposed: 1. Try to import this data into alert manager. 2. Try to build a new UI around this API and use it side-by-side with alert manager until we're satisfied with it. I'm personally kind of inclined to just do (2) right away. I'm not sure if there'd be a lot of value in putting perfherder alerts into alertmanager.

Joel Maher ( :jmaher ) (UTC -8)

Comment 18

•

10 years ago

if we have an API then 1 or 2 will be fine- I would be happy to help contribute to a simple UI in perfherder as well.

William Lachance (:wlach)

Assignee

Comment 19

•

10 years ago

This part is done, more or less. Just refining the sheriffing workflow now.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.