Closed Bug 1142680 Opened 9 years ago Closed 9 years ago

Write a compare-talos equivalent and integrate it into PerfHerder

Categories

(Tree Management :: Perfherder, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: jmaher)

References

Details

Attachments

(5 files, 2 obsolete files)

According to Avi, compare-talos is an invaluable tool for measuring the changes between two revisions: 

https://bugzilla.mozilla.org/show_bug.cgi?id=1141252#c15

In addition to the graphs view, we should probably try to incorporate something like it into perfherder.
OS: Mac OS X → All
Hardware: x86 → All
Current production using datazilla: https://compare-talos.paas.mozilla.org/
Current code: https://hg.mozilla.org/users/mozilla_noorenberghe.ca/compare-talos/

I would also take patches to just use the TH API on the existing site.
(In reply to Matthew N. [:MattN] from comment #1)
> Current production using datazilla: https://compare-talos.paas.mozilla.org/
> Current code:
> https://hg.mozilla.org/users/mozilla_noorenberghe.ca/compare-talos/
> 
> I would also take patches to just use the TH API on the existing site.

If we integrated this into treeherder, we wouldn't have to worry about deploying or maintaining a separate project, which I think would be a win. For something that only requires front-end work (I believe this is one such thing), you can get instance of treeherder-ui running against stage/production in just a few minutes.

https://github.com/mozilla/treeherder-ui#development

I'd be very open to reusing code from the existing compare-talos where it makes sense.
Bug 1124779 deals with modifying a python script which performs a similar function to compare-talos to use treeherder/perfherder.
See Also: → 1124779
Thanks for filing this.

Imagine treeherder could display a link "compare performance between this push to the previous one" besides every treeherder push section which will open a compare-talos (or equivalent) page withe such overview.

This could trigger the comparison (and fetch of the data) when clicked - like compare-talos works now, or, treeherder could run it once and get some "summary" (TBD) of the comparison which could be rendered into a color.
Does this overlap with bug 1067273? :-)
(In reply to Ed Morley [:edmorley] from comment #5)
> Does this overlap with bug 1067273? :-)

Indeed. :)

Let's give this bug the same arbitrary priority
Priority: -- → P4
Blocks: 1150616
Attached file initial work for compare-talos (obsolete) —
this is not perfect, but it does take into account retriggers and supports most of the main functionality that compare-talos has on the front page.  The input part doesn't work, but hardcoding via the URL works great:
http://localhost:8000/app/compare.html#/comparetalos?originalProject=try&originalRevision=76b999af7321&newProject=try&newRevision=4dff56e21e1e
Attachment #8590969 - Flags: feedback?(wlachance)
Attachment #8590969 - Attachment is obsolete: true
Attachment #8590969 - Flags: feedback?(wlachance)
Attachment #8592238 - Flags: feedback?(wlachance)
Comment on attachment 8592238 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/465

Good start!
Attachment #8592238 - Flags: feedback?(wlachance) → feedback+
Comment on attachment 8592238 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/465

updated the pull request, there are a few pending this:
1) getSeriesSummary - make a core graphs library
2) in displayResults - need to use a forEach for seriesList
3) combine verifyRevision() - done, but really hacky
4) comparectrl.html has inline css style for dynamic choosing, don't know how to get this into perf.css

all other feedback should be addressed- please be as picky as you would like.  I am sure this cleaned up code could entice one to have new comments.
Attachment #8592238 - Flags: feedback+ → feedback?(wlachance)
updated my PR to address all feedback to date.

What is left todo:
1) getSeriesSummary, etc. - make a core graphs library
2) 11 TODO statements in the code, some will be resolved by #1
3) a way to enter data in the UI instead of hardcoding in the URL
4) add support/test the URL options for e10s/pgo, as well as 'show only failures'
and a few more changes thanks to the testing of :avih!  also url param support for hideMinorChanges (only shows highlighted improvements/regressions):
http://localhost:8000/app/compareperf.html#/compare?originalProject=try&originalRevision=76b999af7321&newProject=try&newRevision=59b75c0e8e97&hideMinorChanges=1

I feel as though I am hacking the angular<->html interface a bit- maybe when changes are done we could find a cleaner method for it all!
Comment on attachment 8592238 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/465

updated with all feedback except for hardcoding the headers into the html and using ng-repeat for printing the data.
Attachment #8592238 - Flags: feedback?(wlachance) → review?(wlachance)
I've just tried comparing a grunt build with/without a s/compareperf2.min.js/compareperf.min.js/ and they are identical.

Will, what issue did you have without the suffix?

Also we'll need to update the apache configs, both in repo:
https://github.com/mozilla/treeherder/blob/0492e2d8e37b8b25c1e510d213c5bc20badf7fe6/puppet/files/apache/treeherder-service.conf#L20

...and also on dev/stage/prod. 

(And yeah our apache configs suck.. we should hardcode the /api/ and /embed/ gunicorn endpoints and let the rest fall through to apache, not vice versa)
(In reply to Ed Morley [:emorley] from comment #19)
> I've just tried comparing a grunt build with/without a
> s/compareperf2.min.js/compareperf.min.js/ and they are identical.
> 
> Will, what issue did you have without the suffix?

Ah I see the filename added to the html file was wrong. The reason is that the "replacement" field under the gruntfile 'cache-busting' section is a string/partial-string for search and replace, not a complete filename.

The solution is to list compareperfjs above perfjs, so it hits the specific case first. Or alternatively update to grunt-cache-busting 0.0.11 (pinned to 0.0.10 in package.json at present), to pick up regex support so we can use start of line anchors (https://github.com/PaulTondeur/grunt-cache-busting/pull/8).
(In reply to Ed Morley [:emorley] from comment #20)
> (In reply to Ed Morley [:emorley] from comment #19)
> > I've just tried comparing a grunt build with/without a
> > s/compareperf2.min.js/compareperf.min.js/ and they are identical.
> > 
> > Will, what issue did you have without the suffix?
> 
> Ah I see the filename added to the html file was wrong. The reason is that
> the "replacement" field under the gruntfile 'cache-busting' section is a
> string/partial-string for search and replace, not a complete filename.
> 
> The solution is to list compareperfjs above perfjs, so it hits the specific
> case first. Or alternatively update to grunt-cache-busting 0.0.11 (pinned to
> 0.0.10 in package.json at present), to pick up regex support so we can use
> start of line anchors
> (https://github.com/PaulTondeur/grunt-cache-busting/pull/8).

Ok, that's ridiculous. :)

After thinking about this over the weekend, I think I'm just going to merge everything into one file (called perf.html) this week, which will just magically fix things.

Hopefully we won't be using grunt at all in the not-too-distant future.
Comment on attachment 8592238 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/465

We merged this in.
Attachment #8592238 - Flags: review?(wlachance) → review+
address remaining review issue from last week, added extra cleanups as well.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8594913 - Flags: review?(wlachance)
updated with a real PR :)
Attachment #8594913 - Attachment is obsolete: true
Attachment #8594913 - Flags: review?(wlachance)
Attachment #8594918 - Flags: review?(wlachance)
Comment on attachment 8594918 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/478

Looks good! Just some minor comments. Please fix those up, then we can get this landed.
Attachment #8594918 - Flags: review?(wlachance) → review+
This is fairly usable now. Give it a try at:

https://treeherder.allizom.org/perf.html#/comparechooser

There are a few more minor cleanups/nits that I'd like to address tomorrow, but I'm pretty close to being ready to close this out (and file followup fixes/enhancements as extras).
Well, the sub-tests view is quite essential here, so I don't know if it should be considered extra/followup.

Regardless though, the base view looks good. There's still an issue where the change highlight (red/green) is dumb and doesn't take the noise level into account, but these are known issues which we still want to improve.
support for subtest viewing, keep in mind this is on my 'subtest' branch if you pull from my repo to test it locally.
Attachment #8596822 - Flags: feedback?(wlachance)
Comment on attachment 8596822 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/485

Looking really good. Let's fix the low-hanging fruit and get this landed!
Attachment #8596822 - Flags: feedback?(wlachance) → feedback+
Comment on attachment 8596822 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/485

PR is updated with all feedback from both cycles!
Attachment #8596822 - Flags: review?(wlachance)
Comment on attachment 8596822 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/485

Looks good, I cleaned up a few minor details (link to local copy when running for testing, some minor formatting issues in the html) and pushed. Some stuff still to work on (possibly in this bug, possibly in a followup):

1. We need some kind of "loading" indicator, as we do in the main compare view. Sometimes subtests take a while to load.
2. Showing +/-NaN (NaN%) is pretty ugly for stddev when there's no retriggers. We should probably just print something like "N/A"
3. The graphs view should link to this instead of attempting to show its own subtest summary. This is better.
4. We probably want to allow highlighting multiple revisions in the graphs view.
5. The approach here is really **** the treeherder server. We should be able both to make things more performant and avoid DDOS'ing our running instance by: (1) batching queries to get series signature information, (2) requesting smaller series intervals for recent revisions
6. PGO/e10s still not exposed in the UI as options.

That's all I can think of for now, but there's probably more...
Attachment #8596822 - Flags: review?(wlachance) → review+
Looks great!

(In reply to William Lachance (:wlach) from comment #35)
> 4. We probably want to allow highlighting multiple revisions in the graphs
> view.

Not sure I understand. Do you mean in order to compare more than 2 revisions?

> 5. The approach here is really hard on the treeherder server. We should be
> able both to make things more performant and avoid DDOS'ing our running
> instance by: (1) batching queries to get series signature information, (2)
> requesting smaller series intervals for recent revisions

What does (2) mean? would it affect the displayed values?


Few more things we mentioned on IRC, so adding here:

1. Title for the subtests page (HTML and/or in-page).

2. Link back from the subtests page to the overview page.

3. Take stddev into account when highlighting green/red (currently it highlights based on the average diff only: >2%).

4. The bar view is actually quite useful (and still missing). It makes the bigger changes "pop", vs right now one has to look at all the % changes in order to identify which are the biggest ones.
(In reply to Avi Halachmi (:avih) from comment #36)
> Looks great!
> 
> (In reply to William Lachance (:wlach) from comment #35)
> > 4. We probably want to allow highlighting multiple revisions in the graphs
> > view.
> 
> Not sure I understand. Do you mean in order to compare more than 2 revisions?
> 
> > 5. The approach here is really **** the treeherder server. We should be
> > able both to make things more performant and avoid DDOS'ing our running
> > instance by: (1) batching queries to get series signature information, (2)
> > requesting smaller series intervals for recent revisions
> 
> What does (2) mean? would it affect the displayed values?

No, it just means only requesting the data we actually need. Right now we pull down 90 days of history to get the datapoint information. We don't need nearly that much.

> Few more things we mentioned on IRC, so adding here:
> ...

Thanks for noting those, hopefully we can get to them next week.
Oops, forgot to reply to one point.

(In reply to William Lachance (:wlach) from comment #37)
> > > 4. We probably want to allow highlighting multiple revisions in the graphs
> > > view.
> > 
> > Not sure I understand. Do you mean in order to compare more than 2 revisions?

Yes.
(In reply to William Lachance (:wlach) from comment #38)
> Oops, forgot to reply to one point.
> 
> (In reply to William Lachance (:wlach) from comment #37)
> > > > 4. We probably want to allow highlighting multiple revisions in the graphs
> > > > view.
> > > 
> > > Not sure I understand. Do you mean in order to compare more than 2 revisions?
> 
> Yes.

He fixed it on IRC. He means "no". As in, add the option to select two revisions on the graph in order to compare them. There are no plans to support three or more points to compare.
moar work
Attachment #8598218 - Flags: review?(wlachance)
remaining work from items from previous comments:
1. We need some kind of "loading" indicator, as we do in the main compare view. Sometimes subtests take a while to load.
2. Showing +/-NaN (NaN%) is pretty ugly for stddev when there's no retriggers. We should probably just print something like "N/A"
3. The graphs view should link to this instead of attempting to show its own subtest summary. This is better.
4. We probably want to allow highlighting multiple revisions in the graphs view.
5. PGO/e10s still not exposed in the UI as options.
6. Title for the subtests page (HTML and/or in-page).
7. Link back from the subtests page to the overview page.
8. Take stddev into account when highlighting green/red (currently it highlights based on the average diff only: >2%).

items 1,2,6,7,8 should be easy to tackle, I also want to tackle #5, lets see how far I get with addressing reviews and hacking on these items.  I think we can close this bug this week.
Comment on attachment 8598218 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/490

Some minor requests for modification, but on the whole this looks good! Would be happy to see this go in with the requested changes.
Attachment #8598218 - Flags: review?(wlachance) → review+
Bug: If data cannot be fetched, the subtests page should let the user know.

If the revision doesn't exist at the tree (e.g. it's on try but the UI was left at m-c), or if the URL just ends up with an invalid tree id (e.g. 'Try' instead of 'try'), and probably (untested) if the revision is incorrect somehow, then the sub-tests page just animates the cat forever.

It should instead display a message such as:
- Unknown tree: 'Try'
or
- Revision 'ABCDE12...' doesn't exist on tree 'mozilla-central'
or (maybe)
- 'ABCZZZ' is not a valid revision format

etc.
Commits pushed to master at https://github.com/mozilla/treeherder-ui

https://github.com/mozilla/treeherder-ui/commit/571a8c17b67e53aad02a9d958250f9c4b41bbff9
Bug 1142680 - Write a compare-talos equivalent - items 1,2,7,8

https://github.com/mozilla/treeherder-ui/commit/87c0d6ca09373f30fa7134ec2aa46c08e20bcdcf
Merge pull request #495 from jmaher/nits

Bug 1142680 - Write a compare-talos equivalent - items 1,2,7,8
what is left:
3. The graphs view should link to this instead of attempting to show its own subtest summary. This is better.
4. We probably want to allow highlighting multiple revisions in the graphs view.
5. PGO/e10s still not exposed in the UI as options.
6. Title for the subtests page (HTML and/or in-page).
9. Error handling with nice message if revision/branch are incorrect or missing

I think we could realistically do:
* remove the graph server subtest summary- have a link to the subtest page
* add pgo checkbox in the UI/urlparam to show/hide pgo related text
* add a nice title to the page, we can make this more informative
* do a first round of error handling

That would leave:
* e10s
* error handling polish
* additional polish
* additional features

please do weigh in on any thoughts you might have
(In reply to Joel Maher (:jmaher) from comment #46)
> * add pgo checkbox in the UI/urlparam to show/hide pgo related text

If I understand the task correctly*, then I don't think this is important.

* As far as I understand, by default it does show both PGO (if exists) and non-PFO (if exists) and separates them into two "test sections", right? if yes, then I think it's better to always shows everything, and whoever views it can tell per improvement/regression if it's PGO or non-PGO by the section title.

If it turns out to be too cluttered on some cases, then we can rethink this, but for now, I think it's not required.
yes, pgo is ignored for now, but it could be added and would be a different test.  Right now it isn't mixed in and won't be mixed in.  

As it stands there is no way to have PGO, it is simple to add it in and for this round, I will just add it in if it is there.

One other thought- 
if a test has no data at all, should we display the empty header?  I recall wlach suggesting we squash it completely, that can be done without too much trouble and would help ignore duplicating tests/etc when we display pgo.
So the task is not to allow hiding of PGO data, but rather to:
1. show also pgo data
2. Allow to disable showing PGO data.

is this correct?

If yes, 1 is very important, and 2 probably isn't too important. I thought the task was only 2.

(In reply to Joel Maher (:jmaher) from comment #48)
> One other thought- 
> if a test has no data at all, should we display the empty header?  I recall
> wlach suggesting we squash it completely, that can be done without too much
> trouble and would help ignore duplicating tests/etc when we display pgo.

I'm not sure I understand the options. I understand squashing them completely, but what other option exist (that's not a rhetoric question)? How can you know that there are some tests for which you don't have data? What about android-only tests? etc.
(In reply to Avi Halachmi (:avih) from comment #36)
> 3. Take stddev into account when highlighting green/red (currently it
> highlights based on the average diff only: >2%).

So, Joel has since changed it to some formula which I'm not able to understand because I don't know what its inputs are and the inputs don't look symmetric to me, which I find suspicious (i.e. I don't think it should matter what's considered old/new revision in order to decide if the diff between them is meaningful enough to highlight): https://github.com/jmaher/treeherder-ui/blob/nits/webapp/app/js/perf.js#L174


Kyle, let's say we have:
- N retriggers for test T at revision R1
- M retriggers for the same test T at revision R2

What would you suggest to use in order take these N values and M values and decide if the difference between them is meaningful (in order to highlight the diff in red/green when it's meaningful)?
Flags: needinfo?(klahnakoski)
items 5,6,9.  I believe wlach has solved item 3 (subtest view from graphs to use compare mode).  what is left from the original list:
4. We probably want to allow highlighting multiple revisions in the graphs view.

In addition:
10. ensure the highlighting mode (red/green/orange) has a useful and meaningful algorithm.  this was updated recently to copy compare-talos verbatim, now it could use a bit more scrutiny and thought before calling it final.

11. the UI for comparechooser should have better validate of branches/revisions
12. error messages when parsing a URL should direct to comparechooser if possible, or at least make something more actionable for the consumer of the error.
13. add a link for graphs on the summary (right now it is just a link to 'details' - the subtest compare view)

All of these items I would consider a P2, if we agree, then I will file bugs and we can prioritize, etc.
Attachment #8600277 - Flags: review?(wlachance)
Comment on attachment 8600277 [details] [review]
https://github.com/mozilla/treeherder-ui/pull/500

Some minor change requests. Feel free to merge after those are addressed.
Attachment #8600277 - Flags: review?(wlachance) → review+
Commits pushed to master at https://github.com/mozilla/treeherder-ui

https://github.com/mozilla/treeherder-ui/commit/afec43d0d7bb16a153389fad75b24b258b5c89e1
Bug 1142680 - Write a compare-talos equivalent, items 5,6,9

https://github.com/mozilla/treeherder-ui/commit/338ae0b90dedd089d46b0f5f391415474ae13a53
Merge pull request #500 from jmaher/nits

Bug 1142680 - Write a compare-talos equivalent, items 5,6,9
Kyle, this is tracked in bug 1160613 now.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(klahnakoski)
Resolution: --- → FIXED
Depends on: 1160630
Commit pushed to master at https://github.com/mozilla/treeherder-ui

https://github.com/mozilla/treeherder-ui/commit/dbe3801090820519db60c4ff9ac2327be5d78e50
Bug 1142680 - Remove code to create inline subtest summary in graphs view

We now link to a compare view externally, which does the same thing -- the
display template was already gone, but I forgot to remove the supporting
code.
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/fd0f772aa63d9f7c6323bdb82e116365ed6bc2f9
Bug 1142680 - initial work to get a compare-talos equivalent stood up

https://github.com/mozilla/treeherder/commit/7b37c78f1dde712ed696a40e6e6182adb65bc493
Bug 1142680 - Previous commit was broken, fix

https://github.com/mozilla/treeherder/commit/cece16c32717699b1d0ba6d50556d526f9cf682a
Bug 1142680 - Fix grunt build for compare perf

https://github.com/mozilla/treeherder/commit/1338ac79bb76252d2b0d34ff5420f3122843c7e5
Bug 1142680 - Add a user interface for choosing compare data

https://github.com/mozilla/treeherder/commit/1edde1d98cdc4a06fb4bea9c155c153ac3e34379
Bug 1142680 - Missed one file

https://github.com/mozilla/treeherder/commit/a0c3f2197c05360cd90be013472fb1b5e7496811
Bug 1142680 - Merge all perf stuff (compare + graphs) into one app

https://github.com/mozilla/treeherder/commit/8f8cf3f09a8e3be931dfbfd2cce5df17b10c3fa6
Bug 1142680 - Clean up compareperf controllers a bit

https://github.com/mozilla/treeherder/commit/ec09ee0a250428eba86e1691c936532f61d7e3e0
Bug 1142680 - Add support for comparing subtest details

https://github.com/mozilla/treeherder/commit/f1172b30c6bac20c8fdb5734492f1c006bd245f7
Bug 1142680 - Optimizations and bar graph for compare talos

https://github.com/mozilla/treeherder/commit/28a8cef6c42edf1343789d01649755bb3b0dcd72
Bug 1142680 - Write a compare-talos equivalent - items 1,2,7,8

https://github.com/mozilla/treeherder/commit/5ded3350ad4d3b280bcf19e89da28140ca82c39c
Merge pull request #495 from jmaher/nits

Bug 1142680 - Write a compare-talos equivalent - items 1,2,7,8

https://github.com/mozilla/treeherder/commit/5294a6647fc3b81e4801c1850fb4cdd61698c9f5
Bug 1142680 - Write a compare-talos equivalent, items 5,6,9

https://github.com/mozilla/treeherder/commit/ed4f759ea7c4da535c308e705b1442aa2e9efcd3
Merge pull request #500 from jmaher/nits

Bug 1142680 - Write a compare-talos equivalent, items 5,6,9

https://github.com/mozilla/treeherder/commit/442570ba6bb541cd9ab4d7252013f38666841b5e
Bug 1142680 - Integrate graphs with compare subtests view

https://github.com/mozilla/treeherder/commit/306ad61be3113cc053f5f4ebeca730f33ee5b3fd
Bug 1142680 - Remove code to create inline subtest summary in graphs view

We now link to a compare view externally, which does the same thing -- the
display template was already gone, but I forgot to remove the supporting
code.
Depends on: 1168614
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: