perfherder compare is very limited when developers are investigating a regression or making their test relevant or stable
Categories
(Tree Management :: Perfherder, task, P3)
Tracking
(Not tracked)
People
(Reporter: jmaher, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: student-project)
Attachments
(1 file)
211.57 KB, image/png
Our perfherder compare view shows a lot of related data, which is useful in many cases, but not all of them. When a regression shows up or a test is noisy, a test owner will often need to look at the data in more detail, sometimes at the raw replicates. One idea from the devtools team is that we look at different calculations on the data (not always the median value). There are probably dozens of ways to look at the data that would help us understand what a test is doing.
Reporter
Comment 1•6 years ago
:ochameau can you provide a link/screenshot of some of your tools for analyzing the subtests/replicates of the damp test?
Comment 2•6 years ago
Here is a link: https://firefox-dev.tools/performance-dashboard/inspect/index.html?base=7337cfb80e8b285aab95789b04daab6825d765f3&new=03e1f8759a9d7cfe3de45a450d4513917ac866db&platform=linux64-opt (note that it may break at any time, as it's a work in progress that I may easily change). Here is a screenshot in case it breaks: https://screenshots.firefox.com/UP2OGzt5LlpkRHhT/firefox-dev.tools

My goal here was to tweak both the dashboard *and* the test harness/scripts so that the dashboard reports a "computed" difference close to 0% for all subtests, and so that all warnings are off when comparing two distinct try pushes against the same m-c changeset. Another test was to see the dashboard report a 1% difference when introducing a fake 1% regression.

In this prototype I experiment with:
* multiple data sets: with/without replicates, and also with the same filtering of data points outside the q1–q3 range
* multiple statistics: mean, median, confidence interval

The first conclusion of this experiment was that boxplots looked more obvious to everyone I demoed this tool to, i.e. it is easier to draw a conclusion about a subtest by comparing the two boxplots than by reading statistical numbers.
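The calculations described in this comment could be sketched as follows. This is a minimal illustration, not the dashboard's actual code; the function names (`iqr_filter`, `summarize`, `compare`) are hypothetical, and the 1.96 z-value assumes a normal-approximation 95% confidence interval.

```python
# Hypothetical sketch of the experiment above: compute mean, median, and a
# confidence interval over a set of replicates, with and without filtering
# out data points outside the q1..q3 range. Names are illustrative only.
import statistics


def iqr_filter(values):
    """Keep only replicates between the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return [v for v in values if q1 <= v <= q3]


def summarize(values, confidence_z=1.96):
    """Mean, median, and a normal-approximation confidence interval."""
    mean = statistics.mean(values)
    half_width = confidence_z * statistics.stdev(values) / len(values) ** 0.5
    return {
        "mean": mean,
        "median": statistics.median(values),
        "ci": (mean - half_width, mean + half_width),
    }


def compare(base, new):
    """Percent difference of medians, on raw and on q1..q3-filtered data."""
    raw = (statistics.median(new) - statistics.median(base)) \
        / statistics.median(base) * 100
    fb, fn = iqr_filter(base), iqr_filter(new)
    filtered = (statistics.median(fn) - statistics.median(fb)) \
        / statistics.median(fb) * 100
    return raw, filtered
```

With a fake 1% regression (e.g. replicates around 100 vs. around 101), both the raw and the filtered percent differences should land near 1%, which matches the sanity check described above.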
Comment 3•6 years ago
In retrospect I feel like the compare view is the weakest part of perfherder, but it's also one of the hardest things to get right. I think one thing I would definitely change would be to focus more on displaying the *distribution* of results, and less on derived measures like means/medians/standard deviations/confidence intervals (which can be pretty misleading and hard for most people to interpret correctly). To that end, I quite like the box plot / distribution views in that screenshot.
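To make the distribution-first idea concrete: a box plot is just the five-number summary of the replicates, and two pushes can be compared visually by checking whether the boxes (q1–q3 ranges) overlap. The sketch below is illustrative only; the names `five_number_summary` and `boxes_overlap` are hypothetical, not Perfherder's API.

```python
# Minimal sketch of the data behind the box-plot comparison discussed above.
import statistics


def five_number_summary(values):
    """min, q1, median, q3, max -- the values a box plot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": med, "q3": q3,
            "max": max(values)}


def boxes_overlap(base, new):
    """Rough visual test: do the two boxes (q1..q3 ranges) overlap?

    Clearly separated boxes are the 'easy to conclude' case; overlapping
    boxes suggest the difference may be within the noise.
    """
    b, n = five_number_summary(base), five_number_summary(new)
    return b["q1"] <= n["q3"] and n["q1"] <= b["q3"]
```

This captures why the box-plot view can be easier to interpret than a table of derived statistics: separation or overlap of the boxes is visible at a glance.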
Updated•5 years ago
Reporter
Comment 5•5 years ago
Hi saijatin28, thanks for your interest in working on this.
The first thing is to set up a development environment for perfherder:
https://github.com/mozilla/treeherder/
I feel that this work item is slightly ambiguous and maybe not the best one to tackle as one of your first couple of bugs related to perfherder. I do think it is a great project to work on, as it can shed light on many things and will be very useful to implement.
:igoldan, can you help add more detail here about how specifically we would implement this, what data we would use, where it would go in the current UI, and what controls might be needed? Also, please note any related planned work that could cause bitrot or conflicts with the compare view.
Updated•5 years ago
Comment 6•5 years ago
We weren't able to get to this in Q3, and have already planned Q4 work. Removing this from blocking the Q3/2019 meta bug so we can consider this for 2020.
Comment 7•8 months ago
PerfCompare will replace Perfherder's compare view, and currently shows the distribution of the results. Further improvements should be filed against Testing::PerfCompare.
Updated•8 months ago