Open Bug 1481821 Opened Last year Updated 4 days ago

perfherder compare is very limited when developers are investigating a regression or making their test relevant or stable

Categories

(Tree Management :: Perfherder, task, P3)

Tracking

(Not tracked)

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: student-project)

Our perfherder compare view shows a lot of related data, which is useful in many cases, but not all of them.  When a regression shows up or a test is noisy, a test owner will often need to look at the data in more detail, sometimes down to the raw replicates.

One idea from the devtools team is that we look at different calculations on the data (not always the median value).  There are probably dozens of ways to look at the data that would help in understanding what the test is doing.
:ochameau can you provide a link/screenshot of some of your tools for analyzing the subtests/replicates of the damp test?
Flags: needinfo?(poirot.alex)
Here is a link:
  https://firefox-dev.tools/performance-dashboard/inspect/index.html?base=7337cfb80e8b285aab95789b04daab6825d765f3&new=03e1f8759a9d7cfe3de45a450d4513917ac866db&platform=linux64-opt
(Note that it may break at any time, as it's a work in progress that I may easily change)
Here is a screenshot in case the link breaks:
  https://screenshots.firefox.com/UP2OGzt5LlpkRHhT/firefox-dev.tools

My goal here was to tweak both the dashboard *and* the test harness/scripts so that the dashboard reports a "computed" difference close to 0% for all subtests, and so that no warnings fire, when comparing two distinct try pushes against the same m-c changeset. Another test was to see the dashboard report a 1% difference when introducing a fake 1% regression.
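
As a rough illustration of that sanity check (not the dashboard's actual code), the sketch below compares the medians of two sets of replicates. The function name `percent_change` and the replicate values are hypothetical, and using the median as the summary is an assumption made only for this example.

```python
from statistics import median


def percent_change(base, new):
    """Percent change of the median of `new` relative to the median of `base`."""
    base_median = median(base)
    return (median(new) - base_median) / base_median * 100.0


# Hypothetical replicates (in ms) standing in for two try pushes
# made against the same m-c changeset.
base = [100.0, 102.0, 98.0, 101.0, 99.0]
same = list(base)

# Fake regression: every replicate is made 1% slower.
regressed = [value * 1.01 for value in base]

print(f"same changeset:  {percent_change(base, same):+.2f}%")
print(f"fake regression: {percent_change(base, regressed):+.2f}%")
```

With real data the two pushes would not produce identical replicates, so the first number would be close to 0% rather than exactly 0%.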

In this prototype I experiment with:
* multiple data sets: with/without replicates, and also with some filtering of data points outside of q1<=>q3.
* multiple maths: mean, median, confidence interval
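
A minimal sketch of what those combinations could look like, assuming Python's standard `statistics` module; this is not the prototype's code. The helper names `filter_to_iqr` and `summarize`, the sample replicates, and the normal-approximation confidence interval (z = 1.96) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def filter_to_iqr(values):
    """Keep only the data points between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return [v for v in values if q1 <= v <= q3]


def summarize(values, z=1.96):
    """Mean, median, and an approximate 95% confidence interval of the mean.

    The interval uses a normal approximation, which is an assumption;
    small replicate counts may call for a t-distribution instead.
    """
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return {
        "mean": m,
        "median": median(values),
        "ci": (m - half_width, m + half_width),
    }


# Hypothetical replicates for one subtest; 250.0 plays the role of an outlier.
replicates = [100.0, 101.0, 99.0, 102.0, 98.0, 100.0, 250.0, 101.0]

raw = summarize(replicates)
filtered = summarize(filter_to_iqr(replicates))

print("raw:     ", raw)
print("filtered:", filtered)
```

In this made-up data set the single outlier pulls the raw mean well above the median, while the filtered mean and median agree, which is the kind of difference the various data sets and maths are meant to expose.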

The first conclusion of this experiment was that boxplots looked more obvious to everyone I demoed this tool to.
i.e. it is easier to draw a conclusion about a subtest by comparing the two boxplots than by reading statistical numbers.
Flags: needinfo?(poirot.alex)
In retrospect I feel like the compare view is the weakest part of perfherder, but it's also one of the hardest things to get right. I think one thing I would definitely change would be to focus more on displaying the *distribution* of results, and less on derived measures like means/medians/standard deviations/confidence intervals (which can be pretty misleading and hard for most people to interpret correctly).

To that end, I quite like the box plot / distribution views in that screenshot.
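
For reference, the numbers behind a box plot are cheap to compute. The sketch below is only an illustration under stated assumptions: `five_number_summary` and `boxes_overlap` are hypothetical helpers, the replicate values are made up, and none of this is perfherder code.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the values a box plot draws."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def boxes_overlap(a, b):
    """True when the interquartile boxes of two summaries overlap."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]


# Hypothetical replicates for the base and new revisions of one subtest.
base = [98.0, 99.0, 100.0, 100.0, 101.0, 101.0, 102.0]
new = [108.0, 109.0, 110.0, 110.0, 111.0, 111.0, 112.0]

base_box = five_number_summary(base)
new_box = five_number_summary(new)

print("base:", base_box)
print("new: ", new_box)
print("boxes overlap:", boxes_overlap(base_box, new_box))
```

Two boxes that do not overlap at all are the case that is easy to spot visually; the overlap check here is a crude stand-in for that visual comparison, not a statistical test.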

Can I work on this bug? How do I get started?

Flags: needinfo?(jmaher)

Hi saijatin28, thanks for your interest in working on this.

The first thing is to set up a development environment for perfherder:
https://github.com/mozilla/treeherder/

I feel that this work item is slightly ambiguous and maybe not the best one to tackle as one of your first couple of bugs related to perfherder. I do think it is a great project to work on, as it can shed light on many things and will be very useful to implement.

:igoldan, can you help add more detail here on how specifically we would implement this, what data we would use, where it would go in the current UI, and what controls might be needed? Also, is there related planned work that could cause bitrot or conflicts with the compare view?

Flags: needinfo?(jmaher) → needinfo?(igoldan)
Type: defect → task
Flags: needinfo?(igoldan)

We weren't able to get to this in Q3, and have already planned Q4 work. Removing this from blocking the Q3/2019 meta bug so we can consider this for 2020.

No longer blocks: 1568462