Bug 1481821 (Open) · Opened last year · Updated 4 days ago
Perfherder compare view is very limited when developers are investigating a regression or trying to make their test relevant or stable
Our Perfherder compare view shows a lot of related data, which is useful in many cases, but not all of them. When a regression shows up or a test is noisy, a test owner will often need to look at the data in more detail, sometimes down to the raw replicates. One idea from the DevTools team is to look at different calculations on the data (not always the median value). There are probably dozens of ways to look at the data that would help us understand what a test is doing.
:ochameau can you provide a link/screenshot of some of your tools for analyzing the subtests/replicates of the damp test?
Here is a link: https://firefox-dev.tools/performance-dashboard/inspect/index.html?base=7337cfb80e8b285aab95789b04daab6825d765f3&new=03e1f8759a9d7cfe3de45a450d4513917ac866db&platform=linux64-opt

(Note that it may break at any time, as it's a work in progress that I may easily change.) Here is a screenshot in case it breaks: https://screenshots.firefox.com/UP2OGzt5LlpkRHhT/firefox-dev.tools

My goal here was to tweak both the dashboard *and* the test harness/scripts so that the dashboard reports a "computed" difference close to 0% for all subtests, and so that all the warnings stay off when comparing two distinct try pushes against the same m-c changeset. Another test was to see the dashboard report a 1% difference when introducing a fake 1% regression.

In this prototype I experiment with:
* multiple data sets: with/without replicates, and also with some filtering of data points outside the q1 <=> q3 range
* multiple maths: mean, median, confidence interval

The first conclusion of this experiment was that boxplots were more obvious to everyone I demoed this tool to, i.e. it is easier to draw a conclusion on a subtest by comparing the two boxplots than by reading statistical numbers.
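For reference, here is a minimal Python sketch of the two ingredients mentioned above: filtering replicates to the q1..q3 range, and computing mean/median plus a confidence interval. The function names are mine, and the normal-approximation 95% interval is one common, simple choice; the dashboard's actual formulas may differ:

```python
import statistics
from math import sqrt

def iqr_filter(values):
    """Keep only data points between the first and third quartiles (q1..q3),
    in the spirit of the filtering experiment described above."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4)
    return [v for v in values if q1 <= v <= q3]

def summarize(values):
    """Return the mean, the median, and a normal-approximation 95% confidence
    interval for the mean (an assumption here, not Perfherder's exact math)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    half_width = 1.96 * statistics.stdev(values) / sqrt(len(values))
    return mean, median, (mean - half_width, mean + half_width)

# Example: one outlier replicate (50) gets dropped by the q1..q3 filter.
replicates = [10, 12, 11, 13, 50, 9, 11, 12]
filtered = iqr_filter(replicates)
mean, median, ci = summarize(filtered)
```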
In retrospect I feel like the compare view is the weakest part of perfherder, but it's also one of the hardest things to get right. I think one thing I would definitely change would be to focus more on displaying the *distribution* of results, and less on derived measures like means/medians/standard deviations/confidence intervals (which can be pretty misleading and hard for most people to interpret correctly). To that end, I quite like the box plot / distribution views in that screenshot.
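To make the "show the distribution, not just derived measures" idea concrete, here is a small sketch of the numbers a box plot is built from, plus a rough box-overlap check. Both helpers and the overlap heuristic are hypothetical illustrations, not anything Perfherder implements (and whisker conventions vary; min/max is used here for simplicity):

```python
import statistics

def five_number_summary(values):
    """The numbers behind a basic box plot: min, q1, median, q3, max."""
    q1, med, q3 = statistics.quantiles(sorted(values), n=4)
    return min(values), q1, med, q3, max(values)

def boxes_overlap(base, new):
    """Rough heuristic: do the interquartile boxes of two pushes overlap?
    Non-overlapping boxes are an easy visual cue for a real shift."""
    _, b_q1, _, b_q3, _ = five_number_summary(base)
    _, n_q1, _, n_q3, _ = five_number_summary(new)
    return b_q1 <= n_q3 and n_q1 <= b_q3

# Example: a clearly shifted "new" push produces a non-overlapping box.
base = [10, 11, 12, 11, 10, 12]
new = [20, 21, 22, 21, 20, 22]
```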
Flags: needinfo?(jmaher) → needinfo?(igoldan)
Type: defect → task