Open Bug 1397745 Opened 7 years ago Updated 2 years ago

DAMP doesn't report significant regressions/improvements because it only cares about summary results

Categories

(DevTools :: General, defect, P3)


Tracking

(firefox57 fix-optional)


People

(Reporter: ochameau, Unassigned)

References

(Blocks 2 open bugs)

Details

And that is *really* bad: it unfortunately makes the whole thing useless.

Let's look at bug 1396619 as an example.

Thanks to additional fixes to DAMP from bug 1394804, I get these DAMP results:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=527341ed12c473fc7b83f4a37fb277ea321de9d3&newProject=try&newRevision=94783d31562766bec9dda8710aaea2b4c79babee&framework=1&showOnlyImportant=0
  linux64   196.33 ± 1.17% 	< 	197.23 ± 0.80% 	0.46% 		0.79 (low)

=> It reports no win at all.

But it clearly hides a significant one, which deeply impacts the inspector's usefulness.
And it is unfortunate because you can clearly see it when looking at the subtests:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=527341ed12c473fc7b83f4a37fb277ea321de9d3&newProject=try&newRevision=94783d31562766bec9dda8710aaea2b4c79babee&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=inspector.&framework=1

  simple.inspector.open.DAMP 920.64 ± 1.21% > 874.37 ± 1.10% 	-5.03%	7.71 (high)

I looked at all bugs from bug 1373320 and I only see reports about DAMP summary regressions, so please correct me if we also have alerts on subtests.


Now, I know it isn't an easy story. I imagine we don't track each subtest for various reasons, but I would like to hear all of them to see how to move forward from here.
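To make the dilution concrete, here is a back-of-the-envelope sketch (not DAMP's actual code; the subtest count and baseline values are invented for illustration) showing how a summary computed as a geometric mean over many subtests absorbs a 5% win on a single subtest:

```python
# Sketch: a geometric-mean summary over many subtests dilutes a large
# win on a single subtest. Subtest count and values are hypothetical.
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical baseline: 40 subtests, all around 900 ms.
baseline = [900.0] * 40
summary_before = geometric_mean(baseline)

# Improve one subtest by 5% (like simple.inspector.open.DAMP above).
improved = baseline.copy()
improved[0] *= 0.95

summary_after = geometric_mean(improved)
change_pct = (summary_after - summary_before) / summary_before * 100
print(f"{change_pct:.2f}%")  # roughly -0.13%
```

With ~40 subtests, a 5% win on one of them moves the summary by only about 0.13%, well below the ~1% noise visible in the compare view above.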
Blocks: damp
Brian, your contribution on this subject would be highly welcome :)
Flags: needinfo?(bgrinstead)
(In reply to Alexandre Poirot [:ochameau] from comment #0)
> And that is *really* bad: it unfortunately makes the whole thing useless.
> 
> Let's look at bug 1396619 as an example.
> 
> Thanks to additional fixes to DAMP from bug 1394804, I get these DAMP results:
> 
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=527341ed12c473fc7b83f4a37fb277ea321de9d3&newProject=try&newRevision=94783d31562766bec9dda8710aaea2b4c79babee&framework=1&showOnlyImportant=0
>   linux64   196.33 ± 1.17% 	< 	197.23 ± 0.80% 	0.46% 		0.79 (low)
> 
> => It reports no win at all.
> 
> But it clearly hides a significant one, which deeply impacts the inspector's usefulness.
> And it is unfortunate because you can clearly see it when looking at the subtests:
> 
> https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=527341ed12c473fc7b83f4a37fb277ea321de9d3&newProject=try&newRevision=94783d31562766bec9dda8710aaea2b4c79babee&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=inspector.&framework=1
> 
>   simple.inspector.open.DAMP 920.64 ± 1.21% > 874.37 ± 1.10% 	-5.03%	7.71 (high)

I'm not surprised that a win on a single subtest doesn't move the summary much. That doesn't make wins on individual tools useless!

> I looked at all bugs from bug 1373320 and I only see reports about DAMP summary regressions, so please correct me if we also have alerts on subtests.
>
>
> Now, I know it isn't an easy story. I imagine we don't track each subtest for various reasons, but I would like to hear all of them to see how to move forward from here.

AIUI we just don't do reporting on Talos subtests. I have asked about this before and AIUI it would end up taking too much time to sort through the reports and track down regression windows, and also it's not clear what criteria there would be for backouts. For instance, if you regress page reload time with the debugger open should the patch be backed out (and at what percent regression)?

That said, I'd be interested in at least having a way to be alerted of individual metric changes, even if it isn't tied into the sheriffing process.

Joel, anything to add?
Flags: needinfo?(bgrinstead) → needinfo?(jmaher)
Correct, we do not report on subtests: they typically fluctuate more often, and the volume of alerts would be insane. In fact, almost 90% of the reported alerts point at an incorrect revision. This is due to merging between branches and the frequency at which we run tests (sometimes reduced on purpose for load reasons, sometimes by accident when builds or test jobs break, and sometimes just off by one because of the algorithm). Given the work needed to ensure a regression is sustained and to point at the right changeset, while there is room for more alerts, there is not room for 20x the volume with our current one-person sheriff team.

Generating alerts for others to use is an option. The problem I see with that is who is going to take action on them, given that they are guaranteed to be incorrect. We could generate alerts for subtests as requested, or possibly for specific subtests in a given suite or two.

Looking at a few subtests:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=mozilla-inbound,1501099,1,1&series=mozilla-inbound,1501097,1,1&series=mozilla-inbound,1501098,1,1&series=mozilla-inbound,1501068,1,1

We would see at least 4 extra alerts just for those 3 subtests in the last 30 days, and probably 8 once the data merges from one branch to another, plus PGO, plus other platforms. We typically end up running PGO on a different schedule than opt, which adds to the confusion.

Would having tools to analyze subtests when you care to look at them be useful? I guess this would be similar to having a page of alerts.
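To illustrate why noisy subtests would generate so many spurious alerts, here is a toy changepoint check (this is not Perfherder's actual analysis; the window size, threshold, and sample series are all invented):

```python
# Toy detector: flag an alert when the mean of a trailing window shifts
# by more than `threshold` standard deviations relative to the noise of
# the preceding window. Not Perfherder's real algorithm.
from statistics import mean, stdev

def sustained_shift(series, window=6, threshold=2.0):
    """True if the last `window` points differ from the previous
    `window` points by more than `threshold` standard deviations."""
    before = series[-2 * window:-window]
    after = series[-window:]
    noise = stdev(before) or 1e-9  # guard against zero variance
    return abs(mean(after) - mean(before)) / noise > threshold

# A quiet series with a real ~5% regression trips the detector:
quiet = [900, 901, 899, 900, 902, 898, 945, 946, 944, 945, 947, 943]
# A bimodal subtest that merely lands a run of its slow mode also trips
# it, even though no code changed: a false alert.
bimodal = [850, 852, 849, 851, 850, 848, 948, 951, 949, 950, 952, 947]

print(sustained_shift(quiet))    # True
print(sustained_shift(bimodal))  # True
```

Per-subtest series with bimodal or heavy-tailed noise will trip any mean-shift detector like this on jitter alone, which is consistent with the alert volume estimated above.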
Flags: needinfo?(jmaher)
Alex, what do you think about doing our own tracking of these probes using a dashboard like: https://health.graphics/quantum? I assume that's pulling the same data that perfherder is using.
Yeah, it isn't hard to pull the Perfherder data down like the quantum dashboard does.
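For reference, a hedged sketch of pulling subtest data from the Perfherder API. The endpoint path and response shape below are assumptions (they should be double-checked against the current Treeherder API documentation), and the signature hash in the sample payload is hypothetical:

```python
# Sketch of fetching a Perfherder data series; endpoint and payload
# shape are assumed, verify against the Treeherder API docs.
import json
from urllib.request import urlopen

API = "https://treeherder.mozilla.org/api/project/{project}/performance/data/"

def perfherder_url(project, signature, interval=2592000):
    # interval is in seconds; 2592000 = 30 days, as in the graph links above.
    return f"{API.format(project=project)}?signatures={signature}&interval={interval}"

def extract_series(payload, signature):
    """Flatten an assumed Perfherder payload into (timestamp, value) pairs."""
    return [(d["push_timestamp"], d["value"]) for d in payload.get(signature, [])]

def fetch_series(project, signature, interval=2592000):
    with urlopen(perfherder_url(project, signature, interval)) as resp:
        return extract_series(json.load(resp), signature)

# Offline example with a hand-written payload (hypothetical signature hash):
sample = {"edaec66500db2": [{"push_timestamp": 1505000000, "value": 920.64},
                            {"push_timestamp": 1505086400, "value": 874.37}]}
print(extract_series(sample, "edaec66500db2"))
```

With the series in hand, plotting or custom per-probe alerting becomes a small amount of extra code, which is presumably what the quantum dashboard does.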
Thanks for your feedback!

My take on this is that the data isn't great to start with.
When I see this graph:
  https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=mozilla-inbound,1501097,1,1
I do not see how we could safely say how this probe evolves; there is no way to tell whether it is stable or regressing.

So I totally agree that we simply can't have any decent automatic alert mechanism to track that today.
But I would like to work on that and produce much better data that isn't so flaky.

Then, if I manage to do that, I would like to revisit the idea of tracking more than the summary.

In the meantime, yes, it would be great if I could at least keep track of some probes on my own!


(In reply to Brian Grinstead [:bgrins] from comment #4)
> Alex, what do you think about doing our own tracking of these probes using a
> dashboard like: https://health.graphics/quantum? I assume that's pulling the
> same data that perfherder is using.

That looks very handy! Do you know if the sources are available somewhere?
I'm really wondering what each DAMP probe looks like in these graphs; maybe it helps fade away the noise compared to Perfherder?
(In reply to Alexandre Poirot [:ochameau] from comment #6)
> (In reply to Brian Grinstead [:bgrins] from comment #4)
> > Alex, what do you think about doing our own tracking of these probes using a
> > dashboard like: https://health.graphics/quantum? I assume that's pulling the
> > same data that perfherder is using.
> 
> That looks very handy! Do you know if the sources are available somewhere?
> I'm really wondering what each DAMP probe looks like in these graphs; maybe it helps fade away the noise compared to Perfherder?

Forwarding that question to Harald
Flags: needinfo?(hkirschner)
Priority: -- → P3
I found it, forked it, and got this graph of summary, inspector.open, and webconsole.open since January 2017:
https://screenshots.firefox.com/pZNicITLTQ0HqEnJ/127.0.0.1

Harald, would you mind if I fork this app?
How/where do you host this Node app?
I would love to merge it back, creating a new /devtools dashboard within health.graphics
Flags: needinfo?(hkirschner)
Product: Firefox → DevTools
Severity: normal → S3