Bug 1650398 Comment 8 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by Botond Ballo [:botond] on 2020-07-08 15:37:59 PDT

Marking as P2 until we understand better what's going on here.

However, my investigation so far is pointing to bug 1611660 not being the culprit here.

I ran two Try pushes:

  * [one](https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=bIM7-eNsSOetsHLg1Z40AA.0&revision=919d667573861c89c95b744e2c5cad6ba0263194) with a recent mozilla-central
  * [another](https://treeherder.mozilla.org/#/jobs?repo=try&revision=16cad0cde0800af3b601d916045c52d4edacc3c7) with the same m-c revision + bug 1643604 and bug 1611660 backed out (bug 1643604 was a previous perf fix for bug 1611660 which touched the same code and so had to be backed out for bug 1611660 to back out cleanly)

I ran Raptor just on Linux x64 shippable, and retriggered the `google-c` job 10 times in each Try push.

I then entered these two revisions into the [Perfherder compare view](https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=16cad0cde0800af3b601d916045c52d4edacc3c7&newProject=try&newRevision=919d667573861c89c95b744e2c5cad6ba0263194&framework=10) and looked at the entry `raptor-tp6-google-firefox-cold replayed opt`, which shows the expected 10/10 runs.

The average delta in that entry is +1.00%. Individual runs vary considerably, with a standard deviation of ~15%. So, while some runs show the push that includes bug 1611660 scoring 20-30% worse, others show it scoring better by a similar amount, and there seems to be no systematic effect.

Am I missing something / looking at the wrong thing here?

Revision 1 by Botond Ballo [:botond] on 2020-07-08 15:38:17 PDT

Marking as P2 until we understand better what's going on here.

However, my investigation so far is pointing to bug 1611660 not being the culprit here.

I ran two Try pushes:

  * [one](https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=bIM7-eNsSOetsHLg1Z40AA.0&revision=919d667573861c89c95b744e2c5cad6ba0263194) with a recent mozilla-central
  * [another](https://treeherder.mozilla.org/#/jobs?repo=try&revision=16cad0cde0800af3b601d916045c52d4edacc3c7) with the same m-c revision + bug 1643604 and bug 1611660 backed out (bug 1643604 was a previous perf fix for bug 1611660 which touched the same code and so had to be backed out for bug 1611660 to back out cleanly)

I ran Raptor tests on Linux x64 shippable, and retriggered the `google-c` job 10 times in each Try push.

I then entered these two revisions into the [Perfherder compare view](https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=16cad0cde0800af3b601d916045c52d4edacc3c7&newProject=try&newRevision=919d667573861c89c95b744e2c5cad6ba0263194&framework=10) and looked at the entry `raptor-tp6-google-firefox-cold replayed opt`, which shows the expected 10/10 runs.

The average delta in that entry is +1.00%. Individual runs vary considerably, with a standard deviation of ~15%. So, while some runs show the push that includes bug 1611660 scoring 20-30% worse, others show it scoring better by a similar amount, and there seems to be no systematic effect.
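
To illustrate why a +1% average delta is hard to distinguish from noise when run-to-run variation is around 15%, here is a minimal sketch in Python. It is not how Perfherder computes its numbers; the function name `compare_runs` and the sample values are made up for illustration, and Welch's t statistic is just one reasonable way to weigh a difference in means against the spread.

```python
from math import sqrt
from statistics import mean, stdev


def compare_runs(base, new):
    """Return (percent delta of means, Welch t statistic) for two sets of runs."""
    base_mean, new_mean = mean(base), mean(new)
    delta_pct = (new_mean - base_mean) / base_mean * 100.0
    # Standard error of the difference between the two means (Welch form).
    std_err = sqrt(stdev(base) ** 2 / len(base) + stdev(new) ** 2 / len(new))
    t_stat = (new_mean - base_mean) / std_err
    return delta_pct, t_stat


# Hypothetical page-load times (ms) for 10 runs each; NOT the real Try data.
backed_out = [100, 120, 80, 115, 85, 105, 95, 125, 75, 100]
with_patches = [101, 130, 70, 110, 90, 120, 85, 118, 86, 100]

delta_pct, t_stat = compare_runs(backed_out, with_patches)
print(f"delta: {delta_pct:+.2f}%  t: {t_stat:.2f}")
```

With only 10 runs per push and this much spread, the t statistic for a ~1% shift stays well below the values usually taken as evidence of a real difference, which matches the "no systematic effect" reading.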

Am I missing something / looking at the wrong thing here?
