14.12% Images (windows7-32) regression on push 484096481587c8c66e27a4d834ec62f596ae55f3 (Wed Jul 26 2017)

RESOLVED WONTFIX

Status

defect
2 years ago
2 years ago

People

(Reporter: igoldan, Unassigned)

Tracking

({perf, regression})

Trunk
Unspecified
Windows

Firefox Tracking Flags

(Not tracked)

Details

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=484096481587c8c66e27a4d834ec62f596ae55f3

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 14%  Images summary windows7-32 opt      5,848,094.98 -> 6,673,805.89

Improvements:

 12%  Resident Memory summary windows7-32 pgo      351,679,044.18 -> 308,778,121.19
 12%  Resident Memory summary windows7-32 opt      355,146,112.56 -> 313,664,646.67
 10%  Heap Unclassified summary windows10-64 opt   52,810,816.13 -> 47,701,511.52
  9%  Heap Unclassified summary windows7-32 opt    45,781,104.66 -> 41,482,616.99
  9%  Heap Unclassified summary windows7-32 pgo    46,101,523.65 -> 41,864,335.96
  9%  Heap Unclassified summary windows10-64 pgo   52,695,079.56 -> 47,950,030.30
  8%  Explicit Memory summary windows7-32 opt      263,844,307.38 -> 242,272,681.50
  8%  Explicit Memory summary windows10-64 opt     334,062,512.19 -> 307,433,843.39
  8%  Explicit Memory summary windows7-32 pgo      264,761,641.77 -> 244,414,845.87
  8%  Explicit Memory summary windows10-64 pgo     335,282,823.88 -> 309,669,867.80
  8%  JS summary windows7-32 opt                   100,814,693.52 -> 93,240,527.63
  7%  JS summary windows10-64 opt                  135,231,324.49 -> 125,191,533.85
  7%  JS summary windows10-64 pgo                  135,199,996.31 -> 125,432,489.84
  6%  Resident Memory summary windows10-64 pgo     485,515,548.69 -> 456,727,235.94
  6%  JS summary windows7-32 pgo                   100,355,727.35 -> 94,480,968.07
  6%  Resident Memory summary windows10-64 opt     494,017,414.31 -> 466,185,629.97
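
The percentages listed above are consistent with a plain relative change from the old value to the new one. A minimal sketch of that arithmetic, assuming this is how the alert figures were derived (the helper name `percentChange` is made up for illustration and is not Perfherder's code):

```javascript
// Relative change from the old value to the new one, in percent.
// Assumption: this mirrors the alert percentages; it is not taken from Perfherder.
function percentChange(oldValue, newValue) {
  return ((newValue - oldValue) / oldValue) * 100;
}

// Values copied from the report above.
const imagesOpt = percentChange(5848094.98, 6673805.89);       // Images summary windows7-32 opt
const residentPgo = percentChange(351679044.18, 308778121.19); // Resident Memory summary windows7-32 pgo

console.log(imagesOpt.toFixed(2));   // 14.12 (a regression: memory went up)
console.log(residentPgo.toFixed(2)); // -12.20 (an improvement: memory went down)
```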


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8351

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY
Component: Untriaged → General
Product: Firefox → Testing
:erahm This may be just a simple notification on the AWSY test update, but were you expecting the Images summary regression?
Flags: needinfo?(erahm)
I would guess it's related to how the summary score is calculated from the subtests:
https://searchfox.org/mozilla-central/source/testing/awsy/awsy/process_perf_data.py#99

The new run has an additional subtest, while the measured numbers on the other subtests went down.
(In reply to Ed Lee :Mardak from comment #3)
> I would guess it's related to how the summary total score is calculated from
> subtests ?:
> https://searchfox.org/mozilla-central/source/testing/awsy/awsy/process_perf_data.py#99
> 
> The new run has an additional subtest while reducing measured performance
> numbers on other tests.

Yeah, that makes sense. We divide by the number of checkpoints, so it shouldn't have been too bad, but I guess the extra processes number is rather high (49MB) :(

I think we can just accept this regression since it makes sense now.
Flags: needinfo?(wlachance)
This looks to mostly match the summary numbers (probably some rounding at some point):

(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #2)
> [1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=1efacc8c49ba68b524de18c6b30153cb78e524d2&newProject=mozilla-inbound&newRevision=484096481587c8c66e27a4d834ec62f596ae55f3&originalSignature=1652cbcf255142dfdb93f74b92fe72486f8988cc&newSignature=1652cbcf255142dfdb93f74b92fe72486f8988cc&framework=4

> oldV = Array.from(document.querySelectorAll("ph-average[replicates*='orig']")).map(avg => avg.getAttribute("value") - 0).filter(v => v);
Array [ 16754797.333333334, 17010002.666666668, 26612546.666666668, 1122952, 1166352, 1978306.6666666667, 8709245.333333334, 51117800 ]
> newV = Array.from(document.querySelectorAll("ph-average[replicates*='new']")).map(avg => avg.getAttribute("value") - 0).filter(v => v);
Array [ 15745090.666666666, 16006408, 25609714.666666668, 1122952, 1166352, 1585946.6666666667, 5175386.666666667, 50896018.666666664, 33704826.666666664 ]
> [Math.pow(Math.E, oldV.reduce((t,v) => t + Math.log(v), 0) / oldV.length), Math.pow(Math.E, newV.reduce((t,v) => t + Math.log(v), 0) / newV.length)]
Array [ 7374726.0166721195, 7893934.991355613 ]

Pretty close to the summary "regression" from 7,372,230 to 7,893,494.
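
The console snippet above only works on the Perfherder compare page, since it reads the values out of the DOM. A standalone sketch of the same geometric mean, with the subtest averages copied from the output above (the helper name `geometricMean` is illustrative; whether process_perf_data.py computes the summary exactly this way is an assumption based on how closely the numbers match):

```javascript
// Geometric mean: exp of the average of the natural logs.
// Assumption: this approximates how the summary is derived from subtests.
function geometricMean(values) {
  const logSum = values.reduce((total, v) => total + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Subtest averages copied from the console output in this comment.
const oldV = [16754797.333333334, 17010002.666666668, 26612546.666666668,
              1122952, 1166352, 1978306.6666666667, 8709245.333333334, 51117800];
const newV = [15745090.666666666, 16006408, 25609714.666666668,
              1122952, 1166352, 1585946.6666666667, 5175386.666666667,
              50896018.666666664, 33704826.666666664];

const oldSummary = geometricMean(oldV);
const newSummary = geometricMean(newV);

// Roughly 7374726 and 7893935, matching the console result above.
console.log(Math.round(oldSummary), Math.round(newSummary));
```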

If we pretend the baseline had included "Images Tabs closed extra processes opt" at the same value as the new run (50,896,018), the baseline summary score would rise to 9140316.076319937, about 16% above the new summary score, so the change would show up as an improvement rather than a regression.

Basically, with the added subtest, the summary numbers aren't entirely comparable before/after.
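
To make the comparability problem concrete, here is a minimal sketch of the "pretend" baseline described above: the old subtest averages plus the new run's extra-processes value, summarized with the same geometric mean. The helper names and the variable `extraProcesses` are illustrative, and the geometric-mean summary is an assumption carried over from the console calculation in this comment.

```javascript
// Geometric mean helper (assumed summary method, see the console calculation above).
function geometricMean(values) {
  const logSum = values.reduce((total, v) => total + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Old subtest averages and the new run's full set, copied from this comment.
const baselineSubtests = [16754797.333333334, 17010002.666666668, 26612546.666666668,
                          1122952, 1166352, 1978306.6666666667, 8709245.333333334, 51117800];
const newSubtests = [15745090.666666666, 16006408, 25609714.666666668,
                     1122952, 1166352, 1585946.6666666667, 5175386.666666667,
                     50896018.666666664, 33704826.666666664];

// Hypothetical: pretend the baseline already had the added subtest at the new value.
const extraProcesses = 50896018.666666664;
const adjustedBaseline = geometricMean([...baselineSubtests, extraProcesses]);
const newScore = geometricMean(newSubtests);

// Relative change from the adjusted baseline to the new score, in percent.
const change = ((newScore - adjustedBaseline) / adjustedBaseline) * 100;

console.log(Math.round(adjustedBaseline)); // roughly 9140316
console.log(change.toFixed(1));            // negative, i.e. lower memory than the adjusted baseline
```

With like-for-like subtest sets the new score comes out below the baseline, which is why the reported summary "regression" is better read as an artifact of the added subtest.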
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX