perfherder summarization code that uses geomean is not at 100% parity with graph server

RESOLVED WONTFIX

Status

Tree Management
Perfherder
2 years ago
2 years ago

People

(Reporter: jmaher, Assigned: wlach)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
oh, we have a slight difference in our summarization numbers between graph server and perf herder.  We need to investigate this and either accept or fix it.
(Reporter)

Comment 1

2 years ago
let's look at branch:mozilla-inbound, rev:ec3acc5237a8, platform:linux32_opt, test:tsvgx.

raw log:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1437136623/mozilla-inbound_ubuntu32_hw_test-svgr-bm103-tests1-linux-build2981.txt.gz

graph server view:
http://graphs.mozilla.org/graph.html#tests=[[281,131,33]]&sel=none&displayrange=7&datatype=geo

perfherder view:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,58e5edc18b2f6d4da80a4b7eb985c94ca3bbddbe,1]&highlightedRevisions=ec3acc5237a8

graph server value: 346.489
perf herder value:  351.34


raw data:
{"gearflowers.svg": [384.0, 318.0, 331.0, 322.0, 320.0, 322.0, 337.0, 333.0, 319.0, 315.0, 330.0, 317.0, 324.0, 312.0, 370.0, 333.0, 311.0, 330.0, 346.0, 323.0, 333.0, 316.0, 322.0, 334.0, 345.0], "hixie-007.xml": [1790.0, 1890.0, 1798.0, 1757.0, 1747.0, 1761.0, 1750.0, 1770.0, 1746.0, 1775.0, 1755.0, 1739.0, 1747.0, 1751.0, 1753.0, 1772.0, 1754.0, 1766.0, 1750.0, 1756.0, 1901.0, 1791.0, 1745.0, 1747.0, 1774.0], "hixie-003.xml": [259.0, 268.0, 264.0, 246.0, 240.0, 245.0, 239.0, 251.0, 241.0, 249.0, 244.0, 243.0, 244.0, 249.0, 251.0, 258.0, 246.0, 247.0, 246.0, 242.0, 243.0, 244.0, 239.0, 239.0, 246.0], "composite-scale-rotate.svg": [62.0, 63.0, 55.0, 57.0, 56.0, 54.0, 54.0, 56.0, 63.0, 54.0, 53.0, 58.0, 57.0, 59.0, 58.0, 62.0, 55.0, 52.0, 61.0, 60.0, 55.0, 57.0, 55.0, 56.0, 53.0], "hixie-006.xml": [5481.0, 5505.0, 5420.0, 5603.0, 5435.0, 5520.0, 5638.0, 5631.0, 5496.0, 5473.0, 5569.0, 5519.0, 5471.0, 5578.0, 5522.0, 5524.0, 5509.0, 5516.0, 5520.0, 5507.0, 5526.0, 5452.0, 5538.0, 5489.0, 5542.0], "composite-scale-rotate-opacity.svg": [74.0, 59.0, 55.0, 47.0, 51.0, 54.0, 54.0, 61.0, 56.0, 52.0, 52.0, 58.0, 57.0, 60.0, 53.0, 54.0, 55.0, 52.0, 53.0, 49.0, 49.0, 49.0, 61.0, 51.0, 52.0], "composite-scale.svg": [149.0, 59.0, 47.0, 49.0, 62.0, 54.0, 52.0, 49.0, 56.0, 64.0, 58.0, 53.0, 50.0, 49.0, 59.0, 48.0, 49.0, 51.0, 52.0, 51.0, 51.0, 53.0, 53.0, 65.0, 56.0], "hixie-005.xml": [2996.0, 3020.0, 3007.0, 3777.0, 2973.0, 3034.0, 2982.0, 3051.0, 3004.0, 3087.0, 3070.0, 3088.0, 2997.0, 2991.0, 2963.0, 2948.0, 2922.0, 3024.0, 3010.0, 2998.0, 2990.0, 2926.0, 2949.0, 3031.0, 2937.0], "hixie-001.xml": [547.0, 516.0, 502.0, 508.0, 497.0, 508.0, 505.0, 499.0, 495.0, 506.0, 504.0, 514.0, 512.0, 510.0, 521.0, 505.0, 509.0, 535.0, 504.0, 515.0, 513.0, 505.0, 506.0, 516.0, 515.0], "hixie-002.xml": [517.0, 508.0, 505.0, 506.0, 502.0, 523.0, 501.0, 499.0, 506.0, 507.0, 495.0, 502.0, 504.0, 500.0, 507.0, 502.0, 492.0, 516.0, 493.0, 507.0, 508.0, 505.0, 503.0, 503.0, 502.0], 
"hixie-004.xml": [999.0, 531.0, 525.0, 527.0, 525.0, 524.0, 523.0, 536.0, 525.0, 524.0, 522.0, 525.0, 527.0, 524.0, 523.0, 526.0, 729.0, 531.0, 525.0, 526.0, 526.0, 528.0, 525.0, 524.0, 535.0], "composite-scale-opacity.svg": [62.0, 52.0, 53.0, 55.0, 54.0, 52.0, 58.0, 55.0, 55.0, 55.0, 58.0, 57.0, 54.0, 54.0, 57.0, 61.0, 64.0, 59.0, 60.0, 62.0, 61.0, 58.0, 58.0, 59.0, 61.0]}
(Reporter)

Comment 2

2 years ago
more data from the raw log:
06:29:21     INFO -  __start_tp_report
06:29:21     INFO -  _x_x_mozilla_page_load
06:29:21     INFO -  _x_x_mozilla_page_load_details
06:29:21     INFO -  |i|pagename|runs|
06:29:21     INFO -  |0;gearflowers.svg;384;318;331;322;320;322;337;333;319;315;330;317;324;312;370;333;311;330;346;323;333;316;322;334;345
06:29:21     INFO -  |1;composite-scale.svg;149;59;47;49;62;54;52;49;56;64;58;53;50;49;59;48;49;51;52;51;51;53;53;65;56
06:29:21     INFO -  |2;composite-scale-opacity.svg;62;52;53;55;54;52;58;55;55;55;58;57;54;54;57;61;64;59;60;62;61;58;58;59;61
06:29:21     INFO -  |3;composite-scale-rotate.svg;62;63;55;57;56;54;54;56;63;54;53;58;57;59;58;62;55;52;61;60;55;57;55;56;53
06:29:21     INFO -  |4;composite-scale-rotate-opacity.svg;74;59;55;47;51;54;54;61;56;52;52;58;57;60;53;54;55;52;53;49;49;49;61;51;52
06:29:21     INFO -  |5;hixie-001.xml;547;516;502;508;497;508;505;499;495;506;504;514;512;510;521;505;509;535;504;515;513;505;506;516;515
06:29:21     INFO -  |6;hixie-002.xml;517;508;505;506;502;523;501;499;506;507;495;502;504;500;507;502;492;516;493;507;508;505;503;503;502
06:29:21     INFO -  |7;hixie-003.xml;259;268;264;246;240;245;239;251;241;249;244;243;244;249;251;258;246;247;246;242;243;244;239;239;246
06:29:21     INFO -  |8;hixie-004.xml;999;531;525;527;525;524;523;536;525;524;522;525;527;524;523;526;729;531;525;526;526;528;525;524;535
06:29:21     INFO -  |9;hixie-005.xml;2996;3020;3007;3777;2973;3034;2982;3051;3004;3087;3070;3088;2997;2991;2963;2948;2922;3024;3010;2998;2990;2926;2949;3031;2937
06:29:21     INFO -  |10;hixie-006.xml;5481;5505;5420;5603;5435;5520;5638;5631;5496;5473;5569;5519;5471;5578;5522;5524;5509;5516;5520;5507;5526;5452;5538;5489;5542
06:29:21     INFO -  |11;hixie-007.xml;1790;1890;1798;1757;1747;1761;1750;1770;1746;1775;1755;1739;1747;1751;1753;1772;1754;1766;1750;1756;1901;1791;1745;1747;1774
06:29:21     INFO -  __end_tp_report
06:29:21     INFO -  __start_cc_report
06:29:21     INFO -  _x_x_mozilla_cycle_collect,15883
06:29:21     INFO -  __end_cc_report
06:29:21     INFO -  __startTimestamp1437139761075__endTimestamp
06:29:21     INFO -  ------- Summary: start -------
06:29:21     INFO -  Number of tests: 12
06:29:21     INFO -  [#0] gearflowers.svg  Cycles:25  Average:329.88  Median:324.00  stddev:17.09 (5.3%)  stddev-sans-first:13.11
06:29:21     INFO -  Values: 384.0  318.0  331.0  322.0  320.0  322.0  337.0  333.0  319.0  315.0  330.0  317.0  324.0  312.0  370.0  333.0  311.0  330.0  346.0  323.0  333.0  316.0  322.0  334.0  345.0
06:29:21     INFO -  [#1] composite-scale.svg  Cycles:25  Average:57.56  Median:52.00  stddev:19.69 (37.9%)  stddev-sans-first:5.09
06:29:21     INFO -  Values: 149.0  59.0  47.0  49.0  62.0  54.0  52.0  49.0  56.0  64.0  58.0  53.0  50.0  49.0  59.0  48.0  49.0  51.0  52.0  51.0  51.0  53.0  53.0  65.0  56.0
06:29:21     INFO -  [#2] composite-scale-opacity.svg  Cycles:25  Average:57.36  Median:58.00  stddev:3.38 (5.8%)  stddev-sans-first:3.31
06:29:21     INFO -  Values: 62.0  52.0  53.0  55.0  54.0  52.0  58.0  55.0  55.0  55.0  58.0  57.0  54.0  54.0  57.0  61.0  64.0  59.0  60.0  62.0  61.0  58.0  58.0  59.0  61.0
06:29:21     INFO -  [#3] composite-scale-rotate.svg  Cycles:25  Average:57.00  Median:56.00  stddev:3.27 (5.8%)  stddev-sans-first:3.16
06:29:21     INFO -  Values: 62.0  63.0  55.0  57.0  56.0  54.0  54.0  56.0  63.0  54.0  53.0  58.0  57.0  59.0  58.0  62.0  55.0  52.0  61.0  60.0  55.0  57.0  55.0  56.0  53.0
06:29:21     INFO -  [#4] composite-scale-rotate-opacity.svg  Cycles:25  Average:54.72  Median:54.00  stddev:5.54 (10.3%)  stddev-sans-first:3.90
06:29:21     INFO -  Values: 74.0  59.0  55.0  47.0  51.0  54.0  54.0  61.0  56.0  52.0  52.0  58.0  57.0  60.0  53.0  54.0  55.0  52.0  53.0  49.0  49.0  49.0  61.0  51.0  52.0
06:29:21     INFO -  [#5] hixie-001.xml  Cycles:25  Average:510.68  Median:508.00  stddev:11.22 (2.2%)  stddev-sans-first:8.46
06:29:21     INFO -  Values: 547.0  516.0  502.0  508.0  497.0  508.0  505.0  499.0  495.0  506.0  504.0  514.0  512.0  510.0  521.0  505.0  509.0  535.0  504.0  515.0  513.0  505.0  506.0  516.0  515.0
06:29:21     INFO -  [#6] hixie-002.xml  Cycles:25  Average:504.52  Median:504.00  stddev:6.92 (1.4%)  stddev-sans-first:6.55
06:29:21     INFO -  Values: 517.0  508.0  505.0  506.0  502.0  523.0  501.0  499.0  506.0  507.0  495.0  502.0  504.0  500.0  507.0  502.0  492.0  516.0  493.0  507.0  508.0  505.0  503.0  503.0  502.0
06:29:21     INFO -  [#7] hixie-003.xml  Cycles:25  Average:247.32  Median:246.00  stddev:7.64 (3.1%)  stddev-sans-first:7.40
06:29:21     INFO -  Values: 259.0  268.0  264.0  246.0  240.0  245.0  239.0  251.0  241.0  249.0  244.0  243.0  244.0  249.0  251.0  258.0  246.0  247.0  246.0  242.0  243.0  244.0  239.0  239.0  246.0
06:29:21     INFO -  [#8] hixie-004.xml  Cycles:25  Average:553.40  Median:525.00  stddev:101.34 (19.3%)  stddev-sans-first:41.51
06:29:21     INFO -  Values: 999.0  531.0  525.0  527.0  525.0  524.0  523.0  536.0  525.0  524.0  522.0  525.0  527.0  524.0  523.0  526.0  729.0  531.0  525.0  526.0  526.0  528.0  525.0  524.0  535.0
06:29:21     INFO -  [#9] hixie-005.xml  Cycles:25  Average:3031.00  Median:2998.00  stddev:161.92 (5.4%)  stddev-sans-first:165.24
06:29:21     INFO -  Values: 2996.0  3020.0  3007.0  3777.0  2973.0  3034.0  2982.0  3051.0  3004.0  3087.0  3070.0  3088.0  2997.0  2991.0  2963.0  2948.0  2922.0  3024.0  3010.0  2998.0  2990.0  2926.0  2949.0  3031.0  2937.0
06:29:21     INFO -  [#10] hixie-006.xml  Cycles:25  Average:5519.36  Median:5519.00  stddev:53.93 (1.0%)  stddev-sans-first:54.48
06:29:21     INFO -  Values: 5481.0  5505.0  5420.0  5603.0  5435.0  5520.0  5638.0  5631.0  5496.0  5473.0  5569.0  5519.0  5471.0  5578.0  5522.0  5524.0  5509.0  5516.0  5520.0  5507.0  5526.0  5452.0  5538.0  5489.0  5542.0
06:29:21     INFO -  [#11] hixie-007.xml  Cycles:25  Average:1771.40  MediaINFO : Browser exited with error code: 0
06:29:26     INFO -  DEBUG : Terminating: 9806, 9888
06:29:26     INFO -  n:1756.00  stddev:40.48 (2.3%)  stddev-sans-first:41.16
06:29:26     INFO -  Values: 1790.0  1890.0  1798.0  1757.0  1747.0  1761.0  1750.0  1770.0  1746.0  1775.0  1755.0  1739.0  1747.0  1751.0  1753.0  1772.0  1754.0  1766.0  1750.0  1756.0  1901.0  1791.0  1745.0  1747.0  1774.0
06:29:26     INFO -  -------- Summary: end --------
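The raw data in comment 1 corresponds to the _x_x_mozilla_page_load_details block of this log. Below is a minimal sketch of a parser that recovers it, assuming the |index;pagename;run;run;... row format seen in this one excerpt. parse_tp_report is a hypothetical name, not a talos function, and the format has not been checked against talos sources.

```python
import re

# Hypothetical helper: the row format (|index;pagename;run;run;...) is
# inferred from this log excerpt only, not from talos documentation.
ROW = re.compile(r"\|(\d+);([^;|]+);([0-9.;]+)\s*$")


def parse_tp_report(log_text):
    """Return {pagename: [replicates]} from the __start_tp_report block."""
    pages = {}
    inside = False
    for line in log_text.splitlines():
        if "__start_tp_report" in line:
            inside = True
            continue
        if "__end_tp_report" in line:
            break  # ignore everything after the report block
        if not inside:
            continue
        match = ROW.search(line)  # the "|i|pagename|runs|" header never matches
        if match:
            _index, name, runs = match.groups()
            pages[name] = [float(run) for run in runs.split(";")]
    return pages


if __name__ == "__main__":
    excerpt = "\n".join([
        "06:29:21     INFO -  __start_tp_report",
        "06:29:21     INFO -  |i|pagename|runs|",
        "06:29:21     INFO -  |1;composite-scale.svg;149;59;47;49;62;54;52;49;56;64;58;53;50;49;59;48;49;51;52;51;51;53;53;65;56",
        "06:29:21     INFO -  __end_tp_report",
    ])
    print(parse_tp_report(excerpt))
```

The timestamp/INFO prefix is skipped by searching for the "|<digits>;" row marker rather than splitting on fixed columns, so interleaved lines such as the "Browser exited" message above do not affect the report block.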
(Reporter)

Comment 3

2 years ago
figured out the difference.

from statistics import median, geometric_mean  # geometric_mean needs Python 3.8+
# svg: dict mapping page name -> list of replicates (the raw data above)

graph server:
page_medians = []
for page in svg:
    # drop the first 5 replicates, and take the median of the rest
    page_medians.append(median(svg[page][5:]))
summary_value = geometric_mean(page_medians) # a geometric mean of medians


perf herder:
all_values = []
for page in svg:
    all_values.extend(svg[page]) # effectively taking all data points in a large array
summary_value = geometric_mean(all_values) # a geometric mean of all values
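A self-contained sketch of the two formulas above, for trying them side by side. It assumes the raw data is a dict mapping page name to replicates, as in comment 1; the function names are illustrative, not the actual graph server or perfherder code. The sample holds two pages copied from that data, so it will not reproduce the full 12-page numbers (346.489 vs 351.34).

```python
from math import exp, log
from statistics import median

# Two pages copied from the tsvgx raw data in comment 1. Both have a large
# first replicate (149.0 and 999.0); hixie-004.xml also has a 729.0 spike.
TSVGX_SAMPLE = {
    "composite-scale.svg": [149.0, 59.0, 47.0, 49.0, 62.0, 54.0, 52.0, 49.0,
                            56.0, 64.0, 58.0, 53.0, 50.0, 49.0, 59.0, 48.0,
                            49.0, 51.0, 52.0, 51.0, 51.0, 53.0, 53.0, 65.0,
                            56.0],
    "hixie-004.xml": [999.0, 531.0, 525.0, 527.0, 525.0, 524.0, 523.0, 536.0,
                      525.0, 524.0, 522.0, 525.0, 527.0, 524.0, 523.0, 526.0,
                      729.0, 531.0, 525.0, 526.0, 526.0, 528.0, 525.0, 524.0,
                      535.0],
}


def geomean(values):
    """Geometric mean, computed in log space."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def graphserver_summary(pages, ignore_first=5):
    """Median of each page after dropping the first replicates, then the
    geometric mean of those medians."""
    return geomean(median(reps[ignore_first:]) for reps in pages.values())


def perfherder_summary(pages):
    """Pool every replicate of every page and take one geometric mean."""
    return geomean(v for reps in pages.values() for v in reps)


if __name__ == "__main__":
    print("graph server style:", round(graphserver_summary(TSVGX_SAMPLE), 2))
    print("perfherder style:  ", round(perfherder_summary(TSVGX_SAMPLE), 2))
```

On this sample the page medians are 52.5 and 525, so the graph server style value is sqrt(52.5 x 525), about 166.0; the pooled geometric mean comes out higher because it keeps the 149.0, 999.0 and 729.0 replicates.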


we should determine whether this is fine; if it is, we can close this bug.

:joy, is a geometric mean of medians for page load times any different from a geometric mean of all values?
Flags: needinfo?(sguha)

Comment 4

2 years ago
Assuming all_values means all the replicates of all the runs (as if all the runs with all their replicates were combined into one big test), then geomean(page_medians) is better.

The reason is that taking the median of each run first drops the outliers and the first replicate(s) of each run, which are typically higher. This contributes to a more stable value. geomean(all_values) doesn't drop any outliers at all (assuming all_values is what I described above).

Besides, geomean(page_medians) is how we've been doing it all along, and it's what we trust. So unless the change to geomean(all_values) was made intentionally and after consideration, I don't see why we shouldn't consider it plainly a bug or an oversight, and fix it to use geomean(page_medians).
(Reporter)

Comment 5

2 years ago
thanks Avi, that makes sense. In general I would agree with that. Let's see what :joy thinks as well.
Yes, the two are different. The median is robust to outliers, so averaging over medians is more stable than averaging over the entire data (in both cases we always ought to drop the first 5).

median(c(1800,1500,1900,1300,70000)) vs  exp(mean(log(c(1800,1500,1900,1300,70000))))

Not remarkably different, and the latter takes into account the information contained in the 70K value.
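The same comparison translated to Python, for anyone without R at hand (exp(mean(log(x))) is the geometric mean; the arithmetic mean is added only as a reference point):

```python
from math import exp, log
from statistics import mean, median

# The five values from the R example above; 70000 plays the outlier.
values = [1800, 1500, 1900, 1300, 70000]

med = median(values)                                  # 1800
geo = exp(sum(log(v) for v in values) / len(values))  # roughly 3418
arith = mean(values)                                  # 15300

print(f"median={med} geomean={geo:.1f} mean={arith}")
```

Here the 70000 value pulls the geometric mean to about 1.9 times the median, while the arithmetic mean ends up at 8.5 times the median.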

If we believe that outliers are still important to consider, I will point out that the median discards information from a lot of data, 50% of it (it only uses ranks). If we want to use more data and still protect against outliers, we can use the trimmed or winsorized mean and then take the geometric mean of those (as opposed to the geometric mean of medians).
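A sketch of that alternative, assuming a 20% cut from each end and the same "drop the first 5" step; the proportion and the function names are illustrative choices, not something settled in this bug.

```python
from math import exp, log


def trimmed_mean(values, proportion=0.2):
    """Drop int(proportion * n) values from each end, average the rest."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion too large for this many values")
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion=0.2):
    """Clamp the k lowest/highest values to the nearest kept value, then average."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(proportion * n)
    low, high = ordered[k], ordered[n - k - 1]
    return sum(min(max(v, low), high) for v in ordered) / n


def geomean(values):
    """Geometric mean, computed in log space."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def summary_of_trimmed_means(pages, proportion=0.2, ignore_first=5):
    """Geometric mean of per-page trimmed means, after dropping the first replicates."""
    return geomean(trimmed_mean(reps[ignore_first:], proportion)
                   for reps in pages.values())
```

On the five-value example above, the 20% trimmed mean is 1733.3 (1300 and 70000 are dropped) and the winsorized mean is 1720 (they are clamped to 1500 and 1900), both close to the median while still averaging more than one value.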

But as Avi pointed out, unless the new system really improves things (maybe it is more realistic, since we drop less data), I don't see a pressing need to change it. We'd need to do a comparative study.


HTH
Flags: needinfo?(sguha)
Created attachment 8635451 [details] [review]
Use graphserver's formula

Here's a patch that attempts to change perfherder to use the graphserver formula. Note that some tests (e.g. the new tps test) don't have 5 replicates, so we must be doing something different there. We'll need to update this patch to replicate that before it lands.
Assignee: nobody → wlachance
Attachment #8635451 - Flags: feedback?(jmaher)
(Reporter)

Comment 8

2 years ago
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

overall this looks great. Let's ensure we don't have issues with startup tests vs. pageload tests. Startup tests are tresize, ts_paint, tpaint, sessionrestore, sessionrestore_no_auto, media_tests.

Regarding dropping 5: this is done at the talos level; search for ignore_first in test.py:
http://hg.mozilla.org/build/talos/file/0600cd6a8eba/talos/test.py

sometimes we drop 1, other times we drop 5, sometimes we keep it all.

How about we do this:
if #_replicates > 15, drop 5, else use all.

if there are no subtests, then just use the single page and all the replicates (which is currently working just fine)
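A minimal sketch of the proposed rule for the subtest case, combined with the geometric mean of medians. The names are hypothetical, and the no-subtest path (which already works) is deliberately not covered.

```python
from math import exp, log
from statistics import median


def replicates_to_use(replicates, threshold=15, drop=5):
    """Proposed rule: with more than `threshold` replicates, drop the first
    `drop`; otherwise keep every replicate."""
    if len(replicates) > threshold:
        return replicates[drop:]
    return list(replicates)


def summarize_subtests(pages):
    """Geometric mean of per-page medians, applying the rule above per page."""
    medians = [median(replicates_to_use(reps)) for reps in pages.values()]
    return exp(sum(log(m) for m in medians) / len(medians))
```

With 25 replicates (as in tsvgx) this drops 5 and keeps 20; with exactly 15 or fewer it keeps everything, which matters for suites with few cycles.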
Attachment #8635451 - Flags: feedback?(jmaher) → feedback+
We decided to calculate the summary in talos instead. See bug 1184966.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX
Actually let's mark this as a dupe.
Resolution: WONTFIX → DUPLICATE
Duplicate of bug: 1184966
OK, I think we do, in fact, want to fix this, especially given that we already have a proper patch. We now only use the summarization code in perfherder for Android, but it's easier to fix this than to update Android talos (which is going away anyway).
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

This will most likely need to be updated for bitrot; fortunately, the patch isn't too complex.

This should bring all Android results in sync with Graphserver's (I will of course test that again before landing).
Attachment #8635451 - Flags: review?(jmaher)
(Reporter)

Comment 13

2 years ago
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

android has so few cycles, we would need to craft this carefully.
Attachment #8635451 - Flags: review?(jmaher) → review-
(In reply to Joel Maher (:jmaher) from comment #13)
> Comment on attachment 8635451 [details] [review]
> Use graphserver's formula
> 
> android has so few cycles, we would need to craft this carefully.

Did we use a special formula inside Talos specifically for Android? AFAIK we didn't.
(Reporter)

Comment 15

2 years ago
there is no special formula for Android talos; we just didn't have enough replicates. I think we defaulted to dropping 1 replicate and taking the median of the rest.
(Reporter)

Comment 16

2 years ago
do we need to keep this open?
yeah, let's just leave the old android stuff alone if we can.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX