Closed
Bug 1184968
Opened 9 years ago
Closed 9 years ago
perfherder summarization code that uses geomean is not at 100% parity with graph server
Categories
(Tree Management :: Perfherder, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jmaher, Assigned: wlach)
References
Details
Attachments
(1 file)
Oh, we have a slight difference in our summarization numbers between graph server and perfherder. We need to investigate this and either accept it or fix it.
Reporter
Comment 1•9 years ago
Let's look at branch:mozilla-inbound, rev:ec3acc5237a8, platform:linux32_opt, test:tsvgx.
raw log: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1437136623/mozilla-inbound_ubuntu32_hw_test-svgr-bm103-tests1-linux-build2981.txt.gz
graph server view: http://graphs.mozilla.org/graph.html#tests=[[281,131,33]]&sel=none&displayrange=7&datatype=geo
perfherder view: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,58e5edc18b2f6d4da80a4b7eb985c94ca3bbddbe,1]&highlightedRevisions=ec3acc5237a8
graph server value: 346.489
perf herder value: 351.34
raw data:
{"gearflowers.svg": [384.0, 318.0, 331.0, 322.0, 320.0, 322.0, 337.0, 333.0, 319.0, 315.0, 330.0, 317.0, 324.0, 312.0, 370.0, 333.0, 311.0, 330.0, 346.0, 323.0, 333.0, 316.0, 322.0, 334.0, 345.0],
 "hixie-007.xml": [1790.0, 1890.0, 1798.0, 1757.0, 1747.0, 1761.0, 1750.0, 1770.0, 1746.0, 1775.0, 1755.0, 1739.0, 1747.0, 1751.0, 1753.0, 1772.0, 1754.0, 1766.0, 1750.0, 1756.0, 1901.0, 1791.0, 1745.0, 1747.0, 1774.0],
 "hixie-003.xml": [259.0, 268.0, 264.0, 246.0, 240.0, 245.0, 239.0, 251.0, 241.0, 249.0, 244.0, 243.0, 244.0, 249.0, 251.0, 258.0, 246.0, 247.0, 246.0, 242.0, 243.0, 244.0, 239.0, 239.0, 246.0],
 "composite-scale-rotate.svg": [62.0, 63.0, 55.0, 57.0, 56.0, 54.0, 54.0, 56.0, 63.0, 54.0, 53.0, 58.0, 57.0, 59.0, 58.0, 62.0, 55.0, 52.0, 61.0, 60.0, 55.0, 57.0, 55.0, 56.0, 53.0],
 "hixie-006.xml": [5481.0, 5505.0, 5420.0, 5603.0, 5435.0, 5520.0, 5638.0, 5631.0, 5496.0, 5473.0, 5569.0, 5519.0, 5471.0, 5578.0, 5522.0, 5524.0, 5509.0, 5516.0, 5520.0, 5507.0, 5526.0, 5452.0, 5538.0, 5489.0, 5542.0],
 "composite-scale-rotate-opacity.svg": [74.0, 59.0, 55.0, 47.0, 51.0, 54.0, 54.0, 61.0, 56.0, 52.0, 52.0, 58.0, 57.0, 60.0, 53.0, 54.0, 55.0, 52.0, 53.0, 49.0, 49.0, 49.0, 61.0, 51.0, 52.0],
 "composite-scale.svg": [149.0, 59.0, 47.0, 49.0, 62.0, 54.0, 52.0, 49.0, 56.0, 64.0, 58.0, 53.0, 50.0, 49.0, 59.0, 48.0, 49.0, 51.0, 52.0, 51.0, 51.0, 53.0, 53.0, 65.0, 56.0],
 "hixie-005.xml": [2996.0, 3020.0, 3007.0, 3777.0, 2973.0, 3034.0, 2982.0, 3051.0, 3004.0, 3087.0, 3070.0, 3088.0, 2997.0, 2991.0, 2963.0, 2948.0, 2922.0, 3024.0, 3010.0, 2998.0, 2990.0, 2926.0, 2949.0, 3031.0, 2937.0],
 "hixie-001.xml": [547.0, 516.0, 502.0, 508.0, 497.0, 508.0, 505.0, 499.0, 495.0, 506.0, 504.0, 514.0, 512.0, 510.0, 521.0, 505.0, 509.0, 535.0, 504.0, 515.0, 513.0, 505.0, 506.0, 516.0, 515.0],
 "hixie-002.xml": [517.0, 508.0, 505.0, 506.0, 502.0, 523.0, 501.0, 499.0, 506.0, 507.0, 495.0, 502.0, 504.0, 500.0, 507.0, 502.0, 492.0, 516.0, 493.0, 507.0, 508.0, 505.0, 503.0, 503.0, 502.0],
 "hixie-004.xml": [999.0, 531.0, 525.0, 527.0, 525.0, 524.0, 523.0, 536.0, 525.0, 524.0, 522.0, 525.0, 527.0, 524.0, 523.0, 526.0, 729.0, 531.0, 525.0, 526.0, 526.0, 528.0, 525.0, 524.0, 535.0],
 "composite-scale-opacity.svg": [62.0, 52.0, 53.0, 55.0, 54.0, 52.0, 58.0, 55.0, 55.0, 55.0, 58.0, 57.0, 54.0, 54.0, 57.0, 61.0, 64.0, 59.0, 60.0, 62.0, 61.0, 58.0, 58.0, 59.0, 61.0]}
Reporter
Comment 2•9 years ago
more data from the raw log:
06:29:21 INFO - __start_tp_report
06:29:21 INFO - _x_x_mozilla_page_load
06:29:21 INFO - _x_x_mozilla_page_load_details
06:29:21 INFO - |i|pagename|runs|
06:29:21 INFO - |0;gearflowers.svg;384;318;331;322;320;322;337;333;319;315;330;317;324;312;370;333;311;330;346;323;333;316;322;334;345
06:29:21 INFO - |1;composite-scale.svg;149;59;47;49;62;54;52;49;56;64;58;53;50;49;59;48;49;51;52;51;51;53;53;65;56
06:29:21 INFO - |2;composite-scale-opacity.svg;62;52;53;55;54;52;58;55;55;55;58;57;54;54;57;61;64;59;60;62;61;58;58;59;61
06:29:21 INFO - |3;composite-scale-rotate.svg;62;63;55;57;56;54;54;56;63;54;53;58;57;59;58;62;55;52;61;60;55;57;55;56;53
06:29:21 INFO - |4;composite-scale-rotate-opacity.svg;74;59;55;47;51;54;54;61;56;52;52;58;57;60;53;54;55;52;53;49;49;49;61;51;52
06:29:21 INFO - |5;hixie-001.xml;547;516;502;508;497;508;505;499;495;506;504;514;512;510;521;505;509;535;504;515;513;505;506;516;515
06:29:21 INFO - |6;hixie-002.xml;517;508;505;506;502;523;501;499;506;507;495;502;504;500;507;502;492;516;493;507;508;505;503;503;502
06:29:21 INFO - |7;hixie-003.xml;259;268;264;246;240;245;239;251;241;249;244;243;244;249;251;258;246;247;246;242;243;244;239;239;246
06:29:21 INFO - |8;hixie-004.xml;999;531;525;527;525;524;523;536;525;524;522;525;527;524;523;526;729;531;525;526;526;528;525;524;535
06:29:21 INFO - |9;hixie-005.xml;2996;3020;3007;3777;2973;3034;2982;3051;3004;3087;3070;3088;2997;2991;2963;2948;2922;3024;3010;2998;2990;2926;2949;3031;2937
06:29:21 INFO - |10;hixie-006.xml;5481;5505;5420;5603;5435;5520;5638;5631;5496;5473;5569;5519;5471;5578;5522;5524;5509;5516;5520;5507;5526;5452;5538;5489;5542
06:29:21 INFO - |11;hixie-007.xml;1790;1890;1798;1757;1747;1761;1750;1770;1746;1775;1755;1739;1747;1751;1753;1772;1754;1766;1750;1756;1901;1791;1745;1747;1774
06:29:21 INFO - __end_tp_report
06:29:21 INFO - __start_cc_report
06:29:21 INFO - _x_x_mozilla_cycle_collect,15883
06:29:21 INFO - __end_cc_report
06:29:21 INFO - __startTimestamp1437139761075__endTimestamp
06:29:21 INFO - ------- Summary: start -------
06:29:21 INFO - Number of tests: 12
06:29:21 INFO - [#0] gearflowers.svg Cycles:25 Average:329.88 Median:324.00 stddev:17.09 (5.3%) stddev-sans-first:13.11
06:29:21 INFO - Values: 384.0 318.0 331.0 322.0 320.0 322.0 337.0 333.0 319.0 315.0 330.0 317.0 324.0 312.0 370.0 333.0 311.0 330.0 346.0 323.0 333.0 316.0 322.0 334.0 345.0
06:29:21 INFO - [#1] composite-scale.svg Cycles:25 Average:57.56 Median:52.00 stddev:19.69 (37.9%) stddev-sans-first:5.09
06:29:21 INFO - Values: 149.0 59.0 47.0 49.0 62.0 54.0 52.0 49.0 56.0 64.0 58.0 53.0 50.0 49.0 59.0 48.0 49.0 51.0 52.0 51.0 51.0 53.0 53.0 65.0 56.0
06:29:21 INFO - [#2] composite-scale-opacity.svg Cycles:25 Average:57.36 Median:58.00 stddev:3.38 (5.8%) stddev-sans-first:3.31
06:29:21 INFO - Values: 62.0 52.0 53.0 55.0 54.0 52.0 58.0 55.0 55.0 55.0 58.0 57.0 54.0 54.0 57.0 61.0 64.0 59.0 60.0 62.0 61.0 58.0 58.0 59.0 61.0
06:29:21 INFO - [#3] composite-scale-rotate.svg Cycles:25 Average:57.00 Median:56.00 stddev:3.27 (5.8%) stddev-sans-first:3.16
06:29:21 INFO - Values: 62.0 63.0 55.0 57.0 56.0 54.0 54.0 56.0 63.0 54.0 53.0 58.0 57.0 59.0 58.0 62.0 55.0 52.0 61.0 60.0 55.0 57.0 55.0 56.0 53.0
06:29:21 INFO - [#4] composite-scale-rotate-opacity.svg Cycles:25 Average:54.72 Median:54.00 stddev:5.54 (10.3%) stddev-sans-first:3.90
06:29:21 INFO - Values: 74.0 59.0 55.0 47.0 51.0 54.0 54.0 61.0 56.0 52.0 52.0 58.0 57.0 60.0 53.0 54.0 55.0 52.0 53.0 49.0 49.0 49.0 61.0 51.0 52.0
06:29:21 INFO - [#5] hixie-001.xml Cycles:25 Average:510.68 Median:508.00 stddev:11.22 (2.2%) stddev-sans-first:8.46
06:29:21 INFO - Values: 547.0 516.0 502.0 508.0 497.0 508.0 505.0 499.0 495.0 506.0 504.0 514.0 512.0 510.0 521.0 505.0 509.0 535.0 504.0 515.0 513.0 505.0 506.0 516.0 515.0
06:29:21 INFO - [#6] hixie-002.xml Cycles:25 Average:504.52 Median:504.00 stddev:6.92 (1.4%) stddev-sans-first:6.55
06:29:21 INFO - Values: 517.0 508.0 505.0 506.0 502.0 523.0 501.0 499.0 506.0 507.0 495.0 502.0 504.0 500.0 507.0 502.0 492.0 516.0 493.0 507.0 508.0 505.0 503.0 503.0 502.0
06:29:21 INFO - [#7] hixie-003.xml Cycles:25 Average:247.32 Median:246.00 stddev:7.64 (3.1%) stddev-sans-first:7.40
06:29:21 INFO - Values: 259.0 268.0 264.0 246.0 240.0 245.0 239.0 251.0 241.0 249.0 244.0 243.0 244.0 249.0 251.0 258.0 246.0 247.0 246.0 242.0 243.0 244.0 239.0 239.0 246.0
06:29:21 INFO - [#8] hixie-004.xml Cycles:25 Average:553.40 Median:525.00 stddev:101.34 (19.3%) stddev-sans-first:41.51
06:29:21 INFO - Values: 999.0 531.0 525.0 527.0 525.0 524.0 523.0 536.0 525.0 524.0 522.0 525.0 527.0 524.0 523.0 526.0 729.0 531.0 525.0 526.0 526.0 528.0 525.0 524.0 535.0
06:29:21 INFO - [#9] hixie-005.xml Cycles:25 Average:3031.00 Median:2998.00 stddev:161.92 (5.4%) stddev-sans-first:165.24
06:29:21 INFO - Values: 2996.0 3020.0 3007.0 3777.0 2973.0 3034.0 2982.0 3051.0 3004.0 3087.0 3070.0 3088.0 2997.0 2991.0 2963.0 2948.0 2922.0 3024.0 3010.0 2998.0 2990.0 2926.0 2949.0 3031.0 2937.0
06:29:21 INFO - [#10] hixie-006.xml Cycles:25 Average:5519.36 Median:5519.00 stddev:53.93 (1.0%) stddev-sans-first:54.48
06:29:21 INFO - Values: 5481.0 5505.0 5420.0 5603.0 5435.0 5520.0 5638.0 5631.0 5496.0 5473.0 5569.0 5519.0 5471.0 5578.0 5522.0 5524.0 5509.0 5516.0 5520.0 5507.0 5526.0 5452.0 5538.0 5489.0 5542.0
06:29:21 INFO - [#11] hixie-007.xml Cycles:25 Average:1771.40 MediaINFO : Browser exited with error code: 0
06:29:26 INFO - DEBUG : Terminating: 9806, 9888
06:29:26 INFO - n:1756.00 stddev:40.48 (2.3%) stddev-sans-first:41.16
06:29:26 INFO - Values: 1790.0 1890.0 1798.0 1757.0 1747.0 1761.0 1750.0 1770.0 1746.0 1775.0 1755.0 1739.0 1747.0 1751.0 1753.0 1772.0 1754.0 1766.0 1750.0 1756.0 1901.0 1791.0 1745.0 1747.0 1774.0
06:29:26 INFO - -------- Summary: end --------
Reporter
Comment 3•9 years ago
Figured out the difference.

graph server:
    page_medians = []
    for page in svg:
        # drop the first 5 replicates, and take the median of the rest
        page_medians.append(median(svg[page][5:]))
    summary_value = geometric_mean(page_medians)  # a geometric mean of medians

perf herder:
    all_values = []
    for page in svg:
        all_values.extend(svg[page])  # effectively taking all data points in a large array
    summary_value = geometric_mean(all_values)  # a geometric mean of all values

We should determine whether this is fine; if it is, we can close this bug.
:joy, is a geometric mean of medians for page load times any different from a geometric mean of all values?
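For anyone who wants to try the two formulas side by side, here is a minimal runnable sketch of the pseudocode above. The function names (`graphserver_summary`, `perfherder_summary`), the `ignore_first` parameter, and the tiny sample data are made up for illustration; this is not the actual graph server or perfherder code.

```python
import math
from statistics import median


def geometric_mean(values):
    """Geometric mean via the mean of logs (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def graphserver_summary(pages, ignore_first=5):
    """Geometric mean of per-page medians, dropping the first replicates.

    `ignore_first=5` mirrors the "drop the first 5" step in the pseudocode;
    the parameter name itself is an assumption.
    """
    page_medians = [median(reps[ignore_first:]) for reps in pages.values()]
    return geometric_mean(page_medians)


def perfherder_summary(pages):
    """Geometric mean over every replicate of every page, nothing dropped."""
    all_values = [v for reps in pages.values() for v in reps]
    return geometric_mean(all_values)


# Hypothetical data: five slow warm-up replicates, then steady values.
sample = {
    "a.svg": [900.0] * 5 + [4.0, 4.0, 4.0],
    "b.svg": [900.0] * 5 + [9.0, 9.0, 9.0],
}

print("graph server style:", graphserver_summary(sample))
print("perfherder style:  ", perfherder_summary(sample))
```

On this made-up data the graph server style value stays near the steady-state numbers, while the all-values version is pulled up by the warm-up replicates.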
Flags: needinfo?(sguha)
Comment 4•9 years ago
Assuming all_values means all the replicates of all the runs (as if all the runs, with all their replicates, were combined into one big test), then geomean(page_medians) is better. The reason is that taking the median of each run first drops the outliers and the first replicate[s] of each run, which are typically higher. This contributes to a more stable value. geomean(all_values) doesn't drop any outliers at all (assuming all_values is what I described above). Besides, geomean(page_medians) is how we've been doing it all along, and it's what we trust. So unless the change to geomean(all_values) was made intentionally and after consideration, I don't see why we shouldn't consider it plainly a bug or an oversight, and fix it to use geomean(page_medians).
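To make the outlier point concrete, here is a small sketch using the hixie-004.xml replicates from comment 1, where the first value (999.0) and one later value (729.0) are well above the rest. It only compares a plain mean of everything with the median after dropping the first 5; the variable names are illustrative.

```python
from statistics import mean, median

# hixie-004.xml replicates, copied from the raw data in comment 1.
hixie_004 = [
    999.0, 531.0, 525.0, 527.0, 525.0, 524.0, 523.0, 536.0, 525.0,
    524.0, 522.0, 525.0, 527.0, 524.0, 523.0, 526.0, 729.0, 531.0,
    525.0, 526.0, 526.0, 528.0, 525.0, 524.0, 535.0,
]

mean_of_all = mean(hixie_004)                   # pulled up by 999.0 and 729.0
median_after_drop = median(hixie_004[5:])       # drop the first 5, then median

print(f"mean of all replicates:  {mean_of_all:.2f}")        # 553.40
print(f"median after dropping 5: {median_after_drop:.2f}")  # 525.00
```

The 553.40 matches the Average reported for this page in the log in comment 2.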
Reporter
Comment 5•9 years ago
Thanks Avi, that makes sense. In general I would agree with that. Let's see what :joy thinks as well.
Comment 6•9 years ago
Yes, the two are different. The median is robust to outliers, and therefore averaging over medians is more stable than averaging over the entire data (in both cases we always ought to drop the first 5). Compare median(c(1800,1500,1900,1300,70000)) vs exp(mean(log(c(1800,1500,1900,1300,70000)))): not remarkably different, and the latter takes into account the information contained in 70K. If we believe that outliers are still important to consider, I will point out that the median drops information from a lot of the data, 50% of it (it only takes ranks). If we want to use more data and still protect against outliers, we can use the trimmed/winsorized mean and then take the g.m. of those (as opposed to the g.m. of medians). But as Avi pointed out, unless the new system really improves things (maybe by making it more realistic, since we drop less data), I don't see a pressing need to change it. We'd need to do a comparative study. HTH
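A rough Python equivalent of the R one-liners above, plus the trimmed and winsorized means mentioned as alternatives. The helpers `trimmed_mean` and `winsorized_mean` and the choice of `k=1` (one value handled at each end) are assumptions made for this sketch, not anything talos or perfherder ships.

```python
import math
from statistics import mean, median


def geometric_mean(values):
    """exp(mean(log(x))), same as the R expression in the comment."""
    return math.exp(mean([math.log(v) for v in values]))


def trimmed_mean(values, k=1):
    """Mean after discarding the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    return mean(ordered[k:len(ordered) - k])


def winsorized_mean(values, k=1):
    """Mean after clamping the k smallest/largest values to their neighbours."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    low, high = ordered[k], ordered[-k - 1]
    return mean([min(max(v, low), high) for v in ordered])


data = [1800.0, 1500.0, 1900.0, 1300.0, 70000.0]

print("median:         ", median(data))
print("geometric mean: ", geometric_mean(data))
print("trimmed mean:   ", trimmed_mean(data))
print("winsorized mean:", winsorized_mean(data))
```

The trimmed mean throws the extreme values away, while the winsorized mean keeps the sample size and only clamps them; either could be fed into a geometric mean across pages in place of the per-page median.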
Flags: needinfo?(sguha)
Assignee
Comment 7•9 years ago
Here's a patch that attempts to change perfherder to use the graphserver formula. Note that some tests (e.g. the new tps test) don't have 5 replicates. We must be doing something different there. We'll need to update this patch to replicate that before it lands.
Assignee: nobody → wlachance
Attachment #8635451 -
Flags: feedback?(jmaher)
Reporter
Comment 8•9 years ago
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

Overall this looks great. Let's ensure we don't have issues with startup tests vs pageload tests. Startup tests are tresize, ts_paint, tpaint, sessionrestore, sessionrestore_no_auto, media_tests.

Regarding dropping 5: this is done at the talos level; search for ignore_first in test.py: http://hg.mozilla.org/build/talos/file/0600cd6a8eba/talos/test.py
Sometimes we drop 1, other times we drop 5, and sometimes we keep it all.

How about we do this: if #_replicates > 15, drop 5; else use all. If there are no subtests, then just use the single page and all the replicates (which is currently working just fine).
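The proposed rule could look roughly like the sketch below. The names `usable_replicates` and `summarize` are hypothetical, and the single-page branch assumes "currently working just fine" means perfherder's existing geometric mean over all values (per comment 3); that reading is a guess, not something confirmed in this bug.

```python
import math
from statistics import median


def geometric_mean(values):
    """Geometric mean via the mean of logs (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def usable_replicates(replicates, threshold=15, drop=5):
    """Drop the first `drop` replicates only if there are more than `threshold`."""
    if len(replicates) > threshold:
        return list(replicates[drop:])
    return list(replicates)


def summarize(pages):
    """Summary value for a dict of page name -> list of replicates."""
    if len(pages) == 1:
        # No subtests: keep all replicates of the single page.
        # ASSUMPTION: summarize them with a geometric mean of all values.
        (only_page,) = pages.values()
        return geometric_mean(only_page)
    # Subtests: geometric mean of per-page medians.
    return geometric_mean(
        [median(usable_replicates(reps)) for reps in pages.values()]
    )


print(usable_replicates(list(range(1, 26))))  # 25 replicates: first 5 dropped
print(usable_replicates([10, 11, 12]))        # 3 replicates: all kept
```

Note that this rule never drops just 1 replicate, so tests where talos itself uses a different `ignore_first` value would still need their own handling.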
Attachment #8635451 -
Flags: feedback?(jmaher) → feedback+
Assignee
Comment 9•9 years ago
We decided to calculate the summary in talos instead. Bug 1184966
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Assignee
Comment 10•9 years ago
Actually let's mark this as a dupe.
Resolution: WONTFIX → DUPLICATE
Assignee
Comment 11•9 years ago
OK, I think we do, in fact, want to fix this, especially given that we already have a proper patch. We now only use the summarization code in perfherder for Android, but it's easier to fix this than to update Android talos (which is going away anyway).
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee
Comment 12•9 years ago
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

This will most likely need to be updated for bitrot; fortunately, the patch isn't too complex. This should bring all Android results in sync with Graphserver's (I will of course test that again before landing).
Attachment #8635451 -
Flags: review?(jmaher)
Reporter
Comment 13•9 years ago
Comment on attachment 8635451 [details] [review]
Use graphserver's formula

Android has so few cycles that we would need to craft this carefully.
Attachment #8635451 -
Flags: review?(jmaher) → review-
Assignee
Comment 14•9 years ago
(In reply to Joel Maher (:jmaher) from comment #13)
> Comment on attachment 8635451 [details] [review]
> Use graphserver's formula
>
> android has so few cycles, we would need to craft this carefully.

Did we use a special formula inside Talos specifically for Android? AFAIK we didn't.
Reporter
Comment 15•9 years ago
There is no special formula for Android talos; we just didn't have enough replicates. I think we defaulted to dropping 1 replicate and taking the median of the rest.
Reporter
Comment 16•9 years ago
do we need to keep this open?
Assignee
Comment 17•9 years ago
yeah, let's just leave the old android stuff alone if we can.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → WONTFIX