Closed
Bug 1193385
Opened 9 years ago
Closed 9 years ago
Test summarization for subtests gives incorrect results for tresize (and possibly other tests)
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wlach, Assigned: jmaher)
References
Details
Attachments
(2 files, 2 obsolete files)
1.81 KB,
patch
|
Details | Diff | Splinter Review | |
4.73 KB,
patch
|
parkouss
:
review+
|
Details | Diff | Splinter Review |
See here for example:
https://treeherder.mozilla.org/logviewer.html#?job_id=12743279&repo=mozilla-inbound
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1439303977/mozilla-inbound_ubuntu64_hw_test-chromez-bm103-tests1-linux-build3881.txt.gz
Despite having these replicates: 17.76164999999998, 20.503433333333373, 21.181566666666647, 18.56949999999996, 16.985500000000034, 17.831733333333343, 21.00719999999998, 18.389749999999985, 18.190666666666623, 17.328116666666666, 17.031466666666667, 19.077733333333324, 17.125199999999996, 16.616666666666674, 16.775350000000003, 20.67286666666666, 18.896600000000017, 18.486100000000025, 19.525000000000016, 19.44226666666668
We have these results for min/max/std/med/mean: {"std": 0.0, "min": 19.44226666666668, "max": 19.44226666666668, "median": 19.44226666666668, "filtered": 19.44226666666668, "mean": 19.44226666666668}
Something is clearly off
Comment 1•9 years ago
|
||
From Joel on IRC: this bug probably applies to all startup tests.
Reporter | ||
Updated•9 years ago
|
Summary: Test summarization for subtests for tresize (at least) → Test summarization for subtests gives incorrect results for tresize (and possibly other tests)
Assignee | ||
Comment 2•9 years ago
|
||
this is really an issue of --cycles >1, but we only do that for startup_tests, and only have a problem (I believe so) with tresize,ts_paint,sessionrestore*. These tests all have a pagename NULL.
This is a hack, I would like to have the tests spit out a common format and we can just simplify our parsing overall.
This is where I am, maybe some other eyes can help shed light on a more elegant solution.
Comment 3•9 years ago
|
||
Comment on attachment 8648066 [details] [diff] [review]
when we have --cycles >1, roll subtests together (1.0)
Review of attachment 8648066 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good to me, some minor style preferences that I find more intuitive.
This seems to work, I have an output like that with tresize and cycles==2:
{"std": 0.19869552666664703, "min": 12.417350793333359, "max": 12.814741846666653, "median": 12.616046320000006, "filtered": 12.616046320000006, "mean": 12.616046320000006}
Still we should try later to rework the results/output (maybe after graphserver end of life) to make summarization (and in a more general way result construction and printing) code looks better.
::: talos/output.py
@@ +10,5 @@
> import logging
> import post_file
> import time
> import utils
> +import results as TalosResults
Note that you can now use talos package. :) I think it would be good to import it with:
from talos.results import Results
since you only use that from the results module.
@@ +494,5 @@
> test_result['testrun']['date'] = self.results.date
>
> # serialize test results
> results = {}
> + tsresults = []
It seems to me that you can assign that to None, and just call it tsresult since later you only append if the list is not empty (one element max).
@@ +499,5 @@
> summary = {"suite": 0, "subtests": {}}
> if not test.using_xperf:
> vals = []
> +
> + #TODO: counters!!!! we don't have any, but they suffer the same
nit: space after "#TODO" - e.g., "# TODO" (there is another one after).
@@ +509,5 @@
> + results.setdefault(test.name(), []).extend(val)
> + if not tsresults:
> + r = TalosResults.Results()
> + r.results = [{'index': 0, 'page': test.name(), 'runs': val}]
> + tsresults.append(r)
only define tsresult if it is None here
@@ +512,5 @@
> + r.results = [{'index': 0, 'page': test.name(), 'runs': val}]
> + tsresults.append(r)
> + else:
> + #TODO: do we need such indirection
> + for res in tsresults:
same thing, only test for tsresult to be not None.
@@ +513,5 @@
> + tsresults.append(r)
> + else:
> + #TODO: do we need such indirection
> + for res in tsresults:
> + for r in res.results:
I think you can just use tsresult.results[0] since you add only one value above
@@ +521,5 @@
> + results.setdefault(page, []).extend(val)
> +
> + tresults = test.results
> + if tsresults:
> + tresults = tsresults
you can write a one line if you like, e.g.:
tresults = [tsresult] if tresult else tests.results
Attachment #8648066 -
Flags: feedback?(j.parkouss) → feedback+
Assignee | ||
Comment 4•9 years ago
|
||
hmm, we seem to have circular dependencies with:
from [talos.]results import Results
in results, we have:
from talos import output
and we use output.filters.
could just move the filters to results, have to think on this a bit.
Assignee | ||
Comment 5•9 years ago
|
||
updated patch with review feedback. Including in this patch is the actual ability to use the new fancy v8 stuff since the filter was never defined.
This should cleanup all our existing needs for talos summarization from the harness.
Attachment #8648066 -
Attachment is obsolete: true
Attachment #8648470 -
Flags: review?(j.parkouss)
Comment 6•9 years ago
|
||
Comment on attachment 8648470 [details] [diff] [review]
when we have --cycles >1, roll subtests together (1.1)
Review of attachment 8648470 [details] [diff] [review]:
-----------------------------------------------------------------
Unfortunately, it fails:
talos --develop -a tresize -e /usr/bin/firefox --cycles 2
...
Traceback (most recent call last):
File "/home/jp/dev/talos/venv/bin/talos", line 9, in <module>
load_entry_point('talos==0.0', 'console_scripts', 'talos')()
File "/home/jp/dev/talos/talos/run_tests.py", line 284, in main
sys.exit(run_tests(config, browser_config))
File "/home/jp/dev/talos/talos/run_tests.py", line 267, in run_tests
talos_results.output(results_urls)
File "/home/jp/dev/talos/talos/results.py", line 69, in output
results = _output()
File "/home/jp/dev/talos/talos/output.py", line 523, in __call__
result.values(test_result['testrun']['suite'],
AttributeError: 'list' object has no attribute 'values'
Also, I tried to fix the patch quickly (see the attachment), this seems to work for tresize (same with the previous patch) but with v8_7:
talos --develop -a v8_7 -e /usr/bin/firefox --cycles 3
...
{"std": 0.0, "min": 795.5449482895783, "max": 795.5449482895783, "median": 795.5449482895783, "value": 930.1661730000001, "filtered": 930.1661730000001, "mean": 795.5449482895783}
that seems not right (not sure if this patch is intended to address that or not though)
Attachment #8648470 -
Flags: review?(j.parkouss) → review-
Comment 7•9 years ago
|
||
Assignee | ||
Comment 8•9 years ago
|
||
thanks for checking this out- I think I got hurried. I have attached a patch that is the original + the fix. I have retriggered on try all the jobs and looked at the summary section of the talosdata.
Attachment #8648470 -
Attachment is obsolete: true
Attachment #8648663 -
Flags: review?(j.parkouss)
Comment 9•9 years ago
|
||
Comment on attachment 8648663 [details] [diff] [review]
when we have --cycles >1, roll subtests together (1.2)
Review of attachment 8648663 [details] [diff] [review]:
-----------------------------------------------------------------
Yup, np. :)
Looks good to me (not tested though), just one small nit.
::: talos/output.py
@@ +491,5 @@
> test_result['testrun']['date'] = self.results.date
>
> # serialize test results
> results = {}
> + tsresults = None
Nit: I would name this tsresult (without trailing 's') since this hold only one test result instance. This will make more sense for the one line construction:
tresults = [tsresult] if tsresult else test.results
Attachment #8648663 -
Flags: review?(j.parkouss) → review+
Assignee | ||
Comment 10•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 11•9 years ago
|
||
Did some verification, looks good for tresize and tspaint at least. Compare these two revisions, note how min/max didn't work before (were the same as mean), and do now:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,726a741fd1dd1e9fb0b85bc9733658f6eabd07df,1]&highlightedRevisions=f106756af2a4&highlightedRevisions=f195230a7c5d
Comment 12•9 years ago
|
||
(In reply to William Lachance (:wlach) from comment #11)
> Did some verification, looks good for tresize and tspaint at least. Compare
> these two revisions, note how min/max didn't work before (were the same as
> mean), and do now:
>
> https://treeherder.mozilla.org/perf.html#/
> graphs?timerange=604800&series=[mozilla-inbound,
> 726a741fd1dd1e9fb0b85bc9733658f6eabd07df,
> 1]&highlightedRevisions=f106756af2a4&highlightedRevisions=f195230a7c5d
Would this be a correct description of the issue, taking your ts_paint example? :
The TALOSDATA JSON blob (which treeherder uses as the main input from the raw log) had the correct (cycle) replicates both before and after the change for ts_paint, but the summary at the same JSOM blob for ts_paint used only the last replicate value for min/max/mean/etc ?
(looking at the raw logs, before: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1439391127/mozilla-inbound_ubuntu64_hw_test-other_l64-bm103-tests1-linux-build3821.txt.gz and after: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1439923834/mozilla-inbound_ubuntu64_hw_test-other_l64-bm104-tests1-linux-build1955.txt.gz )
Question: did it also affect graphserver? or are there a different talos code paths for cycle replicate summarization between outputs designated to graphserver and treeherder?
Assignee | ||
Comment 13•9 years ago
|
||
that is a correct statement of using only the last cycle. This did not affect graphserver at all as this is just the data inside the talosdata json blob.
Comment 14•9 years ago
|
||
Thanks. So, according to 'test.py', the following tests were affected since they have more than one cycle:
- ts_paint (20)
- sessionrestore (10)
- tresize (20)
- media_test (5)
- tps (5)
tcanvasmark has 5 "tpcycles" rather than "cycles" (not to be confused with tppagecycles). Was it affected (and fixed) as well?
Assignee | ||
Comment 15•9 years ago
|
||
When I looked at the data tcanvasmark was not affected, keep in mind sessionrestore_no_auto_restore would be affected, and we don't run media_tests anymore.
You need to log in
before you can comment on or make changes to this bug.
Description
•