Closed Bug 755527 Opened 12 years ago Closed 12 years ago

results processing should be more coherent end-to-end

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Unassigned)

References

Details

Attachments

(1 file, 9 obsolete files)

Our results chain now is somewhat Kafka-esque:

- ttest.py gets a reduced results string from BrowserLogResults:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/ttest.py#l369

- various pieces from this are returned to run_tests.py:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l561

- we collect these pieces and send them as a dict amongst other
  things[*] to send_to_graph:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l585

send_to_graph does a lot of things.  It:

- formats and filters the results as appropriate to ts, amo, or tp
  type tests:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l174

- generates raw results, which are a hand-constructed JSON string (see
  also https://bugzilla.mozilla.org/show_bug.cgi?id=744405 and the
  sketch after this list), e.g.
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l183

- uploads the raw results to a hard-coded IP:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l246

- uploads the formatted results to the old graphserver and harvests
  links:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l253

- we do some other things too, e.g.:
  http://hg.mozilla.org/build/talos/file/d00eba6055f5/talos/run_tests.py#l586
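
On the raw-results point above: hand-constructing JSON is fragile,
which is the point of bug 744405.  A minimal sketch with made-up data
(not the actual run_tests.py code):

import json

results = {'test': 'ts', 'values': [512.0, 498.5, 501.2]}

# fragile: breaks as soon as a value needs escaping or the shape changes
raw = '{"test": "' + results['test'] + '", "values": [' + \
      ', '.join(str(v) for v in results['values']) + ']}'

# robust: let the library serialize
raw = json.dumps(results)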

I would prefer to move more results logic to results.py and the
classes that handle it. Starting with the (questionable) assumption
that you want results to be constructed in ttest.py:

- what is now BrowserLogResults is made into a factory to create
  PageloaderResults or a new class, e.g. TsResults.  These are
  accumulated in an array in ttest.py and returned as they are currently

- (a new class may also be needed for AMOResults)

- (We will also need to do something about counter results, perhaps
  beyond the scope of this bug)

- PageloaderResults and TsResults (and AMOResults) will have methods
  to:
  * generate raw results (``def raw_results``)
  * generate results in graphserver format (``def graphserver_results``)

- send_to_graph will then just loop over the results list, call
  result.graphserver_results(), and post that to the graphserver
  (sketched after this list)

- likewise for raw results
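
To make this concrete, a rough sketch of the proposed shape.  The
class names and the two output methods come from the bullets above;
the parsing details, format keys, and stand-in transport are
assumptions rather than real talos code:

class TestResults(object):
    """base class for the results of a single test"""
    def __init__(self, raw_values):
        self.raw_values = raw_values

    def raw_results(self):
        """return results as a JSON-serializable structure"""
        raise NotImplementedError

    def graphserver_results(self):
        """return results as a graphserver-format string"""
        raise NotImplementedError

class TsResults(TestResults):
    def raw_results(self):
        return {'values': self.raw_values}
    def graphserver_results(self):
        return '\n'.join(str(value) for value in self.raw_values)

class PageloaderResults(TestResults):
    def raw_results(self):
        return {'values': self.raw_values}
    def graphserver_results(self):
        # one "index,value" line per page
        return '\n'.join('%d,%s' % pair
                         for pair in enumerate(self.raw_values))

def browser_log_results(log_text, log_format):
    """factory: choose a results class from the browser log format
    (the format keys here are illustrative)"""
    classes = {'ts': TsResults, 'pageloader': PageloaderResults}
    values = [float(line) for line in log_text.splitlines() if line.strip()]
    return classes[log_format](values)

def send_to_graph(results_urls, all_results):
    """post each result, serialized per class, to every graphserver URL"""
    for result in all_results:
        payload = result.graphserver_results()
        for url in results_urls:
            print('POST %s <- %d bytes' % (url, len(payload)))  # transport elided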

This should make it easier to add multiple results URLs, to test
results, and otherwise make results more extensible.

As a huge bonus, but probably beyond the scope of the bug here, it
would be nice to make results output pluggable.  Doing these refactors
certainly won't hurt this.

See also:

- bug 745907
- bug 744403

----

[*] browser_config is a beast in its own right and is the subject of
bug 755374

:jmaher, I'd love to know your thoughts on this.  I'm being somewhat hand-wavy here; as long as the description is, it is mostly a napkin sketch.
love it.

I would like to see when we finish a test that we call an output parser and it generates a results object.  There are a couple scenarios here:
1) running >1 test
2) aux data (shutdown, counters, responsiveness, xperf, etc...)

If you think about the above two scenarios, the results object should be related to the test.  Maybe it should be test.results = results().  Ok, that would be too simple. 
 
anyway, when running multiple tests, we have an array of test objects; these test objects can store their respective results, and send_to_graph can just take a list of test objects.  Then we have the test name, options, and the results all in one place.

How do you take data from various sources and create a uniform object?  I believe we would add results.addStdout(...), results.addCounterData(...), results.addXperfData(...), results.addResponsivenessData(...).  Basically we could add data to the results object and when we serialize it as raw or graph server data it should know all about it.
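
A minimal sketch of what that uniform object could look like; the
add* method names come straight from the paragraph above, while the
storage layout is assumed:

class Results(object):
    """uniform container; data flows in via add* methods"""
    def __init__(self):
        self.values = []
        self.counters = {}
        self.xperf = []
        self.responsiveness = []

    def addStdout(self, log_text):
        """parse browser stdout/log into raw values"""
        self.values.extend(float(line) for line in log_text.splitlines()
                           if line.strip())

    def addCounterData(self, name, samples):
        self.counters.setdefault(name, []).extend(samples)

    def addXperfData(self, records):
        self.xperf.extend(records)

    def addResponsivenessData(self, samples):
        self.responsiveness.extend(samples)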

The original proposal sounds pretty logical.  Hopefully my comment will help shape some of the lower level details.
This is also somewhat addressed in https://bugzilla.mozilla.org/show_bug.cgi?id=744409
See Also: → 744409
(In reply to Joel Maher (:jmaher) from comment #2)
> love it.
> 
> I would like to see when we finish a test that we call an output parser and
> it generates a results object.  There are a couple scenarios here:
> 1) running >1 test

So it may make sense to have a Results class which contains TestResults objects, the latter being tied to an individual test.  I think diving in is the only way to really figure this out, but this is my inclination so far.  Otherwise, if it is not cumbersome, we can have a list of results objects; but if there is a bunch of work we need to do across objects, then a container object makes more sense.

> 2) aux data (shutdown, counters, responsiveness, xperf, etc...)

So, IIRC, counters and xperf (not sure about responsiveness) are not test-specific and are, instead, time-specific (though maybe I'm wrong here? or there is a little of both?).  If this is the case, a master Results container class probably makes sense.

> If you think about the above two scenarios, the results object should be
> related to the test.  Maybe it should be test.results = results().  Ok, that
> would be too simple. 

In both scenarios, the e.g. PageloaderResults class should have the test dictionary available.

> anyway, when running multiple tests, we have an array of test objects, these
> test objects can store their respective results and send_to_graph can just
> take a list of test objects.  Then we have the test name, options and the
> results all in one place.
> 
> How do you take data from various sources and create a uniform object?  I
> believe we would add results.addStdout(...), results.addCounterData(...),
> results.addXperfData(...), results.addResponsivenessData(...).  Basically we
> could add data to the results object and when we serialize it as raw or
> graph server data it should know all about it.

I'm not sure what you mean by .addStdout()... if you mean the browser log, my plan is something like this: individual (e.g.) PageloaderResults instances come from parsing the browser log, and the parsed pieces are passed to PageloaderResults.__init__, which then makes the instance.  The other methods definitely sound like something we will need.

We will also need (for now) two output methods:  

def graphserver_output(self):
    """returns a string of the PageloaderResults in graphserver format"""

def raw_output(self):
    """returns a string of the PageloaderResults in raw (datazilla) format"""

(I'm using PageloaderResults as a shim example since this class already exists.  Obviously TsResults, AMOResults, or whatever will have these methods too)

> The original proposal sounds pretty logical.  Hopefully my comment will help
> shape some of the lower level details.

Coolz.  I can't look at this this week or next, but it will be the next large straw I'm going to chew on unless something comes up.
counters and stuff are not strictly time-based: pageloader RSS collection is done per page, not over a generic time interval.  Likewise with xperf information.

the addStdout was the PageloaderResults stuff.
Blocks: 722915
Attached patch WIP (obsolete) — Splinter Review
This has taken much longer than I had hoped and it's still not finished.  In short:

- I've moved most of the results logic (hopefully more by the time of review) out to results.py

- made --results_url and --raw_results_url lists instead of single items

- added --cycles so that I can test without going insane

- killed a bunch of silliness
I've also noticed a bug in how we accumulate ts results:

http://hg.mozilla.org/build/talos/file/170c100911b6/talos/run_tests.py#l185

we overwrite raw_values per browser dump (of which, for ts, there are usually 20).  Instead, we should append and deindent here.  I will fix this.
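
Roughly, with an illustrative stand-in parser and data (not the
actual run_tests.py code):

def parse_values(dump):
    """stand-in parser: one float per line of a browser dump"""
    return [float(line) for line in dump.splitlines() if line.strip()]

browser_dumps = ['512.0\n498.5', '503.1\n500.0']  # e.g. ~20 of these for ts

# buggy: each dump clobbers the previous one's values
for dump in browser_dumps:
    raw_values = parse_values(dump)

# fixed: accumulate across dumps
raw_values = []
for dump in browser_dumps:
    raw_values.extend(parse_values(dump))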
(In reply to Jeff Hammel [:jhammel] from comment #9)
> I've also noticed a bug in how we accumulate ts results:
> 
> http://hg.mozilla.org/build/talos/file/170c100911b6/talos/run_tests.py#l185
> 
> we overwrite raw_values per browser dump (of which, for ts, there are
> usually 20).  Instead, we should append and deindent here.  I will fix this.

This would also affect tp (see further in the file), though since we run tp for only one cycle we manage to squeak by.
Attached patch proposed implementation (obsolete) — Splinter Review
This patch restructures how results reporting is done.  It should be backwards compatible.  The new architecture is as follows:

- a TestResult object is instantiated per test
- for each cycle, BrowserLogResults reads the browser log and returns a TsResults or PageloaderResults object, which is added to the TestResult object
- each TestResult is added to the TalosResults object at the end of the test
- calling TalosResults.output() with appropriate results URLs (and raw results URLs) gathers the appropriate output from each TestResult and sends it to the corresponding URL
- (I also added --cycle for ease of testing)
- (this architecture also made it easy to implement multiple --results_urls and --raw_results_urls as talked about in other bugs)

In short, it is far from perfect, but I think it is a good step forward
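
Very roughly, the flow this enables; the class and method names are
from the description above, and everything inside them is assumed:

class TalosResults(object):
    """container for TestResult objects"""
    def __init__(self):
        self.results = []

    def add(self, test_result):
        self.results.append(test_result)

    def output(self, results_urls, raw_results_urls):
        for test in self.results:
            for url in results_urls:
                self.post(url, test.graphserver_results())
            for url in raw_results_urls:
                self.post(url, test.raw_results())

    def post(self, url, data):
        """write file:// URLs locally; HTTP POST elided"""
        if url.startswith('file://'):
            with open(url[len('file://'):], 'w') as f:
                f.write(str(data))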
Attachment #631941 - Attachment is obsolete: true
Attachment #632455 - Flags: review?(jmaher)
Attached patch proposed implementation (obsolete) — Splinter Review
Hadn't realized I had forgotten to save before uploading
Attachment #632455 - Attachment is obsolete: true
Attachment #632455 - Flags: review?(jmaher)
Attachment #632463 - Flags: review?(jmaher)
Attached patch take 2 (obsolete) — Splinter Review
fixed some variable names and extended __all__
Attachment #632463 - Attachment is obsolete: true
Attachment #632463 - Flags: review?(jmaher)
Attachment #632506 - Flags: review?(jmaher)
Note that, because of the way raw_results_urls works for file:// URLs, the file will be overwritten if you have more than one activeTest.  This is a bug, but IMHO not one worth delaying the patch for, as the feature is still useful in the case where you have only one test.

Possible ways to fix this:

- combine the data for multiple tests into one JSON string.  This is not what datazilla expects currently, and I'm guessing there are ramifications (as in, the bad kind) to this.  This is also, IMHO, better taken up after bug 744405, as trying to hand-serialize JSON is an exercise fraught with errors

- provide a way of serializing output with a different filename (or, conceivably, URL) per test.  This will require a bit of thought.  For instance, if you had --output-directory, this could be the equivalent, for -a ts:tsvg, of writing

ts.txt, tsvg.txt # graphserver output
ts.json, tsvg.json # datazilla output

A way I don't want to fix this (but will if really needed for the short term):

- do what --results_url does and just append to the file instead of overwriting.  The reason I don't want to do this is that the resulting structure will not be JSON.  We could take this as an interim fix, but IMO it is replacing one bug with another
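
For the record, a sketch of the per-test-filename option above;
--output-directory is hypothetical, as are test.name and the output
methods (which follow the earlier sketches):

import json
import os

def write_results(output_directory, tests):
    """one pair of files per test, so nothing gets clobbered"""
    for test in tests:
        base = os.path.join(output_directory, test.name)
        with open(base + '.txt', 'w') as f:    # graphserver output
            f.write(test.graphserver_results())
        with open(base + '.json', 'w') as f:   # datazilla output
            json.dump(test.raw_results(), f)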
Attached patch take 3 (obsolete) — Splinter Review
Fixed a few more bugs and fixed the test.  I removed the test for per-test filters as it's pretty hacky and was mostly for my own verification that things worked.  Longer term, we should reintroduce the test when there is less stubbing
Attachment #632506 - Attachment is obsolete: true
Attachment #632506 - Flags: review?(jmaher)
Attachment #632770 - Flags: review?(jmaher)
Blocks: 721097
pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=724f7c61d184

Getting some graphserver unreachable errors... not really sure what is going on there, but I will look as best I can from this vantage point
Also a few of these:

RETURN: cycle time: 00:13:50<br>
talos-r3-leopard-030: 
		Stopped Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r: 
		Started Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r: 
		Stopped Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_pbytes_paint: 
		Started Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_pbytes_paint: 
		Stopped Wed, 13 Jun 2012 14:09:27
No results collected for: tp5r_main_rss: 
		Error Wed, 13 Jun 2012 14:09:27
No results collected for: tp5r_content_rss: 
		Error Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_responsiveness_paint: 
		Started Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_responsiveness_paint: 
		Stopped Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_shutdown_paint: 
		Started Wed, 13 Jun 2012 14:09:27
Generating results file: tp5r_shutdown_paint: 
		Stopped Wed, 13 Jun 2012 14:09:27

command timed out: 3600 seconds without output, attempting to kill

I'll add some more diagnostics.  Overall, this is somewhat difficult to diagnose from this end
(In reply to Jeff Hammel [:jhammel] from comment #16)
> pushed to try:https://tbpl.mozilla.org/?tree=Try&rev=724f7c61d184
> 
> Getting some graphserver unreachable errors....not really sure what is going
> on there, but will look as best i can from this vantage point

The possible interesting bits:

Error in collecting counter: Private Bytes
Error in collecting counter: Main_RSS
Error in collecting counter: Content_RSS
Completed test tp5row: 
		Stopped Wed, 13 Jun 2012 14:37:49
RETURN: cycle time: 00:26:54<br>
talos-r3-leopard-021: 
		Stopped Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row: 
		Started Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row: 
		Stopped Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_pbytes_paint: 
		Started Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_pbytes_paint: 
		Stopped Wed, 13 Jun 2012 14:37:49
No results collected for: tp5row_main_rss: 
		Error Wed, 13 Jun 2012 14:37:49
No results collected for: tp5row_content_rss: 
		Error Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_responsiveness_paint: 
		Started Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_responsiveness_paint: 
		Stopped Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_shutdown_paint: 
		Started Wed, 13 Jun 2012 14:37:49
Generating results file: tp5row_shutdown_paint: 
		Stopped Wed, 13 Jun 2012 14:37:49
WARNING: graph server unreachable
WARNING: (54, 'Connection reset by peer')
WARNING: graph server unreachable
WARNING: (61, 'Connection refused')
WARNING: graph server unreachable
WARNING: (54, 'Connection reset by peer')
WARNING: graph server unreachable
WARNING: (61, 'Connection refused')
WARNING: graph server unreachable
WARNING: (61, 'Connection refused')
Failed sending results: 
		Stopped Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row: 
		Started Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row: 
		Stopped Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_pbytes_paint: 
		Started Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_pbytes_paint: 
		Stopped Wed, 13 Jun 2012 15:11:26
No results collected for: tp5row_main_rss: 
		Error Wed, 13 Jun 2012 15:11:26
No results collected for: tp5row_content_rss: 
		Error Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_responsiveness_paint: 
		Started Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_responsiveness_paint: 
		Stopped Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_shutdown_paint: 
		Started Wed, 13 Jun 2012 15:11:26
Generating results file: tp5row_shutdown_paint: 
		Stopped Wed, 13 Jun 2012 15:11:26

FAIL: Graph server unreachable (5 attempts)
RETURN:(61, 'Connection refused')
Traceback (most recent call last):
  File "run_tests.py", line 350, in <module>
    main()
  File "run_tests.py", line 347, in main
    run_tests(parser.config)
  File "run_tests.py", line 323, in run_tests
    talos_results.output(results_urls=results_urls, raw_results_urls=raw_results_urls)
  File "/Users/cltbld/talos-slave/talos-data/talos/results.py", line 135, in output
    raise e
utils.talosError: "Graph server unreachable (5 attempts)\n(61, 'Connection refused')"
program finished with exit code 1
elapsedTime=3632.542562
TinderboxPrint:<a href = "http://hg.mozilla.org/try/rev/724f7c61d184">rev:724f7c61d184</a>

TinderboxPrint: cycle time: 00:26:54<br>

TinderboxPrint:FAIL: Graph server unreachable (5 attempts)

TinderboxPrint:(61, 'Connection refused')
Comment on attachment 632770 [details] [diff] [review]
take 3

Review of attachment 632770 [details] [diff] [review]:
-----------------------------------------------------------------

I really got lost in results.py.  If you want an accurate review of that, give me a couple days or meet with me to walk through it.  It seems a bit overkill and I don't understand why there is so much code there to just collect and report results.

::: talos/results.py
@@ +108,5 @@
> +                    raw_results = test.datazilla_results(date=self.date, remote=self.remote, machine=self.title, browser_config=self.browser_config)
> +
> +                    for raw_results_url in raw_results_urls:
> +                        # post the data
> +                        # TODO: json encode

do we need to json encode?

::: talos/ttest.py
@@ +370,5 @@
> +                # add the results from the browser output
> +                # browser_log_results = results.BrowserLogResults(filename=browser_log_filename, counter_results=counter_results, global_counters=global_counters)
> +                # browser_results = browser_log_results.browser_results
> +                # format = browser_log_results.format
> +                test_results.add(browser_log_filename, counter_results=counter_results)

please remove the commented out code.

@@ +388,1 @@
>  

please remove commented out code.

@@ +394,5 @@
> +            test_results.all_counter_results.extend([{key: value} for key, value in global_counters.items()])
> +
> +            # return results
> +            return test_results
> +#            return (all_browser_results, all_counter_results, format)

and more commented code to remove.
Attachment #632770 - Flags: review?(jmaher) → review-
As per a conversation about this, I am fine with results.py as it is, but I would strongly prefer that we split datazilla and graph server output into separate classes.
Attached patch move output to its own file (obsolete) — Splinter Review
This moves all the output-related stuff to output.py and makes it more pluggable.
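
In sketch form: the class names match the patch description, but the
serialization bodies are assumptions, not the actual output.py code:

import json

class Output(object):
    """base class: one subclass per results destination"""
    def __init__(self, results):
        self.results = results  # a TalosResults instance

    def __call__(self):
        """serialize self.results for this destination"""
        raise NotImplementedError

class GraphserverOutput(Output):
    def __call__(self):
        # graphserver takes a plain-text line format
        return '\n'.join(str(value) for value in self.results.values)

class DatazillaOutput(Output):
    def __call__(self):
        # datazilla takes JSON
        return json.dumps({'values': self.results.values})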
Attachment #632770 - Attachment is obsolete: true
Attachment #634182 - Flags: review?(jmaher)
Attached patch correct variable name (obsolete) — Splinter Review
Attachment #634182 - Attachment is obsolete: true
Attachment #634182 - Flags: review?(jmaher)
Attachment #634228 - Flags: review?(jmaher)
Attachment #634228 - Attachment is obsolete: true
Attachment #634228 - Flags: review?(jmaher)
Attachment #634229 - Flags: review?(jmaher)
Attached patch a few more fixes (obsolete) — Splinter Review
Attachment #634229 - Attachment is obsolete: true
Attachment #634229 - Flags: review?(jmaher)
Attachment #634289 - Flags: review?(jmaher)
Comment on attachment 634229 [details] [diff] [review]
i didn't actually mean to delete the README

Review of attachment 634229 [details] [diff] [review]:
-----------------------------------------------------------------

since there are minor updates coming, I am going to stop my review and just send these comments along.

::: talos/output.py
@@ +180,5 @@
> +        """returns if the test is a responsiveness test"""
> +        # XXX currently this just looks for the string
> +        # 'responsiveness' in the test name.
> +        # It would be nice to be more declarative about this
> +        return 'responsiveness' in testname

should this be in the graphserver stuff or in a general place?

@@ +198,5 @@
> +        if responsiveness:
> +            _type = 'AVERAGE'
> +        elif self.results.amo:
> +            _type = 'AMO'
> +            info_format = self.amo_info_format

can this elif clause go in the amo class below?

@@ +277,5 @@
> +        for line in lines:
> +            if not line:
> +                continue
> +            if line.lower() in ('success',):
> +                print 'RETURN:addon results inserted successfully'

can this go in the amoresults class below?

@@ +416,5 @@
> +try:
> +    from amo.amo_api import upload_amo_results, amo_results_data
> +    # depends on httplib2 and json/simplejson so we conditionally import it
> +
> +    class AMOOutput(Output):

can this be a subclass of graphserver so we can just overload the functionality that changes for amo?
> > +        """returns if the test is a responsiveness test"""
> > +        # XXX currently this just looks for the string
> > +        # 'responsiveness' in the test name.
> > +        # It would be nice to be more declarative about this
> > +        return 'responsiveness' in testname
> 
> should this be in the graphserver stuff or in a general place?

For this patch, I don't really know.  It can go on the generic output class... I put it on the graphserver output class as it is only used there.  Really, it's a horrible hack.  In the long, long term, I would rather have test objects that actually know whether they are responsiveness tests vs. some introspection of the name (vs. the current test_config dictionary), but that's not important now
(In reply to Joel Maher (:jmaher) from comment #25)
> Comment on attachment 634229 [details] [diff] [review]
<snip/>
> can this go in the amoresults class below?
> 
> @@ +416,5 @@
> > +try:
> > +    from amo.amo_api import upload_amo_results, amo_results_data
> > +    # depends on httplib2 and json/simplejson so we conditionally import it
> > +
> > +    class AMOOutput(Output):
> 
> can this be a subclass of graphserver so we can just overload the
> functionality that changes for amo?

With regard to the amo issues, it is a bit complicated as when amo is set we do two things currently:

1. we upload our results to whatever graphserver URLs we have (i.e. results_urls) but in a different format: https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO

2. we also upload results via simplejson encoding in a different format as given in amo.amo_api (formerly to a hardcoded URL, though I have un-hard-coded it here)

Since 'amo' is a global flag, and since the issues you point to here are in the graphserver code, I'm not sure what is best here if we want to preserve existing functionality.  (responsiveness is similarly handled, fwiw).  An alternative would be to have a URL class (a new format in output.py), say 'amo_graphserver_url', and a new class AMOGraphserverOutput that inherits from GraphserverOutput, and have config['results_urls'] changed to config['amo_graphserver_urls'] in run_tests.py.  This will still leave responsiveness special-cased.  If this is clearer, I can do it this way.

I also don't mind deprecating amo uploading to graphserver since it's not being used.  If in the future we want to upload to datazilla, this is also not covered by existing output semantics.
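
Building on the hypothetical output.py sketch above, that alternative
would look something like this (the config key and the body are
assumptions):

class AMOGraphserverOutput(GraphserverOutput):
    """same transport as graphserver, different format and config key
    (format: https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO)"""
    results_url_key = 'amo_graphserver_urls'  # hypothetical config key

    def __call__(self):
        # AMO-format serialization would replace the parent's
        # line format here
        raise NotImplementedError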
Attached patch more fixesSplinter Review
Running on try.  Will flag for review on green
Attachment #634289 - Attachment is obsolete: true
Attachment #634289 - Flags: review?(jmaher)
Comment on attachment 634467 [details] [diff] [review]
more fixes

pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=a9113137937a

It's green enough that I'm going to put this up for review and hope nothing else falls out
Attachment #634467 - Flags: review?(jmaher)
This patch also doesn't get autolog results for xperf reporting:

http://hg.mozilla.org/build/talos/file/170c100911b6/talos/xtalos/mozautolog.py

It would be nice to bring that under this umbrella, but it will be non-trivial
Comment on attachment 634467 [details] [diff] [review]
more fixes

Review of attachment 634467 [details] [diff] [review]:
-----------------------------------------------------------------

as I have sort of reviewed this 3 times before, there is not much to add.  It actually is starting to make sense :)

::: talos/output.py
@@ +86,5 @@
> +        return bool([i for i in memory_metric if i in resultName])
> +
> +    @classmethod
> +    def responsiveness_Metric(cls, val_list):
> +        return round(sum([int(x)*int(x) / 1000000.0 for x in val_list]))

someday this should go in the filters :)

::: talos/run_tests.py
@@ +269,5 @@
> +
> +  # results links
> +  outputs = ['results_urls', 'datazilla_urls']
> +  results_urls = dict([(key, config[key]) for key in outputs
> +                       if key in config])

I like this section!
Attachment #634467 - Flags: review?(jmaher) → review+
pushed: http://hg.mozilla.org/build/talos/rev/45b903369dca

> > +    def responsiveness_Metric(cls, val_list):
> > +        return round(sum([int(x)*int(x) / 1000000.0 for x in val_list]))
> 
> someday this should go in the filters :)

TBH, I'm not even sure what this means o_O
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Jeff Hammel [:jhammel] from comment #32)
> pushed: http://hg.mozilla.org/build/talos/rev/45b903369dca
> 
> > > +    def responsiveness_Metric(cls, val_list):
> > > +        return round(sum([int(x)*int(x) / 1000000.0 for x in val_list]))
> > 
> > someday this should go in the filters :)

Ticketed : bug 766598

> TBH, i'm not even sure what this means o_O