Closed Bug 774419

New package: MozTest (or similar) for Mozbase

Categories: Testing :: Mozbase (enhancement)
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED

People

(Reporter: k0scist, Assigned: mihneadb)

Attachments

(2 files, 9 obsolete files)

738 bytes, text/plain
14.04 KB, patch (k0scist: review+, jgriffin: review+)
See bug 773394 for the origin of this discussion and https://groups.google.com/forum/#!topic/mozilla.tools/FJYN4wg4erk for the tools group discussion.

It would be nice to have a Mozbase package that would consolidate test results.  This includes the xunit writing code from bug 773394, but could include autolog reporting, datazilla reporting, etc.

We should spec out an API and figure out what goes there and file bugs for existing software to be brought under this banner.
(In reply to Jeff Hammel [:jhammel] from comment #0)

> We should spec out an API and figure out what goes there and file bugs for
> existing software to be brought under this banner.

Datazilla_client is one: https://github.com/mozilla/datazilla_client ; or the package could use this as a dependency, either in the interim or in the long run.
As long as we're in general agreement on the code, I'm more than happy to hand off the spec'ing to whoever.
I agree with this package. I just want a clear definition of "test" vs "result" and how "result" ties into "log" (continuing discussion from the other bug).

Is there a case where:
A) We would use the test class but not the result class (can't think of one)
B) We would use the result class but not the test class (possibly, but still seems edge case-y)

Maybe a separate results.py is good enough (as opposed to a separate module)?

How would a results class affect existing harnesses that use mozbase? Is the results class now responsible for logging pass/fails or does it simply return an object to the harness? I'd imagine the latter, but then we need to keep in mind that several harnesses depend on receiving a continuous stream of pass/fail data for buildbot.

I'm just a little confused at how it all will fit together at the moment.
(In reply to Andrew Halberstadt [:ahal] from comment #3)
> I agree with this package. I just want a clear definition of "test" vs
> "result" and how "result" ties into "log" (continuing discussion from the
> other bug).
> 
> Is there a case where:
> A) We would use the test class but not the result class (can't think of one)
> B) We would use the result class but not the test class (possibly, but still
> seems edge case-y)

For A. I can't think of any off the top of my head, though I would guess there are cases.

For B., this seems likely if you are post-processing results separately from running tests and producing output.  For instance, let's say you have one set of results you want to combine with another set and crunch some numbers on (say, "all failed tests", just to keep it easy).  I'm not trying to make things hard, but in general I'm much more afraid of trying to un-mix soup than of having a soup that's not mixed enough.

> Maybe a separate results.py is good enough (as opposed to a separate module)?

Package, you mean?  I don't know.  Even if we go with a single file, I'd prefer to have it be a separate package.  It isn't part of logging, exactly, unless we're defining logging == "all output".

> How would a results class affect existing harnesses that use mozbase? Is the
> results class now responsible for logging pass/fails or does it simply
> return an object to the harness? 

My opinion (which I don't claim is the One True Opinion) is that results will be a class where the harness accumulates results.  See, for example, https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/__init__.py#L43 or http://hg.mozilla.org/build/talos/file/1f8f47452048/talos/results.py (which is overly complex, but ignoring that).  In both cases, these basically do the following:

* provide a place to accumulate results
* provide some (very) basic methods to inspect
* provide places where output to various formats may plug in so that one can do: results => output
* some other framework-specific bells and/or whistles

The last should be doable via subclassing (again, IMHO, there are many architecture possibilities).

> I'd imagine the latter, but then we need to
> keep in mind that several harnesses depend on receiving a continuous stream
> of pass/fail data for buildbot.

Yes, I'd still say the harness's job is to gather results (again, my opinion).  I'd say MozResultsTestWhatever is a place to put them and output from them.  Other opinions welcome

> I'm just a little confused at how it all will fit together at the moment.

Does my sketch above make things clearer?  That's my vision, anyway.
Good questions, Andrew!

I think the test result classes should be more or less dumb objects that hold data. Sure, they have some methods for adding content and loading/saving from/to other formats. The important bit is they are agnostic as to how things arrived there.

To support the case of "streaming" results, I think generic callbacks on the TestSuiteResults container class would do just fine.

def onTestResult(suite, result):
    if result.success:
        print 'TEST-PASS...'
    else:
        print 'TEST-FAIL...'

This could be installed by the test runner.
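For illustration, a rough sketch of how a runner might install such a callback; TestSuiteResults and its method names are assumptions here, not existing mozbase API:

class TestSuiteResults(object):
    def __init__(self):
        self.results = []
        self.callbacks = []            # callables taking (suite, result)

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def add_result(self, result):
        self.results.append(result)
        for callback in self.callbacks:
            callback(self, result)     # fire as soon as a result lands

suite = TestSuiteResults()
suite.add_callback(onTestResult)       # onTestResult as defined above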

FWIW, I think our existing reliance on parsing output of tests during the whole chain should be marginalized as much as possible. We should aim to have machine-consumable test results which don't require that every downstream system invent a parser. At most, there should be one parser: the test suite runner. This should normalize everything as far as it can so test results are fully machine readable and don't require additional parsing.

I'm not advocating for getting rid of human-readable output (you can still convert to that). But we shouldn't rely on the human-readable form as input for any kind of post-run crunching.
See also some of my ideas in bug 774132 and comments in the README.rst on bug 751795. This might be easier to explain in person or IRC if you have any questions.
(In reply to Andrew Halberstadt [:ahal] from comment #3)
> I agree with this package. I just want a clear definition of "test" vs
> "result" and how "result" ties into "log" (continuing discussion from the
> other bug).
> 
> Is there a case where:
> A) We would use the test class but not the result class (can't think of one)
> B) We would use the result class but not the test class (possibly, but still
> seems edge case-y)
> 

I think B can occur if we try to assure backwards compatibility. For example, the test results in the xpcshell harness are stored as manually created dictionaries. We could just convert those to test-result-class instances without needing to rewrite the whole run_tests part.
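A sketch of that backwards-compatibility shim, assuming a TestResult class along the lines discussed in this bug; the dict keys here are illustrative, not the actual xpcshell fields:

def dict_to_result(d):
    # convert one xpcshell-style result dict into a result object
    result = TestResult(name=d['name'])
    result.result_actual = 'PASS' if d.get('passed') else 'FAIL'
    result.output = d.get('output', [])
    return result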
Attached file example
So we probably should find an owner for this.  In basic form I think we'll want....

* a TestResults object for holding all test results
* an object to hold an individual test result
* tests can have names, a status (pass/fail are the most basic, although this could be any number), and probably a description of what happened (or maybe None if pass)
** and probably an arbitrary key-value pair

There should be output classes.  These are classes that take results and *do stuff with them* like post to datazilla: https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.py

See also https://developer.mozilla.org/en/Test_log_format

I'm happy to take or co-take this if no one else has time or if this is blocking.  While the first implementation does not have to be the Be All End All, I would like to see a little thought put into this such that we don't regret it a few months from now.
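To make the output-class idea concrete, a rough sketch; the class name, the attribute names on the results, and the hand-rolled (unescaped) XML are simplifications for illustration, not a proposed API:

class XUnitOutput(object):
    """Takes a results container and writes a (very naive) xUnit-style file."""

    def __init__(self, filename='results.xml'):
        self.filename = filename

    def serialize(self, results):
        lines = ['<testsuite tests="%d">' % len(results)]
        for r in results:
            lines.append('  <testcase name="%s" result="%s"/>'
                         % (r.name, r.result_actual))
        lines.append('</testsuite>')
        with open(self.filename, 'w') as f:
            f.write('\n'.join(lines))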
(In reply to Jeff Hammel [:jhammel] from comment #8)
> * a TestResults object for holding all test results
> * an object to hold an individual test result

+1

> * tests can have names, a status (pass/fail are the most basic, although
> this could be any number), and probably a description of what happened (or
> maybe None if pass)
> ** and probably an arbitrary key-value pair

I'd advocate for a stronger API and not allowing arbitrary key-value pairs. The reason is you are dealing with a format that will be used by a lot of different systems, some of which may not be under your direct control. If you leave them to their own ways, they will invent their own methods of doing things and you are back at the original problem of having N solutions to the same problem.

Force people to produce a known format so there are no one-offs and downstream tools can consume this one format. Adding new fields should be simple: just add a property/method to the base class. So, people shouldn't complain about that being a barrier.

> There should be output classes.  These are classes that take results and *do
> stuff with them* like post to datazilla:
> https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.
> py

I concur. Test runners can then instantiate whatever output class is appropriate (maybe several) and do the right thing.

> I'm happy to take or co-take this if no one else has time or if this is
> blocking.

As stated on IRC, this is not blocking and is probably low priority. Bug 773394 was originally filed as a DRY violation, nothing more.
(In reply to Jeff Hammel [:jhammel] from comment #8)
> Created attachment 642774 [details]
> example
> 
> So we probably should find an owner for this.  In basic form I think we'll
> want....
> 
> * a TestResults object for holding all test results
> * an object to hold an individual test result
> * tests can have names, a status (pass/fail are the most basic, although
> this could be any number), and probably a description of what happened (or
> maybe None if pass)
> ** and probably an arbitrary key-value pair
> 
I'd add duration to the standard list of attributes a test can have (which can be None if the harness doesn't care about test duration).  

Should we allow tests to distinguish between filename and name?  For Marionette tests (and other things using unittest), there's both a filename, and a testname, e.g., test_execute_script.py (filename), and test_execute_script.TestExecuteChrome.test_check_window (combination of filename, class and method names).  In this case, it's nice to have filename as a separate property, in case we will be using it to locate files (to generate a manifest, for example).

For tests that don't need to distinguish these, name could be == filename.

For status, maybe:  pass, fail, known-fail, unexpected-success (or should those just be lumped in with fail?), skipped, error (test harness error as opposed to test failure).
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> I'd add duration to the standard list of attributes a test can have
> (which can be None if the harness doesn't care about test duration).  

I think we should *always* care about duration, even if the test harness today doesn't. Tracking it is one more piece of data. It is cheap to keep and you never know when it may come in handy.

Also, I would generally say to not record duration directly. Instead, log the start and end wall times explicitly. By logging absolute times instead of a range, you can do things like visualize how tests are running in parallel. You lose this if you just have a duration.
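A tiny sketch of the idea; the attribute names are assumptions and Record is just a stand-in object:

import time

class Record(object):              # stand-in for a test result object
    pass

result = Record()
result.time_start = time.time()    # absolute wall time at start
# ... run the test here ...
result.time_end = time.time()      # absolute wall time at end
duration = result.time_end - result.time_start   # derived when needed, not stored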

> Should we allow tests to distinguish between filename and name?

Yes. Filenames can change over time and if test names are constant, this can be used to track identical tests across renames.

> For status, maybe:  pass, fail, known-fail, unexpected-success (or should
> those just be lumped in with fail?), skipped, error (test harness error as
> opposed to test failure).

Store expected result and actual result separately.
The TestResults object will also need a variety of attributes to describe the environment that the tests were run in, e.g., https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.py#L55.

This is trickier than it sounds, because we already use different terms for the same os/platform/harness in different pieces of Mozilla infrastructure, and although we can't fix this problem with this package, we don't want to make it worse, either.
So as this is low-priority (I think?) I'm much relieved :)

I'm in favor of looser descriptions of test metadata rather than imposing more stringent ones for a few reasons:

1. Getting everyone to agree on a definition of what all test results should have may be folly
2. If we change what that is, and build on it, and then a test comes along that does not have this characteristic, it may be hard to change the test harness

I'm not against e.g.

def add(self, status, name, description, duration=None, filename=None, **metadata)

OTOH, I would hate to block new frameworks etc on defining too tight of an API.

Duration would be great to have for harnesses that have access to such information, but I'd hate to require it.

filename vs name would be great if you have file-based tests.  Is this always the case? (There are also at least URLs.  You could have more than one test per file.  Etc.)

We probably should enumerate all of our actual statuses more formally.

In any case, this is all bikeshedding without alternate implementations.  I'd love to see an implementation that was both as tight as possible but also easily extensible.  Along with the argument that "if you give people a dict, they'll abuse it" goes the argument "If you make something hard to do with the API, people will hack around it".  Both extremes are bad.
We should also consider BrowserTestResults, which will be a large but important subclass.  In general, we should probably record extensions, Firefox version (buildid, etc.), preferences, and anything else about the browser environment.  We should probably also transmit (if not add directly to results) the platform information from mozinfo.
(In reply to Jeff Hammel [:jhammel] from comment #13)
> I'm in favor of looser descriptions of test metadata rather than imposing
> more stringent ones for a few reasons:
> 
> 1. Getting everyone to agree on what defines what all test results should
> have may be folly
> 2. If we change what that is, and build on it, and then a test comes along
> that does not have this characteristic, it may be hard to change the test
> harness

I never said all these things should be required ;)

Make the core set of required attributes small. Test name. Expected result. Actual result. Everything else is optional. Maybe an outputter requires an additional set of attributes to work. If so, the burden is on the test runner to populate them to get that feature.

This still maintains flexibility while imposing some law and order.
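A minimal sketch of that shape, purely for illustration (not the final moztest API):

class TestResult(object):
    """Required core: name, expected result, actual result."""

    def __init__(self, name, result_expected='PASS'):
        self.name = name
        self.result_expected = result_expected
        self.result_actual = None     # set when the test finishes
        # everything else is optional and can stay None
        self.duration = None
        self.filename = None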
(In reply to Jeff Hammel [:jhammel] from comment #13)
> def add(self, status, name, description, duration=None, filename=None,
> **metadata)

Please see our discussion on bug 771030, where the usage of a ** argument blocks the extensibility of the API. I would vote that we do not make use of it and instead use real keyword arguments here which default to None.
https://etherpad.mozilla.org/test-result-data-model now contains a proposal for the data model.

I think this supersedes what was being discussed in the other bug, so I'm replacing the URL.
Assignee: nobody → mbalaur
I have two questions:

1. What results do we accept for the tests? (PASS, FAIL, SKIP etc)

2. What kind of results lists do we want to get from the TestResultCollection? (successful, failed, todo etc)
As discussed on IRC, the way of creating the test results will be:

1. instantiate the TestResult object
2. run the test
3. call finish(status) on the TestResult object
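A usage sketch of that flow; the constructor and finish() signatures, run_test(), and collection are all stand-ins based on this discussion:

result = TestResult(name='test_login', result_expected='PASS')
try:
    run_test()                   # hypothetical test body
    result.finish('PASS')
except AssertionError:
    result.finish('FAIL')
collection.append(result)        # collection: a results container instance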
Attachment #643208 - Flags: feedback?(jhammel)
Attachment #643208 - Flags: feedback?(jgriffin)
Attachment #643208 - Flags: feedback?(gps)
Attachment #643208 - Flags: feedback?(ahalberstadt)
(In reply to Mihnea Dobrescu-Balaur from comment #19)
> I have two questions:
> 
> 1. What results do we accept for the tests? (PASS, FAIL, SKIP etc)
> 
> 2. What kind of results lists do we want to get from the
> TestResultCollection? (successful, failed, todo etc)

Some of these can be gleaned from https://developer.mozilla.org/en/Test_log_format . For 1. I think PASS, FAIL, SKIP are a good start.  We can always add more.

For 2., I think the short answer is we'd like to get as many as possible.  The ones you mention are interesting.  "All tests" of course.  If we care about more then it could be cool to look towards having some sort of 'query()' method for it, but this is probably YAGNI at this point
For 1), this is what mozlog currently defines: https://github.com/mozilla/mozbase/blob/master/mozlog/mozlog/logger.py#L21 (though nothing really uses mozlog yet).
A better list is probably:
https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/logger.py#L36

I have taken all those results from mochitests.
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

I think it is looking good!

::: moztest/moztest/results.py
@@ +14,5 @@
> +    """ Stores test result data """
> +
> +    POSSIBLE_RESULTS = ('PASS', 'FAIL', 'SKIP', 'ERROR')
> +
> +    def __init__(self, name, filename=None, context=TestContext(''), result_expected='PASS'):

We *may* want to move to a model where we have multiple "context" types. This could be changed later, as the scope of a context increases.

@@ +27,5 @@
> +        self.time_end = None
> +        self.result_expected = result_expected
> +        self.result_actual = None
> +        self.finished = False
> +        self._output = ""

I think a list of strings would be better. Many test runners will read line output as it is available. I think it makes more sense to buffer in the test result and to retain the original structure of this data. We can bikeshed over whether the strings inside the list should have newline characters. I would vote for no.
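A small sketch of that buffering, purely illustrative:

output_lines = []
for line in ['TEST-START test_foo\n', 'TEST-PASS test_foo\n']:   # stand-in for runner output
    output_lines.append(line.rstrip('\n'))    # store without newline characters
flat_log = '\n'.join(output_lines)            # join only when a single blob is needed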
Attachment #643208 - Flags: feedback?(gps) → feedback+
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

+    POSSIBLE_RESULTS = ('PASS', 'FAIL', 'SKIP', 'ERROR')

I would probably not make this a tuple so that subclasses can (more easily) extend this.

+        self.filename = filename or name
I thought we decided that filename was optional and so could be added later?


+    def __str__(self):
+        return '%s - %s' % (self.name, self.result_actual)

I would ideally love to see the default format follow https://developer.mozilla.org/en/Test_log_format (assuming the test is done).

Are result_actual and finished redundant?

So for some test harnesses (e.g. Talos), we may record the start and end time separately.  Maybe it's better to have these as optional arguments to __init__ and finish() and only calculate them if they're None?

+    @property
+    def context(self):
+        """ The TestContext in which the test was run """
+        return self._context
+
+    @context.setter
+    def context(self, c):
+        assert isinstance(c, TestContext)
+
+        self._context = c

So I find the context stuff a bit confusing:

- you can set this in __init__ and, if not, we create an instance for you (IMHO, we should not create an instance by default, although it's hard to figure out from this code.  Maybe we shouldn't have TestContext at all until that is more than just a stub?)

- you can override this at any time with the setter

In general, I find setters and getters that don't do anything but type-checking to be more of a hindrance than a blessing.

The same goes for output.  Maybe output should be an argument to .finish()? Also, are we going to enforce a string?  

+    @property
+    def contexts(self):
+        """ List of unique contexts for the test results contained """
+        cs = [tr.context for tr in self]
+        return list(set(cs))

I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test

+    @property
+    def passed(self):
+        """ Returns a list of TestResults with the PASS result """
+        return self.filter(lambda t: t.result_actual == 'PASS')
+
+    @property
+    def failed(self):
+        """ Returns a list of TestResults with the FAIL result """
+        return self.filter(lambda t: t.result_actual == 'FAIL')
+
+    @property
+    def skipped(self):
+        """ Returns a list of TestResults with the SKIP result """
+        return self.filter(lambda t: t.result_actual == 'SKIP')

To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.  In the three-item case, what is here works.  However, if we have more than three states, then you might have more than one PASS, FAIL state.  If we are only going to have 'PASS', 'FAIL', 'SKIP' (not what I would advocate!), then we should probably make them not strings and break finish() into pass(), fail(), skip(), etc

Overall a good start.  I do think there's a bit to consider here, though.

For one, I'd love to see one or more existing harnesses be used as input here.  While I empathize with doing it right "from scratch", there are a lot of lessons in the existing harnesses.  Since the goal is to support them, we will want to ensure that whatever we're doing here can do so.
Attachment #643208 - Flags: feedback?(jhammel)
Attachment #643208 - Flags: feedback?(gps)
Attachment #643208 - Flags: feedback+
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

I also think this is a good start.

I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').

We also need to support several different modes of operation.

1 - the type that this model is most closely adapted to right now: populating results realtime via some Python testrunner
2 - populating results by post-processing of a log file.  In this case, we'd need the ability to specify things like start and end times in the same call (presumably in __init__); I don't think it makes sense to call finish() in this context.
3 - populating results by near realtime streaming of log data.  We may want a separate subclass for this, and in fact we'd need to figure out how this relates to our existing logparser code, so we probably want to punt on this at this stage of design.

In case 2 above, for some test harnesses, we will have a test duration, but not start or end times; we need to support that as well in our model.
Attachment #643208 - Flags: feedback?(jgriffin) → feedback+
(In reply to Jeff Hammel [:jhammel] from comment #21)
> (In reply to Mihnea Dobrescu-Balaur from comment #19)
> 
> For 2., I think the short answer is we'd like to get as many as possible. 
> The ones you mention are interesting.  "All tests" of course.  If we care
> about more then it could be cool to look towards having some sort of
> 'query()' method for it, but this is probably YAGNI at this point

There is the filter method that just takes in a predicate. I guess that's what you meant?
(In reply to Mihnea Dobrescu-Balaur from comment #27)
> (In reply to Jeff Hammel [:jhammel] from comment #21)
> > (In reply to Mihnea Dobrescu-Balaur from comment #19)
> > 
> > For 2., I think the short answer is we'd like to get as many as possible. 
> > The ones you mention are interesting.  "All tests" of course.  If we care
> > about more then it could be cool to look towards having some sort of
> > 'query()' method for it, but this is probably YAGNI at this point
> 
> There is the filter method that just takes in a predicate. I guess that's
> what you meant?

Yep, that should be good enough for the time being
(In reply to Jonathan Griffin (:jgriffin) from comment #26)
> We also need to support several different modes of operation.
> 
> 1 - the type that this model is most closely adapted to right now:
> populating results realtime via some Python testrunner
> 2 - populating results by post-processing of a log file.  In this case, we'd
> need the ability to specify things like start and end times in the same call
> (presumably in __init__); I don't think it makes sense to call finish() in
> this context.
> 3 - populating results by near realtime streaming of log data.  We may want
> a separate subclass for this, and in fact we'd need to figure out how this
> relates to our existing logparser code, so we probably want to punt on this
> at this stage of design.

I think #2 and #3 are nearly identical.

All use cases can be accommodated by making __init__ take the fewest number of required arguments possible. Whatever creates TestResult instances can populate fields as it needs to (including the end time, bypassing finish() if it needs to). finish() is really just a convenience method, IMO. I originally proposed it to encapsulate the verification logic. Perhaps verification could be factored out and called automagically at important times?

I would also argue that we should minimize the amount of "log parsing" in our overall design. I would generally favor the end state to be tests which output machine-readable data (e.g. newline-delimited JSON arrays or objects). We'll need some form of parsing in the test runners because there is IPC no matter what. But, I think the long term goal should be to have everything normalized to TestResult instances as quickly as possible and to maintain that representation throughout all involved systems. In other words, the raw output from the test harnesses should be irrelevant and hidden from everyone except those who maintain them. If we need a human-friendly representation, we can call a method on TestResult.

This necessitates serialization/deserialization code for TestResult. Would it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?

I recognize that my vision may be a bit grandiose. The large practical concern I see is that there are systems relying on the existing log output. Yes. And that's why TestResult instances hold the raw output from the test runner. We can use that as a shim to preserve the existing harness output until downstream systems (including humans) convert to whatever the future holds.
> This necessitates serialization/deserialization code for TestResult. Would it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?

IMHO, very no.  I envision output as a completely different set of modules that take a test result container (and/or a test result) and serialize it.  I would not want to add this to the testresult class.  Test results and test results containers should hold information.  Output should be handled separately and take these as arguments.
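A sketch of that separation, with serialization living outside the result classes; the attribute names and the newline-delimited JSON format are assumptions from this thread, not a decided format:

import json

def write_json_lines(results, fileobj):
    # one JSON object per line; takes the results container as an argument
    for r in results:
        record = {'name': r.name,
                  'expected': r.result_expected,
                  'actual': r.result_actual}
        fileobj.write(json.dumps(record) + '\n')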
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

+1, good start!

::: moztest/moztest/results.py
@@ +51,5 @@
> +        if self.finished:
> +            return self.time_end - self.time_start
> +        else:
> +            # returns the elapsed time
> +            return time.time() - self.time_start

I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.

@@ +111,5 @@
> +
> +    def append(self, test_result):
> +        assert isinstance(test_result, TestResult)
> +
> +        list.append(self, test_result)

Should catch list.insert() too
Attachment #643208 - Flags: feedback?(jgriffin)
Attachment #643208 - Flags: feedback?(ahalberstadt)
Attachment #643208 - Flags: feedback+
Attachment #643208 - Flags: feedback?(jgriffin) → feedback+
> @@ +111,5 @@
> > +
> > +    def append(self, test_result):
> > +        assert isinstance(test_result, TestResult)
> > +
> > +        list.append(self, test_result)
> 
> Should catch list.insert() too

I'm not really sure we should have checking in any of these methods.  What about slice insertion? What about __setitem__?  In any case, if we're inheriting from list, I think it is overkill to do type-checking.  It also kills the possibility of duck-typing.
> We *may* want to move to a model where we have multiple "context" types.
> This could be changed later, as the scope of a context increases.

As we talked on IRC, the Context thing is kind of a stub at the moment, until we actually define how we will represent it.

> I think a list of strings would be better. Many test runners will read line
> output as it is available. I think it makes more sense to buffer in the test
> result and to retain the original structure of this data. We can bikeshed
> over whether the strings inside the list should have newline characters. I
> would vote for no.

I thought we would use just a string to which we would append strings with newlines. Will change.

> I would probably not make this a tuple so that subclasses can (more easily) extend this.

Sure, I'll initiate a discussion about the representation of the context.

> I thought we decided that filename was optional and so could be added later?

It is optional, and :jgriffin said it should default to the name. Should it just be None?

> I would ideally love to see the default format follow https://developer.mozilla.
> org/en/Test_log_format (assuming the test is done

Ok, will do.

> Are result_actual and finished redundant?

Right now, yes, they are. I'll just remove finished and make it a property.

> So for some test harnesses (e.g. Talos), we may record the start and end time separately.
> Maybe its better to have these as optional arguments to __init__ and finish and only
> calculate them if they're None?

There should be only one way of using this so that it doesn't get confusing. We can adapt whatever test runner to it after that, I think.

So let's agree on a way to do this and just use that everywhere.

> The same goes for output.  Maybe output should be an argument to .finish()? Also, are we
> going to enforce a string?

Output should be allowed to be added incrementally during the run process.

Regarding type checking, I did it because that's what it seemed is wanted from the discussions. I personally don't like it and find it very "non-pythonic".

> I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test

So the container would keep track of contexts and there should be one per run (per manifest) ?

> To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.

I don't understand, can you please elaborate?

> For one, I'd love to see one or more existing harnesses be used as input here.

Ok, I'll write a test script that runs a harness and stores its results in these classes.

> I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').

So should we just add methods like known_fails = tests with expected = actual = fail ?

> We also need to support several different modes of operation. [...]

That would mean having a jack-of-all-trades class with lots of functionality. Is this what we want?

> I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.

Talked about this on IRC and the elapsed time seemed the preferred option. We should debate this, I guess.

> I'm not really sure we should have checking in any of these methods.  What about slice insertion what about setitem ? In any case, if we're inheriting from list, I think it is overkill doing type-checking.  It also kills the possibility of duck-typing

As I said above, I'm pro ducktyping and not doing type checking. It just seemed to me that this is desired. Can I just drop all the type checking?
Whoa, sorry about the long lines, I thought bugzilla would take care of those.
(In reply to Mihnea Dobrescu-Balaur from comment #33)
> > We *may* want to move to a model where we have multiple "context" types.
> > This could be changed later, as the scope of a context increases.
> 
> As we talked on IRC, the Context thing is kind of a stub at the moment,
> until we actually define how we will represent it.
> 
> > I think a list of strings would be better. Many test runners will read line
> > output as it is available. I think it makes more sense to buffer in the test
> > result and to retain the original structure of this data. We can bikeshed
> > over whether the strings inside the list should have newline characters. I
> > would vote for no.
> 
> I thought we would use just a string to which we would append strings with
> newlines. Will change.

One bad thing about appending to a string is that, in general, each append may deallocate the string, memcpy it, and replace it.

> > I would probably not make this a tuple so that subclasses can (more easily) extend this.
> 
> Sure, I'll initiate a discussion about the representation of the context.

Sounds good.

> > I thought we decided that filename was optional and so could be added later?
> 
> It is optional, and :jgriffin said it should default to the name. Should it
> just be None?

I thought in the ctor we were only taking required args? We should either do one or the other.

> > I would ideally love to see the default format follow https://developer.mozilla.
> > org/en/Test_log_format (assuming the test is done
> 
> Ok, will do.
> 
> > Are result_actual and finished redundant?
> 
> Right now, yes, they are. I'll just remove finished and make a property.
> 
> > So for some test harnesses (e.g. Talos), we may record the start and end time separately.
> > Maybe its better to have these as optional arguments to __init__ and finish and only
> > calculate them if they're None?
> 
> There should be only one way of using this so that it doesn't get confusing.
> We can adapt whatever test runner after that, I think.

So imagine this scenario:  You have a harness which launches a browser and then an extension-side JS runner starts and stops the test.  So the JS knows the start time and end time, not the python side. This will not work here.

> So let's agree on a way to do this and just use that everywhere.

Easy to say.  I think agreeing on a way should be informed by the existing harnesses, not just some ideal.

> > The same goes for output.  Maybe output should be an argument to .finish()? Also, are we
> > going to enforce a string?
> 
> Output should be allowed to be added incrementally during the run process.
> 
> Regarding type checking, I did it because that's what it seemed is wanted
> from the discussions. I personally don't like it and find it very
> "non-pythonic".
> 
> > I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test
> 
> So the container would keep track of contexts and there should be one per
> run (per manifest) ?

Not necessarily.  The context may switch during a "run"/"suite".  Imagine if you do some test, change a preference, and do some more tests

OTOH, the machine information will be the same

> > To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.
> 
> I don't understand, can you please elaborate?

So we have states: PASS, FAIL, REBOOT, SKIP, ABORT, CRASH, etc.  Right now there are only three in your current patch. PASS and presumably SKIP are "good"; FAIL is bad.  But ABORT, CRASH, etc., are also bad.  I guess if you want to see the tests that are bad, you need something bucketing states into at least good and bad.

Similarly, if your expected != what you got, this is bad.  

> > For one, I'd love to see one or more existing harnesses be used as input here.
> 
> Ok, I'll write a test script that runs a harness and stores its results in
> these classes.
> 
> > I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').
> 
> So should we just add methods like known_fails = tests with expected =
> actual = fail ?
> 
> > We also need to support several different modes of operation. [...]
> 
> That would mean having a jack-of-all-trades class with lots of
> functionality. Is this what we want?

I'll defer to :jgriffin here, but I think so.

> > I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.
> 
> Talked about this on IRC and the elasped time seemed the preferred option.
> We should debate this, I guess.

Maybe we can call this 'elapsed' or something different?  Does that help?

> > I'm not really sure we should have checking in any of these methods.  What about slice insertion what about setitem ? In any case, if we're inheriting from list, I think it is overkill doing type-checking.  It also kills the possibility of duck-typing
> 
> As I said above, I'm pro ducktyping and not doing type checking. It just
> seemed to me that this is desired. Can I just drop all the type checking?

I'm +1 on dropping type-checking.  It will be easy to add in if we actually care about it.  I think the errors in incorporating these classes into a harness with incorrect types will be fairly obvious to figure out (and, for the most part, will not change once the incorporation is done).
> > I think we need to distinguish between failed tests (implicitly, unexpected 
> > failures) and known fail tests, since many of our test harnesses do.  That is, 
> > failed() should only return tests where actual_result == FAIL and expected_result 
> > == PASS; we should have another method to return expected failures (which several 
> > of our harnesses call 'todo').
> 
> So should we just add methods like known_fails = tests with expected = actual = 
> fail ?

Yes, I think so, maybe known_failed() to be consistent with the existing failed() method.  And failed() should not return these.

> > We also need to support several different modes of operation. [...]
> 
> That would mean having a jack-of-all-trades class with lots of functionality. Is 
> this what we want?

Our current test harnesses operate in fairly different ways.  I think all we need is to provide methods that work realtime (i.e., providing start and end times automatically as TestResult objects are created/completed) and ex post facto (i.e., letting the consumer provide these explicitly).
Not to bike shed, but perhaps REBOOT could be rolled into a general "system exception" enumeration.
(In reply to Gregory Szorc [:gps] from comment #37)
> Not to bike shed, but perhaps REBOOT could be rolled into a general "system
> exception" enumeration.

Sure.  All the more reason, IMHO, to match the exact status to a class of status (could even be a (suggested) exit code, if desired)
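A sketch of the status-to-class mapping idea; the statuses and bucket names here are illustrative, not a decided set:

STATUS_CLASS = {
    'PASS': 'success',
    'SKIP': 'success',
    'FAIL': 'failure',
    'CRASH': 'failure',
    'ABORT': 'failure',
    'REBOOT': 'system-exception',
}

def failures(results):
    # everything whose status falls into the 'failure' bucket
    return [r for r in results if STATUS_CLASS.get(r.result_actual) == 'failure']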
(In reply to Gregory Szorc [:gps] from comment #29)

> 
> I would also argue that we should minimize the amount of "log parsing" in
> our overall design. I would generally favor the end state to be tests which
> output machine-readable data (e.g. newline-delimited JSON arrays or
> objects). We'll need some form of parsing in the test runners because there
> is IPC no matter what. But, I think the long term goal should be to have
> everything normalized to TestResult instances as quickly as possible and to
> maintain that representation throughout all involved systems. In other
> words, the raw output from the test harnesses should be irrelevant and
> hidden from everyone except those who maintain them. If we need a
> human-friendly representation, we can call a method on TestResult.
> 
> This necessitates serialization/deserialization code for TestResult. Would
> it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?
> 
> I recognize that my vision may be a bit grandiose. The large practical
> concern I see is there are systems relying on the existing log output. Yes.
> And that's why TestResult instances holds the raw output from the test
> runner. We can use that as a shim to preserve the existing harness output
> until downstream systems (including humans) convert to whatever the future
> holds.

I agree this is an idealized state that we should move towards.

That said, in practical terms, the test harnesses are not always the best authority about everything that happened during a test.  Mostly, they know about test failures (although not always, see e.g. bug 677964) and maybe JS exceptions.  They generally don't know about other things which might be of interest to test authors/runners, depending on the context, including assertions, purify/valgrind errors, shutdown crashes/leaks, gonk errors (in the case of B2G), etc.  I think log parsing will always be desirable in at least some contexts.

I suppose it's an open question about how much of these non-test-failures we might want to include in this model; I'd like to retain the ability to include all of the above, again depending on context.

However, there's nothing about that which directly affects this model, except for requiring support for creating TestResult objects during post-processing.  Populating TestResults from a log should probably belong to a separate class or package.

One thing we haven't considered in our design are errors that do not belong to specific tests.  For example, when mochitest is run by TBPL, it can fail for a number of reasons that can't be pinned to a specific test.  These can be buildbot errors, or shutdown leaks/crashes, for example.  Do we want to include these in our model?  I'd argue yes.  We could reflect these kinds of errors using TestResult objects that have no name or have a special attribute, or we could add some additional attributes to TestResultCollection, and I'm sure there are other options.
> One thing we haven't considered in our design are errors that do not belong to specific tests.  For example, when mochitest is run by TBPL, it can fail for a number of reasons that can't be pinned to a specific test.  These can be buildbot errors, or shutdown leaks/crashes, for example.  Do we want to include these in our model?  I'd argue yes.

I'd argue yes as well
* dropped high level type checking
* added the ability to specify time_start and time_end in ctor & finish() method
* correct __str__ for TestResult
* real-world POSSIBLE_RESULTS
* output is a list (of strings)
* added Description to TestResult
* removed redundant finished attribute
* new tests_with_result(result) method for Collection class
* successful and unsuccessful tests lists for Collection class
* explicit [all] tests attribute for Collection class

Will work on the Context thing after I get some opinions on the mail thread (tools@lists.mozilla.org).

After I get to a base skeleton that everyone accepts I'll try to fit it to be used by an existing test suite.
Attachment #643208 - Attachment is obsolete: true
Attachment #643208 - Flags: feedback?(gps)
Comment on attachment 643678 [details] [diff] [review]
incremental update; modified some stuff according to feedback

Review of attachment 643678 [details] [diff] [review]:
-----------------------------------------------------------------

::: moztest/moztest/results.py
@@ +20,5 @@
> +        'TEST-UNEXPECTED-PASS',
> +        'TEST-UNEXPECTED-FAIL',
> +        'TEST-KNOWN-FAIL',
> +        'TEST-SKIPPED',
> +        'TEST-ERROR'

The 'TEST-' prefix isn't necessary.

What do Python conventions say about enumerations like this? Should there be class-level "constants" (which can be strings) or should people use strings everywhere?

@@ +97,5 @@
> +        return list(set(cs))
> +
> +    def filter(self, predicate):
> +        """ Returns a list of TestResults that satisfy a given predicate """
> +        return [tr for tr in self if predicate(tr)]

Use a generator?

@@ +116,5 @@
> +
> +    @property
> +    def tests(self):
> +        """ List of all tests in the collection """
> +        return self

Why?
(In reply to Gregory Szorc [:gps] from comment #42)
> Comment on attachment 643678 [details] [diff] [review]
> incremental update; modified some stuff according to feedback
> 
> Review of attachment 643678 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: moztest/moztest/results.py
> @@ +20,5 @@
> > +        'TEST-UNEXPECTED-PASS',
> > +        'TEST-UNEXPECTED-FAIL',
> > +        'TEST-KNOWN-FAIL',
> > +        'TEST-SKIPPED',
> > +        'TEST-ERROR'
> 
> The 'TEST-' prefix isn't necessary.

I found it included in [1] and [2]. *confused*
[1] https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/logger.py#L36
[2] https://github.com/mozilla/mozbase/blob/master/mozlog/mozlog/logger.py#L21

> 
> What do Python conventions saying about enumerations like this? Should there
> be class level "constants" (which can be strings) or should people use
> strings everywhere?

Don't know, PEP8 only talks about defining constants at module-level.

> 
> @@ +97,5 @@
> > +        return list(set(cs))
> > +
> > +    def filter(self, predicate):
> > +        """ Returns a list of TestResults that satisfy a given predicate """
> > +        return [tr for tr in self if predicate(tr)]
> 
> Use a generator?

The etherpad document said "[..] list of [..]". Should I do a generator? It's no problem, just replacing [] with ().

> 
> @@ +116,5 @@
> > +
> > +    @property
> > +    def tests(self):
> > +        """ List of all tests in the collection """
> > +        return self
> 
> Why?

In case someone expects it to behave like the usual python test result collections (i.e. unittest).
(In reply to Mihnea Dobrescu-Balaur from comment #43)
> (In reply to Gregory Szorc [:gps] from comment #42)
> > 
> > What do Python conventions saying about enumerations like this? Should there
> > be class level "constants" (which can be strings) or should people use
> > strings everywhere?
> 
> Don't know, PEP8 only talks about defining constants at module-level.

Would you rather prefer a class only with constants? Like:

class Results:
    PASS = 0
    FAIL = 1
[...]
Weighing in on the enumeration issue: it kinda depends on what the enumerated values are for.  I'm still more in favor of a mapping.

If we don't want a mapping, then there are a number of solutions.

- we can use strings, as is.  This is common, although there is nothing telling you what the valid strings are.  If the *only* place we care about this is `if expected_result == actual_result:`, then this doesn't matter, although it does make it easier to misspell a status and end up with a harder-to-detect error

- we can use singletons

class State(object):
    """Singleton-style marker object for a test state."""
    def __init__(self, name):
        self.name = name

state_names = ['PASS', 'FAIL', 'SKIP']
for name in state_names:
    # or hang these off a class/namespace instead of module globals
    globals()[name] = State(name)

This way you can still check, e.g. results.PASS != results.FAIL, etc
This might also be extensible.
(I'm also not necessarily suggesting these live in global scope, I just wrote it this way as a cheap example.  Could be in class scope, etc.)

I'd prefer not to do something like

class Results:
    PASS = 0
    FAIL = 1

This assigns an arbitrary value to a state and you can't loop over them effectively.
:jhammel, could you please show a snippet of code of the mapping you envision? Because I'm not sure I understand what you mean.

Thanks!
(In reply to Mihnea Dobrescu-Balaur from comment #46)
> :jhammel, could you please show a snippet of code of the mapping you
> envision? Because I'm not sure I understand what you mean.
> 
> Thanks!

So I was thinking of a mapping of the (free-form) state to some sort of category.  There is probably only one success state, but there could be, e.g., multiple reasons a test was not run (SKIP) or failed (the test failed, many sorts of infrastructure failure, browser crash, etc.).

Thinking it over, though, this might be YAGNI.  If we force passing in the expected state (or maybe defaulting to PASS), then checking equality with the actual state is enough to get this down to a binary success/non-success, and that's probably all we need.  Other tools can classify e.g. the various kinds of infrastructure failures.

Vaguely relatedly, would it be a good idea to have an optional message for test results?  PASS may not need a message, but there could be a reason for skipping, or a failure message.  I'm thinking a la TinderboxPrint here or https://developer.mozilla.org/en/Test_log_format .

For example, if we ported Talos to this, a pass or fail message could indicate some summary of the p-test for SfN. E.g.

- talos finishes a test
- results are uploaded to datazilla
- datazilla tells the result of the p-test
- we set this as the pass/fail message
Right, so there will be a string attribute called "reason" or something like that which states why a test is skipped or why it failed (with the stack trace or log being added to the output attribute).

How do we pass this in to the TestResult? Only via the finish() method? Or can it be also passed in via the constructor?

I'm still not sure of the "correct" way test harnesses are supposed/expected to operate with this.
(In reply to Mihnea Dobrescu-Balaur from comment #48)
> Right, so there will be a string attribute called "reason" or something like
> that which states why a test is skipped or why it failed (with the stack
> trace or log being added to the output attribute).

So I was intending (albeit, I won't claim it to be the only way) that reason/message be something short.  The log will be available separately.  OTOH, maybe keeping a stack trace in results is a good idea? (or not?)

> How do we pass this in to the TestResult? Only via the finish() method? Or
> can it be also passed in via the constructor?

I would pass it in the finish() method. 

> I'm still not sure of the "correct" way test harnesses are supposed/expected
> to operate with this.

To be honest, neither am I ;) I would need to try to hook this up to a harness (or several) before I could really give this much insight
(In reply to Jeff Hammel [:jhammel] from comment #49)
> 
> So I was intending (albeit, I won't claim it to be the only way) that
> reason/message be something short. 

I agree.

> The log will be available separately. 
> OTOH, maybe keeping a stack trace in results is a good idea? (or not?)

From what I've seen, when a test fails it says something like "TEST FAILED: <short reason> View following log:" [...] and then comes the log.

We could keep to that, I guess?

> 
> To be honest, neither am I ;) I would need to try to hook this up to a
> harness (or several) before I could really give this much insight

I'll look at some code and see.
> > Right, so there will be a string attribute called "reason" or something like
> > that which states why a test is skipped or why it failed (with the stack
> > trace or log being added to the output attribute).
> 
> So I was intending (albeit, I won't claim it to be the only way) that
> reason/message be something short.  The log will be available separately. 
> OTOH, maybe keeping a stack trace in results is a good idea? (or not?)
> 
We already have a description attr in the model; is this the same as 'reason'?  I think we wouldn't need both.

I do think we should keep stack traces in the model, in a separate attribute.  It can be useful for generating xUnit output, posting to Autolog, etc.
 
> > To be honest, neither am I ;) I would need to try to hook this up to a
> > harness (or several) before I could really give this much insight
> 
> I'll look at some code and see.

I think a productive approach would be to find failing logs for all the different harnesses at https://tbpl.mozilla.org/, and see how the failure data they report would be represented in this model.
(In reply to Jonathan Griffin (:jgriffin) from comment #51)
> > > Right, so there will be a string attribute called "reason" or something like
> > > that which states why a test is skipped or why it failed (with the stack
> > > trace or log being added to the output attribute).
> > 
> > So I was intending (albeit, I won't claim it to be the only way) that
> > reason/message be something short.  The log will be available separately. 
> > OTOH, maybe keeping a stack trace in results is a good idea? (or not?)
> > 
> We already have a description attr in the model; is this the same as
> 'reason'?  I think we wouldn't need both.

My thoughts are description == the description of the test, reason/message == why you got a certain state (e.g. finish(state='FAIL', reason='Node [id="foobar"] not found'))

> I do think we should keep stack traces in the model, in a separate
> attribute.  It can be useful for generating xUnit output, posting to
> Autolog, etc.
>  
> > > To be honest, neither am I ;) I would need to try to hook this up to a
> > > harness (or several) before I could really give this much insight
> > 
> > I'll look at some code and see.
> 
> I think a productive approach would be to find failing logs for all the
> different harnesses at https://tbpl.mozilla.org/, and see how the failure
> data they report would be represented in this model.

+1
Blocks: 775756
(In reply to Jonathan Griffin (:jgriffin) from comment #51)
> I think a productive approach would be to find failing logs for all the
> different harnesses at https://tbpl.mozilla.org/, and see how the failure
> data they report would be represented in this model.
Leak reporting, crash reporting, and logcat reporting are not supported by this model. For leak and crash reporting, we output a summary block of information and then a giant blob of details (you could think of it as a BLOB, but it is parsable information). I think moving toward JSON representations of this information, which we can then de-serialize into a human-readable form (aka the current log format), would be better. Until the outside tools (buildbot, tbpl, orangefactor, etc.) are updated to deal with the JSON, we can have the harnesses de-serialize their output.

Logcat is a special case, it's a total blob that is relatively unstructured and is not easy to deal with (for machines or people).

All our test systems are python/JS systems. We have python runners that then call JS that runs inside the platform and then information is returned to python (sometimes just the status of the binary under test's output code, sometimes streams of results data that is generated on the JS side and spewed to stdout which the python side picks up, sometimes it is an intense communication between python and JS in real time as the test runs.)  So, I think we can create a set of methods for the class around that. In particular, there will always have to be some kind of profile setup (to inject that javascript into the system), some kind of cleanup, some kind of options handling, and some kind of "run" functionality.

All of these methods should be overridable/extendable by the specific test classes that implement a test type (a mochitest runner for instance).

And I'd recommend dropping the "TEST-" prefixes everywhere. Those are silly hold-overs of our current logging system, which isn't worth codifying into a solution like this. I like "UNEXPECTED-FAIL, EXPECTED-FAIL" etc. much better.

Those are some thoughts I had while reading over the bug and the etherpad.  This is a great idea, and great work. I'm excited to see what you come up with.
> Those are some thoughts I had while reading over the bug and the etherpad.  This is a great idea, and great work. I'm excited to see what you come up with.

Agreed, lots of good work so far. My one comment is that it seems like there is still some confusion as to what exactly should go into v1. At first we just wanted a Results class, but now it seems like we are doing an entire generic test harness. This is really exciting, but there are many different components each with many different possible implementations.

I think it would be wise to start with something basic (i.e. the Results class, since that seems to be where all the work has gone so far), get that reviewed and checked in (we don't need everyone to be in 100% agreement at this stage), and then file follow-up bugs for additional features/components or for more in-depth discussion on items that need a closer look (i.e. what exact attributes the result class should/shouldn't have).
+1 on starting on the results class.  We'll need the context class for this too.

I'd probably go from there to a few output formats (xunit, datazilla, whatever) and then (or concurrently) try to get some existing harnesses to use the results class. Marionette, mozmill, and mochitest seem like easy targets.
A sample of the representation of test results: [1]
A sample xunit xml file: [2]


[1] http://pastebin.mozilla.org/1712582
[2] http://swarm.cs.pub.ro/~mihneadb/out.xml
Attachment #643678 - Attachment is obsolete: true
Attachment #645101 - Flags: feedback?(jhammel)
Attachment #645101 - Flags: feedback?(jgriffin)
Attachment #645101 - Flags: feedback?(ahalberstadt)
And the XML formats being used are slightly different!

For xpcshell tests, I used http://stackoverflow.com/questions/3787886/producing-a-correct-and-complete-xml-unit-testing-report-for-hudson (bug 725478). This is using the Ant (Java) XML "schema".

For Marionette (bug 774419), it appears the XML schema for xUnit.net was used (https://xunit.codeplex.com/wikipage?title=XmlFormat).

They are mostly the same, but there are some differences.

Similar but different XML formats make me sad. I guess we need to decide if we should support 1 or both. Ugh.
:gps: yes, I used the codeplex.com one because that's the one :jgriffin linked me to. :)

I don't mind switching to another format, just tell me which one we should support. I'd say it's better to only have one, this being a generic/shared library and all.
+1 on having a single format, preferably one that gets us what we want
Comment on attachment 645101 [details] [diff] [review]
Support for importing unittest (example: Marionette) results; xUnit output

+        self.env = env or {}

Is this supposed to be the system environment variables?  If so, why not os.environ ?

(Some comments would help)

TestContext should probably use mozinfo to gather machine information.  

arch should be None by default, not the empty string.

I'm not sure I see the point of the Results class as it is.  You essentially list the same thing three times: 'PASS' vs PASS vs PASS in POSSIBLE_RESULTS.  Also, since the latter is a tuple, it cannot be altered.  Is that desired?  If so, why? (And it should be documented.)  If the only thing that is desired is POSSIBLE_RESULTS, we should just make a global variable containing the strings, since we aren't gaining anything over that with this implementation.

+        # assert isinstance(context, TestContext)
Please remove

+    def finish(self, result, time_end=None, output=[], reason='N/A'):

Don't use mutable default function arguments.

reason should probably be None by default
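
For illustration, the same signature without the mutable default (a sketch, not the exact patch code):

class TestResult(object):
    # other attributes/methods elided

    def finish(self, result, time_end=None, output=None, reason=None):
        """Record the outcome of the test."""
        self.result_actual = result
        self.time_end = time_end
        # build a fresh list per call instead of sharing one default list
        self.output = output if output is not None else []
        self.reason = reason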

I'm not sure if we need to keep time_taken as an attribute

+            t = TestResult(name=str(test).split()[0], test_class=get_class(test),
+                           time_start=0, result_expected=result_expected)
+            t.finish(result_actual, time_end=0, reason=relevant_line(output),
+                     output=output)

Why 0 here?  


+    def add_unittest_result(self, result):
+        """ Adds the unittest result provided to the collection"""

Should definitely note a *python* unittest result

+def from_unittest_results(results_list):

If this is the way that (python) unittest results should be populated, I would prefer

- a subclass just for python unittest results
or
- a classmethod that makes an instance based on unittest results
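
Roughly what the classmethod route could look like (a sketch; add_unittest_result is the method from the patch, the rest is illustrative):

class TestResultCollection(list):
    """Collection of TestResult instances."""

    def add_unittest_result(self, result):
        """Convert a python unittest result and append it (body elided)."""

    @classmethod
    def from_unittest_results(cls, results_list):
        """Build a collection from python unittest result objects."""
        collection = cls()
        for result in results_list:
            collection.add_unittest_result(result)
        return collection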

+    def to_xunit(self):

This should go in a whole separate file, e.g. output.py, inheriting from an abstract base class that takes a test collection and yields a result.  Output should not live on the results class.  The results class is there to store results; output classes will be there to format them.
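
In other words, something like this in a separate output.py (class names are placeholders):

# output.py -- formatting lives here, not on the results classes
from abc import ABCMeta, abstractmethod


class Output(object):
    """Abstract base class for turning a results collection into output."""
    __metaclass__ = ABCMeta

    @abstractmethod
    def serialize(self, results_collection):
        """Return a string representation of the results collection."""


class XUnitOutput(Output):
    def serialize(self, results_collection):
        # build the xUnit XML from the collection (details elided)
        return '<testsuite tests="%d"/>' % len(list(results_collection))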
Attachment #645101 - Flags: feedback?(jhammel)
(In reply to Jeff Hammel [:jhammel] from comment #60)
> Comment on attachment 645101 [details] [diff] [review]
> Support for importing unittest (example: Marionette) results; xUnit output
> 
> +        self.env = env or {}
> 
> Is this supposed to be the system environment variables?  If so, why not
> os.environ ?

I was thinking we would pass in os.environ if we wanted to in the constructor.

> 
> (Some comments would help)
> 
> TestContext should probably use mozinfo to gather machine information.  
> 
> arch should be None by default, not the empty string.

Ok.

> 
> I'm not sure I see the point of the Results class as it is.  You essentially
> list the same thing three times 'PASS' vs PASS vs PASS in POSSIBLE_RESULTS. 
> Also, since the latter is a tuple, it cannot be altered.  Is that desired? 
> If so why (and it should be documented).  If the only thing that is desired
> is POSSIBLE_RESULTS, we should just make a global variable containing the
> strings since we aren't getting any information over this in this
> implementation

We were talking about having an enum of them, rather than passing in strings, so we don't end up with bugs caused by typos. It was as you said in the previous patch.

> 
> +        # assert isinstance(context, TestContext)
> Please remove
> 
> +    def finish(self, result, time_end=None, output=[], reason='N/A'):
> 
> Don't use mutable default function arguments.
> 
> reason should probably be None by default

Ok.

> 
> I'm not sure if we need to keep time_taken as an attribute

If we don't, there's no way of knowing how much time the unittests took.

> 
> +            t = TestResult(name=str(test).split()[0],
> test_class=get_class(test),
> +                           time_start=0, result_expected=result_expected)
> +            t.finish(result_actual, time_end=0,
> reason=relevant_line(output),
> +                     output=output)
> 
> Why 0 here?

There's no way of knowing how much time each test from the unittest suite took, so we need to pass in some dummy (int) values.

> 
> 
> +    def add_unittest_result(self, result):
> +        """ Adds the unittest result provided to the collection"""
> 
> Should definitely note a *python* unittest result

Ok, I figured that since it's written as a single word (and not "unit test") it would be obvious. Will change.

> 
> +def from_unittest_results(results_list):
> 
> If this is the way that (python) unittest results should be populated, I
> would prefer
> 
> - a subclass just for python unittest results
> or
> - a classmethod that makes an instance based on unittest results

Ok, I'll make it a classmethod.

> 
> +    def to_xunit(self):
> 
> This should go in a whole separate file, e.g. output.py, inheriting from an
> abstract base class that takes a test collection and yields a result. 
> Output should not live on the results class.  the results class is there to
> store results.  output classes will be there to format them

It seemed to "java-like", but ok. :)
Attachment #645101 - Attachment is obsolete: true
Attachment #645101 - Flags: feedback?(jgriffin)
Attachment #645101 - Flags: feedback?(ahalberstadt)
Attachment #645444 - Flags: feedback?(jhammel)
Comment on attachment 645444 [details] [diff] [review]
separated output; update according to feedback

So I'm still not sure about enumerating results, e.g.

PASS = 'PASS'

As I'm not sure what that buys us vs

POSSIBLE_RESULTS = set(['PASS', 'FAIL', ...])

Overall this is looking good.

I wonder how output classes should in general deal with output.  There are three usual cases:

1. You want to output results to a file
2. You want to output results to the screen
3. You want to upload results (e.g. HTTP POST) to a server

A few ways of doing this:

- Output.__call__ can always return a string.  Then the caller can decide what to do with this
- Output.__call__ can optionally accept a "file like object" which is written to.  If None is specified, the function can make a StringIO instance and call getvalue on at the end

I imagine at some point we're going to want something to route output meaningfully, but probably not until the pattern becomes apparent.

:jgriffin, I'd love any feedback you could give on this
Attachment #645444 - Flags: feedback?(jhammel) → feedback+
(In reply to Jeff Hammel [:jhammel] from comment #63)

> I wonder how output classes should in general deal with output.  There are
> three usual cases:
> 
> 1. You want to output results to a file
> 2. You want to output results to the screen
> 3. You want to upload results (e.g. HTTP POST) to a server
> 
> A few ways of doing this:
> 
> - Output.__call__ can always return a string.  Then the caller can decide
> what to do with this
> - Output.__call__ can optionally accept a "file like object" which is
> written to.  If None is specified, the function can make a StringIO instance
> and call getvalue on at the end
> 
> I imagine at some point we're going to want something to route output
> meaningfully, but probably not until the pattern becomes apparent.
> 
> :jgriffin, I'd love any feedback you could give on this

So in general we want to either:

 1 provide some serialized output, either to a file or stdout
 2 post some data somewhere, presumably using an existing library

For cases like Datazilla and Autolog, I think the output class should handle 2, but we'd also want to provide access to serialized output for logging/debugging, assuming that's available from the underlying package.

For this reason, maybe we should use explicit methods rather than __call__?  If an xUnit class uses __call__ to generate its xUnit output to a file or stdout, what should __call__ do for the Datazilla case?  Give you the serialized output, or actually post to Datazilla?  To avoid this confusion, maybe we should have serialize() and post() methods, or something similar.
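
In code, the explicit-method split could look roughly like this (DatazillaOutput here is purely illustrative and elides the actual client calls):

class Output(object):
    """Base class using explicit serialize()/post() instead of __call__."""

    def serialize(self, results_collection):
        """Return the formatted results as a string."""
        raise NotImplementedError

    def post(self, results_collection):
        """Send the results to a remote service (Datazilla, Autolog, ...)."""
        raise NotImplementedError


class DatazillaOutput(Output):
    def serialize(self, results_collection):
        # return the JSON payload we would send (construction elided)
        return '{"results": []}'

    def post(self, results_collection):
        payload = self.serialize(results_collection)
        # hand payload to the datazilla client library here (elided);
        # return it so callers can log/debug the serialized form
        return payload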
(In reply to Jeff Hammel [:jhammel] from comment #63)
> Comment on attachment 645444 [details] [diff] [review]
> separated output; update according to feedback
> 
> So I'm still not sure about enumerating results, e.g.
> 
> PASS = 'PASS'
> 
> As I'm not sure what that buys us vs
> 
> POSSIBLE_RESULTS = set(['PASS', 'FAIL', ...])
> 

I don't really care about it; I can change it if everybody agrees.
Attached patch Update according to c63 & c64 (obsolete) — Splinter Review
Switched to serialize and post (dispatching based on scheme, like in http://hg.mozilla.org/build/talos/file/ee67e01c9d58/talos/output.py#l52)
Switched to strings for possible results
Attachment #645444 - Attachment is obsolete: true
Attachment #646690 - Flags: feedback?(jhammel)
Attachment #646690 - Flags: feedback?(jgriffin)
I think if you agree with the patch's contents, we should focus a bit on the Context thing so we have a usable v0.1, land that, and then anybody will be welcome to add new features (like output to Datazilla). Sounds like a plan?
Comment on attachment 646690 [details] [diff] [review]
Update according to c63 & c64

+    @abstractmethod
+    def serialize(self, results_collection):
+        pass

I generally use a docstring for ABC methods.  This avoids a 'pass' body and documents what the function should do.

+    def serialize(self, results_collection):
+        return self.to_xunit(results_collection)

Why not just have the body of to_xunit be the serialize method?

To be honest, I probably wouldn't do a post method yet.  It will be hard to generalize.

Looking good!
Attachment #646690 - Flags: feedback?(jhammel) → feedback+
Comment on attachment 646690 [details] [diff] [review]
Update according to c63 & c64

Looks good to me too.

I like Jeff's earlier idea about having serialize take an optional parameter that is a file-like object, and if it exists, write to that, otherwise return a string.  But that's a minor detail.

I think having 'post' write to a file is kind of weird.  Probably, we should just raise NotImplementedError for 'post' for something like XUnitOutput; it would only be implemented by something, like Datazilla or Autolog, which actually sends the data somewhere.
Attachment #646690 - Flags: feedback?(jgriffin) → feedback+
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #646690 - Attachment is obsolete: true
Attachment #647280 - Flags: review?(jhammel)
Attachment #647280 - Flags: review?(jgriffin)
Comment on attachment 647280 [details] [diff] [review]
v0.1

Review of attachment 647280 [details] [diff] [review]:
-----------------------------------------------------------------

::: moztest/moztest/output.py
@@ +6,5 @@
> +from abc import abstractmethod
> +import xml.dom.minidom as dom
> +
> +
> +class Output(object):

How will you handle deserializing back to a results collection? Are you going to create an input.py module and fragment code related to the same serialization format? How about renaming this Serializer and keeping the door open to perform both input and output?

@@ +11,5 @@
> +    """ Abstract base class for outputting test results """
> +
> +    @abstractmethod
> +    def serialize(self, results_collection):
> +        """ Returns a string representation of the results """

I would have this write to a file object instead of returning a string. If someone really wants a string, they can pass a StringIO.
Comment on attachment 647280 [details] [diff] [review]
v0.1

+    def tests_with_result(self, result):
+        """ Returns a generator of TestResults with the PASS result """
+        return self.filter(lambda t: t.result_actual == result)

The function name is a little misleading, as is the docstring.  This is passing tests, which could be tests that fail and are expected to fail.

+            if type(text) is list:
+                text = '\n'.join(text)

Should probably be "if not isinstance(text, basestring):" unless there is a reason not to do this

+class TestContext(object):

This should probably at least take the info from mozinfo vs passing in arch.
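
For example (attribute names here are illustrative; mozinfo.info is the dict of system facts mozinfo exposes):

import os

import mozinfo


class TestContext(object):
    """Sketch: gather machine info from mozinfo instead of an arch argument."""

    def __init__(self, hostname='localhost', env=None):
        self.hostname = hostname
        self.env = env if env is not None else os.environ.copy()
        # pull machine facts from mozinfo rather than making callers pass them
        self.os = mozinfo.info.get('os')
        self.arch = mozinfo.info.get('processor')
        self.bits = mozinfo.info.get('bits')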

This looks really good. I'm going to r- for mozinfo in TestContext, but other than that I am fine with landing.

This should also have a setup.py and, ideally, a README (and, even more ideally, tests).
Attachment #647280 - Flags: review?(jhammel) → review-
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #647280 - Attachment is obsolete: true
Attachment #647280 - Flags: review?(jgriffin)
Attachment #647347 - Flags: review?(jhammel)
Attachment #647347 - Flags: review?(jgriffin)
Setup.py & README coming.
Attached patch v0.1 setup.py and README.md (obsolete) — Splinter Review
Attachment #647351 - Flags: review?(jhammel)
Attachment #647351 - Flags: review?(jgriffin)
Comment on attachment 647351 [details] [diff] [review]
v0.1 setup.py and README.md

Looks good to me.
Attachment #647351 - Flags: review?(jhammel) → review+
(In reply to Mihnea Dobrescu-Balaur from comment #73)
> Created attachment 647347 [details] [diff] [review]
> v0.1

I had actually thought that dumps, for example, was going to call serialize with a StringIO buffer and return its getvalue().  Maybe :gps or :jgriffin can comment there?

Also, only one of dumps or serialize needs to be an abstract method.  In your incarnation, serialize can go in the base class.  In my incarnation, dumps can go in the base class.

I don't really like the name dumps.  Sure, it's what json uses, but it doesn't read well in my opinion.  I'm fine with dump_string(), string(), or even to_string(). But I won't block on this.
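
Concretely, something like this, where only serialize is abstract and the string form is derived from it (dump_string being one of the candidate names above):

from abc import ABCMeta, abstractmethod
from StringIO import StringIO


class Output(object):
    """Sketch: the string form wraps the file-writing form."""
    __metaclass__ = ABCMeta

    @abstractmethod
    def serialize(self, results_collection, fp):
        """Write the formatted results to the file-like object fp."""

    def dump_string(self, results_collection):
        """Return the formatted results as a string."""
        buf = StringIO()
        self.serialize(results_collection, buf)
        return buf.getvalue()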
Comment on attachment 647351 [details] [diff] [review]
v0.1 setup.py and README.md

Actually, I hate to reverse what I said, but one very important thing: mozinfo should be added to the deps.
Attachment #647351 - Flags: review+ → review-
(In reply to Jeff Hammel [:jhammel] from comment #77)
> (In reply to Mihnea Dobrescu-Balaur from comment #73)
> > Created attachment 647347 [details] [diff] [review]
> > v0.1
> 
> I had actually thought that that e.g. dumps was going to call serialize with
> a StringIO buffer and return its getvalue().  Maybe :gps or :jgriffin can
> comment there?
> 

Yes, that's what I'd thought as well.
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #647347 - Attachment is obsolete: true
Attachment #647351 - Attachment is obsolete: true
Attachment #647347 - Flags: review?(jhammel)
Attachment #647347 - Flags: review?(jgriffin)
Attachment #647351 - Flags: review?(jgriffin)
Attachment #647686 - Flags: review?(jhammel)
Attachment #647686 - Flags: review?(jgriffin)
Comment on attachment 647686 [details] [diff] [review]
v0.1

+Package for handling Mozilla test results.
+
+
+
+## Usage example

I could live with one space less

+This shows how you can create an xUnit representation of Marionette
+(or python unittest) results.

How about just "...xUnit representation of python unittest results"?

+        self.env = os.environ

Might be better to do os.environ.copy() since this could change in testing (albeit, neither is fool-proof :/)

+    def __init__(self, name, test_class='', time_start=None, context=None,
+                 result_expected='PASS'):

It'd be nice to have a docstring here that tells what is expected.

+        assert isinstance(name, basestring)
+        assert result_expected in TestResult.POSSIBLE_RESULTS

It would also be nice to have assertion messages, e.g. 

assert result_expected in self.POSSIBLE_RESULTS, "Result '%s' not in possible results: %s" % (result_expected, ', '.join(self.POSSIBLE_RESULTS))

in any case, this should probably be 'self.POSSIBLE_RESULTS' for subclassing

+    deps += ['simplejson']

I'd probably do deps.append('simplejson') but it doesn't really matter

So this looks good, and I think it's about ready to check in.

The only real problem I have is that it is not 2.4 compatible.  I'm not 100% sure it is 2.5 compatible.  From a mozbase point of view, I'm not a huge fan of maintaining this backwards compatibility.  However, from a "help! I'm a talos developer trapped in 2.4!" point of view, I would like to move talos towards using this architecture.  Anyway, I don't want to block on this, as long as we're aware of the issue.  In any case, https://wiki.mozilla.org/Auto-tools/Projects/MozBase#Overview should be updated to reflect reality *AND* we need to not have anything that Talos uses require 2.5 or greater :(
Attachment #647686 - Flags: review?(jhammel) → review+
Attached patch v0.1Splinter Review
Attachment #647686 - Attachment is obsolete: true
Attachment #647686 - Flags: review?(jgriffin)
Attachment #647738 - Flags: review?(jhammel)
Attachment #647738 - Flags: review?(jgriffin)
Comment on attachment 647738 [details] [diff] [review]
v0.1

Cool!  lgtm if jhammel is happy with it.
Attachment #647738 - Flags: review?(jgriffin) → review+
Comment on attachment 647738 [details] [diff] [review]
v0.1

Looks great!
Attachment #647738 - Flags: review?(jhammel) → review+
pushed: https://github.com/mozilla/mozbase/commit/08d4e0d2ff68e2c1855261ba053ced61a59af894
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED