Closed Bug 774419

New package: MozTest (or similar) for Mozbase

Categories: Testing :: Mozbase (enhancement)
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED

People

(Reporter: k0scist, Assigned: mihneadb)

Attachments

(2 files, 9 obsolete files)

738 bytes, text/plain
14.04 KB, patch (k0scist: review+, jgriffin: review+)
See bug 773394 for the origin of this discussion and https://groups.google.com/forum/#!topic/mozilla.tools/FJYN4wg4erk for the tools group discussion.

It would be nice to have a Mozbase package that would consolidate test results.  This includes the xunit writing code from bug 773394, but could include autolog reporting, datazilla reporting, etc.

We should spec out an API and figure out what goes there and file bugs for existing software to be brought under this banner.
(In reply to Jeff Hammel [:jhammel] from comment #0)

> We should spec out an API and figure out what goes there and file bugs for
> existing software to be brought under this banner.

Datazilla_client is one: https://github.com/mozilla/datazilla_client ; or the package could use this as a dependency, either in the interim or in the long run.
As long as we're in general agreement on the code, I'm more than happy to hand off the spec'ing to whoever.
I agree with this package. I just want a clear definition of "test" vs "result" and how "result" ties into "log" (continuing discussion from the other bug).

Is there a case where:
A) We would use the test class but not the result class (can't think of one)
B) We would use the result class but not the test class (possibly, but still seems edge case-y)

Maybe a separate results.py is good enough (as opposed to a separate module)?

How would a results class affect existing harnesses that use mozbase? Is the results class now responsible for logging pass/fails or does it simply return an object to the harness? I'd imagine the latter, but then we need to keep in mind that several harnesses depend on receiving a continuous stream of pass/fail data for buildbot.

I'm just a little confused at how it all will fit together at the moment.
(In reply to Andrew Halberstadt [:ahal] from comment #3)
> I agree with this package. I just want a clear definition of "test" vs
> "result" and how "result" ties into "log" (continuing discussion from the
> other bug).
> 
> Is there a case where:
> A) We would use the test class but not the result class (can't think of one)
> B) We would use the result class but not the test class (possibly, but still
> seems edge case-y)

For A. I can't think of any off the top of my head, though I would guess there are cases.

For B., this seems likely if you are post-processing results separately from running tests and producing output.  For instance, let's say you have one set of results you want to combine with another set and crunch some numbers on (say, "all failed tests", just to keep it easy).  I'm not trying to make things hard, but in general I'm much more afraid of trying to un-mix soup than of having a soup that's not mixed enough.

> Maybe a separate results.py is good enough (as opposed to a separate module)?

Package, you mean?  I don't know.  Even if we go with a single file, I'd prefer to have it be a separate package.  It isn't part of logging, exactly, unless we're defining logging == "all output".

> How would a results class affect existing harnesses that use mozbase? Is the
> results class now responsible for logging pass/fails or does it simply
> return an object to the harness? 

My opinion (which I don't claim is the One True Opinion) is that results will be a class where the harness accumulates results.  See, for example, https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/__init__.py#L43 or http://hg.mozilla.org/build/talos/file/1f8f47452048/talos/results.py (which is overly complex, but ignoring that).  In both cases, these basically do the following:

* provide a place to accumulate results
* provide some (very) basic methods to inspect
* provide places where output to various formats may plug in so that one can do: results => output
* some other framework-specific bells and/or whistles

The last should be doable via subclassing (again, IMHO, there are many architecture possibilities).

> I'd imagine the latter, but then we need to
> keep in mind that several harnesses depend on receiving a continuous stream
> of pass/fail data for buildbot.

Yes, I'd still say the harness's job is to gather results (again, my opinion).  I'd say MozResultsTestWhatever is a place to put them and output from them.  Other opinions welcome

> I'm just a little confused at how it all will fit together at the moment.

Does my sketch above make things clearer?  That's my vision, anyway.
Good questions, Andrew!

I think the test result classes should be more or less dumb objects that hold data. Sure, they have some methods for adding content and loading/saving from/to other formats. The important bit is they are agnostic as to how things arrived there.

To support the case of "streaming" results, I think generic callbacks on the TestSuiteResults container class would do just fine.

def onTestResult(suite, result):
    if result.success:
        print 'TEST-PASS...'
    else:
        print 'TEST-FAIL...'

This could be installed by the test runner.
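For illustration, a rough sketch of how a runner might install such a callback; TestSuiteResults and its method names are assumptions here, not existing mozbase API:

class TestSuiteResults(object):
    def __init__(self):
        self.results = []
        self.callbacks = []            # callables taking (suite, result)

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def add_result(self, result):
        self.results.append(result)
        for callback in self.callbacks:
            callback(self, result)     # fire as soon as a result lands

suite = TestSuiteResults()
suite.add_callback(onTestResult)       # onTestResult as defined above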

FWIW, I think our existing reliance on parsing output of tests during the whole chain should be marginalized as much as possible. We should aim to have machine-consumable test results which don't require that every downstream system invent a parser. At most, there should be one parser: the test suite runner. This should normalize everything as far as it can so test results are fully machine readable and don't require additional parsing.

I'm not advocating for getting rid of human-readable output (you can still convert to that). But we shouldn't rely on the human-readable form as input for any kind of post-run crunching.
See also some of my ideas in bug 774132 and comments in the README.rst on bug 751795. This might be easier to explain in person or IRC if you have any questions.
(In reply to Andrew Halberstadt [:ahal] from comment #3)
> I agree with this package. I just want a clear definition of "test" vs
> "result" and how "result" ties into "log" (continuing discussion from the
> other bug).
> 
> Is there a case where:
> A) We would use the test class but not the result class (can't think of one)
> B) We would use the result class but not the test class (possibly, but still
> seems edge case-y)
> 

I think B can occur if we try to assure backwards compatibility. For example, the test results in the xpcshell harness are stored as manually created dictionaries. We could just convert those to test-result-class instances without needing to rewrite the whole run_tests part.
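A sketch of that backwards-compatibility shim, assuming a TestResult class along the lines discussed in this bug; the dict keys here are illustrative, not the actual xpcshell fields:

def dict_to_result(d):
    # convert one xpcshell-style result dict into a result object
    result = TestResult(name=d['name'])
    result.result_actual = 'PASS' if d.get('passed') else 'FAIL'
    result.output = d.get('output', [])
    return result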
Attached file example
So we probably should find an owner for this.  In basic form I think we'll want....

* a TestResults object for holding all test results
* an object to hold an individual test result
* tests can have names, a status (pass/fail are the most basic, although this could be any number), and probably a description of what happened (or maybe None if pass)
** and probably an arbitrary key-value pair

There should be output classes.  These are classes that take results and *do stuff with them* like post to datazilla: https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.py

See also https://developer.mozilla.org/en/Test_log_format

I'm happy to take or co-take this if no one else has time or if this is blocking.  While the first implementation does not have to be the Be All End All, I would like to see a little thought put into this such that we don't regret it a few months from now.
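To make the output-class idea concrete, a rough sketch; the class name, the attribute names on the results, and the hand-rolled (unescaped) XML are simplifications for illustration, not a proposed API:

class XUnitOutput(object):
    """Takes a results container and writes a (very naive) xUnit-style file."""

    def __init__(self, filename='results.xml'):
        self.filename = filename

    def serialize(self, results):
        lines = ['<testsuite tests="%d">' % len(results)]
        for r in results:
            lines.append('  <testcase name="%s" result="%s"/>'
                         % (r.name, r.result_actual))
        lines.append('</testsuite>')
        with open(self.filename, 'w') as f:
            f.write('\n'.join(lines))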
(In reply to Jeff Hammel [:jhammel] from comment #8)
> * a TestResults object for holding all test results
> * an object to hold an individual test result

+1

> * tests can have names, a status (pass/fail are the most basic, although
> this could be any number), and probably a description of what happened (or
> maybe None if pass)
> ** and probably an arbitrary key-value pair

I'd advocate for a stronger API and not allowing arbitrary key-value pairs. The reason is you are dealing with a format that will be used by a lot of different systems, some of which may not be under your direct control. If you leave them to their own ways, they will invent their own methods of doing things and you are back at the original problem of having N solutions to the same problem.

Force people to produce a known format so there are no one-offs and downstream tools can consume this one format. Adding new fields should be simple: just add a property/method to the base class. So, people shouldn't complain about that being a barrier.

> There should be output classes.  These are classes that take results and *do
> stuff with them* like post to datazilla:
> https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.
> py

I concur. Test runners can then instantiate whatever output class is appropriate (maybe several) and do the right thing.

> I'm happy to take or co-take this if no one else has time or if this is
> blocking.

As stated on IRC, this is not blocking and is probably low priority. Bug 773394 was originally filed as a DRY violation, nothing more.
(In reply to Jeff Hammel [:jhammel] from comment #8)
> Created attachment 642774 [details]
> example
> 
> So we probably should find an owner for this.  In basic form I think we'll
> want....
> 
> * a TestResults object for holding all test results
> * an object to hold an individual test result
> * tests can have names, a status (pass/fail are the most basic, although
> this could be any number), and probably a description of what happened (or
> maybe None if pass)
> ** and probably an arbitrary key-value pair
> 
I'd add duration to the standard list of attributes a test can have (which can be None if the harness doesn't care about test duration).  

Should we allow tests to distinguish between filename and name?  For Marionette tests (and other things using unittest), there's both a filename, and a testname, e.g., test_execute_script.py (filename), and test_execute_script.TestExecuteChrome.test_check_window (combination of filename, class and method names).  In this case, it's nice to have filename as a separate property, in case we will be using it to locate files (to generate a manifest, for example).

For tests that don't need to distinguish these, name could be == filename.

For status, maybe:  pass, fail, known-fail, unexpected-success (or should those just be lumped in with fail?), skipped, error (test harness error as opposed to test failure).
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> I'd add duration to the standard list of attributes a test can have
> (which can be None if the harness doesn't care about test duration).  

I think we should *always* care about duration, even if the test harness today doesn't. Tracking it is one more piece of data. It is cheap to keep and you never know when it may come in handy.

Also, I would generally say to not record duration directly. Instead, log the start and end wall times explicitly. By logging absolute times instead of a range, you can do things like visualize how tests are running in parallel. You lose this if you just have a duration.
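A tiny sketch of the idea; the attribute names are assumptions and Record is just a stand-in object:

import time

class Record(object):              # stand-in for a test result object
    pass

result = Record()
result.time_start = time.time()    # absolute wall time at start
# ... run the test here ...
result.time_end = time.time()      # absolute wall time at end
duration = result.time_end - result.time_start   # derived when needed, not stored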

> Should we allow tests to distinguish between filename and name?

Yes. Filenames can change over time and if test names are constant, this can be used to track identical tests across renames.

> For status, maybe:  pass, fail, known-fail, unexpected-success (or should
> those just be lumped in with fail?), skipped, error (test harness error as
> opposed to test failure).

Store expected result and actual result separately.
The TestResults object will also need a variety of attributes to describe the environment that the tests were run in, e.g., https://github.com/mozilla/datazilla_client/blob/master/datazilla/datazilla.py#L55.

This is trickier than it sounds, because we already use different terms for the same os/platform/harness in different pieces of Mozilla infrastructure, and although we can't fix this problem with this package, we don't want to make it worse, either.
So as this is low-priority (I think?) I'm much relieved :)

I'm in favor of looser descriptions of test metadata rather than imposing more stringent ones for a few reasons:

1. Getting everyone to agree on a definition of what all test results should have may be folly
2. If we change what that is, and build on it, and then a test comes along that does not have this characteristic, it may be hard to change the test harness

I'm not against e.g.

def add(self, status, name, description, duration=None, filename=None, **metadata)

OTOH, I would hate to block new frameworks etc on defining too tight of an API.

Duration would be great to have for harnesses that have access to such information, but I'd hate to require it.

filename vs name would be great if you have file-based tests.  Is this always the case? (There are also at least URLs.  You could have more than one test per file.  Etc.)

We probably should enumerate all of our actual statuses more formally.

In any case, this is all bikeshedding without alternate implementations.  I'd love to see an implementation that was both as tight as possible but also easily extensible.  Along with the argument that "if you give people a dict, they'll abuse it" goes the argument "If you make something hard to do with the API, people will hack around it".  Both extremes are bad.
We should also consider BrowserTestResults, which will be a large but important subclass.  In general, we should probably record extensions, Firefox version (buildid, etc.), preferences, and anything else about the browser environment.  We should probably also transmit (if not add directly to results) the platform information from mozinfo.
(In reply to Jeff Hammel [:jhammel] from comment #13)
> I'm in favor of looser descriptions of test metadata rather than imposing
> more stringent ones for a few reasons:
> 
> 1. Getting everyone to agree on what defines what all test results should
> have may be folly
> 2. If we change what that is, and build on it, and then a test comes along
> that does not have this characteristic, it may be hard to change the test
> harness

I never said all these things should be required ;)

Make the core set of required attributes small. Test name. Expected result. Actual result. Everything else is optional. Maybe an outputter requires an additional set of attributes to work. If so, the burden is on the test runner to populate them to get that feature.

This still maintains flexibility while imposing some law and order.
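A minimal sketch of that shape, purely for illustration (not the final moztest API):

class TestResult(object):
    """Required core: name, expected result, actual result."""

    def __init__(self, name, result_expected='PASS'):
        self.name = name
        self.result_expected = result_expected
        self.result_actual = None     # set when the test finishes
        # everything else is optional and can stay None
        self.duration = None
        self.filename = None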
(In reply to Jeff Hammel [:jhammel] from comment #13)
> def add(self, status, name, description, duration=None, filename=None,
> **metadata)

Please see our discussion on bug 771030, where the usage of a ** argument blocks the extensibility of the API. I would vote that we do not make use of it and instead use real keyword arguments here which default to None.
https://etherpad.mozilla.org/test-result-data-model now contains a proposal for the data model.

I think this supersedes what was being discussed in the other bug, so I'm replacing the URL.
Assignee: nobody → mbalaur
I have two questions:

1. What results do we accept for the tests? (PASS, FAIL, SKIP etc)

2. What kind of results lists do we want to get from the TestResultCollection? (successful, failed, todo etc)
As discussed on IRC, the way of creating the test results will be:

1. instantiate the TestResult object
2. run the test
3. call finish(status) on the TestResult object
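A usage sketch of that flow; the constructor and finish() signatures, run_test(), and collection are all stand-ins based on this discussion:

result = TestResult(name='test_login', result_expected='PASS')
try:
    run_test()                   # hypothetical test body
    result.finish('PASS')
except AssertionError:
    result.finish('FAIL')
collection.append(result)        # collection: a results container instance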
Attachment #643208 - Flags: feedback?(jhammel)
Attachment #643208 - Flags: feedback?(jgriffin)
Attachment #643208 - Flags: feedback?(gps)
Attachment #643208 - Flags: feedback?(ahalberstadt)
(In reply to Mihnea Dobrescu-Balaur from comment #19)
> I have two questions:
> 
> 1. What results do we accept for the tests? (PASS, FAIL, SKIP etc)
> 
> 2. What kind of results lists do we want to get from the
> TestResultCollection? (successful, failed, todo etc)

Some of these can be gleaned from https://developer.mozilla.org/en/Test_log_format . For 1. I think PASS, FAIL, SKIP are a good start.  We can always add more.

For 2., I think the short answer is we'd like to get as many as possible.  The ones you mention are interesting.  "All tests" of course.  If we care about more then it could be cool to look towards having some sort of 'query()' method for it, but this is probably YAGNI at this point
For 1), this is what mozlog currently defines: https://github.com/mozilla/mozbase/blob/master/mozlog/mozlog/logger.py#L21 (though nothing really uses mozlog yet).
A better list is probably:
https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/logger.py#L36

I have taken all those results from mochitests.
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

I think it is looking good!

::: moztest/moztest/results.py
@@ +14,5 @@
> +    """ Stores test result data """
> +
> +    POSSIBLE_RESULTS = ('PASS', 'FAIL', 'SKIP', 'ERROR')
> +
> +    def __init__(self, name, filename=None, context=TestContext(''), result_expected='PASS'):

We *may* want to move to a model where we have multiple "context" types. This could be changed later, as the scope of a context increases.

@@ +27,5 @@
> +        self.time_end = None
> +        self.result_expected = result_expected
> +        self.result_actual = None
> +        self.finished = False
> +        self._output = ""

I think a list of strings would be better. Many test runners will read line output as it is available. I think it makes more sense to buffer in the test result and to retain the original structure of this data. We can bikeshed over whether the strings inside the list should have newline characters. I would vote for no.
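A small sketch of that buffering, purely illustrative:

output_lines = []
for line in ['TEST-START test_foo\n', 'TEST-PASS test_foo\n']:   # stand-in for runner output
    output_lines.append(line.rstrip('\n'))    # store without newline characters
flat_log = '\n'.join(output_lines)            # join only when a single blob is needed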
Attachment #643208 - Flags: feedback?(gps) → feedback+
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

+    POSSIBLE_RESULTS = ('PASS', 'FAIL', 'SKIP', 'ERROR')

I would probably not make this a tuple so that subclasses can (more easily) extend this.

+        self.filename = filename or name
I thought we decided that filename was optional and so could be added later?


+    def __str__(self):
+        return '%s - %s' % (self.name, self.result_actual)

I would ideally love to see the default format follow https://developer.mozilla.org/en/Test_log_format (assuming the test is done).

Are result_actual and finished redundant?

So for some test harnesses (e.g. Talos), we may record the start and end time separately.  Maybe it's better to have these as optional arguments to __init__ and finish() and only calculate them if they're None?

+    @property
+    def context(self):
+        """ The TestContext in which the test was run """
+        return self._context
+
+    @context.setter
+    def context(self, c):
+        assert isinstance(c, TestContext)
+
+        self._context = c

So I find the context stuff a bit confusing:

- you can set this in __init__ and, if not, we create an instance for you (IMHO, we should not create an instance by default, although it's hard to figure out from this code.  Maybe we shouldn't have TestContext at all until that is more than just a stub?)

- you can override this at any time with the setter

In general, I find setters and getters that don't do anything but type-checking to be more of a hindrance than a blessing.

The same goes for output.  Maybe output should be an argument to .finish()? Also, are we going to enforce a string?  

+    @property
+    def contexts(self):
+        """ List of unique contexts for the test results contained """
+        cs = [tr.context for tr in self]
+        return list(set(cs))

I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test

+    @property
+    def passed(self):
+        """ Returns a list of TestResults with the PASS result """
+        return self.filter(lambda t: t.result_actual == 'PASS')
+
+    @property
+    def failed(self):
+        """ Returns a list of TestResults with the FAIL result """
+        return self.filter(lambda t: t.result_actual == 'FAIL')
+
+    @property
+    def skipped(self):
+        """ Returns a list of TestResults with the SKIP result """
+        return self.filter(lambda t: t.result_actual == 'SKIP')

To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.  In the three-item case, what is here works.  However, if we have more than three states, then you might have more than one PASS, FAIL state.  If we are only going to have 'PASS', 'FAIL', 'SKIP' (not what I would advocate!), then we should probably make them not strings and break finish() into pass(), fail(), skip(), etc

Overall a good start.  I do think there's a bit to consider here, though.

For one, I'd love to see one or more existing harnesses be used as input here.  While I empathize with doing it right "from scratch", there are a lot of lessons in the existing harnesses.  Since the goal is to support them, we will want to ensure that whatever we're doing here can do so.
Attachment #643208 - Flags: feedback?(jhammel)
Attachment #643208 - Flags: feedback?(gps)
Attachment #643208 - Flags: feedback+
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

I also think this is a good start.

I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').

We also need to support several different modes of operation.

1 - the type that this model is most closely adapted to right now: populating results realtime via some Python testrunner
2 - populating results by post-processing of a log file.  In this case, we'd need the ability to specify things like start and end times in the same call (presumably in __init__); I don't think it makes sense to call finish() in this context.
3 - populating results by near realtime streaming of log data.  We may want a separate subclass for this, and in fact we'd need to figure out how this relates to our existing logparser code, so we probably want to punt on this at this stage of design.

In case 2 above, for some test harnesses, we will have a test duration, but not start or end times; we need to support that as well in our model.
Attachment #643208 - Flags: feedback?(jgriffin) → feedback+
(In reply to Jeff Hammel [:jhammel] from comment #21)
> (In reply to Mihnea Dobrescu-Balaur from comment #19)
> 
> For 2., I think the short answer is we'd like to get as many as possible. 
> The ones you mention are interesting.  "All tests" of course.  If we care
> about more then it could be cool to look towards having some sort of
> 'query()' method for it, but this is probably YAGNI at this point

There is the filter method that just takes in a predicate. I guess that's what you meant?
(In reply to Mihnea Dobrescu-Balaur from comment #27)
> (In reply to Jeff Hammel [:jhammel] from comment #21)
> > (In reply to Mihnea Dobrescu-Balaur from comment #19)
> > 
> > For 2., I think the short answer is we'd like to get as many as possible. 
> > The ones you mention are interesting.  "All tests" of course.  If we care
> > about more then it could be cool to look towards having some sort of
> > 'query()' method for it, but this is probably YAGNI at this point
> 
> There is the filter method that just takes in a predicate. I guess that's
> what you meant?

Yep, that should be good enough for the time being
(In reply to Jonathan Griffin (:jgriffin) from comment #26)
> We also need to support several different modes of operation.
> 
> 1 - the type that this model is most closely adapted to right now:
> populating results realtime via some Python testrunner
> 2 - populating results by post-processing of a log file.  In this case, we'd
> need the ability to specify things like start and end times in the same call
> (presumably in __init__); I don't think it makes sense to call finish() in
> this context.
> 3 - populating results by near realtime streaming of log data.  We may want
> a separate subclass for this, and in fact we'd need to figure out how this
> relates to our existing logparser code, so we probably want to punt on this
> at this stage of design.

I think #2 and #3 are nearly identical.

All use cases can be accommodated by making __init__ take the fewest number of required arguments possible. Whatever creates TestResult instances can populate fields as it needs to (including the end time, bypassing finish() if it needs to). finish() is really just a convenience method, IMO. I originally proposed it to encapsulate the verification logic. Perhaps verification could be factored out and called automagically at important times?

I would also argue that we should minimize the amount of "log parsing" in our overall design. I would generally favor the end state to be tests which output machine-readable data (e.g. newline-delimited JSON arrays or objects). We'll need some form of parsing in the test runners because there is IPC no matter what. But, I think the long term goal should be to have everything normalized to TestResult instances as quickly as possible and to maintain that representation throughout all involved systems. In other words, the raw output from the test harnesses should be irrelevant and hidden from everyone except those who maintain them. If we need a human-friendly representation, we can call a method on TestResult.

This necessitates serialization/deserialization code for TestResult. Would it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?

I recognize that my vision may be a bit grandiose. The large practical concern I see is that there are systems relying on the existing log output. Yes. And that's why TestResult instances hold the raw output from the test runner. We can use that as a shim to preserve the existing harness output until downstream systems (including humans) convert to whatever the future holds.
> This necessitates serialization/deserialization code for TestResult. Would it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?

IMHO, very no.  I envision output as a completely different set of modules that take a test result container (and/or a test result) and serialize it.  I would not want to add this to the testresult class.  Test results and test results containers should hold information.  Output should be handled separately and take these as arguments.
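A sketch of that separation, with serialization living outside the result classes; the attribute names and the newline-delimited JSON format are assumptions from this thread, not a decided format:

import json

def write_json_lines(results, fileobj):
    # one JSON object per line; takes the results container as an argument
    for r in results:
        record = {'name': r.name,
                  'expected': r.result_expected,
                  'actual': r.result_actual}
        fileobj.write(json.dumps(record) + '\n')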
Comment on attachment 643208 [details] [diff] [review]
just a start for the results module

Review of attachment 643208 [details] [diff] [review]:
-----------------------------------------------------------------

+1, good start!

::: moztest/moztest/results.py
@@ +51,5 @@
> +        if self.finished:
> +            return self.time_end - self.time_start
> +        else:
> +            # returns the elapsed time
> +            return time.time() - self.time_start

I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.

@@ +111,5 @@
> +
> +    def append(self, test_result):
> +        assert isinstance(test_result, TestResult)
> +
> +        list.append(self, test_result)

Should catch list.insert() too
Attachment #643208 - Flags: feedback?(jgriffin)
Attachment #643208 - Flags: feedback?(ahalberstadt)
Attachment #643208 - Flags: feedback+
Attachment #643208 - Flags: feedback?(jgriffin) → feedback+
> @@ +111,5 @@
> > +
> > +    def append(self, test_result):
> > +        assert isinstance(test_result, TestResult)
> > +
> > +        list.append(self, test_result)
> 
> Should catch list.insert() too

I'm not really sure we should have checking in any of these methods.  What about slice insertion? What about __setitem__?  In any case, if we're inheriting from list, I think it is overkill to do type-checking.  It also kills the possibility of duck-typing.
> We *may* want to move to a model where we have multiple "context" types.
> This could be changed later, as the scope of a context increases.

As we talked on IRC, the Context thing is kind of a stub at the moment, until we actually define how we will represent it.

> I think a list of strings would be better. Many test runners will read line
> output as it is available. I think it makes more sense to buffer in the test
> result and to retain the original structure of this data. We can bikeshed
> over whether the strings inside the list should have newline characters. I
> would vote for no.

I thought we would use just a string to which we would append strings with newlines. Will change.

> I would probably not make this a tuple so that subclasses can (more easily) extend this.

Sure, I'll initiate a discussion about the representation of the context.

> I thought we decided that filename was optional and so could be added later?

It is optional, and :jgriffin said it should default to the name. Should it just be None?

> I would ideally love to see the default format follow https://developer.mozilla.
> org/en/Test_log_format (assuming the test is done

Ok, will do.

> Are result_actual and finished redundant?

Right now, yes, they are. I'll just remove finished and make it a property.

> So for some test harnesses (e.g. Talos), we may record the start and end time separately.
> Maybe its better to have these as optional arguments to __init__ and finish and only
> calculate them if they're None?

There should be only one way of using this so that it doesn't get confusing. We can adapt whatever test runner to it after that, I think.

So let's agree on a way to do this and just use that everywhere.

> The same goes for output.  Maybe output should be an argument to .finish()? Also, are we
> going to enforce a string?

Output should be allowed to be added incrementally during the run process.

Regarding type checking, I did it because that's what it seemed is wanted from the discussions. I personally don't like it and find it very "non-pythonic".

> I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test

So the container would keep track of contexts and there should be one per run (per manifest) ?

> To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.

I don't understand, can you please elaborate?

> For one, I'd love to see one or more existing harnesses be used as input here.

Ok, I'll write a test script that runs a harness and stores its results in these classes.

> I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').

So should we just add methods like known_fails = tests with expected = actual = fail ?

> We also need to support several different modes of operation. [...]

That would mean having a jack-of-all-trades class with lots of functionality. Is this what we want?

> I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.

Talked about this on IRC and the elapsed time seemed the preferred option. We should debate this, I guess.

> I'm not really sure we should have checking in any of these methods.  What about slice insertion what about setitem ? In any case, if we're inheriting from list, I think it is overkill doing type-checking.  It also kills the possibility of duck-typing

As I said above, I'm pro ducktyping and not doing type checking. It just seemed to me that this is desired. Can I just drop all the type checking?
Whoa, sorry about the long lines, I thought bugzilla would take care of those.
(In reply to Mihnea Dobrescu-Balaur from comment #33)
> > We *may* want to move to a model where we have multiple "context" types.
> > This could be changed later, as the scope of a context increases.
> 
> As we talked on IRC, the Context thing is kind of a stub at the moment,
> until we actually define how we will represent it.
> 
> > I think a list of strings would be better. Many test runners will read line
> > output as it is available. I think it makes more sense to buffer in the test
> > result and to retain the original structure of this data. We can bikeshed
> > over whether the strings inside the list should have newline characters. I
> > would vote for no.
> 
> I thought we would use just a string to which we would append strings with
> newlines. Will change.

One bad thing about appending to a string is that, in general, each append may deallocate the string, memcpy it, and replace it.

> > I would probably not make this a tuple so that subclasses can (more easily) extend this.
> 
> Sure, I'll initiate a discussion about the representation of the context.

Sounds good.

> > I thought we decided that filename was optional and so could be added later?
> 
> It is optional, and :jgriffin said it should default to the name. Should it
> just be None?

I thought in the ctor we were only taking required args? We should either do one or the other.

> > I would ideally love to see the default format follow https://developer.mozilla.
> > org/en/Test_log_format (assuming the test is done
> 
> Ok, will do.
> 
> > Are result_actual and finished redundant?
> 
> Right now, yes, they are. I'll just remove finished and make a property.
> 
> > So for some test harnesses (e.g. Talos), we may record the start and end time separately.
> > Maybe its better to have these as optional arguments to __init__ and finish and only
> > calculate them if they're None?
> 
> There should be only one way of using this so that it doesn't get confusing.
> We can adapt whatever test runner after that, I think.

So imagine this scenario:  You have a harness which launches a browser and then an extension-side JS runner starts and stops the test.  So the JS knows the start time and end time, not the python side. This will not work here.

> So let's agree on a way to do this and just use that everywhere.

Easy to say.  I think agreeing on a way should be informed by the existing harnesses, not just some ideal.

> > The same goes for output.  Maybe output should be an argument to .finish()? Also, are we
> > going to enforce a string?
> 
> Output should be allowed to be added incrementally during the run process.
> 
> Regarding type checking, I did it because that's what it seemed is wanted
> from the discussions. I personally don't like it and find it very
> "non-pythonic".
> 
> > I would be more inclined to record the contexts on the result container itself, have a current context, and have "something" add that to the latest added test
> 
> So the container would keep track of contexts and there should be one per
> run (per manifest) ?

Not necessarily.  The context may switch during a "run"/"suite".  Imagine if you do some test, change a preference, and do some more tests

OTOH, the machine information will be the same

> > To me, this is a bit magic.  If we're using strings (as we do here), I would prefer something that mapped a string to 'PASS', 'FAIL', etc. that dictates the success of a test.
> 
> I don't understand, can you please elaborate?

So we have states: PASS, FAIL, REBOOT, SKIP, ABORT, CRASH, etc.  Right now there are only three in your current patch. PASS and presumably SKIP are "good"; FAIL is bad.  But ABORT, CRASH, etc., are also bad.  I guess if you want to see the tests that are bad, you need something bucketing states into at least good and bad.

Similarly, if your expected != what you got, this is bad.  

> > For one, I'd love to see one or more existing harnesses be used as input here.
> 
> Ok, I'll write a test script that runs a harness and stores its results in
> these classes.
> 
> > I think we need to distinguish between failed tests (implicitly, unexpected failures) and known fail tests, since many of our test harnesses do.  That is, failed() should only return tests where actual_result == FAIL and expected_result == PASS; we should have another method to return expected failures (which several of our harnesses call 'todo').
> 
> So should we just add methods like known_fails = tests with expected =
> actual = fail ?
> 
> > We also need to support several different modes of operation. [...]
> 
> That would mean having a jack-of-all-trades class with lots of
> functionality. Is this what we want?

I'll defer to :jgriffin here, but I think so.

> > I can imagine this causing some confusion if a script is expecting a test to be finished when it isn't. I don't think elapsed time will be all that useful so maybe it should just return None.
> 
> Talked about this on IRC and the elasped time seemed the preferred option.
> We should debate this, I guess.

Maybe we can call this 'elapsed' or something different?  Does that help?

> > I'm not really sure we should have checking in any of these methods.  What about slice insertion what about setitem ? In any case, if we're inheriting from list, I think it is overkill doing type-checking.  It also kills the possibility of duck-typing
> 
> As I said above, I'm pro ducktyping and not doing type checking. It just
> seemed to me that this is desired. Can I just drop all the type checking?

I'm +1 on dropping type-checking.  It will be easy to add in if we actually care about it.  I think the errors in incorporating these classes into a harness with incorrect types will be fairly obvious to figure out (and, for the most part, will not change once the incorporation is done).
> > I think we need to distinguish between failed tests (implicitly, unexpected 
> > failures) and known fail tests, since many of our test harnesses do.  That is, 
> > failed() should only return tests where actual_result == FAIL and expected_result 
> > == PASS; we should have another method to return expected failures (which several 
> > of our harnesses call 'todo').
> 
> So should we just add methods like known_fails = tests with expected = actual = 
> fail ?

Yes, I think so, maybe known_failed() to be consistent with the existing failed() method.  And failed() should not return these.

> > We also need to support several different modes of operation. [...]
> 
> That would mean having a jack-of-all-trades class with lots of functionality. Is 
> this what we want?

Our current test harnesses operate in fairly different ways.  I think all we need is to provide methods that work realtime (i.e., providing start and end times automatically as TestResult objects are created/completed) and ex post facto (i.e., letting the consumer provide these explicitly).
Not to bike shed, but perhaps REBOOT could be rolled into a general "system exception" enumeration.
(In reply to Gregory Szorc [:gps] from comment #37)
> Not to bike shed, but perhaps REBOOT could be rolled into a general "system
> exception" enumeration.

Sure.  All the more reason, IMHO, to match the exact status to a class of status (could even be a (suggested) exit code, if desired)
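A sketch of the status-to-class mapping idea; the statuses and bucket names here are illustrative, not a decided set:

STATUS_CLASS = {
    'PASS': 'success',
    'SKIP': 'success',
    'FAIL': 'failure',
    'CRASH': 'failure',
    'ABORT': 'failure',
    'REBOOT': 'system-exception',
}

def failures(results):
    # everything whose status falls into the 'failure' bucket
    return [r for r in results if STATUS_CLASS.get(r.result_actual) == 'failure']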
(In reply to Gregory Szorc [:gps] from comment #29)

> 
> I would also argue that we should minimize the amount of "log parsing" in
> our overall design. I would generally favor the end state to be tests which
> output machine-readable data (e.g. newline-delimited JSON arrays or
> objects). We'll need some form of parsing in the test runners because there
> is IPC no matter what. But, I think the long term goal should be to have
> everything normalized to TestResult instances as quickly as possible and to
> maintain that representation throughout all involved systems. In other
> words, the raw output from the test harnesses should be irrelevant and
> hidden from everyone except those who maintain them. If we need a
> human-friendly representation, we can call a method on TestResult.
> 
> This necessitates serialization/deserialization code for TestResult. Would
> it be appropriate to add toJSON() and @staticmethod fromJSON() on TestResult?
> 
> I recognize that my vision may be a bit grandiose. The large practical
> concern I see is there are systems relying on the existing log output. Yes.
> And that's why TestResult instances holds the raw output from the test
> runner. We can use that as a shim to preserve the existing harness output
> until downstream systems (including humans) convert to whatever the future
> holds.

I agree this is an idealized state that we should move towards.

That said, in practical terms, the test harnesses are not always the best authority about everything that happened during a test.  Mostly, they know about test failures (although not always, see e.g. bug 677964) and maybe JS exceptions.  They generally don't know about other things which might be of interest to test authors/runners, depending on the context, including assertions, purify/valgrind errors, shutdown crashes/leaks, gonk errors (in the case of B2G), etc.  I think log parsing will always be desirable in at least some contexts.

I suppose it's an open question about how much of these non-test-failures we might want to include in this model; I'd like to retain the ability to include all of the above, again depending on context.

However, there's nothing about that which directly affects this model, except for requiring support for creating TestResult objects during post-processing.  Populating TestResults from a log should probably belong to a separate class or package.

One thing we haven't considered in our design are errors that do not belong to specific tests.  For example, when mochitest is run by TBPL, it can fail for a number of reasons that can't be pinned to a specific test.  These can be buildbot errors, or shutdown leaks/crashes, for example.  Do we want to include these in our model?  I'd argue yes.  We could reflect these kinds of errors using TestResult objects that have no name or have a special attribute, or we could add some additional attributes to TestResultCollection, and I'm sure there are other options.
> One thing we haven't considered in our design are errors that do not belong to specific tests.  For example, when mochitest is run by TBPL, it can fail for a number of reasons that can't be pinned to a specific test.  These can be buildbot errors, or shutdown leaks/crashes, for example.  Do we want to include these in our model?  I'd argue yes.

I'd argue yes as well
* dropped high level type checking
* added the ability to specify time_start and time_end in ctor & finish() method
* correct __str__ for TestResult
* real-world POSSIBLE_RESULTS
* output is a list (of strings)
* added Description to TestResult
* removed redundant finished attribute
* new tests_with_result(result) method for Collection class
* successful and unsuccessful tests lists for Collection class
* explicit [all] tests attribute for Collection class

Will work on the Context thing after I get some opinions on the mail thread (tools@lists.mozilla.org).

After I get to a base skeleton that everyone accepts I'll try to fit it to be used by an existing test suite.
Attachment #643208 - Attachment is obsolete: true
Attachment #643208 - Flags: feedback?(gps)
Comment on attachment 643678 [details] [diff] [review]
incremental update; modified some stuff according to feedback

Review of attachment 643678 [details] [diff] [review]:
-----------------------------------------------------------------

::: moztest/moztest/results.py
@@ +20,5 @@
> +        'TEST-UNEXPECTED-PASS',
> +        'TEST-UNEXPECTED-FAIL',
> +        'TEST-KNOWN-FAIL',
> +        'TEST-SKIPPED',
> +        'TEST-ERROR'

The 'TEST-' prefix isn't necessary.

What do Python conventions say about enumerations like this? Should there be class-level "constants" (which can be strings) or should people use strings everywhere?

@@ +97,5 @@
> +        return list(set(cs))
> +
> +    def filter(self, predicate):
> +        """ Returns a list of TestResults that satisfy a given predicate """
> +        return [tr for tr in self if predicate(tr)]

Use a generator?

@@ +116,5 @@
> +
> +    @property
> +    def tests(self):
> +        """ List of all tests in the collection """
> +        return self

Why?
(In reply to Gregory Szorc [:gps] from comment #42)
> Comment on attachment 643678 [details] [diff] [review]
> incremental update; modified some stuff according to feedback
> 
> Review of attachment 643678 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: moztest/moztest/results.py
> @@ +20,5 @@
> > +        'TEST-UNEXPECTED-PASS',
> > +        'TEST-UNEXPECTED-FAIL',
> > +        'TEST-KNOWN-FAIL',
> > +        'TEST-SKIPPED',
> > +        'TEST-ERROR'
> 
> The 'TEST-' prefix isn't necessary.

I found it included in [1] and [2]. *confused*
[1] https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/logger.py#L36
[2] https://github.com/mozilla/mozbase/blob/master/mozlog/mozlog/logger.py#L21

> 
> What do Python conventions saying about enumerations like this? Should there
> be class level "constants" (which can be strings) or should people use
> strings everywhere?

Don't know, PEP8 only talks about defining constants at module-level.

> 
> @@ +97,5 @@
> > +        return list(set(cs))
> > +
> > +    def filter(self, predicate):
> > +        """ Returns a list of TestResults that satisfy a given predicate """
> > +        return [tr for tr in self if predicate(tr)]
> 
> Use a generator?

The etherpad document said "[..] list of [..]". Should I do a generator? It's no problem, just replacing [] with ().

> 
> @@ +116,5 @@
> > +
> > +    @property
> > +    def tests(self):
> > +        """ List of all tests in the collection """
> > +        return self
> 
> Why?

In case someone expects it to behave like the usual python test result collections (i.e. unittest).
(In reply to Mihnea Dobrescu-Balaur from comment #43)
> (In reply to Gregory Szorc [:gps] from comment #42)
> > 
> > What do Python conventions saying about enumerations like this? Should there
> > be class level "constants" (which can be strings) or should people use
> > strings everywhere?
> 
> Don't know, PEP8 only talks about defining constants at module-level.

Would you rather prefer a class only with constants? Like:

class Results:
    PASS = 0
    FAIL = 1
[...]
Weighing in on the enumeration issue: it kinda depends on what the enumerated values are for.  I'm still more in favor of a mapping.

If we don't want a mapping, then there are a number of solutions.

- we can use strings, as is.  This is common, although there is nothing telling you what the valid strings are.  If the *only* place we care about this is `if expected_result == actual_result:`, then this doesn't matter, although it does make it easier to misspell a status and end up with a harder-to-detect error

- we can use singletons

class State(object):
    """Singleton-style marker object for a test state."""
    def __init__(self, name):
        self.name = name

state_names = ['PASS', 'FAIL', 'SKIP']
for name in state_names:
    # or hang these off a class/namespace instead of module globals
    globals()[name] = State(name)

This way you can still check, e.g. results.PASS != results.FAIL, etc
This might also be extensible.
(I'm also not necessarily suggesting these live in global scope, I just wrote it this way as a cheap example.  Could be in class scope, etc.)

I'd prefer not to do something like

class Results:
    PASS = 0
    FAIL = 1

This assigns an arbitrary value to a state and you can't loop over them effectively.
:jhammel, could you please show a snippet of code of the mapping you envision? Because I'm not sure I understand what you mean.

Thanks!
(In reply to Mihnea Dobrescu-Balaur from comment #46)
> :jhammel, could you please show a snippet of code of the mapping you
> envision? Because I'm not sure I understand what you mean.
> 
> Thanks!

So I was thinking of a mapping of the (free-form) state to some sort of category.  There is probably only one success state, but there could be, e.g., multiple reasons a test was not run (SKIP) or failed (the test failed, many sorts of infrastructure failure, browser crash, etc.).

Thinking it over, though, this might be YAGNI.  If we force passing in the expected state (or maybe defaulting to PASS), then checking equality with the actual state is enough to get this down to a binary success/non-success, and that's probably all we need.  Other tools can classify e.g. the various kinds of infrastructure failures.

Vaguely relatedly, would it be a good idea to have an optional message for test results?  PASS may not need a message, but there could be a reason for skipping, or a failure message.  I'm thinking a la TinderboxPrint here or https://developer.mozilla.org/en/Test_log_format .

For example, if we ported Talos to this, a pass or fail message could indicate some summary of the p-test for SfN. E.g.

- talos finishes a test
- results are uploaded to datazilla
- datazilla tells the result of the p-test
- we set this as the pass/fail message
Right, so there will be a string attribute called "reason" or something like that which states why a test is skipped or why it failed (with the stack trace or log being added to the output attribute).

How do we pass this in to the TestResult? Only via the finish() method? Or can it be also passed in via the constructor?

I'm still not sure of the "correct" way test harnesses are supposed/expected to operate with this.
(In reply to Mihnea Dobrescu-Balaur from comment #48)
> Right, so there will be a string attribute called "reason" or something like
> that which states why a test is skipped or why it failed (with the stack
> trace or log being added to the output attribute).

So I was intending (albeit, I won't claim it to be the only way) that reason/message be something short.  The log will be available separately.  OTOH, maybe keeping a stack trace in results is a good idea? (or not?)

> How do we pass this in to the TestResult? Only via the finish() method? Or
> can it be also passed in via the constructor?

I would pass it in the finish() method. 

> I'm still not sure of the "correct" way test harnesses are supposed/expected
> to operate with this.

To be honest, neither am I ;) I would need to try to hook this up to a harness (or several) before I could really give this much insight
(In reply to Jeff Hammel [:jhammel] from comment #49)
> 
> So I was intending (albeit, I won't claim it to be the only way) that
> reason/message be something short. 

I agree.

> The log will be available separately. 
> OTOH, maybe keeping a stack trace in results is a good idea? (or not?)

From what I've seen, when a test fails it says something like "TEST FAILED: <short reason> View following log:" [...] and then comes the log.

We could keep to that, I guess?

> 
> To be honest, neither am I ;) I would need to try to hook this up to a
> harness (or several) before I could really give this much insight

I'll look at some code and see.
> > Right, so there will be a string attribute called "reason" or something like
> > that which states why a test is skipped or why it failed (with the stack
> > trace or log being added to the output attribute).
> 
> So I was intending (albeit, I won't claim it to be the only way) that
> reason/message be something short.  The log will be available separately. 
> OTOH, maybe keeping a stack trace in results is a good idea? (or not?)
> 
We already have a description attr in the model; is this the same as 'reason'?  I think we wouldn't need both.

I do think we should keep stack traces in the model, in a separate attribute.  It can be useful for generating xUnit output, posting to Autolog, etc.
 
> > To be honest, neither am I ;) I would need to try to hook this up to a
> > harness (or several) before I could really give this much insight
> 
> I'll look at some code and see.

I think a productive approach would be to find failing logs for all the different harnesses at https://tbpl.mozilla.org/, and see how the failure data they report would be represented in this model.
(In reply to Jonathan Griffin (:jgriffin) from comment #51)
> > > Right, so there will be a string attribute called "reason" or something like
> > > that which states why a test is skipped or why it failed (with the stack
> > > trace or log being added to the output attribute).
> > 
> > So I was intending (albeit, I won't claim it to be the only way) that
> > reason/message be something short.  The log will be available separately. 
> > OTOH, maybe keeping a stack trace in results is a good idea? (or not?)
> > 
> We already have a description attr in the model; is this the same as
> 'reason'?  I think we wouldn't need both.

My thoughts are description == the description of the test, reason/message == why you got a certain state (e.g. finish(state='FAIL', reason='Node [id="foobar"] not found'))

> I do think we should keep stack traces in the model, in a separate
> attribute.  It can be useful for generating xUnit output, posting to
> Autolog, etc.
>  
> > > To be honest, neither am I ;) I would need to try to hook this up to a
> > > harness (or several) before I could really give this much insight
> > 
> > I'll look at some code and see.
> 
> I think a productive approach would be to find failing logs for all the
> different harnesses at https://tbpl.mozilla.org/, and see how the failure
> data they report would be represented in this model.

+1
Blocks: 775756
(In reply to Jonathan Griffin (:jgriffin) from comment #51)
> I think a productive approach would be to find failing logs for all the
> different harnesses at https://tbpl.mozilla.org/, and see how the failure
> data they report would be represented in this model.
Leak reporting, crash reporting, and logcat reporting are not supported by this model. For leak and crash reporting, we output a summary block of information and then a giant blob of details (you could think of it as a BLOB, but it is parsable information). I think moving toward JSON representations of this information, which we can then de-serialize into a human-readable form (aka the current log format), would be better. Until the outside tools (buildbot, tbpl, orangefactor, etc.) are updated to deal with the JSON, we can have the harnesses de-serialize their output.

Logcat is a special case, it's a total blob that is relatively unstructured and is not easy to deal with (for machines or people).

All our test systems are python/JS systems. We have python runners that then call JS that runs inside the platform and then information is returned to python (sometimes just the status of the binary under test's output code, sometimes streams of results data that is generated on the JS side and spewed to stdout which the python side picks up, sometimes it is an intense communication between python and JS in real time as the test runs.)  So, I think we can create a set of methods for the class around that. In particular, there will always have to be some kind of profile setup (to inject that javascript into the system), some kind of cleanup, some kind of options handling, and some kind of "run" functionality.

All of these methods should be overridable/extendable by the specific test classes that implement a test type (a mochitest runner for instance).

And I'd recommend dropping the "TEST-" prefixes everywhere. Those are silly hold-overs of our current logging system, which isn't worth codifying into a solution like this. I like "UNEXPECTED-FAIL, EXPECTED-FAIL" etc. much better.

Those are some thoughts I had while reading over the bug and the etherpad.  This is a great idea, and great work. I'm excited to see what you come up with.
> Those are some thoughts I had while reading over the bug and the etherpad.  This is a great idea, and great work. I'm excited to see what you come up with.

Agreed, lots of good work so far. My one comment is that it seems like there is still some confusion as to what exactly should go into v1. At first we just wanted a Results class, but now it seems like we are doing an entire generic test harness. This is really exciting, but there are many different components each with many different possible implementations.

I think it would be wise to start with something basic (i.e. the Results class, since that seems to be where all the work has gone so far), get that reviewed and checked in (we don't need everyone to be in 100% agreement at this stage), and then file follow-up bugs for additional features/components or for more in-depth discussion on items that need a closer look (i.e. what exact attributes the result class should/shouldn't have).
+1 on starting on the results class.  We'll need the context class for this too.

I'd probably go from there to a few output formats (xunit, datazilla, whatever) and then (or concurrently) try to get some existing harnesses to use the results class. Marionette, mozmill, and mochitest seem like easy targets.
A sample of the representation of test results: [1]
A sample xunit xml file: [2]


[1] http://pastebin.mozilla.org/1712582
[2] http://swarm.cs.pub.ro/~mihneadb/out.xml
Attachment #643678 - Attachment is obsolete: true
Attachment #645101 - Flags: feedback?(jhammel)
Attachment #645101 - Flags: feedback?(jgriffin)
Attachment #645101 - Flags: feedback?(ahalberstadt)
And the XML formats being used are slightly different!

For xpcshell tests, I used http://stackoverflow.com/questions/3787886/producing-a-correct-and-complete-xml-unit-testing-report-for-hudson (bug 725478). This is using the Ant (Java) XML "schema".

For Marionette (bug 774419), it appears the XML schema for xUnit.net was used (https://xunit.codeplex.com/wikipage?title=XmlFormat).

They are mostly the same, but there are some differences.

Similar but different XML formats make me sad. I guess we need to decide if we should support 1 or both. Ugh.
:gps: yes, I used the codeplex.com one because that's the one :jgriffin linked me to. :)

I don't mind switching to another format, just tell me which one we should support. I'd say it's better to only have one, this being a generic/shared library and all.
+1 on having a single format, preferably one that gets us what we want
Comment on attachment 645101 [details] [diff] [review]
Support for importing unittest (example: Marionette) results; xUnit output

+        self.env = env or {}

Is this supposed to be the system environment variables?  If so, why not os.environ ?

(Some comments would help)

TestContext should probably use mozinfo to gather machine information.  

arch should be None by default, not the empty string.

I'm not sure I see the point of the Results class as it is.  You essentially list the same thing three times: 'PASS' vs PASS vs PASS in POSSIBLE_RESULTS.  Also, since the latter is a tuple, it cannot be altered.  Is that desired?  If so, why? (And it should be documented.)  If the only thing that is desired is POSSIBLE_RESULTS, we should just make a global variable containing the strings, since we aren't gaining anything over that with this implementation.

+        # assert isinstance(context, TestContext)
Please remove

+    def finish(self, result, time_end=None, output=[], reason='N/A'):

Don't use mutable default function arguments.

reason should probably be None by default
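
For illustration, the same signature without the mutable default (a sketch, not the exact patch code):

class TestResult(object):
    # other attributes/methods elided

    def finish(self, result, time_end=None, output=None, reason=None):
        """Record the outcome of the test."""
        self.result_actual = result
        self.time_end = time_end
        # build a fresh list per call instead of sharing one default list
        self.output = output if output is not None else []
        self.reason = reason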

I'm not sure if we need to keep time_taken as an attribute

+            t = TestResult(name=str(test).split()[0], test_class=get_class(test),
+                           time_start=0, result_expected=result_expected)
+            t.finish(result_actual, time_end=0, reason=relevant_line(output),
+                     output=output)

Why 0 here?  


+    def add_unittest_result(self, result):
+        """ Adds the unittest result provided to the collection"""

Should definitely note a *python* unittest result

+def from_unittest_results(results_list):

If this is the way that (python) unittest results should be populated, I would prefer

- a subclass just for python unittest results
or
- a classmethod that makes an instance based on unittest results
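
Roughly what the classmethod route could look like (a sketch; add_unittest_result is the method from the patch, the rest is illustrative):

class TestResultCollection(list):
    """Collection of TestResult instances."""

    def add_unittest_result(self, result):
        """Convert a python unittest result and append it (body elided)."""

    @classmethod
    def from_unittest_results(cls, results_list):
        """Build a collection from python unittest result objects."""
        collection = cls()
        for result in results_list:
            collection.add_unittest_result(result)
        return collection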

+    def to_xunit(self):

This should go in a whole separate file, e.g. output.py, inheriting from an abstract base class that takes a test collection and yields a result.  Output should not live on the results class.  The results class is there to store results; output classes will be there to format them.
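
In other words, something like this in a separate output.py (class names are placeholders):

# output.py -- formatting lives here, not on the results classes
from abc import ABCMeta, abstractmethod


class Output(object):
    """Abstract base class for turning a results collection into output."""
    __metaclass__ = ABCMeta

    @abstractmethod
    def serialize(self, results_collection):
        """Return a string representation of the results collection."""


class XUnitOutput(Output):
    def serialize(self, results_collection):
        # build the xUnit XML from the collection (details elided)
        return '<testsuite tests="%d"/>' % len(list(results_collection))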
Attachment #645101 - Flags: feedback?(jhammel)
(In reply to Jeff Hammel [:jhammel] from comment #60)
> Comment on attachment 645101 [details] [diff] [review]
> Support for importing unittest (example: Marionette) results; xUnit output
> 
> +        self.env = env or {}
> 
> Is this supposed to be the system environment variables?  If so, why not
> os.environ ?

I was thinking we would pass in os.environ if we wanted to in the constructor.

> 
> (Some comments would help)
> 
> TestContext should probably use mozinfo to gather machine information.  
> 
> arch should be None by default, not the empty string.

Ok.

> 
> I'm not sure I see the point of the Results class as it is.  You essentially
> list the same thing three times 'PASS' vs PASS vs PASS in POSSIBLE_RESULTS. 
> Also, since the latter is a tuple, it cannot be altered.  Is that desired? 
> If so why (and it should be documented).  If the only thing that is desired
> is POSSIBLE_RESULTS, we should just make a global variable containing the
> strings since we aren't getting any information over this in this
> implementation

We were talking about having an enum of them, rather than passing in strings, so we don't end up with bugs caused by typos. It was as you said in the previous patch.

> 
> +        # assert isinstance(context, TestContext)
> Please remove
> 
> +    def finish(self, result, time_end=None, output=[], reason='N/A'):
> 
> Don't use mutable default function arguments.
> 
> reason should probably be None by default

Ok.

> 
> I'm not sure if we need to keep time_taken as an attribute

If we don't, there's no way of knowing how much time the unittests took.

> 
> +            t = TestResult(name=str(test).split()[0],
> test_class=get_class(test),
> +                           time_start=0, result_expected=result_expected)
> +            t.finish(result_actual, time_end=0,
> reason=relevant_line(output),
> +                     output=output)
> 
> Why 0 here?

There's no way of knowing how much time each test from the unittest suite took, so we need to pass in some dummy (int) values.

> 
> 
> +    def add_unittest_result(self, result):
> +        """ Adds the unittest result provided to the collection"""
> 
> Should definitely note a *python* unittest result

Ok, I figured that since it's written as a single word (and not "unit test") it would be obvious. Will change.

> 
> +def from_unittest_results(results_list):
> 
> If this is the way that (python) unittest results should be populated, I
> would prefer
> 
> - a subclass just for python unittest results
> or
> - a classmethod that makes an instance based on unittest results

Ok, I'll make it a classmethod.

> 
> +    def to_xunit(self):
> 
> This should go in a whole separate file, e.g. output.py, inheriting from an
> abstract base class that takes a test collection and yields a result. 
> Output should not live on the results class.  the results class is there to
> store results.  output classes will be there to format them

It seemed to "java-like", but ok. :)
Attachment #645101 - Attachment is obsolete: true
Attachment #645101 - Flags: feedback?(jgriffin)
Attachment #645101 - Flags: feedback?(ahalberstadt)
Attachment #645444 - Flags: feedback?(jhammel)
Comment on attachment 645444 [details] [diff] [review]
separated output; update according to feedback

So I'm still not sure about enumerating results, e.g.

PASS = 'PASS'

As I'm not sure what that buys us vs

POSSIBLE_RESULTS = set(['PASS', 'FAIL', ...])

Overall this is looking good.

I wonder how output classes should in general deal with output.  There are three usual cases:

1. You want to output results to a file
2. You want to output results to the screen
3. You want to upload results (e.g. HTTP POST) to a server

A few ways of doing this:

- Output.__call__ can always return a string.  Then the caller can decide what to do with this
- Output.__call__ can optionally accept a "file like object" which is written to.  If None is specified, the function can make a StringIO instance and call getvalue on at the end

I imagine at some point we're going to want something to route output meaningfully, but probably not until the pattern becomes apparent.

:jgriffin, I'd love any feedback you could give on this
Attachment #645444 - Flags: feedback?(jhammel) → feedback+
(In reply to Jeff Hammel [:jhammel] from comment #63)

> I wonder how output classes should in general deal with output.  There are
> three usual cases:
> 
> 1. You want to output results to a file
> 2. You want to output results to the screen
> 3. You want to upload results (e.g. HTTP POST) to a server
> 
> A few ways of doing this:
> 
> - Output.__call__ can always return a string.  Then the caller can decide
> what to do with this
> - Output.__call__ can optionally accept a "file like object" which is
> written to.  If None is specified, the function can make a StringIO instance
> and call getvalue on at the end
> 
> I imagine at some point we're going to want something to route output
> meaningfully, but probably not until the pattern becomes apparent.
> 
> :jgriffin, I'd love any feedback you could give on this

So in general we want to either:

 1 provide some serialized output, either to a file or stdout
 2 post some data somewhere, presumably using an existing library

For cases like Datazilla and Autolog, I think the output class should handle 2, but we'd also want to provide access to serialized output for logging/debugging, assuming that's available from the underlying package.

For this reason, maybe we should use explicit methods rather than __call__?  If an xUnit class uses __call__ to generate its xUnit output to a file or stdout, what should __call__ do for the Datazilla case?  Give you the serialized output, or actually post to Datazilla?  To avoid this confusion, maybe we should have serialize() and post() methods, or something similar.
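
In code, the explicit-method split could look roughly like this (DatazillaOutput here is purely illustrative and elides the actual client calls):

class Output(object):
    """Base class using explicit serialize()/post() instead of __call__."""

    def serialize(self, results_collection):
        """Return the formatted results as a string."""
        raise NotImplementedError

    def post(self, results_collection):
        """Send the results to a remote service (Datazilla, Autolog, ...)."""
        raise NotImplementedError


class DatazillaOutput(Output):
    def serialize(self, results_collection):
        # return the JSON payload we would send (construction elided)
        return '{"results": []}'

    def post(self, results_collection):
        payload = self.serialize(results_collection)
        # hand payload to the datazilla client library here (elided);
        # return it so callers can log/debug the serialized form
        return payload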
(In reply to Jeff Hammel [:jhammel] from comment #63)
> Comment on attachment 645444 [details] [diff] [review]
> separated output; update according to feedback
> 
> So I'm still not sure about enumerating results, e.g.
> 
> PASS = 'PASS'
> 
> As I'm not sure what that buys us vs
> 
> POSSIBLE_RESULTS = set(['PASS', 'FAIL', ...])
> 

I don't really care about it; I can change it if everybody agrees.
Attached patch Update according to c63 & c64 (obsolete) — Splinter Review
Switched to serialize and post (dispatching based on scheme, like in http://hg.mozilla.org/build/talos/file/ee67e01c9d58/talos/output.py#l52)
Switched to strings for possible results
Attachment #645444 - Attachment is obsolete: true
Attachment #646690 - Flags: feedback?(jhammel)
Attachment #646690 - Flags: feedback?(jgriffin)
I think if you agree with the patch's contents, we should focus a bit on the Context thing so we have a usable v0.1, land that, and then anybody will be welcome to add new features (like output to Datazilla). Sounds like a plan?
Comment on attachment 646690 [details] [diff] [review]
Update according to c63 & c64

+    @abstractmethod
+    def serialize(self, results_collection):
+        pass

I generally use a docstring for ABC methods.  This avoids a 'pass' body and documents what the function should do.

+    def serialize(self, results_collection):
+        return self.to_xunit(results_collection)

Why not just have the body of to_xunit be the serialize method?

To be honest, I probably wouldn't do a post method yet.  It will be hard to generalize.

Looking good!
Attachment #646690 - Flags: feedback?(jhammel) → feedback+
Comment on attachment 646690 [details] [diff] [review]
Update according to c63 & c64

Looks good to me too.

I like Jeff's earlier idea about having serialize take an optional parameter that is a file-like object, and if it exists, write to that, otherwise return a string.  But that's a minor detail.

I think having 'post' write to a file is kind of weird.  Probably, we should just raise NotImplementedError for 'post' for something like XUnitOutput; it would only be implemented by something, like Datazilla or Autolog, which actually sends the data somewhere.
Attachment #646690 - Flags: feedback?(jgriffin) → feedback+
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #646690 - Attachment is obsolete: true
Attachment #647280 - Flags: review?(jhammel)
Attachment #647280 - Flags: review?(jgriffin)
Comment on attachment 647280 [details] [diff] [review]
v0.1

Review of attachment 647280 [details] [diff] [review]:
-----------------------------------------------------------------

::: moztest/moztest/output.py
@@ +6,5 @@
> +from abc import abstractmethod
> +import xml.dom.minidom as dom
> +
> +
> +class Output(object):

How will you handle deserializing back to a results collection? Are you going to create an input.py module and fragment code related to the same serialization format? How about renaming this Serializer and keeping the door open to perform both input and output?

@@ +11,5 @@
> +    """ Abstract base class for outputting test results """
> +
> +    @abstractmethod
> +    def serialize(self, results_collection):
> +        """ Returns a string representation of the results """

I would have this write to a file object instead of returning a string. If someone really wants a string, they can pass a StringIO.
Comment on attachment 647280 [details] [diff] [review]
v0.1

+    def tests_with_result(self, result):
+        """ Returns a generator of TestResults with the PASS result """
+        return self.filter(lambda t: t.result_actual == result)

The function name is a little misleading, as is the docstring.  This is passing tests, which could be tests that fail and are expected to fail.

+            if type(text) is list:
+                text = '\n'.join(text)

Should probably be "if not isinstance(text, basestring):" unless there is a reason not to do this

+class TestContext(object):

This should probably at least take the info from mozinfo vs passing in arch.
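
For example (attribute names here are illustrative; mozinfo.info is the dict of system facts mozinfo exposes):

import os

import mozinfo


class TestContext(object):
    """Sketch: gather machine info from mozinfo instead of an arch argument."""

    def __init__(self, hostname='localhost', env=None):
        self.hostname = hostname
        self.env = env if env is not None else os.environ.copy()
        # pull machine facts from mozinfo rather than making callers pass them
        self.os = mozinfo.info.get('os')
        self.arch = mozinfo.info.get('processor')
        self.bits = mozinfo.info.get('bits')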

This looks really good. I'm going to r- for mozinfo in TestContext, but other than that I am fine with landing.

This should also have a setup.py and, ideally, a README (and, even more ideally, tests).
Attachment #647280 - Flags: review?(jhammel) → review-
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #647280 - Attachment is obsolete: true
Attachment #647280 - Flags: review?(jgriffin)
Attachment #647347 - Flags: review?(jhammel)
Attachment #647347 - Flags: review?(jgriffin)
Setup.py & README coming.
Attached patch v0.1 setup.py and README.md (obsolete) — Splinter Review
Attachment #647351 - Flags: review?(jhammel)
Attachment #647351 - Flags: review?(jgriffin)
Comment on attachment 647351 [details] [diff] [review]
v0.1 setup.py and README.md

Looks good to me.
Attachment #647351 - Flags: review?(jhammel) → review+
(In reply to Mihnea Dobrescu-Balaur from comment #73)
> Created attachment 647347 [details] [diff] [review]
> v0.1

I had actually thought that dumps, for example, was going to call serialize with a StringIO buffer and return its getvalue().  Maybe :gps or :jgriffin can comment there?

Also, only one of dumps or serialize needs to be an abstract method.  In your incarnation, serialize can go in the base class.  In my incarnation, dumps can go in the base class.

I don't really like the name dumps.  Sure, it's what json uses, but it doesn't read well in my opinion.  I'm fine with dump_string(), string(), or even to_string(). But I won't block on this.
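
Concretely, something like this, where only serialize is abstract and the string form is derived from it (dump_string being one of the candidate names above):

from abc import ABCMeta, abstractmethod
from StringIO import StringIO


class Output(object):
    """Sketch: the string form wraps the file-writing form."""
    __metaclass__ = ABCMeta

    @abstractmethod
    def serialize(self, results_collection, fp):
        """Write the formatted results to the file-like object fp."""

    def dump_string(self, results_collection):
        """Return the formatted results as a string."""
        buf = StringIO()
        self.serialize(results_collection, buf)
        return buf.getvalue()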
Comment on attachment 647351 [details] [diff] [review]
v0.1 setup.py and README.md

Actually, I hate to reverse what I said, but one very important thing: mozinfo should be added to the deps.
Attachment #647351 - Flags: review+ → review-
(In reply to Jeff Hammel [:jhammel] from comment #77)
> (In reply to Mihnea Dobrescu-Balaur from comment #73)
> > Created attachment 647347 [details] [diff] [review]
> > v0.1
> 
> I had actually thought that that e.g. dumps was going to call serialize with
> a StringIO buffer and return its getvalue().  Maybe :gps or :jgriffin can
> comment there?
> 

Yes, that's what I'd thought as well.
Attached patch v0.1 (obsolete) — Splinter Review
Attachment #647347 - Attachment is obsolete: true
Attachment #647351 - Attachment is obsolete: true
Attachment #647347 - Flags: review?(jhammel)
Attachment #647347 - Flags: review?(jgriffin)
Attachment #647351 - Flags: review?(jgriffin)
Attachment #647686 - Flags: review?(jhammel)
Attachment #647686 - Flags: review?(jgriffin)
Comment on attachment 647686 [details] [diff] [review]
v0.1

+Package for handling Mozilla test results.
+
+
+
+## Usage example

I could live with one space less

+This shows how you can create an xUnit representation of Marionette
+(or python unittest) results.

How about just "...xUnit representation of python unittest results"?

+        self.env = os.environ

Might be better to do os.environ.copy() since this could change in testing (albeit, neither is fool-proof :/)

+    def __init__(self, name, test_class='', time_start=None, context=None,
+                 result_expected='PASS'):

It'd be nice to have a docstring here that tells what is expected.

+        assert isinstance(name, basestring)
+        assert result_expected in TestResult.POSSIBLE_RESULTS

It would also be nice to have assertion messages, e.g. 

assert result_expected in self.POSSIBLE_RESULTS, "Result '%s' not in possible results: %s" % (result_expected, ', '.join(self.POSSIBLE_RESULTS))

in any case, this should probably be 'self.POSSIBLE_RESULTS' for subclassing

+    deps += ['simplejson']

I'd probably do deps.append('simplejson') but it doesn't really matter

So this looks good, and I think it's about ready to check in.

The only real problem I have is that it is not 2.4 compatible.  I'm not 100% sure it is 2.5 compatible.  From a mozbase point of view, I'm not a huge fan of maintaining this backwards compatibility.  However, from a "help! I'm a talos developer trapped in 2.4!" point of view, I would like to move talos towards using this architecture.  Anyway, I don't want to block on this, as long as we're aware of the issue.  In any case, https://wiki.mozilla.org/Auto-tools/Projects/MozBase#Overview should be updated to reflect reality *AND* we need to not have anything that Talos uses require 2.5 or greater :(
Attachment #647686 - Flags: review?(jhammel) → review+
Attached patch v0.1Splinter Review
Attachment #647686 - Attachment is obsolete: true
Attachment #647686 - Flags: review?(jgriffin)
Attachment #647738 - Flags: review?(jhammel)
Attachment #647738 - Flags: review?(jgriffin)
Comment on attachment 647738 [details] [diff] [review]
v0.1

Cool!  lgtm if jhammel is happy with it.
Attachment #647738 - Flags: review?(jgriffin) → review+
Comment on attachment 647738 [details] [diff] [review]
v0.1

Looks great!
Attachment #647738 - Flags: review?(jhammel) → review+
pushed: https://github.com/mozilla/mozbase/commit/08d4e0d2ff68e2c1855261ba053ced61a59af894
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED