Bug 896087 (Closed) · Opened 11 years ago · Closed 11 years ago

XPCShell should use structured log messages rather than parse output with regular expressions

Categories

Testing :: XPCShell Harness, defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)
Status: RESOLVED FIXED
Target Milestone: mozilla26

People

(Reporter: chmanchester, Assigned: chmanchester)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 13 obsolete files)

30.38 KB, patch
chmanchester
: review+
Details | Diff | Splinter Review
8.98 KB, patch
chmanchester
: review+
Details | Diff | Splinter Review
The output from an xpcshell test is currently parsed for evidence of a test failure with a series of regular expressions. This is a good opportunity to make use of structured log messages for test output.
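To make the contrast concrete, here is a minimal sketch of the two approaches. Message shapes, action names, and function names are illustrative, not the patch's actual code:

```python
import json
import re

# Old approach: scan raw output lines with regular expressions.
FAILURE_RE = re.compile(r"TEST-UNEXPECTED-|PROCESS-CRASH")

def has_failure_regex(output):
    return any(FAILURE_RE.search(line) for line in output.splitlines())

# Structured approach: each log line is a JSON object carrying an explicit
# "action" key, so failure detection becomes a set-membership test.
FAILURE_ACTIONS = {"test_unexpected_fail", "test_unexpected_pass",
                   "javascript_error"}

def has_failure_structured(output):
    for line in output.splitlines():
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # unstructured output printed by the test itself
        if isinstance(msg, dict) and msg.get("action") in FAILURE_ACTIONS:
            return True
    return False
```

The structured version never has to guess at message formats, which is the point of the bug: the harness stops pattern-matching human-readable strings.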
This is a version of the patch from bug 890555 that applies on top of the patch from bug 887054 - other patches in this bug will be based on this.
These are tests of the output of tests that run in child processes, of those
that contain syntax errors, and of the way the existing parsing in
runxpcshelltests.py interprets their output.
Assignee: nobody → cmanchester
OS: Mac OS X → All
Hardware: x86 → All
Marking the parxpc bug as a dependency for now.
Depends on: parxpc
This applies on top of the patch from bug 887054 as well as the two other
patches in this bug.

There appears to be a performance penalty with this implementation: a
consistent 8-9% increase over the original runtime on my machine.
Attachment #779576 - Flags: feedback?(ted)
Attachment #779576 - Flags: feedback?(gps)
Interesting that this slows down things by 8-10%. What's the difference if you measure with |time|? I'm curious if the extra 8-10% is coming from a single core or multiple cores. If single, I'd look towards Python as the bottleneck. (Another easy way to test is to look at runxpcshelltests.py's CPU usage during a run - if it's hitting 100% with this patch, that's not good.)
The most recent m-c produces hangs with the parallel harness patches I'm using - I'll work with mihneadb to see if I can get a combination of patches with mostly passing tests to investigate this further.

The runs noted above had a significant number of failures due to pending parallelization fixes, and therefore dumped a lot of output (and perhaps had other flakiness) they wouldn't have on a typical test run - I'd like to see how much this contributed to the slowdown.
Attachment #779576 - Flags: feedback?(ted)
Attachment #779576 - Flags: feedback?(gps)
I was able to run a subset of tests in toolkit/mozapps/extensions, and found
that most of the slowdown was due to the repeated import of log4moz.
This version doesn't use log4moz, but rather implements a subset of structured
output directly in head.js.
Attachment #779576 - Attachment is obsolete: true
Attachment #780767 - Attachment description: Output structured messages in head.js, interpret them in runxpcshelltests.py; → Part 3 - Output structured messages in head.js, interpret them in runxpcshelltests.py;
Attachment #780767 - Flags: feedback?(ted)
Attachment #780767 - Flags: feedback?(gps)
Comment on attachment 780767 [details] [diff] [review]
Part 3 - Output structured messages in head.js, interpret them in runxpcshelltests.py;

Review of attachment 780767 [details] [diff] [review]:
-----------------------------------------------------------------

You do a lot of formatting in head.js still. e.g. all the TEST-* messages. I think it's tempting to move some of this into runxpcshelltests.py. That would keep the messages more structured further down the pipeline, no? A downside to this is interactive mode may need special consideration then. Hmmm...

::: testing/xpcshell/head.js
@@ +411,5 @@
> +         {_message: "TEST-PASS | (xpcshell/head.js) | " + truePassedChecks + " (+ " +
> +                    _falsePassedChecks + ") check(s) passed\n"});
> +    _log("test_info",
> +         {_message: "TEST-INFO | (xpcshell/head.js) | " + _todoChecks +
> +                    " check(s) todo\n"});

It would be nice if these messages included the test function, line number, etc.

@@ +435,5 @@
> +           {_message: "TEST-UNEXPECTED-FAIL | (xpcshell/head.js) | Source file " + element + " contains SyntaxError",
> +            reason: e,
> +            source_file: element,
> +            stack: _format_exception_stack(e.stack)});
> +    }

I must ask what inspired this.
Attachment #780767 - Flags: feedback?(gps) → feedback+
(In reply to Gregory Szorc [:gps] from comment #8)
> Comment on attachment 780767 [details] [diff] [review]
> Part 3 - Output structured messages in head.js, interpret them in
> runxpcshelltests.py;
> 
> Review of attachment 780767 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> You do a lot of formatting in head.js still. e.g. all the TEST-* messages. I
> think it's tempting to move some of this into runxpcshelltests.py. That
> would keep the messages more structured further down the pipeline, no? A
> downside to this is interactive mode may need special consideration then.
> Hmmm...
> 

I think this is the way to go, as it will result in less duplicate information overall. We can have the fallback _log for interactive mode do a little formatting if need be. For the most part, the message will just be '{action string} | {file name} | {diagnostic}'.

> ::: testing/xpcshell/head.js
> @@ +411,5 @@
> > +         {_message: "TEST-PASS | (xpcshell/head.js) | " + truePassedChecks + " (+ " +
> > +                    _falsePassedChecks + ") check(s) passed\n"});
> > +    _log("test_info",
> > +         {_message: "TEST-INFO | (xpcshell/head.js) | " + _todoChecks +
> > +                    " check(s) todo\n"});
> 
> It would be nice if these messages included the test function, line number,
> etc.
> 
> @@ +435,5 @@
> > +           {_message: "TEST-UNEXPECTED-FAIL | (xpcshell/head.js) | Source file " + element + " contains SyntaxError",
> > +            reason: e,
> > +            source_file: element,
> > +            stack: _format_exception_stack(e.stack)});
> > +    }
> 
> I must ask what inspired this.

I noticed that the harness fails with no output when it tries to load a file containing a syntax error. It can tell when this happens, why not let the user know? Unfortunately I can't see how to expose a super useful stack trace (from the file it tries to load) when this occurs.
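A sketch of what the Python-side fallback formatting discussed above could look like, following the '{action string} | {file name} | {diagnostic}' shape. The field names and defaults here are assumptions for illustration, not the patch's actual schema:

```python
ACTION_STRINGS = {
    "test_unexpected_fail": "TEST-UNEXPECTED-FAIL",
    "test_unexpected_pass": "TEST-UNEXPECTED-PASS",
    "javascript_error": "TEST-UNEXPECTED-FAIL",
    "test_pass": "TEST-PASS",
    "test_info": "TEST-INFO",
}

def format_message(msg):
    # Build '{action string} | {file name} | {diagnostic}', prefixed with
    # the child process name and suffixed with a stack when present.
    parts = []
    if "process" in msg:
        parts.append("%s: " % msg["process"])
    parts.append("%s | %s | %s" % (
        ACTION_STRINGS.get(msg["action"], msg["action"]),
        msg.get("source_file", "xpcshell/head.js"),
        msg.get("diagnostic", "")))
    if "stack" in msg:
        parts.append("\n%s" % msg["stack"])
    return "".join(parts)
```

Keeping this in Python means head.js only ever emits structured fields, and the human-readable string is derived in one place.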
Attachment #780767 - Flags: feedback?(ted)
Now with less formatting in head.js, and un-bitrotted with respect to the most
recent patch for bug 887054.
Attachment #780767 - Attachment is obsolete: true
Attachment #783444 - Flags: feedback?(ted)
Attachment #783444 - Flags: feedback?(gps)
Comment on attachment 783444 [details] [diff] [review]
Output structured messages in head.js, interpret them in runxpcshelltests.py;

Review of attachment 783444 [details] [diff] [review]:
-----------------------------------------------------------------

I like this version because it is more structured.

If you have infinite time, I'd love to see followup patches that expose data from the xpcshell process as it is available (i.e. don't buffer full process output). Add an API to allow native Python invokers to receive a callback when a new structured message available. If you still have time after that and want to have fun to close out your internship, you could look into alternate UIs for |mach xpcshell-test|. e.g. start an HTTP server, open Firefox, and stream test results into a nifty HTML dashboard in real time. That would be epic. Just don't get too caught up in this and forget about other objectives :)

::: testing/xpcshell/head.js
@@ +24,5 @@
> +  "test_known_fail": "TEST-KNOWN-FAIL",
> +  "test_unexpected_pass": "TEST-UNEXPECTED-PASS",
> +  "javascript_error": "TEST-UNEXPECTED-FAIL",
> +  "test_pass": "TEST-PASS",
> +  "test_info": "TEST-INFO"

Nit: This isn't Python, you don't need to quote the keys.

@@ +213,5 @@
> +function _format_exception_stack(stack) {
> +  return stack.split("\n").reduce(function(stack_msg, frame) {
> +    if (frame) {
> +      // frame is of the form "fname(args)@file:line"
> +      let frame_regexp = new RegExp("(.*)\\(.*\\)@(.*):(\\d*)", "g");

I can't recall if SpiderMonkey is intelligent enough to cache compiled regular expressions. You should consider moving this regexp object into a longer-lived scope. This shouldn't be tight loop code, so maybe not worth your time.
Attachment #783444 - Flags: feedback?(gps) → feedback+
Comment on attachment 783444 [details] [diff] [review]
Output structured messages in head.js, interpret them in runxpcshelltests.py;

Review of attachment 783444 [details] [diff] [review]:
-----------------------------------------------------------------

Overall I like the approach, I have some suggestions and some nitpicky stuff.

::: testing/xpcshell/head.js
@@ +24,5 @@
> +  "test_known_fail": "TEST-KNOWN-FAIL",
> +  "test_unexpected_pass": "TEST-UNEXPECTED-PASS",
> +  "javascript_error": "TEST-UNEXPECTED-FAIL",
> +  "test_pass": "TEST-PASS",
> +  "test_info": "TEST-INFO"

Is there some real benefit here to making up the new lowercase_names instead of just using the existing strings as actions?

@@ +190,5 @@
>    if (_quit)
>      return;
>  
> +  _log("test_info",
> +       {_message: "TEST-INFO | (xpcshell/head.js) | running event loop\n"});

It seems unfortunate to have to duplicate the test-info bit in both the action and the message. Can't you infer the latter from the former?

::: testing/xpcshell/runxpcshelltests.py
@@ +4,5 @@
>  # License, v. 2.0. If a copy of the MPL was not distributed with this
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
>  import copy
> +import sys, os, os.path, shutil, math, traceback

Maybe split these up into separate import lines while you're here?

@@ +34,5 @@
>  NUM_THREADS = max(4, int(cpu_count() * 4))
>  
> +FAILURE_ACTIONS = ['test_unexpected_fail',
> +                   'test_unexpected_pass',
> +                   'javascript_error']

Should be a set if you're going to do lots of "x in FAILURE_ACTIONS".

@@ +42,5 @@
> +    "test_unexpected_pass": "TEST-UNEXPECTED-PASS",
> +    "javascript_error": "TEST-UNEXPECTED-FAIL",
> +    "test_pass": "TEST-PASS",
> +    "test_info": "TEST-INFO",
> +}

...if you didn't make up new names for these you wouldn't have to have this mapping table in two places.

@@ +315,5 @@
>              '-n',
>              '-s',
>              '-e', 'const _HTTPD_JS_PATH = "%s";' % self.httpdJSPath,
> +            '-e', 'const _HEAD_JS_PATH = "%s";' % self.headJSPath,
> +            '-e', 'const _INTERACTIVE = %s;' % ('true' if self.interactive else 'false')]

str(self.interactive).lower() ?

@@ +376,5 @@
> +            if line:
> +                self.output_lines.append(line)
> +            return
> +
> +        msg = ''

This seems unnecessary.

@@ +377,5 @@
> +                self.output_lines.append(line)
> +            return
> +
> +        msg = ''
> +        msg += '%s: ' % line['process'] if ('process' in line) else ''

You don't need the parens here (and below).

@@ +387,5 @@
> +                                    line['diagnostic'])
> +        else:
> +            msg += '%s' % line
> +
> +        msg += '\n%s' % line['stack'] if ('stack' in line) else ''

Generally for building up strings it's more Pythonic to build a list and then "".join() or " ".join() the list into a string.

@@ +392,5 @@
> +        self.output_lines.append(msg)
> +
> +    def has_failure_output(self, output):
> +        """Inspects process output for presence of an error. Saves output
> +        lines as they are parsed."""

I find it a little non-obvious that has_failure_output sets self.output_lines, FWIW. Not sure if just renaming the function or putting that in the docstring or something would help.

@@ +402,5 @@
> +            try:
> +                # The structured logger outputs JSON objects. For those tests
> +                # that produce a lot of additional output, we don't want to
> +                # waste time letting loads fail.
> +                if not line_string or not line_string.startswith('{'):

The "not line_string" is redundant with the second test.

@@ +426,5 @@
> +                    seen_proc_start = True
> +                elif action == 'child_test_end':
> +                    seen_proc_end = True
> +
> +            except ValueError:

Why is this try block wrapping so much code? I'd prefer if it was just wrapping the json.loads line, since that's the only thing we don't have full control over.

@@ +431,5 @@
> +                self.append_message_from_line(line_string)
> +                if 'TEST-UNEXPECTED-' in line_string:
> +                    has_failure = True
> +
> +        if seen_proc_start and (not seen_proc_end):

You don't need the parens here.
Attachment #783444 - Flags: feedback?(ted) → feedback+
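The nitpicks in the review above (a set for FAILURE_ACTIONS, a try block that wraps only the json.loads call, building strings with a list and join) could combine into something like the following sketch. Names and fields are illustrative, not the patch's actual code:

```python
import json

FAILURE_ACTIONS = {"test_unexpected_fail",  # a set, since membership is tested often
                   "test_unexpected_pass",
                   "javascript_error"}

def parse_output(output):
    has_failure = False
    output_lines = []
    for line_string in output.splitlines():
        try:
            # json.loads is the only call we don't fully control, so the
            # try block wraps just that line.
            line = json.loads(line_string)
        except ValueError:
            line = None
        if not isinstance(line, dict):
            # Unstructured output produced by the test itself.
            output_lines.append(line_string)
            if "TEST-UNEXPECTED-" in line_string:
                has_failure = True
            continue
        if line.get("action") in FAILURE_ACTIONS:
            has_failure = True
        output_lines.append(line.get("_message", line_string))
    return has_failure, output_lines
```

Returning the failure flag and the saved lines together also sidesteps the review's point about has_failure_output non-obviously setting self.output_lines.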
Thanks for taking a look at this. I'll upload a new patch soon (with gps'
feedback as well).

I'm unsure of what makes the most sense in this case for the string/action
mapping question.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> Comment on attachment 783444 [details] [diff] [review]
> Output structured messages in head.js, interpret them in runxpcshelltests.py;
> 
> Review of attachment 783444 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall I like the approach, I have some suggestions and some nitpicky stuff.
> 
> ::: testing/xpcshell/head.js
> @@ +24,5 @@
> > +  "test_known_fail": "TEST-KNOWN-FAIL",
> > +  "test_unexpected_pass": "TEST-UNEXPECTED-PASS",
> > +  "javascript_error": "TEST-UNEXPECTED-FAIL",
> > +  "test_pass": "TEST-PASS",
> > +  "test_info": "TEST-INFO"
> 
> Is there some real benefit here to making up the new lowercase_names instead
> of just using the existing strings as actions?

The intention here is to allow for a richer set of test actions for structured messages
that aren't tied to a particular message string. These are only used in head.js for
backwards compatibility in case tests are run interactively, so a user doesn't have
to comb through JSON messages while debugging.

That said, the set of actions here probably doesn't have significant benefit at this
stage, so I would be OK with leaving out the 'javascript_error' action and converting
with .replace('_', '-').upper() and the JS equivalent.

> @@ +190,5 @@
> >    if (_quit)
> >      return;
> >  
> > +  _log("test_info",
> > +       {_message: "TEST-INFO | (xpcshell/head.js) | running event loop\n"});
> 
> It seems unfortunate to have to duplicate the test-info bit in both the
> action and the message. Can't you infer the latter from the former?

Right. I left this so it would end up outputting the original string. This can be
built from 'action' | 'path' | 'diagnostic' like the others, but would lose the
parens around the path for this particular string (although I'm guessing that
would be OK).

> @@ +392,5 @@
> > +        self.output_lines.append(msg)
> > +
> > +    def has_failure_output(self, output):
> > +        """Inspects process output for presence of an error. Saves output
> > +        lines as they are parsed."""
> 
> I find it a little non-obvious that has_failure_output sets
> self.output_lines, FWIW. Not sure if just renaming the function or putting
> that in the docstring or something would help.

I will change this to a parse_output method that sets a has_failure field.
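The conversion mentioned above is a one-liner; note it only round-trips cleanly for actions whose names mirror their TEST-* strings, which is why 'javascript_error' would be left out:

```python
def action_to_string(action):
    # "test_unexpected_fail" -> "TEST-UNEXPECTED-FAIL"
    return action.replace("_", "-").upper()
```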
Attachment #783444 - Attachment is obsolete: true
I noticed some problems in how output is handled when tests output their own
unstructured data.
Attachment #785236 - Attachment is obsolete: true
Attachment #785299 - Attachment description: Output structured messages in head.js, interpret them in runxpcshelltests.py; → Part 3 - Output structured messages in head.js, interpret them in runxpcshelltests.py;
Attachment #785299 - Flags: review?
Attachment #785299 - Flags: feedback?(gps)
This might be a good time to note that I've been working on top of the parallel xpcshell patches under the assumption that that bug would land first; these changes are only interdependent inasmuch as they both make significant changes to runxpcshelltests.py.
Comment on attachment 785299 [details] [diff] [review]
Part 3 - Output structured messages in head.js, interpret them in runxpcshelltests.py;

Review of attachment 785299 [details] [diff] [review]:
-----------------------------------------------------------------

I like it! Per IRL discussion, I think we should just kill human-readable formatting from head.js. It's only used in interactive mode. I suspect this mode isn't used too often, and that parsing the JSON output by eye won't be a significant issue for developers using interactive mode. If people complain, we can always restore it.

Ideally, Python would still be in the loop during interactive mode. However, since Python is using line buffering, this would change behavior. Characters entered from stdin wouldn't be displayed until a newline is encountered, for example. That's not desirable. Unfortunately, switching away from line buffering in Python is pretty difficult since mozprocess only supports line buffering AFAIK.

::: testing/xpcshell/runxpcshelltests.py
@@ +409,5 @@
> +        self.output_lines = []
> +        for line_string in output.splitlines():
> +            # The structured logger outputs JSON objects. For those tests
> +            # that produce a lot of additional output, we don't want to
> +            # waste time letting loads fail.

Did you actually measure this? If so, please explicitly state this, otherwise it sounds like an ungrounded assertion or requires a lot of bug spelunking to rediscover.
Attachment #785299 - Flags: feedback?(gps) → feedback+
Attachment #778706 - Attachment is obsolete: true
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +409,5 @@
> > +        self.output_lines = []
> > +        for line_string in output.splitlines():
> > +            # The structured logger outputs JSON objects. For those tests
> > +            # that produce a lot of additional output, we don't want to
> > +            # waste time letting loads fail.
> 
> Did you actually measure this? If so, please explicitly state this,
> otherwise it sounds like an ungrounded assertion or requires a lot of bug
> spelunking to rediscover.

I really ought to stop making assumptions about performance. I just did some quick measurements, and this doesn't have a consistent impact - I won't bother with it in the next iteration of the patch.
This removes the "format for interactive mode" bits to get rid of all the
duplicate formatting in javascript and python. To be clear, interactive
mode still works, but results in JSON messages being dumped to the console
instead of the usual strings.
Attachment #789273 - Flags: review?(ted)
Attachment #785299 - Attachment is obsolete: true
Comment on attachment 778714 [details] [diff] [review]
Part 1 - self tests exercising behavior of current log parsing;

Review of attachment 778714 [details] [diff] [review]:
-----------------------------------------------------------------

<3 tests.
Attachment #778714 - Flags: review?(ted) → review+
Comment on attachment 789273 [details] [diff] [review]
Output structured messages in head.js, interpret them in runxpcshelltests.py;

Review of attachment 789273 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think I have anything else to add at this point, so r=me.
Attachment #789273 - Flags: review?(ted) → review+
Attachment #778714 - Attachment description: Part 2 - self tests exercising behavior of current log parsing; → Part 1 - self tests exercising behavior of current log parsing;
Unbitrot changes to runxpcshelltests.py. This applies on top of part one in
this bug.
Attachment #789273 - Attachment is obsolete: true
Attachment #790829 - Attachment is obsolete: true
Attachment #790830 - Attachment description: Output structured messages in head.js, interpret them in runxpcshelltests.py; → Part 2 - Output structured messages in head.js, interpret them in runxpcshelltests.py;
Attachment #790830 - Flags: review+
Attachment #778714 - Attachment is obsolete: true
Attachment #790830 - Attachment is obsolete: true
Attachment #790870 - Attachment description: Output structured messages in head.js, interpret them in runxpcshelltests.py; → Part 2 - Output structured messages in head.js, interpret them in runxpcshelltests.py;
Attachment #790870 - Flags: review+
Here's a try run:

https://tbpl.mozilla.org/?tree=Try&rev=68ca5cafba4b

The selftests fail on B2G desktop - we don't run xpcshell tests on this platform, so maybe we shouldn't be running the selftests?

I built B2G desktop and ran the selftests but wasn't able to reproduce the failures. The tests that fail all invoke the child harness, and as far as I can tell, no B2G xpcshell tests exercise IPC.

I will find out what we're running on B2G and what the known limitations are.
This works around the failures by skipping these particular tests on B2G. I
verified the tests are effectively skipped on a local desktop B2G build; here's a try run:

https://tbpl.mozilla.org/?tree=Try&rev=e6c10c9ae8a0
Attachment #791529 - Flags: review?(ted)
Attachment #790868 - Attachment is obsolete: true
Attachment #791529 - Attachment is obsolete: true
Attachment #791529 - Flags: review?(ted)
After what I saw when first working on this patch, I did a full try run last week to see what sort of performance impact this would have on automation. I did a comparison between these and contemporary m-c pushes. My findings were that for some platforms, maybe the very fastest platforms (Ubuntu opt and OS X 10.8 opt), the test job took 2-5 minutes more with this patch applied. Other platforms were almost exactly on par with and without this patch applied.

Assuming this is a representative run (I compared a single try run against contemporary m-c pushes), I think this is a small price to pay, particularly once we turn on parallelism for these runs, but I wanted to be transparent about my findings.
I did additional runs for each platform. Just so it's documented somewhere, this is what I found. For each platform this is the average of 5 runs (in seconds). A positive difference is a slowdown with these patches applied, a negative difference a speedup. There was a lot of noise in these numbers, and these results contradict some of my initial findings, so I'm not sure we can rely on them as our expected performance impact when this lands.

Platform               With     Without  Difference
Ubuntu opt             2378.6   2180.8    197.8
Ubuntu debug           4065.6   3859.6    206
Ubuntu 64 opt          2075.2   1904.6    170.6
Ubuntu 64 debug        3419.6   3324.2     95.4
Win XP opt             1634.4   1616.8     17.6
Win XP debug           2316.2   2310.8      5.4
Win7 opt               2668.6   2585.4     83.2
Win7 debug             3362.8   3487.6   -124.8
Win 8 debug            3991.4   3951.8     39.6
Win 8 opt              3254.6   3277      -22.4
OS X 10.6 opt          1188     1174.8     13.2
OS X 10.7 opt          1165.8   1181.6    -15.8
OS X 10.8 opt          1063.4   1142.6    -79.2
OS X 10.8 debug        1651.8   1688.6    -36.8
OS X 10.6 debug        2054.6   2025.4     29.2
Android 2.2 Armv6 opt  1000      890.8    109.2
Android 2.2 opt         970      949.2     20.8
Unlike on try, I get a consistent performance impact running locally. There is a consistent 10-20 second performance penalty in user time (using the time command) with this patch applied out of about 7.5 minutes of user time for the run.

After testing with a few versions of this, I find that if I dump a dummy string in head.js rather than using JSON.stringify, and always just append the message string instead of using json.loads in runxpcshelltests.py, the times are once again equivalent to what they are without this patch. This was consistent across 5 runs with each version (no structured logging, structured logging as-is, and this patch with dummy strings instead of JSON).

Further optimizations may be possible, but from this I think any cost here is the relatively small cost of JSON itself, and not an accident of the implementation.
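A rough micro-benchmark sketch of the comparison described above (the message content is a dummy; absolute timings are machine-dependent):

```python
import json
import timeit

# One representative structured log line.
line = json.dumps({"action": "test_pass",
                   "_message": "TEST-PASS | test_foo.js | 1 check(s) passed"})

# Structured path: decode every line as JSON.
structured = timeit.timeit(lambda: json.loads(line), number=100000)
# Dummy-string path: just keep the raw line.
plain = timeit.timeit(lambda: line.strip(), number=100000)

print("json.loads: %.3fs, plain append: %.3fs" % (structured, plain))
```

Per-line JSON decoding costs more than a plain append, but the cost is a small constant per log line, which matches the conclusion above.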
I'd be fine with a 5% user time regression (20s) to land this, especially since parallel xpcshell is landing and we're tracking down long tests.

I say land as-is and file a follow-up for the optimization you described.
Comment on attachment 791557 [details] [diff] [review]
Part 1 - self tests exercising behavior of current log parsing;

Review of attachment 791557 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!
Attachment #791557 - Flags: review?(ted) → review+
\o/
I isolated the issue: changing from getting the objdir from an environment variable to getting it from a MozbuildObject makes the PATH in the env that gets passed into the xpcshell process unicode, not str.
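A sketch of the kind of fix this implies under Python 2 (the helper name is hypothetical): coerce unicode environment values back to native strings before handing the env to the child process.

```python
def ensure_str_env(env):
    # Some Python 2 versions (see bug 810435) reject unicode values in the
    # env passed to subprocess; coerce keys and values to native str.
    return dict((str(k), str(v)) for k, v in env.items())
```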
This fixes it. Part 2 still applies cleanly on this.
Attachment #791557 - Attachment is obsolete: true
Comment on attachment 794293 [details] [diff] [review]
Part 1 - self tests exercising behavior of current log parsing;

This is an issue with Python 2.7.2, according to bug 810435. IRC r+ from gps.
Attachment #794293 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ec97a5129a9d
https://hg.mozilla.org/mozilla-central/rev/89e249527629
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26