Closed Bug 1024055 Opened 10 years ago Closed 8 years ago

Make mozharness use structured logging for marionette tests

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(firefox52 fixed)

RESOLVED FIXED
Tracking Status
firefox52 --- fixed

People

(Reporter: chmanchester, Assigned: pyang)

References

Details

Attachments

(4 files, 5 obsolete files)

8.93 KB, patch
ahal
: review+
mozilla
: review+
chmanchester
: checked-in+
Details | Diff | Splinter Review
14.64 KB, patch
ahal
: feedback+
jgraham
: feedback+
Details | Diff | Splinter Review
4.67 KB, patch
jgraham
: feedback+
ahal
: feedback+
Details | Diff | Splinter Review
5.89 KB, patch
ahal
: review+
Details | Diff | Splinter Review
mozharness should use structured log data provided that will be provided by bug 956739 with the parser landed in bug 1021817.
This puts alternate parsing behind a config option for marionette tests. Still need to figure out the best strategy for testing this.

James, are these changes to the structured parser compatible with the web platform tests?
Attachment #8438705 - Flags: feedback?(james)
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
Comment on attachment 8438705 [details] [diff] [review]
Alter marionette's mozharness script to use structured log data to determine job status

Review of attachment 8438705 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/structuredlog.py
@@ +99,5 @@
>  
>          try:
>              data = json.loads(line)
>          except ValueError:
> +            self.warning("Failed to parse line '%s' as json" % line)

What was the reason for this change?

::: scripts/marionette.py
@@ +50,5 @@
> +    A class that extends StructuredOutputParserHelper such that it can parse
> +    if gecko did not install properly
> +    """
> +
> +    bad_gecko_install = 'Error installing gecko!'

So I don't understand why this needs to be special cased here? Can't we just change marionette to do something more sane?
Attachment #8438705 - Flags: feedback?(james) → feedback+
Comment on attachment 8438705 [details] [diff] [review]
Alter marionette's mozharness script to use structured log data to determine job status

Review of attachment 8438705 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/structuredlog.py
@@ +99,5 @@
>  
>          try:
>              data = json.loads(line)
>          except ValueError:
> +            self.warning("Failed to parse line '%s' as json" % line)

Seemed like a reflection of the other changes to the behavior in this patch, but I don't think it matters.

::: scripts/marionette.py
@@ +50,5 @@
> +    A class that extends StructuredOutputParserHelper such that it can parse
> +    if gecko did not install properly
> +    """
> +
> +    bad_gecko_install = 'Error installing gecko!'

From looking at the code and surrounding comments, it seems this is a specific case the test harness needs to cause a special behavior in mozharness.
> ::: scripts/marionette.py
> @@ +50,5 @@
> > +    A class that extends StructuredOutputParserHelper such that it can parse
> > +    if gecko did not install properly
> > +    """
> > +
> > +    bad_gecko_install = 'Error installing gecko!'
> 
> From looking at the code and surrounding comments, it seems this is a
> specific case the test harness needs to cause a special behavior in
> mozharness.

This still seems like a bad approach. Having marionette produce random bits of unstructured data requires every consumer to deal with it in some way.

It seems like the problem is really that marionette pushes the logic for retrying when something fails into mozharness, which generally assumes no retries. But I wonder if the right fix is to log at some level e.g. "error" but to reset the status and tbpl status every time the marionette process is restarted so although the logs would show errors on tbpl, the overall job status would be green (assuming it worked after N retries).
The gecko install hack isn't used anymore.. feel free to nuke it! (my marionette refactor patch which will land shortly actually removes the ability to even do this in the first place)
For context, that hack was used back when we didn't have releng-provided emulator builds. Basically we had a snapshot of the emulator up on a server, and for each check-in we would download it and then install the newer gecko into it. To quote chmanchester, this worked about as well as you would expect ;).
Attachment #8438705 - Attachment is obsolete: true
We're going to need to do something like this for each harness as they adopt structured logging, so I think we should put this config option in testing_config_options. Then we can land changes to corresponding mozharness scripts as in-tree harness parts are ready. To test, sounds like we can pass the commandline option to turn this on via in-tree configs for those suites that have them, and via buildbot configs on ash for those that don't. Does this sound reasonable?
Sounds pretty reasonable to me.
This puts the config option in testing_config_options.
Attachment #8442106 - Flags: review?(aki)
Attachment #8442106 - Flags: review?(ahalberstadt)
Attachment #8440033 - Attachment is obsolete: true
Comment on attachment 8442106 [details] [diff] [review]
Alter marionette's mozharness script to use structured log data to determine job status

Review of attachment 8442106 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but giving r- because this will cause bustage as is.

::: scripts/marionette.py
@@ -41,5 @@
> -
> -    def parse_single_line(self, line):
> -        if self.bad_gecko_install.search(line):
> -            self.install_gecko_failed = True
> -        super(MarionetteOutputParser, self).parse_single_line(line)

Thanks for getting rid of this!

@@ +157,5 @@
>          self.binary_path = c.get('binary_path')
>          self.test_url = c.get('test_url')
> +        self.parser_class = (StructuredOutputParser
> +                             if   c.get('structured_output')
> +                             else TestSummaryOutputParserHelper)

nit: if a single line if statement is too big for a single line, then use a normal if statement ;)

@@ +426,5 @@
> +        if self.config.get("structured_output"):
> +            # Make sure that the logging directory exists
> +            if self.mkdir_p(dirs["abs_blob_upload_dir"]) == -1:
> +                self.fatal("Could not create blobber upload directory")
> +            cmd.append("--log-raw=-")

Shouldn't the formatted log be output to stdout?

@@ +454,5 @@
>              code = self.run_command(cmd, env=env,
>                                      output_timeout=1000,
>                                      output_parser=marionette_parser)
>              if not marionette_parser.install_gecko_failed:
>                  break

Since you removed the MarionetteOutputParser above, please also remove this for loop. This will fail because marionette_parser no longer has an install_gecko_failed attribute.
Attachment #8442106 - Flags: review?(ahalberstadt) → review-
Addressed ahal's comments. The in-tree harness writing the raw log to standout is correct as long as we use the structured parser in mozharness.
Attachment #8442158 - Flags: review?(aki)
Attachment #8442158 - Flags: review?(ahalberstadt)
Attachment #8442106 - Attachment is obsolete: true
Comment on attachment 8442158 [details] [diff] [review]
Alter marionette's mozharness script to use structured log data to determine job status

Review of attachment 8442158 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm, thanks! Aki, I think it's mostly the --structured-output argument in testbase.py that you need to review.
Attachment #8442158 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8442158 [details] [diff] [review]
Alter marionette's mozharness script to use structured log data to determine job status

(In reply to Andrew Halberstadt [:ahal] from comment #13)
> Lgtm, thanks! Aki, I think it's mostly the --structured-output argument in
> testbase.py that you need to review.

--structured-output argument lgtm.
We may want to have a --no-structured-output argument if the config file says structured_output = True and we want to override that, but I'm also fine with leaving that until we actually have a need for it.
Attachment #8442158 - Flags: review?(aki) → review+
Before we can use this for parsing, we need to ensure that all the errors specified here: http://hg.mozilla.org/build/mozharness/annotate/404dfba5c39d/scripts/marionette.py#l141 are logged at the error level by the harness. These all appear to correspond to exceptions thrown in python or javascript, so this should be doable.
Merged to production, and deployed.
Component: Tools → Mozharness
This warrants further testing, but it passes my local sanity check that a failing test results in # TBPL WARNING # for marionette tests.

The goal is to make DesktopUnittestOutputParser obsolete for marionette based tests as well as anything outputting structured logs, meaning the regex in mozharness/testing/errors.py will never be used to determine job status.

This passes in an argument to the parser to determine whether output from the harness other than line delimeted JSON is always a failure. jgraham pointed out we can wrap stdout for the harness process, but I think the result is about the same.
Attachment #8478519 - Flags: feedback?(james)
Attachment #8478519 - Flags: feedback?(ahalberstadt)
Comment on attachment 8478519 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8478519 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/structuredlog.py
@@ +22,5 @@
> +        # will be used to re-interpret formatted logs.
> +        if 'strict' in kwargs:
> +            self.strict = kwargs.pop('strict')
> +        else:
> +            self.strict = False

This should default to true, that's a mistake.
Comment on attachment 8478519 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8478519 [details] [diff] [review]:
-----------------------------------------------------------------

I think this approach is probably the best we can do at the moment. It does make things more confusing, but it accomplishes what we want: getting mozharness to use structured data for determining status. To me that is a win.

::: mozharness/mozilla/structuredlog.py
@@ +38,5 @@
>          self.worst_log_level = INFO
>          self.tbpl_status = TBPL_SUCCESS
>  
> +
> +    def _log(self, line, level, test_unexpected=False):

This function isn't doing any actual logging, I'd call it a handler or something similar.

::: scripts/marionette.py
@@ +480,5 @@
>          if self.mkdir_p(dirs["abs_blob_upload_dir"]) == -1:
>              # Make sure that the logging directory exists
>              self.fatal("Could not create blobber upload directory")
>  
> +        if self.tree_config.get("structured_output"):

Note: just be aware that this will propagate down the release train, so it can't depend on anything outside of the tree (i.e buildbot-configs or regular mozharness configs).
Attachment #8478519 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 8478519 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8478519 [details] [diff] [review]:
-----------------------------------------------------------------

I approve of the general idea of using the structured logs, but I think we need to be careful not to fall back to regexp in places where it isn't actually necessary.

::: mozharness/mozilla/structuredlog.py
@@ +118,5 @@
> +
> +        if "status" in data and data["status"] == "SKIP":
> +            self.skipped += 1
> +
> +        self._log(self.formatter(data), level, test_unexpected)

I am still not terribly happy about the fact that this causes lines to be interpreted both as structured data and using regexp (with strict=false). I'm not sure structured logging is buying us anything as long as we still think that's necessary.

@@ +128,5 @@
>              self.update_levels(TBPL_WARNING, WARNING)
> +        if self.count == 0:
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if not self.has_suite_end:
> +            self.log("No suite end message was logged by this test harness", ERROR)

Need to update .log to set the tbpl status if the level is ERROR or higher (or maybe WARNING or higher).

@@ +130,5 @@
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if not self.has_suite_end:
> +            self.log("No suite end message was logged by this test harness", ERROR)
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if self.num_errors != 0:

This shouldn't be needed if we directly set the status when the action=log message is handled.

@@ +141,5 @@
> +        # currently in the format <pass count>/<fail count>/<todo count
> +        # but <expected count>/<unexpected count>/<skip count> is more reflective of
> +        # the meaning of structured logs, and will yield the relevant information.
> +        if self.unexpected_count > 0:
> +            fail_text = '<em class="testfail">%s</em>' % self.unexpected_count

Is anything actually formatting this as HTML?

::: scripts/marionette.py
@@ +346,5 @@
> +
> +        if self.tree_config.get("structured_output"):
> +            # Also when isinstance(parser, StructuredOutputParser)
> +            tbpl_status, log_level = parser.evaluate_parser(return_code)
> +            if return_code == 10 and tbpl_status == TBPL_WARNING:

10 seems like a magic number. Where does it come from?

@@ +348,5 @@
> +            # Also when isinstance(parser, StructuredOutputParser)
> +            tbpl_status, log_level = parser.evaluate_parser(return_code)
> +            if return_code == 10 and tbpl_status == TBPL_WARNING:
> +                text_status = "test failures"
> +            elif return_code != 0:

Can't this go in the StructuredOutputParser? If we need to we can add an option to turn it off, but for now it seems like a good idea.
Attachment #8478519 - Flags: feedback?(james) → feedback+
Comment on attachment 8478519 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8478519 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/structuredlog.py
@@ +118,5 @@
> +
> +        if "status" in data and data["status"] == "SKIP":
> +            self.skipped += 1
> +
> +        self._log(self.formatter(data), level, test_unexpected)

We can probably do better with marionette, but there are some producers we truly don't control.

It's possible mozharness isn't the place to focus our efforts right now. Maybe that place is treeherder. I do see this change as relatively high risk given complexities in mozharness.

@@ +128,5 @@
>              self.update_levels(TBPL_WARNING, WARNING)
> +        if self.count == 0:
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if not self.has_suite_end:
> +            self.log("No suite end message was logged by this test harness", ERROR)

I think this could cause false positives if "log" is called from a superclass.

@@ +130,5 @@
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if not self.has_suite_end:
> +            self.log("No suite end message was logged by this test harness", ERROR)
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if self.num_errors != 0:

This is how the errors logged by a super class would get propagated to the tbpl status. It's not great.

@@ +141,5 @@
> +        # currently in the format <pass count>/<fail count>/<todo count
> +        # but <expected count>/<unexpected count>/<skip count> is more reflective of
> +        # the meaning of structured logs, and will yield the relevant information.
> +        if self.unexpected_count > 0:
> +            fail_text = '<em class="testfail">%s</em>' % self.unexpected_count

It appears that tbpl and treeherder do so.

::: scripts/marionette.py
@@ +346,5 @@
> +
> +        if self.tree_config.get("structured_output"):
> +            # Also when isinstance(parser, StructuredOutputParser)
> +            tbpl_status, log_level = parser.evaluate_parser(return_code)
> +            if return_code == 10 and tbpl_status == TBPL_WARNING:

Seems unique to marionette among the harnesses.

@@ +348,5 @@
> +            # Also when isinstance(parser, StructuredOutputParser)
> +            tbpl_status, log_level = parser.evaluate_parser(return_code)
> +            if return_code == 10 and tbpl_status == TBPL_WARNING:
> +                text_status = "test failures"
> +            elif return_code != 0:

I think that's right.
Reviewing a review didn't do what I expected, but comment 22 consists of replies to the review in comment 21.
Comment on attachment 8478519 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8478519 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/structuredlog.py
@@ +46,5 @@
> +        # By interpreting log levels directly and counting them
> +        # against num_errors, a test harness can turn a job red
> +        # by logging at the error level without logging "TEST-UNEXPECTED".
> +        if level in (ERROR, CRITICAL, FATAL):
> +            self.num_errors += 1

Just making a note of an irc discussion to do this in parse_single_line to make this more explicit.
Reading bug 1017559 made me understand this a bit better. I think we can be more obvious about the distinction. Now I'm thinking:

Mozlog can look at a set of log messages and give us a status for a test run. This is pretty much: any test_end or test_status where 'expected' is in the json message means test failure (orange), any log action at a level 'ERROR' (maybe 'WARNING') or above is error (red). Doing it in mozlog means our definitions don't have an opportunity to get subtly out of step when this logic is implemented in treeherder or some other tool. This can just be a handler.
Submitted too early, here's the rest:

Mozharness can pass messages to this handler (the thing that tells us how a run went based on a set of JSON messages) and set status based on that. For those that request it, mozharness will pass unstructured messages ('log' actions, or any non-json lines left on stdout if we allow that), to a regex based parser to detect anything unexpected that should fail the run. We might still need to detect arbitrary python exceptions (like those raised by mozlog unexpectedly) or output from things we don't control with a fallback, but a test result will never be passed through a regex based parser.
This is what the handler in mozlog to interpret statuses would look like.
This is a version of the patch I was working on last week that uses mozlog to set job status (the changes to marionette.py are almost identical). It's not that much simpler than the other version, but I think it's a step in the right direction.

If we agree this is a reasonable way forward, I'd like to land these on ash so we can test the various failure scenarios.
Attachment #8482883 - Flags: feedback?(james)
Attachment #8482883 - Flags: feedback?(ahalberstadt)
Attachment #8482921 - Flags: feedback?(james)
Attachment #8482921 - Flags: feedback?(ahalberstadt)
(In reply to Chris Manchester (PTO 9/6 - 9/10) [:chmanchester] from comment #28)
> Created attachment 8482921 [details] [diff] [review]
> Modify marionette's mozharness script and the structured output parser to
> use structured logs to determine job status.
> 
> This is a version of the patch I was working on last week that uses mozlog
> to set job status (the changes to marionette.py are almost identical). It's
> not that much simpler than the other version, but I think it's a step in the
> right direction.
> 
> If we agree this is a reasonable way forward, I'd like to land these on ash
> so we can test the various failure scenarios.

I just noticed this doesn't set the log level when the fallback parse_single_line is used. That needs to be fixed.
Comment on attachment 8482883 [details] [diff] [review]
Provide a mozlog handler to report overall status of a test run for the benefit of mozharness and others

Review of attachment 8482883 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense. I have some optional suggestions that are more of a brainstorm than hard requirements.

::: testing/mozbase/mozlog/mozlog/structured/handlers/statushandler.py
@@ +8,5 @@
> +TBPL_SUCCESS = 'SUCCESS'
> +TBPL_WARNING = 'WARNING'
> +TBPL_FAILURE = 'FAILURE'
> +TBPL_EXCEPTION = 'EXCEPTION'
> +TBPL_RETRY = 'RETRY'

I'd like for these to not have the 'TBPL' prefix.

@@ +16,5 @@
> +MOZLOG_OK = 'OK'
> +MOZLOG_TEST_FAIL = 'FAIL'
> +MOZLOG_ERROR = 'ERROR'
> +
> +_tbpl_statuses = {

Same.

@@ +39,5 @@
> +
> +    def __call__(self, data):
> +        status = MOZLOG_OK
> +        action = data['action']
> +        if action == 'log' and log_levels[data['level']] <= log_levels['ERROR']:

Why stop at ERROR? The status should just be the maximum log level, whether that's WARNING or ERROR or CRITICAL. E.g:

if log_levels[data['level']] > log_levels[self.worst_level]:
    status = self.worst_level = data['level']

It might also be nice to keep track of the number of log messages in each level (I know mozharness wouldn't make use of that, but other things may want to).

@@ +45,5 @@
> +        elif action == 'test_end':
> +            self.count += 1
> +            if 'expected' in data:
> +                self.unexpected += 1
> +                status = MOZLOG_TEST_FAIL

Also, maybe test failures and log levels should be stored separately.

@@ +64,5 @@
> +        if status == MOZLOG_OK or self.status != MOZLOG_OK:
> +            return
> +        self.status = status
> +
> +    def evaluate_status(self, style):

I'd just call this 'evaluate' so that the api is `status.evaluate()` instead of `status.evaluate_status()`.
Attachment #8482883 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 8482921 [details] [diff] [review]
Modify marionette's mozharness script and the structured output parser to use structured logs to determine job status.

Review of attachment 8482921 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to hold off feedback on this patch until the first one is reviewed/landed as this is heavily dependent on that. But from a quick glance, it looks ok.
Attachment #8482921 - Flags: feedback?(ahalberstadt)
Comment on attachment 8482883 [details] [diff] [review]
Provide a mozlog handler to report overall status of a test run for the benefit of mozharness and others

Review of attachment 8482883 [details] [diff] [review]:
-----------------------------------------------------------------

So I think I like the general idea, but it seems much nicer to do all the evaluation at the end when evaluate is called rather than doing it continually. I also agree with everything ahal said.

::: testing/mozbase/mozlog/mozlog/structured/handlers/statushandler.py
@@ +39,5 @@
> +
> +    def __call__(self, data):
> +        status = MOZLOG_OK
> +        action = data['action']
> +        if action == 'log' and log_levels[data['level']] <= log_levels['ERROR']:

I tend to agree that going for a design where we track everything and, at the end, use the data we've gathered to set the status, rather than doing the thing where we continually update the status as we go along.

e.g. 

in __init__
self.log_count = defaultdict(int)

for each message
if action == "log"
   self.log_count[data["level"]] += 1

When we evaluate the status
worst_level = None
for level in log_levels:
    if level in self.log_count:
        worst_level = level
        break

@@ +45,5 @@
> +        elif action == 'test_end':
> +            self.count += 1
> +            if 'expected' in data:
> +                self.unexpected += 1
> +                status = MOZLOG_TEST_FAIL

We can just set status at the end based on there being > 1 unexpected, right?

@@ +70,5 @@
> +        translation from mozlog statuses is provided."""
> +        diagnostic = None
> +        if not self.suite_ended:
> +            self._update_status(MOZLOG_ERROR)
> +            diagnostic = 'No suite_end message was found for this run'

Is this the right thing to do? It depends if you think that the most probable cause of this error is that the logging is broken (so the suite_end won't ever be produced) or evaluate was called too early. It's not quite clear to me which we should worry about more.
Attachment #8482883 - Flags: feedback?(james) → feedback+
(In reply to Andrew Halberstadt [:ahal] from comment #30)
> Comment on attachment 8482883 [details] [diff] [review]
> Provide a mozlog handler to report overall status of a test run for the
> benefit of mozharness and others
> 
> Review of attachment 8482883 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Makes sense. I have some optional suggestions that are more of a brainstorm
> than hard requirements.
> 
> ::: testing/mozbase/mozlog/mozlog/structured/handlers/statushandler.py
> @@ +8,5 @@
> > +TBPL_SUCCESS = 'SUCCESS'
> > +TBPL_WARNING = 'WARNING'
> > +TBPL_FAILURE = 'FAILURE'
> > +TBPL_EXCEPTION = 'EXCEPTION'
> > +TBPL_RETRY = 'RETRY'
> 
> I'd like for these to not have the 'TBPL' prefix.

I want to make it explicit that these specific strings mean something to TBPL, and users interacting with this can therefore use them directly.

> 
> @@ +16,5 @@
> > +MOZLOG_OK = 'OK'
> > +MOZLOG_TEST_FAIL = 'FAIL'
> > +MOZLOG_ERROR = 'ERROR'
> > +
> > +_tbpl_statuses = {
> 
> Same.

These can lose the prefix, but again want to be explicit that this is an adapter between the meaning of a test run as the mozlog format knows it and other tools.

> 
> @@ +39,5 @@
> > +
> > +    def __call__(self, data):
> > +        status = MOZLOG_OK
> > +        action = data['action']
> > +        if action == 'log' and log_levels[data['level']] <= log_levels['ERROR']:
> 
> Why stop at ERROR? The status should just be the maximum log level, whether
> that's WARNING or ERROR or CRITICAL. E.g:
> 
> if log_levels[data['level']] > log_levels[self.worst_level]:
>     status = self.worst_level = data['level']
> 

I guess these would all translate to TBPL_FAILURE (or maybe WARNING if we need the harness to be able to turn things orange).

> It might also be nice to keep track of the number of log messages in each
> level (I know mozharness wouldn't make use of that, but other things may
> want to).
> 
> @@ +45,5 @@
> > +        elif action == 'test_end':
> > +            self.count += 1
> > +            if 'expected' in data:
> > +                self.unexpected += 1
> > +                status = MOZLOG_TEST_FAIL
> 
> Also, maybe test failures and log levels should be stored separately.
> 
> @@ +64,5 @@
> > +        if status == MOZLOG_OK or self.status != MOZLOG_OK:
> > +            return
> > +        self.status = status
> > +
> > +    def evaluate_status(self, style):
> 
> I'd just call this 'evaluate' so that the api is `status.evaluate()` instead
> of `status.evaluate_status()`.
(In reply to James Graham [:jgraham] from comment #32)
> Comment on attachment 8482883 [details] [diff] [review]
> Provide a mozlog handler to report overall status of a test run for the
> benefit of mozharness and others
> 
> Review of attachment 8482883 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So I think I like the general idea, but it seems much nicer to do all the
> evaluation at the end when evaluate is called rather than doing it
> continually. I also agree with everything ahal said.
> 
> ::: testing/mozbase/mozlog/mozlog/structured/handlers/statushandler.py
> @@ +39,5 @@
> > +
> > +    def __call__(self, data):
> > +        status = MOZLOG_OK
> > +        action = data['action']
> > +        if action == 'log' and log_levels[data['level']] <= log_levels['ERROR']:
> 
> I tend to agree that going for a design where we track everything and, at
> the end, use the data we've gathered to set the status, rather than doing
> the thing where we continually update the status as we go along.

I think that's right, but the advantage to continually updating is we have a notion of the first non-OK status that gets set. Assuming that's the most relevant status to return (I could still be wrong about that), we can just keep a specific field for that.

> 
> e.g. 
> 
> in __init__
> self.log_count = defaultdict(int)
> 
> for each message
> if action == "log"
>    self.log_count[data["level"]] += 1
> 
> When we evaluate the status
> worst_level = None
> for level in log_levels:
>     if level in self.log_count:
>         worst_level = level
>         break
> 
> @@ +45,5 @@
> > +        elif action == 'test_end':
> > +            self.count += 1
> > +            if 'expected' in data:
> > +                self.unexpected += 1
> > +                status = MOZLOG_TEST_FAIL
> 
> We can just set status at the end based on there being > 1 unexpected, right?

Same comment about keeping the first non-OK status.

> 
> @@ +70,5 @@
> > +        translation from mozlog statuses is provided."""
> > +        diagnostic = None
> > +        if not self.suite_ended:
> > +            self._update_status(MOZLOG_ERROR)
> > +            diagnostic = 'No suite_end message was found for this run'
> 
> Is this the right thing to do? It depends if you think that the most
> probable cause of this error is that the logging is broken (so the suite_end
> won't ever be produced) or evaluate was called too early. It's not quite
> clear to me which we should worry about more.

I guess this is right if we're thinking about mozharness because it takes the place of evaluate_parser, which is done after the test process finishes. But now that I'm thinking about other users, and a world where the result of a run might be available incrementally, this isn't right, and I think this check should be moved back to mozharness.
Attachment #8482921 - Flags: feedback?(james)
Split off bug 1063535 to land the status handler.
Chris, what is the status of this bug? Looks like you missed to finish it off. I wonder if this bug only exists for the marionette script and not for mozharness in general, e.g. as an enhancement to base/log.py. It would be interesting for our firefox-ui-tests which get run through mozharness, and need to report to Treeherder including the creation of the error_summary_text artifact.
Flags: needinfo?(cmanchester)
I think the suite-agnostic parts of this ended up landing in other bugs. I don't think anything would block landing the marionette part, but I'm sure the patches wouldn't apply cleanly by now.

Note that this bug as it is is about setting job status based on structured logs, not using structured logs for mozharness' own logging. The error summary artifact can and should be added to marionette and can be done independent of this, the other scripts like destktop_unittest.py have examples of how this is done.
Assignee: cmanchester → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(cmanchester)
I just turned on structured log in mozharness' marionette script and most of suites look fine.  however config 'strict' need to be turned off because microformats suite pushed non-structured message to stdout.
If that wasn't important I guess we can simply turn it off.
try result with strict on https://treeherder.mozilla.org/#/jobs?repo=try&revision=f25e78b231d8539c8b48e5601a8abc7a43b3e6da
strict off https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb9d8f10533eb5a129c2f0e3cd623a1212683ea5

ni? Chris for suggestion
Flags: needinfo?(cmanchester)
Ideally tests would log everything through structured logging and we would be able to rely on that. This might not be easy for all tests, but I'm not familiar with the microformats suite. As long as we're aware of what's being printed without structured logging and none of it is critical information (such as test results, or an error from the harness intended to be fatal) this might be ok
Flags: needinfo?(cmanchester)
Attached patch enable_config (obsolete) — Splinter Review
This should be minimized patch to enable structured log for marionette.
feedback? to see if anything else we need to add.
Attachment #8797054 - Flags: feedback?(ahalberstadt)
Comment on attachment 8797054 [details] [diff] [review]
enable_config

Review of attachment 8797054 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks good to me!

::: testing/mozharness/scripts/marionette.py
@@ +454,5 @@
>          level = INFO
> +        success_codes = None
> +        tbpl_status, log_level = marionette_parser.evaluate_parser(
> +            code,
> +            success_codes=success_codes)

No point in passing in "success_codes" if it's just None anyway, might as well remove that.
Attachment #8797054 - Flags: feedback?(ahalberstadt) → feedback+
Attached patch enable_configSplinter Review
- Remove passing success_code argument
- Rename 'code' to 'return_code'
Assignee: nobody → pyang
Attachment #8797054 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Comment on attachment 8797355 [details] [diff] [review]
enable_config

Here is the try result of the patch.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=681a374efcf34de135af62d848a9437973e7aa7d

Looks ok and I'll confirm failed tests are  intermittent.
Attachment #8797355 - Flags: review?(ahalberstadt)
Attachment #8482921 - Attachment is obsolete: true
For the future, you should try to avoid using "all" when selecting unittests. For example, in this case you can use a try syntax like:

try: -b o -p <platforms> -u marionette -t none

This will only run the Mn tests (which is what we care about in this patch) and will help keep test machines available for other pushes.
Comment on attachment 8797355 [details] [diff] [review]
enable_config

Review of attachment 8797355 [details] [diff] [review]:
-----------------------------------------------------------------

Here are a couple more try runs to test the failure cases:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ahalberstadt@mozilla.com&fromchange=848a6d1091b854fc765d9ca84300a7c4b560bacb&tochange=e74b0ca092f4a27712b35f2588309059066da8ff

First one purposefully adds a test failure, second one purposefully causes a crash (though not on windows). Both have the expected results! I'll take the liberty of landing this.
Attachment #8797355 - Flags: review?(ahalberstadt) → review+
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0789005be29
Make mozharness use StructuredOutputParser for marionette tests, r=ahal
https://hg.mozilla.org/mozilla-central/rev/a0789005be29
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: