Mozharness' structuredlog.py should get run status directly from mozlog's status handler

RESOLVED FIXED

Status

Release Engineering
Mozharness
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: chmanchester, Assigned: chmanchester)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {leave-open})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

Originally from bug 1024055. Seems like it should have its own bug.
Blocks: 1068138
Created attachment 8492239 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

This is the effect on mozharness of taking the run summary approach in bug 1068732.
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
Created attachment 8494578 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

When we move our harnesses to mozharness structured logging, this evaluate_parser function is going to become the uniform source for how a test turns tbpl/treeherder a color.

Here it is being generally green on ash:
https://treeherder.mozilla.org/ui/#/jobs?repo=ash&revision=80e3afe71657

And with crashes everywhere (wpt is the only thing using mozharness' structuredlog.py right now):
https://treeherder.mozilla.org/ui/#/jobs?repo=ash&revision=5224561bbf4f
Attachment #8494578 - Flags: review?(james)
Attachment #8494578 - Flags: review?(ahalberstadt)
Attachment #8492239 - Attachment is obsolete: true
Comment on attachment 8494578 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

Review of attachment 8494578 [details] [diff] [review]:
-----------------------------------------------------------------

I can't say I understand everything that is going on here, but it all seems reasonable.
Attachment #8494578 - Flags: review?(ahalberstadt) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #3)
> Comment on attachment 8494578 [details] [diff] [review]
> Modify mozharness' structurelog.py to use the mozlog status handler
> 
> Review of attachment 8494578 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I can't say I understand everything that is going on here, but it all seems
> reasonable.

The gist of it, for any test job:

Unexpected test status or crash action -> orange
Log at the error level or worse -> red
No test_end log (no tests run) or no suite_end log (the harness failed to complete -> red
Failure output from a tool whose output we don't control (from errors_list, regex based) -> red
Non zero return from harness process -> red (if it isn't already orange)
Everything else -> green
Blocks: 1068153
Comment on attachment 8494578 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

Review of attachment 8494578 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately I think that at least fixing the crash handling is a blocker here.

::: mozharness/mozilla/structuredlog.py
@@ +90,1 @@
>              self.update_levels(TBPL_WARNING, WARNING)

This is a problem for wpt, where a test might have an expected crash.

I'm not sure what a great solution is. In wpt you only get a crash message on combination with a test_end status=crash message, so it's fine to set the overall status just based on the test results. I guess this isn't true for all harnesses? We can of course parameterise this behaviour, but it seems worth thinking about whether we can fix the model to be consistent in all cases first.

@@ +98,5 @@
> +            self.log("No checks run.", ERROR)
> +            self.update_levels(TBPL_FAILURE, ERROR)
> +        if ('ERROR' in summary.log_level_counts or
> +            'CRITICAL' in summary.log_level_counts):
> +            self.update_levels(TBPL_FAILURE, ERROR)

All the code in this function seems rather repetitive. It sort of obscures the fact that there's only really two stages: 1) check for things that set the status to warning, 2) check for things that set the status to error.

I guess there would be a few ways to change this. Long if statements seem bad. Something more declarative would be e.g.

statuses = [((TBPL_WARNING, WARNING), [(summary.unexpected, None),
                                       ('crash' in summary.action_counts, None)]),
            ((TBPL_FAILURE, ERROR), ['suite_end' not in summary.action_counts, "No suite end message emitted"), 
                                      [...]])]

for levels, conditions in statuses:
    for condition, msg in conditions:
        if condition:
            self.update_levels(*levels)
            if msg is not None:
                self.log(msg, levels[1])

I'm not exactly sure if that's better but it's somewhat less procedural.

@@ +125,5 @@
> +            fail_text = '<em class="testfail">%s</em>' % unexpected
> +        else:
> +            fail_text = '0'
> +
> +        allchecks = summary.action_counts['test_end'] + summary.action_counts['test_status']

all_checks

@@ +127,5 @@
> +            fail_text = '0'
> +
> +        allchecks = summary.action_counts['test_end'] + summary.action_counts['test_status']
> +        expected_ran = allchecks - unexpected - skipped
> +        text_summary = "%s/%s/%s" % (expected_ran, fail_text, skipped)

It's not really clear that todo is the same as skipped. I thought that in mozilla terms todo meant "FAIL but expected".

Also this whole function is pretty sad. What's it needed for compat with? If it's actually needed can I opt out somehow?

(arguably this function should also live in another class or something because parsing the results isn't the same responsibility as printing a summary to the logs).
Attachment #8494578 - Flags: review?(james) → review-
Created attachment 8500657 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

This accounts for both cases of an expected crash as a test status and a crash that isn't associated with a particular test. At the moment I think wpt will be the only harness logging expected crashes through mozcrash but both seem useful.

todo should be known failures, that's my mistake and updated, although this brings up another issue: with mochitest's buffering our results aren't going to be what we expect as long as we call the handler in mozharness. We can unbuffer output to stdout and re-buffer in mozharness, and it would be a repeat of what the harness does locally, but maybe that's ok. Any consumer summarizing the raw log needs to see all the messages, and given the buffering caters to a mozharness limitation in the first place, maybe it's the right place to do it.

Alternatively, and this could be a decent idea, we could use the status handler in the harness, and pass in a command line argument to make it dump the summary as json at the very end of the run (to a small file we could then upload to blobber as well, perhaps written at suite_end) for mozharness to parse. I like this because it minimizes mozharness' role in interpreting test results based on stdout, a role it maybe shouldn't have in the first place.

print_summary and append_tinderboxprint_line are parts of the DesktopUnittestOutput parser used by scripts. It's a questionable original design, but I see no reason not to provide these summaries for the ui, and as long as we are we don't have to update all the users to detect whether structured logging is in use when deciding how to call these methods if we do it here. It's up to a script to call them or not.
Attachment #8500657 - Flags: feedback?(james)
Attachment #8494578 - Attachment is obsolete: true
Created attachment 8500658 [details] [diff] [review]
Log a summary of expected and unexpected statuses in the mozlog StatusHandler summary

This is the corresponding mozlog change. It's seeming rather arbitrary what ends up here and what doesn't -- I'm open to ideas.
Attachment #8500658 - Flags: feedback?(james)
ahal, if you have any input on what I mentioned in comment 6 it would be very welcome. Thanks!
Flags: needinfo?(ahalberstadt)
ahal and I talked about this on irc.
Flags: needinfo?(ahalberstadt)
(In reply to Chris Manchester [:chmanchester] from comment #9)
> ahal and I talked about this on irc.

From the end of that conversation:

21:47	ahal	we already save the full raw log in a file for blobber purposes
21:48	ahal	let's just parse that for the summary
21:48	ahal	(or attach a handler to it or whatever)
21:49	chmanchester	and we can just have test jobs use some other OutputParser for whatever else mozharness cares about
21:49	chmanchester	(i.e., still eliminate DesktopUnittestOutputParser from the get go)
21:49	ahal	yeah
Attachment #8500658 - Flags: feedback?(james) → feedback+
Comment on attachment 8500657 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

Review of attachment 8500657 [details] [diff] [review]:
-----------------------------------------------------------------

I think this works better.

::: mozharness/mozilla/structuredlog.py
@@ +91,5 @@
> +
> +        # These are warning/orange statuses.
> +        failure_conditions = [
> +            sum(summary.unexpected_statuses.values()) > 0,
> +            summary.action_counts.get('crash', 0) > summary.expected_statuses.get('CRASH', 0),

The mismatch between "crash" and "CRASH" here is a little upsetting, but perhaps not worth fixing here.

@@ +116,5 @@
> +
> +        # If a superclass was used to detect errors with a regex based output parser,
> +        # this will be reflected in the status here.
> +        if self.num_errors:
> +            self.update_levels(TBPL_FAILURE, ERROR)

Presumably this should use error_pair

@@ +121,5 @@
> +
> +        # Harnesses typically return non-zero on test failure, so don't promote
> +        # to error if we already have a failing status.
> +        if return_code != 0 and self.tbpl_status == TBPL_SUCCESS:
> +            self.update_levels(TBPL_FAILURE, ERROR)

error_pair

@@ +133,5 @@
> +
> +    def print_summary(self, suite_name):
> +        # Summary text provided for compatibility. Counts are currently
> +        # in the format <pass count>/<fail count>/<todo count>,
> +        # <expected count>/<unexpected count>/<expected fail count> will yield the

These aren't strictly equivalent, are they? I don't really know what this is used for so it might not matter, but perhaps pass -> expected && status in (pass, ok); fail -> unexpected && status not in (pass, ok); todo -> status == skip || expected && status not in (pass, ok). That does mean that unexpected && status in (pass, ok) doesn't get counted anywhere though.

@@ +135,5 @@
> +        # Summary text provided for compatibility. Counts are currently
> +        # in the format <pass count>/<fail count>/<todo count>,
> +        # <expected count>/<unexpected count>/<expected fail count> will yield the
> +        # expected info from a structured log.
> +        summary = self.handler.summarize()

Should this be cached?
Attachment #8500657 - Flags: feedback?(james) → feedback+
However this ends up, I don't think we can aim to provide a summary from the logger until we figure out how to stop buffering the raw log.
Depends on: 1055765
No longer depends on: 1063535
Feedback on the proposal in comment 6 about getting test results from a file and leaving stdout alone would be appreciated. Would you be amenable to making this change for wpt for the sake of uniformity in mozharness test result parsing?
Flags: needinfo?(james)
In general I prefer the idea of pushing the formatting and interpretation of test output to the edges of the system, rather than down into specific harnesses. The structured logging is intended to provide an API on which we can build reusable components to do things like compute the result of a testrun. If we give individual harnesses the freedom to do whatever here they'll start to have divergent behaviour and you'll need to examine the code of each harness to explain why a certain run ended up with the status it had. Therefore I'd prefer that there are compelling reasons to leave things up to the harnesses to determine. In this case the problem with buffering in mochitest might be a reason to prefer letting the harness provide the formatted logs until such a time as e.g. treeherder can provide a better UI by interpreting the raw logs directly.
Flags: needinfo?(james)
(In reply to James Graham [:jgraham] from comment #14)
> In general I prefer the idea of pushing the formatting and interpretation of
> test output to the edges of the system, rather than down into specific
> harnesses. The structured logging is intended to provide an API on which we
> can build reusable components to do things like compute the result of a
> testrun. If we give individual harnesses the freedom to do whatever here
> they'll start to have divergent behaviour and you'll need to examine the
> code of each harness to explain why a certain run ended up with the status
> it had. Therefore I'd prefer that there are compelling reasons to leave
> things up to the harnesses to determine. In this case the problem with
> buffering in mochitest might be a reason to prefer letting the harness
> provide the formatted logs until such a time as e.g. treeherder can provide
> a better UI by interpreting the raw logs directly.

If we design things here is such a way that the results of tests are never interpreted based on the contents of stdout we're still serving these goals. This will mean unstructured output will never cause an orange job (a change that will need to be publicized), so there wouldn't be confusion about the level of support for divergent behavior in individual harnesses. If we use stdout for the raw logs it seems in this case we're doomed to re-implement existing limitations (the verbosity and size of the log produced my mozharness) when we already have access to something better.

This is all a little hand wavy until we have a plan for consuming the logs in treeherder and replacing whatever it uses the formatted log for for something driven by the structured log.
The log truncation appears to be in buildbot, not mozharness, here: http://hg.mozilla.org/build/buildbot/annotate/b8b419061e7c/master/buildbot/status/builder.py#l470
Created attachment 8505742 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

I think we should land this or something very close and revisit the issue with buffering/summarizing once we have mochitest unbuffered and xpcshell landed (there will be a similar consideration there).

I sort of believe the right way to do this is to upload a summary file and have mozharness do minimal processing of that to determine tbpl status, but this is already sufficient for wpt and marionette based tests, and for gaia build/linter tests this will be our only real option (these tests have no python harness, so mozharness is the first place we could use a mozlog formatter/handler).
Attachment #8505742 - Flags: review?(james)
Attachment #8500657 - Attachment is obsolete: true
Created attachment 8505745 [details] [diff] [review]
Log a summary of expected and unexpected statuses in the mozlog StatusHandler summary.

This is the mozlog part again. I'll run both of these through ash again before landing.
Attachment #8505745 - Flags: review?(james)
Attachment #8500658 - Attachment is obsolete: true
Comment on attachment 8505742 [details] [diff] [review]
Modify mozharness' structurelog.py to use the mozlog status handler

Review of attachment 8505742 [details] [diff] [review]:
-----------------------------------------------------------------

So I'm not really sure this is perfect, but I guess it makes sense to land it and address any issues in followup work.

::: mozharness/mozilla/structuredlog.py
@@ +91,5 @@
> +
> +        # These are warning/orange statuses.
> +        failure_conditions = [
> +            sum(summary.unexpected_statuses.values()) > 0,
> +            summary.action_counts.get('crash', 0) > summary.expected_statuses.get('CRASH', 0),

I guess this is OK, although it seems a little dubious to assume that these two numbers are always perfectly in sync when it isn't enforced by the API.
Attachment #8505742 - Flags: review?(james) → review+
Attachment #8505745 - Flags: review?(james) → review+
(In reply to James Graham [:jgraham] from comment #19)
> Comment on attachment 8505742 [details] [diff] [review]
> Modify mozharness' structurelog.py to use the mozlog status handler
> 
> Review of attachment 8505742 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So I'm not really sure this is perfect, but I guess it makes sense to land
> it and address any issues in followup work.
> 
> ::: mozharness/mozilla/structuredlog.py
> @@ +91,5 @@
> > +
> > +        # These are warning/orange statuses.
> > +        failure_conditions = [
> > +            sum(summary.unexpected_statuses.values()) > 0,
> > +            summary.action_counts.get('crash', 0) > summary.expected_statuses.get('CRASH', 0),
> 
> I guess this is OK, although it seems a little dubious to assume that these
> two numbers are always perfectly in sync when it isn't enforced by the API.

Right, logging expected crashes without crash actions would hide crash actions logged independent of test statuses. This is sort of the thing where mozlog doesn't model failures outside of test files well.
On Ash:
https://treeherder.mozilla.org/ui/#/jobs?repo=ash&revision=57d7e38549d4
The mozlog part:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d93d799ad09
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/7d93d799ad09
We need the mozlog patch to land anywhere mozharness runs wpt before the mozharness one does. This looks like the project/development trees and mozilla-aurora. Any others?
Flags: needinfo?(james)
Well the authoritative answer to that is in http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py, but I think the short answer is "no", it runs on m-c, m-i and project branches including cedar where it also has some extra configurations. It also runs on aurora (we don't branch mozharness for gecko branches? That seems rather fragile).
Flags: needinfo?(james)
Please uplift the mozlog patch uploaded in comment 18 (https://bug1066785.bugzilla.mozilla.org/attachment.cgi?id=8505745) to aurora. This impacts only w-p-t. Per the configs linked in comment 25 this runs on firefox 35 and above.
Whiteboard: [checkin-needed-aurora]
https://hg.mozilla.org/releases/mozilla-aurora/rev/e713f36c81ea
Whiteboard: [checkin-needed-aurora]
https://hg.mozilla.org/build/mozharness/rev/8ff82e31655b
In production: https://hg.mozilla.org/build/mozharness/rev/8ff82e31655b
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.