Closed Bug 988603 Opened 10 years ago Closed 10 years ago

mozharness reports success when build fails in some cases

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mshal, Assigned: mozilla)

Details

Attachments

(1 file, 2 obsolete files)

With the JS minifier enabled (bug 903149), sometimes we'll display a warning message that contains a string like "SyntaxError: blah". The SyntaxError message comes from a subprocess, and the minifier is still returning success (ie: this isn't a failure case, it just is a warning that has the magic word "SyntaxError" in it). Mozharness picks this up and ends up halting the build:

00:19:53  WARNING -  Warning: JS minification verification failed for ../../dist/bin/modules/KeyValueParser.jsm:
00:19:53    ERROR -  10:26 SyntaxError: missing ) after argument list:
00:19:53     INFO -  10:26   is.init(fstream, "UTF-8"
00:19:53     INFO -  10:26 ..........................^
...
00:20:44     INFO - Return code: 0
00:20:44    FATAL - Halting on failure while running make package AB_CD=multi
00:20:44    FATAL - Running post_fatal callback...
00:20:44    FATAL - Exiting 0

However, since the return code is still 0, the build shows up as green in tbpl. I think in this case we probably want it to be red.
Attached patch fatal_exit_code (obsolete) — Splinter Review
I was going to put in an |if self.run_command(...):| in b2g_desktop_multilocale.py, but then I looked at run_command()'s logic, and this seems wrong:
http://hg.mozilla.org/build/mozharness/file/8521152eb1f7/mozharness/base/script.py#l704

(namely, if returncode is 0, we will exit 0 on fatal(), which seems wrong).

This is one way to fix it.  I know you were thinking about failure_config; I'm open to that, but this patch should fix this specific issue.

I think I've caught the various 'exception' bits (fatal_exit_code=3, in tests where we want to turn an 'unzip' failure purple rather than red).  Otherwise we'll default to 2 (failure).  We probably want a clear run on Cypress after this lands.
Assignee: nobody → aki
Attachment #8397515 - Flags: review?(jlund)
Comment on attachment 8397515 [details] [diff] [review]
fatal_exit_code

Review of attachment 8397515 [details] [diff] [review]:
-----------------------------------------------------------------

So overall looks like an improvement. I think the best thing about this is that we set the return code to a sane number and we add 'exception' purple levels to places that would otherwise be red. I have some questions/comments:

1) This bug is due to parser.num_errors being > 0 while returncode is in success_codes. Maybe it be useful to explicitly say why we fail? Something like my code snippet below.
2) I'm not 100% sure about using parser.num_errors inside the halt_on_failure condition and then ignoring parser. What if we parse_single_line() that results in a level worse than fatal_exit_code? Won't levels from parser get ignored? Maybe, if there are parser.num_errors we should use parser.worst_level somehow? I don't know and it's probably scope creep for this bug.
3) This still doesn't touch self.return_code. Which is only an issue if (a) you implement things like a PostScript decorator/listener or (b) we error/fail in run_command but we don't set halt_on_failure. I think both of those go outside the scope of solving this bug and, like you mentioned, there is still room for something like failure_config.
4) I trust you over me to know which run_commands should have fatal_exit_code=3 so your choices there lgtm. Either way, I think this will be an improvement.

r+ once (1) and (5) are addressed by either commenting on or actually implementing it if you agree with them.

::: mozharness/base/log.py
@@ +227,2 @@
>          append_to_log=False,
>      ):

I guess halt_on_failure was legacy code here?

::: mozharness/base/script.py
@@ +703,5 @@
>          self.log("Return code: %d" % returncode, level=return_level)
>          if halt_on_failure:
>              if parser.num_errors or returncode not in success_codes:
> +                if fatal_exit_code is None:
> +                    fatal_exit_code = returncode

5) I am not sure about this. We set a default for fatal_exit_code=2 and the only time we explicitly set fatal_exit_code outside of that is when we set it to 3. But then here we 'if fatal_exit_code is None:'?

Are we trying to allow for someone to explicitly pass fatal_exit_code=None? Which effectively is a way of getting run_command to use returncode? If that is the case, maybe we should assert returncode is not also 0 or we will be right back to this weird edge case in this bug IIUC.

But I would argue it is messy to allow fatal_exit_code=None. Maybe doing something like what I mentioned in comment (1) at the top of this review and then also:

if halt_on_failure:
    if parser.num_errors or returncode not in success_codes:
        if returncode not in success_codes:
            self.error("%s not in success codes: %s" % (returncode,
                                                        success_codes))
        if parser.num_errors:
            self.error("failures found while parsing output")
        if not fatal_exit_code:
            self.warning("'fatal_exit_code' was not > 0. Defaulting to 2")
            fatal_exit_code = 2
Attached patch fatal_exit_code2 (obsolete) — Splinter Review
(In reply to Jordan Lund (:jlund) from comment #2)
> So overall looks like an improvement. I think the best thing about this is
> that we set the return code to a sane number and we add 'exception' purple
> levels to places that would otherwise be red.

Yeah.  I don't think this is ideal still, but I wanted to improve it, at least.

> 1) This bug is due to parser.num_errors being > 0 while returncode is in
> success_codes. Maybe it be useful to explicitly say why we fail? Something
> like my code snippet below.

Sure, added.

> 2) I'm not 100% sure about using parser.num_errors inside the
> halt_on_failure condition and then ignoring parser. What if we
> parse_single_line() that results in a level worse than fatal_exit_code?
> Won't levels from parser get ignored? Maybe, if there are parser.num_errors
> we should use parser.worst_level somehow? I don't know and it's probably
> scope creep for this bug.
> 3) This still doesn't touch self.return_code. Which is only an issue if (a)
> you implement things like a PostScript decorator/listener or (b) we
> error/fail in run_command but we don't set halt_on_failure. I think both of
> those go outside the scope of solving this bug and, like you mentioned,
> there is still room for something like failure_config.

Touched return_code, and removed the ability to set fatal_exit_code to None.


> ::: mozharness/base/log.py
> @@ +227,2 @@
> >          append_to_log=False,
> >      ):
> 
> I guess halt_on_failure was legacy code here?

Yup.  It was from when fatal() was optionally-fatal.

> 5) I am not sure about this. We set a default for fatal_exit_code=2 and the
> only time we explicitly set fatal_exit_code outside of that is when we set
> it to 3. But then here we 'if fatal_exit_code is None:'?
> 
> Are we trying to allow for someone to explicitly pass fatal_exit_code=None?

Yup.  But I can't think of a real life use case, so I might as well strip it out.

> if halt_on_failure:
>     if parser.num_errors or returncode not in success_codes:
>         if returncode not in success_codes:
>             self.error("%s not in success codes: %s" % (returncode,
>                                                         success_codes))
>         if parser.num_errors:
>             self.error("failures found while parsing output")
>         if not fatal_exit_code:
>             self.warning("'fatal_exit_code' was not > 0. Defaulting to 2")
>             fatal_exit_code = 2

Done, but slightly differently.  This is the only difference between the two patches.
Attachment #8397515 - Attachment is obsolete: true
Attachment #8397515 - Flags: review?(jlund)
Attachment #8398079 - Flags: review?(jlund)
Attached patch fatal_exit_code3Splinter Review
Also a get_output_from_command() fix.
Attachment #8398079 - Attachment is obsolete: true
Attachment #8398079 - Flags: review?(jlund)
Attachment #8398081 - Flags: review?(jlund)
Comment on attachment 8398081 [details] [diff] [review]
fatal_exit_code3

Review of attachment 8398081 [details] [diff] [review]:
-----------------------------------------------------------------

r+ lgtm

I guess we could still assign 0 or None to fatal_exit_code but that seems like it would be the fault of the dev using run_command().
Attachment #8398081 - Flags: review?(jlund) → review+
Comment on attachment 8398081 [details] [diff] [review]
fatal_exit_code3

https://hg.mozilla.org/build/mozharness/rev/c1505e7e1fa0
Going to kick off cypress builds+tests.
Attachment #8398081 - Flags: checked-in+
in production.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → Mozharness
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: