Closed Bug 1220941 Opened 9 years ago Closed 9 years ago

Mozharness checkout-sources step shouldn't output CalledProcessError exceptions to the log for allowed failures

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1211527

People

(Reporter: erahm, Unassigned)

References

(Blocks 1 open bug)

Details

For build failures often the only error I see is attributed to 'hg log -r' [1]. This is in fact not the reason for the build failure (I don't think it is an issue at all AFAICT).

Example log output that seems to be causing the issue:
> 17:32:08     INFO -  checking to see if 80d5260f7be669cf00ed83b36b65e3be5abe6e90 exists in /builds/hg-shared/try
> 17:32:08     INFO -  command: START
> 17:32:08     INFO -  command: hg log -r 80d5260f7be669cf00ed83b36b65e3be5abe6e90 --template {node}
> 17:32:08     INFO -  command: cwd: /builds/hg-shared/try
> 17:32:08     INFO -  command: env: {'HGPLAIN': '1'}
> 17:32:09     INFO -  Hit exception running hg:
> 17:32:09     INFO -  Traceback (most recent call last):
> 17:32:09     INFO -    File "/builds/slave/try-m64-d-00000000000000000000/build/tools/buildfarm/utils/../../lib/python/util/hg.py", line 99, in get_hg_output
> 17:32:09     INFO -      return get_output(['hg'] + cmd, timeout=timeout, env=env, **kwargs)
> 17:32:09     INFO -    File "/builds/slave/try-m64-d-00000000000000000000/build/tools/buildfarm/utils/../../lib/python/util/commands.py", line 207, in get_output
> 17:32:09     INFO -      raise error
> 17:32:09     INFO -  CalledProcessError: Command '['hg', 'log', '-r', '80d5260f7be669cf00ed83b36b65e3be5abe6e90', '--template', '{node}']' returned non-zero exit status 255

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=13329548&repo=try
(In reply to Wes Kocher (:KWierso) from comment #1)
> This might involve adding something to
> https://github.com/mozilla/treeherder/blob/
> 2c4e935e5f1ecce7ff8b4620eebf74dddf757550/treeherder/log_parser/parsers.
> py#L404 to just not highlight these infra-related failures.

Yes, that's the case here.
In this case, I don't believe we should add this to the blacklist.

In an ideal world we wouldn't have to use regex like `r"|^[A-Za-z.]+Error: "` at all, but could instead rely solely on the mozharness log level of ERROR and above. However this isn't practical, since there are lots of potential real error lines that don't have any mozharness prefix at all.

We could instead say "if there is a mozharness prefix, always trust that 'INFO' isn't an error, otherwise fallback to the other regex". However I don't know if there are cases where mozharness INFO is given for a real error (if there are they should be fixed, but that may be a long task).

Either way I don't think we should output exceptions to the log for expected (or permitted) failure cases. eg cases where we're happy to retry we should output "Pulling from hg.m.o failed; retrying (1 of 5)" etc rather than an exception. Fixing this would be a releng/automation bug not a Treeherder one.

I'm happy to make the "trust mozharness INFO" change if people want; but the harder part is going to be checking that it doesn't cause false negatives. If someone can do that work (ie: keep an eye out for real failure lines that have INFO that shouldn't) that would help greatly.
This issue appears to be a result of the intentional change in bug 1188968.

(In reply to Ed Morley [:emorley] from comment #3)
> We could instead say "if there is a mozharness prefix, always trust that
> 'INFO' isn't an error, otherwise fallback to the other regex". However I
> don't know if there are cases where mozharness INFO is given for a real
> error (if there are they should be fixed, but that may be a long task).

In fact there is one "INFO but shouldn't have been" example in that very log:
https://treeherder.mozilla.org/logviewer.html#?job_id=13329548&repo=try#L18384

 18:00:34 INFO - make[5]: *** [libdmd.dylib] Error 1

So there are two related, but separate issues that need to be fixed:

1) Find and fix all cases where INFO was incorrectly used (releng/mozharness/test harness). Once that's done, make Treeherder trust "INFO"

2) Catch this uncaught exception and output a nicer error message, since it's allowed to fail.

#2 is the quicker fix for this issue (and even with the fix for #1, something that should be done regardless).

Making this bug about #2.

Chris, since Ben is away until December 8th, would you mind taking a look for the sheriffs? Thanks :-)
Blocks: log-SnR
Component: Treeherder: Log Viewer → General Automation
Flags: needinfo?(catlee)
Product: Tree Management → Release Engineering
QA Contact: catlee
Summary: 'hg log -r' incorrectly flagged as reason for build failure → Mozharness checkout-sources step shouldn't output CalledProcessError exceptions to the log for allowed failures
bug 1211527 has the precise bits we're hitting here. We should not be printing errors for expected situations.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(catlee)
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.