mozharness DesktopUnittestOutputParser.evaluate_parser() should record a failure if the return code was non-zero

NEW
Unassigned

Status

Release Engineering
Mozharness
4 years ago
3 years ago

People

(Reporter: emorley, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozharness])

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
As part of testing bug 923513 it became apparent that we could have an exception in remoteautomation.py::checkForCrashes() but yet the panda test run not be reported as a failure.

> > 3) (But not an issue with this patch in particular) The panda mochitest-1
> > run had an uncaught exception whilst trying to process the java-exception
> 
> 16:57:15     INFO -  INFO | zombiecheck | Reading PID log:
> /tmp/tmp1PoLeMpidlog
> 16:57:16     INFO -  Traceback (most recent call last):
> 16:57:16     INFO -    File
> "/builds/panda-0854/test/build/tests/mochitest/runtests.py", line 1005, in
> runTests
> 16:57:16     INFO -      onLaunch=onLaunch
> 16:57:16     INFO -    File
> "/builds/panda-0854/test/build/tests/mochitest/runtestsremote.py", line 542,
> in runApp
> 16:57:16     INFO -      return self._automation.runApp(*args, **kwargs)
> 16:57:16     INFO -    File
> "/builds/panda-0854/test/build/tests/mochitest/automation.py", line 899, in
> runApp
> 16:57:16     INFO -      crashed =
> self.checkForCrashes(os.path.join(profileDir, "minidumps"), symbolsPath)
> 16:57:16     INFO -    File
> "/builds/panda-0854/test/build/tests/mochitest/remoteautomation.py", line
> 118, in checkForCrashes
> 16:57:16     INFO -      javaException =
> mozcrash.check_for_java_exception(logcat)
> 16:57:16     INFO -  AttributeError: 'module' object has no attribute
> 'check_for_java_exception'
> 16:57:16     INFO -  /data/anr/traces.txt not found
> 16:57:16     INFO -  Automation Error: Received unexpected exception while
> running application
> 16:57:16     INFO -  WARNING | leakcheck | refcount logging is off, so leaks
> can't be detected!
> 16:57:16     INFO -  runtests.py | Running tests: end.
(Reporter)

Comment 1

4 years ago
Full log for that example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=29108241&tree=Try

Ok, so the exception is bubbling up until here:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1012
{
1010     except:
1011       traceback.print_exc()
1012       log.error("Automation Error: Received unexpected exception while running application\n")
1013       status = 1
...
1026     return status
}

Which should mean we get '1' returned here:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtestsremote.py#727

{
725         try:
726             dm.recordLogcat()
727             retVal = mochitest.runTests(options)
728         except:
729             log.error("Automation Error: Exception caught while running tests")
730             traceback.print_exc()
731             mochitest.stopWebServer(options)
732             mochitest.stopWebSocketServer(options)
733             try:
734                 mochitest.cleanup(None, options)
735             except devicemanager.DMError:
736                 # device error cleaning up... oh well!
737                 pass
738             retVal = 1
739 
740     mochitest.printDeviceInfo(printLogcat=True)
741 
742     sys.exit(retVal)
}

So I'm guessing we do sys.exit(1) correctly (I can't see anything else that goes wrong), and that we're just not handling this in mozharness?
(Reporter)

Comment 2

4 years ago
Ha, https://hg.mozilla.org/build/mozharness/file/28399b92470f/scripts/android_panda.py does:
   234                 test_summary_parser = DesktopUnittestOutputParser(suite_category,
   235                                                                   config=self.config,
   236                                                                   error_list=error_list,
   237                                                                   log_obj=self.log_obj)
   238 
   239                 return_code = self.run_command(cmd, cwd=dirs['abs_test_install_dir'], env=env, output_parser=test_summary_parser)
   240 
   241                 tbpl_status, log_level = test_summary_parser.evaluate_parser(return_code)

DesktopUnittestOutputParser.evaluate_parser():
https://hg.mozilla.org/build/mozharness/file/28399b92470f/mozharness/mozilla/testing/unittest.py#l125
{
   125     def evaluate_parser(self, return_code):
   126         if self.num_errors:  # mozharness ran into a script error
   127             self.tbpl_status = self.worst_level(TBPL_FAILURE, self.tbpl_status,
   128                                                 levels=TBPL_WORST_LEVEL_TUPLE)
   129 
   130         # I have to put this outside of parse_single_line because this checks not
   131         # only if fail_count was more then 0 but also if fail_count is still -1
   132         # (no fail summary line was found)
   133         if self.fail_count != 0:
   134             self.worst_log_level = self.worst_level(WARNING, self.worst_log_level)
   135             self.tbpl_status = self.worst_level(TBPL_WARNING, self.tbpl_status,
   136                                                 levels=TBPL_WORST_LEVEL_TUPLE)
   137         # we can trust in parser.worst_log_level in either case
   138         return (self.tbpl_status, self.worst_log_level)
}

...so basically we just don't use return_code at all :-/
(Reporter)

Updated

4 years ago
Summary: AttributeError exception in remoteautomation.py::checkForCrashes() did not turn the run orange/red on panda jobs → mozharness DesktopUnittestOutputParser.evaluate_parser() doesn't use the return code it is passed to determine success/fail
(Reporter)

Updated

4 years ago
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Component: General → General Automation
Product: Testing → Release Engineering
QA Contact: catlee
Whiteboard: [mozharness]
Version: Trunk → other
(Reporter)

Comment 3

4 years ago
Created attachment 819696 [details] [diff] [review]
WIP

I believe things like crashes result in a non-zero return code, so we have to make this a WARNING rather than a FAILURE, since otherwise crashes will appear red on TBPL along with automation exceptions etc.
Attachment #819696 - Flags: review?(aki)
(Reporter)

Comment 4

4 years ago
Also note that whilst this is 'DesktopUnittestOutputParser', we actually use this for Android and B2G as well.q

Comment 5

4 years ago
Comment on attachment 819696 [details] [diff] [review]
WIP

This may have a larger-than-expected change in behavior.
Are you signing sheriffs up to detect that in production, or are you planning on testing this on cedar or ash first?
Attachment #819696 - Flags: review?(aki) → review+
(Reporter)

Comment 6

4 years ago
(In reply to Aki Sasaki [:aki] from comment #5)
> This may have a larger-than-expected change in behavior.
> Are you signing sheriffs up to detect that in production, or are you
> planning on testing this on cedar or ash first?

I'll test on cedar/ash first. Remind me how I do that again? (I looked on https://wiki.mozilla.org/ReleaseEngineering/Mozharness but didn't see any steps). Does cedar pull mozharness default and ash use a user repo? (My memory is fuzzy).
(Reporter)

Updated

4 years ago
Flags: needinfo?(aki)

Comment 7

4 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> (In reply to Aki Sasaki [:aki] from comment #5)
> > This may have a larger-than-expected change in behavior.
> > Are you signing sheriffs up to detect that in production, or are you
> > planning on testing this on cedar or ash first?
> 
> I'll test on cedar/ash first. Remind me how I do that again? (I looked on
> https://wiki.mozilla.org/ReleaseEngineering/Mozharness but didn't see any
> steps). Does cedar pull mozharness default and ash use a user repo? (My
> memory is fuzzy).

Correct.  The user repo for ash is http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness .
Flags: needinfo?(aki)
(Reporter)

Comment 8

4 years ago
I've pushed to ash-mozharness and then to ash to trigger builds, results should be at:
https://tbpl.mozilla.org/?tree=Ash&rev=17c88762ec86
(Reporter)

Comment 9

4 years ago
Ok so Windows tests always return 1 on all trunk trees (I'd noticed this before but forgotten to file), which means this patch (correctly) turns them orange - so we'll need to fix that first before landing this.

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31174977&tree=Ash
but also on m-c:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31217020&tree=Mozilla-Central

{
04:21:16     INFO -  202502 INFO SimpleTest FINISHED
04:21:17     INFO -  INFO | runtests.py | Application ran for: 0:18:16.105000
04:21:17     INFO -  INFO | zombiecheck | Reading PID log: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpgs9owppidlog
04:21:17     INFO -  ==> process 3648 launched child process 2356 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bd9fbd0.2087353558 "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpftmjrw\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" plugin)
04:21:17     INFO -  ==> process 3648 launched child process 2516 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bebe100.1936084576 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" tab)
04:21:17     INFO -  ==> process 3648 launched child process 4080 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bd9d9d0.394117182 "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpftmjrw\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" plugin)
04:21:17     INFO -  ==> process 3648 launched child process 2840 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.975b6b0.1021023147 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" tab)
04:21:17     INFO -  WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
04:21:17     INFO -  runtests.py | Running tests: end.
04:21:17     INFO -  0
04:21:17    ERROR - Return code: 1
}

However I'm not sure what is going on with B2G - the exception seems unrelated to my patch (perhaps we need to reset ash-mozharness, since it's a bit of a mess of merges at the moment). eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31174708&tree=Ash#error0
(Reporter)

Updated

4 years ago
Depends on: 944364

Updated

3 years ago
Component: General Automation → Mozharness
(Reporter)

Updated

3 years ago
Blocks: 1048775
Summary: mozharness DesktopUnittestOutputParser.evaluate_parser() doesn't use the return code it is passed to determine success/fail → mozharness DesktopUnittestOutputParser.evaluate_parser() should record a failure if the return code was non-zero
(Reporter)

Comment 10

3 years ago
Comment on attachment 819696 [details] [diff] [review]
WIP

Clearing r+ since this needs the dependant bug first, and this will get it off my to-check-in bugzilla todos list.
Attachment #819696 - Attachment description: Patch v1 → WIP
Attachment #819696 - Flags: review+
(Reporter)

Comment 11

3 years ago
This needs bug 944364 fixed before this can land (presuming it's not bitrotted anyway).
Unassigning to clear this from bugzilla todos.
Assignee: emorley → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.