928887 - mozharness DesktopUnittestOutputParser.evaluate_parser() should record a failure if the return code was non-zero

Reporter

Description

•

12 years ago

As part of testing bug 923513 it became apparent that we could have an exception in remoteautomation.py::checkForCrashes() but yet the panda test run not be reported as a failure. > > 3) (But not an issue with this patch in particular) The panda mochitest-1 > > run had an uncaught exception whilst trying to process the java-exception > > 16:57:15 INFO - INFO | zombiecheck | Reading PID log: > /tmp/tmp1PoLeMpidlog > 16:57:16 INFO - Traceback (most recent call last): > 16:57:16 INFO - File > "/builds/panda-0854/test/build/tests/mochitest/runtests.py", line 1005, in > runTests > 16:57:16 INFO - onLaunch=onLaunch > 16:57:16 INFO - File > "/builds/panda-0854/test/build/tests/mochitest/runtestsremote.py", line 542, > in runApp > 16:57:16 INFO - return self._automation.runApp(*args, **kwargs) > 16:57:16 INFO - File > "/builds/panda-0854/test/build/tests/mochitest/automation.py", line 899, in > runApp > 16:57:16 INFO - crashed = > self.checkForCrashes(os.path.join(profileDir, "minidumps"), symbolsPath) > 16:57:16 INFO - File > "/builds/panda-0854/test/build/tests/mochitest/remoteautomation.py", line > 118, in checkForCrashes > 16:57:16 INFO - javaException = > mozcrash.check_for_java_exception(logcat) > 16:57:16 INFO - AttributeError: 'module' object has no attribute > 'check_for_java_exception' > 16:57:16 INFO - /data/anr/traces.txt not found > 16:57:16 INFO - Automation Error: Received unexpected exception while > running application > 16:57:16 INFO - WARNING | leakcheck | refcount logging is off, so leaks > can't be detected! > 16:57:16 INFO - runtests.py | Running tests: end.

Ed Morley [:emorley]

Reporter

Comment 1

•

12 years ago

Full log for that example: https://tbpl.mozilla.org/php/getParsedLog.php?id=29108241&tree=Try Ok, so the exception is bubbling up until here: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1012 { 1010 except: 1011 traceback.print_exc() 1012 log.error("Automation Error: Received unexpected exception while running application\n") 1013 status = 1 ... 1026 return status } Which should mean we get '1' returned here: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtestsremote.py#727 { 725 try: 726 dm.recordLogcat() 727 retVal = mochitest.runTests(options) 728 except: 729 log.error("Automation Error: Exception caught while running tests") 730 traceback.print_exc() 731 mochitest.stopWebServer(options) 732 mochitest.stopWebSocketServer(options) 733 try: 734 mochitest.cleanup(None, options) 735 except devicemanager.DMError: 736 # device error cleaning up... oh well! 737 pass 738 retVal = 1 739 740 mochitest.printDeviceInfo(printLogcat=True) 741 742 sys.exit(retVal) } So I'm guessing we do sys.exit(1) correctly (I can't see anything else that goes wrong), and that we're just not handling this in mozharness?

Ed Morley [:emorley]

Reporter

Comment 2

•

12 years ago

Ha, https://hg.mozilla.org/build/mozharness/file/28399b92470f/scripts/android_panda.py does: 234 test_summary_parser = DesktopUnittestOutputParser(suite_category, 235 config=self.config, 236 error_list=error_list, 237 log_obj=self.log_obj) 238 239 return_code = self.run_command(cmd, cwd=dirs['abs_test_install_dir'], env=env, output_parser=test_summary_parser) 240 241 tbpl_status, log_level = test_summary_parser.evaluate_parser(return_code) DesktopUnittestOutputParser.evaluate_parser(): https://hg.mozilla.org/build/mozharness/file/28399b92470f/mozharness/mozilla/testing/unittest.py#l125 { 125 def evaluate_parser(self, return_code): 126 if self.num_errors: # mozharness ran into a script error 127 self.tbpl_status = self.worst_level(TBPL_FAILURE, self.tbpl_status, 128 levels=TBPL_WORST_LEVEL_TUPLE) 129 130 # I have to put this outside of parse_single_line because this checks not 131 # only if fail_count was more then 0 but also if fail_count is still -1 132 # (no fail summary line was found) 133 if self.fail_count != 0: 134 self.worst_log_level = self.worst_level(WARNING, self.worst_log_level) 135 self.tbpl_status = self.worst_level(TBPL_WARNING, self.tbpl_status, 136 levels=TBPL_WORST_LEVEL_TUPLE) 137 # we can trust in parser.worst_log_level in either case 138 return (self.tbpl_status, self.worst_log_level) } ...so basically we just don't use return_code at all :-/

Ed Morley [:emorley]

Reporter

Updated

•

12 years ago

Summary: AttributeError exception in remoteautomation.py::checkForCrashes() did not turn the run orange/red on panda jobs → mozharness DesktopUnittestOutputParser.evaluate_parser() doesn't use the return code it is passed to determine success/fail

Ed Morley [:emorley]

Reporter

Updated

•

12 years ago

Assignee: nobody → emorley

Status: NEW → ASSIGNED

Component: General → General Automation

Product: Testing → Release Engineering

QA Contact: catlee

Whiteboard: [mozharness]

Version: Trunk → other

Ed Morley [:emorley]

Reporter

Comment 3

•

12 years ago

Attached patch WIP — Details — Splinter Review

I believe things like crashes result in a non-zero return code, so we have to make this a WARNING rather than a FAILURE, since otherwise crashes will appear red on TBPL along with automation exceptions etc.

Attachment #819696 - Flags: review?(aki)

Ed Morley [:emorley]

Reporter

Comment 4

•

12 years ago

Also note that whilst this is 'DesktopUnittestOutputParser', we actually use this for Android and B2G as well.q

Aki Sasaki (not active)

Comment 5

•

12 years ago

Comment on attachment 819696 [details] [diff] [review] WIP This may have a larger-than-expected change in behavior. Are you signing sheriffs up to detect that in production, or are you planning on testing this on cedar or ash first?

Attachment #819696 - Flags: review?(aki) → review+

Ed Morley [:emorley]

Reporter

Comment 6

•

12 years ago

(In reply to Aki Sasaki [:aki] from comment #5) > This may have a larger-than-expected change in behavior. > Are you signing sheriffs up to detect that in production, or are you > planning on testing this on cedar or ash first? I'll test on cedar/ash first. Remind me how I do that again? (I looked on https://wiki.mozilla.org/ReleaseEngineering/Mozharness but didn't see any steps). Does cedar pull mozharness default and ash use a user repo? (My memory is fuzzy).

Ed Morley [:emorley]

Reporter

Updated

•

12 years ago

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 7

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #6) > (In reply to Aki Sasaki [:aki] from comment #5) > > This may have a larger-than-expected change in behavior. > > Are you signing sheriffs up to detect that in production, or are you > > planning on testing this on cedar or ash first? > > I'll test on cedar/ash first. Remind me how I do that again? (I looked on > https://wiki.mozilla.org/ReleaseEngineering/Mozharness but didn't see any > steps). Does cedar pull mozharness default and ash use a user repo? (My > memory is fuzzy). Correct. The user repo for ash is http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness .

Flags: needinfo?(aki)

Ed Morley [:emorley]

Reporter

Comment 8

•

12 years ago

I've pushed to ash-mozharness and then to ash to trigger builds, results should be at: https://tbpl.mozilla.org/?tree=Ash&rev=17c88762ec86

Ed Morley [:emorley]

Reporter

Comment 9

•

12 years ago

Ok so Windows tests always return 1 on all trunk trees (I'd noticed this before but forgotten to file), which means this patch (correctly) turns them orange - so we'll need to fix that first before landing this. eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=31174977&tree=Ash but also on m-c: https://tbpl.mozilla.org/php/getParsedLog.php?id=31217020&tree=Mozilla-Central { 04:21:16 INFO - 202502 INFO SimpleTest FINISHED 04:21:17 INFO - INFO | runtests.py | Application ran for: 0:18:16.105000 04:21:17 INFO - INFO | zombiecheck | Reading PID log: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpgs9owppidlog 04:21:17 INFO - ==> process 3648 launched child process 2356 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bd9fbd0.2087353558 "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpftmjrw\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" plugin) 04:21:17 INFO - ==> process 3648 launched child process 2516 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bebe100.1936084576 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" tab) 04:21:17 INFO - ==> process 3648 launched child process 4080 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.bd9d9d0.394117182 "c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpftmjrw\plugins\nptest.dll" -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" plugin) 04:21:17 INFO - ==> process 3648 launched child process 2840 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=3648.975b6b0.1021023147 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 3648 "\\.\pipe\gecko-crash-server-pipe.3648" tab) 04:21:17 INFO - WARNING | leakcheck | refcount logging is off, so leaks can't be detected! 04:21:17 INFO - runtests.py | Running tests: end. 04:21:17 INFO - 0 04:21:17 ERROR - Return code: 1 } However I'm not sure what is going on with B2G - the exception seems unrelated to my patch (perhaps we need to reset ash-mozharness, since it's a bit of a mess of merges at the moment). eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=31174708&tree=Ash#error0

Ed Morley [:emorley]

Reporter

Updated

•

12 years ago

Depends on: 944364

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Component: General Automation → Mozharness

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Blocks: 1048775

Summary: mozharness DesktopUnittestOutputParser.evaluate_parser() doesn't use the return code it is passed to determine success/fail → mozharness DesktopUnittestOutputParser.evaluate_parser() should record a failure if the return code was non-zero

Ed Morley [:emorley]

Reporter

Comment 10

•

11 years ago

Comment on attachment 819696 [details] [diff] [review] WIP Clearing r+ since this needs the dependant bug first, and this will get it off my to-check-in bugzilla todos list.

Attachment #819696 - Attachment description: Patch v1 → WIP

Attachment #819696 - Flags: review+

Ed Morley [:emorley]

Reporter

Comment 11

•

11 years ago

This needs bug 944364 fixed before this can land (presuming it's not bitrotted anyway). Unassigning to clear this from bugzilla todos.

Assignee: emorley → nobody

Status: ASSIGNED → NEW

Chris AtLee [:catlee]

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → INCOMPLETE