steeplechase hangs if one of the clients crashes

RESOLVED FIXED

Status

Testing
General
4 years ago
4 years ago

People

(Reporter: sydpolk, Assigned: sydpolk)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
Steeplechase/Negatus hangs if one of the clients fails to launch:

python /Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py --binary /home/mozilla/firefoxes/nightly/win64/firefox/firefox.exe --binary2 /home/mozilla/firefoxes/nightly/macosx/FirefoxNightly.app/Contents/MacOS/firefox --save-logs-to /Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/logs --specialpowers-path /home/mozilla/firefoxes/nightly/linux64/tests/steeplechase/specialpowers --prefs-file /home/mozilla/firefoxes/nightly/linux64/tests/steeplechase/prefs_general.js --signalling-server http://172.16.141.52:8080/ --html-manifest /home/mozilla/firefoxes/nightly/linux64/tests/steeplechase/tests/steeplechase.ini --host1 172.16.141.58:20701 --host2 172.16.141.57:20701
steeplechase INFO | Pushing app to Client 1...
steeplechase INFO | Pushing app to Client 2...
steeplechase INFO | Waiting for results...
Writing profile for Client 1...
Pushing profile to Client 1...
cmd: ['C:/Users/Mozilla/AppData/Local/Temp/tests/steeplechase/app/firefox.exe', '-no-remote', '-profile', 'C:/Users/Mozilla/AppData/Local/Temp/tests/steeplechase/profile', 'http://172.16.141.57:50283/index.html']
Writing profile for Client 2...
Pushing profile to Client 2...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.57:50283/index.html']
Traceback (most recent call last):
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 311, in <module>
    sys.exit(0 if main(sys.argv[1:]) else 1)
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 301, in main
    html_pass_count, html_fail_count = test.run()
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 187, in run
    passes, failures = result
TypeError: 'NoneType' object is not iterable
Exception in thread Client 2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanager.py", line 396, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.57:50283/index.html'] (output: 'r', retval: '65280')

Exception in thread Client 1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanager.py", line 392, in shellCheckOutput
    retval = self.shell(cmd, buf, env=env, cwd=cwd, timeout=timeout, root=root)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanagerSUT.py", line 334, in shell
    self._sendCmds([{ 'cmd': '%s %s' % (cmd, cmdline) }], outputfile, timeout)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanagerSUT.py", line 133, in _sendCmds
    raise err
DMError: Automation Error: Timeout in command exec "MOZ_CRASHREPORTER_NO_REPORT=1,XPCOM_DEBUG_BREAK=warn,DISPLAY=:0" C:/Users/Mozilla/AppData/Local/Temp/tests/steeplechase/app/firefox.exe -no-remote -profile C:/Users/Mozilla/AppData/Local/Temp/tests/steeplechase/profile http://172.16.141.57:50283/index.html

The real problem is that Client 2's firefox is busted:

macosx-negatus-01:MacOS mozilla$ cd /tmp/tests/steeplechase/app/
macosx-negatus-01:app mozilla$ ./firefox
Couldn't load XPCOM.
macosx-negatus-01:app mozilla$ 

However, steeplechase should just exit at that point.
Bleh. I guess we just need to catch DMError there and return a failure.
(Assignee)

Comment 2

4 years ago
I think that this may be invalid. Jenkins invokes a Shell script action with bash -xe, so the first error will kill the process. If I just do:

python ./steeplechase/runsteeplechase.py .... --host2 172.16.141.58

The output gets captured but Jenkins hangs.

If I do the same thing like:

python ./steeplechase/runsteeplechase.py .... --host2 172.16.141.58 > steeplechase.out 2>&1

Jenkins will fail the job. Of course, the output then doesn't show up in the Jenkins log. I still have to figure out enough bash-fu to get both.
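One way to get both the log output and the failing exit status, without relying on shell redirection, is a small wrapper that copies the child's output to the console and to a file. This is only a sketch: `run_and_tee` is a hypothetical helper, not part of steeplechase, it is written for Python 3 rather than the 2.7 seen in the tracebacks, and nothing in this bug confirms that it avoids the Jenkins hang.

```python
import os
import subprocess
import sys
import tempfile


def run_and_tee(cmd, log_path):
    """Run cmd, copy its combined stdout/stderr to our stdout and to
    log_path, and return the child's exit status."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        proc.stdout.close()
        return proc.wait()


# Demo: a child that prints one line and exits with status 1, standing in
# for a failing runsteeplechase.py invocation.
log_path = os.path.join(tempfile.gettempdir(), "steeplechase.out")
status = run_and_tee(
    [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"],
    log_path,
)
print("exit status:", status)
```

A Jenkins step could then end with `sys.exit(status)` so the job fails whenever the wrapped command does.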
(Assignee)

Comment 3

4 years ago
The output file generated by the above looks like this, however:

Traceback (most recent call last):
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 311, in <module>
    sys.exit(0 if main(sys.argv[1:]) else 1)
  File "/Users/mozilla/jenkins/workspace/nightly-win81_64-mac10_9/steeplechase/steeplechase/runsteeplechase.py", line 238, in main
    dm1 = DeviceManagerSUT(host, port)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanagerSUT.py", line 49, in __init__
    verstring = self._runCmds([{ 'cmd': 'ver' }])
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanagerSUT.py", line 151, in _runCmds
    self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mozdevice-0.40-py2.7.egg/mozdevice/devicemanagerSUT.py", line 133, in _sendCmds
    raise err
mozdevice.devicemanager.DMError: Remote Device Error: Did not get prompt after connecting: timed out

This is different from the output you get if you don't redirect.
(Assignee)

Comment 4

4 years ago
Jenkins was getting confused. I figured out how to capture this now. runsteeplechase.py is returning an exit status of 1, so we are good.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INVALID
(Assignee)

Comment 5

4 years ago
Even with my changes, this is still hanging on Jenkins. This needs to be looked at more.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
What we need to do is stick an except block on this try block:
https://github.com/mozilla/steeplechase/blame/master/steeplechase/runsteeplechase.py#L99

like:
except mozdevice.DMError as e:
    output = "Error running build: " + e.msg
    result = 0, 1

That will give you some error output and a failure result to handle in the finally block.
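The shape of that suggestion can be sketched in a self-contained form. This is not the actual steeplechase code: `DMError` below is a stand-in for the mozdevice exception so the example runs without mozdevice, and `run_remote`, `results`, and `broken_client` are hypothetical names. It is written for Python 3, whereas the tracebacks in this bug come from Python 2.7.

```python
import queue
import threading


class DMError(Exception):
    """Stand-in for mozdevice's DMError (assumption, for illustration)."""


class RunThread(threading.Thread):
    """Hypothetical worker: runs one client and always reports a result."""

    def __init__(self, name, run_remote, results):
        super().__init__(name=name)
        self.run_remote = run_remote  # callable that launches the client
        self.results = results        # queue the main thread waits on

    def run(self):
        output = None
        result = None
        try:
            output = self.run_remote()
            result = (1, 0)           # placeholder (passes, failures)
        except DMError as e:
            output = "Error running build: " + str(e)
            result = (0, 1)           # count the launch failure as a failure
        finally:
            # Always hand something back so the waiter cannot block forever.
            self.results.put((self.name, output, result))


def broken_client():
    # Simulates a client whose binary fails to launch.
    raise DMError("Non-zero return code for command")


results = queue.Queue()
thread = RunThread("Client 2", broken_client, results)
thread.start()
thread.join()
name, output, result = results.get(timeout=5)
passes, failures = result
print(name, output, passes, failures)
```

Because the `finally` block always delivers a `(passes, failures)` tuple, the caller never unpacks `None`, which is what produced the `TypeError` in the description.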
(Assignee)

Comment 7

4 years ago
Well, that actually loses the return code back from the client. Also, DMError comes from DeviceManager. I can do that for now, but maybe we should tease out the error code from the string embedded in the DMError.
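Teasing the code out of the message could look roughly like the sketch below. The regular expression only matches the message format shown in the traceback in this bug, and other mozdevice versions may word it differently. Treating 65280 as a POSIX wait()-style status (exit code in the high byte) is an assumption, and `retval_from_message` and `exit_code_from_wait_status` are hypothetical helpers. The command in the sample message is abbreviated.

```python
import re

# Matches the "retval: '65280'" fragment seen in the DMError above.
RETVAL_RE = re.compile(r"retval: '(\d+)'")


def retval_from_message(message):
    """Return the integer retval embedded in a DMError message, or None."""
    match = RETVAL_RE.search(message)
    return int(match.group(1)) if match else None


def exit_code_from_wait_status(status):
    """Assumes a POSIX wait()-style status: exit code is in the high byte."""
    return (status >> 8) & 0xFF


message = ("Non-zero return code for command: "
           "['/tmp/tests/steeplechase/app/firefox'] "
           "(output: 'r', retval: '65280')")
retval = retval_from_message(message)
print(retval, exit_code_from_wait_status(retval))
```

Under that assumption, 65280 (0xFF00) decodes to exit code 255. Messages without a retval, such as the timeout error from Client 1, yield `None`.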
(Assignee)

Comment 8

4 years ago
Created attachment 8500564 [details] [review]
Pull request to add exception handling to RunThread.run so that steeplechase won't hang.

Add exception handling to RunThread.run.
Attachment #8500564 - Flags: review?(ted)
(Assignee)

Comment 9

4 years ago
Ted, could you review the pull request for this and integrate as appropriate? Thanks.
Flags: needinfo?(ted)
Comment on attachment 8500564 [details] [review]
Pull request to add exception handling to RunThread.run so that steeplechase won't hang.

Thanks for the patch!
Attachment #8500564 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
Assignee: nobody → spolk
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED