Closed Bug 1084614 Opened 6 years ago Closed 6 years ago

ssltunnel intermittently fails to bind on Android 4.0

Categories

(Testing :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(firefox39 fixed)

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

(Whiteboard: [ateam_harness_work])

Attachments

(2 files, 1 obsolete file)

Most (all?) mochitest and robocop test logs for Android 4.0 show something like:

12:46:52     INFO -  runtests.py | Server pid: 20201
12:46:52     INFO -  runtests.py | Websocket server pid: 20203
12:46:52     INFO -  runtests.py | SSL tunnel pid: 20205
12:46:52     INFO -  failed to bind socket
12:46:52     INFO -  Shutting down...
12:46:53     INFO -  runtests.py | Running tests: start.

"failed to bind socket / Shutting down..." does not appear in Android 2.3 logs.

I don't know that this is causing any test failures, but it seems wrong.
On Android 4.0, ssltunnel is run on a "foopy" which typically serves multiple pandaboards. Process listings from the foopy typically show 8 or more instances of ssltunnel running. I wonder if we are running into port conflicts.


http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1413468522/mozilla-central_panda_android_test-mochitest-1-bm101-tests1-panda-build632.txt.gz

08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: calling [ps xU cltbld]
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:   PID TTY      STAT   TIME COMMAND
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4040 ?        S      4:39 /builds/panda-0173/test/build/venv/bin/python /builds/panda-0173/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0173/test/build/tests/mochitest -l /builds/panda-0173/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4042 ?        Sl     0:08 /builds/panda-0173/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelm9NB5D.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4106 ?        S      4:37 /builds/panda-0175/test/build/venv/bin/python /builds/panda-0175/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0175/test/build/tests/mochitest -l /builds/panda-0175/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4108 ?        Sl     0:07 /builds/panda-0175/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel0km9nn.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4253 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0179 --pidfile=/builds/panda-0179/twistd.pid --python=/builds/panda-0179/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  4548 ?        S      0:54 /builds/panda-0173/test/build/venv/bin/python /builds/panda-0173/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0173/test/build/tests/mochitest -l /builds/panda-0173/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  5945 ?        S     13:18 /builds/panda-0171/test/build/venv/bin/python /builds/panda-0171/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0171/test/build/tests/mochitest -l /builds/panda-0171/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG:  5947 ?        Sl     0:09 /builds/panda-0171/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel0nP5lt.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 10279 ?        S      4:49 /builds/panda-0177/test/build/venv/bin/python /builds/panda-0177/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0177/test/build/tests/mochitest -l /builds/panda-0177/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 10281 ?        Sl     0:05 /builds/panda-0177/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelKrCucv.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 13382 ?        S      5:19 /builds/panda-0171/test/build/venv/bin/python /builds/panda-0171/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0171/test/build/tests/mochitest -l /builds/panda-0171/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 13956 ?        S      1:54 /builds/panda-0171/test/build/venv/bin/python /builds/panda-0171/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0171/test/build/tests/mochitest -l /builds/panda-0171/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 14670 ?        S      3:54 /builds/panda-0179/test/build/venv/bin/python /builds/panda-0179/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0179/test/build/tests/mochitest -l /builds/panda-0179/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 14672 ?        Sl     0:05 /builds/panda-0179/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel8v5sF8.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 15016 ?        S      5:51 /builds/panda-0180/test/build/venv/bin/python /builds/panda-0180/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0180/test/build/tests/mochitest -l /builds/panda-0180/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 15018 ?        Sl     0:09 /builds/panda-0180/test/build/hostutils/bin/ssltunnel /tmp/ssltunneltprySu.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 17989 ?        S      0:32 /builds/panda-0176/test/build/venv/bin/python /builds/panda-0176/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0176/test/build/tests/mochitest -l /builds/panda-0176/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 17991 ?        Sl     0:00 /builds/panda-0176/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelYI55ST.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 21711 ?        S      0:34 /builds/panda-0171/test/build/venv/bin/python /builds/panda-0171/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0171/test/build/tests/mochitest -l /builds/panda-0171/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 23684 ?        Sl     0:04 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0178 --pidfile=/builds/panda-0178/twistd.pid --python=/builds/panda-0178/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 23810 ?        S      0:17 /tools/buildbot/bin/python scripts/scripts/android_panda.py --cfg android/android_panda_releng.py --jsreftest-suite jsreftest-1 --blob-upload-branch mozilla-inbound --download-symbols true
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 24799 ?        S      0:01 python -u /builds/sut_tools/logcat.py 10.26.129.73 /builds/panda-0178/test/build/blobber_upload_dir/logcat.log -v time
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 24987 ?        S      0:00 python -u /builds/panda-0178/test/build/tests/reftest/remotereftest.py --deviceIP=10.26.129.73 --xre-path=../hostutils/xre --utility-path=../hostutils/bin --app=org.mozilla.fennec --ignore-window-size --bootstrap --extra-profile-file=jsreftest/tests/user.js jsreftest/tests/jstests.list --http-port=30178 --ssl-port=31178 --symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-debug/1413466183/fennec-36.0a1.en-US.android-arm.crashreporter-symbols.zip --total-chunks=3 --this-chunk=1
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 24997 ?        Sl     1:57 ../hostutils/bin/xpcshell -g /builds/panda-0178/test/build/hostutils/xre -v 170 -f /builds/panda-0178/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpop4Qn5';const _SERVER_PORT = '30178'; const _SERVER_ADDR ='10.26.129.17'; -f /builds/panda-0178/test/build/tests/reftest/server.js
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 25631 ?        S      0:32 /builds/panda-0180/test/build/venv/bin/python /builds/panda-0180/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0180/test/build/tests/mochitest -l /builds/panda-0180/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 25731 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0174 --pidfile=/builds/panda-0174/twistd.pid --python=/builds/panda-0174/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 25736 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0177 --pidfile=/builds/panda-0177/twistd.pid --python=/builds/panda-0177/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 26087 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0171 --pidfile=/builds/panda-0171/twistd.pid --python=/builds/panda-0171/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 26090 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0170 --pidfile=/builds/panda-0170/twistd.pid --python=/builds/panda-0170/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 26137 ?        S      0:11 /tools/buildbot/bin/python scripts/scripts/android_panda.py --cfg android/android_panda_releng.py --crashtest-suite crashtest --blob-upload-branch mozilla-central --download-symbols ondemand
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27685 ?        S      0:11 /tools/buildbot/bin/python scripts/scripts/android_panda.py --cfg android/android_panda_releng.py --reftest-suite reftest-6 --blob-upload-branch try --download-symbols ondemand
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27854 ?        Ss     0:00 /bin/sh -c /builds/watch_devices.sh > /dev/null 2>&1
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27856 ?        S      0:00 /bin/bash -eu /builds/watch_devices.sh
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27859 ?        S      0:00 tee -a /builds/watcher.log
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27869 ?        Sl     0:03 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0173 --pidfile=/builds/panda-0173/twistd.pid --python=/builds/panda-0173/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 27938 ?        S      0:00 /bin/bash -eu /builds/watch_devices.sh panda-0181
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28081 ?        S      0:00 /bin/bash /builds/tools/buildfarm/mobile/manage_buildslave.sh gettac panda-0181
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28093 ?        S      0:00 python /builds/tools/buildfarm/mobile/../utils/retry.py --stderr-regexp ERROR 404: Not Found --fail-if-match wget -q -O/builds/panda-0181/buildbot.tac.new http://slavealloc.pvt.build.mozilla.org/gettac/panda-0181
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28129 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0180 --pidfile=/builds/panda-0180/twistd.pid --python=/builds/panda-0180/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28132 ?        Sl     0:01 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0176 --pidfile=/builds/panda-0176/twistd.pid --python=/builds/panda-0176/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28141 ?        S      0:00 python -u /builds/sut_tools/logcat.py 10.26.129.65 /builds/panda-0170/test/build/blobber_upload_dir/logcat.log -v time
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28160 ?        S      0:00 python -u /builds/panda-0170/test/build/tests/reftest/remotereftest.py --deviceIP=10.26.129.65 --xre-path=../hostutils/xre --utility-path=../hostutils/bin --app=org.mozilla.fennec --ignore-window-size --bootstrap --http-port=30170 --ssl-port=31170 --symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1413468522/fennec-36.0a1.en-US.android-arm.crashreporter-symbols.zip reftest/tests/testing/crashtest/crashtests.list
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28170 ?        Sl     0:13 ../hostutils/bin/xpcshell -g /builds/panda-0170/test/build/hostutils/xre -v 170 -f /builds/panda-0170/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpyPA0RS';const _SERVER_PORT = '30170'; const _SERVER_ADDR ='10.26.129.17'; -f /builds/panda-0170/test/build/tests/reftest/server.js
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28423 ?        S      0:00 python -u /builds/sut_tools/logcat.py 10.26.129.74 /builds/panda-0179/test/build/blobber_upload_dir/logcat.log -v time
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28441 ?        R      0:06 /tools/buildbot/bin/python scripts/scripts/android_panda.py --cfg android/android_panda_releng.py --mochitest-suite mochitest-1 --blob-upload-branch mozilla-central --download-symbols ondemand
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28449 ?        S      0:00 python -u /builds/panda-0179/test/build/tests/reftest/remotereftest.py --deviceIP=10.26.129.74 --xre-path=../hostutils/xre --utility-path=../hostutils/bin --app=org.mozilla.fennec --ignore-window-size --bootstrap --http-port=30179 --ssl-port=31179 --symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/nsilva@mozilla.com-efa0352ac458/try-android/fennec-36.0a1.en-US.android-arm.crashreporter-symbols.zip reftest/tests/layout/reftests/reftest.list --total-chunks=8 --this-chunk=6
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28460 ?        Sl     0:03 ../hostutils/bin/xpcshell -g /builds/panda-0179/test/build/hostutils/xre -v 170 -f /builds/panda-0179/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpFl0F1G';const _SERVER_PORT = '30179'; const _SERVER_ADDR ='10.26.129.17'; -f /builds/panda-0179/test/build/tests/reftest/server.js
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28839 ?        R      0:00 /tools/buildbot/bin/python /builds/sut_tools/verify.py
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28844 ?        R      0:00 ps xU cltbld
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28909 ?        S      2:13 /builds/panda-0178/test/build/venv/bin/python /builds/panda-0178/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0178/test/build/tests/mochitest -l /builds/panda-0178/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 28911 ?        Sl     0:04 /builds/panda-0178/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel24ePhU.cfg
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 29375 ?        Sl     0:03 /tools/buildbot/bin/python2.7 /tools/buildbot/bin/twistd --no_save --rundir=/builds/panda-0175 --pidfile=/builds/panda-0175/twistd.pid --python=/builds/panda-0175/buildbot.tac
08:21:15     INFO -  10/16/2014 08:21:15: DEBUG: 30534 ?        S      0:48 /builds/panda-0180/test/build/venv/bin/python /builds/panda-0180/test/build/tests/mochitest/pywebsocket_wrapper.py -p 9988 -w /builds/panda-0180/test/build/tests/mochitest -l /builds/panda-0180/test/build/tests/mochitest/websock.log --log-level=debug --allow-handlers-outside-root-dir
Blocks: 1080566
Probably the main reason that this isn't causing test failures is that https support in mochitests is still relatively new.

The port conflict thing seems less likely when you consider that the exact same process is used to select the HTTP server port.  Maybe landing a patch that runs netstat -p alongside the ps command might help inform that a little better.
It looks like ssltunnel is failing at http://hg.mozilla.org/mozilla-central/annotate/2209a8de5d23/testing/mochitest/ssltunnel/ssltunnel.cpp#l964. I *assume* it is trying to bind to the port passed to runtestsremote.py. For the failed case:

12:15:12     INFO - Running command: ['python', '-u', '/builds/panda-0329/test/build/tests/mochitest/runtestsremote.py', ... '--http-port=30329', '--ssl-port=31329', ...

which should be set by http://hg.mozilla.org/build/mozharness/annotate/08cd66cbfacd/scripts/android_panda.py#l436, which looks OK to me -- on device panda-XXX, it uses --ssl-port=31XXX.

In the failed log, I do not see any reason to believe 31329 is already in use.
As an aside, we should probably stick the port number in the error message here:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/ssltunnel/ssltunnel.cpp#963
Assignee: nobody → gbrown
Attachment #8568611 - Flags: review?(ted)
:kmoir -- Can you see if there is something running on the foopy for panda-0329 that is using port 31329? I'm hoping "netstat -p" from the root account will show something.
Flags: needinfo?(kmoir)
[root@foopy58.p3.releng.scl3.mozilla.com ~]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 0.0.0.0:30322               0.0.0.0:*                   LISTEN      30436/xpcshell      
tcp        0      0 0.0.0.0:30324               0.0.0.0:*                   LISTEN      1418/xpcshell       
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      4982/sshd           
tcp        0      0 0.0.0.0:31320               0.0.0.0:*                   LISTEN      1958/ssltunnel      
tcp        0      0 0.0.0.0:25                  0.0.0.0:*                   LISTEN      5088/master         
tcp        0      0 0.0.0.0:31322               0.0.0.0:*                   LISTEN      30440/ssltunnel     
tcp        0      0 0.0.0.0:31324               0.0.0.0:*                   LISTEN      1430/ssltunnel      
tcp        0      0 0.0.0.0:30332               0.0.0.0:*                   LISTEN      30507/../hostutils/ 
tcp        0      0 0.0.0.0:31328               0.0.0.0:*                   LISTEN      26060/ssltunnel     
tcp        0      0 0.0.0.0:31329               0.0.0.0:*                   LISTEN      27032/ssltunnel     

[root@foopy58.p3.releng.scl3.mozilla.com ~]# ps -ef | grep 27032
root      2159 32678  0 09:28 pts/0    00:00:00 grep 27032
cltbld   27032     1  0 Feb22 ?        00:00:01 /builds/panda-0329/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelEs06qr.cfg

perhaps the ssltunnel session process isn't getting killed between test suites?
Flags: needinfo?(kmoir)
Yes, I think that's it exactly. 

Now that I know what I'm looking for, I see similar, long-running ssltunnel's in the process list for the other failure I found, http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-f96e618a84af/try-android-api-11/try_panda_android_test-robocop-4-bm102-tests1-panda-build124.txt.gz:

18:46:45     INFO -  cltbld   18068     1  0 Feb15 ?        00:00:05 /builds/panda-0499/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel5_RTRH.cfg
18:46:45     INFO -  cltbld   18139     1  0 Feb15 ?        00:00:05 /builds/panda-0503/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelAQ2Pnt.cfg

We need to make sure ssltunnel is always killed when a test is done with it.
Summary: ssltunnel seems to fail on Android 4.0 → ssltunnel intermittently fails to bind on Android 4.0
Attachment #8568611 - Flags: review?(ted) → review+
See Also: → 1124695
One possibility that I am concerned about is when android_panda.py times out, like this:

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-11/1425579369/mozilla-inbound_panda_android_test-mochitest-1-bm89-tests1-panda-build162.txt.gz

command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_panda.py', '--cfg', 'android/android_panda_releng.py', '--mochitest-suite', 'mochitest-1', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2460.474057
========= Finished '/tools/buildbot/bin/python scripts/scripts/android_panda.py ...' failed (results: 2, elapsed: 41 mins, 0 secs) (at 2015-03-05 11:50:34.291248) =========

Did android_panda.py start runtestsremote.py? Did runtestsremote.py start ssltunnel? Were all processes killed, or were some orphaned?


:kmoir -- Can you check the foopy for panda-0381 and see if the foopy has any running processes that were started between 2015-03-05 11:10 and 2015-03-05 11:50?
Flags: needinfo?(kmoir)
No, not from today but there are some orphaned processes from earlier before the patch was applied.  Perhaps they should be killed on all the foopies since they seem to be there from before.


[root@foopy62.p4.releng.scl3.mozilla.com ~]# ps -ef | grep ssltunnel
cltbld    1059   789  0 18:01 ?        00:00:00 /builds/panda-0627/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelbJa31a.cfg
cltbld    3844  3802  0 18:11 ?        00:00:00 [ssltunnel] <defunct>
root      4610  4554  0 18:18 pts/0    00:00:00 grep ssltunnel
cltbld   12500     1  0 Feb23 ?        00:00:05 /builds/panda-0375/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelV8PU5b.cfg
cltbld   24954     1  0 Feb15 ?        00:00:10 /builds/panda-0380/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelBK0B1x.cfg
cltbld   32749     1  0 Feb19 ?        00:00:08 /builds/panda-0374/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelQcmx_x.cfg
Flags: needinfo?(kmoir)
Thanks Kim. Yes, the ssltunnel orphans should be cleaned up, but I still don't understand how they are being orphaned and assume we'll just get more.
Whiteboard: [ateam_harness_work]
It looks to me like the harness normally kills ssltunnel just fine. I'm not sure of exception cases, like when buildbot kills a mozharness script that exceeds its timeout -- but Comment 14 suggests that ssltunnel is cleaned up even in those cases.

Still, ssltunnel orphans are happening somehow. I suppose it is always a risk.

I am tempted to fix this by having the harness kill orphaned ssltunnels: before runtests.py starts ssltunnel, check for and kill running ssltunnel instances with ppid=1, or something like that. (It would be nice to only kill the ssltunnel using the needed port, but that would require something like parsing netstat output or parsing the temporary config files...seems error-prone.)

Any better suggestions?
We do something similar when running them on the emulators.

http://hg.mozilla.org/build/mozharness/file/bc2b3ab7c9c6/scripts/android_emulator_unittest.py#l572

I agree that killing the process seems like the right approach
This patch updates the android_panda.py mozharness script to kill orphan ssltunnel processes before starting tests.

At first I was going to make this change in run_tests.py. That seemed right because that's where we start ssltunnel. On the other hand, that sort of process checking and cleanup seemed out of scope for run_tests.py and this problem seems to be specific to the panda jobs (because of the lifetime of the foopy, I suppose) -- why clutter run_tests.py and run this check on all platforms if it only happens on pandas? So I updated the android_panda.py script instead; does that seem reasonable?

Here it is in action:

http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-2cc28ee1bb03/try-android-api-11/try_panda_android_test-crashtest-bm100-tests1-panda-build282.txt.gz

06:56:42     INFO - Running main action method: run_test
06:56:42     INFO - Running command: ['ps', '-f', '-C', 'ssltunnel', '--no-headers']
06:56:42     INFO - Copy/paste: ps -f -C ssltunnel --no-headers
06:56:42     INFO -  cltbld    3544     1  0 Mar05 ?        00:00:05 /builds/panda-0611/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel6w5cry.cfg
06:56:42     INFO -  cltbld    8319     1  0 Feb15 ?        00:00:20 /builds/panda-0102/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelVZLKf7.cfg
06:56:42     INFO -  cltbld   13811     1  0 Feb22 ?        00:00:13 /builds/panda-0096/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelbMTNbB.cfg
06:56:42     INFO -  cltbld   15367     1  0 Feb28 ?        00:00:11 /builds/panda-0104/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelVSzmEr.cfg
06:56:42     INFO -  cltbld   31981     1  0 Mar09 ?        00:00:04 /builds/panda-0106/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel2cfbEH.cfg
06:56:42     INFO -  cltbld   32105     1  0 Mar09 ?        00:00:04 /builds/panda-0103/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelVSAw5K.cfg
06:56:42     INFO - Return code: 0
06:56:42     INFO - Running command: ['ps', '-o', 'pid,ppid', '-C', 'ssltunnel', '--no-headers']
06:56:42     INFO - Copy/paste: ps -o pid,ppid -C ssltunnel --no-headers
06:56:42     INFO - killing orphan with pid 3544
06:56:42     INFO - killing orphan with pid 8319
06:56:42     INFO - killing orphan with pid 13811
06:56:42     INFO - killing orphan with pid 15367
06:56:42     INFO - killing orphan with pid 31981
06:56:42     INFO - killing orphan with pid 32105
06:56:42     INFO - Return code: 0
06:56:42     INFO - mkdir: /builds/panda-0104/test/build/blobber_upload_dir
Attachment #8578051 - Flags: review?(kmoir)
I meant to mention: I noticed some un-used code just where I wanted to update; I deleted it.
Comment on attachment 8578051 [details] [diff] [review]
kill orphan ssltunnel before starting test on panda

It looks like this reports an error when there are no ssltunnel processes running. That's annoying.

http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-2cc28ee1bb03/try-android-api-11/try_panda_android_test-crashtest-bm100-tests1-panda-build283.txt.gz

08:35:06     INFO - Running command: ['ps', '-f', '-C', 'ssltunnel', '--no-headers']
08:35:06     INFO - Copy/paste: ps -f -C ssltunnel --no-headers
08:35:06    ERROR - Return code: 1
08:35:06     INFO - Running command: ['ps', '-o', 'pid,ppid', '-C', 'ssltunnel', '--no-headers']
08:35:06     INFO - Copy/paste: ps -o pid,ppid -C ssltunnel --no-headers
08:35:06    ERROR - Return code: 1
Attachment #8578051 - Flags: review?(kmoir)
See Also: → 915212
See Also: → 919920
Bug 915212 and bug 919920 convinced me that there is at least a chance that a similar problem sometimes affects mochitest and reftest jobs on other platforms -- so I changed back to implementing this in the mochitest and reftest harnesses.

This patch makes two passes through ps output: The first to log any processes containing 'ssltunnel' or 'xpcshell' on the command line; the second to kill processes named 'ssltunnel' or 'xpcshell' with ppid=1.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f5cd88ed407 shows this in action. For most platforms, logs just show:

18:02:45     INFO -  Checking for orphan ssltunnel processes...
18:02:45     INFO -  Checking for orphan xpcshell processes...
18:02:46     INFO -  SUITE-START | Running 1225 tests

On Android 4.0, logs often find matching ssltunnel orphans:

18:08:03     INFO -  Checking for orphan ssltunnel processes...
18:08:03     INFO -  cltbld    5948     1  0 Mar05 ?        00:00:06 /builds/panda-0485/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelKhCPKH.cfg
18:08:03     INFO -  cltbld   14154     1  0 Feb23 ?        00:00:15 /builds/panda-0494/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel59p8io.cfg
18:08:03     INFO -  cltbld   14510     1  0 Feb12 ?        00:00:18 /builds/panda-0488/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelQbAO3V.cfg
18:08:03     INFO -  cltbld   15468     1  0 Feb16 ?        00:00:15 /builds/panda-0487/test/build/hostutils/bin/ssltunnel /tmp/ssltunnelwgVbzn.cfg
18:08:03     INFO -  cltbld   16292 16250  0 18:05 ?        00:00:00 [ssltunnel] <defunct>
18:08:03     INFO -  cltbld   26349     1  0 Mar11 ?        00:00:04 /builds/panda-0486/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel1QCSd4.cfg
18:08:03     INFO -  cltbld   29292     1  0 Mar05 ?        00:00:06 /builds/panda-0491/test/build/hostutils/bin/ssltunnel /tmp/ssltunnel9NXWNn.cfg
18:08:03     INFO -  killing ssltunnel orphan with pid 5948
18:08:03     INFO -  killing ssltunnel orphan with pid 14154
18:08:03     INFO -  killing ssltunnel orphan with pid 14510
18:08:03     INFO -  killing ssltunnel orphan with pid 15468
18:08:03     INFO -  killing ssltunnel orphan with pid 26349
18:08:03     INFO -  killing ssltunnel orphan with pid 29292
18:08:03     INFO -  Checking for orphan xpcshell processes...
18:08:03     INFO -  cltbld   12358 12348  1 17:39 ?        00:00:19 ../hostutils/bin/xpcshell -g /builds/panda-0488/test/build/hostutils/xre -v 170 -f /builds/panda-0488/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpfR6Dl3';const _SERVER_PORT = '30488'; const _SERVER_ADDR ='10.26.132.21'; -f /builds/panda-0488/test/build/tests/reftest/server.js
18:08:03     INFO -  cltbld   15375 15365  3 17:56 ?        00:00:21 ../hostutils/bin/xpcshell -g /builds/panda-0483/test/build/hostutils/xre -v 170 -f /builds/panda-0483/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpdNS9JU';const _SERVER_PORT = '30483'; const _SERVER_ADDR ='10.26.132.21'; -f /builds/panda-0483/test/build/tests/reftest/server.js
18:08:03     INFO -  cltbld   16286 16250  3 18:05 ?        00:00:04 /builds/panda-0491/test/build/hostutils/bin/xpcshell -g /builds/panda-0491/test/build/hostutils/xre -v 170 -f /builds/panda-0491/test/build/hostutils/bin/components/httpd.js -e const _PROFILE_PATH = '/tmp/tmpSL7q9X.mozrunner'; const _SERVER_PORT = '30491'; const _SERVER_ADDR = '10.26.132.21'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false; -f /builds/panda-0491/test/build/tests/mochitest/server.js
18:08:03     INFO -  SUITE-START | Running 467 tests
Attachment #8578051 - Attachment is obsolete: true
Attachment #8579086 - Flags: review?(kmoir)
Attachment #8579086 - Flags: review?(kmoir) → review+
Depends on: 1145364
https://hg.mozilla.org/mozilla-central/rev/abddc93fd8c4
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Blocks: 1152994
You need to log in before you can comment on or make changes to this bug.