Closed Bug 1407203 Opened 6 years ago Closed 6 years ago

Intermittent twisted.internet.error.CannotListenError: Couldn't listen on any:8191: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.

Categories

(Testing :: General, defect, P5)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Assigned: dminor)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=135893152&repo=autoland

https://queue.taskcluster.net/v1/task/O7LSCa6FRNe4Yr4bKC4Kfw/runs/0/artifacts/public/logs/live_backing.log

08:43:39     INFO -  TEST-START | dom/media/webaudio/test/test_audioBufferSourceNodeOffset.html
08:43:39     INFO -  TEST-SKIP | dom/media/webaudio/test/test_audioBufferSourceNodeOffset.html | took 0ms
08:43:39     INFO -  Running manifest: dom\media\mediasource\test\mochitest.ini
08:43:40     INFO -  Z:\task_1507623589\build\tests\bin\pk12util.exe: PKCS12 IMPORT SUCCESSFUL
08:43:40     INFO -  MochitestServer : launching [u'Z:\\task_1507623589\\build\\tests\\bin\\xpcshell.exe', '-g', 'Z:\\task_1507623589\\build\\application\\firefox', '-v', '170', '-f', 'Z:\\task_1507623589\\build\\tests\\bin\\components\\httpd.js', '-e', "const _PROFILE_PATH = 'c:\\\\users\\\\genericworker\\\\appdata\\\\local\\\\temp\\\\tmpy6rbnc.mozrunner'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', 'Z:\\task_1507623589\\build\\tests\\mochitest\\server.js']
08:43:40     INFO -  runtests.py | Server pid: 1228
08:43:40     INFO -  runtests.py | Websocket server pid: 4744
08:43:40     INFO -  runtests.py | websocket/process bridge pid: 2596
08:43:40     INFO -  runtests.py | SSL tunnel pid: 5824
08:43:40     INFO -  Traceback (most recent call last):
08:43:40     INFO -    File "websocketprocessbridge\websocketprocessbridge.py", line 102, in <module>
08:43:40     INFO -      reactor.listenTCP(int(args.port), txws.WebSocketFactory(bridgeFactory))
08:43:40     INFO -    File "Z:\task_1507623589\build\venv\lib\site-packages\twisted\internet\posixbase.py", line 419, in listenTCP
08:43:40     INFO -      p.startListening()
08:43:40     INFO -    File "Z:\task_1507623589\build\venv\lib\site-packages\twisted\internet\tcp.py", line 857, in startListening
08:43:40     INFO -      raise CannotListenError, (self.interface, self.port, le)
08:43:40     INFO -  twisted.internet.error.CannotListenError: Couldn't listen on any:8191: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.
:catlee, in the past twisted errors have always been a Release Engineering area of responsibility. Is this still the case, and can you help get someone to look at this?
Flags: needinfo?(catlee)
Product: Testing → Release Engineering
Whiteboard: [stockwell infra]
Version: Version 3 → unspecified
Historically twisted was used in buildbot, but in this case this is twisted being used as part of websocketprocessbridge in mochitest:
https://dxr.mozilla.org/mozilla-central/rev/20d57b9c4183973af4af5e078dff2aec0b74f928/testing/mochitest/runtests.py#1125
https://dxr.mozilla.org/mozilla-central/rev/20d57b9c4183973af4af5e078dff2aec0b74f928/testing/tools/websocketprocessbridge/websocketprocessbridge.py#102

(I'm not really sure why we have two separate websocket servers running as part of Mochitest, TBH.)

twisted doesn't set SO_REUSEADDR on listen sockets on Windows:
https://github.com/twisted/twisted/blob/44a9a75bfa869031e108c5155c8c34bf36dd426e/src/twisted/internet/tcp.py#L960

but there's a whole discussion in a bug about why not:
https://twistedmatrix.com/trac/ticket/1151

That ticket does indicate that Windows' TCP/IP stack doesn't behave the same as on other platforms, so SO_REUSEADDR isn't really necessary there. That would mean that when this error happens, the port is actually in use by something else.
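To illustrate the platform-conditional behaviour described above, here is a minimal standard-library sketch. It is not Twisted's code: `make_listen_socket` is a hypothetical helper, and the platform check is an assumption based on the statement above that SO_REUSEADDR is not set on Windows.

```python
import os
import socket
import sys


def make_listen_socket(port, host="", backlog=50):
    """Create a TCP listen socket, enabling SO_REUSEADDR only on POSIX.

    Hypothetical helper for illustration; not part of Twisted or Mochitest.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: skip the option on Windows (and cygwin), where
        # SO_REUSEADDR has different semantics than on POSIX systems.
        if os.name == "posix" and sys.platform != "cygwin":
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        # Do not leak the socket if bind() or listen() fails,
        # for example because the port is already in use.
        sock.close()
        raise
    return sock
```

With a helper like this, a bind failure on Windows would surface as an `OSError` from `bind()`, which is consistent with the port genuinely being held by another socket.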
Also, it's a little worrying that this failure doesn't cause the harness to stop; it seems to just soldier on. It looks like we check that the websocketprocessbridge server is running (albeit with a hardcoded port number):
https://dxr.mozilla.org/mozilla-central/rev/20d57b9c4183973af4af5e078dff2aec0b74f928/testing/mochitest/runtests.py#1144

...but we don't check that the server process we spawned is alive or anything.
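A liveness check of the kind that is missing could look roughly like the sketch below. It uses the standard library's `subprocess` as a stand-in for whatever process wrapper the harness actually uses; `launch_checked` and its parameters are hypothetical names for illustration.

```python
import subprocess
import time


def launch_checked(cmd, grace_seconds=1.0, poll_interval=0.05):
    """Start cmd and fail loudly if it exits during a short grace period.

    Hypothetical helper; the real harness uses its own process wrapper.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        returncode = proc.poll()  # None while the child is still running
        if returncode is not None:
            raise RuntimeError(
                "%r exited early with status %s" % (cmd, returncode)
            )
        time.sleep(poll_interval)
    return proc
```

A server that dies on startup, as in the traceback above, would then abort the run instead of letting the tests continue against a missing server.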
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #3)
> :catlee, in the past twisted errors have always been a Release Engineering
> area of responsibility. Is this still the case, and can you help get someone
> to look at this?

As Ted mentioned, these errors aren't in buildbot code.
Flags: needinfo?(catlee)
Product: Release Engineering → Testing
:dminor, these are failures in mochitest-media, probably caused by issues with tooling and maybe by other issues with the product. Can you help get this to resolution?
Flags: needinfo?(dminor)
Whiteboard: [stockwell infra] → [stockwell needswork]
I'll have a look. The natural thing would be to just try a different port if the first was in use, but this is a bit awkward because the port number is also hard coded in the tests: http://searchfox.org/mozilla-central/rev/40b456626e2d0409b7034768b4d9526fc7235ea4/dom/media/tests/mochitest/pc.js#1985.
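For illustration, the "try a different port" idea could be sketched as follows. `pick_port` is a hypothetical helper, not existing harness code, and the chosen port would still have to be passed through to the tests that currently hard code it.

```python
import socket


def pick_port(preferred, host="127.0.0.1"):
    """Return preferred if it can be bound, else an OS-assigned free port.

    Hypothetical helper for illustration only.
    """
    # Port 0 asks the OS to choose any free ephemeral port.
    for candidate in (preferred, 0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
            return sock.getsockname()[1]
        except OSError:
            continue  # candidate unavailable, try the next one
        finally:
            sock.close()
    raise OSError(
        "could not bind %r or an ephemeral port on %s" % (preferred, host)
    )
```

Note that this probe is inherently racy: the port is released before the real server binds it, so another process could take it in between. Having the server itself bind to port 0 and report the result would avoid that.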
Assignee: nobody → dminor
Flags: needinfo?(dminor)
We do start and stop the websocketprocessbridge process more than once in some test runs. Seems likely that the first process is not fully cleaned up by the time we try to start the second one, in which case we don't need to worry about a dynamic port scheme, just finding a nicer way to shut things down.
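A minimal sketch of "shut down, then wait until the process is really gone" is shown below, again using the standard library's `subprocess` rather than the harness's own process wrapper. `stop_and_wait` and its timeout are assumptions for illustration.

```python
import subprocess


def stop_and_wait(proc, timeout=10.0):
    """Terminate proc and block until it has exited; return its exit status.

    Hypothetical helper; proc is a subprocess.Popen instance.
    """
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Escalate if the process ignores the polite request.
            proc.kill()
            proc.wait()
    return proc.returncode
```

Only after this returns would the next websocketprocessbridge instance be started, so that the old process has released its listen socket first.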
Supporting a dynamic port would certainly be nice as well. I have banged that drum a lot in the past, but fixing everything in Mochitest felt like an uphill battle.
Comment on attachment 8919802 [details]
Bug 1407203 - Wait for websocketProcessBridge to exit;

https://reviewboard.mozilla.org/r/190728/#review195944

Thanks :dminor, this seems fairly safe. Have you tested on Android, Linux, and OS X as well? I would expect it to be fine there.
Attachment #8919802 - Flags: review?(jmaher) → review+
Try job for other platforms, seems fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f89500731bd50c67309fbc06e2b6fec91e9418d1

Marking this leave-open, just in case this fix is not sufficient.
Keywords: leave-open
Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5fe4aa159e60
Wait for websocketProcessBridge to exit; r=jmaher
Those failures can still be seen. See:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=140051043&lineNumber=1768

And as noted on bug 1410883 the added call to `wait()` here should be a no-op because it gets done anyway by `ProcessHandler.wait()`, which gets called by `ProcessHandler.kill()` automatically.
Closing this as it has not occurred in 2 months.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open (removed)