WebRTC WPT tests tend to run out of fds on Android
Component: Core :: WebRTC: Signaling (defect, P3)
Reporter: bwc; Assignee: Unassigned
This is basically permaorange on Android, but it is masked by the ini file. I frequently see something like this in the logcat (debug builds):
05-26 22:23:54.295 2824 2839 I Gecko : [Child 2824: Main Thread]: I/jsep [1590528234287575 (id=2147483750 url=https://web-platform.test:8443/webrtc/RTCPeerConnection-transceivers.https.html)]: stable -> have-local-offer
05-26 22:23:54.296 2824 2839 I Gecko : [Child 2824: Main Thread]: I/signaling [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x73dd996fd580
05-26 22:23:54.297 2824 2839 I Gecko : [Child 2824: Main Thread]: I/jsep [1590528234288895 (id=2147483750 url=https://web-platform.test:8443/webrtc/RTCPeerConnection-transceivers.https.html)]: stable -> have-remote-offer
05-26 22:23:54.298 2824 2839 E rtc :
05-26 22:23:54.298 2824 2839 E rtc :
05-26 22:23:54.298 2824 2839 E rtc : #
05-26 22:23:54.298 2824 2839 E rtc : # Fatal error in /builds/worker/checkouts/gecko/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287
05-26 22:23:54.298 2824 2839 E rtc : # last system error: 24
05-26 22:23:54.298 2824 2839 E rtc : # Check failed: pipe(fds) == 0
05-26 22:23:54.298 2824 2839 E rtc : #
05-26 22:23:54.298 2824 2839 E rtc : #
Sometimes, it is instead something like this:
05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: Unable to create pipe named "2814.175.970996532" in server mode error(Too many open files).: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 213
05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: Failed to create top level actor!: file /builds/worker/checkouts/gecko/ipc/glue/BackgroundImpl.cpp, line 789
05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: '!mBackgroundManager', file /builds/worker/checkouts/gecko/dom/network/UDPSocketChild.cpp, line 59
05-27 15:02:36.862 2814 7783 F MOZ_Assert: Assertion failure: false (Failed to create UDP socket), at /builds/worker/checkouts/gecko/media/mtransport/nr_socket_prsock.cpp:1514
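It seems that we're running out of fds during this test: "last system error: 24" is EMFILE, the same "Too many open files" error reported in the second log. I predict that this problem goes away if we run this test in isolation, but let's see...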
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a4ebefdf7b77069b8b883768a22fc43232d09c8
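To illustrate the failure mode behind both logs: on Linux (and therefore Android), `pipe()` fails with errno 24, `EMFILE`, once the process reaches its per-process fd limit, which is exactly what the `pipe(fds) == 0` check in task_queue_libevent.cc tripped on. The following is a minimal standalone sketch, not Gecko code; the helper name `errno_after_fd_exhaustion` and the limit of 16 are hypothetical choices made only so the limit is hit quickly.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical demo helper: temporarily lower the soft RLIMIT_NOFILE,
 * create pipes until pipe() fails, then clean up and restore the limit.
 * Returns the errno of the failing pipe() call, 0 if none failed,
 * or -1 if the limit could not be read or changed. */
static int errno_after_fd_exhaustion(rlim_t soft_limit) {
    struct rlimit old, lowered;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0) return -1;
    lowered = old;
    lowered.rlim_cur = soft_limit;  /* only the soft limit is lowered */
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return -1;

    int fds[64][2];  /* 64 pipes = 128 fds, far above the lowered limit */
    int opened = 0;
    int err = 0;
    while (opened < 64) {
        if (pipe(fds[opened]) != 0) {
            err = errno;  /* expected: EMFILE ("Too many open files") */
            break;
        }
        opened++;
    }

    /* Release everything we opened, then restore the original soft limit. */
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    setrlimit(RLIMIT_NOFILE, &old);
    return err;
}
```

Calling `errno_after_fd_exhaustion(16)` should return `EMFILE`; the real tests reach the same state gradually rather than by lowering the limit.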
Comment 1 (Reporter), 5 years ago
Many other WebRTC WPT tests are also disabled on Android, but they do not fail, at least on this try push where RTCPeerConnection-transceivers.https.html is enabled:
I suspect that this ends up being a game of whack-a-mole, where if we disable one test, the problem just occurs in a subsequent one.
Comment 2 (Reporter), 5 years ago
Yes, running RTCPeerConnection-transceivers.https.html by itself works fine. Also, running the whole suite with RTCPeerConnection-transceivers.https.html disabled just moves the failure somewhere else (in this case, RTCRtpTransceiver.https.html).
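Since disabling one test just moves the failure to a later one, a plausible reading is that fds accumulate across tests in the same content process rather than any single test opening too many. One hypothetical way to check that on Linux-based systems, including Android, is to count the entries in `/proc/self/fd` before and after a test. This is a diagnostic sketch under that assumption, not something used in this bug; `count_open_fds` is an illustrative name.

```c
#include <assert.h>   /* used by the usage check described below */
#include <dirent.h>
#include <stddef.h>
#include <unistd.h>   /* pipe()/close() used by the usage check */

/* Hypothetical helper: count this process's open fds by listing
 * /proc/self/fd. The count includes the fd opendir() itself holds,
 * which is consistent between calls, so differences are accurate.
 * Returns -1 if /proc is unavailable. */
static int count_open_fds(void) {
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL) return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.') continue;  /* skip "." and ".." */
        count++;
    }
    closedir(dir);
    return count;
}
```

Opening a pipe should raise the count by exactly 2, and closing both ends should bring it back; a count that only grows from one test to the next would point to a leak rather than a single heavy test.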
Comment 3 (Reporter), 5 years ago
Comment 4 (Reporter), 5 years ago
Hmm. Accidentally selected the backlog variant, which doesn't run the webrtc wpt. This should be what we actually want...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=300f525f340ddc1d384f3c06596800dc61fd3d1f
Comment 5 (Reporter), 5 years ago
It does look like waiting a second at the beginning of RTCPeerConnection-transceivers.https.html helps with the failure rate on that test, although we see fd-related failures later on, in the same places we see failures when we disable RTCPeerConnection-transceivers.https.html entirely. Is there some way to set up an inter-test pause for a given wpt suite?
Comment 6, 5 years ago
I wonder if there is a shared library that could wrap the test start process.
:jgraham, do you have advice here?
Comment 7, 5 years ago
There isn't a mechanism for an inter-test pause at the moment. We have the ability to forcibly restart the browser after (or before) a specific test; maybe that would help in this case? It seems kind of fragile though.
Comment 8, 5 years ago
See e.g. https://searchfox.org/mozilla-central/source/testing/web-platform/meta/fetch/corb/img-png-mislabeled-as-html-nosniff.tentative.sub.html.ini#3 for restart-after
(side note for jmaher: we should try removing all of those; they don't help CI perf and maybe the bugs are fixed).
Comment 9 (Reporter), 5 years ago
(In reply to James Graham [:jgraham] from comment #7)
> There isn't a mechanism for an inter-test pause at the moment. We have the ability to forcibly restart the browser after (or before) a specific test; maybe that would help in this case? It seems kind of fragile though.
I will look into how many of these restarts are needed to take care of the problem.
Comment 10 (Reporter), 5 years ago
Comment 11 (Reporter), 5 years ago
Adding some restart-after for android seems to prevent the fd issue from becoming a problem.
Comment 12 (Reporter), 5 years ago
Comment 13 (Reporter), 5 years ago
Comment 14 (Reporter), 5 years ago