Open Bug 1641237 Opened 8 months ago Updated 8 months ago

webrtc wpt tend to run out of fds on android

Categories (Core :: WebRTC: Signaling, defect, P3)

People (Reporter: bwc, Unassigned)

This is basically permaorange on android, but is masked by the ini file. Pretty often, I see this kind of thing in the logcat (on debug):

05-26 22:23:54.295 2824 2839 I Gecko : [Child 2824: Main Thread]: I/jsep [1590528234287575 (id=2147483750 url=https://web-platform.test:8443/webrtc/RTCPeerConnection-transceivers.https.html)]: stable -> have-local-offer
05-26 22:23:54.296 2824 2839 I Gecko : [Child 2824: Main Thread]: I/signaling [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x73dd996fd580
05-26 22:23:54.297 2824 2839 I Gecko : [Child 2824: Main Thread]: I/jsep [1590528234288895 (id=2147483750 url=https://web-platform.test:8443/webrtc/RTCPeerConnection-transceivers.https.html)]: stable -> have-remote-offer
05-26 22:23:54.298 2824 2839 E rtc :
05-26 22:23:54.298 2824 2839 E rtc :
05-26 22:23:54.298 2824 2839 E rtc : #
05-26 22:23:54.298 2824 2839 E rtc : # Fatal error in /builds/worker/checkouts/gecko/media/webrtc/trunk/webrtc/rtc_base/task_queue_libevent.cc, line 287
05-26 22:23:54.298 2824 2839 E rtc : # last system error: 24
05-26 22:23:54.298 2824 2839 E rtc : # Check failed: pipe(fds) == 0
05-26 22:23:54.298 2824 2839 E rtc : #
05-26 22:23:54.298 2824 2839 E rtc : #

Sometimes, it is instead something like this:

05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: Unable to create pipe named "2814.175.970996532" in server mode error(Too many open files).: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 213
05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: Failed to create top level actor!: file /builds/worker/checkouts/gecko/ipc/glue/BackgroundImpl.cpp, line 789
05-27 15:02:36.862 2814 7783 I Gecko : [Child 2814, Unnamed thread 7bcb432ae9a0] WARNING: '!mBackgroundManager', file /builds/worker/checkouts/gecko/dom/network/UDPSocketChild.cpp, line 59
05-27 15:02:36.862 2814 7783 F MOZ_Assert: Assertion failure: false (Failed to create UDP socket), at /builds/worker/checkouts/gecko/media/mtransport/nr_socket_prsock.cpp:1514

It seems that we're running out of fds during this test. I predict that this problem goes away if we run this test in isolation, but let's see...
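For reference, "last system error: 24" in the first log is EMFILE on Linux, the same condition the second log reports as "Too many open files". The following is a minimal sketch of that failure mode, not the Gecko or libwebrtc code path: it lowers the soft fd limit of the current process and creates pipes until `pipe()` fails. The function name `pipes_until_exhausted` and the limit of 64 are assumptions chosen for illustration.

```python
import errno
import os
import resource


def pipes_until_exhausted(soft_limit=64):
    """Create pipes until the process runs out of fds.

    Hypothetical helper for illustration only. Returns a tuple of
    (number of pipes created, errno of the failing pipe() call).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for a soft limit above the hard limit.
    if old_hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))

    opened = []
    try:
        while True:
            # Each pipe consumes two fds; this eventually raises OSError.
            opened.append(os.pipe())
    except OSError as exc:
        return len(opened), exc.errno
    finally:
        # Release everything and restore the original limit.
        for read_fd, write_fd in opened:
            os.close(read_fd)
            os.close(write_fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    count, err = pipes_until_exhausted()
    print(count, errno.errorcode[err])
```

The point of the sketch is only that a failing `pipe()` with error 24 means the per-process fd limit was reached, which matches both log excerpts above.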

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a4ebefdf7b77069b8b883768a22fc43232d09c8

I see lots of other webrtc wpt that are disabled on android too, but that do not fail, at least on this try push where RTCPeerConnection-transceivers.https.html is enabled:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=75a7fe4c0bfe024aecbe25c2d55154ece77c7596&selectedTaskRun=VJ7QeWwwRbeAoqR-7YjTmw-0

I suspect that this ends up being a game of whack-a-mole, where if we disable one test, the problem just occurs in a subsequent one.

Yes, running RTCPeerConnection-transceivers.https.html by itself works fine. Also, running the whole suite with RTCPeerConnection-transceivers.https.html disabled just moves the failure somewhere else (in this case, RTCRtpTransceiver.https.html).
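One way to check whether fds really accumulate from test to test would be to sample the open-fd count of the content process between tests. A rough sketch, assuming a Linux-style `/proc` filesystem (as on Android) and permission to read the target process's fd directory; `count_open_fds` is a hypothetical helper, not part of the wpt harness:

```python
import os


def count_open_fds(pid="self"):
    """Return the number of fds currently open in the given process.

    Hypothetical helper: counts the entries in /proc/<pid>/fd. When used on
    the current process, the count includes the fd that listdir() itself
    opens briefly, so compare samples rather than reading absolute values.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    before = count_open_fds()
    read_fd, write_fd = os.pipe()
    after = count_open_fds()
    print(after - before)  # a pipe adds two fds
    os.close(read_fd)
    os.close(write_fd)
```

A count that keeps climbing across tests and only drops after a pause would be consistent with the behaviour described in this bug, but that is a hypothesis to verify, not something the logs above establish.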

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cce7b121f7a0301593726092b80a7e25ceb1f15&selectedTaskRun=YYiTNolbTeeBGPoqSlh7CQ-0

Summary: RTCPeerConnection-transceivers.https.html usually runs out of fds on android → webrtc wpt tend to run out of fds on android

Let's see if inserting a 1-second wait at the beginning of the RTCPeerConnection-transceivers.https.html test helps...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aff987752c0647becad1aceeffea0349422ae354

Hmm. Accidentally selected the backlog variant, which doesn't run the webrtc wpt. This should be what we actually want...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=300f525f340ddc1d384f3c06596800dc61fd3d1f

It does look like waiting a second at the beginning of RTCPeerConnection-transceivers.https.html helps with the failure rate on that test, although we see fd-related failures later on, in the same places we see failures when we disable RTCPeerConnection-transceivers.https.html entirely. Is there some way to set up an inter-test pause for a given wpt suite?

Flags: needinfo?(jmaher)

I wonder if there is a shared library that could wrap the test start process.

:jgraham, do you have advice here?

Flags: needinfo?(jmaher) → needinfo?(james)

There isn't a mechanism for an inter-test pause at the moment. We have the ability to forcibly restart the browser after (or before) a specific test; maybe that would help in this case? It seems kind of fragile though.

Flags: needinfo?(james)

See e.g. https://searchfox.org/mozilla-central/source/testing/web-platform/meta/fetch/corb/img-png-mislabeled-as-html-nosniff.tentative.sub.html.ini#3 for restart-after (side note for jmaher: we should try removing all of those; they don't help CI perf and maybe the bugs are fixed).

(In reply to James Graham [:jgraham] from comment #7)
> There isn't a mechanism for an inter-test pause at the moment. We have the ability to forcibly restart the browser after (or before) a specific test; maybe that would help in this case? It seems kind of fragile though.

I will look into how many of these restarts are needed to take care of the problem.

Adding some restart-after for android seems to prevent the fd issue from becoming a problem.
