Closed Bug 963524 Opened 11 years ago Closed 10 years ago

Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected

Categories

(Core :: WebRTC, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox32 --- fixed
firefox33 --- fixed
firefox34 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- fixed
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: cbook, Assigned: bwc)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 7 obsolete files)

209.75 KB, application/x-gzip
Details
3.57 KB, patch
bwc
: review+
Details | Diff | Splinter Review
b2g_emulator_vm mozilla-inbound debug test mochitest-debug-6 on 2014-01-24 03:17:58 PST for push 691a1d12372a slave: tst-linux64-spot-439 https://tbpl.mozilla.org/php/getParsedLog.php?id=33520855&tree=Mozilla-Inbound Summary 7949 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected
Summary: Intermittent TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected
Assignee: nobody → rjesup
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html | pc_local: ICE failed to switch to connected
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html https://tbpl.mozilla.org/php/getParsed| pc_local: ICE failed to switch to connected
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html https://tbpl.mozilla.org/php/getParsed| pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected
Not sure why I took it...
Assignee: rjesup → docfaraday
Appears to be some sort of E10S/ICE interaction. Byron: ping me on IRC for details on how to track this down
So, looking at these logs, I am seeing lots of multi-second gaps where I would not expect: 21:09:12 INFO - 05-02 03:37:03.972 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 6 : fsm_get_fcb_by_call_id_and_type : fcb= 0x451da060 21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] fsmdef.c:3967: SIPCC-FSM: fsmdef_ev_setpeerconnection: Setting peerconnection handle for (1/6) to fb822595c9afe888 21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 1/6, fsm_change_state: DEF: IDLE -> STABLE 21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 6 : DCSM_READY --> DCSM_READY 21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/6, sm_process_event: DEF :(IDLE:SETPEERCONNECTION ) 21:09:12 INFO - 05-02 03:37:09.652 760 760 I PRLog : 1074345096[40222080]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:497: PeerConnectionImpl: PeerConnectionImpl constructor for 21:09:12 INFO - 05-02 03:37:09.672 760 760 I PRLog : 1074345096[40222080]: [main|CC_SIPCCCall] CC_SIPCCCall.cpp:36: Creating CC_SIPCCCall 65543 21:09:12 INFO - 05-02 03:37:09.702 760 831 I PRLog : 55440[4430d900]: [CCAPP Task|def] ccapi.c:1164: SIPCC-CC_API: 1/7, cc_int_feature2: UI -> GSM: SETPEERCONNECTION 21:09:12 INFO - 05-02 03:37:09.712 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 22 :(DCSM_READY:SETPEERCONNECTION ) 2 21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 1/7, fsm_change_state: DEF: IDLE -> STABLE 21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 7 : DCSM_READY --> DCSM_READY 21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/7, sm_process_event: DEF :(IDLE:SETPEERCONNECTION ) 21:09:12 INFO - 05-02 03:37:14.313 760 831 I PRLog : 55440[4430d900]: [CCAPP Task|def] ccapi.c:1164: SIPCC-CC_API: 1/6, cc_int_feature2: UI -> GSM: ADDSTREAM 21:09:12 INFO - 05-02 03:37:14.323 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 23 :(DCSM_READY:ADDSTREAM ) 21:09:12 INFO - 05-02 03:37:14.323 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] sm.c:46: SIPCC-FSM: sm_process_event: DEF 6 : 0x40834f39x: sm entry: (STABLE:ADDSTREAM) 2 21:09:12 INFO - 05-02 03:37:28.073 760 871 I PRLog : 56032[43b22200]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed 21:09:12 INFO - 05-02 03:37:28.812 760 916 I PRLog : 56576[45126b00]: [ProcessThread|WebrtcAudioSessionConduit] AudioConduit.cpp:861: SendRTCPPacket RTCP Packet Send Failed 21:09:12 INFO - 05-02 03:37:29.652 760 871 I PRLog : 56032[43b22200]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed 21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: ../../../../gecko/media/mtransport/transportlayerice.cpp:134: Flow[fb822595c9afe888:1,rtp(none)]; Layer[ice]: state 1->5 21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: Flow[fb822595c9afe888:1,rtp(none)]; Layer[dtls]: Lower lower experienced an error 21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: ../../../../gecko/media/mtransport/transportlayerdtls.cpp:625: Flow[fb822595c9afe888:1,rtp(none)]; Layer[dtls]: state 1->5 2
I'll also note that the log levels are frustratingly inconsistent from run to run; some stuff seems to be logging at INFO, others at WARNING.
Finally found a log with something potentially useful in it: 05:20:19 INFO - -1143539264[7feeae715140]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:274: PeerConnectionObserverDispatch processing mCallState = 26 (SETLOCALDESCSUCCESS), mFsmState = 16 (STABLE) 05:20:19 INFO - (ice/INFO) ICE-PEER(PC:1398946818716414 (id=409 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html):default)/CAND-PAIR(Udjs): triggered check on Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host) 05:20:20 INFO - -1847404800[7fee9a9eb2c0]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed 05:20:20 INFO - -1847404800[7fee9a9eb2c0]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed 05:20:20 INFO - (ice/INFO) ICE-PEER(PC:1398946818716414 (id=409 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html):default)/CAND-PAIR(Udjs): triggered check on Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host) 05:20:20 INFO - (stun/INFO) STUN-CLIENT(Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host)): Too many retransmit attempts 05:20:20 INFO - (stun/INFO) Responding with error 400: ICE Failure It seems that a sufficient network delay can result in a flurry of check requests being delivered to us all at once, causing us to emit triggered checks that cause our retransmit counter to reach its limit, prompting ICE to fail. This is probably wrong.
Depends on: 1006809
I should point out that I'm pretty sure that 1006809 is only one of the reasons for this intermittent failure. I'm seeing evidence for extremely bad latency in these logs, and until this gets better, the webrtc tests are going to be unreliable.
OS's: 35 (77.8%): b2g_emulator_vm 7 (15.6%): Ubuntu VM 12.04 x64 3 (6.7%): Ubuntu VM 12.04 For the b2g emulator I'm totally with you regarding the execution speed. I'm just wondering if the Linux builds are running on overloaded VM's or why else it sometimes happens on Linux.
I have a feeling that there are many root causes for failures in this test. We just need to knock them down one by one.
(In reply to Byron Campen [:bwc] from comment #55) > I have a feeling that there are many root causes for failures in this test. > We just need to knock them down one by one. So, the first thing that sticks out at me is: > STUN-CLIENT(srflx(IP4:10.132.56.142:58803/UDP|stun.services.mozilla.com:3478)): Timed out Yikes! We should not be attempting to reach out to the cloud during mochi tests! Lacking a locally-deployed STUN server (which probably isn't happening soon), I think we might be able to head off these errors by adding "iceServers:[]" to the config_pc1 and config_pc2 objects in pc.js (or take other steps to deactivate the default stun server -- technically, [] should cause the constructor to throw, per spec). It might not clear up all of the problems that are being filed under this bug, but it should make it much easier to see what's going on.
Priority: -- → P2
Target Milestone: --- → mozilla33
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected → Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected
The failure in comment 77 is due to port reuse. I also reproduced the port-reuse with better logging on try.
^
Flags: needinfo?(jduell.mcbugs)
(The reused port in comment 77 is 47841, and in the attachment is 38622)
This patch adds a temporary verification step to our peer connection tests to search for ports which appear twice in SDP offer, answer or in both. Try run: https://tbpl.mozilla.org/?tree=Try&rev=5d19db42b9aa
Attachment #8443857 - Flags: review?(docfaraday)
I don't know enough about SDP to know why/how you could get dupe ports in the offer.
Flags: needinfo?(jduell.mcbugs)
SDP just exposes the underlying problem. The same SDP code works fine on all other platforms. Only with IPC and e10s tests we see port numbers are getting used twice in the SDP. So our current assumption is that the client side (or whatever you call the side which initiates the IPC calls) makes multiple calls for request UDP sockets and sometimes the server side seems to return one port number/socket multiple times. @Jason: we were told you are the expert on IPC code. Does the above assumption feasible to you? Can you please help us how to debug and solve this problem?
Flags: needinfo?(jduell.mcbugs)
Comment on attachment 8443857 [details] [diff] [review] bug_963524_detect_double_ports.patch Review of attachment 8443857 [details] [diff] [review]: ----------------------------------------------------------------- Primarily concerned with the ICE TCP question, but have some other minor concerns. ::: dom/media/tests/mochitest/templates.js @@ +190,5 @@ > [ > + 'PC_REMOTE_CHECK_FOR_DUPLICATED_PORTS', > + // TODO: remove this once bug 963524 is fixed > + function (test) { > + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g; I wonder if it would be sensible to keep this around long term, actually. Also, this might end up having false positives with ICE TCP tuples clashing with UDP tuples (this will land soon with any luck). We may need to check candidates with "tcptype" separately, which probably also means a little refactoring to have a common duplicate-checking function. @@ +194,5 @@ > + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g; > + var offerIps = []; > + var answerIps = []; > + var resArray; > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) { What does this do if _local_offer or sdp are undefined? Similar question for answer stuff. @@ +196,5 @@ > + var answerIps = []; > + var resArray; > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) { > + var tuple = resArray[1] + ":" + resArray[2]; > + if (offerIps.indexOf(tuple) != -1) { Maybe we want !== here? Also in the other != -1 comparisons.
Attachment #8443857 - Flags: review?(docfaraday) → review-
(In reply to Jason Duell (:jduell) from comment #83) > I don't know enough about SDP to know why/how you could get dupe ports in > the offer. This call (when passed a port of 0) is occasionally picking an already-bound port, leading to the duplicates we see later on in the SDP. http://dxr.mozilla.org/mozilla-central/source/media/mtransport/nr_socket_prsock.cpp#1037 This is either a flaw in udp-socket-child, or a flaw in the way we're using it.
Addressed Byron's concerns regarding TCP candidates.
Attachment #8443857 - Attachment is obsolete: true
Attachment #8444872 - Flags: review?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #87) > I wonder if it would be sensible to keep this around long term, actually. That depends on what we think how bad our implementations are (going to be) :-) But sure, does not hurt to have more safety belts. > Also, this might end up having false positives with ICE TCP tuples clashing > with UDP tuples (this will land soon with any luck). We may need to check > candidates with "tcptype" separately, which probably also means a little > refactoring to have a common duplicate-checking function. I added support for TCP candidates in the new patch. > @@ +194,5 @@ > > + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g; > > + var offerIps = []; > > + var answerIps = []; > > + var resArray; > > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) { > > What does this do if _local_offer or sdp are undefined? Similar question for > answer stuff. Well, our tests only proceed from the onSuccess() callbacks from createOffer() and createAnswer(). So the assumption here is that both exist. > @@ +196,5 @@ > > + var answerIps = []; > > + var resArray; > > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) { > > + var tuple = resArray[1] + ":" + resArray[2]; > > + if (offerIps.indexOf(tuple) != -1) { > > Maybe we want !== here? Also in the other != -1 comparisons. indexOf() should always return a number, so == should be safe if I'm not wrong. But I guess it never hurts to use === instead (included in the new patch) :-)
Comment on attachment 8444872 [details] [diff] [review] bug_963524_detect_double_ports.patch Review of attachment 8444872 [details] [diff] [review]: ----------------------------------------------------------------- An r- that is half my fault, sorry about that. ::: dom/media/tests/mochitest/templates.js @@ +197,5 @@ > + var tempArray = []; > + var resultArray = []; > + while ((resArray = re.exec(sdp)) !== null) { > + info("resArray: " + resArray); > + const tuple = resArray[2] + ":" + resArray[3]; Oops, I totally forgot that ICE TCP adds another wrinkle here; active candidates are likely to all use port 9, since it is not important to convey the actual port number (http://tools.ietf.org/html/rfc6544#appendix-C). So maybe we check if resArray[3] === "9" here, and skip the triple, we'll work around this? Sorry I did not remember this earlier. @@ +221,5 @@ > + for (var i=0; i< offerTuples.length; i++) { > + if (answerTuples.indexOf(offerTuples[i]) !== -1) { > + dump("SDP offer: " + test._local_offer.sdp.replace(/[\r]/g, '') + "\n"); > + dump("SDP answer: " + test.pcRemote._last_answer.sdp.replace(/[\r]/g, '') + "\n"); > + ok(false, "This IP:Port " + offerTuples[i] + " appears in SDP offer and answer!"); Won't this fire a false positive if the offer has a UDP tuple that shows up in the answer as a TCP tuple? Why aren't we just using triples everywhere?
Attachment #8444872 - Flags: review?(docfaraday) → review-
This ignores now TCP on port 9 and uses Triples only. Try run: https://tbpl.mozilla.org/?tree=Try&rev=dde5f54257a2
Attachment #8444872 - Attachment is obsolete: true
Attachment #8446189 - Flags: review?(docfaraday)
Comment on attachment 8446189 [details] [diff] [review] bug_963524_detect_double_ports.patch Review of attachment 8446189 [details] [diff] [review]: ----------------------------------------------------------------- Some nits, but it looks reasonable otherwise. Probably would be good to do a mochitest try push using either ipc or e10s, and run over and over until it detects the failure we're looking for. ::: dom/media/tests/mochitest/templates.js @@ +193,5 @@ > + var re = /a=candidate.* (UDP|TCP) [\d]+ ([\d\.]+) ([\d]+) typ host/g; > + > + function _sdpCandidatesIntoArray(sdp) { > + var resArray = []; > + var resultArray = []; The naming here makes it a little difficult to read, maybe rename resArray to something like patternMatchArray? @@ +216,5 @@ > + const offerTriples = _sdpCandidatesIntoArray(test._local_offer.sdp); > + info("Offer ICE host candidates: " + JSON.stringify(offerTriples)); > + > + const answerTriples = _sdpCandidatesIntoArray(test.pcRemote._last_answer.sdp); > + info("Anwer ICE host candidates: " + JSON.stringify(answerTriples)); Nit: typo
Attachment #8446189 - Flags: review?(docfaraday) → review+
Shih-Chiang wrote the UDPSocketChild code, so perhaps he has an idea.
Flags: needinfo?(jduell.mcbugs) → needinfo?(schien)
Honza might also know. This is in reference to comment 88--we're for some reason seeing Child UDP sockets with the same port created when we ought to get a different port.
Flags: needinfo?(honzab.moz)
Addressed Byron's nit's. Carrying forward r+=bwc
Attachment #8446189 - Attachment is obsolete: true
Passing 0 is representing any port while creating UDP socket. For the usage in NrSocketIpc should all go through the case of "network interface specified with arbitrary port", and it'll use UDPSocketParent under this code path: http://dxr.mozilla.org/mozilla-central/source/dom/network/src/UDPSocketParent.cpp#84 . I'm adding some more debug information to get more clue.
Flags: needinfo?(schien)
No new problems created by the patch.
(In reply to Jason Duell (:jduell) from comment #95) > Honza might also know. This is in reference to comment 88--we're for some > reason seeing Child UDP sockets with the same port created when we ought to > get a different port. We by default reuse. Maybe don't reuse for the test? Even the autobind could be confused here, not sure. I'm no TCP/UDP stack expert for linux. Maybe ask Michal?
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #105) > (In reply to Jason Duell (:jduell) from comment #95) > > Honza might also know. This is in reference to comment 88--we're for some > > reason seeing Child UDP sockets with the same port created when we ought to > > get a different port. > > We by default reuse. Maybe don't reuse for the test? Even the autobind > could be confused here, not sure. I'm no TCP/UDP stack expert for linux. > Maybe ask Michal? I think maybe there is still some misunderstanding; the duplicate port is handed out when the first socket bound to it is still in use.
Needinfoing Honza to make sure he sees this: I think maybe there is still some misunderstanding; the duplicate port is handed out when the first socket bound to it is still in use.
Flags: needinfo?(honzab.moz)
(In reply to Randell Jesup [:jesup] from comment #107) > Needinfoing Honza to make sure he sees this: > > I think maybe there is still some misunderstanding; the duplicate port is > handed out when the first socket bound to it is still in use. Don't worry, I see it. That is what reuse does - use the same port. Can you try pass |false| to reuse and we are done here?
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #108) > That is what reuse does - use the same port. Can you try pass |false| to > reuse and we are done here? But the client side has not given up the port yet. 're-use' should only give out ports which have been given up. The client side calls multiple times for random port and allows with re-use to give back ports which were in use before. But instead of giving distinct ports, including ports which were in use before but have been closed by the client, the current implementation gives out the same port number multiple times to the same client.
Sorry for my previous comment. After talking to Byron we both realized that we are confused by your last comment. When you used the word 'reuse' I assume you would refer to the socket option SO_PORTREUSE. But we are actually not aware that is used anywhere. Could you please explain what do you mean by 'reuse'? Maybe with pointers to code in mozilla-central. And what do you mean with "We by default reuse"? We believe that our implementation provides port 0 for all new port requests. And we want and need distinguished port numbers for each of these requests. This works on all platforms as expected, but not when IPC is being used. And we use IPC so far only on B2G. Ideally we would like to avoid having to have special code for IPC and/or B2G.
Maybe I'm not the best to answer this. I suppose the ICE code is based on our UDP socket, NSPR or the xpcom classes. NSPR by default sets the SO_REUSEADDR (not PORT), IIRC. The xpcom class, that is using the NSPR socket internally, doesn't change that. I don't know the code you are asking about at all, so I maybe just need some more background to give answers. It's not clear what process/subprocess is allocating the ports, what protocol is involved etc... Maybe send me a private mail (use the bugzilla mail) and we can talk about it off the bug. I don't want to pollute it.
Target Milestone: mozilla33 → mozilla35
Ok, it looks like we're dealing with a linux kernel bug: http://comments.gmane.org/gmane.linux.network/238870 Seems that this particular usage pattern is weird enough that there is no interest in fixing the bug. I propose we work around by only setting reuseaddr if the port is non-zero.
Attachment #8465022 - Attachment is obsolete: true
Comment on attachment 8465023 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. Review of attachment 8465023 [details] [diff] [review]: ----------------------------------------------------------------- This is the most minimal way of fixing this, but it occurs to me that it might be useful to add a function for getting the port from a NetAddr to DNS.h (alongside other convenience functions like IsIpAddrAny). Let me know if you'd prefer to take that approach.
Attachment #8465023 - Flags: feedback?(mcmanus)
Comment on attachment 8465023 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. Review of attachment 8465023 [details] [diff] [review]: ----------------------------------------------------------------- Thanks byron - The DNS.h change would be nice, but I would also accept the spot fix if you don't want to bother. It hasn't come up before, so it probably isn't that important. please fix the bracing to if () { } else if () { } style.. even when its just code you moved.
Attachment #8465023 - Flags: feedback?(mcmanus) → feedback+
Move some convenience code into a more useful place, and tweak some formatting.
Attachment #8465023 - Attachment is obsolete: true
Comment on attachment 8465512 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. Review of attachment 8465512 [details] [diff] [review]: ----------------------------------------------------------------- Moved the port-getter function to DNS.h/cpp (and converted it to use uint16_t), and fixed up the brace formatting.
Attachment #8465512 - Flags: review?(mcmanus)
Comment on attachment 8465512 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. Review of attachment 8465512 [details] [diff] [review]: ----------------------------------------------------------------- thanks! ::: netwerk/base/src/nsUDPSocket.cpp @@ +664,5 @@ > // no need to enter the lock here > + uint16_t result; > + nsresult ret = net::GetPort(&mAddr, &result); > + *aResult = static_cast<int32_t>(result); > + return ret; s/ret/rv
Attachment #8465512 - Flags: review?(mcmanus) → review+
Attachment #8465512 - Attachment is obsolete: true
Comment on attachment 8466237 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. Review of attachment 8466237 [details] [diff] [review]: ----------------------------------------------------------------- Carry forward r=mcmanus
Attachment #8466237 - Flags: review+
Comment on attachment 8446254 [details] [diff] [review] bug_963524_detect_double_ports.patch We've already checked this in elsewhere.
Attachment #8446254 - Attachment is obsolete: true
Let's see if this fixes all of the failures we're seeing here.
Keywords: checkin-needed
Keywords: leave-open
Target Milestone: mozilla35 → mozilla34
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Please request aurora/beta/esr31/b2g30 approval on this patch to fix this very annoying orange :)
Flags: needinfo?(docfaraday)
Comment on attachment 8466237 [details] [diff] [review] Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port. That's way more aggressive than any uplift I've seen outside of sec bugs, but here goes. Ball's in your court RyanVM. NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): 839757 User impact if declined: Very annoying intermittent failures in all webrtc mochitests, and for webrtc users on B2G. Testing completed: The usual CI stuff, and some hand testing. Risk to taking this patch (and alternatives if risky): None. String or UUID changes made by this patch: None. [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is an intermittent failure that manifests in all webrtc mochitests, and is a drain on the sheriffs' time. User impact if declined: Intermittent webrtc failures when using e10s. Fix Landed on Version: 34 Risk to taking this patch (and alternatives if risky): None. String or UUID changes made by this patch: None. See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info. Approval Request Comment [Feature/regressing bug #]: 839757 [User impact if declined]: Intermittent webrtc failures when using e10s, and continued intermittent failures across the entire webrtc mochitest suite. [Describe test coverage new/current, TBPL]: No new test coverage, existing webrtc mochitests hit this problem pretty frequently. [Risks and why]: None; all this does is disable a address-reuse sockopt when it makes sense to do so. [String/UUID change made/needed]: None.
Attachment #8466237 - Flags: approval-mozilla-esr31?
Attachment #8466237 - Flags: approval-mozilla-beta?
Attachment #8466237 - Flags: approval-mozilla-b2g30?
Attachment #8466237 - Flags: approval-mozilla-aurora?
Flags: needinfo?(docfaraday)
Attachment #8466237 - Flags: approval-mozilla-esr31?
Attachment #8466237 - Flags: approval-mozilla-esr31+
Attachment #8466237 - Flags: approval-mozilla-beta?
Attachment #8466237 - Flags: approval-mozilla-beta+
Attachment #8466237 - Flags: approval-mozilla-aurora?
Attachment #8466237 - Flags: approval-mozilla-aurora+
Attachment #8466237 - Flags: approval-mozilla-b2g30? → approval-mozilla-b2g30+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: