Closed
Bug 963524
Opened 11 years ago
Closed 10 years ago
Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla34
Tracking | Status | |
---|---|---|
firefox32 | --- | fixed |
firefox33 | --- | fixed |
firefox34 | --- | fixed |
firefox-esr24 | --- | unaffected |
firefox-esr31 | --- | fixed |
b2g-v1.3 | --- | unaffected |
b2g-v1.3T | --- | unaffected |
b2g-v1.4 | --- | fixed |
b2g-v2.0 | --- | fixed |
b2g-v2.1 | --- | fixed |
People
(Reporter: cbook, Assigned: bwc)
References
()
Details
(Keywords: intermittent-failure)
Attachments
(2 files, 7 obsolete files)
209.75 KB,
application/x-gzip
|
Details | |
3.57 KB,
patch
|
bwc
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
Sylvestre
:
approval-mozilla-esr31+
bajaj
:
approval-mozilla-b2g30+
|
Details | Diff | Splinter Review |
b2g_emulator_vm mozilla-inbound debug test mochitest-debug-6 on 2014-01-24 03:17:58 PST for push 691a1d12372a
slave: tst-linux64-spot-439
https://tbpl.mozilla.org/php/getParsedLog.php?id=33520855&tree=Mozilla-Inbound
Summary
7949 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Summary: Intermittent TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected
Updated•11 years ago
|
Assignee: nobody → rjesup
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 30•11 years ago
|
||
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html | pc_local: ICE failed to switch to connected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 36•11 years ago
|
||
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html | pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html https://tbpl.mozilla.org/php/getParsed| pc_local: ICE failed to switch to connected
Updated•11 years ago
|
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html https://tbpl.mozilla.org/php/getParsed| pc_local: ICE failed to switch to connected → Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 49•11 years ago
|
||
Appears to be some sort of E10S/ICE interaction.
Byron: ping me on IRC for details on how to track this down
Assignee | ||
Comment 50•11 years ago
|
||
So, looking at these logs, I am seeing lots of multi-second gaps where I would not expect:
21:09:12 INFO - 05-02 03:37:03.972 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 6 : fsm_get_fcb_by_call_id_and_type : fcb= 0x451da060
21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] fsmdef.c:3967: SIPCC-FSM: fsmdef_ev_setpeerconnection: Setting peerconnection handle for (1/6) to fb822595c9afe888
21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 1/6, fsm_change_state: DEF: IDLE -> STABLE
21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 6 : DCSM_READY --> DCSM_READY
21:09:12 INFO - 05-02 03:37:03.992 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/6, sm_process_event: DEF :(IDLE:SETPEERCONNECTION )
21:09:12 INFO - 05-02 03:37:09.652 760 760 I PRLog : 1074345096[40222080]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:497: PeerConnectionImpl: PeerConnectionImpl constructor for
21:09:12 INFO - 05-02 03:37:09.672 760 760 I PRLog : 1074345096[40222080]: [main|CC_SIPCCCall] CC_SIPCCCall.cpp:36: Creating CC_SIPCCCall 65543
21:09:12 INFO - 05-02 03:37:09.702 760 831 I PRLog : 55440[4430d900]: [CCAPP Task|def] ccapi.c:1164: SIPCC-CC_API: 1/7, cc_int_feature2: UI -> GSM: SETPEERCONNECTION
21:09:12 INFO - 05-02 03:37:09.712 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 22 :(DCSM_READY:SETPEERCONNECTION )
2
21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 1/7, fsm_change_state: DEF: IDLE -> STABLE
21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 7 : DCSM_READY --> DCSM_READY
21:09:12 INFO - 05-02 03:37:09.762 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/7, sm_process_event: DEF :(IDLE:SETPEERCONNECTION )
21:09:12 INFO - 05-02 03:37:14.313 760 831 I PRLog : 55440[4430d900]: [CCAPP Task|def] ccapi.c:1164: SIPCC-CC_API: 1/6, cc_int_feature2: UI -> GSM: ADDSTREAM
21:09:12 INFO - 05-02 03:37:14.323 760 834 I PRLog : 55632[4430dc00]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 23 :(DCSM_READY:ADDSTREAM )
21:09:12 INFO - 05-02 03:37:14.323 760 834 I PRLog : 55632[4430dc00]: [GSM Task|fsm_sm] sm.c:46: SIPCC-FSM: sm_process_event: DEF 6 : 0x40834f39x: sm entry: (STABLE:ADDSTREAM)
2
21:09:12 INFO - 05-02 03:37:28.073 760 871 I PRLog : 56032[43b22200]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed
21:09:12 INFO - 05-02 03:37:28.812 760 916 I PRLog : 56576[45126b00]: [ProcessThread|WebrtcAudioSessionConduit] AudioConduit.cpp:861: SendRTCPPacket RTCP Packet Send Failed
21:09:12 INFO - 05-02 03:37:29.652 760 871 I PRLog : 56032[43b22200]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed
21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: ../../../../gecko/media/mtransport/transportlayerice.cpp:134: Flow[fb822595c9afe888:1,rtp(none)]; Layer[ice]: state 1->5
21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: Flow[fb822595c9afe888:1,rtp(none)]; Layer[dtls]: Lower lower experienced an error
21:09:12 INFO - 05-02 03:37:32.333 760 765 I PRLog : 41360[42f56080]: ../../../../gecko/media/mtransport/transportlayerdtls.cpp:625: Flow[fb822595c9afe888:1,rtp(none)]; Layer[dtls]: state 1->5
2
Assignee | ||
Comment 51•11 years ago
|
||
I'll also note that the log levels are frustratingly inconsistent from run to run; some stuff seems to be logging at INFO, others at WARNING.
Assignee | ||
Comment 52•11 years ago
|
||
Finally found a log with something potentially useful in it:
05:20:19 INFO - -1143539264[7feeae715140]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:274: PeerConnectionObserverDispatch processing mCallState = 26 (SETLOCALDESCSUCCESS), mFsmState = 16 (STABLE)
05:20:19 INFO - (ice/INFO) ICE-PEER(PC:1398946818716414 (id=409 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html):default)/CAND-PAIR(Udjs): triggered check on Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host)
05:20:20 INFO - -1847404800[7fee9a9eb2c0]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed
05:20:20 INFO - -1847404800[7fee9a9eb2c0]: [MediaStreamGrph|WebrtcAudioSessionConduit] AudioConduit.cpp:709: A/V sync: GetAVStats failed
05:20:20 INFO - (ice/INFO) ICE-PEER(PC:1398946818716414 (id=409 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html):default)/CAND-PAIR(Udjs): triggered check on Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host)
05:20:20 INFO - (stun/INFO) STUN-CLIENT(Udjs|IP4:10.134.59.228:53690/UDP|IP4:10.134.59.228:54025/UDP(host(IP4:10.134.59.228:53690/UDP)|candidate:0 1 UDP 2122252543 10.134.59.228 54025 typ host)): Too many retransmit attempts
05:20:20 INFO - (stun/INFO) Responding with error 400: ICE Failure
It seems that a sufficient network delay can result in a flurry of check requests being delivered to us all at once, causing us to emit triggered checks that cause our retransmit counter to reach its limit, prompting ICE to fail. This is probably wrong.
Assignee | ||
Comment 53•11 years ago
|
||
I should point out that I'm pretty sure that 1006809 is only one of the reasons for this intermittent failure. I'm seeing evidence for extremely bad latency in these logs, and until this gets better, the webrtc tests are going to be unreliable.
Comment 54•11 years ago
|
||
OS's:
35 (77.8%): b2g_emulator_vm
7 (15.6%): Ubuntu VM 12.04 x64
3 (6.7%): Ubuntu VM 12.04
For the b2g emulator I'm totally with you regarding the execution speed.
I'm just wondering if the Linux builds are running on overloaded VM's or why else it sometimes happens on Linux.
Assignee | ||
Comment 55•11 years ago
|
||
I have a feeling that there are many root causes for failures in this test. We just need to knock them down one by one.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 60•11 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #55)
> I have a feeling that there are many root causes for failures in this test.
> We just need to knock them down one by one.
So, the first thing that sticks out at me is:
> STUN-CLIENT(srflx(IP4:10.132.56.142:58803/UDP|stun.services.mozilla.com:3478)): Timed out
Yikes! We should not be attempting to reach out to the cloud during mochi tests! Lacking a locally-deployed STUN server (which probably isn't happening soon), I think we might be able to head off these errors by adding "iceServers:[]" to the config_pc1 and config_pc2 objects in pc.js (or take other steps to deactivate the default stun server -- technically, [] should cause the constructor to throw, per spec).
It might not clear up all of the problems that are being filed under this bug, but it should make it much easier to see what's going on.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Priority: -- → P2
Target Milestone: --- → mozilla33
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicAudio.html,test_peerConnection_basicAudioVideo.html,test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected → Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_offerRequiresReceiveVideoAudio.html | pc_local: ICE failed to switch to connected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 78•10 years ago
|
||
The failure in comment 77 is due to port reuse. I also reproduced the port-reuse with better logging on try.
Assignee | ||
Comment 80•10 years ago
|
||
(The reused port in comment 77 is 47841, and in the attachment is 38622)
Comment 81•10 years ago
|
||
This patch adds a temporary verification step to our peer connection tests to search for ports which appear twice in SDP offer, answer or in both.
Try run: https://tbpl.mozilla.org/?tree=Try&rev=5d19db42b9aa
Attachment #8443857 -
Flags: review?(docfaraday)
Assignee | ||
Comment 82•10 years ago
|
||
Another case, this time with port 54719
https://tbpl.mozilla.org/php/getParsedLog.php?id=42182422&tree=Try&full=1#error0
Comment 83•10 years ago
|
||
I don't know enough about SDP to know why/how you could get dupe ports in the offer.
Flags: needinfo?(jduell.mcbugs)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 85•10 years ago
|
||
SDP just exposes the underlying problem. The same SDP code works fine on all other platforms. Only with IPC and e10s tests we see port numbers are getting used twice in the SDP.
So our current assumption is that the client side (or whatever you call the side which initiates the IPC calls) makes multiple calls for request UDP sockets and sometimes the server side seems to return one port number/socket multiple times.
@Jason: we were told you are the expert on IPC code. Does the above assumption feasible to you? Can you please help us how to debug and solve this problem?
Flags: needinfo?(jduell.mcbugs)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 87•10 years ago
|
||
Comment on attachment 8443857 [details] [diff] [review]
bug_963524_detect_double_ports.patch
Review of attachment 8443857 [details] [diff] [review]:
-----------------------------------------------------------------
Primarily concerned with the ICE TCP question, but have some other minor concerns.
::: dom/media/tests/mochitest/templates.js
@@ +190,5 @@
> [
> + 'PC_REMOTE_CHECK_FOR_DUPLICATED_PORTS',
> + // TODO: remove this once bug 963524 is fixed
> + function (test) {
> + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g;
I wonder if it would be sensible to keep this around long term, actually. Also, this might end up having false positives with ICE TCP tuples clashing with UDP tuples (this will land soon with any luck). We may need to check candidates with "tcptype" separately, which probably also means a little refactoring to have a common duplicate-checking function.
@@ +194,5 @@
> + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g;
> + var offerIps = [];
> + var answerIps = [];
> + var resArray;
> + while ((resArray = re.exec(test._local_offer.sdp)) !== null) {
What does this do if _local_offer or sdp are undefined? Similar question for answer stuff.
@@ +196,5 @@
> + var answerIps = [];
> + var resArray;
> + while ((resArray = re.exec(test._local_offer.sdp)) !== null) {
> + var tuple = resArray[1] + ":" + resArray[2];
> + if (offerIps.indexOf(tuple) != -1) {
Maybe we want !== here? Also in the other != -1 comparisons.
Attachment #8443857 -
Flags: review?(docfaraday) → review-
Assignee | ||
Comment 88•10 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #83)
> I don't know enough about SDP to know why/how you could get dupe ports in
> the offer.
This call (when passed a port of 0) is occasionally picking an already-bound port, leading to the duplicates we see later on in the SDP.
http://dxr.mozilla.org/mozilla-central/source/media/mtransport/nr_socket_prsock.cpp#1037
This is either a flaw in udp-socket-child, or a flaw in the way we're using it.
Comment 89•10 years ago
|
||
Addressed Byron's concerns regarding TCP candidates.
Attachment #8443857 -
Attachment is obsolete: true
Attachment #8444872 -
Flags: review?(docfaraday)
Comment 90•10 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #87)
> I wonder if it would be sensible to keep this around long term, actually.
That depends on what we think how bad our implementations are (going to be) :-)
But sure, does not hurt to have more safety belts.
> Also, this might end up having false positives with ICE TCP tuples clashing
> with UDP tuples (this will land soon with any luck). We may need to check
> candidates with "tcptype" separately, which probably also means a little
> refactoring to have a common duplicate-checking function.
I added support for TCP candidates in the new patch.
> @@ +194,5 @@
> > + var re = /a=candidate.* [\d]+ ([\d\.]+) ([\d]+) typ host/g;
> > + var offerIps = [];
> > + var answerIps = [];
> > + var resArray;
> > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) {
>
> What does this do if _local_offer or sdp are undefined? Similar question for
> answer stuff.
Well, our tests only proceed from the onSuccess() callbacks from createOffer() and createAnswer(). So the assumption here is that both exist.
> @@ +196,5 @@
> > + var answerIps = [];
> > + var resArray;
> > + while ((resArray = re.exec(test._local_offer.sdp)) !== null) {
> > + var tuple = resArray[1] + ":" + resArray[2];
> > + if (offerIps.indexOf(tuple) != -1) {
>
> Maybe we want !== here? Also in the other != -1 comparisons.
indexOf() should always return a number, so == should be safe if I'm not wrong. But I guess it never hurts to use === instead (included in the new patch) :-)
Assignee | ||
Comment 91•10 years ago
|
||
Comment on attachment 8444872 [details] [diff] [review]
bug_963524_detect_double_ports.patch
Review of attachment 8444872 [details] [diff] [review]:
-----------------------------------------------------------------
An r- that is half my fault, sorry about that.
::: dom/media/tests/mochitest/templates.js
@@ +197,5 @@
> + var tempArray = [];
> + var resultArray = [];
> + while ((resArray = re.exec(sdp)) !== null) {
> + info("resArray: " + resArray);
> + const tuple = resArray[2] + ":" + resArray[3];
Oops, I totally forgot that ICE TCP adds another wrinkle here; active candidates are likely to all use port 9, since it is not important to convey the actual port number (http://tools.ietf.org/html/rfc6544#appendix-C). So maybe we check if resArray[3] === "9" here, and skip the triple, we'll work around this?
Sorry I did not remember this earlier.
@@ +221,5 @@
> + for (var i=0; i< offerTuples.length; i++) {
> + if (answerTuples.indexOf(offerTuples[i]) !== -1) {
> + dump("SDP offer: " + test._local_offer.sdp.replace(/[\r]/g, '') + "\n");
> + dump("SDP answer: " + test.pcRemote._last_answer.sdp.replace(/[\r]/g, '') + "\n");
> + ok(false, "This IP:Port " + offerTuples[i] + " appears in SDP offer and answer!");
Won't this fire a false positive if the offer has a UDP tuple that shows up in the answer as a TCP tuple? Why aren't we just using triples everywhere?
Attachment #8444872 -
Flags: review?(docfaraday) → review-
Comment 92•10 years ago
|
||
This ignores now TCP on port 9 and uses Triples only.
Try run: https://tbpl.mozilla.org/?tree=Try&rev=dde5f54257a2
Attachment #8444872 -
Attachment is obsolete: true
Attachment #8446189 -
Flags: review?(docfaraday)
Assignee | ||
Comment 93•10 years ago
|
||
Comment on attachment 8446189 [details] [diff] [review]
bug_963524_detect_double_ports.patch
Review of attachment 8446189 [details] [diff] [review]:
-----------------------------------------------------------------
Some nits, but it looks reasonable otherwise. Probably would be good to do a mochitest try push using either ipc or e10s, and run over and over until it detects the failure we're looking for.
::: dom/media/tests/mochitest/templates.js
@@ +193,5 @@
> + var re = /a=candidate.* (UDP|TCP) [\d]+ ([\d\.]+) ([\d]+) typ host/g;
> +
> + function _sdpCandidatesIntoArray(sdp) {
> + var resArray = [];
> + var resultArray = [];
The naming here makes it a little difficult to read, maybe rename resArray to something like patternMatchArray?
@@ +216,5 @@
> + const offerTriples = _sdpCandidatesIntoArray(test._local_offer.sdp);
> + info("Offer ICE host candidates: " + JSON.stringify(offerTriples));
> +
> + const answerTriples = _sdpCandidatesIntoArray(test.pcRemote._last_answer.sdp);
> + info("Anwer ICE host candidates: " + JSON.stringify(answerTriples));
Nit: typo
Attachment #8446189 -
Flags: review?(docfaraday) → review+
Comment 94•10 years ago
|
||
Shih-Chiang wrote the UDPSocketChild code, so perhaps he has an idea.
Flags: needinfo?(jduell.mcbugs) → needinfo?(schien)
Comment 95•10 years ago
|
||
Honza might also know. This is in reference to comment 88--we're for some reason seeing Child UDP sockets with the same port created when we ought to get a different port.
Flags: needinfo?(honzab.moz)
Comment 96•10 years ago
|
||
Addressed Byron's nit's.
Carrying forward r+=bwc
Attachment #8446189 -
Attachment is obsolete: true
Comment 97•10 years ago
|
||
Passing 0 is representing any port while creating UDP socket. For the usage in NrSocketIpc should all go through the case of "network interface specified with arbitrary port", and it'll use UDPSocketParent under this code path: http://dxr.mozilla.org/mozilla-central/source/dom/network/src/UDPSocketParent.cpp#84 .
I'm adding some more debug information to get more clue.
Flags: needinfo?(schien)
Comment 99•10 years ago
|
||
Keywords: checkin-needed
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 105•10 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #95)
> Honza might also know. This is in reference to comment 88--we're for some
> reason seeing Child UDP sockets with the same port created when we ought to
> get a different port.
We by default reuse. Maybe don't reuse for the test? Even the autobind could be confused here, not sure. I'm no TCP/UDP stack expert for linux. Maybe ask Michal?
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 106•10 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #105)
> (In reply to Jason Duell (:jduell) from comment #95)
> > Honza might also know. This is in reference to comment 88--we're for some
> > reason seeing Child UDP sockets with the same port created when we ought to
> > get a different port.
>
> We by default reuse. Maybe don't reuse for the test? Even the autobind
> could be confused here, not sure. I'm no TCP/UDP stack expert for linux.
> Maybe ask Michal?
I think maybe there is still some misunderstanding; the duplicate port is handed out when the first socket bound to it is still in use.
Comment 107•10 years ago
|
||
Needinfoing Honza to make sure he sees this:
I think maybe there is still some misunderstanding; the duplicate port is handed out when the first socket bound to it is still in use.
Flags: needinfo?(honzab.moz)
Comment 108•10 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #107)
> Needinfoing Honza to make sure he sees this:
>
> I think maybe there is still some misunderstanding; the duplicate port is
> handed out when the first socket bound to it is still in use.
Don't worry, I see it.
That is what reuse does - use the same port. Can you try pass |false| to reuse and we are done here?
Flags: needinfo?(honzab.moz)
Comment 109•10 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #108)
> That is what reuse does - use the same port. Can you try pass |false| to
> reuse and we are done here?
But the client side has not given up the port yet. 're-use' should only give out ports which have been given up.
The client side calls multiple times for random port and allows with re-use to give back ports which were in use before. But instead of giving distinct ports, including ports which were in use before but have been closed by the client, the current implementation gives out the same port number multiple times to the same client.
Comment 110•10 years ago
|
||
Sorry for my previous comment.
After talking to Byron we both realized that we are confused by your last comment. When you used the word 'reuse' I assume you would refer to the socket option SO_PORTREUSE. But we are actually not aware that is used anywhere.
Could you please explain what do you mean by 'reuse'? Maybe with pointers to code in mozilla-central. And what do you mean with "We by default reuse"?
We believe that our implementation provides port 0 for all new port requests. And we want and need distinguished port numbers for each of these requests. This works on all platforms as expected, but not when IPC is being used. And we use IPC so far only on B2G. Ideally we would like to avoid having to have special code for IPC and/or B2G.
Comment 111•10 years ago
|
||
Maybe I'm not the best to answer this.
I suppose the ICE code is based on our UDP socket, NSPR or the xpcom classes. NSPR by default sets the SO_REUSEADDR (not PORT), IIRC. The xpcom class, that is using the NSPR socket internally, doesn't change that.
I don't know the code you are asking about at all, so I maybe just need some more background to give answers. It's not clear what process/subprocess is allocating the ports, what protocol is involved etc... Maybe send me a private mail (use the bugzilla mail) and we can talk about it off the bug. I don't want to pollute it.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
Target Milestone: mozilla33 → mozilla35
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 118•10 years ago
|
||
Ok, it looks like we're dealing with a linux kernel bug:
http://comments.gmane.org/gmane.linux.network/238870
Seems that this particular usage pattern is weird enough that there is no interest in fixing the bug. I propose we work around by only setting reuseaddr if the port is non-zero.
Assignee | ||
Comment 119•10 years ago
|
||
A fix.
Assignee | ||
Comment 120•10 years ago
|
||
Remove a misplaced comment.
Assignee | ||
Updated•10 years ago
|
Attachment #8465022 -
Attachment is obsolete: true
Assignee | ||
Comment 121•10 years ago
|
||
Comment on attachment 8465023 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
Review of attachment 8465023 [details] [diff] [review]:
-----------------------------------------------------------------
This is the most minimal way of fixing this, but it occurs to me that it might be useful to add a function for getting the port from a NetAddr to DNS.h (alongside other convenience functions like IsIpAddrAny). Let me know if you'd prefer to take that approach.
Attachment #8465023 -
Flags: feedback?(mcmanus)
Comment 122•10 years ago
|
||
Comment on attachment 8465023 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
Review of attachment 8465023 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks byron - The DNS.h change would be nice, but I would also accept the spot fix if you don't want to bother. It hasn't come up before, so it probably isn't that important.
please fix the bracing to
if () {
} else if () {
}
style.. even when its just code you moved.
Attachment #8465023 -
Flags: feedback?(mcmanus) → feedback+
Assignee | ||
Comment 123•10 years ago
|
||
Move some convenience code into a more useful place, and tweak some formatting.
Assignee | ||
Updated•10 years ago
|
Attachment #8465023 -
Attachment is obsolete: true
Assignee | ||
Comment 124•10 years ago
|
||
Comment on attachment 8465512 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
Review of attachment 8465512 [details] [diff] [review]:
-----------------------------------------------------------------
Moved the port-getter function to DNS.h/cpp (and converted it to use uint16_t), and fixed up the brace formatting.
Attachment #8465512 -
Flags: review?(mcmanus)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 126•10 years ago
|
||
Comment on attachment 8465512 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
Review of attachment 8465512 [details] [diff] [review]:
-----------------------------------------------------------------
thanks!
::: netwerk/base/src/nsUDPSocket.cpp
@@ +664,5 @@
> // no need to enter the lock here
> + uint16_t result;
> + nsresult ret = net::GetPort(&mAddr, &result);
> + *aResult = static_cast<int32_t>(result);
> + return ret;
s/ret/rv
Attachment #8465512 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 127•10 years ago
|
||
Fix nit.
Assignee | ||
Updated•10 years ago
|
Attachment #8465512 -
Attachment is obsolete: true
Assignee | ||
Comment 128•10 years ago
|
||
Comment on attachment 8466237 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
Review of attachment 8466237 [details] [diff] [review]:
-----------------------------------------------------------------
Carry forward r=mcmanus
Attachment #8466237 -
Flags: review+
Assignee | ||
Comment 129•10 years ago
|
||
Comment on attachment 8446254 [details] [diff] [review]
bug_963524_detect_double_ports.patch
We've already checked this in elsewhere.
Attachment #8446254 -
Attachment is obsolete: true
Assignee | ||
Comment 130•10 years ago
|
||
Let's see if this fixes all of the failures we're seeing here.
Keywords: checkin-needed
Updated•10 years ago
|
Keywords: leave-open
Target Milestone: mozilla35 → mozilla34
Comment 131•10 years ago
|
||
Flags: in-testsuite+
Keywords: checkin-needed
Comment 132•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
status-b2g-v1.3:
--- → unaffected
status-b2g-v1.3T:
--- → unaffected
status-b2g-v1.4:
--- → affected
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → fixed
status-firefox32:
--- → affected
status-firefox33:
--- → affected
status-firefox34:
--- → fixed
status-firefox-esr24:
--- → unaffected
status-firefox-esr31:
--- → affected
Comment 133•10 years ago
|
||
Please request aurora/beta/esr31/b2g30 approval on this patch to fix this very annoying orange :)
Flags: needinfo?(docfaraday)
Assignee | ||
Comment 134•10 years ago
|
||
Comment on attachment 8466237 [details] [diff] [review]
Avoid setting SO_REUSEADDR when binding a UDP socket to port 0, since the linux kernel might select an already open port.
That's way more aggressive than any uplift I've seen outside of sec bugs, but here goes. Ball's in your court RyanVM.
NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #):
839757
User impact if declined:
Very annoying intermittent failures in all webrtc mochitests, and for webrtc users on B2G.
Testing completed:
The usual CI stuff, and some hand testing.
Risk to taking this patch (and alternatives if risky):
None.
String or UUID changes made by this patch:
None.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is an intermittent failure that manifests in all webrtc mochitests, and is a drain on the sheriffs' time.
User impact if declined:
Intermittent webrtc failures when using e10s.
Fix Landed on Version:
34
Risk to taking this patch (and alternatives if risky):
None.
String or UUID changes made by this patch:
None.
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Approval Request Comment
[Feature/regressing bug #]:
839757
[User impact if declined]:
Intermittent webrtc failures when using e10s, and continued intermittent failures across the entire webrtc mochitest suite.
[Describe test coverage new/current, TBPL]:
No new test coverage, existing webrtc mochitests hit this problem pretty frequently.
[Risks and why]:
None; all this does is disable a address-reuse sockopt when it makes sense to do so.
[String/UUID change made/needed]:
None.
Attachment #8466237 -
Flags: approval-mozilla-esr31?
Attachment #8466237 -
Flags: approval-mozilla-beta?
Attachment #8466237 -
Flags: approval-mozilla-b2g30?
Attachment #8466237 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(docfaraday)
Updated•10 years ago
|
Attachment #8466237 -
Flags: approval-mozilla-esr31?
Attachment #8466237 -
Flags: approval-mozilla-esr31+
Attachment #8466237 -
Flags: approval-mozilla-beta?
Attachment #8466237 -
Flags: approval-mozilla-beta+
Attachment #8466237 -
Flags: approval-mozilla-aurora?
Attachment #8466237 -
Flags: approval-mozilla-aurora+
Comment 135•10 years ago
|
||
Comment 136•10 years ago
|
||
Updated•10 years ago
|
Attachment #8466237 -
Flags: approval-mozilla-b2g30? → approval-mozilla-b2g30+
Comment 137•10 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•