Closed Bug 1184319 Opened 9 years ago Closed 8 years ago

Intermittent B2G test_peerConnection_closeDuringIce.html,test_peerConnection_offerRequiresReceiveAudio.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test timed out

Categories

(Core :: WebRTC, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox42 --- affected

People

(Reporter: RyanVM, Assigned: drno)

Details

(Keywords: intermittent-failure)

No description provided.
New intermittent timeout, with fairly high frequency - any ideas what might have been checked in around when this started that might affect this use-case?
backlog: --- → webRTC+
Rank: 19
Flags: needinfo?(jib)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
Priority: -- → P1
Have there been any changes to the CI setup for B2G emulator between the 11th and the 15th?
For the most part, nothing's changed with the B2G emulator setup in years most likely.
jib: the other things that landed were deviceid changes. If there's a deadlock there (or significant expansion in time used), that could cause this. Note: b2g emulator.
So if I read the raw logs correctly then it is actually not a timing issue this time as only 29 seconds pass by between the first log message from the test until the last message. And then it sit and times out several minutes later. But I found this beauty in the log file: -667:38.97 LOG: ProcessReader INFO TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams. -667:45.94 LOG: ProcessReader INFO -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":55288,"cid":1,"active":2,"pdpType":-1, "ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} -667:48.27 LOG: ProcessReader INFO -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":55328,"cid":1,"active":2,"pdpType":-1, "ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} -667:48.36 LOG: ProcessReader INFO JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null -667:49.36 LOG: ProcessReader INFO JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null -667:49.95 LOG: ProcessReader INFO (ice/WARNING) ICE(PC:1437385246595316 (id=4294967972 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnectio n_offerRequiresRec): No STUN servers specified Could it be that activating the network connection on the emulator sometimes fails?
Flags: needinfo?(drno)
(In reply to Randell Jesup [:jesup] from comment #19) > jib: the other things that landed were deviceid changes. If there's a > deadlock there (or significant expansion in time used), that could cause > this. Note: b2g emulator. Nothing I can think of. If you mean Bug 1181428, then the only significant difference is this line: http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaManager.cpp?rev=7a16019b26bf&mark=864-866#842 which does a second often-redundant GetOriginKey round-trip call to the chrome process at the tail of every gUM call in case deviceIds have become persistent, but it is lightweight and fire-and-forget, so it shouldn't hold up anything.
Flags: needinfo?(jib)
(In reply to Nils Ohlmeier [:drno] from comment #20) > Could it be that activating the network connection on the emulator sometimes > fails? Possibly. Hit up the people in #b2g. We had horrible problems with stuff like this early on (very timing-related, and someone else broke it at least once since then)
(In reply to Randell Jesup [:jesup] from comment #22) > (In reply to Nils Ohlmeier [:drno] from comment #20) > > > Could it be that activating the network connection on the emulator sometimes > > fails? > > Possibly. Hit up the people in #b2g. We had horrible problems with stuff > like this early on (very timing-related, and someone else broke it at least > once since then) Nils -- I think your theory is the best lead we have on what might be going on here. I'm assigning this to you to pursue this theory and talk with the b2g automation folks. Don't spend a ton of time on this one. I'd love to get this resolved, but I know intermittents like this can be a horrible time suck with nothing to show for it if the working theory proves wrong. If your theory proves wrong, let me know, and we can reassess the bug (ownership, priority, etc). Thanks.
Assignee: nobody → drno
In the raw log from comment #23: -1800:40.23 LOG: ProcessReader INFO -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":55232,"cid":1,"active":2,"pdpType":-1,"ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} -1800:42.20 LOG: ProcessReader INFO -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":55232,"cid":1,"active":2,"pdpType":-1,"ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} -1800:42.99 LOG: ProcessReader INFO JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null -1800:43.52 LOG: ProcessReader INFO (ice/WARNING) ICE(PC:1437488691940122 (id=4294967972 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresRec): No STUN servers specified -1800:43.53 LOG: ProcessReader INFO (ice/NOTICE) ICE(PC:1437488691940122 (id=4294967972 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresRec): No TURN servers specified -1800:43.99 LOG: ProcessReader INFO (ice/WARNING) ICE(PC:1437488692425202 (id=4294967972 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresRec): No STUN servers specified -1800:44.01 LOG: ProcessReader INFO (ice/NOTICE) ICE(PC:1437488692425202 (id=4294967972 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresRec): No TURN servers specified -1800:44.23 LOG: ProcessReader INFO JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null Two things are different in the failing test case compared to the working tests before: 1) For some unknown reason two onSetupDataCallResult's show up. The content of the onSetupDataCallResult return is identical to the working scenarios. So I find it more concerning that it happens twice. 2) The JS error here https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/DataCallManager.js#1513 appears to be unique to this failure. Looks like the something has called shutdown() on the RILNetworkInterface object (that seems to be only way how dataCall could be null).
Hmm that previous automated comment is actually annoying to a certain degree... I'm running a try build with request for full logs to check if I can repro this through re-triggers: https://treeherder.mozilla.org/#/jobs?repo=try&revision=653432bc8c10
Full log excerpt: 13:59:14 INFO - 5597 INFO TEST-START | dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveAudio.html [...] 13:59:19 INFO - TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams. 13:59:25 INFO - -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":1073846909,"cid":1,"active":2,"pdpType":-1,"ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} 13:59:27 INFO - JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null 13:59:27 INFO - -*- DataCall[0:epc.tmobile.com]: onSetupDataCallResult: {"failCause":0,"suggestedRetryTime":55232,"cid":1,"active":2,"pdpType":-1,"ifname":"rmnet0","addresses":"10.0.2.100/24","dnses":"10.26.75.40 10.26.75.41","gateways":"10.0.2.2"} 13:59:28 INFO - JavaScript error: jar:file:///system/b2g/omni.ja!/components/DataCallManager.js, line 1513: TypeError: this.dataCall is null [...] 13:59:47 INFO - 5747 INFO Got media stream: audio (remote) 14:04:32 INFO - 5748 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveAudio.html | A valid string reason is expected 14:04:32 INFO - 5749 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveAudio.html | Reason cannot be empty 14:04:32 INFO - 5750 INFO Network interface is not ready, required additional network setup 14:04:32 INFO - 5751 INFO Setting up network interface 14:04:32 INFO - 5752 INFO Network interface is ready 14:04:32 INFO - 5753 INFO TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveAudio.html | Test timed out. This confirms that our test is busy only for a couple of seconds and then sits and waits... probably because the network is in fact not up. What is really strange is the logging about the network interface at the end of the test. This is suppose to happen at the beginning of the test. My previous attempt of requesting full logs was after the network initialization code. So I'll try again, this time as early as possible.
Try run with full logging turned on (as early as possible): https://treeherder.mozilla.org/#/jobs?repo=try&revision=3af81364c9c2
So full logging shows that the network setup steps are executed as expected in the case when everything works. What is a little surprising is that we apparently tear down the network at the end of each test case... which makes the beloved test_zmedia_cleanup useless... so I think this is different then it used to be. But I doubt this is a recent change. I retriggered a couple of times so that I hopefully get the error with full logging...
(In reply to Nils Ohlmeier [:drno] from comment #35) > What is a little surprising is that we apparently tear down the network at > the end of each test case... which makes the beloved test_zmedia_cleanup > useless... so I think this is different then it used to be. But I doubt this > is a recent change. Apparently we always have done it this way: start-up and tear-down before and after each test case. Which make the infamous test_zmedia_cleanup.html really questionable -> bug 1188120 > I retriggered a couple of times so that I hopefully get the error with full > logging... Obviously the problem seems to dis-appear if you turn on full logging...
Summary: Intermittent B2G test_peerConnection_closeDuringIce.html,test_peerConnection_offerRequiresReceiveAudio.html | Test timed out → Intermittent B2G test_peerConnection_closeDuringIce.html,test_peerConnection_offerRequiresReceiveAudio.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test timed out
Still happening at low frequency
Rank: 19 → 25
Priority: P1 → P2
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.