Closed
Bug 1351531
Opened 7 years ago
Closed 7 years ago
Intermittent dom/media/tests/mochitest/test_peerConnection_simulcastAnswer.html | Error in test execution: Error: Timeout for element pcLocal_remote_{5c32567b-e920-4183-9554-7fe01c3b9670} waitForMediaElementFlow@http://mochi.test:8888/tests/dom/media/test
Categories
(Core :: WebRTC, defect, P2)
Core
WebRTC
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: mjf)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(2 files)
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=87131531&repo=autoland https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-win64-pgo/1490740507/autoland_win8_64_test_pgo-mochitest-media-e10s-bm112-tests1-windows-build52.txt.gz
Comment 1•7 years ago
|
||
Seems like we recently started having issues with getting video frames on inbound tracks.
Comment 2•7 years ago
|
||
Byron, could you take a look? I think this is caused by bug 1344556. The first occurrence of this (together with bug 1351590) intermittent was on autoland roughly 2h after bug 1344556 was pushed to autoland.
Blocks: 1344556
Flags: needinfo?(docfaraday)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
|
||
:mreavy, can you help find someone to look into this? It has been a week with no action on this bug, also bug 1351590 seems to be related.
Flags: needinfo?(mreavy)
Updated•7 years ago
|
Whiteboard: [stockwell needswork]
Comment 6•7 years ago
|
||
Thanks, Joel. Nils -- As we discussed on irc, we need an owner for this.
Flags: needinfo?(mreavy)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
Comment 7•7 years ago
|
||
Michael agreed to take a look at this.
Assignee: nobody → mfroman
Flags: needinfo?(drno)
Comment hidden (Intermittent Failures Robot) |
Comment 9•7 years ago
|
||
Michael, do you have an update here, this is a very frequent issue and combined with the failures in bug 1351590, this is one of the most frequent issues in our tests in the last week.
Flags: needinfo?(mfroman)
Assignee | ||
Comment 10•7 years ago
|
||
I've been digging into it for a week. I can reproduce the intermittent locally, but I have not yet figured out the root of the issue. I'm sorry I don't have a fix yet, but I'm still working on it. It is my top priority.
Flags: needinfo?(mfroman)
Assignee | ||
Comment 11•7 years ago
|
||
When this intermittent case fails, it is when the first ssrc received is unknown. You can see this more easily using the command: egrep 'new ssrc|unknown SSRC' log.txt | less
Flags: needinfo?(rjesup)
Comment 12•7 years ago
|
||
I'll look on Tuesday (pto til then). Note we added code to Videoconduit recently to better adapt (n FF53 and later) to unknown/un-signaled SSRCs. This fixed Spark for example.
Comment hidden (Intermittent Failures Robot) |
Comment 14•7 years ago
|
||
So from looking at the log file in attachment 8858356 [details] it looks to me like there is some confusion going on with the SSRCs being used on the two PeerConnection in the test. The test creates a second fake offer on PcLocal here: http://searchfox.org/mozilla-central/source/dom/media/tests/mochitest/test_peerConnection_simulcastAnswer.html#60 which generates the SSRC's 3724330823 an 2816312869. But when it comes to generating the answer it shows exactly the same SSRC's. Not sure why. I guess the offerer theoretically should never use the two SSRC's from the fake offer. I would try to remove that call which generates the second offer and replace it with some code which simply inserts the needed attributes into the SDP offer from hard coded values instead.
Comment 15•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #14) > I would try to remove that call which generates the second offer and replace > it with some code which simply inserts the needed attributes into the SDP > offer from hard coded values instead. Never mind. That second offer call is made on the PcRemote side. So it makes sense that the SSRC's are matching with the one from the answer later on.
Comment 16•7 years ago
|
||
My suspicion is that these lines starting in line 1050: GECKO(15334) | (vie_receiver.cc:321): Packet received on SSRC: 2816312869 with payload type: 120, timestamp: 2460978607, sequence number: 1954, arrival time: 67340983, toffset: 450, abs send time: 3400532 GECKO(15334) | (congestion_controller.cc:104): WrappingBitrateEstimator: Switching to absolute send time RBE. GECKO(15334) | (remote_bitrate_estimator_abs_send_time.cc:119): RemoteBitrateEstimatorAbsSendTime: Instantiating. GECKO(15334) | (vie_channel.cc:1302): OnInitializeDecoder 120 VP8 GECKO(15334) | (call.cc:727): DeliverRtp: found unknown SSRC: 3724330823 might be pointing to the root cause. This looks to me like our filter let pass the initial packets for both SSRC's (2816312869 and 3724330823) through to the decoder. Which makes sense as the answerer in this test case can probably start sending the two RTP streams, before the offerer (and receiver in this case) has received the SSRC's via the signaling. I'm starting to wonder if filtering by RID would be better and safer thing to do here. But that would be a lot of effort just to fix an intermittent.
Comment 17•7 years ago
|
||
Thanks for the comments and analysis :drno! It sounds like there might be something to do here to fix this! :drno, is this something you could work on or could help get someone to look at this? We have pretty high failure rates on this bug and bug 1351590 for the last 3 weeks and I would like to have something in the works this week or temporarily disable (windows && !debug - we would have coverage on windows debug still, osx*, linux*) in case we need another week or two to schedule the work in.
Flags: needinfo?(drno)
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #16) > My suspicion is that these lines starting in line 1050: > > GECKO(15334) | (vie_receiver.cc:321): Packet received on SSRC: 2816312869 > with payload type: 120, timestamp: 2460978607, sequence number: 1954, > arrival time: 67340983, toffset: 450, abs send time: 3400532 > GECKO(15334) | (congestion_controller.cc:104): WrappingBitrateEstimator: > Switching to absolute send time RBE. > GECKO(15334) | (remote_bitrate_estimator_abs_send_time.cc:119): > RemoteBitrateEstimatorAbsSendTime: Instantiating. > GECKO(15334) | (vie_channel.cc:1302): OnInitializeDecoder 120 VP8 > GECKO(15334) | (call.cc:727): DeliverRtp: found unknown SSRC: 3724330823 > > might be pointing to the root cause. This looks to me like our filter let > pass the initial packets for both SSRC's (2816312869 and 3724330823) through > to the decoder. Which makes sense as the answerer in this test case can > probably start sending the two RTP streams, before the offerer (and receiver > in this case) has received the SSRC's via the signaling. > > I'm starting to wonder if filtering by RID would be better and safer thing > to do here. But that would be a lot of effort just to fix an intermittent. If "our filter" refers to the filter that we're setting up just for this simulcast mochitest (PeerConnectionImpl::SelectSsrc), we're not getting that far in the mochitest. If you're referring to some other filtering, you may be on to something.
Comment 19•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #17) > Thanks for the comments and analysis :drno! It sounds like there might be > something to do here to fix this! > > :drno, is this something you could work on or could help get someone to look > at this? We have pretty high failure rates on this bug and bug 1351590 for > the last 3 weeks and I would like to have something in the works this week > or temporarily disable (windows && !debug - we would have coverage on > windows debug still, osx*, linux*) in case we need another week or two to > schedule the work in. I think Michael is on a good track of hunting down the root cause right now (we are chatting about this particular bug for hours now). Unfortunately the logic in this area is pretty complex and there is still a chance that it turns our we are chasing the wrong thing again. As Michael is able to repro the problem locally (unfortunately only with low frequency) I don't have a problem with disabling the tests temporarily until we have fix, if we need to. I suggest if we don't have a fix by Friday morning we will land a temporary test disable on Friday.
Flags: needinfo?(drno)
Comment hidden (mozreview-request) |
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8860132 [details] Bug 1351531 - disable test_peerConnection_simulcastAnswer.html. https://reviewboard.mozilla.org/r/132164/#review134990
Attachment #8860132 -
Flags: review?(drno) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 22•7 years ago
|
||
Pushed by drno@ohlmeier.org: https://hg.mozilla.org/integration/autoland/rev/9bfeb0378415 disable test_peerConnection_simulcastAnswer.html. r=drno
Keywords: checkin-needed
Comment 23•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9bfeb0378415
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 24•7 years ago
|
||
reopening until we get a real fix here
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Keywords: leave-open
Resolution: FIXED → ---
Whiteboard: [stockwell needswork] → [stockwell disabled]
Comment 25•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #24) > reopening until we get a real fix here We decided to develop the real fix over in bug 1358224 (because the fix is not going to trivial and it affects multiple intermittent bugs). So I think we can close this then, alright?
Blocks: 1358224
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Comment 26•7 years ago
|
||
sounds great
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•