Closed Bug 1472321 Opened 2 years ago Closed 2 years ago

Transceivers that are created recvonly/inactive don't have a peer identity when they have a send track

Categories

(Core :: WebRTC: Signaling, defect, P1)

61 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- wontfix
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: damencho, Assigned: bwc)

References

Details

(Keywords: regression, Whiteboard: [need info bwc 2018-07-06])

Attachments

(2 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36

Steps to reproduce:

We got reports that when using meet.jit.si and Firefox if you start muted you cannot unmute, we are currently working on fixing the video part, but we found a problem with the audio.

The happy path:
- addStream(audio) to the peerconnection
- setLocalDescription (audio stream is sendrecv) (happy-SLD.txt attachment)
In this case, audio flows without a problem.

The problem scenario:
- setLocalDescription (audio stream is recvonly) (unhappy-SLD-1.txt attachment)
- after some time addStream(audio) to the peerconnection 
- setLocalDescription (audio stream is sendrecv) (unhappy-SLD-2.txt attachment)

We tested that this does not occur with Firefox 59 and 60, but we reproduce it with 61 and 62Nightly.
If you need we can set up an environment where you can test this against jitsi-meet deployment. There are few signaling fixes that still are not deployed on meet.jit.si.



Actual results:

The audio start flowing, but the rtp extension for audio levels has a value indicating audio is muted(0xff). Nothing is heard on the remote side, where audio is received and decoded.


Expected results:

Should be sending the audio from the device.
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core
Byron, can you help me triage this one?
Component: WebRTC: Audio/Video → WebRTC: Signaling
Flags: needinfo?(docfaraday)
Whiteboard: [need info bwc 2018-07-06]
(In reply to Damian Minkov from comment #0)
> Created attachment 8988921 [details]
> Archive with the values passed to setLocalDescription in the described
> scenarios
> 
> User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36
> 
> Steps to reproduce:
> 
> We got reports that when using meet.jit.si and Firefox if you start muted
> you cannot unmute, we are currently working on fixing the video part, but we
> found a problem with the audio.
> 
> The happy path:
> - addStream(audio) to the peerconnection
> - setLocalDescription (audio stream is sendrecv) (happy-SLD.txt attachment)
> In this case, audio flows without a problem.
> 
> The problem scenario:
> - setLocalDescription (audio stream is recvonly) (unhappy-SLD-1.txt
> attachment)
> - after some time addStream(audio) to the peerconnection 
> - setLocalDescription (audio stream is sendrecv) (unhappy-SLD-2.txt
> attachment)
> 
> We tested that this does not occur with Firefox 59 and 60, but we reproduce
> it with 61 and 62Nightly.
> If you need we can set up an environment where you can test this against
> jitsi-meet deployment. There are few signaling fixes that still are not
> deployed on meet.jit.si.
> 
> 
> 
> Actual results:
> 
> The audio start flowing, but the rtp extension for audio levels has a value
> indicating audio is muted(0xff). Nothing is heard on the remote side, where
> audio is received and decoded.
> 
> 
> Expected results:
> 
> Should be sending the audio from the device.

I met the same issue. And a quick fix is to add a fake audio stream to the peerconnection, and then call replaceTrack to replace the fake one with a real one.

Hope it could help.

(In reply to Damian Minkov from comment #0)
> Created attachment 8988921 [details]
> Archive with the values passed to setLocalDescription in the described
> scenarios
> 
> User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36
> 
> Steps to reproduce:
> 
> We got reports that when using meet.jit.si and Firefox if you start muted
> you cannot unmute, we are currently working on fixing the video part, but we
> found a problem with the audio.
> 
> The happy path:
> - addStream(audio) to the peerconnection
> - setLocalDescription (audio stream is sendrecv) (happy-SLD.txt attachment)
> In this case, audio flows without a problem.
> 
> The problem scenario:
> - setLocalDescription (audio stream is recvonly) (unhappy-SLD-1.txt
> attachment)
> - after some time addStream(audio) to the peerconnection 
> - setLocalDescription (audio stream is sendrecv) (unhappy-SLD-2.txt
> attachment)
> 
> We tested that this does not occur with Firefox 59 and 60, but we reproduce
> it with 61 and 62Nightly.
> If you need we can set up an environment where you can test this against
> jitsi-meet deployment. There are few signaling fixes that still are not
> deployed on meet.jit.si.
> 
> 
> 
> Actual results:
> 
> The audio start flowing, but the rtp extension for audio levels has a value
> indicating audio is muted(0xff). Nothing is heard on the remote side, where
> audio is received and decoded.
> 
> 
> Expected results:
> 
> Should be sending the audio from the device.

I met the same issue. And a quick fix is to add a fake audio stream to the peerconnection, and then call replaceTrack to replace the fake one with a real one.

Hope it could help.
Perhaps webrtc.org is not doing the right thing when we configure the audio level RTP extension on the second negotiation?
(In reply to Byron Campen [:bwc] (PTO 7/3-7/11) from comment #3)
> Perhaps webrtc.org is not doing the right thing when we configure the audio
> level RTP extension on the second negotiation?

I guess we keep using the same RTP transport even if we restart sender/receivers. Maybe we mute anything as part of a renegotiation and the RTP transport remembers the muted state?

Mozregression might also help to identify what has changed on our end.
Status: UNCONFIRMED → NEW
Rank: 15
Ever confirmed: true
Priority: -- → P2
Can I get an environment where I can reproduce this? Alternately, could you use mozregression to pinpoint where things broke?
Flags: needinfo?(damencho)
To reproduce it:
You can open https://meet.jit.si/someroom in another tab/browser and then in the tab where you will reproduce the issue open: 
https://meet.jit.si/someroom#config.startWithAudioMuted=true this will create the receive-only audio stream. Then audio unmuting from the toolbar on the bottom will add the stream and setLocalDescription with sendrecv.

If you want to disable and video, to speed up testing you can use the following two links:
https://meet.jit.si/someroom#config.startWithVideoMuted=true
https://meet.jit.si/someroom#config.startWithAudioMuted=true&config.startWithVideoMuted=true
Flags: needinfo?(damencho)
I also used and mozregression, and this was the result:

13:17.63 INFO: Narrowed inbound regression window from [a91bff97, 0665d23d] (6 builds) to [ccc6b901, 0665d23d] (2 builds) (~1 steps left)
13:17.63 INFO: No more inbound revisions, bisection finished.
13:17.63 INFO: Last good revision: ccc6b9010664de1531bfa50d7080399d4a162fbb
13:17.63 INFO: First bad revision: 0665d23d7e8788a07048f1454ea4457c0197b9d0
13:17.63 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ccc6b9010664de1531bfa50d7080399d4a162fbb&tochange=0665d23d7e8788a07048f1454ea4457c0197b9d0
Looks like bug 1445860 is preventing the peer identity from being set on a transceiver that has a track, but is not transmitting at the moment.
Summary: no outgoing audio(muted) after renegotiation → Transceivers that are created recvonly/inactive don't have a peer identity when they have a send track
Assignee: nobody → docfaraday
Blocks: 1445860
Flags: needinfo?(docfaraday)
I'm guessing the culprit is that this line https://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp#1509 no longer gets called after landing of bug 1445860.

I guess we should invert the logic there and enable the track by default and have MediaPipelineTransmit::UpdateSinkIdentity_m() only disabled it if needed by PeerIdentity.
Rank: 15 → 9
Priority: P2 → P1
What we probably need to do is call UpdateSinkIdentity_m on all non-stopped transceivers, but have UpdateSinkIdentity_m check if mDomTrack is set.
Comment on attachment 8992652 [details]
Bug 1472321: Make sure sink identity is set on recvonly/inactive transceivers.

https://reviewboard.mozilla.org/r/257518/#review264428
Attachment #8992652 - Flags: review+
Comment on attachment 8992652 [details]
Bug 1472321: Make sure sink identity is set on recvonly/inactive transceivers.

https://reviewboard.mozilla.org/r/257518/#review264474

Looks good to me.
Attachment #8992652 - Flags: review?(mfroman) → review+
I tried to reproduce with the patch, and audio seemed to work. Can you verify?
Flags: needinfo?(damencho)
Is there a build I can test? Or is there an easy way to run with that patch?
I had tested several times with the Mac binary and it works. 
But on few of the tests, I got crash when unmuting, one of the reports https://crash-stats.mozilla.com/report/index/1baf8be6-c813-4358-9aa5-be4010180718#tab-details, not sure whether the crash is related to the current bug.
Flags: needinfo?(damencho)
I am trying to reproduce this crash, but I cannot. However, I don't get audio when I unmute anymore. Did something change on your end?
Flags: needinfo?(damencho)
I just re-tested it with the build that Nils sent above and it works(the audio is heard), then tested again with latest Firefox and I still reproduce the issue. This time I didn't have any crash, around 10 attempts.
The scenario I'm testing:
1. Open in chrome https://meet.jit.si/damencho, audio mute myself in order to be sure the audio I will hear is coming from Firefox
2. Open in Firefox https://meet.jit.si/damencho#config.startWithAudioMuted=true and then umute and I hear myself.
Flags: needinfo?(damencho)
I'm really wary of fixing this if it just leads (sometimes) to a crash elsewhere. I'll keep trying to reproduce.
I just got the following crash after rebasing the changeset:

Assertion failure: aStream->GraphImpl()->IterationEnd() > mAllocations[i].mLastCallbackAppendTime, at /Users/bcampen/checkouts/mozilla-central/dom/media/webrtc/MediaEngineWebRTCAudio.cpp:840
#01: mozilla::MediaDevice::Pull(RefPtr<mozilla::SourceMediaStream> const&, int, long long, nsMainThreadPtrHandle<nsIPrincipal> const&)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a0a4dd]
#02: mozilla::SourceListener::NotifyPull(mozilla::MediaStreamGraph*, long long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a19db4]
#03: mozilla::SourceMediaStream::PullNewData(long long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a7fae0]
#04: mozilla::MediaStreamGraphImpl::UpdateGraph(long long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a7f144]
#05: mozilla::MediaStreamGraphImpl::OneIteration(long long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a80ab3]
#06: mozilla::AudioCallbackDriver::DataCallback(float const*, float*, long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2976478]
#07: passthrough_resampler<float>::fill(void*, long*, void*, long)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3d520d8]
#08: audiounit_output_callback(void*, unsigned int*, AudioTimeStamp const*, unsigned int, unsigned int, AudioBufferList*)[/Users/bcampen/checkouts/mozilla-central/obj-x86_64-apple-darwin17.6.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3d4abb7]
#09: AUConverterBase::RenderBus(unsigned int&, AudioTimeStamp const&, unsigned int, unsigned int)[/System/Library/Components/CoreAudio.component/Contents/MacOS/CoreAudio +0x3560]
#10: AUBase::DoRenderBus(unsigned int&, AudioTimeStamp const&, unsigned int, AUOutputElement*, unsigned int, AudioBufferList&)[/System/Library/Components/CoreAudio.component/Contents/MacOS/CoreAudio +0x12dca4]
#11: AUBase::DoRender(unsigned int&, AudioTimeStamp const&, unsigned int, unsigned int, AudioBufferList&)[/System/Library/Components/CoreAudio.component/Contents/MacOS/CoreAudio +0x12d37a]
#12: AUHAL::AUIOProc(unsigned int, AudioTimeStamp const*, AudioBufferList const*, AudioTimeStamp const*, AudioBufferList*, AudioTimeStamp const*, void*)[/System/Library/Components/CoreAudio.component/Contents/MacOS/CoreAudio +0x6d44]
#13: HALC_ProxyIOContext::IOWorkLoop()[/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio +0x3b422]
#14: HALC_ProxyIOContext::IOThreadEntry(void*)[/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio +0x3a2dc]
#15: HALB_IOThread::Entry(void*)[/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio +0x3a01e]
#16: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3661]
#17: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x350d]
Seems to happen reliably, after granting gUM audio permission, when loading https://meet.jit.si/someroom#config.startWithAudioMuted=true in a single tab (and nobody else in the room).
Happens without the changeset, too. Tracking down the problem with mozregression...
Hmm. I got nonsense out of mozregression, and the last several tests did not crash.
Ok, this took a while because mozregression was inappropriately expanding its set of revisions, but things seem to have broken on this push:

https://hg.mozilla.org/mozilla-central/log?rev=d6a5e8aea651

Lots of media-related bugs landed there, continuing to search.
Ok, after rebasing to before the offending changeset, I'm getting the following crash:

#
# Fatal error in /Users/bcampen/checkouts/mozilla-central/media/webrtc/trunk/webrtc/call/call.cc, line 654
# last system error: 0
# Check failed: video_receive_ssrcs_.find(config.rtp.remote_ssrc) == video_receive_ssrcs_.end()
# 
#
Redirecting call to abort() to mozalloc_abort

Here's that line:

https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/call/call.cc#654


It looks like something in jitsi is causing us to create a receive stream for a remote ssrc that we already have a receive stream for. Looking closer.
FWIW, this probably has nothing to do with the changeset on this bug, since it is video, but maybe a similar bug exists for audio?
Ok, so what appears to be happening is we receive two RTP packets with the same SSRC that end up being routed to different MediaPipelines. This seems to be happening because the first RTP packet with that SSRC arrives when the negotiated session has only one video m-section; our bundle filter code lets it through to the only video MediaPipeline, which then learns the ssrc and configures it on the VideoConduit. Then, we get a SRD (offer) with a new video m-section, which is where that SSRC _actually_ belongs. We do not crash yet though, because we do not set that SSRC on a VideoConduit until negotiation completes. We also do not update the bundle filter yet. But we do create a new MediaPipeline, which also gets the RTP packets, which then sets the SSRC on the new VideoConduit, which makes webrtc.org assert.

Ultimately, even if we were updating things like filters and SSRCs more aggressively (which we should probably be doing), we're still screwed if an endpoint negotiates a session with a single video stream, then starts transmitting a new stream (with a new SSRC) that has nothing to do with the pre-existing session, and then finally negotiates that new stream on a new m-section. I'll have to think about how to mitigate this problem.
Ok, so the best solution for handing these unsignaled (or not-yet signaled) SSRCs is still the RTP MID extension, and it looks like webrtc.org might just have some support for it. We should probably try it out.

Absent RTP MID (which the middlebox would need support for, of course), we would need to ensure that every time we set a receive ssrc on a VideoConduit, either the VideoConduit ensures that it is unset on the entire call object first, or MediaPipeline ensures that it is unset on all the VideoConduits. I know the latter is not going to be easy, not sure about the former. Since it is webrtc.org being picky, I am inclined to put this handling in VideoConduit if possible.
Really though, endpoints should not be sending an RTP stream until they have seen a remote description...
Yeah, even if I use a build with asserts disabled, I still sometimes get a crash here:

https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/call/call.cc#692

I'm going to make sure this patch is not the cause, but I bet it is caused by the same thing I described in comment 28.
Yeah, this happens regardless of whether the patch here is applied.
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a36535761268
Make sure sink identity is set on recvonly/inactive transceivers. r=drno,mjf
https://hg.mozilla.org/mozilla-central/rev/a36535761268
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Duplicate of this bug: 1475884
Duplicate of this bug: 1480277
Hello Byron!!
I miss a unit test in https://reviewboard.mozilla.org/r/257518/diff in order to avoid another regression in the future, which i worry a lot.

Do you think a unit test could be added?

Thanks!!
Ditto on the test question. Also, please nominate this for Beta approval if you feel it's suitable for uplift.
Flags: needinfo?(docfaraday)
Flags: in-testsuite?
Comment on attachment 8992652 [details]
Bug 1472321: Make sure sink identity is set on recvonly/inactive transceivers.

Approval Request Comment
[Feature/Bug causing the regression]:

   Bug 1445860

[User impact if declined]:

   Some webrtc unmute use-cases will not work properly.

[Is this code covered by automated tests?]:

   No, this fix is not tested presently.

[Has the fix been verified in Nightly?]:

   Yes.

[Needs manual test from QE? If yes, steps to reproduce]: 

   No.

[List of other uplifts needed for the feature/fix]:

   None.

[Is the change risky?]:

   Not very.

[Why is the change risky/not risky?]:

   The primary risk here is that making this corner case work might cause some other part of the code to be exercised in a new way, but it seems to work fine.

[String changes made/needed]:

   None.
Flags: needinfo?(docfaraday)
Attachment #8992652 - Flags: approval-mozilla-beta?
Comment on attachment 8992652 [details]
Bug 1472321: Make sure sink identity is set on recvonly/inactive transceivers.

Fix for a WebRTC issue and a crash, sounds a little risky let's uplift. 
Byron can you file a followup bug for adding tests if you intend to work on  that?
Flags: needinfo?(docfaraday)
Attachment #8992652 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.