Firefox crashes when using WebRTC and renegotiating with a new offer that has less RTP header extensions than the previous one.

RESOLVED FIXED in Firefox 54

Status

()

P1
normal
Rank:
19
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: saeed, Assigned: drno)

Tracking

({crash})

53 Branch
mozilla55
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 wontfix, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: regression)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36

Steps to reproduce:

I was not able to come up with a clean isolated repro of this that I can share yet, but I will try to later. We noticed Firefox would crash when another participant joined the WebRTC call and did not support the 'toffset' RTP header extension.

In such a case we renegotiate the existing peer connection by setting a new offer SDP as the remote description.  The original remote offer SDP would have both 'abs-send-time' (with id=1) and 'toffset' (with id=2).  The new remote offer we use after a participant joins a call would only contain 'abs-send-time' (with id=1) and cause Firefox to fail at the below assertion and crash.

We tracked this down to a possible issue with WebrtcVideoConduit::AddLocalRTPExtensions(), but not entirely sure.

https://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/media-conduit/VideoConduit.cpp#260

It seems that extList will always continue to grow, and the values become corrupt since once remove_if() is called, the new extensions are added to the old pre-remove_if end of the list.


Actual results:

The following assertion failure causes Firefox to crash:

#
# Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_receive_stream.cc, line 213
# Check failed: 0 == vie_channel_->SetReceiveAbsoluteSendTimeStatus(true, id) (0 vs. -1)
# 
#
Component: Untriaged → WebRTC: Networking
Keywords: crash
Product: Firefox → Core
(Assignee)

Updated

2 years ago
Status: UNCONFIRMED → NEW
Rank: 19
Ever confirmed: true
Priority: -- → P1

Comment 1

2 years ago
FYI we believe the issue is the call to std::remove_if not being passed to std::erase, which leaves the last few elements in extList in an undefined state.  If this happens to leave multiple entries with the same ID (from the following insert call), then the lower levels abort later when the mapping of ID to extension clobbers itself.
(Assignee)

Comment 2

2 years ago
Thanks for tracking this so far down already.

Indeed the missing erase() looks like trouble. I'm working on test to verify the theory.
Assignee: nobody → drno
(Assignee)

Comment 4

2 years ago
Looks like the test was not crashing on any OS.

Can you open about:crashes on a Firefox instance which has crashed like described and put the crash link in here?
Flags: needinfo?(saeed)
Comment hidden (mozreview-request)
(Reporter)

Comment 6

2 years ago
Thanks for taking a look Nils.

https://crash-stats.mozilla.com/report/index/4ff3cf1f-2914-4b29-be15-bfeef0170511
Flags: needinfo?(saeed)
(Reporter)

Comment 7

2 years ago
This is somewhat strange, because while the crash log I just sent you seems to talk about failing on an assert on line 279 of video_receive_stream.cc, message printed to my stdout/err seems to talk about a different assert from a few lines above:

#
# Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_receive_stream.cc, line 213
# Check failed: 0 == vie_channel_->SetReceiveAbsoluteSendTimeStatus(true, id) (0 vs. -1)
# 
#
To be perfectly honest, I'm not sure why we have an _Add_LocalRTPExtensions function. Why is it not _Set_LocalRTPExtensions? If an RTP extension is negotiated off, we ought to remove it, right?

Comment 9

2 years ago
mozreview-review
Comment on attachment 8866647 [details]
Bug 1363563: remove and erase existing header extensions.

https://reviewboard.mozilla.org/r/138252/#review141630

Looks good to me.
Attachment #8866647 - Flags: review?(mfroman) → review+
(Assignee)

Comment 10

2 years ago
(In reply to Byron Campen [:bwc] from comment #8)
> To be perfectly honest, I'm not sure why we have an _Add_LocalRTPExtensions
> function. Why is it not _Set_LocalRTPExtensions? If an RTP extension is
> negotiated off, we ought to remove it, right?

Indeed. I also don't understand why the code tries to remove extension, just to add it back in immediately - as if removing it from the vector would result in some other actions, but it does not as far as I can tell.

My only concern with a _Set_ instead would be, that if we start re-using a config, we might actually leave extensions turned on. But the current code can't handle that either...
(Assignee)

Comment 11

2 years ago
(In reply to Saeed Jahed from comment #7)
> This is somewhat strange, because while the crash log I just sent you seems
> to talk about failing on an assert on line 279 of video_receive_stream.cc,
> message printed to my stdout/err seems to talk about a different assert from
> a few lines above:
> 
> #
> # Fatal error in
> /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/
> video_receive_stream.cc, line 213
> # Check failed: 0 == vie_channel_->SetReceiveAbsoluteSendTimeStatus(true,
> id) (0 vs. -1)
> # 
> #

That is indeed strange.

As I had no luck in trying to repro hitting that assert with a mochitest, can you download a build from here https://treeherder.mozilla.org/#/jobs?repo=try&revision=98f8bef987b3 - these builds include the erase() call. So if you are right, then this should fix the problem. If the problem is somewhere else we will hopefully know then as well.

You have to click on the green B for your OS, and then search in the lower left corner for the download link. If you let me know the target OS I can also extract the direct link and post it here.
Flags: needinfo?(saeed)
(Reporter)

Comment 12

2 years ago
> As I had no luck in trying to repro hitting that assert with a mochitest,
> can you download a build from here
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=98f8bef987b3 - these
> builds include the erase() call. So if you are right, then this should fix
> the problem. If the problem is somewhere else we will hopefully know then as
> well.
> 
> You have to click on the green B for your OS, and then search in the lower
> left corner for the download link. If you let me know the target OS I can
> also extract the direct link and post it here.

I don't see a Linux build there, and it seems like I can't repro the issue on my Mac (Firefox 54.0b5 (64-bit)) to begin with.
Flags: needinfo?(saeed)
(Assignee)

Comment 13

2 years ago
(In reply to Saeed Jahed from comment #12)
> I don't see a Linux build there, and it seems like I can't repro the issue
> on my Mac (Firefox 54.0b5 (64-bit)) to begin with.

That might explain why I was not able to repro then on my Mac :-)

Our treeherder UI is not really intuitive.. This should be the link to the Linux 64bit optimized build: https://queue.taskcluster.net/v1/task/ToO0fp0iRJeIsQ242-eNKw/runs/0/artifacts/public/build/target.tar.bz2

Would be great if you could report back if it fixes the issue for you.
Flags: needinfo?(saeed)
(Reporter)

Comment 14

2 years ago
Ah! Sorry, I should have mentioned that my crash was on my Linux :)

That link didn't quite work for me (libXt.so.6: cannot open shared object file), but I tried target.tar.bz2 from the green B under Linux x64 opt and that worked:
https://queue.taskcluster.net/v1/task/WJ9rTOTiSE6pirhAC4HBkA/runs/0/artifacts/public/build/target.tar.bz2

It no longer crashes!  So that seems to have fixed it.
Flags: needinfo?(saeed)
(Assignee)

Comment 15

2 years ago
(In reply to Saeed Jahed from comment #14)
> That link didn't quite work for me (libXt.so.6: cannot open shared object
> file), but I tried target.tar.bz2 from the green B under Linux x64 opt and
> that worked:
> https://queue.taskcluster.net/v1/task/WJ9rTOTiSE6pirhAC4HBkA/runs/0/
> artifacts/public/build/target.tar.bz2

Sorry I gave you accidentally the Linux 32bit link instead of the 64bit one.
 
> It no longer crashes!  So that seems to have fixed it.

Awesome. Thanks for helping with the verification!

Comment 16

2 years ago
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/ba7f87ca199a
remove and erase existing header extensions. r=mjf

Comment 17

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ba7f87ca199a
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Comment 18

2 years ago
Comment on attachment 8866647 [details]
Bug 1363563: remove and erase existing header extensions.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1250356
[User impact if declined]: Can cause crashes in WebRTC calls.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: It has been verified by the reporter in a try build.
[Needs manual test from QE? If yes, steps to reproduce]: No, as we were unable to repro it locally.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: Not really
[Why is the change risky/not risky?]: Because it uses a std operation to cleanup a vector, which was left in a dirty state before.
[String changes made/needed]: N/A
Attachment #8866647 - Flags: approval-mozilla-beta?
(Assignee)

Updated

2 years ago
Blocks: 1344557
Whiteboard: regression
status-firefox54: --- → affected
Comment on attachment 8866647 [details]
Bug 1363563: remove and erase existing header extensions.

Fix a WebRtc crash and was verified. Beta54+. Should be in 54 beta 8.
Attachment #8866647 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 20

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/121ef7d3dcae
status-firefox54: affected → fixed
Blocks: 1250356
status-firefox53: --- → wontfix
status-firefox-esr52: --- → unaffected
Version: 54 Branch → 53 Branch
You need to log in before you can comment on or make changes to this bug.