Closed Bug 1365090 Opened 8 years ago Closed 8 years ago

Crash when renegotiating SDP and simulcast is enabled

Categories

(Core :: WebRTC: Networking, defect, P1)

54 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: saeed, Assigned: drno)

References

Details

(Keywords: crash)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36 Steps to reproduce: crash report at: https://crash-stats.mozilla.com/report/index/38edd174-7da8-42f0-96d0-ad7320170515 Note the actual error differs a bit from the line number in the attach crash report: # # Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_send_stream.cc, line 188 # Check failed: ssrcs.size() == config_.rtp.rids.size() (3 vs. 6) # # The root cause is we believe is in WebrtcVideoConduit::ConfigureSendMediaCodec When completing renegotiation, mEncoderConfig is cleared then recreated from codecConfig's details. But, mSendStreamConfig simply push_back()s any rids without clearing first. So later, at the call level, the ssrc count and rid count differ. RIDs contains the correct values, but duplicated. If I add a clear to mSendStreamConfig in a local build things work fine. Unfortunately, we have not been able to create an isolated repro that we can send to you. If we do, I'll make sure to add it to the bug.
Some details I left out. We've been able to cause this crash on Firefox 54 beta on a Linux box.
Severity: normal → critical
Component: Untriaged → WebRTC: Networking
Keywords: crash
Product: Firefox → Core
Thanks, Saeed, for all the details. Nils -- Can you run with this and get a fix landed this week? If so (and the fix is as straightforward as it looks), I think we can get the fix into Beta 54. Thanks.
Status: UNCONFIRMED → NEW
Rank: 12
Ever confirmed: true
Flags: needinfo?(drno)
Priority: -- → P1
Assignee: nobody → drno
Flags: needinfo?(drno)
Comment on attachment 8868001 [details] Bug 1365090: clear RID vector when reconfiguring send media codec. https://reviewboard.mozilla.org/r/139558/#review142898
Attachment #8868001 - Flags: review?(rjesup) → review+
The crash report actually looks like a report from the other bug regarding renegotiating extensions. But I agree with findings regarding the assertion you included here in the ticket. As we don't have a way to repro can I ask you to download this try build here https://queue.taskcluster.net/v1/task/FWTjxedDR2mzMc-Nqwgl3g/runs/0/artifacts/public/build/target.tar.bz2 and report back if it fixes the problem for you?
Flags: needinfo?(saeed)
Hey Nils, 'file firefox' shows a 32bit ELF an dI am hitting libgtk errors. Is there a 64bit artifact? I tried browsing around that taskcluster url but couldn't find it. Thanks, Tim
Flags: needinfo?(drno)
Sorry about that. Looks I made the same mistake again. This URL should work (I downloaded it and verified the file): https://queue.taskcluster.net/v1/task/cnLnBDvjRhKZMCH9ySK-Gw/runs/0/artifacts/public/build/target.tar.bz2
Flags: needinfo?(drno)
Ok I ran that build. Things advance but I hit a different CHECK down in the webrtc layer. I did hit this once locally on my own build but hadn't been able to repro inside gdb to see what part of applying the encoder settings specifically didn't work. There's not actually that many return false paths in that function. # # Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_send_stream.cc, line 228 # Check failed: ReconfigureVideoEncoder(encoder_config) # # ==== C stack trace =============================== /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2715df2) [0x7f9dd8e8adf2] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x26ef6d9) [0x7f9dd8e646d9] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x26e215e) [0x7f9dd8e5715e] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a41aa2) [0x7f9dd91b6aa2] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a41b95) [0x7f9dd91b6b95] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a533d8) [0x7f9dd91c83d8] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a663df) [0x7f9dd91db3df] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a608c4) [0x7f9dd91d58c4] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a61a15) [0x7f9dd91d6a15] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x14fa1a6) [0x7f9dd7c6f1a6] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x184b8c5) [0x7f9dd7fc08c5] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b172ee) [0x7f9dd928c2ee] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b0a1da) [0x7f9dd927f1da] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b16cb0) [0x7f9dd928bcb0] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b173f7) [0x7f9dd928c3f7] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b17909) [0x7f9dd928c909] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b940c9) [0x7f9dd93090c9] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b944bd) [0x7f9dd93094bd] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b18197) [0x7f9dd928d197] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b0c747) [0x7f9dd9281747] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b16cb0) [0x7f9dd928bcb0] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b173f7) [0x7f9dd928c3f7] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b17909) [0x7f9dd928c909] /usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2d38f2a) [0x7f9dd94adf2a] [0x3b2fa98f43cf] Redirecting call to abort() to mozalloc_abort https://crash-stats.mozilla.com/report/index/c3b8d7a8-6453-4431-bdec-2772c1170516
I reproe'd in gdb and am investigating now. Somethign about the encoder config is not right. sterr logging is helpful at least: (rtp_packet_history.cc:44): Purging packet history in order to re-set status. (video_send_stream.cc:307): (Re)configureVideoEncoder: {streams: [{width: 160, height: 90, max_framerate: 30, min_bitrate_bps:40000, target_bitrate_bps:80000, max_bitrate_bps:250000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 320, height: 180, max_framerate: 30, min_bitrate_bps:100000, target_bitrate_bps:150000, max_bitrate_bps:500000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 640, height: 360, max_framerate: 30, min_bitrate_bps:200000, target_bitrate_bps:300000, max_bitrate_bps:1300000, max_qp: 56, temporal_layer_thresholds_bps: [0]}], content_type: kRealtimeVideo, encoder_specific_settings: NULL, min_transmit_bitrate_bps: 0} (media_optimization.cc:216): SetTargetRates: 3548144 bps 0% loss 398ms RTT (media_optimization.cc:308): SetTargetRates/enable_qm: 3548.14 bps, 0 kbps, 0 fps, (generic_encoder.cc:131): Failed to initialize the encoder associated with payload name: VP8 (codec_database.cc:265): Failed to initialize video encoder. (video_sender.cc:104): Failed to initialize set encoder with payload name 'VP8'. (video_send_stream.cc:572): Failed to set encoder.
(In reply to Tim Haloun from comment #9) > I reproe'd in gdb and am investigating now. Somethign about the encoder > config is not right. sterr logging is helpful at least: > > (rtp_packet_history.cc:44): Purging packet history in order to re-set status. > (video_send_stream.cc:307): (Re)configureVideoEncoder: {streams: [{width: > 160, height: 90, max_framerate: 30, min_bitrate_bps:40000, > target_bitrate_bps:80000, max_bitrate_bps:250000, max_qp: 56, > temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 320, height: 180, > max_framerate: 30, min_bitrate_bps:100000, target_bitrate_bps:150000, > max_bitrate_bps:500000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0, > 0]}, {width: 640, height: 360, max_framerate: 30, min_bitrate_bps:200000, > target_bitrate_bps:300000, max_bitrate_bps:1300000, max_qp: 56, > temporal_layer_thresholds_bps: [0]}], content_type: kRealtimeVideo, The third layer appears to have only an array of size one, but the other two have three rates. Could that cause trouble?
BTW if you want to be able to debug this your self this is the URL for the treeherder run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb0932cdce02 You have to click on the tc() of the platform you are looking for. Then on the green B. And then find in the list uploaded artifacts the target.tar.bz2. This should be the 64bit debug build: https://queue.taskcluster.net/v1/task/cPG4RC04T4ygJvgrii7s-Q/runs/0/artifacts/public/build/target.tar.bz2 And this is the 64bit ASAN opt build: https://queue.taskcluster.net/v1/task/WQfvxqr6RX-cOattocpcFA/runs/0/artifacts/public/build/target.tar.bz2
With Noah's help we figured it out. This was super tricky. The bug is that VideoSendStream picks an incorrect overall max bitrate by adding the max bitrates of all the sub-streams (ln 429). The actual max bitrate of the overall stream is the sum of the target bitrates of lower streams and the max bitrate of the highest stream. The issue is, on reconfiguration, the following happens: 1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new config to the available bitrate, so potentially a large number. 2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger than the configured codec max bitrate (2050), and caps it 3) SimulcastEncoderAdapter only checks the overall cap and is happy, but when it hands out bitrates, it gives the last layer whatever is left after consuming the target bitrates of lower layers 4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is higher than max and rejects the config.
(In reply to Tim Haloun from comment #12) > With Noah's help we figured it out. This was super tricky. > > The bug is that VideoSendStream picks an incorrect overall max bitrate by > adding the max bitrates of all the sub-streams (ln 429). The actual max > bitrate of the overall stream is the sum of the target bitrates of lower > streams and the max bitrate of the highest stream. > > The issue is, on reconfiguration, the following happens: > 1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new > config to the available bitrate, so potentially a large number. > 2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger > than the configured codec max bitrate (2050), and caps it > 3) SimulcastEncoderAdapter only checks the overall cap and is happy, but > when it hands out bitrates, it gives the last layer whatever is left after > consuming the target bitrates of lower layers I guess this is the magic line which controls if a lower layer takes the max or the target bitrate depending on whether the upper layer still has enough bitrate left for its minimum bitrate: https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/simulcast_encoder_adapter.cc#425 > 4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is > higher than max and rejects the config. I assume you refer to this check here: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc#351
RE #13 (In reply to Nils Ohlmeier [:drno] from comment #13) > (In reply to Tim Haloun from comment #12) > > With Noah's help we figured it out. This was super tricky. > > > > The bug is that VideoSendStream picks an incorrect overall max bitrate by > > adding the max bitrates of all the sub-streams (ln 429). The actual max > > bitrate of the overall stream is the sum of the target bitrates of lower > > streams and the max bitrate of the highest stream. > > > > The issue is, on reconfiguration, the following happens: > > 1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new > > config to the available bitrate, so potentially a large number. > > 2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger > > than the configured codec max bitrate (2050), and caps it > > 3) SimulcastEncoderAdapter only checks the overall cap and is happy, but > > when it hands out bitrates, it gives the last layer whatever is left after > > consuming the target bitrates of lower layers > > I guess this is the magic line which controls if a lower layer takes the max > or the target bitrate depending on whether the upper layer still has enough > bitrate left for its minimum bitrate: > > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/ > modules/video_coding/codecs/vp8/simulcast_encoder_adapter.cc#425 > Yes. That line seems out of sync with what video_send_stream puts in to the overall maximum. > > 4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is > > higher than max and rejects the config. > > I assume you refer to this check here: > http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/ > modules/video_coding/codecs/vp8/vp8_impl.cc#351 Correct. This is the ultimate check that cascades back into a false result. Confirmed, After changing video_send_stream.cc line 429 to add target bitrates for the subordinate streams I can't reproduce the crash. I really have no idea how we'd construct a test suite for this though. It looks like this code is completely refactored in later webrtc revisions.
Flags: needinfo?(drno)
Tim, does this https://bugzilla.mozilla.org/attachment.cgi?id=8868329 look like the patch you wrote up for own local verification? Our testing here relies on the webrtc.org unit tests, which apparently did not catch this. Plus some end-to-end level test within the browser, where it appears almost impossible to catch this.
Flags: needinfo?(drno)
Yes that's equivalent, plus the original change to the clear the rid vector.
Comment on attachment 8868329 [details] Bug 1365090: use target bitrate instead of max for simulcast. https://reviewboard.mozilla.org/r/139910/#review143522 ::: media/webrtc/trunk/webrtc/video/video_send_stream.cc:429 (Diff revision 1) > + if (i + 1 == streams.size()) { > - video_codec.maxBitrate += streams[i].max_bitrate_bps / 1000; > + video_codec.maxBitrate += streams[i].max_bitrate_bps / 1000; > + } else { > + video_codec.maxBitrate += streams[i].target_bitrate_bps / 1000; > + } This could use a comment.
Attachment #8868329 - Flags: review?(docfaraday) → review+
Pushed by drno@ohlmeier.org: https://hg.mozilla.org/integration/autoland/rev/0f749464984f clear RID vector when reconfiguring send media codec. r=jesup https://hg.mozilla.org/integration/autoland/rev/06bdd79f4ba2 use target bitrate instead of max for simulcast. r=bwc
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8868001 [details] Bug 1365090: clear RID vector when reconfiguring send media codec. Approval Request Comment [Feature/Bug causing the regression]: bug 1337468 [User impact if declined]: Potential crashes [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: In a try run by the reporter [Needs manual test from QE? If yes, steps to reproduce]: No, since we can't repro ourself. [List of other uplifts needed for the feature/fix]: N/A [Is the change risky?]: No [Why is the change risky/not risky?]: We just ensure to empty out a vector before re-using it. [String changes made/needed]: N/A
Attachment #8868001 - Flags: approval-mozilla-beta?
Comment on attachment 8868329 [details] Bug 1365090: use target bitrate instead of max for simulcast. Approval Request Comment [Feature/Bug causing the regression]: Bug in third party library [User impact if declined]: Potential crashes in WebRTC calls [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: In a try build by the reporter [Needs manual test from QE? If yes, steps to reproduce]: No, as we can't repro out self [List of other uplifts needed for the feature/fix]: N/A [Is the change risky?]: No [Why is the change risky/not risky?]: The patch only gives less bandwidth to a special type of WebRTC call which involves multiple streams (simulcast). [String changes made/needed]: N/A
Attachment #8868329 - Flags: approval-mozilla-beta?
Depends on: 1366234
Comment on attachment 8868001 [details] Bug 1365090: clear RID vector when reconfiguring send media codec. Fix a potential WebRTC crash. Beta54+. Should be in 54 beta 10.
Attachment #8868001 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8868329 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Nils Ohlmeier [:drno] from comment #23) > [Is this code covered by automated tests?]: Yes > [Has the fix been verified in Nightly?]: In a try build by the reporter > [Needs manual test from QE? If yes, steps to reproduce]: No, as we can't > repro out self Setting qe-verify- based on Nils' assessment on manual testing.
Flags: qe-verify-
Flags: needinfo?(saeed)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: