Closed
Bug 1365090
Opened 8 years ago
Closed 8 years ago
Crash when renegotiating SDP and simulcast is enabled
Categories
(Core :: WebRTC: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | fixed |
firefox55 | --- | fixed |
People
(Reporter: saeed, Assigned: drno)
References
Details
(Keywords: crash)
Attachments
(2 files)
59 bytes,
text/x-review-board-request
|
jesup
:
review+
gchang
:
approval-mozilla-beta+
|
Details |
59 bytes,
text/x-review-board-request
|
bwc
:
review+
gchang
:
approval-mozilla-beta+
|
Details |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Steps to reproduce:
crash report at: https://crash-stats.mozilla.com/report/index/38edd174-7da8-42f0-96d0-ad7320170515
Note the actual error differs a bit from the line number in the attach crash report:
#
# Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_send_stream.cc, line 188
# Check failed: ssrcs.size() == config_.rtp.rids.size() (3 vs. 6)
#
#
The root cause is we believe is in WebrtcVideoConduit::ConfigureSendMediaCodec
When completing renegotiation, mEncoderConfig is cleared then recreated from codecConfig's details.
But, mSendStreamConfig simply push_back()s any rids without clearing first. So later, at the call level, the ssrc count and rid count differ. RIDs contains the correct values, but duplicated.
If I add a clear to mSendStreamConfig in a local build things work fine.
Unfortunately, we have not been able to create an isolated repro that we can send to you. If we do, I'll make sure to add it to the bug.
Reporter | ||
Comment 1•8 years ago
|
||
Some details I left out. We've been able to cause this crash on Firefox 54 beta on a Linux box.
Severity: normal → critical
Component: Untriaged → WebRTC: Networking
Keywords: crash
Product: Firefox → Core
Comment 2•8 years ago
|
||
Thanks, Saeed, for all the details.
Nils -- Can you run with this and get a fix landed this week? If so (and the fix is as straightforward as it looks), I think we can get the fix into Beta 54. Thanks.
Status: UNCONFIRMED → NEW
Rank: 12
Ever confirmed: true
Flags: needinfo?(drno)
Priority: -- → P1
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → drno
Flags: needinfo?(drno)
Comment hidden (mozreview-request) |
Comment 4•8 years ago
|
||
mozreview-review |
Comment on attachment 8868001 [details]
Bug 1365090: clear RID vector when reconfiguring send media codec.
https://reviewboard.mozilla.org/r/139558/#review142898
Attachment #8868001 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 5•8 years ago
|
||
The crash report actually looks like a report from the other bug regarding renegotiating extensions.
But I agree with findings regarding the assertion you included here in the ticket. As we don't have a way to repro can I ask you to download this try build here https://queue.taskcluster.net/v1/task/FWTjxedDR2mzMc-Nqwgl3g/runs/0/artifacts/public/build/target.tar.bz2 and report back if it fixes the problem for you?
Flags: needinfo?(saeed)
Comment 6•8 years ago
|
||
Hey Nils,
'file firefox' shows a 32bit ELF an dI am hitting libgtk errors. Is there a 64bit artifact? I tried browsing around that taskcluster url but couldn't find it.
Thanks,
Tim
Updated•8 years ago
|
Flags: needinfo?(drno)
Assignee | ||
Comment 7•8 years ago
|
||
Sorry about that. Looks I made the same mistake again.
This URL should work (I downloaded it and verified the file):
https://queue.taskcluster.net/v1/task/cnLnBDvjRhKZMCH9ySK-Gw/runs/0/artifacts/public/build/target.tar.bz2
Flags: needinfo?(drno)
Comment 8•8 years ago
|
||
Ok I ran that build. Things advance but I hit a different CHECK down in the webrtc layer. I did hit this once locally on my own build but hadn't been able to repro inside gdb to see what part of applying the encoder settings specifically didn't work. There's not actually that many return false paths in that function.
#
# Fatal error in /home/worker/workspace/build/src/media/webrtc/trunk/webrtc/video/video_send_stream.cc, line 228
# Check failed: ReconfigureVideoEncoder(encoder_config)
#
#
==== C stack trace ===============================
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2715df2) [0x7f9dd8e8adf2]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x26ef6d9) [0x7f9dd8e646d9]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x26e215e) [0x7f9dd8e5715e]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a41aa2) [0x7f9dd91b6aa2]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a41b95) [0x7f9dd91b6b95]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a533d8) [0x7f9dd91c83d8]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a663df) [0x7f9dd91db3df]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a608c4) [0x7f9dd91d58c4]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2a61a15) [0x7f9dd91d6a15]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x14fa1a6) [0x7f9dd7c6f1a6]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x184b8c5) [0x7f9dd7fc08c5]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b172ee) [0x7f9dd928c2ee]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b0a1da) [0x7f9dd927f1da]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b16cb0) [0x7f9dd928bcb0]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b173f7) [0x7f9dd928c3f7]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b17909) [0x7f9dd928c909]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b940c9) [0x7f9dd93090c9]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b944bd) [0x7f9dd93094bd]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b18197) [0x7f9dd928d197]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b0c747) [0x7f9dd9281747]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b16cb0) [0x7f9dd928bcb0]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b173f7) [0x7f9dd928c3f7]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2b17909) [0x7f9dd928c909]
/usr/local/google/home/thaloun/src/firefoxbetafix/firefox/libxul.so(+0x2d38f2a) [0x7f9dd94adf2a]
[0x3b2fa98f43cf]
Redirecting call to abort() to mozalloc_abort
https://crash-stats.mozilla.com/report/index/c3b8d7a8-6453-4431-bdec-2772c1170516
Comment 9•8 years ago
|
||
I reproe'd in gdb and am investigating now. Somethign about the encoder config is not right. sterr logging is helpful at least:
(rtp_packet_history.cc:44): Purging packet history in order to re-set status.
(video_send_stream.cc:307): (Re)configureVideoEncoder: {streams: [{width: 160, height: 90, max_framerate: 30, min_bitrate_bps:40000, target_bitrate_bps:80000, max_bitrate_bps:250000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 320, height: 180, max_framerate: 30, min_bitrate_bps:100000, target_bitrate_bps:150000, max_bitrate_bps:500000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 640, height: 360, max_framerate: 30, min_bitrate_bps:200000, target_bitrate_bps:300000, max_bitrate_bps:1300000, max_qp: 56, temporal_layer_thresholds_bps: [0]}], content_type: kRealtimeVideo, encoder_specific_settings: NULL, min_transmit_bitrate_bps: 0}
(media_optimization.cc:216): SetTargetRates: 3548144 bps 0% loss 398ms RTT
(media_optimization.cc:308): SetTargetRates/enable_qm: 3548.14 bps, 0 kbps, 0 fps, (generic_encoder.cc:131): Failed to initialize the encoder associated with payload name: VP8
(codec_database.cc:265): Failed to initialize video encoder.
(video_sender.cc:104): Failed to initialize set encoder with payload name 'VP8'.
(video_send_stream.cc:572): Failed to set encoder.
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Tim Haloun from comment #9)
> I reproe'd in gdb and am investigating now. Somethign about the encoder
> config is not right. sterr logging is helpful at least:
>
> (rtp_packet_history.cc:44): Purging packet history in order to re-set status.
> (video_send_stream.cc:307): (Re)configureVideoEncoder: {streams: [{width:
> 160, height: 90, max_framerate: 30, min_bitrate_bps:40000,
> target_bitrate_bps:80000, max_bitrate_bps:250000, max_qp: 56,
> temporal_layer_thresholds_bps: [0, 0, 0]}, {width: 320, height: 180,
> max_framerate: 30, min_bitrate_bps:100000, target_bitrate_bps:150000,
> max_bitrate_bps:500000, max_qp: 56, temporal_layer_thresholds_bps: [0, 0,
> 0]}, {width: 640, height: 360, max_framerate: 30, min_bitrate_bps:200000,
> target_bitrate_bps:300000, max_bitrate_bps:1300000, max_qp: 56,
> temporal_layer_thresholds_bps: [0]}], content_type: kRealtimeVideo,
The third layer appears to have only an array of size one, but the other two have three rates. Could that cause trouble?
Assignee | ||
Comment 11•8 years ago
|
||
BTW if you want to be able to debug this your self this is the URL for the treeherder run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb0932cdce02
You have to click on the tc() of the platform you are looking for. Then on the green B. And then find in the list uploaded artifacts the target.tar.bz2.
This should be the 64bit debug build: https://queue.taskcluster.net/v1/task/cPG4RC04T4ygJvgrii7s-Q/runs/0/artifacts/public/build/target.tar.bz2
And this is the 64bit ASAN opt build: https://queue.taskcluster.net/v1/task/WQfvxqr6RX-cOattocpcFA/runs/0/artifacts/public/build/target.tar.bz2
Comment 12•8 years ago
|
||
With Noah's help we figured it out. This was super tricky.
The bug is that VideoSendStream picks an incorrect overall max bitrate by adding the max bitrates of all the sub-streams (ln 429). The actual max bitrate of the overall stream is the sum of the target bitrates of lower streams and the max bitrate of the highest stream.
The issue is, on reconfiguration, the following happens:
1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new config to the available bitrate, so potentially a large number.
2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger than the configured codec max bitrate (2050), and caps it
3) SimulcastEncoderAdapter only checks the overall cap and is happy, but when it hands out bitrates, it gives the last layer whatever is left after consuming the target bitrates of lower layers
4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is higher than max and rejects the config.
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Tim Haloun from comment #12)
> With Noah's help we figured it out. This was super tricky.
>
> The bug is that VideoSendStream picks an incorrect overall max bitrate by
> adding the max bitrates of all the sub-streams (ln 429). The actual max
> bitrate of the overall stream is the sum of the target bitrates of lower
> streams and the max bitrate of the highest stream.
>
> The issue is, on reconfiguration, the following happens:
> 1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new
> config to the available bitrate, so potentially a large number.
> 2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger
> than the configured codec max bitrate (2050), and caps it
> 3) SimulcastEncoderAdapter only checks the overall cap and is happy, but
> when it hands out bitrates, it gives the last layer whatever is left after
> consuming the target bitrates of lower layers
I guess this is the magic line which controls if a lower layer takes the max or the target bitrate depending on whether the upper layer still has enough bitrate left for its minimum bitrate:
https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/simulcast_encoder_adapter.cc#425
> 4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is
> higher than max and rejects the config.
I assume you refer to this check here: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc#351
Comment 14•8 years ago
|
||
RE #13
(In reply to Nils Ohlmeier [:drno] from comment #13)
> (In reply to Tim Haloun from comment #12)
> > With Noah's help we figured it out. This was super tricky.
> >
> > The bug is that VideoSendStream picks an incorrect overall max bitrate by
> > adding the max bitrates of all the sub-streams (ln 429). The actual max
> > bitrate of the overall stream is the sum of the target bitrates of lower
> > streams and the max bitrate of the highest stream.
> >
> > The issue is, on reconfiguration, the following happens:
> > 1) ViEEncoder::SetEncoder:232 sets the overall startBitrate of the new
> > config to the available bitrate, so potentially a large number.
> > 2) VCMCodecDatabase::SetSendCodec:240 notes that the startBitrate is larger
> > than the configured codec max bitrate (2050), and caps it
> > 3) SimulcastEncoderAdapter only checks the overall cap and is happy, but
> > when it hands out bitrates, it gives the last layer whatever is left after
> > consuming the target bitrates of lower layers
>
> I guess this is the magic line which controls if a lower layer takes the max
> or the target bitrate depending on whether the upper layer still has enough
> bitrate left for its minimum bitrate:
>
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> modules/video_coding/codecs/vp8/simulcast_encoder_adapter.cc#425
>
Yes. That line seems out of sync with what video_send_stream puts in to the overall maximum.
> > 4) The top layer's vp8_impl.cc, when configuring, notes the start bitrate is
> > higher than max and rejects the config.
>
> I assume you refer to this check here:
> http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> modules/video_coding/codecs/vp8/vp8_impl.cc#351
Correct. This is the ultimate check that cascades back into a false result.
Confirmed,
After changing video_send_stream.cc line 429 to add target bitrates for the subordinate streams I can't reproduce the crash. I really have no idea how we'd construct a test suite for this though.
It looks like this code is completely refactored in later webrtc revisions.
Flags: needinfo?(drno)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 16•8 years ago
|
||
Tim, does this https://bugzilla.mozilla.org/attachment.cgi?id=8868329 look like the patch you wrote up for own local verification?
Our testing here relies on the webrtc.org unit tests, which apparently did not catch this. Plus some end-to-end level test within the browser, where it appears almost impossible to catch this.
Flags: needinfo?(drno)
Comment 17•8 years ago
|
||
Yes that's equivalent, plus the original change to the clear the rid vector.
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8868329 [details]
Bug 1365090: use target bitrate instead of max for simulcast.
https://reviewboard.mozilla.org/r/139910/#review143522
::: media/webrtc/trunk/webrtc/video/video_send_stream.cc:429
(Diff revision 1)
> + if (i + 1 == streams.size()) {
> - video_codec.maxBitrate += streams[i].max_bitrate_bps / 1000;
> + video_codec.maxBitrate += streams[i].max_bitrate_bps / 1000;
> + } else {
> + video_codec.maxBitrate += streams[i].target_bitrate_bps / 1000;
> + }
This could use a comment.
Attachment #8868329 -
Flags: review?(docfaraday) → review+
Comment hidden (mozreview-request) |
Comment 20•8 years ago
|
||
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/0f749464984f
clear RID vector when reconfiguring send media codec. r=jesup
https://hg.mozilla.org/integration/autoland/rev/06bdd79f4ba2
use target bitrate instead of max for simulcast. r=bwc
Comment 21•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0f749464984f
https://hg.mozilla.org/mozilla-central/rev/06bdd79f4ba2
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 22•8 years ago
|
||
Comment on attachment 8868001 [details]
Bug 1365090: clear RID vector when reconfiguring send media codec.
Approval Request Comment
[Feature/Bug causing the regression]: bug 1337468
[User impact if declined]: Potential crashes
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: In a try run by the reporter
[Needs manual test from QE? If yes, steps to reproduce]: No, since we can't repro ourself.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: We just ensure to empty out a vector before re-using it.
[String changes made/needed]: N/A
Attachment #8868001 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 23•8 years ago
|
||
Comment on attachment 8868329 [details]
Bug 1365090: use target bitrate instead of max for simulcast.
Approval Request Comment
[Feature/Bug causing the regression]: Bug in third party library
[User impact if declined]: Potential crashes in WebRTC calls
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: In a try build by the reporter
[Needs manual test from QE? If yes, steps to reproduce]: No, as we can't repro out self
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: The patch only gives less bandwidth to a special type of WebRTC call which involves multiple streams (simulcast).
[String changes made/needed]: N/A
Attachment #8868329 -
Flags: approval-mozilla-beta?
Updated•8 years ago
|
Blocks: 1337468
status-firefox53:
--- → unaffected
status-firefox54:
--- → affected
status-firefox-esr52:
--- → unaffected
Comment 24•8 years ago
|
||
Comment on attachment 8868001 [details]
Bug 1365090: clear RID vector when reconfiguring send media codec.
Fix a potential WebRTC crash. Beta54+. Should be in 54 beta 10.
Attachment #8868001 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•8 years ago
|
Attachment #8868329 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 25•8 years ago
|
||
bugherder uplift |
Comment 26•8 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #23)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: In a try build by the reporter
> [Needs manual test from QE? If yes, steps to reproduce]: No, as we can't
> repro out self
Setting qe-verify- based on Nils' assessment on manual testing.
Flags: qe-verify-
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(saeed)
You need to log in
before you can comment on or make changes to this bug.
Description
•