Firefox 69: Tab crash when establishing a WebRTC call (CrashChannel::OpenContentStream)
Categories
(Core :: WebRTC: Audio/Video, defect, P2)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox69 | --- | wontfix |
firefox70 | --- | fixed |
firefox71 | --- | fixed |
People
(Reporter: g, Assigned: bwc, NeedInfo)
References
(Regression)
Details
(Keywords: regression)
Attachments
(6 files)
- 5.74 KB, text/plain
- 35.23 KB, text/plain
- 1.88 KB, text/plain
- 1.49 KB, text/plain
- 47 bytes, text/x-phabricator-request (lizzard: approval-mozilla-beta+)
- 47 bytes, text/x-phabricator-request (lizzard: approval-mozilla-beta+)
User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Steps to reproduce:
We have a web application that uses the WebRTC framework. Up to and including version 68, everything worked fine. In version 69, released a few days ago, the tab crashes ("Gah. Your tab just crashed.") when attempting to establish the audio channel.
The offending function is:
nsresult CrashChannel::OpenContentStream(bool, class nsIInputStream**, class nsIChannel**)
This behavior is deterministic. It happens on virtually every machine we can find, all running Windows (tested with Windows 8 and Windows 10).
Additional info:
- we are using webrtc-adapter (https://github.com/webrtchacks/adapter/issues) but this doesn't seem to have any effect on the problem.
- we are using kamailio and rtpengine.
We can provide SDP/SIP captures if needed.
The crash id is bp-cc88dcbc-95da-4c95-8fae-f594d0190917
Thank you in advance.
Actual results:
Tab crashed
Expected results:
The WebRTC audio channel gets established
Comment 1•5 years ago
Hi, can you try to find an exact regression range for when this crash first started, using the mozregression tool from https://mozilla.github.io/mozregression/?
Hi, thank you for the fast response.
Here's the mozregression output:
2019-09-18T10:28:10: INFO : Narrowed inbound regression window from [196e3189, 1e420eb1] (3 builds) to [0489b95c, 1e420eb1] (2 builds) (~1 steps left)
2019-09-18T10:28:10: DEBUG : Starting merge handling...
2019-09-18T10:28:10: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=1e420eb1ad7a182dc12c8e9a7de290a00e1d09c7&full=1
2019-09-18T10:28:11: DEBUG : Found commit message:
Bug 1333879: Handle multiple codecs in answer. r=mjf
Differential Revision: https://phabricator.services.mozilla.com/D30687
2019-09-18T10:28:11: DEBUG : Did not find a branch, checking all integration branches
2019-09-18T10:28:11: INFO : The bisection is done.
2019-09-18T10:28:11: INFO : Stopped
Assignee
Comment 3•5 years ago
So it looks like the parent process is forcing the content process to crash? I am not sure what situations would cause this.
Assignee
Comment 4•5 years ago
Here's a link to a recent ASAN debug build: https://queue.taskcluster.net/v1/task/QcRm5uqYRzub_P5v073YTQ/runs/0/artifacts/public/build/setup.exe
If you could use that, and run it with stdout/stderr logging enabled, that might give me enough information.
Alternately, if you can give me access to the web app you're developing, I could try to reproduce the problem myself.
Hi,
That setup appears to have no content (~700KB) and it doesn't download anything. I get an error when the setup tries to create a shortcut for the firefox.exe which doesn't exist. Only 3 files are created in the destination folder:
- /install.log
- /uninstall/shortcuts_log.ini
- /uninstall/uninstall.log
Here is the install.log:
Assignee
Comment 6•5 years ago
Hmm, that's inconvenient. Try this?
oops, it appears the comment gets submitted when adding an attachment, sorry about that.
Assignee
Comment 10•5 years ago
Ok, here's the real problem.
#
# Fatal error in z:/build/build/src/media/webrtc/trunk/webrtc/voice_engine/channel.cc, line 1719
# last system error: 0
# Check failed: props
#
#
Here's the function that isn't returning anything: https://searchfox.org/mozilla-central/rev/4218cb868d8deed13e902718ba2595d85e12b86b/media/webrtc/trunk/webrtc/voice_engine/channel.cc#1184
Here is the place where it is populated: https://searchfox.org/mozilla-central/rev/4218cb868d8deed13e902718ba2595d85e12b86b/media/webrtc/trunk/webrtc/voice_engine/channel.cc#1163
Here's where that should be getting called: https://searchfox.org/mozilla-central/rev/4218cb868d8deed13e902718ba2595d85e12b86b/media/webrtc/trunk/webrtc/audio/audio_send_stream.cc#478
Maybe we're failing here? https://searchfox.org/mozilla-central/rev/4218cb868d8deed13e902718ba2595d85e12b86b/media/webrtc/trunk/webrtc/audio/audio_send_stream.cc#445
Could I see the SDP negotiation?
Reporter
Comment 11•5 years ago
Sure thing, here it goes. The first SDP is the original invite from the SIP server, which we set as the remote description. The second is the browser's offer. Let me know if you need anything else.
Assignee
Comment 12•5 years ago
Ok, the only thing in there that looks a little unusual is the fact that telephone-event's pt comes before the actual audio codec on the m-line, but that is completely valid so we shouldn't be getting upset about that.
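To make the ordering described above easier to spot in a capture, here is a minimal TypeScript sketch that lists the codec names of the first m=audio section in the order their payload types appear on the m-line. The helper name `audioCodecOrder` and the SDP fragment are made up for illustration; they are not taken from the attached captures or from Firefox.

```typescript
// Hypothetical helper, not part of Firefox or any library: lists the codec
// names of the first m=audio section in the order its payload types appear.
function audioCodecOrder(sdp: string): string[] {
  let payloadTypes: string[] | null = null;
  const names: Record<string, string> = {};
  for (const line of sdp.split(/\r?\n/)) {
    if (payloadTypes === null) {
      // m=audio <port> <proto> <pt> <pt> ...
      if (/^m=audio /.test(line)) {
        payloadTypes = line.trim().split(/\s+/).slice(3);
      }
      continue;
    }
    if (/^m=/.test(line)) {
      break; // The next media section starts here.
    }
    // a=rtpmap:<pt> <name>/<clock rate>[/<channels>]
    const match = /^a=rtpmap:(\d+) ([^\/\s]+)/.exec(line);
    if (match) {
      const pt = match[1];
      const name = match[2];
      if (pt !== undefined && name !== undefined) {
        names[pt] = name.toLowerCase();
      }
    }
  }
  if (payloadTypes === null) {
    return [];
  }
  // Payload types without an a=rtpmap line are reported as "pt-<number>".
  return payloadTypes.map((pt) => names[pt] ?? `pt-${pt}`);
}

// Made-up SDP fragment (an assumption, not the reporter's capture).
const exampleSdp = [
  "v=0",
  "m=audio 40000 RTP/SAVPF 101 8",
  "a=rtpmap:101 telephone-event/8000",
  "a=rtpmap:8 PCMA/8000",
].join("\r\n");
console.log(audioCodecOrder(exampleSdp)); // telephone-event is listed before pcma
```

If the first entry is telephone-event, the SDP has the ordering discussed in this comment.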
Assignee
Comment 13•5 years ago
So, I cannot reproduce this using just the SDP exchange on OS X. Maybe a windows-specific bug? Looking into it.
Assignee
Comment 14•5 years ago
Fiddle that I'm trying to repro with, very minimal: https://jsfiddle.net/whe950ar/1/
Assignee
Comment 15•5 years ago
Yeah, this is probably going to require actual media flow.
Assignee
Comment 16•5 years ago
Ok, this reproduces the bug: https://jsfiddle.net/sa12bjc4/2/
Assignee
Comment 17•5 years ago
Yeah, webrtc.org just has a fit if telephone-event is first in the list of configured codecs.
Bug 1333879 broke this because the code that used to remove all but the first "real" codec also happened to perform this reordering. I am going to implement this workaround logic as close to the boundary with webrtc.org as possible, and make sure there's a comment.
We also need to have a test for this, probably a mochitest or crashtest.
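The reordering idea described in this comment can be sketched as a stable reorder that keeps every "real" codec in its negotiated order and moves telephone-event to the end. This is only an illustration in TypeScript: the actual fix lives in Firefox's C++ code near the webrtc.org boundary, and the `CodecConfig` type and function name below are hypothetical.

```typescript
// Hypothetical codec description; the real fix operates on Firefox's internal
// C++ codec configuration objects, which are not modeled here.
interface CodecConfig {
  payloadType: number;
  name: string;
}

// Stable reorder: every codec other than telephone-event keeps its relative
// position, and telephone-event entries are moved to the end of the list.
function moveTelephoneEventToBack(
  codecs: readonly CodecConfig[]
): CodecConfig[] {
  const isTelephoneEvent = (codec: CodecConfig): boolean =>
    codec.name.toLowerCase() === "telephone-event";
  return [
    ...codecs.filter((codec) => !isTelephoneEvent(codec)),
    ...codecs.filter(isTelephoneEvent),
  ];
}

// Made-up codec list with telephone-event first, as in the crashing scenario.
const negotiated: CodecConfig[] = [
  { payloadType: 101, name: "telephone-event" },
  { payloadType: 8, name: "PCMA" },
  { payloadType: 0, name: "PCMU" },
];
console.log(moveTelephoneEventToBack(negotiated).map((codec) => codec.name));
```

Filtering twice rather than sorting keeps the relative order of the remaining codecs untouched, which matters because that order expresses codec preference.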
Reporter
Comment 18•5 years ago
Hi,
Thanks for the update Byron.
When you have a fix for this, we are glad to test it out on our app.
Reporter
Comment 19•5 years ago
@Ryan VanderMeulen, I noticed that you changed the tracking flag "firefox69" to "wontfix". May I ask why? Will this fix only be available in Firefox 70+?
Comment 20•5 years ago
There isn't a fix available yet, and Firefox 70 is scheduled to be released in a month.
Until then, the issue probably doesn't have a major general impact, so it doesn't meet the criteria for a patch to be uplifted to the release channel: https://wiki.mozilla.org/Release_Management/Uplift_rules#Release_Uplift
Reporter
Comment 21•5 years ago
We were just wondering what the reasoning behind the decision was, and it makes perfect sense. Thank you for the clarification.
Assignee
Comment 22•5 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=587f634e5337e27304ef43d94ef47455dd0dc5ef
Assignee
Comment 23•5 years ago
Assignee
Comment 24•5 years ago
Depends on D46829
Assignee
Comment 25•5 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb3f0749a81d26d675eca5c46325fd5f38926a0d
Assignee
Comment 26•5 years ago
Hmm, the crashtest is timing out on OS X. Need to figure out what is happening there.
Assignee
Comment 27•5 years ago
It looks like PeerConnectionImpl::Initialize is failing maybe?
Assignee
Comment 28•5 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c050f83a116e21c601a0e1028afde5c9a350318
Comment 29•5 years ago
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c29af4939f66
Add test-case that moves payload type 101 to the front. r=jib
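For readers unfamiliar with this kind of test, the idea of moving a payload type to the front of the m-line can be sketched as SDP munging. The TypeScript below is not the landed mochitest; the helper name and the SDP fragment are assumptions, and it presumes that payload type 101 is telephone-event, as the pairing of the two patch titles in this bug suggests.

```typescript
// Hypothetical SDP-munging helper, not the actual mochitest code: rewrites the
// first m=audio line so that `payloadType` is listed first.
function movePayloadTypeToFront(sdp: string, payloadType: number): string {
  const target = String(payloadType);
  // Without the "g" flag, only the first m=audio line is rewritten.
  return sdp.replace(
    /^(m=audio \S+ \S+) ([^\r\n]*)/m,
    (line: string, prefix: string, formats: string): string => {
      const payloadTypes = formats.trim().split(/\s+/);
      if (payloadTypes.indexOf(target) === -1) {
        return line; // Leave the m-line unchanged if the payload type is absent.
      }
      const rest = payloadTypes.filter((pt) => pt !== target);
      return [prefix, target, ...rest].join(" ");
    }
  );
}

// Made-up SDP fragment (an assumption, not taken from the test).
const originalSdp =
  "v=0\r\n" +
  "m=audio 9 UDP/TLS/RTP/SAVPF 109 9 0 8 101\r\n" +
  "a=rtpmap:101 telephone-event/8000\r\n";
console.log(movePayloadTypeToFront(originalSdp, 101));
```

Only the m-line is touched; the a=rtpmap lines and the line endings are left as they were, so the munged SDP stays valid.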
Comment 30•5 years ago
Backed out changeset c29af4939f66 (Bug 1581898) for crashing mochitests
Backout link: https://hg.mozilla.org/integration/autoland/rev/daee59fa527009d0bdc0e34499310018c8da0bff
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269376010&repo=autoland&lineNumber=9526
[task 2019-10-02T06:22:00.517Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: D/mtransport Trickle candidate is redundant for stream 'PC:1569997320211340 (id=2147483749 url=https://example.com/tests/dom/media/tests/mochitest/test_peerConnection_telephoneEventFirst transport-id=transport_0' because it is completed: candidate:3 1 TCP 2105524479 fd15:4ba5:5a2b:100a:0:242:ac11:4 9 typ host tcptype active
[task 2019-10-02T06:22:00.517Z] 06:22:00 INFO - GECKO(1380) | (ice/ERR) ICE(PC:1569997320211340 (id=2147483749 url=https://example.com/tests/dom/media/tests/mochitest/test_peerConnection_telephoneEventFirst): peer (PC:1569997320211340 (id=2147483749 url=https://example.com/tests/dom/media/tests/mochitest/test_peerConnection_telephoneEventFirst:default), stream(PC:1569997320211340 (id=2147483749 url=https://example.com/tests/dom/media/tests/mochitest/test_peerConnection_telephoneEventFirst transport-id=transport_0 - dc13ee45:67e90eb3329a522e88eed4f054d5f545) tried to trickle ICE in inappropriate state 4
[task 2019-10-02T06:22:00.517Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: D/mtransport Trickle candidate is redundant for stream 'PC:1569997320211340 (id=2147483749 url=https://example.com/tests/dom/media/tests/mochitest/test_peerConnection_telephoneEventFirst transport-id=transport_0' because it is completed:
[task 2019-10-02T06:22:00.518Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: I/mtransport Flow[transport_0(none)]; Layer[dtls]: ****** SSL handshake completed ******
[task 2019-10-02T06:22:00.521Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: I/mtransport Flow[transport_0(none)]; Layer[dtls]: Selected ALPN string: webrtc
[task 2019-10-02T06:22:00.521Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: D/mtransport Created SRTP flow!
[task 2019-10-02T06:22:00.521Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: I/mtransport Flow[transport_0(none)]; Layer[dtls]: ****** SSL handshake completed ******
[task 2019-10-02T06:22:00.522Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: I/mtransport Flow[transport_0(none)]; Layer[dtls]: Selected ALPN string: webrtc
[task 2019-10-02T06:22:00.522Z] 06:22:00 INFO - GECKO(1380) | [Child 1424: Socket Thread]: D/mtransport Created SRTP flow!
[task 2019-10-02T06:22:00.523Z] 06:22:00 INFO - GECKO(1380) | #
[task 2019-10-02T06:22:00.523Z] 06:22:00 INFO - GECKO(1380) | # Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/voice_engine/channel.cc, line 1719
[task 2019-10-02T06:22:00.523Z] 06:22:00 INFO - GECKO(1380) | # last system error: 0
[task 2019-10-02T06:22:00.524Z] 06:22:00 INFO - GECKO(1380) | # Check failed: props
[task 2019-10-02T06:22:00.524Z] 06:22:00 INFO - GECKO(1380) | #
[task 2019-10-02T06:22:00.524Z] 06:22:00 INFO - GECKO(1380) | #
[task 2019-10-02T06:22:00.612Z] 06:22:00 ERROR - GECKO(1380) | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2019-10-02T06:22:00.761Z] 06:22:00 INFO - GECKO(1380) | JavaScript error: resource://services-settings/RemoteSettingsClient.jsm, line 149: Error: Unknown callback
[task 2019-10-02T06:22:00.842Z] 06:22:00 INFO - GECKO(1380) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2019-10-02T06:22:01.688Z] 06:22:01 INFO - GECKO(1380) | 1569997321685 Marionette TRACE Received observer notification xpcom-will-shutdown
[task 2019-10-02T06:22:01.690Z] 06:22:01 INFO - GECKO(1380) | 1569997321687 Marionette INFO Stopped listening on port 2828
[task 2019-10-02T06:22:01.690Z] 06:22:01 INFO - GECKO(1380) | 1569997321687 Marionette DEBUG Remote service is inactive
[task 2019-10-02T06:22:01.893Z] 06:22:01 INFO - GECKO(1380) | -----------------------------------------------------
[task 2019-10-02T06:22:01.893Z] 06:22:01 INFO - GECKO(1380) | Suppressions used:
[task 2019-10-02T06:22:01.893Z] 06:22:01 INFO - GECKO(1380) | count bytes template
[task 2019-10-02T06:22:01.894Z] 06:22:01 INFO - GECKO(1380) | 27 832 nsComponentManagerImpl
[task 2019-10-02T06:22:01.895Z] 06:22:01 INFO - GECKO(1380) | 2 288 libfontconfig.so
[task 2019-10-02T06:22:01.895Z] 06:22:01 INFO - GECKO(1380) | -----------------------------------------------------
Assignee
Comment 31•5 years ago
Yeah, I butterfingered that; the second part was supposed to land too, to fix the failure the test-case causes.
Comment 32•5 years ago
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1176f96ac84a
Add test-case that moves payload type 101 to the front. r=jib
https://hg.mozilla.org/integration/autoland/rev/0b26b9b42bd7
Move telephone-event to the back of the codec list, because webrtc.org crashes if we don't. r=mjf
Comment 33•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/1176f96ac84a
https://hg.mozilla.org/mozilla-central/rev/0b26b9b42bd7
Comment 34•5 years ago
Can you verify the fix worked in the latest nightly? Thanks!
Comment 35•5 years ago
Is this something you think might be good for beta uplift? We're heading into beta 13 (of 14) so there is not much leeway.
Assignee
Comment 36•5 years ago
This would be a fairly easy uplift I think. I'll request tomorrow.
Assignee
Comment 37•5 years ago
Comment on attachment 9094691 [details]
Bug 1581898: Move telephone-event to the back of the codec list, because webrtc.org crashes if we don't. r?mjf
Beta/Release Uplift Approval Request
- User impact if declined: Websites could trivially cause nullptr tab crashes, and certain legitimate webrtc scenarios would also be broken.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This is a fairly simple patch, with a test case that hits the new code.
- String changes made/needed: None
Comment 38•5 years ago
Comment on attachment 9094691 [details]
Bug 1581898: Move telephone-event to the back of the codec list, because webrtc.org crashes if we don't. r?mjf
Fix for WebRTC crash, let's uplift for beta 13.
Comment 39•5 years ago
bugherder uplift
Comment 40•5 years ago
Comment on attachment 9094690 [details]
Bug 1581898: Add test-case that moves payload type 101 to the front. r?jib
Let's also uplift the test.
Reporter
Comment 41•5 years ago
Hi
We have confirmed that the fix works well with our application (tested with nightly 71.0a1 (2019-10-04) (64-bit))
Thank you everyone for the support.
Comment 42•5 years ago
bugherder uplift