Closed Bug 1443032 Opened 2 years ago Closed 2 years ago

Crash on SCTP shutdown | crash in sctp_setopt

Categories

(Core :: WebRTC: Networking, defect, P2, critical)

58 Branch
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- fixed
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: jeremy.laine, Assigned: drno)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20180219150722

Steps to reproduce:

1/ Establish an RTCPeerConnection with a data channel (no media streams are needed, it's definitely the data channel causing the issue)

In the scenario I am testing, Firefox is sending the initial offer, not sure whether it matters.

2/ The remote party sends an SCTP Shutdown chunk

3/ Firefox responds with an SCTP Shutdown ACK

4/ The remote party sends an SCTP Shutdown Complete


Actual results:

Firefox crashes, see:

bp-4350fcbe-270c-4576-b7d9-1319a0180304

I am able to reproduce this 100%, using a python-based implementation of WebRTC:

https://github.com/jlaine/aiortc/tree/master/examples/server

All that is needed is to run the example server, point Firefox to http://localhost:8080 and when you want to trigger the crash, stop the server (CTRL-C).

NOTE : the crash does not occur if I send Firefox an Abort chunk instead of a Shutdown.
Severity: normal → critical
Crash Signature: [@ libxul.so@0xe2829d | libxul.so@0xe354f5 | libxul.so@0xe39984 | libxul.so@0xe3d27f | libc-2.27.so@0x104695 ]
Has Regression Range: --- → irrelevant
Has STR: --- → yes
Component: Untriaged → WebRTC
Keywords: crash
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
Summary: Crash on SCTP shutdown → Crash on SCTP shutdown | crash in libxul.so@0xe2829d | libxul.so@0xe354f5 | libxul.so@0xe39984 | libxul.so@0xe3d27f | libc-2.27.so@0x104695
(In reply to Jeremy Lainé from comment #0)
> All that is needed is to run the example server, point Firefox to
> http://localhost:8080 and when you want to trigger the crash, stop the
> server (CTRL-C).

I think this is what should be under "steps to reproduce".


I got your library to build and it appears to work, sending video and all.

But I cannot reproduce a crash on either 58, 59 or 60.

Please elaborate on the STR or provide an updated example page for reproduction.

Your crash report also doesn't seem to have resolved any symbols, so there's not much we can gather from it.
Flags: needinfo?(jeremy.laine)
Sorry about the incomplete backtrace, here's a better one:

bp-66a8904a-4e20-4837-a492-679450180305

I have created a branch of aiortc which better reproduces the problem:

https://github.com/jlaine/aiortc/tree/firefox-issue-1443032

Just point your Firefox at http://localhost:8080 and if it doesn't crash immediately, reload the page for another try.
Flags: needinfo?(jeremy.laine)
Thanks! That made it crash for me.

58: bp-ae92a160-cb00-447b-9744-2b8a00180305
59: bp-6b5ad665-40e2-4da2-8846-0f04a0180305
60: bp-cd5d8d1a-6836-4c5e-9b70-cd3460180305

I wasn't able to test 52 as the test page didn't seem to work. Didn't look into why.

Nils, could you check how you want to prioritize this? P2 for crash.
Status: UNCONFIRMED → NEW
Crash Signature: [@ libxul.so@0xe2829d | libxul.so@0xe354f5 | libxul.so@0xe39984 | libxul.so@0xe3d27f | libc-2.27.so@0x104695 ] → [@ sctp_setopt ]
Rank: 15
Component: WebRTC → WebRTC: Networking
Ever confirmed: true
Flags: needinfo?(drno)
Priority: -- → P2
Summary: Crash on SCTP shutdown | crash in libxul.so@0xe2829d | libxul.so@0xe354f5 | libxul.so@0xe39984 | libxul.so@0xe3d27f | libc-2.27.so@0x104695 → Crash on SCTP shutdown | crash in sctp_setopt
I think P2 is a reasonable priority value for now.

I guess the fact that we attempt to do a stream reset here https://hg.mozilla.org/mozilla-central/annotate/51200c0fdaddb2749549a82596da5323a4cbd499/netwerk/sctp/datachannel/DataChannel.cpp#l3075
is causing the problem as the channel has been shutdown already. Lennart what do you think as a SCTP expert?
Flags: needinfo?(drno) → needinfo?(lennart.grahl)
First of all, you're awesome Jeremy! I always wanted to write a complete WebRTC stack from scratch for Python's asyncio but never had the time to do it, so I envy you. Please contact me if you're interested in some interop testing (https://gitter.im/rawrtc/Lobby).

(In reply to Nils Ohlmeier [:drno] from comment #4)
> I guess the fact that we attempt to do a stream reset here
> https://hg.mozilla.org/mozilla-central/annotate/
> 51200c0fdaddb2749549a82596da5323a4cbd499/netwerk/sctp/datachannel/
> DataChannel.cpp#l3075
> is causing the problem as the channel has been shutdown already. Lennart
> what do you think as a SCTP expert?

I haven't touched that code section, so Randell might be of more help. Still, if `mMasterSocket` is not a dangling pointer, the call to `usrsctp_setsockopt` should not produce a crash.
Flags: needinfo?(lennart.grahl) → needinfo?(rjesup)
I was wondering if that size calculation is correct: https://hg.mozilla.org/mozilla-central/annotate/51200c0fdaddb2749549a82596da5323a4cbd499/netwerk/sctp/datachannel/DataChannel.cpp#l2068
But it seems to result in the same value as `sizeof(struct sctp_reset_streams) + sizeof(uint16_t) * mStreamsResetting.Length()`. Still, I would propose changing that because the struct could be padded. (I don't expect this being the cause here.)
It could be that this is caused by aiortc not announcing the stream reconfiguration extension but still using it. I've asked Jeremy to write a data channel only variant to be able to test it against RAWRTC. If that crashes too, it's likely a bug in usrsctp.
A minimal datachannels-only example for use with copy-and-paste signaling is available here:

https://github.com/jlaine/aiortc/blob/master/examples/datachannel-cli/cli.py
I believe I found the cause: The `SCTP_SHUTDOWN_COMP` event shuts down the association gracefully, then this (https://hg.mozilla.org/mozilla-central/annotate/51200c0fdaddb2749549a82596da5323a4cbd499/netwerk/sctp/datachannel/DataChannel.cpp#l2315) line is being executed before the streams are being reset since the `CloseAll()` call is delayed. I added a short log message after that line and this is the logging output I see:

> [12745:Socket Thread]: D/DataChannel Shutdown event.
> [12745:Socket Thread]: D/DataChannel Association change: SCTP_SHUTDOWN_COMP
> [12745:Socket Thread]: D/DataChannel Association change: streams (in/out) = (256/256)
> [12745:Socket Thread]: D/DataChannel ReceiveCallback: SCTP has finished shutting down       <--
> [12745:Main Thread]: D/DataChannel Closing all channels (connection 0x7f2e297d0400)
> [12745:Main Thread]: D/DataChannel Connection 0x7f2e297d0400/Channel 0x7f2e27efe2f0: Closing stream 3
> [12745:Main Thread]: D/DataChannel Connection 0x7f2e297d0400: Resetting outgoing stream 3
> [12745:Main Thread]: D/DataChannel Destroying Data channel 3
> [12745:Main Thread]: D/DataChannel Connection 0x7f2e297d0400: Sending outgoing stream reset for 2 streams

The marked log line will not appear in case the association is shut down by an ABORT chunk (which is what the majority of implementations uses to tear down the association).

The socket is being used after it has been free'd. I would suggest removing that `usrsctp_close` call since `DataChannelConnection::Destroy()` will take care of closing the socket eventually (and that btw. is likely going to fail as well since the socket is a dangling pointer after that close call). Randell, what do you think?
Comment on attachment 8960519 [details]
Bug 1443032: stop closing usrsctp on callback.

https://reviewboard.mozilla.org/r/229260/#review235106
Attachment #8960519 - Flags: review?(rjesup) → review+
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/0faa644eb81c
stop closing usrsctp on callback. r=jesup
Cool, will pick up a nightly build and check it out!
https://hg.mozilla.org/mozilla-central/rev/0faa644eb81c
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
I want to point out that this is a game breaking bug if we would suddenly have more implementations shutting SCTP associations down gracefully.
I cannot reproduce the crash using this nightly : 61.0a1 (2018-03-22), does it contain the fix?
(In reply to Jeremy Lainé from comment #16)
> I cannot reproduce the crash using this nightly : 61.0a1 (2018-03-22), does
> it contain the fix?

Yes, it should have the fix
Flags: needinfo?(rjesup)
Does this need an uplift request or can it ride the trains?
Flags: needinfo?(drno)
Since neither Chrome or Firefox currently terminate connections like this and the crash rate is really low I would let it ride the trains.
Flags: needinfo?(drno)
I don't mind that it doesn't land in 60 but if this doesn't land in next ESR we might end up regretting it since graceful SCTP shutdown should be the norm and not the exception. And this bug triggers a crash every single time the association is being shut down gracefully. It's just a coincidence every other implementation (other than aiortc) does a non-graceful shutdown right now.
Nils, what do you think about uplifting this to ESR60?
Flags: needinfo?(drno)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)
> Nils, what do you think about uplifting this to ESR60?

Since we need to support ESR60 for such a long time that is probably a good idea.
Flags: needinfo?(drno)
Comment on attachment 8960519 [details]
Bug 1443032: stop closing usrsctp on callback.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: It fixes a low frequency crasher when using WebRTC data channels.

User impact if declined: Potential content process crashes.
Fix Landed on Version: 61
Risk to taking this patch (and alternatives if risky): Low. We have not received any regression reports. The crashes have stopped after landing this patch. And the reporter was able to verify that it fixes the issue in his test setup.
String or UUID changes made by this patch: N/A

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8960519 - Flags: approval-mozilla-esr60?
Comment on attachment 8960519 [details]
Bug 1443032: stop closing usrsctp on callback.

Wallpaper a WebRTC crash. Approved for ESR 60.1.
Attachment #8960519 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
You need to log in before you can comment on or make changes to this bug.