Closed
Bug 1439236
Opened 7 years ago
Closed 7 years ago
Crash in m_copym
Categories
(Core :: WebRTC: Networking, defect, P2)
Core
WebRTC: Networking
Tracking
()
People
(Reporter: philipp, Assigned: drno)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
dminor
:
review+
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr60+
|
Details |
This bug was filed from the Socorro interface and is
report bp-4d2fe8c8-b660-4bfb-8894-64ede0180211.
=============================================================
Top 10 frames of crashing thread:
0 xul.dll m_copym netwerk/sctp/src/user_mbuf.c:996
1 xul.dll sctp_copy_mbufchain netwerk/sctp/src/netinet/sctp_output.c:6898
2 xul.dll sctp_med_chunk_output netwerk/sctp/src/netinet/sctp_output.c:9072
3 xul.dll sctp_chunk_output netwerk/sctp/src/netinet/sctp_output.c:10490
4 xul.dll sctp_common_input_processing netwerk/sctp/src/netinet/sctp_input.c:6130
5 xul.dll usrsctp_conninput netwerk/sctp/src/user_socket.c:3237
6 xul.dll mozilla::DataChannelConnection::SctpDtlsInput netwerk/sctp/datachannel/DataChannel.cpp:815
7 xul.dll sigslot::signal3<mozilla::TransportLayer*, unsigned char const*, unsigned int, sigslot::single_threaded>::operator media/mtransport/sigslot.h:2486
8 xul.dll mozilla::TransportFlow::PacketReceived media/mtransport/transportflow.cpp:252
9 xul.dll sigslot::signal3<mozilla::TransportLayer*, unsigned char const*, unsigned int, sigslot::single_threaded>::operator media/mtransport/sigslot.h:2486
=============================================================
this is a cross-platform crash signature, happening in the content process most of the time.
Comment 1•7 years ago
|
||
Byron, is this something you could take a look at?
It's weird. Starting on Dec 27th across seemingly all versions. Could the start of this correlate to the release of a Firefox or chrome version? I'm thinking something at the remote side could be triggering it.
Rank: 12
Flags: needinfo?(docfaraday)
Priority: -- → P2
Reporter | ||
Comment 2•7 years ago
|
||
the date in december is a red herring in this case, because in bug 1427111 we've purged all crash reports recorded before that date.
this crash signature would have been around for longer than that.
Comment 3•7 years ago
|
||
Looks like a nullptr crash to me at first glance. I can try to give it a closer look, but I need to spend some time digging us out from underneath all the oranges...
Comment 4•7 years ago
|
||
Feel free to give it to someone else that knows the area and who is not on PTO :-)
Reporter | ||
Comment 5•7 years ago
|
||
the crash signature here is spiking up after 59 has gone to the release population - it would be great if there was a fix in time for version 60 at least.
status-firefox61:
--- → affected
tracking-firefox60:
--- → ?
Comment 6•7 years ago
|
||
So this is caused when we have a sctp_association.send_queue that contains a sctp_tmit_chunk where sctp_tmit_chunk.data is null. I see at least one place where this can happen (and this place also checks whether |data| is null to begin with, so it may be expected that sometimes |data| will be null while the chunk is in the send_queue).
Iterating over |send_queue|: https://searchfox.org/mozilla-central/source/netwerk/sctp/src/netinet/sctp_output.c#6712
Checking if |data| is null: https://searchfox.org/mozilla-central/source/netwerk/sctp/src/netinet/sctp_output.c#6716
Releasing/nulling |data| without removing |chk| from |send_queue|: https://searchfox.org/mozilla-central/source/netwerk/sctp/src/netinet/sctp_output.c#6723
This hints to me that we just need to add a check on |data|, maybe here: https://searchfox.org/mozilla-central/source/netwerk/sctp/src/netinet/sctp_output.c#7040
What do you think?
Flags: needinfo?(docfaraday) → needinfo?(lennart.grahl)
Comment 7•7 years ago
|
||
I need to pass this along to Michael.
Btw. this is what has changed since our last usrsctp update: https://github.com/sctplab/usrsctp/compare/0e076261b832121cf120ddc04aaff87ac3a34d30...HEAD
Flags: needinfo?(lennart.grahl) → needinfo?(tuexen)
Updated•7 years ago
|
Comment 8•7 years ago
|
||
I'll take a look and try to figure out if I can reproduce and fix the issue.
Flags: needinfo?(tuexen)
Comment 9•7 years ago
|
||
This issue has also been reported by Chrome. However, my analysis given in https://github.com/sctplab/usrsctp/issues/160 is still valid...
Does FF provide some API how we can get some additional information when this problem occurs?
Comment 10•7 years ago
|
||
(In reply to Michael Tüxen from comment #9)
> This issue has also been reported by Chrome. However, my analysis given in
> https://github.com/sctplab/usrsctp/issues/160 is still valid...
I guess I can try looking through the code and seeing if there's anywhere else that has this bug.
> Does FF provide some API how we can get some additional information when
> this problem occurs?
I am not aware of any Firefox API that would give you more info when a third-party library crashes than what you get already with crash-stats.
Comment 11•7 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #10)
> (In reply to Michael Tüxen from comment #9)
> > This issue has also been reported by Chrome. However, my analysis given in
> > https://github.com/sctplab/usrsctp/issues/160 is still valid...
>
> I guess I can try looking through the code and seeing if there's anywhere
> else that has this bug.
I don't understand the above sentence. But it seems to be a problem showing up
in different environments... So I would say it is a bug in usrsctp. However,
the data pointer SHOULD NOT be NULL when the chunk is in the send queue. It
MAY be null, if the chunk is in the sent queue.
Please note that there is a know bug (we are working on it) when you use the
buffer priority policy, but this is NOT used within WebRTC.
So if you find a place where the pointer is set to NULL for a chunk being
in the send queue and not using the buffer priority policy, I would like to
know, since might be the root cause.
I have not seen this bug in my testing...
>
> > Does FF provide some API how we can get some additional information when
> > this problem occurs?
>
> I am not aware of any Firefox API that would give you more info when a
> third-party library crashes than what you get already with crash-stats.
OK. Thanks.
I contacted someone for the github issue, who was able to provide access to
stack variable. Unfortunately, he can't provide any information about heap
variables...
Updated•7 years ago
|
Comment 12•7 years ago
|
||
Is anyone still looking at this? This crash is still around across all channels and platforms.
Flags: needinfo?(drno)
Comment 13•7 years ago
|
||
I can provide a workaround (avoiding the crash) but have no idea how the problem can happen,
so I can't really fix it. Should I provide a work around?
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Michael Tüxen from comment #13)
> I can provide a workaround (avoiding the crash) but have no idea how the
> problem can happen,
> so I can't really fix it. Should I provide a work around?
Yes if there is a workaround it might be worth trying it. Thanks
Flags: needinfo?(drno)
Updated•7 years ago
|
Flags: needinfo?(tuexen)
Comment 15•7 years ago
|
||
Please try:
https://github.com/sctplab/usrsctp/commit/d89882d04900c80860874b5eb389b3ed3a0bca3d
which avoids dereferencing a pointer, which is NULL. However, it does not fix the
real cause of the problem, since I have not been able to reproduce this.
This problem was also reported in https://github.com/sctplab/usrsctp/issues/160.
Flags: needinfo?(tuexen)
Comment 16•7 years ago
|
||
I have a couple of crash reports like this in Nightly from my own usage.
Most (if not all) of my crashes involve me dragging a file from macOS' finder to Nightly's content area. And it's not always reproducible, but it often reproduces after a couple of hours of usage.
I don't know if it helps, but here are my reports:
https://crash-stats.mozilla.com/report/index/d1133362-3abd-4565-9b27-d39df0180507
https://crash-stats.mozilla.com/report/index/8cba118b-b7d2-4f8f-9306-8ede00180427
https://crash-stats.mozilla.com/report/index/37ae14ed-9b28-406b-af81-421a10180507
I hope this helps.
Comment 17•7 years ago
|
||
Another one with the same signature:
https://crash-stats.mozilla.com/report/index/a8c15485-f614-4476-85d0-903160180507
Comment 18•7 years ago
|
||
Nils, can we get the upstream patch landed and see if it helps?
Flags: needinfo?(drno)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → drno
Flags: needinfo?(drno)
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8976268 -
Flags: review?(dminor)
Comment 20•7 years ago
|
||
mozreview-review |
Comment on attachment 8976268 [details]
Bug 1439236: exit early if m_copym gets called with null ptr
https://reviewboard.mozilla.org/r/244434/#review250672
::: commit-message-3c9d6:2
(Diff revision 1)
> +Bug 1439236: exit early if m_copym gets called with null ptr
> +
Please add a reference to the upstream commit in your commit message:
https://github.com/sctplab/usrsctp/commit/d89882d04900c80860874b5eb389b3ed3a0bca3d
Attachment #8976268 -
Flags: review?(dminor) → review+
Updated•7 years ago
|
status-firefox62:
--- → affected
status-firefox-esr60:
--- → fix-optional
Comment hidden (mozreview-request) |
Comment 22•7 years ago
|
||
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/2755492fe3d1
exit early if m_copym gets called with null ptr r=dminor
Comment 23•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 24•7 years ago
|
||
No crashes on Nightly since this landed. Worth an uplift request for Beta?
Flags: needinfo?(drno)
Assignee | ||
Comment 25•7 years ago
|
||
Comment on attachment 8976268 [details]
Bug 1439236: exit early if m_copym gets called with null ptr
Approval Request Comment
[Feature/Bug causing the regression]: Probably the sctp in bug 1297418
[User impact if declined]: Crashes in webrtc data channels
[Is this code covered by automated tests?]: The code in general yes, but not the crash.
[Has the fix been verified in Nightly?]: Since we don't know how repro not really. But crashstats in Nightly look good since the patch landed.
[Needs manual test from QE? If yes, steps to reproduce]: No, no known steps to repro
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: It only adds extra safety checks to catch/avoid null pointer crashes.
[String changes made/needed]: N/A
Flags: needinfo?(drno)
Attachment #8976268 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 26•7 years ago
|
||
I just discovered this one crash which didn't happen on the timer thread, but after receiving an incoming packet while trying to generate an outgoing packet https://crash-stats.mozilla.com/report/index/22c9bcf8-296d-4da2-9f7d-d182f0180518
Michael does this provide any more insight into how this nullptr can happen?
Flags: needinfo?(tuexen)
Comment 27•7 years ago
|
||
Comment on attachment 8976268 [details]
Bug 1439236: exit early if m_copym gets called with null ptr
Adds some extra safety checks to avoid webrtc crashes. Approved for 61.0b9 and ESR 60.1.
Attachment #8976268 -
Flags: approval-mozilla-esr60+
Attachment #8976268 -
Flags: approval-mozilla-beta?
Attachment #8976268 -
Flags: approval-mozilla-beta+
Comment 28•7 years ago
|
||
bugherder uplift |
Comment 29•7 years ago
|
||
bugherder uplift |
Comment 30•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #26)
> I just discovered this one crash which didn't happen on the timer thread,
> but after receiving an incoming packet while trying to generate an outgoing
> packet
> https://crash-stats.mozilla.com/report/index/22c9bcf8-296d-4da2-9f7d-
> d182f0180518
>
> Michael does this provide any more insight into how this nullptr can happen?
Unfortunately not... The cause of the problem is when the pointer becomes NULL. The problem gets visible, when the code hits the assumption that the pointer is not NULL.
Best regards
Michael
Flags: needinfo?(tuexen)
Updated•7 years ago
|
Keywords: regression
You need to log in
before you can comment on or make changes to this bug.
Description
•