Crash in [@ RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | sctp_inpcb_free | sctp_close]
Categories: Core :: WebRTC: Networking, defect
People: Reporter: gsvelto, Assigned: bwc
References: Blocks 1 open bug
Details: Keywords: crash, leave-open
Attachments: 1 file
Crash report: https://crash-stats.mozilla.org/report/index/054f6c00-b865-4941-b5ba-21ec80220623
Reason: EXCEPTION_ACCESS_VIOLATION_WRITE
Top 10 frames of crashing thread:
0 ntdll.dll RtlpWaitOnCriticalSection
1 ntdll.dll RtlpEnterCriticalSectionContended
2 ntdll.dll RtlEnterCriticalSection
3 xul.dll sctp_inpcb_free netwerk/sctp/src/netinet/sctp_pcb.c:3857
4 xul.dll sctp_close netwerk/sctp/src/netinet/sctp_usrreq.c:842
5 xul.dll sofree netwerk/sctp/src/user_socket.c:287
6 xul.dll mozilla::DataChannelConnection::DestroyOnSTS netwerk/sctp/datachannel/DataChannel.cpp:399
7 xul.dll mozilla::detail::runnable_args_base<mozilla::detail::NoResult>::Run dom/media/webrtc/transport/runnable_utils.h:41
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:465
9 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1202
It appears we're trying to lock a mutex that has been set to NULL. The crash seems to happen only on Windows, but bug 1775214 points to the same issue on Linux. This does not appear to be a new bug, but it was recently detected by clouseau due to a visible spike.
Comment 1•3 years ago (Reporter)
Alright, now that I can see the whole graph, this looks like a regression introduced in version 100. The volume here is non-trivial.
Comment 2•3 years ago (Reporter)
There are several more signatures for this, ouch. At least one of them is on Android, so this is indeed a problem that affects all platforms, albeit with different signatures.
Comment 3•2 years ago
Quite a few crashes on related signatures; there's some type of race here, though it doesn't seem security-sensitive.
Comment 4•2 years ago
You might want to update to the current version. There are still a couple of known issues, but more on the receive path, not sending or closing.
Comment 5•2 years ago
Shifting dependencies here slightly.
Comment 6•2 years ago
For the record, updating libusrsctp to the latest version didn't seem to affect this. Anything we can do to move this forward?
Comment 7•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit BugBot documentation.
Comment 8•2 years ago
Is there any more information than the stack traces?
Comment 9•2 years ago
Jesup, do you have any insight here? We're kind of stalled out.
Comment 10•2 years ago
No. I'll try to look more deeply. Michael, since updating didn't help, can you look to see what possible paths might lead to this?
Comment 11•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment hidden (Intermittent Failures Robot)
Comment 14•1 year ago
Clear a needinfo that is pending on an inactive user.
Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug may need to be closed as INCOMPLETE.
For more information, please visit BugBot documentation.
Comment hidden (Intermittent Failures Robot)
Comment 16•10 months ago (Reporter)
Cleaning up the signatures after bug 1895527 and adding a bunch of Android ones. The crash appears to still be valid and has significant volume; can we get someone to look into it?
Comment hidden (Intermittent Failures Robot)
Comment 18•9 months ago
Added a few more top Android signatures.
Comment 19•9 months ago (Assignee)
This might be related:
Comment 20•9 months ago (Assignee)
OK, not related, since we always build with that check enabled. The init and finish functions are pretty complicated, and might have holes when parts of init fail. Looking into it...
Comment 21•9 months ago (Assignee)
I've looked at this for a while, and while there are flaws in the init/deinit functions, I'm not seeing one that would cause this specific problem on Windows. I do see that there is no error checking for the initialization of this mutex/critical section, but the documentation for InitializeCriticalSection says that it is infallible on modern versions of Windows, and this happens pretty much only on Windows and Android. It might be that this documentation is wrong or misleading.
I did notice a couple of flaws in our code that muddy the waters somewhat, so I think I'll fix them and hope that we get some more clarity on what is going on here.
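To make the "holes when parts of init fail" concern concrete, here is a minimal sketch of a defensive init/finish pair in portable C. It is not usrsctp code: `struct stack_state`, `stack_init`, `stack_finish`, and `stack_try_lock` are hypothetical names, and POSIX threads stand in for the Windows critical section. The sketch checks the result of mutex initialization, rolls back on partial failure, and makes teardown idempotent so a later close never locks a mutex that was never set up or was already destroyed. It assumes a single thread drives init and finish; it does not cover a race where one thread runs finish while another is still locking, which would need ordering guarantees outside this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared state (not the real usrsctp structure). */
struct stack_state {
    pthread_mutex_t lock;
    bool lock_ready;   /* true only between a successful init and destroy */
    int *table;        /* stands in for another resource set up by init */
};

/* Tear down only what was actually set up. Safe after a partially
 * failed init and safe to call more than once. Not thread-safe. */
static void stack_finish(struct stack_state *s)
{
    assert(s != NULL);
    free(s->table);            /* free(NULL) is a no-op */
    s->table = NULL;
    if (s->lock_ready) {
        pthread_mutex_destroy(&s->lock);
        s->lock_ready = false;
    }
}

/* Returns 0 on success; on any failure, rolls back and returns -1. */
static int stack_init(struct stack_state *s, size_t table_len)
{
    assert(s != NULL);
    s->lock_ready = false;
    s->table = NULL;
    if (pthread_mutex_init(&s->lock, NULL) != 0)
        return -1;             /* check the result instead of assuming success */
    s->lock_ready = true;
    s->table = calloc(table_len, sizeof *s->table);
    if (s->table == NULL) {
        stack_finish(s);       /* undo the mutex init done above */
        return -1;
    }
    return 0;
}

/* Lock only if the mutex is currently initialized; returns true if locked. */
static bool stack_try_lock(struct stack_state *s)
{
    if (!s->lock_ready)
        return false;
    return pthread_mutex_lock(&s->lock) == 0;
}
```

The `lock_ready` flag is the key design choice: teardown and the lock path consult it instead of assuming that init ran to completion.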
Comment 22•9 months ago (Assignee)
Comment 23•9 months ago (Assignee)
Comment 24•9 months ago
Comment 25•9 months ago
bugherder
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 28•4 months ago (Reporter)
Added a new Android signature.
Comment 29•1 month ago
This continues to show up pretty high in the Android topcrash list. Any other thoughts as to what might be going on here?
Comment 30•1 month ago (Assignee)
I'm not seeing any new information that would shed light on this.