Crash in [@ RtlInitializeCriticalSectionAndSpinCount | InitializeCriticalSectionAndSpinCount]
Categories
(Core :: Memory Allocator, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox83 | --- | wontfix |
firefox84 | --- | wontfix |
firefox85 | --- | fixed |
People
(Reporter: gsvelto, Assigned: away)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/8afae343-b8b9-47ea-96b4-43d230201202
Reason: EXCEPTION_ACCESS_VIOLATION_WRITE
Top 10 frames of crashing thread:
0 ntdll.dll RtlInitializeCriticalSectionAndSpinCount
1 kernelbase.dll InitializeCriticalSectionAndSpinCount
2 mozglue.dll arena_t::arena_t memory/build/mozjemalloc.cpp:3517
3 mozglue.dll ArenaCollection::CreateArena memory/build/mozjemalloc.cpp:3601
4 mozglue.dll moz_create_arena_with_params memory/build/malloc_decls.h:115
5 xul.dll static mozilla::dom::DocGroup::Create dom/base/DocGroup.cpp:127
6 xul.dll mozilla::dom::BrowsingContextGroup::AddDocument docshell/base/BrowsingContextGroup.cpp:327
7 xul.dll mozilla::dom::Document::SetScopeObject dom/base/Document.cpp:6913
8 xul.dll mozilla::dom::Document::SetScriptGlobalObject dom/base/Document.cpp:7093
9 xul.dll nsGlobalWindowOuter::SetNewDocument dom/base/nsGlobalWindowOuter.cpp:2395
This crash is hitting this assertion
The crash volume is low on crashstats but there's a corresponding crash I found on the Microsoft health dashboards with an alarmingly high volume: between 1k and 4k per day crashes depending on the days. The (very poor) stack trace is the following there:
0 ntdll RtlInitializeCriticalSectionAndSpinCount 0x1C
1 kernelbase InitializeCriticalSectionAndSpinCount 0xB
2 mozglue arena_t::arena_t 0x22
3 mozglue ArenaCollection::CreateArena 0x5D
4 mozglue je_malloc 0x5A0
5 mozglue moz_xmalloc 0xD
6 firefox std::basic_string_wchar_t,std::char_traits_wchar_t_,std::allocator_wchar_t_ _::_Reallocate_for_`lambda at /builds/worker/checkouts/gecko/vs2017_15.8.4/VC/include/xstring:2668:35',const wchar_t *_ 0x8D
7 firefox _GLOBAL__sub_I_permissionsService.cpp 0x39
8 ucrtbase _initterm 0x43
9 firefox __scrt_common_main_seh 0x81
Reporter | ||
Comment 1•3 years ago
|
||
One particularly worrying aspect is that Microsoft documentation states that InitializeCriticalSectionAndSpinCount()
always succeeds in recent versions of Windows making this extremely puzzling.
Reporter | ||
Comment 2•3 years ago
|
||
Alright, I figured it out by cracking open a minidump:
- This allocation of an
arena_t
object is fallible and is failing, the relevant pointer in the minidump isNULL
- However we're still calling the
arena_t::arena_t
constructor (which I find odd, but the intricacies of C++ object construction have often eluded me) - The failing statement in these crashes is the first one in the constructor which is why we appear to be crashing in the assertion, but we're not.
InitializeCriticalSectionAndSpinCount()
is just being called on aNULL
pointer
(In reply to Gabriele Svelto [:gsvelto] from comment #2)
- However we're still calling the
arena_t::arena_t
constructor (which I find odd, but the intricacies of C++ object construction have often eluded me)
I think that is the key.
Based on https://godbolt.org/z/fn1h5z it looks like clang skips the null check if the allocation function doesn't have noexcept
, and our Windows builds don't have noexcept
because clang-cl sets _MSC_VER
.
Perhaps that _MSC_VER
condition dates back to a time when it meant actual MSVC and clang-cl wasn't something we thought about. In the same godbolt link above, it seems that MSVC doesn't care whether you set noexcept, it'll do the null check either way. I suppose it could be argued that this is a failure of clang-cl to be MSVC-compatible, and I'm happy to file it upstream if y'all agree, but in the meantime I think the easiest fix is to remove the ifdef on our side.
Confirmed by disassembling a try build that it did the trick: https://treeherder.mozilla.org/jobs?repo=try&revision=5a88503d62b9f0e0ff971d2562af80c83cbc32f2
If the arena is null, the code will now fall into the error logging code, in which I happened to notice these famous last words:
// In practice, this is an extremely unlikely failure.
In clang-cl builds, thanks to clang-cl's defining of _MSC_VER
, this function was not marked noexcept
. This led clang to believe that it could call arena_t
's constructor without checking for null.
I suppose we could scope the condition down to real MSVC, but since we don't support that anymore, we can just stop checking.
Updated•3 years ago
|
I looked for other code that may have this pattern but mozjemalloc.cpp seems to be the only one: https://searchfox.org/mozilla-central/search?q=operator.*fallible&path=&case=true®exp=true (SharedBuffer.h is using the global operator new
from cxxalloc.h, which is already noexcept
)
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3239d2c504d3 Use noexcept on arena_t's fallible allocator, even on Windows r=glandium
I filed https://bugs.llvm.org/show_bug.cgi?id=48463 upstream.
Comment 9•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•