Crash in [@ arena_t::MallocSmall | arena_t::Malloc | BaseAllocator::malloc | MozJemalloc::malloc]
Categories
(Core :: Memory Allocator, defect)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox122 | --- | affected |
People
(Reporter: release-mgmt-account-bot, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, topcrash)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/c9cddd00-2acb-4ee8-a4ec-6a9790231104
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(run->mMagic == 0x384adf93)
Top 10 frames of crashing thread:
0 firefox-bin arena_t::MallocSmall memory/build/mozjemalloc.cpp:3296
0 firefox-bin arena_t::Malloc memory/build/mozjemalloc.cpp:3344
0 firefox-bin BaseAllocator::malloc memory/build/mozjemalloc.cpp:4564
0 firefox-bin MozJemalloc::malloc memory/build/malloc_decls.h:51
0 firefox-bin PageMalloc memory/build/PHC.cpp:1309
0 firefox-bin MozJemallocPHC::malloc memory/build/PHC.cpp:1313
0 firefox-bin ReplaceMalloc::malloc memory/build/malloc_decls.h:51
0 firefox-bin malloc memory/build/malloc_decls.h:51
0 firefox-bin moz_xmalloc memory/mozalloc/mozalloc.cpp:52
1 libxul.so operator new memory/mozalloc/cxxalloc.h:33
By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:
- First crash report: 2023-10-25
- Process type: Multiple distinct types
- Is startup crash: No
- Has user comments: No
- Is null crash: Yes - 3 out of 4 crashes happened on null or near null memory address
Updated•2 years ago
Comment 1•2 years ago
How does this make sense? To get a crash address of 0x0 like in the linked crash report, in the test "run->mMagic == ARENA_RUN_MAGIC", you'd need run to be null. Except three lines above we have this:
if (MOZ_UNLIKELY(!run)) {
  return nullptr;
}
IOW, a null run should have returned.
Comment 2•2 years ago
Duh, the crash address comes from MOZ_DIAGNOSTIC_ASSERT itself and is not relevant. Since this is happening during allocation, the run is one the allocator picked itself, so this can't be a case where the address we have is not actually that of a run. So what this means is that some other code wrote over the magic number via a buffer overflow...
Comment 3•2 years ago
There are three different crash reasons under this signature, with the first one being by far the most common:
- MOZ_RELEASE_ASSERT(mNode)
- MOZ_DIAGNOSTIC_ASSERT(run->mMagic == 0x384adf93)
- MOZ_DIAGNOSTIC_ASSERT(run->mNumFree > 0)
Cracking open minidumps might tell us what those values are, and if they're caused by bit-flips or a real problem. I'm NI?ing myself to do it when I have some free time.
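As a rough illustration of the bit-flip check mentioned here, the following Python sketch compares a value read from a minidump against the expected magic number. The helper names, the sample values, and the "at most two differing bits" threshold are all assumptions made for illustration; only the constant 0x384adf93 comes from the assertion above.

```python
ARENA_RUN_MAGIC = 0x384ADF93  # expected value, taken from the assertion text


def flipped_bits(observed: int, expected: int = ARENA_RUN_MAGIC) -> int:
    """Count the bit positions where two 32-bit values differ."""
    return bin((observed ^ expected) & 0xFFFFFFFF).count("1")


def classify(observed: int) -> str:
    """Rough triage; the threshold of two bits is an arbitrary assumption."""
    differing = flipped_bits(observed)
    if differing == 0:
        return "intact"
    if differing <= 2:
        return "possible bit-flip"
    return "likely overwritten"


# Hypothetical observed values, NOT taken from any real minidump.
for value in (ARENA_RUN_MAGIC, ARENA_RUN_MAGIC ^ (1 << 7), 0x00000000):
    print(f"{value:#010x}: {classify(value)}")
```

A value that differs from the magic in one or two bits would point towards bad hardware, while a completely different value would be more consistent with another piece of code writing over the run header.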
Comment 4•2 years ago
No luck here: the crash happens within deeply inlined code, so it's very hard to recover the values of the variables. I'll try to manually look at the disassembly and see if I can figure something out from it, but I make no promises.
Updated•1 year ago
Comment 5•9 days ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on release
:pbone, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 6•9 days ago
I was looking at these crashes with Jens and I noticed that several of the crashes have two threads in the memory allocator at the same time, see for example a5a84af4-a76f-400a-b39a-fe6ab0251210. The crashing thread is the IPC I/O Child thread doing a malloc() and the main thread is also doing a malloc() (but blocked on a lock). That's a pretty strong hint there might be a problem in the allocator itself.
Comment 7•7 days ago
Only looking at the majority of the volume here, these crashes come from Trend Micro users with MOZ_RELEASE_ASSERT(mNode) as a crash reason. This shows up in the list of loaded modules, which contains Trend Micro DLLs.
Taking an example crash and disassembling at RIP shows:
0:001> u rip
mozglue!arena_t::MallocSmall+0x1c80 [/builds/worker/checkouts/gecko/memory/build/mozjemalloc.cpp @ 2746] [inlined in mozglue!moz_xmalloc+0x1dab [/builds/worker/checkouts/gecko/memory/mozalloc/mozalloc.cpp @ 52]]:
00007ffa`fdcb3ffb cc int 3
00007ffa`fdcb3ffc b9b9000000 mov ecx,0B9h
00007ffa`fdcb4001 e829450400 call mozglue!MOZ_NoReturn (00007ffa`fdcf852f)
0xB9 is the line value passed to MOZ_NoReturn(line) at this call site, i.e. 185 in decimal, so we are at line 185 in RedBlackTree.h, in [@ RedBlackTree<T>::TreeNode::SetColor]. Hence, this is a variation of bug 1872261. I'm not sure why the inlining info seems confused here and the signature changed (perhaps this part is worth its own investigation), but the assembly code leaves no doubt about this fact.
So... Either we broke something in our blocklist code ourselves, or Trend Micro successfully pushed a bypass to our blocklist code without addressing the underlying issue that caused them to be blocked in the first place.
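For readers who want to double-check the line number recovered from the disassembly, here is a small Python sketch that decodes the instruction bytes b9b9000000 shown in the WinDbg output. The function name is made up for illustration. The first 0xB9 byte is the x86 opcode for mov ecx, imm32, and the remaining four bytes are the little-endian immediate.

```python
import struct

MOV_ECX_IMM32 = 0xB9  # x86 opcode byte for "mov ecx, imm32"


def decode_mov_ecx_imm32(hex_bytes: str) -> int:
    """Return the 32-bit immediate of a 5-byte 'mov ecx, imm32' encoding."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 5 or raw[0] != MOV_ECX_IMM32:
        raise ValueError("not a 5-byte mov ecx, imm32 encoding")
    # The immediate is stored little-endian right after the opcode byte.
    (immediate,) = struct.unpack("<I", raw[1:])
    return immediate


line = decode_mov_ecx_imm32("b9b9000000")
print(line)  # 185
```

The opcode byte and the low byte of the immediate happen to be the same value here, which is a coincidence of this particular line number.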
Updated•7 days ago
Comment 8•7 days ago
Never mind, I'll file a new bug for the Trend Micro part since this bug was not originally about that. Sorry.
Comment 9•2 days ago
(In reply to Yannis Juglaret [:yannis] from comment #7)
> Only looking at the majority of the volume here
:jstutte wanted more precise numbers about this, so here they are. Over the last six months, we have received 1353 Firefox crashes on this specific signature. Out of those 1353, 1208 show the presence of a Trend Micro DLL, so 89% overall. But the crashes with Trend Micro DLLs only started in release 143.0 (which matches the discovery of bug 1872261). Interestingly, there are no crashes with Trend Micro DLLs in releases 144.0 (the first version with the uplifted patch from bug 1872261) and 144.0.2. But they are back in 145.0, 145.0.1, and 145.0.2. The prevalence is particularly high for 145.0.2, where we're at 721 out of 729, so 99% of crashes have Trend Micro DLLs.
Below are versions of Firefox for which we received crashes with Trend Micro DLLs, ordered by volume:
Total: 1208
145.0.2: 721
143.0.1: 188
145.0: 130
145.0.1: 113
143.0: 12
144.0b4: 7
143.0b4: 6
143.0b5: 5
143.0b9: 5
143.0b7: 4
143.0rc1: 3
143.0.3: 2
143.0b6: 2
144.0b2: 2
144.0b3: 2
145.0rc2: 2
143.0b3: 1
144.0b5: 1
145.0b1: 1
145.0b2: 1
And the same for crashes without Trend Micro DLLs:
Total: 145
128.13.0esr: 39
121.0.1: 14
145.0.1: 9
145.0.2: 8
143.0.1: 6
140.0.2: 5
142.0.1: 5
140.0.4: 4
141.0.3: 4
145.0: 4
128.14.0esr: 3
136.0b3: 3
140.4.0esr: 3
144.0.2: 3
127.0: 2
128.12.0esr: 2
139.0.4: 2
142.0: 2
143.0.4: 2
144.0b5: 2
146.0b0: 2
121.0: 1
127.0a1: 1
128.11.0esr: 1
128.5.1esr: 1
128.6.0esr: 1
128.7.0esr: 1
129.0.2: 1
130.0: 1
132.0.2: 1
135.0b1: 1
140.5.0esr: 1
140.6.0esr: 1
141.0.2: 1
142.0b3: 1
143.0a1: 1
143.0b0: 1
143.0b3: 1
144.0b1: 1
146.0: 1
146.0a1: 1
146.0b6: 1
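The percentages quoted in comment 9 can be cross-checked against the two lists above with a few lines of Python. The variable and function names are illustrative only; the counts are copied from the comment.

```python
# Counts copied from comment 9.
total_crashes = 1353
with_trend_micro = 1208
without_trend_micro = 145

# Counts for 145.0.2 from the two per-version lists.
v145_0_2_with = 721
v145_0_2_without = 8


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# The two list totals should add up to the overall crash count.
assert with_trend_micro + without_trend_micro == total_crashes

overall_share = percent(with_trend_micro, total_crashes)
v145_0_2_share = percent(v145_0_2_with, v145_0_2_with + v145_0_2_without)
print(f"overall: {overall_share}%")
print(f"145.0.2: {v145_0_2_share}%")
```

Both the overall split (1208 + 145 = 1353) and the 145.0.2 split (721 + 8 = 729) are consistent with the numbers given in the comment.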