Switch zstd allocator to use jemalloc instead of malloc
Categories
(Core :: Networking: HTTP, defect, P3)
Tracking
()
People
(Reporter: RyanVM, Assigned: jesup)
References
(Regression)
Details
(Keywords: crash, csectype-nullptr, regression, Whiteboard: [necko-triaged][necko-priority-next])
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/c6d1e42f-e747-492a-911b-d7c000240710
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames:
0 xul.dll ZSTD_DCtx_setParameter(ZSTD_DCtx_s*, <unnamed-tag>, int) third_party/zstd/lib/decompress/zstd_decompress.c:1906
0 xul.dll mozilla::net::ZstdWrapper::ZstdWrapper() netwerk/streamconv/converters/nsHTTPCompressConv.cpp:62
0 xul.dll mozilla::MakeUnique() mfbt/UniquePtr.h:606
0 xul.dll mozilla::net::nsHTTPCompressConv::OnDataAvailable(nsIRequest*, nsIInputStream... netwerk/streamconv/converters/nsHTTPCompressConv.cpp:694
1 xul.dll mozilla::net::HttpChannelChild::DoOnDataAvailable(nsIRequest*, nsIInputStream... netwerk/protocol/http/HttpChannelChild.cpp:812
1 xul.dll mozilla::net::HttpChannelChild::OnTransportAndData(nsresult const&, nsresult ... netwerk/protocol/http/HttpChannelChild.cpp:706
1 xul.dll mozilla::net::HttpChannelChild::ProcessOnTransportAndData::<lambda_19>::opera... netwerk/protocol/http/HttpChannelChild.cpp:631
1 xul.dll std::invoke(mozilla::net::HttpChannelChild::ProcessOnTransportAndData::<lambd... /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/type_traits:1729
1 xul.dll std::_Func_impl_no_alloc<`lambda at /builds/worker/checkouts/gecko/netwerk/pr... /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/functional:905
2 xul.dll mozilla::net::ChannelEventQueue::FlushQueue() netwerk/ipc/ChannelEventQueue.cpp:94
Comment 1•1 year ago
|
||
Set release status flags based on info from the regressing bug 1871963
:jesup, since you are the author of the regressor, bug 1871963, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
| Reporter | ||
Updated•1 year ago
|
| Assignee | ||
Comment 2•1 year ago
|
||
Null-ptr deref.
Hard to see how this happens; we do mDStream = ZSTD_createDStream(); ZSTD_DCtx_setParameter(mDStream, ...). and setParameter just derefs mDStream.
| Assignee | ||
Comment 3•1 year ago
|
||
The disassembly of the crash location indicates that rax is null (crash address 0x761C):
call ZSTD_createDCtx_internal (which is inlined from ZSTD_createDStream())
mov qword ptr [r12+28h],rax
->cmp dword ptr [rax+761Ch],0
This implies that createDStream somehow returned null. ZSTD_createDCtx_internal() can return NULL if ZSTD_customMalloc() returns null. But ZSTD_customMalloc() should devolve to malloc().
I can add a MOZ_RELEASE_ASSERT(mDStream); we'll still crash if it returns NULL, but it will be more obvious why if this is the cause. Note that the crash reports I looked at did not appear to be in OOM conditions, so I can't see how malloc() could fail.
| Assignee | ||
Comment 4•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 5•1 year ago
|
||
:jesup, there is a patch pending review. Wondering it will make the Fx129 release?
We are in the final week of beta for Fx129
| Assignee | ||
Comment 6•1 year ago
|
||
The patch probably just changes a nullptr crash into an ASSERT, but it would prove what's happening (which is hard to be 100% certain of due to the optimizer). As such, it's not critical to uplift. I've just asked for re-review
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
Comment 8•1 year ago
|
||
| bugherder | ||
| Reporter | ||
Updated•1 year ago
|
Comment 9•1 year ago
|
||
Hi Randell, I found one report where we triggered the assertion:
https://crash-stats.mozilla.org/report/index/1c2d292c-d385-489f-a407-0531a0240909
It does seem the problem is an OOM. A lot of the crashes in zstd were for users with 4 Gb of RAM 😮
I'm not sure if using MakeUniqueFallible would work here
Maybe we need a static constructor function that returns an error if mDStream is null?
I'm also not sure what the user experience would look like if we didn't crash but failed to decode the resource, but it's probably better not to crash 🤷
| Assignee | ||
Comment 10•1 year ago
|
||
I see 5 hits, all OOMs with ERROR_COMMITTMENT_LIMIT for the last error.
Avoiding OOM here will likely just OOM soon elsewhere. The structure is a bit over 64K long largely driven by https://searchfox.org/mozilla-central/source/third_party/zstd/lib/decompress/zstd_decompress_internal.h#197; if that's failing things are bad and we won't get much further
| Assignee | ||
Comment 11•1 year ago
|
||
We could make all allocations from zstd be infallible; currently it should just be calling malloc(). We could define a custom allocator which would be jemalloc's malloc, and get a more 'normal' OOM. This might be good for other reasons.
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 12•1 year ago
•
|
||
Randell, could we add a null check and return an error.
If there's an OOM, maybe it's OK for it to happen somewhere else so we can close the bug :)
Comment 14•1 year ago
|
||
Friendly ping for adding a null check (comment #12).
Thanks.
| Assignee | ||
Updated•1 year ago
|
| Reporter | ||
Updated•11 months ago
|
Updated•11 months ago
|
Comment 15•10 months ago
|
||
No more crashes for [@ mozilla::net::ZstdWrapper::ZstdWrapper ] since bug 1942817 landed.
Randell, should we leave this open or can we close it yet?
| Assignee | ||
Updated•10 months ago
|
| Reporter | ||
Updated•9 months ago
|
Description
•