Closed Bug 1907382 Opened 1 year ago Closed 10 months ago

Switch zstd allocator to use jemalloc instead of malloc

Categories

(Core :: Networking: HTTP, defect, P3)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
137 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- wontfix
firefox128 --- wontfix
firefox129 --- wontfix
firefox130 --- wontfix
firefox131 --- wontfix
firefox136 --- wontfix
firefox137 --- fixed

People

(Reporter: RyanVM, Assigned: jesup)

References

(Regression)

Details

(Keywords: crash, csectype-nullptr, regression, Whiteboard: [necko-triaged][necko-priority-next])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/c6d1e42f-e747-492a-911b-d7c000240710

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames:

0  xul.dll  ZSTD_DCtx_setParameter(ZSTD_DCtx_s*, <unnamed-tag>, int)  third_party/zstd/lib/decompress/zstd_decompress.c:1906
0  xul.dll  mozilla::net::ZstdWrapper::ZstdWrapper()  netwerk/streamconv/converters/nsHTTPCompressConv.cpp:62
0  xul.dll  mozilla::MakeUnique()  mfbt/UniquePtr.h:606
0  xul.dll  mozilla::net::nsHTTPCompressConv::OnDataAvailable(nsIRequest*, nsIInputStream...  netwerk/streamconv/converters/nsHTTPCompressConv.cpp:694
1  xul.dll  mozilla::net::HttpChannelChild::DoOnDataAvailable(nsIRequest*, nsIInputStream...  netwerk/protocol/http/HttpChannelChild.cpp:812
1  xul.dll  mozilla::net::HttpChannelChild::OnTransportAndData(nsresult const&, nsresult ...  netwerk/protocol/http/HttpChannelChild.cpp:706
1  xul.dll  mozilla::net::HttpChannelChild::ProcessOnTransportAndData::<lambda_19>::opera...  netwerk/protocol/http/HttpChannelChild.cpp:631
1  xul.dll  std::invoke(mozilla::net::HttpChannelChild::ProcessOnTransportAndData::<lambd...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/type_traits:1729
1  xul.dll  std::_Func_impl_no_alloc<`lambda at /builds/worker/checkouts/gecko/netwerk/pr...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/functional:905
2  xul.dll  mozilla::net::ChannelEventQueue::FlushQueue()  netwerk/ipc/ChannelEventQueue.cpp:94

Set release status flags based on info from the regressing bug 1871963

:jesup, since you are the author of the regressor, bug 1871963, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Null-ptr deref.

Hard to see how this happens; we do mDStream = ZSTD_createDStream(); ZSTD_DCtx_setParameter(mDStream, ...). and setParameter just derefs mDStream.

Flags: needinfo?(rjesup)

The disassembly of the crash location indicates that rax is null (crash address 0x761C):
call ZSTD_createDCtx_internal (which is inlined from ZSTD_createDStream())
mov qword ptr [r12+28h],rax
->cmp dword ptr [rax+761Ch],0
This implies that createDStream somehow returned null. ZSTD_createDCtx_internal() can return NULL if ZSTD_customMalloc() returns null. But ZSTD_customMalloc() should devolve to malloc().

I can add a MOZ_RELEASE_ASSERT(mDStream); we'll still crash if it returns NULL, but it will be more obvious why if this is the cause. Note that the crash reports I looked at did not appear to be in OOM conditions, so I can't see how malloc() could fail.

Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]

:jesup, there is a patch pending review. Wondering it will make the Fx129 release?
We are in the final week of beta for Fx129

Flags: needinfo?(rjesup)

The patch probably just changes a nullptr crash into an ASSERT, but it would prove what's happening (which is hard to be 100% certain of due to the optimizer). As such, it's not critical to uplift. I've just asked for re-review

Flags: needinfo?(rjesup)
Keywords: leave-open
Pushed by rjesup@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f017fde22ad0 add belt-and-suspenders assertion for Zstd stream creation r=necko-reviewers,kershaw

Hi Randell, I found one report where we triggered the assertion:
https://crash-stats.mozilla.org/report/index/1c2d292c-d385-489f-a407-0531a0240909

It does seem the problem is an OOM. A lot of the crashes in zstd were for users with 4 Gb of RAM 😮

I'm not sure if using MakeUniqueFallible would work here
Maybe we need a static constructor function that returns an error if mDStream is null?

I'm also not sure what the user experience would look like if we didn't crash but failed to decode the resource, but it's probably better not to crash 🤷

Flags: needinfo?(rjesup)

I see 5 hits, all OOMs with ERROR_COMMITTMENT_LIMIT for the last error.
Avoiding OOM here will likely just OOM soon elsewhere. The structure is a bit over 64K long largely driven by https://searchfox.org/mozilla-central/source/third_party/zstd/lib/decompress/zstd_decompress_internal.h#197; if that's failing things are bad and we won't get much further

Flags: needinfo?(rjesup)

We could make all allocations from zstd be infallible; currently it should just be calling malloc(). We could define a custom allocator which would be jemalloc's malloc, and get a more 'normal' OOM. This might be good for other reasons.

Priority: P2 → P3
Summary: Crash in [@ ZSTD_DCtx_setParameter] → Switch zstd allocator to use jemalloc instead of malloc
Keywords: leave-open
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-new]
Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]
Crash Signature: [@ ZSTD_DCtx_setParameter] → [@ mozilla::net::ZstdWrapper::ZstdWrapper ] [@ ZSTD_DCtx_setParameter]

Randell, could we add a null check and return an error.
If there's an OOM, maybe it's OK for it to happen somewhere else so we can close the bug :)

Flags: needinfo?(rjesup)

Sure

Flags: needinfo?(rjesup)

Friendly ping for adding a null check (comment #12).
Thanks.

Flags: needinfo?(rjesup)
Blocks: 1942817
Flags: needinfo?(rjesup)
No longer blocks: 1942817
See Also: → 1942817

No more crashes for [@ mozilla::net::ZstdWrapper::ZstdWrapper ] since bug 1942817 landed.
Randell, should we leave this open or can we close it yet?

Flags: needinfo?(rjesup)
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Flags: needinfo?(rjesup)
Resolution: --- → FIXED
Depends on: 1942817
See Also: 1942817
Target Milestone: --- → 137 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: