Closed Bug 1263292 Opened 9 years ago Closed 9 years ago

Windows e10s jemalloc4 startup permacrash since bug 1235633

Categories

(Core :: Memory Allocator, defect, P3)

Unspecified
Windows
defect

Tracking

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---
firefox48 --- fixed

People

(Reporter: RyanVM, Assigned: billm)

References

Details

(Keywords: crash)

Attachments

(1 file)

This is with upstream jemalloc tip (includes the fix for bug 1261226). I bisected this down to the patch from bug 1235633 as the cause.
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd3e03fcb06b

I've confirmed that this only reproduces with jemalloc4 enabled and that both win32 and win64 builds are affected. The exact top frame of the stack varies from run to run, but it's always in arena dalloc functions.

Crash stack:
mozglue.dll!arena_run_dalloc(arena_s * arena, arena_run_s * run, bool dirty, bool cleaned, bool decommitted) Line 1907
mozglue.dll!arena_dalloc_large_locked_impl(arena_s * arena, arena_chunk_s * chunk, void * ptr, bool junked) Line 2815
mozglue.dll!je_arena_dalloc_large(tsd_s * tsd, arena_s * arena, arena_chunk_s * chunk, void * ptr) Line 2831
mozglue.dll!je_isqalloc(tsd_s * tsd, void * ptr, unsigned __int64 size, tcache_s * tcache) Line 1100
mozglue.dll!je_arena_ralloc(tsd_s * tsd, arena_s * arena, void * ptr, unsigned __int64 oldsize, unsigned __int64 size, unsigned __int64 alignment, bool zero, tcache_s * tcache) Line 3130
mozglue.dll!je_realloc(void * ptr, unsigned __int64 size) Line 1887
mozglue.dll!realloc_impl(void * ptr, unsigned __int64 size) Line 191
xul.dll!Buffer::try_realloc(unsigned __int64 newlength) Line 54
xul.dll!Buffer::assign(const char * bytes, unsigned __int64 length) Line 87
xul.dll!IPC::Channel::ChannelImpl::ProcessIncomingMessages(base::MessagePumpForIO::IOContext * context, unsigned long bytes_read) Line 430
xul.dll!IPC::Channel::ChannelImpl::OnIOCompleted(base::MessagePumpForIO::IOContext * context, unsigned long bytes_transfered, unsigned long error) Line 515
xul.dll!base::MessagePumpForIO::WaitForIOCompletion(unsigned long timeout, base::MessagePumpForIO::IOHandler * filter) Line 495
xul.dll!base::MessagePumpForIO::DoRunLoop() Line 439
xul.dll!base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate * delegate, base::MessagePumpWin::Dispatcher * dispatcher) Line 56
xul.dll!MessageLoop::RunHandler() Line 224
xul.dll!MessageLoop::Run() Line 204
xul.dll!base::Thread::ThreadMain() Line 177
xul.dll!ThreadEntry(void * arg) Line 256
How do I reproduce this?
Flags: needinfo?(ryanvm)
Add MOZ_JEMALLOC4=1 to your mozconfig and apply https://people.mozilla.org/~rvandermeulen/jemalloc so you don't crash in xpcshell.exe during packaging. Beyond that, just launching via |./mach run| should crash on startup.
Flags: needinfo?(ryanvm)
Does it happen with mozjemalloc?
Does it happen if you build with a non-updated jemalloc4 with https://hg.mozilla.org/mozilla-central/rev/0a14d675236e reverted?
What about a non-updated jemalloc4 with a cherry-pick of https://github.com/jemalloc/jemalloc/commit/4a8abbb400afe695f145a487380c04a946500bc6 ?
If you still have the instructions to get an allocation log, can you get one?

(In reply to Bill McCloskey (:billm) from comment #1)
> How do I reproduce this?

Set UPSTREAM_COMMIT to dev in memory/jemalloc/upstream.info, run memory/jemalloc/update.sh, then build with MOZ_JEMALLOC4=1 set.
From a debug build, but not entirely sure if it's relevant or not:

Assertion failure: mRawPtr != 0 (You can't dereference a NULL RefPtr with operator->().), at objdir-fx-64-debug\dist\include\mozilla/RefPtr.h:297
#01: D3DVsyncSource::D3DVsyncDisplay::VBlankLoop (gfx\thebes\gfxwindowsplatform.cpp:2783)
#02: RunnableMethod<D3DVsyncSource::D3DVsyncDisplay,void (__cdecl D3DVsyncSource::D3DVsyncDisplay::*)(void) __ptr64,mozilla::Tuple<> >::Run (ipc\chromium\src\base\task.h:290)
#03: MessageLoop::RunTask (ipc\chromium\src\base\message_loop.cc:350)
#04: MessageLoop::DeferOrRunPendingTask (ipc\chromium\src\base\message_loop.cc:360)
#05: MessageLoop::DoWork (ipc\chromium\src\base\message_loop.cc:444)
#06: base::MessagePumpDefault::Run (ipc\chromium\src\base\message_pump_default.cc:35)
#07: MessageLoop::RunHandler (ipc\chromium\src\base\message_loop.cc:224)
#08: MessageLoop::Run (ipc\chromium\src\base\message_loop.cc:204)
#09: base::Thread::ThreadMain (ipc\chromium\src\base\thread.cc:177)
#10: `anonymous namespace'::ThreadFunc (ipc\chromium\src\base\platform_thread_win.cc:27)
#11: BaseThreadInitThunk[C:\Windows\system32\KERNEL32.DLL +0x18102]
#12: RtlUserThreadStart[C:\Windows\SYSTEM32\ntdll.dll +0x5c5b4]
Does the DEBUG crash happen without my patch?
Flags: needinfo?(ryanvm)
(In reply to Mike Hommey [:glandium] from comment #3)
> Does it happen with mozjemalloc?

No. I originally confirmed it was a jemalloc4 issue by removing all jemalloc-related entries from my .mozconfig. I also left just |--enable-jemalloc| set without MOZ_JEMALLOC4=1 and couldn't reproduce.

> Does it happen if you build with a non-updated jemalloc4 with
> https://hg.mozilla.org/mozilla-central/rev/0a14d675236e reverted?

Crash

> What about a non-updated jemalloc4 with a cherry-pick of
> https://github.com/jemalloc/jemalloc/commit/4a8abbb400afe695f145a487380c04a946500bc6 ?

Crash

> If you still have the instructions to get an allocation log, can you get one?

I'm not sure how to get that from the content process. I set the env vars, but the log only covered the parent process.

(In reply to Bill McCloskey (:billm) from comment #5)
> Does the DEBUG crash happen without my patch?

It does not.
Flags: needinfo?(ryanvm)
BTW, these crashes are reproducible on Try as well. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0996368bb81&group_state=expanded&filter-searchStr=e10s Fun story, Marionette finished green even though the logs clearly show it also crashing. I've filed a bug for that little doozy too :).
tracking-e10s: --- → +
Priority: -- → P3
Attached patch: patch
In try_realloc I forgot to consider the case where newlength is 0. In that case we'll get null back from realloc and our buffer gets freed. We need to make sure that we set mReserved correctly in this case or else we'll crash.
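The zero-length case described above can be illustrated with a minimal sketch. This is not the attached patch: `SketchBuffer`, `mBuffer`, `mSize` and the accessors are hypothetical names introduced for illustration, and only `try_realloc`, `assign` and `mReserved` come from this bug. The sketch also assumes it is acceptable to handle a zero length explicitly with `free` instead of relying on what `realloc` returns for size 0, which differs between allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the IPC Buffer class discussed in this bug.
class SketchBuffer {
 public:
  SketchBuffer() = default;
  ~SketchBuffer() { std::free(mBuffer); }
  SketchBuffer(const SketchBuffer&) = delete;
  SketchBuffer& operator=(const SketchBuffer&) = delete;

  // Resize the backing store, keeping mReserved in sync with mBuffer.
  bool try_realloc(std::size_t newlength) {
    if (newlength == 0) {
      // Zero length: release the block and record that nothing is reserved.
      // Leaving mReserved at its old value here would describe freed memory.
      std::free(mBuffer);
      mBuffer = nullptr;
      mReserved = 0;
      return true;
    }
    char* buffer = static_cast<char*>(std::realloc(mBuffer, newlength));
    if (!buffer) {
      // Genuine failure: the old block is still valid, so leave the
      // bookkeeping untouched and let the caller decide what to do.
      return false;
    }
    mBuffer = buffer;
    mReserved = newlength;
    return true;
  }

  // Replace the contents with `length` bytes copied from `bytes`.
  bool assign(const char* bytes, std::size_t length) {
    if (!try_realloc(length) && mReserved < length) {
      return false;  // could not grow to the requested size
    }
    if (length != 0) {
      std::memcpy(mBuffer, bytes, length);
    }
    mSize = length;
    assert(mSize <= mReserved);
    return true;
  }

  const char* data() const { return mBuffer; }
  std::size_t size() const { return mSize; }
  std::size_t reserved() const { return mReserved; }

 private:
  char* mBuffer = nullptr;
  std::size_t mSize = 0;
  std::size_t mReserved = 0;
};
```

With this shape, assigning an empty message and then a non-empty one leaves the pointer and the reserved size consistent at every step, so a later `realloc` never sees a pointer that was already freed.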
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8741196 - Flags: review?(jld)
Comment on attachment 8741196 (patch)
Review of attachment 8741196:
Sorry for missing that the first time.
Attachment #8741196 - Flags: review?(jld) → review+
Comment on attachment 8741196 (patch)
Works great locally and on Try. Thanks!
Attachment #8741196 - Flags: feedback+
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48