Closed Bug 1263292 (opened 4 years ago, closed 4 years ago)

Windows e10s jemalloc4 startup permacrash since bug 1235633

Categories

(Core :: Memory Allocator, defect, P3)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---
firefox48 --- fixed

People

(Reporter: RyanVM, Assigned: billm)

References

Details

(Keywords: crash)

Attachments

(1 file)

This is with upstream jemalloc tip (includes the fix for bug 1261226). I bisected this down to the patch from bug 1235633 as the cause.
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd3e03fcb06b

I've confirmed that this only reproduces with jemalloc4 enabled and that both win32 and win64 builds are affected. The exact top frame of the stack varies from run to run, but it's always in arena dalloc functions.

Crash stack:
mozglue.dll!arena_run_dalloc(arena_s * arena, arena_run_s * run, bool dirty, bool cleaned, bool decommitted) Line 1907
mozglue.dll!arena_dalloc_large_locked_impl(arena_s * arena, arena_chunk_s * chunk, void * ptr, bool junked) Line 2815
mozglue.dll!je_arena_dalloc_large(tsd_s * tsd, arena_s * arena, arena_chunk_s * chunk, void * ptr) Line 2831
mozglue.dll!je_isqalloc(tsd_s * tsd, void * ptr, unsigned __int64 size, tcache_s * tcache) Line 1100
mozglue.dll!je_arena_ralloc(tsd_s * tsd, arena_s * arena, void * ptr, unsigned __int64 oldsize, unsigned __int64 size, unsigned __int64 alignment, bool zero, tcache_s * tcache) Line 3130
mozglue.dll!je_realloc(void * ptr, unsigned __int64 size) Line 1887
mozglue.dll!realloc_impl(void * ptr, unsigned __int64 size) Line 191
xul.dll!Buffer::try_realloc(unsigned __int64 newlength) Line 54
xul.dll!Buffer::assign(const char * bytes, unsigned __int64 length) Line 87
xul.dll!IPC::Channel::ChannelImpl::ProcessIncomingMessages(base::MessagePumpForIO::IOContext * context, unsigned long bytes_read) Line 430
xul.dll!IPC::Channel::ChannelImpl::OnIOCompleted(base::MessagePumpForIO::IOContext * context, unsigned long bytes_transfered, unsigned long error) Line 515
xul.dll!base::MessagePumpForIO::WaitForIOCompletion(unsigned long timeout, base::MessagePumpForIO::IOHandler * filter) Line 495
xul.dll!base::MessagePumpForIO::DoRunLoop() Line 439
xul.dll!base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate * delegate, base::MessagePumpWin::Dispatcher * dispatcher) Line 56
xul.dll!MessageLoop::RunHandler() Line 224
xul.dll!MessageLoop::Run() Line 204
xul.dll!base::Thread::ThreadMain() Line 177
xul.dll!ThreadEntry(void * arg) Line 256
How do I reproduce this?
Flags: needinfo?(ryanvm)
Add MOZ_JEMALLOC4=1 to your mozconfig and apply https://people.mozilla.org/~rvandermeulen/jemalloc so you don't crash in xpcshell.exe during packaging. Beyond that, just launching via |./mach run| should crash on startup.
Flags: needinfo?(ryanvm)
Does it happen with mozjemalloc?

Does it happen if you build with a non-updated jemalloc4 with https://hg.mozilla.org/mozilla-central/rev/0a14d675236e reverted?
What about a non-updated jemalloc4 with a cherry-pick of https://github.com/jemalloc/jemalloc/commit/4a8abbb400afe695f145a487380c04a946500bc6 ?

If you still have the instructions to get an allocation log, can you get one?

(In reply to Bill McCloskey (:billm) from comment #1)
> How do I reproduce this?

Set UPSTREAM_COMMIT to dev in memory/jemalloc/upstream.info, run memory/jemalloc/update.sh, then build with MOZ_JEMALLOC4=1 set.
From a debug build, though I'm not sure whether it's relevant:
Assertion failure: mRawPtr != 0 (You can't dereference a NULL RefPtr with operator->().), at objdir-fx-64-debug\dist\include\mozilla/RefPtr.h:297
#01: D3DVsyncSource::D3DVsyncDisplay::VBlankLoop (gfx\thebes\gfxwindowsplatform.cpp:2783)
#02: RunnableMethod<D3DVsyncSource::D3DVsyncDisplay,void (__cdecl D3DVsyncSource::D3DVsyncDisplay::*)(void) __ptr64,mozilla::Tuple<> >::Run (ipc\chromium\src\base\task.h:290)
#03: MessageLoop::RunTask (ipc\chromium\src\base\message_loop.cc:350)
#04: MessageLoop::DeferOrRunPendingTask (ipc\chromium\src\base\message_loop.cc:360)
#05: MessageLoop::DoWork (ipc\chromium\src\base\message_loop.cc:444)
#06: base::MessagePumpDefault::Run (ipc\chromium\src\base\message_pump_default.cc:35)
#07: MessageLoop::RunHandler (ipc\chromium\src\base\message_loop.cc:224)
#08: MessageLoop::Run (ipc\chromium\src\base\message_loop.cc:204)
#09: base::Thread::ThreadMain (ipc\chromium\src\base\thread.cc:177)
#10: `anonymous namespace'::ThreadFunc (ipc\chromium\src\base\platform_thread_win.cc:27)
#11: BaseThreadInitThunk[C:\Windows\system32\KERNEL32.DLL +0x18102]
#12: RtlUserThreadStart[C:\Windows\SYSTEM32\ntdll.dll +0x5c5b4]
Does the DEBUG crash happen without my patch?
Flags: needinfo?(ryanvm)
(In reply to Mike Hommey [:glandium] from comment #3)
> Does it happen with mozjemalloc?
No. I originally confirmed it was a jemalloc4 issue by removing all jemalloc-related entries from my .mozconfig. I also left just |--enable-jemalloc| set without MOZ_JEMALLOC4=1 and couldn't reproduce.
> Does it happen if you build with a non-updated jemalloc4 with
> https://hg.mozilla.org/mozilla-central/rev/0a14d675236e reverted?
Crash
> What about a non-updated jemalloc4 with a cherry-pick of
> https://github.com/jemalloc/jemalloc/commit/4a8abbb400afe695f145a487380c04a946500bc6 ?
Crash
> If you still have the instructions to get an allocation log, can you get one?
I'm not sure how to get that from the content process. I set the env vars, but the log only covered the parent process.

(In reply to Bill McCloskey (:billm) from comment #5)
> Does the DEBUG crash happen without my patch?
It does not.
Flags: needinfo?(ryanvm)
BTW, these crashes are reproducible on Try as well.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0996368bb81&group_state=expanded&filter-searchStr=e10s

Fun story, Marionette finished green even though the logs clearly show it also crashing. I've filed a bug for that little doozy too :).
tracking-e10s: --- → +
Priority: -- → P3
Attached patch: patch
In try_realloc I forgot to consider the case where newlength is 0. In that case we'll get null back from realloc and our buffer gets freed. We need to make sure that we set mReserved correctly in this case or else we'll crash.
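
For readers following along, the zero-length case described above can be sketched roughly as follows. This is not the attached patch. It is a minimal, self-contained illustration that assumes a buffer class shaped like the Buffer in the crash stack. The names SketchBuffer, mBuffer, mSize, and the accessors are hypothetical; only try_realloc, assign, and mReserved come from this thread. The sketch sidesteps realloc(ptr, 0) entirely, since its result is implementation-defined, and handles that case with an explicit free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the Buffer class seen in the crash stack.
class SketchBuffer {
 public:
  SketchBuffer() = default;
  ~SketchBuffer() { std::free(mBuffer); }
  SketchBuffer(const SketchBuffer&) = delete;
  SketchBuffer& operator=(const SketchBuffer&) = delete;

  // Resize the backing store. Returns false only on a real allocation
  // failure, in which case the old storage is left untouched.
  bool try_realloc(std::size_t newlength) {
    if (newlength == 0) {
      // Zero-length request: release the storage ourselves and keep the
      // bookkeeping in sync, so mReserved never describes freed memory.
      std::free(mBuffer);
      mBuffer = nullptr;
      mReserved = 0;
      mSize = 0;
      return true;
    }
    char* buffer = static_cast<char*>(std::realloc(mBuffer, newlength));
    if (!buffer) {
      return false;  // realloc failed; mBuffer is still valid.
    }
    mBuffer = buffer;
    mReserved = newlength;
    if (mSize > mReserved) {
      mSize = mReserved;  // Shrinking truncates the stored contents.
    }
    return true;
  }

  // Replace the contents with a copy of the given bytes.
  bool assign(const char* bytes, std::size_t length) {
    if (!try_realloc(length)) {
      return false;
    }
    if (length != 0) {
      std::memcpy(mBuffer, bytes, length);
    }
    mSize = length;
    assert(mSize <= mReserved);  // Invariant the bug description hinges on.
    return true;
  }

  const char* data() const { return mBuffer; }
  std::size_t size() const { return mSize; }
  std::size_t reserved() const { return mReserved; }

 private:
  char* mBuffer = nullptr;
  std::size_t mSize = 0;
  std::size_t mReserved = 0;
};
```

With this shape, assigning an empty message after a non-empty one leaves reserved() at 0 and data() null, and a later non-empty assign allocates fresh storage instead of writing through a stale pointer.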
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8741196 - Flags: review?(jld)
Comment on attachment 8741196 [details] [diff] [review]
patch

Review of attachment 8741196 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for missing that the first time.
Attachment #8741196 - Flags: review?(jld) → review+
Comment on attachment 8741196 [details] [diff] [review]
patch

Works great locally and on Try. Thanks!
Attachment #8741196 - Flags: feedback+
https://hg.mozilla.org/mozilla-central/rev/d1c487cc4ef2
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48