Closed Bug 1597972 Opened 10 months ago Closed 10 months ago

Crash on Linux when starting the profiler with the "native allocations" option set

Categories

(Core :: Gecko Profiler, defect, P1)

Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: julienw, Assigned: gregtatum)

Details

(Keywords: crash)

Attachments

(1 file)

STR is:

  1. Enable the profiler toolbar icon.
  2. Enable the "Native Allocations" feature.
  3. Start profiling.

=> Crash

Here is the stack:

#0  0x00007fffe9936af4 in MergeStacks(unsigned int, bool, RegisteredThread const&, Registers const&, NativeStack const&, ProfilerStackCollector&) ()
    at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#1  0x00007fffe9936a90 in DoSharedSample(PSAutoLock const&, bool, RegisteredThread&, Registers const&, unsigned long, ProfileBuffer&) ()
    at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#2  0x00007fffe9920058 in profiler_get_backtrace() () at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#3  0x00007fffe9920622 in profiler_add_native_allocation_marker(int, long, unsigned long) () at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#4  0x00007fffe990131b in mozilla::profiler::AllocCallback(void*, unsigned long) () at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#5  0x00007fffe9900de7 in replace_malloc(unsigned long) () at /mnt/desktop/gecko-dev/obj-firefox-artifact/dist/bin/libxul.so
#6  0x000055555557c97e in moz_xmalloc ()
#7  0x00007fffe951e50a in SkARGB32_Shader_Blitter::SkARGB32_Shader_Blitter(SkPixmap const&, SkPaint const&, SkShaderBase::Context*) ()

And the crash happens at this line:

0x00007ffff1693944 in MergeStacks (aFeatures=<optimized out>, aIsSynchronous=true, aRegisteredThread=..., aRegs=..., aNativeStack=..., aCollector=...)
    at /home/julien/travail/git/mozilla-central/tools/profiler/core/platform.cpp:1309
1309	                        ProfilerStackCollector& aCollector) {

Paul Adenot helped me debug this a bit, and we found it's really happening in a memset. Still investigating...

Keywords: crash
Assignee: nobody → gtatum
Priority: -- → P1
#0  0x00007ffff6e10f2d in __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:141
#1  0x00007fffe995f602 in MergeStacks(unsigned int, bool, RegisteredThread const&, Registers const&, NativeStack const&, ProfilerStackCollector&) (aFeatures=<optimized out>, aIsSynchronous=true, aRegisteredThread=..., aRegs=..., aNativeStack=..., aCollector=...) at /home/gregtatum/dev/gecko/tools/profiler/core/platform.cpp:1335
#2  0x00007fffe995f4a3 in DoSharedSample(PSAutoLock const&, bool, RegisteredThread&, Registers const&, unsigned long, ProfileBuffer&) (aLock=..., aIsSynchronous=true, aRegisteredThread=..., aRegs=..., aSamplePos=<optimized out>, aBuffer=...) at /home/gregtatum/dev/gecko/tools/profiler/core/platform.cpp:1873
#3  0x00007fffe9957cce in DoSyncSample(PSAutoLock const&, RegisteredThread&, mozilla::TimeStamp const&, Registers const&, ProfileBuffer&) (aLock=..., aRegisteredThread=..., aNow=..., aRegs=..., aBuffer=...)
    at /home/gregtatum/dev/gecko/tools/profiler/core/platform.cpp:1900
#4  0x00007fffe9957cce in profiler_get_backtrace() ()
    at /home/gregtatum/dev/gecko/tools/profiler/core/platform.cpp:4584
#5  0x00007fffe9958188 in profiler_add_native_allocation_marker(int, long, unsigned long) (aMainThreadId=8304, aSize=<optimized out>, aMemoryAddress=<optimized out>) at /home/gregtatum/dev/gecko/tools/profiler/core/platform.cpp:4667
#6  0x00007fffe993deea in mozilla::profiler::FreeCallback(void*) (aPtr=0x7fff2fdfe000)
    at /home/gregtatum/dev/gecko/tools/profiler/core/memory_hooks.cpp:445
#7  0x00007fffe993d9cf in replace_free(void*) (aPtr=0x7fff2fdfe000)
    at /home/gregtatum/dev/gecko/tools/profiler/core/memory_hooks.cpp:502
#8  0x00007fffe605eef7 in InfallibleAllocPolicy::free_<char>(char*, unsigned long) (this=0x7fffc9e88408, aPtr=0x7fffde54d000 "", aNumElems=<optimized out>)
    at /home/gregtatum/dev/gecko/obj-x86_64-pc-linux-gnu/dist/include/mozilla/mozalloc.h:164
#9  0x00007fffe605eef7 in mozilla::BufferList<InfallibleAllocPolicy>::Clear() (this=0x7fffc9e88408)
    at /home/gregtatum/dev/gecko/obj-x86_64-pc-linux-gnu/dist/include/mozilla/BufferList.h:163
#10 0x00007fffe60571ee in mozilla::BufferList<InfallibleAllocPolicy>::~BufferList() (this=0x7fffc9e88408)
    at /home/gregtatum/dev/gecko/obj-x86_64-pc-linux-gnu/dist/include/mozilla/BufferList.h:115
#11 0x00007fffe60571ee in Pickle::~Pickle() (this=0x7fffc9e88408)
    at /home/gregtatum/dev/gecko/ipc/chromium/src/base/pickle.cc:160
#12 0x00007fffe6066aa8 in IPC::Message::~Message() (this=0x7fffc9e88400)
    at /home/gregtatum/dev/gecko/ipc/chromium/src/chrome/common/ipc_message.cc:37
#13 0x00007fffe6065990 in IPC::Channel::ChannelImpl::ProcessOutgoingMessages() (this=0x7fffd541d000)
    at /home/gregtatum/dev/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:768
#14 0x00007fffe606672a in IPC::Channel::ChannelImpl::Send(IPC::Message*) (this=0x7fffd541d000, message=0x7fffc9e88400)
    at /home/gregtatum/dev/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:803
#15 0x00007fffe60d536f in mozilla::detail::RunnableMethodArguments<IPC::Message*>::applyImpl<IPC::Channel, bool (IPC::Channel::*)(IPC::Message*), StorePtrPassByPtr<IPC::Message>, 0ul>(IPC::Channel*, bool (IPC::Channel::*)(IPC::Message*), mozilla::Tuple<StorePtrPassByPtr<IPC::Message> >&, std::integer_sequence<unsigned long, 0ul>) (o=<optimized out>, m=<optimized out>, args=...) at /home/gregtatum/dev/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:1124
#16 0x00007fffe60d536f in _ZN7mozilla6detail23RunnableMethodArgumentsIJPN3IPC7MessageEEE5applyINS2_7ChannelEMS7_FbS4_EEEDTcl9applyImplfp_fp0_dtdefpT10mArgumentstlSt16integer_sequenceImJLm0EEEEEEPT_T0_ (this=0xea, o=<optimized out>, m=<optimized out>) at /home/gregtatum/dev/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:1130
#17 0x00007fffe60d536f in mozilla::detail::RunnableMethodImpl<IPC::Channel*, bool (IPC::Channel::*)(IPC::Message*), false, (mozilla::RunnableKind)0, IPC::Message*>::Run() (this=0xaa)

...

Here's the crash when caught in a debugger.

The native allocations feature added stackwalking that can happen anywhere that
memory is allocated. This means that stackwalking can start in places where the
execution stack is already very deep. Stackwalking was relying on
stack-allocated buffers used for merging stacks, which took up 64 KB of
stack space. On Linux, this was causing a stack overflow, as there is only 256 KB
of stack space. I reproduced the crash under GDB. Using pointer arithmetic,
I determined that the stack size before stack walking was around 20 KB, and that
during stackwalking we overflowed the stack (>256 KB). The largest culprit was the
JS::ProfilingFrameIterator::Frame jsFrames[MAX_JS_FRAMES] array. In addition,
Bug 1468789 added another member to the Frame class, which further increased the
size of the stack allocation.
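
The kind of measurement described above can be sketched roughly as follows. This is only an illustration, not the code used while debugging: ExampleFrame, kExampleMaxFrames, ExampleArrayBytes and StackDistance are hypothetical names, and the struct layout and frame count are assumed values rather than the real JS::ProfilingFrameIterator::Frame or MAX_JS_FRAMES.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for JS::ProfilingFrameIterator::Frame. The real
// class has a different layout; these fields only give it a plausible size.
struct ExampleFrame {
  void* stackAddress;
  void* returnAddress;
  void* activation;
  const char* label;
  std::uint64_t extra;
};

// Assumed frame count, for illustration only (not the real MAX_JS_FRAMES).
constexpr std::size_t kExampleMaxFrames = 1024;

// Bytes that a local `ExampleFrame frames[kExampleMaxFrames]` array would
// reserve on the stack of whichever function declares it.
constexpr std::size_t ExampleArrayBytes() {
  return sizeof(ExampleFrame) * kExampleMaxFrames;
}

// Approximate distance in bytes between two addresses, for example the
// address of a local in an outer function and one in a deeply nested call.
// Going through uintptr_t avoids subtracting pointers to unrelated objects.
inline std::size_t StackDistance(const void* aOuter, const void* aInner) {
  const auto outer = reinterpret_cast<std::uintptr_t>(aOuter);
  const auto inner = reinterpret_cast<std::uintptr_t>(aInner);
  return static_cast<std::size_t>(outer > inner ? outer - inner
                                                : inner - outer);
}
```

Comparing StackDistance between a local near the thread's entry point and a local inside the stack walker against the thread's stack limit gives a rough idea of how much headroom is left before the array is even allocated.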

I changed the implementation to allocate some memory on the CorePS class and
share it with every stackwalk that happens. I tested this by loading a large news
site, and didn't get any crashes.
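
A minimal sketch of the general shape of that change, assuming a simplified setup: SharedWalkState is a hypothetical stand-in for CorePS, SketchFrame and kSketchMaxFrames are placeholders for the real Frame class and MAX_JS_FRAMES, and std::mutex stands in for the profiler lock (the traces above show a PSAutoLock being passed to DoSharedSample). The actual patch may differ in its details.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for JS::ProfilingFrameIterator::Frame.
struct SketchFrame {
  void* stackAddress;
  void* returnAddress;
  void* activation;
  const char* label;
  std::uint64_t extra;
};

// Assumed value, for illustration only.
constexpr std::size_t kSketchMaxFrames = 1024;

// Before the change, every walk did the equivalent of
//   SketchFrame jsFrames[kSketchMaxFrames];
// as a local variable, reserving the whole array on the current stack.

// After the change, a long-lived object owns a single heap-allocated buffer
// that every walk reuses.
class SharedWalkState {
 public:
  SharedWalkState()
      : mJsFrames(std::make_unique<SketchFrame[]>(kSketchMaxFrames)) {}

  SketchFrame* JsFrames() { return mJsFrames.get(); }
  std::mutex& Lock() { return mLock; }

 private:
  std::mutex mLock;  // serializes access to the shared buffer
  std::unique_ptr<SketchFrame[]> mJsFrames;
};

// Writes up to aCount placeholder frames into the shared buffer and returns
// how many were written. Only a pointer and a few scalars live on the stack.
std::size_t WalkWithSharedBuffer(SharedWalkState& aState, std::size_t aCount) {
  std::lock_guard<std::mutex> guard(aState.Lock());
  SketchFrame* frames = aState.JsFrames();
  const std::size_t n = aCount < kSketchMaxFrames ? aCount : kSketchMaxFrames;
  for (std::size_t i = 0; i < n; ++i) {
    frames[i] = SketchFrame{nullptr, nullptr, nullptr, "frame", i};
  }
  return n;
}
```

The trade-off is that the buffer is shared, so walks have to be serialized by a lock, which the sketch makes explicit with the lock_guard.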

Pushed by gtatum@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/298ae54c2b27
Remove the JS frames array from being stack allocated; r=gerald
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Flags: qe-verify+