SIGSEGV in je_bitmap_sfu on startup


Core :: Memory Allocator

Reporter: fitzgen, Assigned: lsalzman



Pulled down the latest m-c today and now I get segfaults in jemalloc on startup on linux64.

My (git) revision is:

> commit 6944c5ba30bcc3f54ec4549fe8c20bdfc4b25b70
> Merge: 0abaaa3 b3c78a8
> Author: Carsten "Tomcat" Book <>
> Date:   Tue Sep 15 15:05:24 2015 +0200
>     merge mozilla-inbound to mozilla-central a=merge

Here is a gdb session and backtrace:

> bash-4.3$ ./mach run -P dev --debugger gdb
>  0:00.06 /usr/bin/gdb -q --args /home/fitzgen/src/mozilla-central/obj-debug/dist/bin/firefox -P dev -no-remote
> Reading symbols from /home/fitzgen/src/mozilla-central/obj-debug/dist/bin/firefox...done.
> warning: File "/home/fitzgen/src/mozilla-central/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
> To enable execution of this file add
> 	add-auto-load-safe-path /home/fitzgen/src/mozilla-central/.gdbinit
> line to your configuration file "/home/fitzgen/.gdbinit".
> To completely disable this security protection add
> 	set auto-load safe-path /
> line to your configuration file "/home/fitzgen/.gdbinit".
> For more information about this security protection see the
> "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
> 	info "(gdb)Auto-loading safe path"
> (gdb) run
> Starting program: /home/fitzgen/src/mozilla-central/obj-debug/dist/bin/firefox -P dev -no-remote
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/".
> warning: File "/home/fitzgen/src/mozilla-central/obj-debug/toolkit/library/" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
> Detaching after fork from child process 7404.
> Detaching after fork from child process 7406.
> warning: Corrupted shared library list: 0x7ffff691a200 != 0x7ffff688fd00
> [New Thread 0x7ffff7feb700 (LWP 7411)]
> [New Thread 0x7fffd5a1e700 (LWP 7410)]
> [New Thread 0x7fffd626f700 (LWP 7409)]
> [New Thread 0x7fffd6ac0700 (LWP 7408)]
> [New Thread 0x7fffda9ca700 (LWP 7407)]
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff7feb700 (LWP 7411)]
> 0x000000000042fc87 in je_bitmap_sfu (bitmap=0x0, binfo=0x0) at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/include/jemalloc/internal/bitmap.h:170
> 170	{
> (gdb) bt
> #0  0x000000000042fc87 in je_bitmap_sfu (bitmap=0x0, binfo=0x0) at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/include/jemalloc/internal/bitmap.h:170
> #1  0x0000000000439888 in arena_run_reg_alloc (run=0x7fffd8406290, bin_info=0x66eda0 <je_arena_bin_info+1920>) at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/src/arena.c:302
> #2  0x0000000000451a08 in je_arena_malloc_small (arena=0x7ffff6a00180, size=1024, zero=true) at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/src/arena.c:2151
> #3  0x00000000004f2f22 in je_calloc (tcache=0x0, zero=true, size=1024, arena=0x7ffff6a00180, tsd=0x7ffff7feb6b0)
>     at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/include/jemalloc/internal/arena.h:1145
> #4  0x00000000004f2f22 in je_calloc (arena=0x0, is_metadata=false, tcache=0x0, zero=true, size=1024, tsd=0x7ffff7feb6b0) at src/include/jemalloc/internal/jemalloc_internal.h:887
> #5  0x00000000004f2f22 in je_calloc (size=1024, tsd=0x7ffff7feb6b0) at src/include/jemalloc/internal/jemalloc_internal.h:920
> #6  0x00000000004f2f22 in je_calloc (num=1, size=1024) at /home/fitzgen/src/mozilla-central/memory/jemalloc/src/src/jemalloc.c:1663
> #7  0x0000000000421ab1 in calloc (num=1, size=1024) at /home/fitzgen/src/mozilla-central/memory/build/replace_malloc.c:181
> #8  0x00007ffff65c9eb3 in PR_Calloc (nelem=1, elsize=1024) at /home/fitzgen/src/mozilla-central/nsprpub/pr/src/malloc/prmem.c:443
> #9  0x00007ffff65c8045 in PR_SetThreadPrivate (index=2, priv=0x7fffd843c5b8) at /home/fitzgen/src/mozilla-central/nsprpub/pr/src/threads/prtpd.c:161
> #10 0x00007fffe2dad66f in mozilla::BlockingResourceBase::ResourceChainAppend(mozilla::BlockingResourceBase*) (this=0x7fffd843c5b8, aPrev=0x0) at ../../dist/include/mozilla/BlockingResourceBase.h:181
> #11 0x00007fffe2da87a5 in mozilla::BlockingResourceBase::Acquire() (this=0x7fffd843c5b8) at /home/fitzgen/src/mozilla-central/xpcom/glue/BlockingResourceBase.cpp:322
> #12 0x00007fffe2da8980 in mozilla::OffTheBooksMutex::Lock() (this=0x7fffd843c5b8) at /home/fitzgen/src/mozilla-central/xpcom/glue/BlockingResourceBase.cpp:383
> #13 0x00007fffe2c48424 in mozilla::Monitor::Lock() (this=0x7fffd843c5b8) at ../../dist/include/mozilla/Monitor.h:35
> #14 0x00007fffe2c4848a in mozilla::MonitorAutoLock::MonitorAutoLock(mozilla::Monitor&) (this=0x7ffff7feae40, aMonitor=...) at ../../dist/include/mozilla/Monitor.h:78
> #15 0x00007fffe2e1f8b2 in mozilla::net::ClosingService::ThreadFunc() (this=0x7fffd843c5a0) at /home/fitzgen/src/mozilla-central/netwerk/base/ClosingService.cpp:206
> #16 0x00007fffe2e35bed in mozilla::net::ClosingService::ThreadFunc(void*) (aClosure=0x7fffd843c5a0) at /home/fitzgen/src/mozilla-central/netwerk/base/ClosingService.h:52
> #17 0x00007ffff65e56f5 in _pt_root (arg=0x7fffd8449f00) at /home/fitzgen/src/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:212
> #18 0x00007ffff7bc7555 in start_thread (arg=0x7ffff7feb700) at pthread_create.c:333
> #19 0x00007ffff6e5cf3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb)
For anyone else hitting this issue, this revision works for me:

> commit 42ed9cb8a74b3f8b4636ad812cb60c6ee9cde4b7
> Author: Kan-Ru Chen <>
> Date:   Thu Sep 3 13:36:02 2015 +0800
>     Bug 1200498 - Clean up dom/browser-element mochitest.ini that has skip-if toolkit != gtk2 now that gtk3 is the default

Going to start bisecting.
Shu says he also hit this, but building with clang sidestepped the problem. Apparently GCC only.
Bisect done!

760a84e7cf7fa49c889a5a17a5935d3ca1e02384 is the first bad commit
commit 760a84e7cf7fa49c889a5a17a5935d3ca1e02384
Author: Dragana Damjanovic <>
Date:   Thu Sep 10 19:07:00 2015 +0200

    Bug 1152046 - Make separate thread only for PRClose. r=mcmanus r=mayhemer
    extra : rebase_source : a4f4845023d6cebdd56d75b1ff7afd29447d2167

:040000 040000 f594d863ab973b3488ba662aa6f879466408b223 79061f13112b6b20fb4d7c9805dc6b799a738aed M	netwerk
:040000 040000 490e7544e5192d2256de88a2e4e86330ca3c73f5 b1f0ff4ef68f679cac4f7e1f84c069e064e8a54a M	toolkit
So I ran into something very similar just now while trying to run the xpcshell test netwerk/test/unit/test_post.js locally on Linux x64, with almost the same stack.

I'll back that patch out.
Note: I backed out bug 1152046 (again) locally, since I was insta-crashing with it in. That lets me run, but testing crashes in the same place (jemalloc/internal/bitmap.h) in e10s, though not in a non-e10s profile. So it's very touchy, and seems to depend on the machine, performance, allocation pattern, or something else.
Note: both crash in roughly the same place with the same stack. With the ClosingService code in, it crashes at startup (insta-crash); with the patch backed out, it doesn't crash until I use UDPSocket from WebRTC (which we only do in e10s).
An ASan build locally doesn't seem to hit this bug. However, the crash appears heavily dependent on timing, allocation order, memory layout, etc.
Only 16KB (4*4096) was allocated to the ClosingService thread, causing us to crash inside jemalloc when the stack overflowed.
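The creation code for the ClosingService thread isn't quoted in this bug. As a rough sketch, assuming the thread ultimately comes from POSIX threads (as on Linux), an explicit size such as 4 * 4096 bytes is requested through a thread attribute object. The helper `set_fixed_stack` and the constant name are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical constant mirroring the 4 * 4096 bytes described above. */
#define SMALL_STACK_BYTES ((size_t)4 * 4096) /* 16384 bytes = 16 KiB */

/* Hypothetical helper: request an explicit stack size on a thread
 * attribute object and read back the size the library recorded.
 * Returns 0 on success or the pthreads error code (for example EINVAL
 * when the request is below the platform minimum). */
static int set_fixed_stack(pthread_attr_t *attr, size_t bytes, size_t *recorded)
{
    assert(attr != NULL && recorded != NULL);
    int rc = pthread_attr_setstacksize(attr, bytes);
    if (rc != 0)
        return rc;
    return pthread_attr_getstacksize(attr, recorded);
}
```

Whatever size is recorded here is the whole stack for that thread; nothing grows it on demand, so a deep call chain (such as the deadlock-detector path into jemalloc in the backtrace) simply runs off the end.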

This is one of the few call sites in the entire source tree that passes an explicit stack size other than 0 (the default stack size, which should be at least 64KB).
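To see what "the default stack size" amounts to on a given machine, a thread attribute object that never had a size set can be queried. This sketch is for POSIX threads only: it reports the pthreads default (on glibc typically derived from the stack rlimit), which is an assumption about what a default-sized thread gets and not necessarily the exact value NSPR applies when passed 0. The function names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Return the stack size a new thread would get if no explicit size is
 * requested, or 0 if the query fails. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Nonzero if the platform default is at least the 64 KiB floor the
 * report expects for a default-sized thread stack. */
static int default_meets_64k_floor(void)
{
    return default_thread_stack_size() >= (size_t)64 * 1024;
}
```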

Just pass in 0 to PR_CreateThread here to get the bigger stack we need and avoid overflows.
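A minimal sketch of the "pass 0 to get the default" idea, assuming POSIX threads underneath. NSPR's `PR_CreateThread` takes a `stackSize` argument where 0 lets the library choose a platform default; the `start_thread` helper below is hypothetical and only mimics that convention by skipping `pthread_attr_setstacksize` when the size is 0:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper modelled on the "pass 0 for the default" convention:
 * a stack_size of 0 leaves the library default in place, any other value
 * is requested explicitly. Returns 0 on success or a pthreads error code. */
static int start_thread(pthread_t *out, void *(*fn)(void *), void *arg,
                        size_t stack_size)
{
    assert(out != NULL && fn != NULL);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    if (stack_size != 0)
        rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_create(out, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Trivial thread body used to check that a default-sized thread runs. */
static void *mark_ran(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

Leaving the size unset delegates the choice to the platform, which is the safe option when nobody has measured how deep the thread's call chains go.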
Okay, slight modification: instead of using the default stack size, which can be much larger than needed, just double the current stack size of the ClosingService thread, which is still sufficient to avoid the overflow.
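The patch itself isn't quoted here; the arithmetic behind the change is 2 × 4 × 4096 = 32768 bytes (32 KiB). A sketch with hypothetical constant names, plus a hypothetical guard for platforms whose minimum thread stack is larger than the requested size:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Stack size from the report: 4 pages of 4096 bytes (16 KiB). */
#define OLD_STACK_BYTES ((size_t)4 * 4096)

/* Doubled size described in the follow-up: 32 KiB. */
#define NEW_STACK_BYTES (2 * OLD_STACK_BYTES)

/* Hypothetical helper: never ask for less than the platform's minimum
 * thread stack, since pthread_attr_setstacksize rejects such requests
 * with EINVAL. sysconf may return -1 when the limit is indeterminate,
 * in which case the request is returned unchanged. */
static size_t clamp_to_platform_min(size_t requested)
{
    long min = sysconf(_SC_THREAD_STACK_MIN);
    if (min > 0 && requested < (size_t)min)
        return (size_t)min;
    return requested;
}
```

Doubling keeps the footprint small compared to a multi-megabyte default while giving the thread twice the headroom that overflowed before.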
Attachment #8662123 - Flags: feedback+
double ClosingService thread stack size to avoid stack overflow

(In reply to Mike Hommey [:glandium] from comment #16)
> Err, it was moved from there to here in bug 1152046, so there's only one.

So it sounds like this change should be backported to older channels, using the previous location of createthread(). Dragana can probably do that.
