Closed Bug 1399921 Opened 4 years ago Closed 4 years ago

MOZ_DIAGNOSTIC_ASSERT(arena->magic == ARENA_MAGIC); failure on startup (OS X local build)

Categories

(Core :: Memory Allocator, defect)

Unspecified
macOS
defect
Not set
normal

Tracking

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- fixed

People

(Reporter: kats, Assigned: glandium)

Details

(Keywords: regression)

Attachments

(4 files)

Attached file lldb output
I did a local m-c build (with some unrelated local patches) on OS X. When I run ./mach run, it exits immediately. I ran it under lldb and got the attached output, which shows that the arena->magic value is bogus in arena_malloc at http://searchfox.org/mozilla-central/rev/6326724982c66aaeaf70bb7c7ee170f7a38ca226/memory/build/mozjemalloc.cpp#3310
OS: Unspecified → Mac OS X
Version: unspecified → Trunk
Also, this is my mozconfig:

. $topsrcdir/browser/config/mozconfig

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-host-opt
mk_add_options AUTOCLOBBER=1

ac_add_options --enable-warnings-as-errors
# ac_add_options --enable-debug
# ac_add_options --enable-tests
ac_add_options --enable-optimize="-g"
ac_add_options --with-ccache=sccache
ac_add_options --disable-crashreporter
# ac_add_options --disable-install-strip
ac_add_options --enable-dump-painting
What version of OSX do you have, in case that is relevant?
I have OS X 10.11.6.

I tried backing out bug 1397101 but that didn't help.
Also for the record my base m-c rev is gecko-dev 65bd75b8f9a8d6b507138aa3a993ae608277b2ff (== hg cset dd6b788f149763c4014c27f2fe1a1d13228bda82)
Backing out the last two patches of bug 1399031 (on top of the backout of bug 1397101) does fix the problem. So presumably this is a regression from bug 1399031.

I only backed out the last two patches of bug 1399031 because those were the ones that touched mozjemalloc.cpp.
Blocks: 1399031
Flags: needinfo?(mh+mozilla)
Attached file libmozglue.dylib
Here's my libmozglue.dylib file as requested on IRC
In case it's a compiler miscompilation, here's the command line being used to compile mozjemalloc.cpp:

/Users/kats/.cargo/bin/sccache /usr/bin/clang++ -std=gnu++11 -o mozjemalloc.o -c -fvisibility=hidden -fvisibility-inlines-hidden -DNDEBUG=1 -DTRIMMED=1 -DMOZ_MEMORY_IMPL -DMOZ_HAS_MOZGLUE -I/Users/kats/zspace/mozilla-wr/memory/build -I/Users/kats/zspace/mozilla-wr/obj-host-opt/memory/build  -I/Users/kats/zspace/mozilla-wr/obj-host-opt/dist/include  -I/Users/kats/zspace/mozilla-wr/obj-host-opt/dist/include/nspr -I/Users/kats/zspace/mozilla-wr/obj-host-opt/dist/include/nss       -fPIC  -DMOZILLA_CLIENT -include /Users/kats/zspace/mozilla-wr/obj-host-opt/mozilla-config.h -MD -MP -MF .deps/mozjemalloc.o.pp -Qunused-arguments   -Qunused-arguments -Wall -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wloop-analysis -Wc++11-compat-pedantic -Wc++14-compat -Wc++14-compat-pedantic -Wc++1z-compat -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wno-gnu-zero-variadic-macro-arguments -Wformat-security -Wno-unknown-warning-option -Wno-return-type-c-linkage -fno-exceptions -fno-strict-aliasing -stdlib=libc++ -fno-rtti -fno-exceptions -fno-math-errno -pthread -pipe  -g -g -fno-omit-frame-pointer  -Werror -Wno-unused-function -Wno-error=uninitialized   /Users/kats/zspace/mozilla-wr/memory/build/mozjemalloc.cpp

and the compiler version:

kats@kgupta-air mozilla-wr$ /usr/bin/clang++ --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
This is a funny one.

malloc_init_hard(), which is run from a static initializer, runs before mozilla::detail::ThreadLocalKeyStorage<arena_t*>::ThreadLocalKeyStorage(), which, in your not really optimized build, actually writes values to the static (and thus already zeroed) storage. So malloc_init_hard fills in thread_arena, but the constructor for it runs only after that...
Flags: needinfo?(mh+mozilla)
... and resets its value.
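The ordering problem described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the mozjemalloc or MFBT code: `Arena`, `KeyStorage`, `early_init`, `late_construct`, `peek_raw` and `simulate_bad_order` are hypothetical names, and instead of relying on real static initialization order the sketch makes the two steps explicit with placement new on zero-filled static storage.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical stand-ins; these are not the real mozjemalloc/MFBT types.
struct Arena { unsigned magic; };

struct KeyStorage {
    Arena* value;
    KeyStorage() : value(nullptr) {}  // constructor that really stores null
};

// Static storage is zero-filled before any code runs.
alignas(KeyStorage) static unsigned char g_storage[sizeof(KeyStorage)];
static Arena g_arena = {0x12345678u};

// Step 1: the early initializer publishes the arena into the raw storage.
void early_init() {
    Arena* arena = &g_arena;
    std::memcpy(g_storage, &arena, sizeof arena);
}

// Reads back the pointer currently held in the raw storage.
Arena* peek_raw() {
    Arena* p = nullptr;
    std::memcpy(&p, g_storage, sizeof p);
    return p;
}

// Step 2: the constructor runs afterwards on the same storage and wipes it.
KeyStorage* late_construct() {
    return new (g_storage) KeyStorage();
}

// Runs both steps in the bad order and returns what a later reader sees.
Arena* simulate_bad_order() {
    early_init();
    assert(peek_raw() == &g_arena);  // the value was there after step 1
    KeyStorage* storage = late_construct();
    return storage->value;           // null again: the early write is lost
}
```

When the constructor's store is folded away at build time (as an optimizing compiler may do for a constructor that only writes zeros), step 2 has nothing left to overwrite, which would be consistent with the failure only showing up in a build without `-O`.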
So, in fact, if you had run a debug build, you would have hit this assertion instead: https://hg.mozilla.org/mozilla-central/file/8e818b5e9b6b/mfbt/ThreadLocal.h#l215
Can you double-check that the attached patch makes it work for you?
Flags: needinfo?(bugmail)
Comment on attachment 8908394 [details]
Bug 1399921 - Register zone allocator independently, and delay jemalloc initialization on mac.

https://reviewboard.mozilla.org/r/180008/#review185204

::: commit-message-7fbd7:3
(Diff revision 1)
> +Bug 1399921 - Register zone allocator independently, and delay jemalloc initialization on mac. r?njn
> +
> +In bug 1361258, we uniformized the initialization sequence on mac, and

unified?
Comment on attachment 8908394 [details]
Bug 1399921 - Register zone allocator independently, and delay jemalloc initialization on mac.

https://reviewboard.mozilla.org/r/180008/#review185206

rs=me, assuming it works :)
Attachment #8908394 - Flags: review?(n.nethercote) → review+
(In reply to Nicholas Nethercote [:njn] from comment #14)
> Comment on attachment 8908394 [details]
> Bug 1399921 - Register zone allocator independently, and delay jemalloc
> initialization on mac.
> 
> https://reviewboard.mozilla.org/r/180008/#review185206
> 
> rs=me, assuming it works :)

I should probably add a note to the commit message that in practice, that's what we were doing for the replace-malloc case before bug 1361258.
Unfortunately it's still failing:

(lldb) run
Process 71877 launched: '/Users/kats/zspace/mozilla-wr/obj-host-opt/dist/Nightly.app/Contents/MacOS/firefox' (x86_64)
Process 71877 stopped
* thread #1: tid = 0x21e92f, 0x0000000100011385 libmozglue.dylib`arena_malloc(arena=0x00007fff7a1be000, size=83, zero=false) + 101 at mozjemalloc.cpp:3296, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100011385 libmozglue.dylib`arena_malloc(arena=0x00007fff7a1be000, size=83, zero=false) + 101 at mozjemalloc.cpp:3296
   3293	{
   3294
   3295		MOZ_ASSERT(arena);
-> 3296		MOZ_DIAGNOSTIC_ASSERT(arena->magic == ARENA_MAGIC);
   3297		MOZ_ASSERT(size != 0);
   3298		MOZ_ASSERT(QUANTUM_CEILING(size) <= arena_maxclass);
   3299
(lldb) p/x arena->magic
(uint32_t) $0 = 0x54485244
(lldb) p arenas[0]
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
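One quick way to get a hint about a bogus magic value like the one printed above is to render its bytes as ASCII. The helper below is a small sketch for that purpose; `fourcc` is a hypothetical name and not part of mozjemalloc or lldb.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: renders a 32-bit value as four characters,
// most significant byte first. Non-printable bytes are shown as '.'.
std::string fourcc(std::uint32_t v) {
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        unsigned char byte = static_cast<unsigned char>((v >> shift) & 0xFFu);
        out += (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

For 0x54485244 this yields "THRD", which suggests the `arena` pointer is aimed at some unrelated structure rather than at an arena. Which structure that is cannot be determined from this output alone.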
Flags: needinfo?(bugmail)
Attached file libmozglue.dylib v2
Here's the libmozglue.dylib with the patch applied.
(In reply to Mike Hommey [:glandium] from comment #15)
> (In reply to Nicholas Nethercote [:njn] from comment #14)
> > Comment on attachment 8908394 [details]
> > Bug 1399921 - Register zone allocator independently, and delay jemalloc
> > initialization on mac.
> > 
> > https://reviewboard.mozilla.org/r/180008/#review185206
> > 
> > rs=me, assuming it works :)
> 
> I should probably add a note to the commit message that in practice, that's
> what we were doing for the replace-malloc case before bug 1361258.

Mmmmm actually it's not.
Can you try the last patch?
Assignee: nobody → mh+mozilla
Flags: needinfo?(bugmail)
(In reply to Mike Hommey [:glandium] from comment #10)
> So, in fact, if you had run a debug build, you should have hit this
> assertion instead:
> https://hg.mozilla.org/mozilla-central/file/8e818b5e9b6b/mfbt/ThreadLocal.h#l215

Confirmed. This is what I got from a try build without the patch:
Assertion failure: Storage<T>::initialized(), at /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ThreadLocal.h:215

And another try with the patch applied confirmed the patch works this time.
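The kind of check behind the assertion quoted above can be sketched as a slot that tracks whether it has been initialized and rejects use before that point. This is a simplified illustration only: `GuardedSlot` and its members are hypothetical and do not reproduce the MFBT ThreadLocal implementation.

```cpp
#include <cassert>

// Hypothetical guarded slot; it only mirrors the idea of an
// initialized() check before the storage is used.
template <typename T>
class GuardedSlot {
public:
    bool initialized() const { return mInited; }

    // Must be called before set() or get().
    void init() {
        mValue = T();
        mInited = true;
    }

    // Refuses to store a value before init(); a debug build could
    // assert here instead of returning false.
    bool set(const T& v) {
        if (!initialized()) {
            return false;
        }
        mValue = v;
        return true;
    }

    T get() const {
        assert(initialized());
        return mValue;
    }

private:
    bool mInited = false;
    T mValue = T();
};
```

With such a guard, a write that happens too early is reported at the point of the write, rather than surfacing later as a bogus value in an unrelated check.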

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> Also, this is my mozconfig:
(snip)
> ac_add_options --enable-optimize="-g"

Note that this is why the failure didn't reproduce out of the box on the noopt builds we have on automation (if you download one): memory/build is still built with optimizations even when optimizations are disabled, but the build system assumes they are disabled with --disable-optimize, not with --enable-optimize and flags that don't include -O.
Flags: needinfo?(bugmail)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/0e349b74bfc6
Register zone allocator independently, and delay jemalloc initialization on mac. r=njn
Confirmed that the latest patch works for me locally, thanks!
Depends on: 1400146
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/507703647781
Register zone allocator independently, and delay jemalloc initialization on mac. r=njn
https://hg.mozilla.org/mozilla-central/rev/507703647781
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57