Closed Bug 1466192 Opened 2 years ago Closed Last month

In MinGW x64 --enable-jemalloc crashes Firefox on startup when --enable-sandbox is enabled

Categories

(Core :: Memory Allocator, defect, P5)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: tjr, Assigned: tjr)

References

Details

IIUC, arena_dalloc is crashing on `auto arena = chunk->arena`? I bet we're trying to free() something that didn't come from jemalloc...
It sounds like my patch in Bug 1460720 didn't quite do the job. And bad luck is that the sandbox is the only thing that exposed it...
Does aPtr look like anything that might have come from the Windows heap? (!heap -a)
(In reply to David Major [:dmajor] from comment #3)
> Does aPtr look like anything that might have come from the Windows heap?
> (!heap -a)

That is indeed the case.  I imagine that my mapping in Bug 1460720 fixed the compilation issue but did not actually take effect in changing what function gets called when _aligned_malloc gets called. I sent https://treeherder.mozilla.org/#/jobs?repo=try&revision=c251b4309d4c2922dbc7600bae4695a2ac3a1f9f in to trick MinGW to not map _aligned_malloc -> mm_malloc (which I imagine will fall through the Windows allocator).

If that makes this problem go away; then it will confirm that I need to redo Bug 1460720
Hm. That did not work - still see the same problem. Looks Like I'm going to need to do gflags ust afterall...
Alright, I narrowed this down to a function inside winpthreads.  Either _pthread_once_raw or pthread_once (doesn't matter.)

We alloc an object in enterOnceObject (https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-libraries/winpthreads/src/thread.c#l543 ) using calloc - which flows into MSVC's implementation and we free it in leaveOnceObject ( https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-libraries/winpthreads/src/thread.c#l570 ) which goes into jemalloc's free.
Not sure if that comes as a surprise or not but I need to compile with --disable-jemalloc for 32bit Windows, too. Otherwise the build crashes during start-up. I am not convinced yet that that's due to the sandbox, but barring further investigation I'll leave that note here at least.
(In reply to Georg Koppen from comment #7)
> Not sure if that comes as a surprise or not but I need to compile with
> --disable-jemalloc for 32bit Windows, too. Otherwise the build crashes
> during start-up. I am not convinced yet that that's due to the sandbox, but
> barring further investigation I'll leave that note here at least.

I had thought x86 worked; but I may have gotten builds mixed up. There's no clear reason why it should work and x64 not so I probably got mixed up. I'll confirm next week.

I think this crash is not really because of the sandbox per-se, but because the sandbox is doing things early or odd in a way that omitting the sandbox does not.
Documenting in this bug - some time ago Jacek suggested two things to try:

1) Replacing problematic calloc with malloc (to see if it's calloc-specific)


2) Marking them as dllimport, which may lead to weird linker situation, using the below patch:

> diff --git a/mingw-w64-headers/crt/malloc.h b/mingw-w64-headers/crt/malloc.h
> index 9d75ea6f..33a84753 100644
> --- a/mingw-w64-headers/crt/malloc.h
> +++ b/mingw-w64-headers/crt/malloc.h
> @@ -73,10 +73,10 @@ extern "C" {
>  
>  #ifndef _CRT_ALLOCATION_DEFINED
>  #define _CRT_ALLOCATION_DEFINED
> -  void *__cdecl calloc(size_t _NumOfElements,size_t _SizeOfElements);
> -  void __cdecl free(void *_Memory);
> -  void *__cdecl malloc(size_t _Size);
> -  void *__cdecl realloc(void *_Memory,size_t _NewSize);
> +  _CRTIMP void *__cdecl calloc(size_t _NumOfElements,size_t _SizeOfElements);
> +  _CRTIMP void __cdecl free(void *_Memory);
> +  _CRTIMP void *__cdecl malloc(size_t _Size);
> +  _CRTIMP void *__cdecl realloc(void *_Memory,size_t _NewSize);
>    _CRTIMP void *__cdecl _recalloc(void *_Memory,size_t _Count,size_t _Size);
>  
>  #ifdef __DO_ALIGN_DEFINES
> diff --git a/mingw-w64-headers/crt/stdlib.h b/mingw-w64-headers/crt/stdlib.h
> index 4d9c1d96..be2ed4b7 100644
> --- a/mingw-w64-headers/crt/stdlib.h
> +++ b/mingw-w64-headers/crt/stdlib.h
> @@ -498,10 +498,10 @@ float __cdecl __MINGW_NOTHROW strtof(const char * __restrict__ _Str,char ** __re
>  
>  #ifndef _CRT_ALLOCATION_DEFINED
>  #define _CRT_ALLOCATION_DEFINED
> -  void *__cdecl calloc(size_t _NumOfElements,size_t _SizeOfElements);
> -  void __cdecl free(void *_Memory);
> -  void *__cdecl malloc(size_t _Size);
> -  void *__cdecl realloc(void *_Memory,size_t _NewSize);
> +  _CRTIMP void *__cdecl calloc(size_t _NumOfElements,size_t _SizeOfElements);
> +  _CRTIMP void __cdecl free(void *_Memory);
> +  _CRTIMP void *__cdecl malloc(size_t _Size);
> +  _CRTIMP void *__cdecl realloc(void *_Memory,size_t _NewSize);
>    _CRTIMP void *__cdecl _recalloc(void *_Memory,size_t _Count,size_t _Size);
>  /* Make sure that X86intrin.h doesn't produce here collisions.  */
>  #if (!defined (_XMMINTRIN_H_INCLUDED) && !defined (_MM_MALLOC_H_INCLUDED)) || defined(_aligned_malloc)


The calloc -> malloc+memset had the same symptoms. https://treeherder.mozilla.org/#/jobs?repo=try&revision=4bef63861874e5736720b6668739f6e51cff7746

The patch on the opt build also had the same symptoms and actually broke the debug build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5670131759988140592adac2e70ed650d39f23c2&selectedJob=181914362
Can you share the full stack of the problematic malloc (originally calloc)?
I debugged this build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e0f446ca19c8ad518c629ff36a0e0deebce55ce

It seems like we're no longer crashing as a result of the allocation in enterOnceObject - but I'm having difficulty figuring out where are crashing from now.  (Analysis at https://ritter.vg/misc/transient/1466192-2.txt)

I'm sure it matters though - we're definetly allocating through the OS allocator while trying to free through jemalloc.  Just need to figure out how to fix that...
Going back to double checking assumptions. This seems to still crash if I --enable-sandbox but set MOZ_DISABLE_CONTENT_SANDBOX.  Going to try --disable-sandbox again....
Okay, I have a very clean build that right away in XRE_Main calls malloc/free and then calloc/free: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9288ba3ce2e8fb7e8f2802ca6be0bc28c7c343f6

Crashes on the calloc/free. So calloc definetly isn't being redirected the same way malloc is.  I looked through MinGW and -esr60 and don't see any obvious differences between malloc/calloc that stand out to me. 

(I did notice that base_calloc doesn't use a CheckedInt: https://searchfox.org/mozilla-central/rev/881a3c5664ede5e08ee986d76433bc5c4b5680e6/memory/build/mozjemalloc.cpp#1478 )

:dmajor - is there anything can can be examined in terms of imports/exports from libraries? I realize that won't give the answer _why_ this is happening, but I'm wondering if there's anything more assertive that can be learned in terms of what _is_ happen that could guide us to why?
Flags: needinfo?(dmajor)
See Also: → 1460720
+  char* d = (char*)calloc(10, 10);
+  MOZ_ASSERT(true, "About to free");
+  free(a);

I think you want `free(d)`. :-)
So after fixing my double-free, it no longer crashes in main - meaning that the calloc/free in main works and implying that the calloc there is going to the write calloc and we're not mis-matching them.

But later I see the same crash from the mismatched calloc/free.  It seems like calls to calloc from mingw libraries aren't being sent to the right calloc?  (Or maybe they're supposed to go to the system calloc, and the free's aren't going to the system free?)
(In reply to Tom Ritter [:tjr] from comment #15)
> So after fixing my double-free, it no longer crashes in main - meaning that
> the calloc/free in main works and implying that the calloc there is going to
> the write calloc and we're not mis-matching them.
> 
> But later I see the same crash from the mismatched calloc/free.  It seems
> like calls to calloc from mingw libraries aren't being sent to the right
> calloc?  (Or maybe they're supposed to go to the system calloc, and the
> free's aren't going to the system free?)

Do you have a link to a build that exhibits this?
Flags: needinfo?(dmajor)

Clearing out some old bugmail... is this bug still an issue?

It is; but I haven't investigated it in a while because I've been focusing on mingw-clang.

Priority: -- → P5

mingw-gcc is dead, long live mingw-clang

Status: NEW → RESOLVED
Closed: Last month
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.