Last Comment Bug 804303 - (replace-malloc) Add malloc instrumentation in memory/build
(replace-malloc)
: Add malloc instrumentation in memory/build
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: General (show other bugs)
: unspecified
: All All
: -- normal with 2 votes (vote)
: mozilla20
Assigned To: Mike Hommey [:glandium]
:
Mentors:
Depends on: 805855 815071 819256
Blocks: 811483 1004359 704240 NewDMD 811497 813214 813843 814498 815681 818922 999913 1080341
  Show dependency treegraph
 
Reported: 2012-10-22 13:00 PDT by Benoit Jacob [:bjacob] (mostly away)
Modified: 2014-10-08 21:50 PDT (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed
fixed


Attachments
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now (9.86 KB, patch)
2012-10-22 13:04 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now (9.87 KB, patch)
2012-10-22 13:18 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
Alternative implementation (12.52 KB, patch)
2012-10-23 03:33 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Review
interdiff (+++glandium ---bjacob) (14.42 KB, text/plain)
2012-10-24 20:09 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Alternative implementation (23.70 KB, patch)
2012-10-31 04:21 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (36.56 KB, patch)
2012-11-02 10:21 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (26.11 KB, patch)
2012-11-02 10:33 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (36.59 KB, patch)
2012-11-06 08:06 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (38.65 KB, patch)
2012-11-08 02:25 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (36.38 KB, patch)
2012-11-08 02:26 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (36.38 KB, patch)
2012-11-08 04:52 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
some tweaks (6.04 KB, patch)
2012-11-14 00:11 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (40.34 KB, patch)
2012-11-16 00:14 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (38.97 KB, patch)
2012-11-16 00:20 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
tweak: pass a struct containing all malloc functions (7.90 KB, patch)
2012-11-18 19:10 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
tweak: pass a struct containing all malloc functions (9.86 KB, patch)
2012-11-18 20:45 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (45.73 KB, patch)
2012-11-19 10:15 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (44.86 KB, patch)
2012-11-19 10:59 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
fix build of c++ replace_malloc libs (1.67 KB, patch)
2012-11-20 06:10 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (54.19 KB, patch)
2012-11-21 03:15 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (39.04 KB, patch)
2012-11-21 03:19 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (51.91 KB, patch)
2012-11-26 05:24 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (43.04 KB, patch)
2012-11-26 05:26 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc (6.01 KB, patch)
2012-11-26 05:27 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 4 - Allow to load a replace-malloc lib from android builds (1.03 KB, patch)
2012-11-26 05:28 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (51.91 KB, patch)
2012-11-26 09:48 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (43.80 KB, patch)
2012-11-26 09:49 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc (8.60 KB, patch)
2012-11-26 09:50 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 4 - Set environment variables earlier on Android (4.26 KB, patch)
2012-11-26 12:12 PST, Mike Hommey [:glandium]
blassey.bugs: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑b2g18+
Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (52.06 KB, patch)
2012-11-27 07:44 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (44.61 KB, patch)
2012-11-27 07:47 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc (8.60 KB, patch)
2012-11-27 07:48 PST, Mike Hommey [:glandium]
no flags Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (52.10 KB, patch)
2012-12-03 03:05 PST, Mike Hommey [:glandium]
justin.lebar+bug: review+
Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (45.58 KB, patch)
2012-12-03 03:06 PST, Mike Hommey [:glandium]
justin.lebar+bug: review+
Details | Diff | Review
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer. (54.69 KB, patch)
2012-12-05 01:46 PST, Mike Hommey [:glandium]
khuey: review+
justin.lebar+bug: approval‑mozilla‑aurora+
justin.lebar+bug: approval‑mozilla‑b2g18+
Details | Diff | Review
part 2 - Add ability to dynamically replace or supplement jemalloc implementation. (45.70 KB, patch)
2012-12-05 01:46 PST, Mike Hommey [:glandium]
khuey: review+
justin.lebar+bug: approval‑mozilla‑aurora+
justin.lebar+bug: approval‑mozilla‑b2g18+
Details | Diff | Review
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc (8.61 KB, patch)
2012-12-05 01:47 PST, Mike Hommey [:glandium]
khuey: review+
justin.lebar+bug: approval‑mozilla‑aurora+
justin.lebar+bug: approval‑mozilla‑b2g18+
Details | Diff | Review

Description Benoit Jacob [:bjacob] (mostly away) 2012-10-22 13:00:33 PDT
The kind of instrumentation that I want to add there is basically what's needed to replace nsTraceMalloc. I think that that's better done in memory/build than in nsTraceMalloc for the following reasons:
 - removes incompatibility with jemalloc
 - can intercept all allocations, no matter how early they're done during startup (see bug 717853 comment 18).
 - allows for a more efficient implementation (doubly linked list instead of hash table).
 - removes concerns about hash table collisions (in current impl of nsTraceMalloc, any addresses differing by a multiple of 2^32 == 4G would collide; so the only thing making collisions unlikely is that most segments are concentrated in a 4G span of address space, which is a little uncomfortable)

We don't want to instrument jemalloc itself because we want to keep using the original jemalloc, but mozilla/build seems like an appropriate place for such instrumentation.
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2012-10-22 13:04:56 PDT
Created attachment 673979 [details] [diff] [review]
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now

The current patch assumes (and automatically enables) jemalloc 3. This is not essential at all, but extending it to non-jemalloc-3 is not a priority as that will soon be our default.

It adds a --enable-malloc-instrumentation option which enables jemalloc 3 and disables the ability to use any native jemalloc, and enables prefixing of public jemalloc functions with "real_jemalloc_". We then implement, in memory/build/instrumentation.c, the un-prefixed public malloc functions, calling the real_jemalloc_ ones.
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2012-10-22 13:18:57 PDT
Created attachment 673982 [details] [diff] [review]
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now
Comment 3 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-10-22 22:19:05 PDT
Comment on attachment 673982 [details] [diff] [review]
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now

Review of attachment 673982 [details] [diff] [review]:
-----------------------------------------------------------------

This is really cool.  It'll make NewDMD (bug 717853) simpler;  currently I have just copied all the malloc/free interception code from trace-malloc, which isn't too bad on Mac and Linux but is pretty horrible on Windows.

Being able to call real_jemalloc_malloc() et al is also really nice;  currently trace-malloc has to set a "suppress tracing into malloc" flag every time it wants to do a malloc for itself, which is a pain.

You mentioned a doubly-linked list -- where is that?

I don't understand how this will be used in a tool like DMD or trace-malloc -- where do they hook in?  Do they provide their own functions like this:

  MOZ_EXPORT_API(void*)
  malloc(size_t size)
  {
    void *p = real_jemalloc_function(malloc)(size);
    // do stuff with p
    return p;
  }

::: memory/build/instrumentation.c
@@ +12,5 @@
> +
> +#include "mozilla/Types.h"
> +#include "jemalloc_types.h"
> +
> +#define real_jemalloc_function(a) real_jemalloc_ ## a

This is used a lot -- how about |real| for the name instead?
Comment 4 Mike Hommey [:glandium] 2012-10-22 23:23:09 PDT
Comment on attachment 673982 [details] [diff] [review]
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now

Review of attachment 673982 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +1697,5 @@
> +[  --enable-malloc-instrumentation     Enable malloc instrumentation; also enables jemalloc 3 (default=no)],
> +    MOZ_MALLOC_INSTRUMENTATION=1,
> +    MOZ_MALLOC_INSTRUMENTATION= )
> +if test -n "$MOZ_MALLOC_INSTRUMENTATION"; then
> +    MOZ_JEMALLOC=1

You want MOZ_MEMORY too.

@@ +7048,5 @@
>  fi
>  
>  if test -z "$MOZ_MEMORY"; then
>    if test -n "$MOZ_JEMALLOC"; then
> +    if test -n "$MOZ_MALLOC_INSTRUMENTATION"; then

you might as well do both tests on the same line and avoid the AC_CHECK_FUNCS altogether.

@@ +8997,2 @@
>    ac_configure_args="$_SUBDIR_CONFIG_ARGS --build=$build --host=$target --enable-stats --with-jemalloc-prefix=je_"
> +  MANGLE="malloc calloc valloc free realloc posix_memalign memalign aligned_alloc malloc_usable_size mallctl mallctlnametomib mallctlbymib"

Why unmangle mallctl, mallctlnametomib and mallctlbymib, and wrap them ? They're not allocation functions.

@@ +8997,5 @@
>    ac_configure_args="$_SUBDIR_CONFIG_ARGS --build=$build --host=$target --enable-stats --with-jemalloc-prefix=je_"
> +  MANGLE="malloc calloc valloc free realloc posix_memalign memalign aligned_alloc malloc_usable_size mallctl mallctlnametomib mallctlbymib"
> +
> +  if test -n "$MOZ_MALLOC_INSTRUMENTATION"; then
> +    JEMALLOC_WRAPPER="real_jemalloc_"

I agree with nick, it's a long prefix.

@@ +9004,3 @@
>    fi
> +  if test -n "$_WRAP_MALLOC"; then
> +    JEMALLOC_WRAPPER="${JEMALLOC_WRAPPER}__wrap_"

if you're mangling as real_*, you don't need to add __wrap_ in the picture. However, you do need to call *your* wrappers __wrap_*.

::: memory/build/instrumentation.c
@@ +44,5 @@
> +
> +/* define the instrumented malloc functions that we want to expose unprefixed */
> +
> +MOZ_EXPORT_API(void*)
> +malloc(size_t size)

You need all these functions to be called:
- je_* on mac and windows
- __wrap_* on android

The additional problem is that this won't work on mac, because jemalloc is registering itself as the zone allocator. You need to hijack that with your own zone allocator. See memory/jemalloc/src/src/zone.c. One part of the fun is that jemalloc does it in a static initializer, and you need to do your own registration after that, not before ; otherwise jemalloc is going to hijack you. That being said, maybe it would make sense to have jemalloc not register itself when the default zone allocator is not the system default zone allocator... that would simplify our own hooking. Let me bring this up to Jason.
Comment 5 Mike Hommey [:glandium] 2012-10-23 00:24:56 PDT
(In reply to Nicholas Nethercote [:njn] from comment #3)
> I don't understand how this will be used in a tool like DMD or trace-malloc
> -- where do they hook in?  Do they provide their own functions like this:
> 
>   MOZ_EXPORT_API(void*)
>   malloc(size_t size)
>   {
>     void *p = real_jemalloc_function(malloc)(size);
>     // do stuff with p
>     return p;
>   }

What might work better for us would be to do something like:
MOZ_EXPORT_API(void*)
malloc(size_t size)
{
  void *p = real_jemalloc_function(malloc)(size);
  instrument_malloc(size, p);
  return p;
}

Where instrument_* can be provided by different (and exclusive) things, somewhere else (but that still needs to be linked in mozglue).
Comment 6 Mike Hommey [:glandium] 2012-10-23 00:27:02 PDT
or, even better:
if (instrument_malloc)
  instrument_malloc(size, p);

where instrument_malloc is provided by a LD_PRELOAD library (or DYLD_INSERT_LIBRARIES library, or whatever it is on windows)
Comment 7 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-10-23 00:33:53 PDT
> or, even better:
> if (instrument_malloc)
>   instrument_malloc(size, p);

Yes, that's more like what I was expecting.
Comment 8 Mike Hommey [:glandium] 2012-10-23 02:07:31 PDT
Comment on attachment 673982 [details] [diff] [review]
Add --enable-malloc-instrumentation stub, not doing anything, assuming jemalloc 3 for now

Review of attachment 673982 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/instrumentation.c
@@ +17,5 @@
> +
> +/* import the real jemalloc functions */
> +
> +MOZ_IMPORT_API(void*)
> +real_jemalloc_function(malloc)(size_t size);

In fact, you can #include "jemalloc/jemalloc.h" and you'll get these definitions. (you'll need to add LOCAL_INCLUDES += -I../jemalloc/src/include to the Makefile.in, though)
Comment 9 Mike Hommey [:glandium] 2012-10-23 02:08:50 PDT
(In reply to Mike Hommey [:glandium] from comment #8)
> In fact, you can #include "jemalloc/jemalloc.h" and you'll get these
> definitions. (you'll need to add LOCAL_INCLUDES += -I../jemalloc/src/include
> to the Makefile.in, though)

(Don't worry about jemalloc.h declaring je_* functions, that's re#defined in jemalloc_defs.h.
Comment 10 Mike Hommey [:glandium] 2012-10-23 03:33:38 PDT
Created attachment 674177 [details] [diff] [review]
Alternative implementation

It turns out I needed something similar too, so I went a bit further. This removes the whole unmangling problem completely, using je_ prefix for all jemalloc functions, and changing the facing functions for windows. I didn't add mac support yet.
Comment 11 Mike Hommey [:glandium] 2012-10-23 03:36:44 PDT
(In reply to Mike Hommey [:glandium] from comment #10)
> Created attachment 674177 [details] [diff] [review]
> Alternative implementation
> 
> It turns out I needed something similar too, so I went a bit further. This
> removes the whole unmangling problem completely, using je_ prefix for all
> jemalloc functions, and changing the facing functions for windows. I didn't
> add mac support yet.

Note that i expect this to fail to build on windows, as the instrument_* functions are defined nowhere.
Comment 12 Mike Hommey [:glandium] 2012-10-23 05:00:00 PDT
I'm not convinced it's useful to wrap malloc_usable_size.
Comment 13 Benoit Jacob [:bjacob] (mostly away) 2012-10-23 08:49:26 PDT
Thanks for all the feedback will reply ASAP.

Just replying to Nick's questions for now:

(In reply to Nicholas Nethercote [:njn] from comment #3)
> You mentioned a doubly-linked list -- where is that?

The patch here is just a _stub_, it only has stubs for instumented malloc functions, it does not yet have anything nontrivial like the linked list stuff.

That will come by adapting the existing jemalloc patch,
https://bug704240.bugzilla.mozilla.org/attachment.cgi?id=668730

So the stubs you see in the present patch will be expanded to allocate extra space to store doubly linked list elements, backtraces, etc.


> 
> I don't understand how this will be used in a tool like DMD or trace-malloc
> -- where do they hook in?  Do they provide their own functions like this:
> 
>   MOZ_EXPORT_API(void*)
>   malloc(size_t size)
>   {
>     void *p = real_jemalloc_function(malloc)(size);
>     // do stuff with p
>     return p;
>   }

No, the idea is that the second patch I'll attach to this bug, once the first one settles, providing the actual instrumentation, will offer an API to iterate over the linked list and query block metadata. See the above-linked jemalloc patch for a taste of what it could be like. See je_dump_list_of_blocks for a simple usage example.

> ::: memory/build/instrumentation.c
> @@ +12,5 @@
> > +
> > +#include "mozilla/Types.h"
> > +#include "jemalloc_types.h"
> > +
> > +#define real_jemalloc_function(a) real_jemalloc_ ## a
> 
> This is used a lot -- how about |real| for the name instead?

Yes, and this will also make it more independent of whether the underlying allocator is jemalloc.
Comment 14 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-10-23 10:08:23 PDT
> So the stubs you see in the present patch will be expanded to allocate extra
> space to store doubly linked list elements, backtraces, etc.

About the backtraces... in general, a tool can associate arbitrary metadata with each heap block.  For example, DMD records both the stack trace of the allocation point *and* the stack trace of the report site (or NULL if it hasn't been reported).  Will your code allow for this?

Also, DMD needs to quickly look up the metadata for each heap block.  So I currently have a hash table where the keys are the heap block pointers.  Will the double-linked list allow fast lookup like this?
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2012-10-23 10:30:20 PDT
(In reply to Nicholas Nethercote [:njn] from comment #14)
> > So the stubs you see in the present patch will be expanded to allocate extra
> > space to store doubly linked list elements, backtraces, etc.
> 
> About the backtraces... in general, a tool can associate arbitrary metadata
> with each heap block.  For example, DMD records both the stack trace of the
> allocation point *and* the stack trace of the report site (or NULL if it
> hasn't been reported).  Will your code allow for this?

It seems that I was underestimating how much the needs of different tools could vary.

Let's find the right level of generality that will be just enough to cover the needs of the two tools at hand here --- DMD and the reference analysis stuff.

Can DMD's need be addressed by the approach of allocating extra space for each block, letting the tool control how much extra space is allocated, storing doubly linked list pointers there, and letting the tool provide its own hook that will be called to let it instrument new blocks in whatever way it wants?

In other words can DMD be satisfied with just being able to control:
 - the amount of extra space that needs to be allocated for each block
 - the function that needs to be called to instrument a new block

?

> 
> Also, DMD needs to quickly look up the metadata for each heap block.  So I
> currently have a hash table where the keys are the heap block pointers. 
> Will the double-linked list allow fast lookup like this?

It should be much faster even, since the cost of a hash table lookup would be replaced by just adding a (negative) offset to your point to find where the metadata is stored, since the metadata will be stored at a constant offset just before the block payload.
Comment 16 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-10-23 10:58:17 PDT
> In other words can DMD be satisfied with just being able to control:
>  - the amount of extra space that needs to be allocated for each block
>  - the function that needs to be called to instrument a new block
> 
> ?

That should suffice, yes.


> It should be much faster even, since the cost of a hash table lookup would
> be replaced by just adding a (negative) offset to your point to find where
> the metadata is stored, since the metadata will be stored at a constant
> offset just before the block payload.

Great!  Presumably we can iterate over this list at any point?
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2012-10-23 11:15:45 PDT
(In reply to Nicholas Nethercote [:njn] from comment #16) 
> Great!  Presumably we can iterate over this list at any point?

Yes: from the payload pointer we get the doubly linked list pointers as part of the metadata, just by adding a constant offset; using these doubly linked list pointers we can iterate.
Comment 18 Mike Hommey [:glandium] 2012-10-24 00:58:20 PDT
(In reply to Benoit Jacob [:bjacob] from comment #13)
> > I don't understand how this will be used in a tool like DMD or trace-malloc
> > -- where do they hook in?  Do they provide their own functions like this:
> > 
> >   MOZ_EXPORT_API(void*)
> >   malloc(size_t size)
> >   {
> >     void *p = real_jemalloc_function(malloc)(size);
> >     // do stuff with p
> >     return p;
> >   }
> 
> No, the idea is that the second patch I'll attach to this bug, once the
> first one settles, providing the actual instrumentation

I'd really prefer something like the patch I attached, because it allows interchangeable instrumentations and allows one-offs like what I'm doing locally on top of that patch.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2012-10-24 20:01:30 PDT
(In reply to Mike Hommey [:glandium] from comment #4)
> Why unmangle mallctl, mallctlnametomib and mallctlbymib, and wrap them ?
> They're not allocation functions.

We have to wrap at least mallctl and mallctlbymib because the wrapped pointers are going to differ from the underlying real_ pointers.

Example: the user calls malloc(100).
Our instrumentation adds 64 bytes per block, so we call real_malloc(164). This returns the pointer 0xf000. We reserve the 64 = 0x40 first bytes for ourselves and therefore our malloc(100) actually returns 0xf040.

Now if the user calls mallctl to query the length of the block at 0xf040, we have to wrap that call to call real_mallctl on 0xf000 instead of 0xf040. Right?

> 
> @@ +9004,3 @@
> >    fi
> > +  if test -n "$_WRAP_MALLOC"; then
> > +    JEMALLOC_WRAPPER="${JEMALLOC_WRAPPER}__wrap_"
> 
> if you're mangling as real_*, you don't need to add __wrap_ in the picture.
> However, you do need to call *your* wrappers __wrap_*.

I'd like this base prefix (je_ or __wrap_ or nothing) to be set somewhere once and for all rather than having to duplicate a nontrivial switch like this.

> 
> ::: memory/build/instrumentation.c
> @@ +44,5 @@
> > +
> > +/* define the instrumented malloc functions that we want to expose unprefixed */
> > +
> > +MOZ_EXPORT_API(void*)
> > +malloc(size_t size)
> 
> You need all these functions to be called:
> - je_* on mac and windows
> - __wrap_* on android

Understood. Again I'd like this prefix to be set once and for all, presumably in configure.in.

> 
> The additional problem is that this won't work on mac, because jemalloc is
> registering itself as the zone allocator. You need to hijack that with your
> own zone allocator. See memory/jemalloc/src/src/zone.c. One part of the fun
> is that jemalloc does it in a static initializer, and you need to do your
> own registration after that, not before ; otherwise jemalloc is going to
> hijack you. That being said, maybe it would make sense to have jemalloc not
> register itself when the default zone allocator is not the system default
> zone allocator... that would simplify our own hooking. Let me bring this up
> to Jason.

Since I barely have time to work at all on this during the week days, I'll have to leave this to you. It's OK for me if the initial pass doesn't work on Mac because of this.

(In reply to Mike Hommey [:glandium] from comment #5)
> (In reply to Nicholas Nethercote [:njn] from comment #3)
> > I don't understand how this will be used in a tool like DMD or trace-malloc
> > -- where do they hook in?  Do they provide their own functions like this:
> > 
> >   MOZ_EXPORT_API(void*)
> >   malloc(size_t size)
> >   {
> >     void *p = real_jemalloc_function(malloc)(size);
> >     // do stuff with p
> >     return p;
> >   }
> 
> What might work better for us would be to do something like:
> MOZ_EXPORT_API(void*)
> malloc(size_t size)
> {
>   void *p = real_jemalloc_function(malloc)(size);
>   instrument_malloc(size, p);
>   return p;
> }
> 
> Where instrument_* can be provided by different (and exclusive) things,
> somewhere else (but that still needs to be linked in mozglue).

This is not quite enough, as it still relies on the assumption that the return value of malloc will be the same as that of real_malloc. See above example: the return value of malloc was 0xf040 when that of real_malloc was 0xf000 because the first 0x40 bytes were reserved by the instrumentation code for... instrumentation.

But, we all agree by now that we want to let this sort of flexibility to the user of this framework, so that they can provide their own instrument_malloc() function. I'm just saying that there is a second parameter that we must let them control: the amount of extra space that they want to reserve for each block.

(In reply to Mike Hommey [:glandium] from comment #8)
> Comment on attachment 673982 [details] [diff] [review]
> Add --enable-malloc-instrumentation stub, not doing anything, assuming
> jemalloc 3 for now
> 
> Review of attachment 673982 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: memory/build/instrumentation.c
> @@ +17,5 @@
> > +
> > +/* import the real jemalloc functions */
> > +
> > +MOZ_IMPORT_API(void*)
> > +real_jemalloc_function(malloc)(size_t size);
> 
> In fact, you can #include "jemalloc/jemalloc.h" and you'll get these
> definitions. (you'll need to add LOCAL_INCLUDES += -I../jemalloc/src/include
> to the Makefile.in, though)

Will that work even though jemalloc.h only exists in the object directory and does not exist in the source directory? I though I had tried that, and that failed for that reason.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2012-10-24 20:09:46 PDT
Created attachment 674958 [details]
interdiff (+++glandium ---bjacob)

So, bugzilla's interdiff feature is broken (at least it display a discouraging red warning) so here's a homemade interdiff.
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2012-10-24 20:31:21 PDT
Comment on attachment 674177 [details] [diff] [review]
Alternative implementation

Review of attachment 674177 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +8987,5 @@
>  # Run jemalloc configure script
>  
>  if test -z "$MOZ_NATIVE_JEMALLOC" -a "$MOZ_JEMALLOC" -a "$MOZ_MEMORY" ; then
> +
> +  # XXX brutally cut things here. If that's not OK, please explain why.

You may want to remove this comment now that you've looked at it.

@@ +8993,5 @@
>    ac_configure_args="$_SUBDIR_CONFIG_ARGS --build=$build --host=$target --enable-stats --with-jemalloc-prefix=je_"
> +  if test -z "$MOZ_MALLOC_INSTRUMENTATION"; then
> +    case "$OS_ARCH" in
> +      Linux|DragonFly|FreeBSD|NetBSD|OpenBSD)
> +        MANGLE="malloc calloc valloc free realloc posix_memalign"

See my above comment: at least for the needs of the particular tool that I would like to port, I need to wrap every function that either takes or returns any malloc'd pointer. That is, again, because my wrapped blocks will be at a nonzero offset into the real underlying blocks.

@@ +8996,5 @@
> +      Linux|DragonFly|FreeBSD|NetBSD|OpenBSD)
> +        MANGLE="malloc calloc valloc free realloc posix_memalign"
> +        case "$OS_ARCH" in
> +          Linux)
> +            MANGLE="$MANGLE memalign malloc_usable_size"

For the above reasons, I would need to unconditionally wrap malloc_usable_size.

Also: can you (re-)explain why we can't just unconditionally wrap even functions that may not exist on a given platform? What's the problem with that? Would make this configure.in simpler and less error-prone.

::: memory/build/Makefile.in
@@ +23,5 @@
> +ifdef MOZ_MALLOC_INSTRUMENTATION
> +DEFINES += -DMOZ_MALLOC_INSTRUMENTATION
> +CSRCS += instrumentation.c
> +
> +LOCAL_INCLUDES += -I../jemalloc/src/include

<insert "sudden clarity Clarence" meme here>
So... the current working directory here is the current _object_ directory?

::: memory/build/extraMallocFuncs.c
@@ +10,5 @@
>  #elif defined(XP_WIN) || defined(XP_MACOSX)
> +#  if defined(MOZ_MALLOC_INSTRUMENTATION)
> +#    define wrap(a) wrap_ ## a
> +#  else
> +#    define wrap(a) je_ ## a

Please, could you a more explicit name than 'wrap'? When I read that code, I was confused about what that wrapping was about, and furthermore, the instrumentation ("wrapping") that we are adding in this bug will add another layer of possible confusion around here.

::: memory/build/instrumentation.c
@@ +36,5 @@
> +extern void instrument_realloc(void *, size_t, void *) WEAK;
> +extern void instrument_free(void *) WEAK;
> +extern void instrument_memalign(size_t, size_t, void *) WEAK;
> +extern void instrument_valloc(size_t, void *) WEAK;
> +extern void instrument_malloc_usable_size(void *, size_t) WEAK;

See above comment about why I need to instrument every function that takes or returns any malloc'd pointer, so that includes mallctl etc.

@@ +45,5 @@
> +  void *p = je_malloc(size);
> +  if (instrument_malloc)
> +    instrument_malloc(size, p);
> +  return p;
> +}

See above comment about why I need the ability to return a different pointer here, than what the real_malloc returned. And also, I was forgetting, I need the ability to pass a different size to real_malloc.

All I need is the ability to choose a "instrumentation size". Then this function could look like this:

MOZ_EXPORT_API(void*)
wrap(malloc)(size_t size)
{
  void *p = je_malloc(size + instrumentation_size);
  if (!p)
    return NULL;
  if (instrument_malloc)
    instrument_malloc(size, p);
  return (char*) p + instrumentation_size;
}

Finally, another comment: let's not hardcode je_ here, right? How about a macro to do this wrapping like in my patch?
Comment 22 Mike Hommey [:glandium] 2012-10-24 23:27:24 PDT
(In reply to Benoit Jacob [:bjacob] from comment #21)
> For the above reasons, I would need to unconditionally wrap
> malloc_usable_size.
> 
> Also: can you (re-)explain why we can't just unconditionally wrap even
> functions that may not exist on a given platform? What's the problem with
> that? Would make this configure.in simpler and less error-prone.

My patch wraps *all* functions as je_* when instrumentation is enabled, and doesn't change anything for the default case.

> ::: memory/build/Makefile.in
> @@ +23,5 @@
> > +ifdef MOZ_MALLOC_INSTRUMENTATION
> > +DEFINES += -DMOZ_MALLOC_INSTRUMENTATION
> > +CSRCS += instrumentation.c
> > +
> > +LOCAL_INCLUDES += -I../jemalloc/src/include
> 
> <insert "sudden clarity Clarence" meme here>
> So... the current working directory here is the current _object_ directory?

yes.
Comment 23 Mike Hommey [:glandium] 2012-10-25 01:29:09 PDT
So, thinking a bit more about it, I see three different use cases I'd like to address with the hooks we'd provide:
- Your instrumentation, which needs to add data around the allocated stuff.
- My instrumentation, which just needs the return values from the malloc implementation
- Hypothetical testing of alternate mallocs without rebuilding Firefox. That could allow to test newer versions of jemalloc, or entirely different allocation libraries.

To accommodate all these, I think we need something like this:
MOZ_EXPORT_API(void*)
wrap(malloc)(size_t size)
{
  if (!hook_malloc)
    return je_malloc(size);
  else
    return hook_malloc(size, je_malloc);
}

The hook_malloc function would be responsible for calling the malloc function it is given. Giving the malloc function also avoids possible linkage headaches for the library providing hook_malloc.
Comment 24 Benoit Jacob [:bjacob] (mostly away) 2012-10-25 07:17:06 PDT
(In reply to Mike Hommey [:glandium] from comment #23)
> So, thinking a bit more about it, I see three different use cases I'd like
> to address with the hooks we'd provide:
> - Your instrumentation, which needs to add data around the allocated stuff.
> - My instrumentation, which just needs the return values from the malloc
> implementation
> - Hypothetical testing of alternate mallocs without rebuilding Firefox. That
> could allow to test newer versions of jemalloc, or entirely different
> allocation libraries.

...And Nick's DMD. But he confirmed in coment 16 that DMD's needs were similar to those of my tool in this respect as long as we allow providing a custom hook.


> 
> To accommodate all these, I think we need something like this:
> MOZ_EXPORT_API(void*)
> wrap(malloc)(size_t size)
> {
>   if (!hook_malloc)
>     return je_malloc(size);
>   else
>     return hook_malloc(size, je_malloc);
> }
> 
> The hook_malloc function would be responsible for calling the malloc
> function it is given. Giving the malloc function also avoids possible
> linkage headaches for the library providing hook_malloc.

Yes, I was thinking exactly the same! Go for it.
Comment 25 Mike Hommey [:glandium] 2012-10-31 04:21:45 PDT
Created attachment 676970 [details] [diff] [review]
Alternative implementation

This still only works on Linux. This does some cleanup in how things are defined for wrapping without replacement, which also helps make things clearer in a replace_malloc implementation.

A replace_malloc implementation declares things like this:

  #include "malloc_replace.h"

  void *replace_malloc(size_t size, malloc_impl_t malloc_impl)
  {
    void *ptr = malloc_impl(size /* Possible change the size */);
    /* Do whatever you want on the ptr */
    return ptr;
  }

  (etc.)

Note there will be a need for some additional functions for the mac zone allocator, other than those declared in malloc_decls.h.

As for mallctl, i think you've confused it with malloc_usable_size. mallctl is not used to query pointer sizes.
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2012-10-31 05:27:42 PDT
Indeed I read xpcom/base/nsMemoryReporterManager.cpp and misunderstood it as reading certain block sizes; I now understand that what it reads isn't block sizes.
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2012-10-31 05:28:47 PDT
I mean jemalloc_stats.
Comment 28 Mike Hommey [:glandium] 2012-11-02 10:21:02 PDT
Created attachment 677806 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

This is an attempt at cleaning up how mozjemalloc and jemalloc3 are built and linked with the rest of mozglue, and how gecko code uses some of their functions.
It also simplifies things for part 2, which implements replacement/hook functions.

Try with jemalloc 3
https://tbpl.mozilla.org/?tree=Try&rev=39ee1c340b24

Try with mozjemalloc/trace-malloc (iow, normal builds)
https://tbpl.mozilla.org/?tree=Try&rev=a8f0a15861ea

Try with --disable-jemalloc
https://tbpl.mozilla.org/?tree=Try&rev=fa35958a6b60

I need to double check the resulting binaries, but at least the test suites don't show any obvious problems.
Comment 29 Mike Hommey [:glandium] 2012-11-02 10:33:39 PDT
Created attachment 677812 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

This is like attachment 676970 [details] [diff] [review] (see comment 25), and show work on both Linux and Windows. I still need to figure out a workable plan for mac (see below).

One shortcoming that i want to address is that you still need to rebuild libxul if you switch replace_malloc on/off, which i'd like to avoid. It also doesn't allow to hook jemalloc_stats yet.

Try with jemalloc 3
http://tbpl.mozilla.org/?tree=Try&rev=0e2e2492f2c3

Try with mozjemalloc/trace-malloc (iow, normal builds)
https://tbpl.mozilla.org/?tree=Try&rev=8b20f4252f2a

Try with --disable-jemalloc
https://tbpl.mozilla.org/?tree=Try&rev=2bcd420d80d8

Try with replace_malloc enabled
https://tbpl.mozilla.org/?tree=Try&rev=17920df6c79f

I need to double check the resulting binaries, but local builds did have the right behaviour, albeit, with slightly different patches.

The mac build failure is due to missing headers in malloc_replace.c for the zone allocator code, which is fixed in this patch. The resulting mac binary will fail to start, because of the missing replace_* symbols, and sadly dyld doesn't try to use the ones provided with DYLD_INSERT_LIBRARIES. It actually tries to resolve them before loading DYLD_INSERT_LIBRARIES (bummer). An alternative solution I used, that proved the code to work, is to lazy link against a prebuilt replace_malloc implementation library, but the resulting binary does still require the library at runtime.
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2012-11-02 10:37:37 PDT
(In reply to Mike Hommey [:glandium] from comment #29)
> One shortcoming that i want to address is that you still need to rebuild
> libxul if you switch replace_malloc on/off, which i'd like to avoid. It also
> doesn't allow to hook jemalloc_stats yet.

That would be very nice indeed. In fact, since the cost of a conditional branching should be utterly negligible compared to the cost of a malloc/free call, so maybe we can simply enable this unconditionally?

(Side note: these conditional branches are particularly easy for the CPU to predict since they are few and evaluate in the same way for the whole lifetime of the process).
Comment 31 Mike Hommey [:glandium] 2012-11-02 10:40:46 PDT
On, and the b2g panda failures are due to bug 799425.
Comment 32 Mike Hommey [:glandium] 2012-11-02 10:42:42 PDT
(In reply to Benoit Jacob [:bjacob] from comment #30)
> That would be very nice indeed. In fact, since the cost of a conditional
> branching should be utterly negligible compared to the cost of a malloc/free
> call, so maybe we can simply enable this unconditionally?

I thought about this, and I'm not entirely sure of the security implications that could have.
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2012-11-02 10:45:08 PDT
(In reply to Mike Hommey [:glandium] from comment #32)
> I thought about this, and I'm not entirely sure of the security implications
> that could have.

If an attacker was able to LD_PRELOAD his own library into our process, haven't we lost already, anyway?
Comment 34 Mike Hommey [:glandium] 2012-11-02 10:45:54 PDT
(In reply to Mike Hommey [:glandium] from comment #29)
> One shortcoming that i want to address is that you still need to rebuild
> libxul if you switch replace_malloc on/off, which i'd like to avoid. It also
> doesn't allow to hook jemalloc_stats yet.

On linux, however, switching from mozjemalloc to malloc_replace may work. (but not from jemalloc3)
Comment 35 Mike Hommey [:glandium] 2012-11-02 10:47:44 PDT
(In reply to Benoit Jacob [:bjacob] from comment #33)
> (In reply to Mike Hommey [:glandium] from comment #32)
> > I thought about this, and I'm not entirely sure of the security implications
> > that could have.
> 
> If an attacker was able to LD_PRELOAD his own library into our process,
> haven't we lost already, anyway?

I'm more worried about the windows side of things.
Comment 36 David Bolter [:davidb] 2012-11-02 10:59:12 PDT
CC'ing some people who may help answer comment 35 concerns.
Comment 37 Mike Hommey [:glandium] 2012-11-06 08:06:19 PST
Created attachment 678759 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Refreshed after the landing of bug 807112, and don't wrap strdup functions on osx, since the libc ones are going to call the zone allocator anyways.
Comment 38 Justin Lebar (not reading bugmail) 2012-11-06 12:17:53 PST
I may not be able to look at this until after the B2G work week.  Is that OK?
Comment 39 Mike Hommey [:glandium] 2012-11-06 13:16:13 PST
Comment on attachment 678759 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

I need to change some things anyway.
Comment 40 Mike Hommey [:glandium] 2012-11-08 02:25:51 PST
Created attachment 679596 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

This is not final. I'll reupdate after bug 805855 is done, since it adds a function that needs to be handled in this patch too.
Comment 41 Mike Hommey [:glandium] 2012-11-08 02:26:58 PST
Created attachment 679598 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

This is not final for the same reason as part 1. Also note this depends on jemalloc3 being updated to contain this patch:
http://www.canonware.com/pipermail/jemalloc-discuss/2012-November/000511.html
Comment 42 Mike Hommey [:glandium] 2012-11-08 04:52:43 PST
Created attachment 679631 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

With a fix for mac configure.
Comment 43 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 14:17:03 PST
Does this currently work on Android? I have a use for it, especially on Android, in bug 811483. I'm also interested in B2G if we have B2G crash reporting yet.
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 14:51:05 PST
I just filed two bugs, bug 811483 and bug 811497, that are about doing useful things whenever malloc returns null. I think that these should use the framework being developed here. This adds a design constraint: the wrapping must be done unconditionally, at least by default. If the possible security concerns about allowing to provide replace_malloc functions is still a concern, then this could be addressed by putting the replace_malloc inside a if() / #ifdef block that would check if malloc-instrumentation is enabled, but the part that does extra stuff when malloc returns 0 should always be able to run.

Like this (assuming we want the check for :

MOZ_EXPORT_API(void*)
wrap(malloc)(size_t size)
{
  void *retval;

  if (enable_malloc_instrumentation && replace_malloc)
    retval = replace_malloc(size, je_malloc);
  else
    retval = je_malloc(size);

  if (MOZ_UNLIKELY(!retval)) {
    // do extra things --- regardless of enable_malloc_instrumentation
    annotateCrashReport(); // bug 811483
    fireMemoryPressureEvent(); // bug 811497
  }  
}
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 14:51:52 PST
(In reply to Benoit Jacob [:bjacob] from comment #44)
> Like this (assuming we want the check for :

Sorry, this sentence should read:

Like this (assuming we want the check for enable_malloc_instrumentation to be a runtime check)
Comment 46 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 14:54:14 PST
Also, to be clear, I'm not asking you to add this if (MOZ_UNLIKELY(!retval)) {...} block yourself --- just to design the wrap(malloc) functions here in a way that allows me to add such code later and have it work regardless of enable_malloc_instrumentation.

And yes, I did forget the return statement in comment 44.
Comment 47 Mike Hommey [:glandium] 2012-11-13 23:14:18 PST
(In reply to Benoit Jacob [:bjacob] from comment #44)
>   if (MOZ_UNLIKELY(!retval)) {
>     // do extra things --- regardless of enable_malloc_instrumentation
>     annotateCrashReport(); // bug 811483
>     fireMemoryPressureEvent(); // bug 811497

That's a tough one. Because these functions live in mozglue, and they will have great difficulty calling back into libxul. At least not without a registration system and a callback.

However, can that even happen? (malloc returning NULL) The process will likely be OOM killed before reaching a state where that /can/ happen.

(In reply to Benoit Jacob [:bjacob] from comment #43)
> Does this currently work on Android? I have a use for it, especially on
> Android, in bug 811483. I'm also interested in B2G if we have B2G crash
> reporting yet.

It should work on android, although LD_PRELOAD is not easy, there. There would need to be a helper in the java code used before loading mozglue.
Comment 48 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 23:30:15 PST
(In reply to Mike Hommey [:glandium] from comment #47)
> (In reply to Benoit Jacob [:bjacob] from comment #44)
> >   if (MOZ_UNLIKELY(!retval)) {
> >     // do extra things --- regardless of enable_malloc_instrumentation
> >     annotateCrashReport(); // bug 811483
> >     fireMemoryPressureEvent(); // bug 811497
> 
> That's a tough one. Because these functions live in mozglue, and they will
> have great difficulty calling back into libxul. At least not without a
> registration system and a callback.

I didn't realize that difficulty. That doesn't seem that difficult though, if it's just passing around another function pointer, but maybe I'm missing something.

> 
> However, can that even happen? (malloc returning NULL) The process will
> likely be OOM killed before reaching a state where that /can/ happen.

If malloc returning null could never happen, then why would we bother having infallible vs fallible malloc?

I thought that a typical case where that would happen was when doing a relatively large memory allocation, say 1 MB, with fallible malloc (which one typically uses for allocations that can be large, so one can handle failure). The concern is that code mis-handling null return values from fallible malloc could be a common reason for crashes that are related to OOM conditions, but may not immediately be obviously OOM crashes.

> 
> (In reply to Benoit Jacob [:bjacob] from comment #43)
> > Does this currently work on Android? I have a use for it, especially on
> > Android, in bug 811483. I'm also interested in B2G if we have B2G crash
> > reporting yet.
> 
> It should work on android, although LD_PRELOAD is not easy, there. There
> would need to be a helper in the java code used before loading mozglue.

OK. Well, the part here for which I'm specifically interested in Android, is the handling-malloc-returning-null part, which would not rely on LD_PRELOAD (would be built in).
Comment 49 Benoit Jacob [:bjacob] (mostly away) 2012-11-13 23:40:55 PST
Oh, also, I thought that a cause of crashes that we had on 32bit Windows machines + long-running Firefox sessions + memory leaks was that we ran out of address space. If I understand correctly, these crashes would follow an allocation failure.
Comment 50 Benoit Jacob [:bjacob] (mostly away) 2012-11-14 00:11:52 PST
Created attachment 681385 [details] [diff] [review]
some tweaks

Some tweaks on top of the patches here.

 1. use MOZ_UNLIKELY to help the compiler favor the right branch
 2. replace the if(!replace_foo) by if(replace_foo) to remove a useless negation
 3. add a malloc_failed(size_t size) stub, get it called on failure

1. and 2. are just in case you'd like it. 3. is what I proposed above, I actually care about that if only to get a way to test the theory that malloc returning null happens and is correlated to certain bugs/crashes --- if it turns out that that theory is wrong, we can take it out, but without something like that I don't see how we are going to find out.
Comment 51 Mike Hommey [:glandium] 2012-11-14 00:24:08 PST
I'd prefer if we didn't touch replace_malloc.c to add things that should be handled by a replace_malloc implementation library. What we could do though, is use a default replace_malloc implementation to be used when there's none hooked.
Comment 52 Benoit Jacob [:bjacob] (mostly away) 2012-11-14 00:41:37 PST
OK, that works too, but I wanted this malloc-failure-handling stuff to be enabled by default, at least on Android/B2G. Do you think we can hope to --enable-replace-malloc by default on these platforms? If not, what would be an appropriate place to add this checking?
Comment 53 Mike Hommey [:glandium] 2012-11-14 00:48:29 PST
(In reply to Benoit Jacob [:bjacob] from comment #52)
> OK, that works too, but I wanted this malloc-failure-handling stuff to be
> enabled by default, at least on Android/B2G. Do you think we can hope to
> --enable-replace-malloc by default on these platforms?

Not before we're ready to enable jemalloc3, unless we also make replace malloc work with mozjemalloc, which, at the point the patches are, might not be so much work.
Comment 54 Benoit Jacob [:bjacob] (mostly away) 2012-11-14 00:50:52 PST
OK, waiting for jemalloc3 sounds OK. But do you expect there to be other objections to turning it on by default on Android/B2G, e.g. in terms of
 - overhead?  (I think not, as a could if()'s should be negligible compared to a malloc)
 - security?  (Not understanding the above concern on Windows, I don't know if it applies on Android as well).
Comment 55 Benoit Jacob [:bjacob] (mostly away) 2012-11-14 00:51:46 PST
so dyslexic :-/     s/could/couple of/
Comment 56 Mike Hommey [:glandium] 2012-11-14 01:13:10 PST
(In reply to Benoit Jacob [:bjacob] from comment #54)
> OK, waiting for jemalloc3 sounds OK. But do you expect there to be other
> objections to turning it on by default on Android/B2G, e.g. in terms of
>  - overhead?  (I think not, as a could if()'s should be negligible compared
> to a malloc)

I think not, either.

>  - security?  (Not understanding the above concern on Windows, I don't know
> if it applies on Android as well).

On Linux, Android, and b2g, the function pointers used by replace_malloc hooking in the ifs() are in the GOT, like function pointers used by the PLT when calling the functions (although they're different pointers, they're all in the GOT). So security-wise, this is pretty much equal to normal function calls. On Mac, this is a similar situation. On Windows, however, the situation is different, so it needs to be carefully considered.
Comment 57 Mike Hommey [:glandium] 2012-11-16 00:14:58 PST
Created attachment 682365 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Refreshed after bug 805855. I still want to change some comments before sending for review.
Comment 58 Mike Hommey [:glandium] 2012-11-16 00:20:58 PST
Created attachment 682366 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

Refreshed after bug 805855. I want to make it work with mozjemalloc as well, before sending to review.
I applied a part of bjacob's tweaks, namely, i added LIKELY/UNLIKELY hints, but i didn't change ordering because LIKELY/UNLIKELY doesn't do anything on MSVC.
Comment 59 Justin Lebar (not reading bugmail) 2012-11-16 09:44:02 PST
+  if (MOZ_LIKELY(!replace_malloc))
+    return je_malloc(size);

Is the idea that we'd release builds with replace-malloc enabled?  If so, what's the use-case for that?  Otherwise I don't see why this is MOZ_LIKELY.
Comment 60 Benoit Jacob [:bjacob] (mostly away) 2012-11-16 09:52:29 PST
(In reply to Justin Lebar [:jlebar] from comment #59)
> +  if (MOZ_LIKELY(!replace_malloc))
> +    return je_malloc(size);
> 
> Is the idea that we'd release builds with replace-malloc enabled?

Yes, hopefully.

> If so, what's the use-case for that?

I was interested in getting the ability of annotating crash reports if malloc ever returns null. See bug 811483. The use case if trying to figure why we get so many crash reports on Android: how many of those are flavors of OOM?
Comment 61 Mike Hommey [:glandium] 2012-11-16 09:57:42 PST
(In reply to Justin Lebar [:jlebar] from comment #59)
> +  if (MOZ_LIKELY(!replace_malloc))
> +    return je_malloc(size);
> 
> Is the idea that we'd release builds with replace-malloc enabled?  If so,
> what's the use-case for that?  Otherwise I don't see why this is MOZ_LIKELY.

Because a build with replace-malloc enabled doesn't necessarily run with a replace-malloc lib loaded.
Not even talking about release builds, this means you could run a debug build without trace malloc (if we ever port trace malloc to this), and only enable it when you want.
Comment 62 Justin Lebar (not reading bugmail) 2012-11-16 10:03:10 PST
> Not even talking about release builds, this means you could run a debug build without 
> trace malloc (if we ever port trace malloc to this), and only enable it when you want.

You can turn it on without restarting the process?

If not, at the point that I have to restart the process, the fact that I have to recompile does not seem like a big problem here.

I just feel like we may be adding complexity to meet use-cases which don't exist, making it harder to meet real use-cases, such as DMD.  YAGNI.
Comment 63 Mike Hommey [:glandium] 2012-11-16 10:12:25 PST
(In reply to Justin Lebar [:jlebar] from comment #62)
> > Not even talking about release builds, this means you could run a debug build without 
> > trace malloc (if we ever port trace malloc to this), and only enable it when you want.
> 
> You can turn it on without restarting the process?

I guess we could allow that if there's a use case for it... But how useful is hooking the malloc implementation if you don't have any track record of past allocations?

> If not, at the point that I have to restart the process, the fact that I
> have to recompile does not seem like a big problem here.

Not everyone likes to wait for a build to finish just to test a little something. Especially when you can test that little something by not building at all (that is, when your toolset is already built, and all you need is just to restart to enable it).

> I just feel like we may be adding complexity to meet use-cases which don't
> exist, making it harder to meet real use-cases, such as DMD.  YAGNI.

These patches have already been useful to me, so there is at least one use case it fulfils.
Comment 64 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-16 14:30:25 PST
I'm going to try building a simple tool with this on Monday to see how it meets DMD's needs.  Before I do, I have some questions whose answers would be helpful.

- Can the wrapper functions be compiled in, e.g. if an --enable-foo config flag is specified?  (As opposed to LD_PRELOADing them.)

- Can the malloc wrappers call functions that might allocate?  For example, can I add something to a hash table, which might trigger a realloc() of the hash table's storage?  To do this there needs to be a way to temporarily disable malloc interception/instrumentation, so that the hash table realloc() doesn't trigger the realloc-wrapper.  In trace-malloc and DMD that's currently handled via a per-thread interception blocking mechanism -- see AutoBlockIntercepts in the attachment in bug 717853.

- Can the malloc wrappers use Gecko code, e.g. nsTHashtable?  I think the answers to the first two questions must be "yes" for this to be the case.

Thanks!
Comment 65 Justin Lebar (not reading bugmail) 2012-11-16 14:35:38 PST
> I'm going to try building a simple tool with this on Monday to see how it meets DMD's needs.

I've been working on this today.  I didn't get particularly far; what I have is a working malloc wrapper that calls into libxul.  The libxul code can then do whatever it wants -- in particular, it could call into new-dmd.  That part seems relatively straightforward, and works on b2g.

The bigger roadblock is that we can't get stacktraces on b2g.  Without that, none of this is useful.

I'll post some code in the newdmd bug.

> - Can the wrapper functions be compiled in, e.g. if an --enable-foo config flag is specified?  (As 
> opposed to LD_PRELOADing them.)

Glandium and I talked about this on IRC.  I believe the conclusion was "probably not".  But the glue I have makes this relatively painless for you now.

> - Can the malloc wrappers call functions that might allocate?  

You need to do the same kind of re-entrancy detection that new-dmd already does.

> - Can the malloc wrappers use Gecko code, e.g. nsTHashtable?

The malloc wrappers must live in a library that's not libxul.  But the wrapper I wrote today calls into libxul, at which point you can do whatever.
Comment 66 Mike Hommey [:glandium] 2012-11-16 14:47:06 PST
(In reply to Nicholas Nethercote [:njn] from comment #64)
> I'm going to try building a simple tool with this on Monday to see how it
> meets DMD's needs.  Before I do, I have some questions whose answers would
> be helpful.
> 
> - Can the wrapper functions be compiled in, e.g. if an --enable-foo config
> flag is specified?  (As opposed to LD_PRELOADing them.)

Yes, but it will be linked in libmozglue, so it doesn't help much with the third question. In fact it makes it harder (see below).
 
> - Can the malloc wrappers call functions that might allocate?  For example,
> can I add something to a hash table, which might trigger a realloc() of the
> hash table's storage?  To do this there needs to be a way to temporarily
> disable malloc interception/instrumentation, so that the hash table
> realloc() doesn't trigger the realloc-wrapper.  In trace-malloc and DMD
> that's currently handled via a per-thread interception blocking mechanism --
> see AutoBlockIntercepts in the attachment in bug 717853.

You can do whatever you want in the replacement functions, including handling such blockers.
 
> - Can the malloc wrappers use Gecko code, e.g. nsTHashtable?  I think the
> answers to the first two questions must be "yes" for this to be the case.

We discussed this with Justin today. It's possible, but kind of painful. You need to link your preloaded library against libxul, and ensure the dynamic linker will find libxul (by either preloading it as well or setting an appropriate search path)

An alternative is to cherry pick the parts you want and just link the corresponding object files. Adding e.g. nsTHashtable.o and pldhash.o to the objects you link in your library might be all you need, although avoiding to export their symbols might be better.

Note that in that case, you'd maybe need to link the library against libmozglue on windows, although i'm not sure.

AIUI, Justin has been doing some DMD stuff with the patches from this bug, so you should talk to him before trying anything on monday.
Comment 67 Benoit Jacob [:bjacob] (mostly away) 2012-11-17 19:32:44 PST
While working on my replace_malloc library, I wondered: why do we need to pass function pointers to replace_foo functions, why aren't we just letting the replace_foo functions call the foo function by themselves?

Like, replace this:

  void *replace_malloc(size_t size, malloc_impl_t malloc_impl)
  {
    void *ptr = malloc_impl(size);
    return ptr;
  }

By this:

  void *replace_malloc(size_t size)
  {
    // no need to pass malloc_impl func pointer, the replace_ library
    // just links to libmozalloc (I suppose)
    void *ptr = malloc_impl(size);
    return ptr;
  }

My use case is I actually need to call the raw _impl funtions in a few more places. For example I need to allocate the dummy block used as return value for malloc(0), etc.
Comment 68 Justin Lebar (not reading bugmail) 2012-11-17 20:48:18 PST
> My use case is I actually need to call the raw _impl funtions in a few more places. For example I 
> need to allocate the dummy block used as return value for malloc(0), etc.

We need something like this for DMD as well.  As a hack, I was just storing pointers to &orig_malloc and &orig_free the first time replace_{malloc,free} was called.

> why aren't we just letting the replace_foo functions call the foo function by themselves?

However, it's not clear to me that this would work, particularly if replace_malloc lives inside libxul (bug 717853 comment 40).

Instead of passing a pointer to malloc, we could pass a pointer to a struct which contains pointers to malloc, free, etc...
Comment 69 Mike Hommey [:glandium] 2012-11-18 02:28:58 PST
(In reply to Benoit Jacob [:bjacob] from comment #67)
> While working on my replace_malloc library, I wondered: why do we need to
> pass function pointers to replace_foo functions, why aren't we just letting
> the replace_foo functions call the foo function by themselves?

Because the foo functions won't necessarily be exported.

(In reply to Justin Lebar [:jlebar] from comment #68)
> Instead of passing a pointer to malloc, we could pass a pointer to a struct
> which contains pointers to malloc, free, etc...

Yeah, I started pondering about passing a function table when we were talking about DMD on irc the other day.
Comment 70 Mike Hommey [:glandium] 2012-11-18 03:13:25 PST
(In reply to Mike Hommey [:glandium] from comment #69)
> Yeah, I started pondering about passing a function table when we were
> talking about DMD on irc the other day.

... doing that makes me wonder if it wouldn't be better to just call an init function in the replace malloc lib, with the function table, and be done with it.
Comment 71 Justin Lebar (not reading bugmail) 2012-11-18 04:39:55 PST
> ... doing that makes me wonder if it wouldn't be better to just call an init function in 
> the replace malloc lib, with the function table, and be done with it.

Yeah; I'm probably going to save the function table anyway on the first malloc.  :)
Comment 72 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-18 16:59:33 PST
Comment on attachment 682365 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Review of attachment 682365 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/extraMallocFuncs.c
@@ -89,5 @@
> -#ifdef XP_WIN
> -/*
> - *  There's a fun allocator mismatch in (at least) the VS 2010 CRT
> - *  (see the giant comment in this directory's Makefile.in
> - *  that gets redirected here to avoid a crash on shutdown.

There's a missing ')' in this comment.
Comment 73 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-18 17:12:41 PST
A naming nit:  "replace-malloc" is a verb phrase.  "malloc-replacement" is a noun phrase.  "malloc-replace" falls awkwardly between them -- could we use one of the other two names?
Comment 74 Benoit Jacob [:bjacob] (mostly away) 2012-11-18 19:10:00 PST
Created attachment 682968 [details] [diff] [review]
tweak: pass a struct containing all malloc functions

> (In reply to Justin Lebar [:jlebar] from comment #68)
> > Instead of passing a pointer to malloc, we could pass a pointer to a struct
> > which contains pointers to malloc, free, etc...
> 
> Yeah, I started pondering about passing a function table when we were
> talking about DMD on irc the other day.

Here's a quick patch doing that.
Comment 75 Benoit Jacob [:bjacob] (mostly away) 2012-11-18 20:45:26 PST
Created attachment 682987 [details] [diff] [review]
tweak: pass a struct containing all malloc functions

actually working
Comment 76 Benoit Jacob [:bjacob] (mostly away) 2012-11-18 21:00:04 PST
...Though the proposal to pass the struct once and for all in some init() function, rather than passing it on every malloc-like function, would be even better IMHO.
Comment 77 Mike Hommey [:glandium] 2012-11-18 22:52:59 PST
(In reply to Nicholas Nethercote [:njn] from comment #72)
> Comment on attachment 682365 [details] [diff] [review]
> part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to
> make it clearer.
> 
> Review of attachment 682365 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: memory/build/extraMallocFuncs.c
> @@ -89,5 @@
> > -#ifdef XP_WIN
> > -/*
> > - *  There's a fun allocator mismatch in (at least) the VS 2010 CRT
> > - *  (see the giant comment in this directory's Makefile.in
> > - *  that gets redirected here to avoid a crash on shutdown.
> 
> There's a missing ')' in this comment.

Note the comment you're quoting is being removed. And it's moved to wrap_malloc.c, where the ')' is already added at the same time.
Comment 78 Mike Hommey [:glandium] 2012-11-19 10:15:11 PST
Created attachment 683202 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

New iteration. Only tested on Linux. At this point, I don't expect more changes than making things and work build if they are broken (that is, no more comment changes, no more macro changes, etc).
Comment 79 Mike Hommey [:glandium] 2012-11-19 10:59:20 PST
Created attachment 683221 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

New part 2. Only tested on Linux. It works with both mozjemalloc or jemalloc as malloc implementation. There's a WIP replace-malloc library that implements jemalloc when mozjemalloc is built-in, and mozjemalloc when jemalloc is built-in, but I haven't updated it after my last changes, so it probably doesn't build (and it's not hooked in the build system anyways).

It also adds a call to an init function with the function table, instead of giving the function(s) to each call. There are a few places that probably need comment updates.

I also spotted something rather disturbing: we don't run prefork and postfork hooks on osx with mozjemalloc. I'll file a separate bug for that.
Comment 80 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-19 21:30:47 PST
What is aligned_alloc()?  According to MXR it only shows up in memory/jemalloc/src/test/aligned_alloc.c.
Comment 81 Mike Hommey [:glandium] 2012-11-20 04:29:07 PST
(In reply to Nicholas Nethercote [:njn] from comment #80)
> What is aligned_alloc()?  According to MXR it only shows up in
> memory/jemalloc/src/test/aligned_alloc.c.

It's a C11 function, and may be used by system libraries. It is provided by jemalloc3, and wasn't provided by mozjemalloc, so to make things align, I added it to mozjemalloc.
Comment 82 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 04:40:23 PST
By the way, how about _aligned_alloc on Windows?
http://msdn.microsoft.com/en-us/library/8z34s9c6%28v=vs.80%29.aspx

There is also a thing called _mm_malloc, provided I think by the same Intel-compatible headers that provide SSE intrinsics.

Since all these aligned-alloc functions do the same thing, it may make sense to implement them internally in malloc_replace.c as wrappers around replace_memalign rather than ask every malloc_replace library to wrap all of them.
Comment 83 Mike Hommey [:glandium] 2012-11-20 05:08:06 PST
(In reply to Benoit Jacob [:bjacob] from comment #82)
> By the way, how about _aligned_alloc on Windows?
> http://msdn.microsoft.com/en-us/library/8z34s9c6%28v=vs.80%29.aspx

We're currently not hooking it, so let's keep things as they are now, and see about that later. I'm only adding aligned_alloc because it's already in jemalloc3 and it allows an easy switch between mozjemalloc and jemalloc3.

> Since all these aligned-alloc functions do the same thing, it may make sense
> to implement them internally in malloc_replace.c as wrappers around
> replace_memalign rather than ask every malloc_replace library to wrap all of
> them.

When your replacement library is a tracer, that might be true, but if you're hooking a different memory allocator (e.g. tcmalloc, or others), the distinction between the various functions can be important. For instance, there are optimization opportunities in the difference between the various alignment functions that could be used in some allocators.

The replace_malloc.h header could however provide helpers to make writing these functions easier.
Comment 84 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 06:10:48 PST
Created attachment 683557 [details] [diff] [review]
fix build of c++ replace_malloc libs

Can't do extern "C" inside a struct
Comment 85 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 06:16:55 PST
Comment on attachment 683557 [details] [diff] [review]
fix build of c++ replace_malloc libs

glandium has a simpler fix.
Comment 86 Mike Hommey [:glandium] 2012-11-21 03:15:52 PST
Created attachment 683958 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

This should be the final version. Note this unexports recalloc, msize and expand from mozjemalloc, but we a) don't have them in jemalloc3, b) don't use them in our codebase, c) even if system libraries are using them, they're currently not calling into mozjemalloc anyways.
Comment 87 Mike Hommey [:glandium] 2012-11-21 03:19:02 PST
Created attachment 683959 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

This one is almost final. I need to add the helpers mentioned in comment 83 in replace_malloc.h, and more importantly, need to figure why the dummy library is not weaky linked on x86-64 only on mac, which might well be a bug in the old linker we're using on the buildbots. Filed bug 813930 to get a build slave to try to figure things out.
Comment 88 Mike Hommey [:glandium] 2012-11-26 05:24:51 PST
Created attachment 685117 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Only a few changes from last iteration. Adding khuey to the party for build system parts.
Comment 89 Mike Hommey [:glandium] 2012-11-26 05:26:16 PST
Created attachment 685119 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.
Comment 90 Mike Hommey [:glandium] 2012-11-26 05:27:53 PST
Created attachment 685120 [details] [diff] [review]
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc

Note memory/replace is being added to tier_platform_dirs instead of memory/replace/jemalloc because there will be more in the future.
Comment 91 Mike Hommey [:glandium] 2012-11-26 05:28:34 PST
Created attachment 685121 [details] [diff] [review]
part 4 - Allow to load a replace-malloc lib from android builds
Comment 92 Mike Hommey [:glandium] 2012-11-26 06:04:59 PST
Comment on attachment 685117 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

I'll need another round.
Comment 93 Mike Hommey [:glandium] 2012-11-26 07:20:56 PST
Comment on attachment 685121 [details] [diff] [review]
part 4 - Allow to load a replace-malloc lib from android builds

The android system linker actually doesn't handle weak linking properly.
Comment 94 Mike Hommey [:glandium] 2012-11-26 09:48:59 PST
Created attachment 685215 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.
Comment 95 Mike Hommey [:glandium] 2012-11-26 09:49:46 PST
Created attachment 685217 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.
Comment 96 Mike Hommey [:glandium] 2012-11-26 09:50:27 PST
Created attachment 685218 [details] [diff] [review]
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc
Comment 97 Mike Hommey [:glandium] 2012-11-26 10:36:44 PST
Sadly, this still doesn't work well on android because malloc is called before putenv.
Comment 98 Mike Hommey [:glandium] 2012-11-26 12:12:50 PST
Created attachment 685286 [details] [diff] [review]
part 4 - Set environment variables earlier on Android

This makes putenv the first thing done in mozglue, before jemalloc can be called. It might be better to make it run only once, though.
Comment 99 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-26 13:52:05 PST
Comment on attachment 685215 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Review of attachment 685215 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/extraMallocFuncs.c
@@ +8,2 @@
>  #include "mozilla/Types.h"
>  

Nit: remove "this"(?)

::: memory/build/mozjemalloc_compat.c
@@ +49,5 @@
> +malloc_good_size_impl(size_t size)
> +{
> +  size_t ret;
> +  if (size == 0)
> +    size = 1;

Why is this necessary?  Perhaps a comment explaining would be helpful.

::: memory/mozjemalloc/jemalloc.h
@@ +26,5 @@
> +MOZ_BEGIN_EXTERN_C
> +
> +/*
> + * On OSX, malloc/malloc.h contains the declaration for malloc_good_size,
> + * which will call back in jemalloc, threw the zone allocator so just use it.

s/threw/through/

::: memory/build/mozmemory_wrap.h
@@ +27,5 @@
> + *   convenience, some are treated as being cross-platform, and available
> + *   as such.
> + *
> + * - duplication functions:
> + *   malloc:

Why is this "malloc:" line here?
Comment 100 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-26 16:01:09 PST
Comment on attachment 685217 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

Review of attachment 685217 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +6978,5 @@
> +dnl ========================================================
> +dnl = Enable dynamic replacement of malloc implementation
> +dnl ========================================================
> +MOZ_ARG_ENABLE_BOOL(replace-malloc,
> +[  --enable-replace-malloc   Enable ability to dynamically replace the malloc implementation],

I've found it unclear all along if --enable-jemalloc is required in conjunction with this, or if this forces --enable-malloc.  If the latter, please say "enables jemalloc as well" or something like that in the comment.

::: memory/build/replace_malloc.h
@@ +22,5 @@
> + *
> + * The const malloc_table_t pointer given to that function is a table
> + * containing pointers to the original jemalloc implementation, so that
> + * replacement functions can call them back if they need to. The pointer
> + * itself can safely be kept around (no need to copy the table itself).

Nice!

::: memory/replace/dummy/dummy_replace_malloc.c
@@ +5,5 @@
> +#include "mozilla/Types.h"
> +
> +#define MALLOC_FUNCS MALLOC_FUNCS_ALL
> +#define MALLOC_DECL(name, ...) \
> +  MOZ_EXPORT void replace_ ## name() { }

I have no idea what memory/replace/dummy/* is for.  Could you explain in a comment?
Comment 101 Brad Lassey [:blassey] (use needinfo?) 2012-11-26 17:54:48 PST
Comment on attachment 685286 [details] [diff] [review]
part 4 - Set environment variables earlier on Android

Review of attachment 685286 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoAppShell.java
@@ +492,5 @@
>              sNSSLibsLoaded = true;
>          }
>      }
>  
> +    public static void loadMozGlue(Context context) {

I'd rather pass in an Activity (or the Intent itself) than a context if we're just going to cast it below.
Comment 102 Mike Hommey [:glandium] 2012-11-27 00:00:21 PST
(In reply to Brad Lassey [:blassey] from comment #101)
> I'd rather pass in an Activity (or the Intent itself) than a context if
> we're just going to cast it below.

That just moves the cast to the caller. Is that really better?
Comment 103 Mike Hommey [:glandium] 2012-11-27 00:04:22 PST
(In reply to Nicholas Nethercote [:njn] from comment #100)
> ::: memory/replace/dummy/dummy_replace_malloc.c
> @@ +5,5 @@
> > +#include "mozilla/Types.h"
> > +
> > +#define MALLOC_FUNCS MALLOC_FUNCS_ALL
> > +#define MALLOC_DECL(name, ...) \
> > +  MOZ_EXPORT void replace_ ## name() { }
> 
> I have no idea what memory/replace/dummy/* is for.  Could you explain in a
> comment?

It's explained in configure.in, actually. I guess I could put a note sending there.
Comment 104 Mike Hommey [:glandium] 2012-11-27 07:44:39 PST
Created attachment 685630 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Refreshed with njn's comments.
Comment 105 Mike Hommey [:glandium] 2012-11-27 07:47:51 PST
Created attachment 685632 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.
Comment 106 Mike Hommey [:glandium] 2012-11-27 07:48:57 PST
Created attachment 685633 [details] [diff] [review]
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc
Comment 107 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-27 19:03:21 PST
I just got the new patches working with NewDMD (and I'm exporting replace_* from libxul.so so I don't need jlebar's tolibxul stuff any more).  The presence of replace_init and malloc_table_t made things nicer, so thanks for that.

The only stumbling block is that mozalloc_macro_wrappers.h has various macros like this:

  #define malloc(_) moz_malloc(_)

and DMD.cpp #includes them (via nscore.h) which means that e.g. |gMallocTable->malloc| was being rewritten as |gMallocTable->moz_malloc|!

I just #undef'd all these macros, but maybe replace_malloc.h should do the #undef'ing instead?  Not sure.
Comment 108 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-27 20:40:08 PST
configure.in says this:

[  --enable-replace-malloc   Enable ability to dynamically replace the malloc implementation (requires --enable-jemalloc)],

But I just tried --enable-replace-malloc builds of DMD with and without --enable-jemalloc and they both worked and the final binaries appeared to be equivalent (i.e. they were the same size and behaved the same in my testing).
Comment 109 Mike Hommey [:glandium] 2012-11-27 23:32:20 PST
(In reply to Nicholas Nethercote [:njn] from comment #108)
> configure.in says this:
> 
> [  --enable-replace-malloc   Enable ability to dynamically replace the
> malloc implementation (requires --enable-jemalloc)],
> 
> But I just tried --enable-replace-malloc builds of DMD with and without
> --enable-jemalloc and they both worked and the final binaries appeared to be
> equivalent (i.e. they were the same size and behaved the same in my testing).

Don't forget --enable-jemalloc is the default on non-debug builds (except mac 32-bits)

(In reply to Nicholas Nethercote [:njn] from comment #107)
> I just got the new patches working with NewDMD (and I'm exporting replace_*
> from libxul.so so I don't need jlebar's tolibxul stuff any more).  The
> presence of replace_init and malloc_table_t made things nicer, so thanks for
> that.
> 
> The only stumbling block is that mozalloc_macro_wrappers.h has various
> macros like this:
> 
>   #define malloc(_) moz_malloc(_)
> 
> and DMD.cpp #includes them (via nscore.h) which means that e.g.
> |gMallocTable->malloc| was being rewritten as |gMallocTable->moz_malloc|!
> 
> I just #undef'd all these macros, but maybe replace_malloc.h should do the
> #undef'ing instead?  Not sure.

#define MOZ_NO_MOZALLOC maybe?
Comment 110 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-28 00:38:42 PST
> Don't forget --enable-jemalloc is the default on non-debug builds (except
> mac 32-bits)

Oh.  I knew that was the default, but I still read that line as saying that you had to explicit give --enable-jemalloc.  Maybe saying "requires jemalloc" would be clearer?  Or maybe if you say --enable-replace-malloc it forces on jemalloc?
Comment 111 Mike Hommey [:glandium] 2012-11-28 00:40:44 PST
(In reply to Nicholas Nethercote [:njn] from comment #110)
> > Don't forget --enable-jemalloc is the default on non-debug builds (except
> > mac 32-bits)
> 
> Oh.  I knew that was the default, but I still read that line as saying that
> you had to explicit give --enable-jemalloc.  Maybe saying "requires
> jemalloc" would be clearer?  Or maybe if you say --enable-replace-malloc it
> forces on jemalloc?

Or saying nothing, and let people hit the AC_MSG_ERROR if MOZ_MEMORY is not set.
Comment 112 Brad Lassey [:blassey] (use needinfo?) 2012-11-28 16:43:51 PST
(In reply to Mike Hommey [:glandium] from comment #102)
> (In reply to Brad Lassey [:blassey] from comment #101)
> > I'd rather pass in an Activity (or the Intent itself) than a context if
> > we're just going to cast it below.
> 
> That just moves the cast to the caller. Is that really better?

No, GeckoApp is an Activity, no cast is needed.
Comment 113 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-28 20:17:24 PST
Another nit from experience:  I want to call malloc (indirectly) from replace_init().  (Reason being, it's a very convenient place to do initialization.)  But I get an infinite loop.  If I change the setting of |replace_malloc_initialized| like so:

  init()
  {
  #ifdef MOZ_NO_REPLACE_FUNC_DECL
    replace_malloc_init_funcs();
  #endif
    replace_malloc_initialized = 1;
    if (replace_init)
      replace_init(&malloc_table);
  }

then it works.
Comment 114 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-28 20:24:59 PST
... but then I get these warnings:

WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/njn/moz/mi4/xpcom/base/nsTraceRefcntImpl.cpp, line 141

So it looks like replace_init is called from a static constructor, and you're probably going to tell me that doing anything other than copy the malloc_table_t in replace_init is a bad idea :(
Comment 115 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-29 11:23:52 PST
Comment on attachment 685632 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

Review of attachment 685632 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/replace_malloc.h
@@ +95,5 @@
> +    *ptr = NULL;
> +    return 0;
> +  }
> +  /* alignment must be a power of two and a multiple of sizeof(void *) */
> +  if (((alignment - 1) & alignment) != 0 || || (alignment % sizeof(void *)))

My compiler doesn't like two || operators in a row :P
Comment 116 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-11-29 16:18:17 PST
Actually, comment 113 stands -- I've now got DMD initialization working from replace_init(), and it works well.  (The problem in comment 114 can be worked around.)  In particular, I'm guaranteed to see every malloc/free, whereas before I would miss some of the early ones.
Comment 117 Mike Hommey [:glandium] 2012-12-03 03:05:13 PST
Created attachment 687700 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Same patch, but with context fixes due to recent changes on m-c.
Comment 118 Mike Hommey [:glandium] 2012-12-03 03:06:32 PST
Created attachment 687701 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

With a few more notes, and taking njn's comments into account.
Comment 119 Justin Lebar (not reading bugmail) 2012-12-04 14:50:17 PST
Comment on attachment 687700 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Wow.  That was...something.  I'm not convinced I followed much of that at
all, but I have comments on the comments!

Sorry these review comments aren't more useful.

>diff --git a/memory/Makefile.in b/memory/Makefile.in
>new file mode 100644
>--- /dev/null
>+++ b/memory/Makefile.in
>@@ -0,0 +1,22 @@
>+# This Source Code Form is subject to the terms of the Mozilla Public
>+# License, v. 2.0. If a copy of the MPL was not distributed with this
>+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
>+
>+DEPTH           = @DEPTH@
>+topsrcdir       = @top_srcdir@
>+srcdir          = @srcdir@
>+VPATH           = @srcdir@
>+
>+include $(DEPTH)/config/autoconf.mk
>+
>+DIRS += mozjemalloc

We build mozjemalloc even if MOZ_JEMALLOC is enabled?

I wonder if MOZ_JEMALLOC should be called MOZ_NEW_JEMALLOC (or MOZ_JEMALLOC3 or
something) until we remove mozjemalloc.  It's certainly surprising that
MOZ_JEMALLOC defined does not mean we're building mozjemalloc!

>+#ifndef MOZ_JEMALLOC
>+#  error Should not compile this file when not building with jemalloc 3

Maybe eliminate the double-negative?  I misread this the first few times.

>diff --git a/memory/build/mozmemory_wrap.h b/memory/build/mozmemory_wrap.h

Some nits here for you to fix or ignore, at your leisure.

>+ * This header helps properly defining or using malloc implementation
>+ * functions with the names they are meant to have in various cases.

The gerunds are no good here ("This helps properly defining," is not a
sentence).

I'm not sure what you mean here exactly, but maybe something like the following
would suffice?

  This header contains #defines which tweak the names of various heap allocator
  functions.

>+ * There are several types of functions related to memory allocation
>+ * that are meant to be used publicly by the Gecko codebase:
>+ *
>+ * - malloc implementation functions:
>+ *   - malloc
>+ *   - posix_memalign
>+ *   - aligned_alloc
>+ *   - calloc
>+ *   - realloc
>+ *   - free
>+ *   - memalign
>+ *   - valloc
>+ *   - malloc_usable_size
>+ *   - malloc_good_size
>+ *   Some of these functions are specific to some systems, but for
>+ *   convenience, some are treated as being cross-platform, and available
>+ *   as such.

Do you mean s/, some/, they/?

>+ * - On Windows, the malloc implementation functions are all prefixed with
>+ *   "je_", the duplication functions are prefixed with "wrap_", and jemalloc
>+ *   specific functions are left unprefixed. All these functions are however
>+ *   aliased when exporting them, such that the resulting mozglue.dll exports
>+ *   them unprefixed (see $(topsrcdir)/mozglue/build/mozglue.def.in). The
>+ *   prefixed malloc implementation and duplication functions are however not
>+ *   exported.

Nit: I'd get rid of the second "however" to vary the sentence structure.

>+ * - On MacOSX, the system libc has a zone allocator, which allows to

allows /us/ to

>+ *   hook custom malloc implementation functions without exporting them.
>+ *   The malloc implementation functions are all prefixed with "je_" and used
>+ *   this way from the custom zone allocator. They are not exported.
>+ *   Duplication functions are not included, since they will call-back the

s/call-back/call

>+ *   custom zone allocator anyways. Jemalloc specific functions are also left
>+ *   unprefixed.

Preservation of hyphens: "Jemalloc-specific".

>+ * - On Android, both malloc implementation and duplication functions are
>+ *   prefixed with "__wrap_". Additionally, C++ allocation functions
>+ *   (operator new/delete) are also exported and prefixed with "__wrap_".
>+ *   Jemalloc specific functions are left unprefixed.
>+ *
>+ * - On Gonk, all functions are left unprefixed. Additionally, C++ allocation
>+ *   functions (operator new/delete) are also exported and unprefixed.
>+ *
>+ * - On other systems (mostly Linux), all functions are left unprefixed.
>+ *
>+ * Only Android and Gonk add C++ allocation functions.
>+ *
>+ * Proper exporting of the various functions is done with the MOZ_MEMORY_API
>+ * and MOZ_JEMALLOC_API macros. MOZ_MEMORY_API is meant to be used for malloc
>+ * implementation and duplication functions, while MOZ_JEMALLOC_API is
>+ * dedicated to jemalloc specific functions.

In a bunch of places in this patch you use MFBT_API directly.  If that's
intentional, perhaps this comment could be updated to explain where we should
use which define?

>+ * All these functions are meant to be called with no prefix from Gecko code.
>+ * In most cases, this is because that's how they are available at runtime.
>+ * However, on Android, "__wrap_" prefixing is left to the build-time linker
>+ * (with -Wl,--wrap), or to the mozmemory.h header for malloc_good_size and
>+ * jemalloc specific functions..

s/.././

>+ * Within libmozglue (when MOZ_MEMORY_IMPL is defined), all the functions
>+ * should be suffixed with "_impl" both for declarations and use.
>+ * That is, the implementation declaration for e.g. strdup would look like:
>+ *   char* strdup_impl(const char *)
>+ * That implementation would call malloc by using "malloc_impl".
>+ *
>+ * While mozjemalloc uses these "_impl" suffixed helpers, jemalloc3, as
>+ * being third-party code, doesn't, but instead has an elaborate way to
>+ * mangle individual functions. See under "Run jemalloc configure script"
>+ * in $(topsrcdir)/configure.in.

s/as being/being

>+#ifndef MOZ_MEMORY
>+#  error Should not include mozmemory_wrap.h when MOZ_MEMORY is not set.

Rephrase double-negative?
Comment 120 Mike Hommey [:glandium] 2012-12-04 14:58:29 PST
(In reply to Justin Lebar [:jlebar] from comment #119)
> >+DIRS += mozjemalloc
> 
> We build mozjemalloc even if MOZ_JEMALLOC is enabled?

We do, kind of. memory/mozjemalloc/Makefile essentially is there for EXPORTS = jemalloc_types.h when MOZ_JEMALLOC is enabled (the header is used for the mozjemalloc compatibility jemalloc_stats function).

> I wonder if MOZ_JEMALLOC should be called MOZ_NEW_JEMALLOC (or MOZ_JEMALLOC3
> or
> something) until we remove mozjemalloc.  It's certainly surprising that
> MOZ_JEMALLOC defined does not mean we're building mozjemalloc!

Agreed, and since this is a cleanup patch, I think it makes sense to do so here.

> >+ *   Some of these functions are specific to some systems, but for
> >+ *   convenience, some are treated as being cross-platform, and available
> >+ *   as such.
> 
> Do you mean s/, some/, they/?

I did mean some, but maybe this is confusing.
Comment 121 Nicholas Nethercote [:njn] (on vacation until July 11) 2012-12-04 15:01:36 PST
Comment on attachment 687701 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

Review of attachment 687701 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/replace_malloc.c
@@ +133,5 @@
> +{
> +#ifdef MOZ_NO_REPLACE_FUNC_DECL
> +  replace_malloc_init_funcs();
> +#endif
> +  replace_malloc_initialized = 1;

In my local patch which reordered this assignment I have the following comment, which might be worth adding, since this is non-obvious:

  // Set this *before* calling replace_init, otherwise if replace_init calls
  // malloc() we'll get an infinite loop.
Comment 122 Justin Lebar (not reading bugmail) 2012-12-04 15:27:06 PST
Comment on attachment 687701 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

>+#include "replace_malloc.h"
>+
>+#define MALLOC_DECL(name, return_type, ...) \
>+    je_ ## name,
>+
>+static const malloc_table_t malloc_table = {
>+#include "malloc_decls.h"
>+};

The table's members are now "je_malloc" &c instead of "malloc"?  I kind of
liked it how it was; I don't see a reason to couple this interface specifically
to jemalloc.  But if we're going to use je_malloc here, we should update the
comment explaining how to implement replace_malloc.

>+ * On OSX, MOZ_MEMORY_API is defined to nothing, because malloc function

s/function/functions

>+ * are meant to be with hidden visibility. But since the functions are only

s/be with/have

>+ * used locally in the zone allocator further below, we can allow the
>+ * compiler to optimize more by switching to static.
>+ */
>+#ifdef XP_DARWIN
>+#undef MOZ_MEMORY_API
>+#define MOZ_MEMORY_API static
>+#endif
>+
>+/*
>+ * Malloc implementation functions are MOZ_MEMORY_API, and jemalloc
>+ * specific functions MOZ_JEMALLOC_API, see mozmemory_wrap.h

Nit: Comma splice.  s/,/;/.
Comment 123 Mike Hommey [:glandium] 2012-12-04 23:19:29 PST
(In reply to Justin Lebar [:jlebar] from comment #122)
> Comment on attachment 687701 [details] [diff] [review]
> part 2 - Add ability to dynamically replace or supplement jemalloc
> implementation.
> 
> >+#include "replace_malloc.h"
> >+
> >+#define MALLOC_DECL(name, return_type, ...) \
> >+    je_ ## name,
> >+
> >+static const malloc_table_t malloc_table = {
> >+#include "malloc_decls.h"
> >+};
> 
> The table's members are now "je_malloc" &c instead of "malloc"?  I kind of
> liked it how it was; I don't see a reason to couple this interface
> specifically
> to jemalloc.  But if we're going to use je_malloc here, we should update the
> comment explaining how to implement replace_malloc.

Note it is 
  static const malloc_table_t malloc_table = {
not
  typedef struct {
which is in replace_malloc.h.

What you're quoting is populating the function table that is going to be passed to replace_init() with the jemalloc functions. We can't use malloc, etc. there because that would resolve to the replace-malloc.c function themselves, which would force all replace-malloc implementations to have to handle allocation loops, and I don't want to do that. In fact, even DMD shouldn't have to deal with that.
Comment 124 Mike Hommey [:glandium] 2012-12-05 01:46:01 PST
Created attachment 688665 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Addressed Justin's comments. Sending to Kyle for the build system bits.
Comment 125 Mike Hommey [:glandium] 2012-12-05 01:46:35 PST
Created attachment 688666 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

Likewise
Comment 126 Mike Hommey [:glandium] 2012-12-05 01:47:32 PST
Created attachment 688667 [details] [diff] [review]
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc

Refreshed to fit changes to the previous patches.
Comment 129 Justin Lebar (not reading bugmail) 2012-12-11 10:20:42 PST
Now that this has baked for a while, how do you feel about landing it on aurora/beta?

a=me to do so if you feel that the risk is manageable. (There are basically no bits on by default here, right?  So the only risk to branches is red builds?)
Comment 130 Mike Hommey [:glandium] 2012-12-12 03:39:08 PST
Comment on attachment 685286 [details] [diff] [review]
part 4 - Set environment variables earlier on Android

[Approval Request Comment]
User impact if declined: The replace-malloc feature won't be available on Android even if enabled at build time. Arguably, the uplift is mainly meant for b2g, so this patch is not crucial to the feature.
Testing completed (on m-c, etc.): baked on m-i/m-c for a few days
Risk to taking this patch (and alternatives if risky): It doesn't affect default behaviour. It only affects passing environment variables from the command line on android, and only does so by making the environment variables set earlier.
String or UUID changes made by this patch: none
Comment 131 Mike Hommey [:glandium] 2012-12-12 03:43:26 PST
Comment on attachment 688665 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

[Approval Request Comment]
Testing completed (on m-c, etc.): baked for a few days on m-i/m-c.
Risk to taking this patch (and alternatives if risky): Resulting builds could end up with jemalloc disabled (should be noticed pretty fast), or red ; both of which, since they didn't happen on m-i/m-c (at least, not that i'm aware of), are a pretty low risk.
String or UUID changes made by this patch: none
Comment 132 Mike Hommey [:glandium] 2012-12-12 03:44:38 PST
Comment on attachment 688666 [details] [diff] [review]
part 2 - Add ability to dynamically replace or supplement jemalloc implementation.

[Approval Request Comment]
Testing completed (on m-c, etc.): baked on m-i/m-c for a few days
Risk to taking this patch (and alternatives if risky): very low: no change that is part of the default build.
String or UUID changes made by this patch: none
Comment 133 Mike Hommey [:glandium] 2012-12-12 03:46:19 PST
Comment on attachment 688667 [details] [diff] [review]
part 3 - Build jemalloc3 as a replace-malloc library when building with mozjemalloc

[Approval Request Comment]
Testing completed (on m-c, etc.): baked on m-i/m-c for a few days
Risk to taking this patch (and alternatives if risky): very low: no change that is part of the default build.
String or UUID changes made by this patch: none
Comment 134 Justin Lebar (not reading bugmail) 2012-12-12 07:39:31 PST
I didn't a+ part 4 because it's not necessary for our purposes in B2G, and that's where my a+ authority ends.
Comment 135 Justin Lebar (not reading bugmail) 2012-12-12 07:50:49 PST
Comment on attachment 688665 [details] [diff] [review]
part 1 - Cleanup how the mozjemalloc/jemalloc3 glue is set up, attempting to make it clearer.

Note: We're landing on the new b2g repository, not beta.  But I don't see an approval flag for the new repository.
Comment 137 Mike Hommey [:glandium] 2012-12-13 00:02:08 PST
(In reply to Ryan VanderMeulen from comment #136)
> This doesn't apply cleanly to mozilla-b2g18, so I was unable to land it
> there.

So, it's more involved than I thought. Here is what I had to transplant locally to avoid (most) conflicts:
- bug 788955
- bug 762445
- bug 805096
- bug 805416 (conflicted in mfbt/SHA1.h)
- bug 807112
Comment 138 Justin Lebar (not reading bugmail) 2012-12-13 07:50:13 PST
(As unpleasant as this is) none of those bugs scares me too much at first blush to say that we can't have this (and therefore that we can't have dmd and an invalid-free checking tool) on B2G.

Note You need to log in before you can comment on or make changes to this bug.