Last Comment Bug 422055 - Use jemalloc on OpenSolaris
: Use jemalloc on OpenSolaris
Status: RESOLVED FIXED
[RC2+]
:
Product: Core
Classification: Components
Component: Memory Allocator (show other bugs)
: Trunk
: All OpenSolaris
: -- normal (vote)
: mozilla1.9
Assigned To: Ginn Chen
:
: Mike Hommey [:glandium]
Mentors:
: 433608 (view as bug list)
Depends on:
Blocks: 446302 586962
  Show dependency treegraph
 
Reported: 2008-03-11 02:50 PDT by Ginn Chen
Modified: 2010-08-13 05:56 PDT (History)
7 users (show)
mtschrep: blocking1.9-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
draft patch (6.65 KB, patch)
2008-03-11 04:56 PDT, Ginn Chen
no flags Details | Diff | Splinter Review
test result (10.79 KB, text/html)
2008-03-12 03:49 PDT, Ginn Chen
no flags Details
patch (8.57 KB, patch)
2008-03-12 23:30 PDT, Ginn Chen
ted: review-
Details | Diff | Splinter Review
patch v3 (10.59 KB, patch)
2008-03-17 05:02 PDT, Ginn Chen
pavlov: review-
Details | Diff | Splinter Review
patch v4 (11.34 KB, patch)
2008-03-24 03:46 PDT, Ginn Chen
no flags Details | Diff | Splinter Review
patch v4a (11.83 KB, patch)
2008-03-31 17:26 PDT, Jason Evans
no flags Details | Diff | Splinter Review
patch v4b [Checkin: Comment 30] (11.93 KB, patch)
2008-04-01 14:14 PDT, Jason Evans
ted: review+
mbeltzner: approval1.9+
Details | Diff | Splinter Review
fix thunderbird and seamonkey bustage (1.85 KB, patch)
2008-05-05 01:16 PDT, Ginn Chen
no flags Details | Diff | Splinter Review
fix bustage on Solaris/SPARC (672 bytes, patch)
2008-05-05 01:31 PDT, Ginn Chen
no flags Details | Diff | Splinter Review
Remove alloca() call (1.97 KB, patch)
2008-05-05 10:56 PDT, Jason Evans
ginn.chen: review+
Details | Diff | Splinter Review
Remove alloca() call (and #define) [Checkin: Comment 39] (2.19 KB, patch)
2008-05-06 07:39 PDT, Jason Evans
mtschrep: approval1.9+
Details | Diff | Splinter Review
patch revised since bug 418016 reverts to libjemalloc.so (2.05 KB, patch)
2008-05-07 23:40 PDT, Ginn Chen
no flags Details | Diff | Splinter Review
fix linkage on Solaris (5.28 KB, patch)
2008-05-08 04:44 PDT, Ginn Chen
ted: review-
Details | Diff | Splinter Review
alternative patch for linkage [Checkin: Comment 59] (4.55 KB, patch)
2008-05-26 00:28 PDT, Ginn Chen
ted: review+
shaver: approval1.9+
Details | Diff | Splinter Review

Description Ginn Chen 2008-03-11 02:50:19 PDT
 
Comment 1 Ginn Chen 2008-03-11 04:56:13 PDT
Created attachment 308596 [details] [diff] [review]
draft patch

There's a little hack.
I need to put -ljemalloc above -lpthread, otherwise pthread leaks.
Because -lpthread is in LDFLAGS, and in rules.mk, LDFALGS is above LIBS, so I have to overwrite LDFLAGS.

Maybe we should introduce a new variable for rules.mk, or use WRAP_MALLOC_LIB and move it above LDFLAGS. I don't know if it works on other platform.

I use sysconf() instead of kstat, since kstat uses malloc, and we could not do malloc in malloc_init().
Comment 2 Ginn Chen 2008-03-12 03:49:57 PDT
Created attachment 308837 [details]
test result

I've applied the patch to Firefox 3 beta 4.
Here're test results: (all values are median of 3 tests)

It seems we save less memory on SPARC, I guess it relates to the pagesize. It's 4096 on x86, 8192 on SPARC.
We have a little trade off on performance, I hope we can get it back by using PGO.

*****
There's an issue I didn't solve with draft patch.
There's no posix_memalign() in Solaris libc.
So if the user uses LD_PRELOAD, or embeds Firefox in another app, e.g. yelp, we will leak.
Comment 3 Ginn Chen 2008-03-12 23:30:29 PDT
Created attachment 309044 [details] [diff] [review]
patch

Changes:
1. enable it by default in configure.in

2. On Solaris, libc and other malloc lib has memalign but not posix_memalign.
If we use posix_memalign in jemalloc, and other malloc lib or libc is loaded earlier, we will leak. To avoid this, use memalign instead and hide posix_memalign.
Comment 4 Ted Mielczarek [:ted.mielczarek] 2008-03-13 03:19:44 PDT
Comment on attachment 309044 [details] [diff] [review]
patch

+ifdef MOZ_MEMORY
+ifeq ($(OS_ARCH),SunOS)
+LDFLAGS = -L$(DIST)/lib -ljemalloc @LDFLAGS@
+else
+ifneq ($(OS_ARCH),WINNT)
+LIBS += -ljemalloc
+endif
+endif
+endif

I don't like this. First of all, you should use EXPAND_LIBNAME_PATH here, second of all, I don't like the @LDFLAGS@ expansion there. I realize you need to get the ordering correct, but I think there must be a better way to do it.

Also, you'll need review from other people on the js and jemalloc portions of this patch.
Comment 5 Ginn Chen 2008-03-13 03:39:43 PDT
Right, I don't like it, either.

So, should I add something like SOLARIS_JEMALLOC_LIBS to rules.mk?  
Comment 6 Ted Mielczarek [:ted.mielczarek] 2008-03-13 04:42:05 PDT
That would be better, I think.
Comment 7 Ginn Chen 2008-03-17 05:02:59 PDT
Created attachment 309932 [details] [diff] [review]
patch v3
Comment 8 Jason Evans 2008-03-17 07:27:23 PDT
The change from using kstat_*() to sysconf() to get the number of processors looks good.

The patch in 422960 will cause a conflict with the patch in this bug, but resolution should be obvious.

The patch in bug 418016 integrated jemalloc into libxul, which will impact the build-related aspects of the patch in this bug.  Also, earlier comments in this bug refer to "leaking" if multiple memory allocators are used, but I would expect undefined behavior (including likely crashes) if we mix allocator use.  

I'm not very happy about the prospect of burdening all users of posix_memalign() in mozilla with having to conditionally use memalign() instead.  Is this really necessary for standard use cases like yelp?  If it is strictly to make LD_PRELOAD of alternate malloc libraries work, I'm not convinced that it is worth the ongoing maintenance burden.
Comment 9 timeless 2008-03-17 08:06:19 PDT
on windows we do have multiple allocators and it has to work (and it generally  does). on unix multiple allocators should be made to work (the only cases i know of today don't work because they're buggy, and those are IMEs).
Comment 10 Jason Evans 2008-03-17 08:45:36 PDT
My comments are specific to run-time selection of memory allocator, whereas I think you (timeless) are referring to compile-time selection.  I agree that nothing should be done to prevent compile-time selection, but I am questioning the value of requiring developers to write code like the js/src/jsgc.c portion of the patch in this bug everywhere that aligned memory is allocated.  It may be that a simpler solution is to just use memalign() in all cases.
Comment 11 Stuart Parmenter 2008-03-17 10:17:35 PDT
Comment on attachment 309932 [details] [diff] [review]
patch v3

I agree with Jason about the posix_memalign change.  That should be fixed differently. (i.e. not using POSIX_MEMALIGN) (there is code there to do other things already (mmap, oversized allcs, etc).

I don't understand this part:
+#elif defined(MOZ_MEMORY_SOLARIS)
+inline int
+posix_memalign(void **memptr, size_t alignment, size_t size)

Shouldn't be needed.
Comment 12 Ginn Chen 2008-03-17 22:59:19 PDT
Stuart,
I'm trying to hide posix_memalign, just like using __hidden keyword, therefore any problem that links to libjemalloc (or libxul) won't use posix_memalign and libc free together.

Jason,
If js/src/jsgc.c uses memalign() in all cases, should I add inline before posix_memalign in jemalloc.c ?

Comment 13 Stuart Parmenter 2008-03-17 23:06:16 PDT
Ginn: I don't know how the dynamic loader on Solaris works, but on Linux linking will cause everything to use jemalloc's free, malloc, posix_memalign.  That is how things should work.  If we need to hook it up on solaris differently we should do that but we shouldn't hide the symbol.
Comment 14 Stuart Parmenter 2008-03-17 23:09:16 PDT
We want jsgc.c to use posix_memalign, not memalign.  It should be using the posix_memalign from jemalloc on Solaris.
Comment 15 Ginn Chen 2008-03-18 00:07:20 PDT
If jsgc.c uses posix_memalign, that would be a problem on Solaris, if any malloc library or evan libc library is loaded earlier.

It would work for Firefox because we put -lxul before any other libraries.
But it wouldn't work for yelp or LD_PRELOAD case.

Here's a blog on this.
http://blogs.sun.com/quenelle/entry/malloc_interposition_can_t_possibly
Comment 16 Stuart Parmenter 2008-03-18 02:15:24 PDT
That basically says "don't miss allocation APIs" as best I can tell, so I'm not sure why having posix_memalign is a problem.
Comment 17 Ginn Chen 2008-03-18 02:48:46 PDT
Comparing to jemalloc, libc misses posix_memalign. That's the problem.

e.g. thunderbird will work as "undefined behavior (including likely crashes)" since we didn't put -lxul ahead when linking thunderbird-bin.
Any program that uses libmozjs.so is same.


Comment 18 Stuart Parmenter 2008-03-18 10:11:38 PDT
I don't understand.  If you're saying people using posix_memalign _will_ use jemalloc and people using malloc won't, then I don't understand this patch at all.  
Comment 19 Jason Evans 2008-03-18 10:45:20 PDT
My understanding of Ginn Chen's comments is that if we use posix_memalign() from jemalloc on Solaris, then using LD_PRELOAD to override jemalloc with some other allocator will fail, because jemalloc's posix_memalign() will continue to call into jemalloc, but (for example) free() will call into the replacement allocator.

It just occurred to me that we could solve this problem on Solaris by reversing the way jemalloc implements memalign() and posix_memalign(), so that posix_memalign() calls memalign() rather than the other way around.  That way we can always use posix_memalign(), and if LD_PRELOAD is used to override jemalloc, the posix_memalign() code will call the replacement memalign().

As far as I know, there are no platforms for which this reversal would cause problems, so I expect we can do it unconditionally.
Comment 20 Ginn Chen 2008-03-19 02:37:19 PDT
Jason,
I've tested, reversing the way jemalloc implements memalign() and posix_memalign() would make LD_PRELOAD work.

But we need to make sure jemalloc's memalign() is not inlined into posix_memalign().
We can use  __attribute__ ((noinline)) for GNUC and #pragma noinline for Sun Studio.

I hope it will not be a performance overhead.

Another thing is:
If libjemalloc is out of libxul (using -disable-libxul option), we need to add -ljemalloc when linking libmozjs.so or linking any program that uses libmozjs.so.
Comment 21 Ginn Chen 2008-03-19 02:59:29 PDT
(In reply to comment #20)
> Another thing is:
> If libjemalloc is out of libxul (using -disable-libxul option), we need to add
> -ljemalloc when linking libmozjs.so or linking any program that uses
> libmozjs.so.
> 
Ignore this part, jemalloc will always be part of libxul.

No matter --enable-libxul or --disable-libxul,
we need to add -lxul for linking programs that uses libmozjs.so.
e.g. xpcshell.
Comment 22 Ginn Chen 2008-03-24 03:46:55 PDT
Created attachment 311348 [details] [diff] [review]
patch v4
Comment 23 Ginn Chen 2008-03-31 04:42:19 PDT
Comment on attachment 311348 [details] [diff] [review]
patch v4

Jason, can you give some comments?
Comment 24 Jason Evans 2008-03-31 17:26:03 PDT
Created attachment 312824 [details] [diff] [review]
patch v4a

Attached is a patch derived from patch v4.

Patch v4 does not compile on Linux, due to a lacking prototype for memalign().  The fix is to reorder memalign() and posix_memalign().

There are some minor issues with error condition handling.  memalign() is not required to do validation of the alignment argument, so I moved that code back into posix_memalign() and added an assertion in memalign().  I also adjusted the MALLOC_XMALLOC code, though that is a pedantic change since the code is disabled anyway.

I don't think we need or want to avoid inlining memalign() on Linux, so I adjusted that code to apply only to Solaris.

Ginn, can you please look over the revised patch and make sure it works correctly on Solaris?
Comment 25 Ginn Chen 2008-03-31 23:56:12 PDT
It works great on Solaris. Thank you!

In case someone build it with gcc on Solaris,
I think it would be better if we write it like,

#elif defined(MOZ_MEMORY_SOLARIS)
#if defined(__SUNPRO_C)
void *
memalign(size_t alignment, size_t size);
#pragma no_inline(memalign)
#elif defined(__GNU_C__)
_attribute__((noinline))
#endif
VISIBLE
void *
memalign(size_t alignment, size_t size)
#else
Comment 26 Jason Evans 2008-04-01 14:14:13 PDT
Created attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

Ginn, I think patch v4b addresses your feedback.  Please let me know if I misunderstood your intent.
Comment 27 Ginn Chen 2008-04-01 18:18:58 PDT
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

Yes, it does.
Thanks.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2008-04-28 08:01:49 PDT
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

r=me on the build bits.
Comment 29 Mike Beltzner [:beltzner, not reading bugmail] 2008-04-29 06:45:04 PDT
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

a1.9=beltzner
Comment 30 Ginn Chen 2008-04-30 00:15:16 PDT
Checking in configure.in;
/cvsroot/mozilla/configure.in,v  <--  configure.in
new revision: 1.1989; previous revision: 1.1988
done
Checking in browser/app/Makefile.in;
/cvsroot/mozilla/browser/app/Makefile.in,v  <--  Makefile.in
new revision: 1.154; previous revision: 1.153
done
Checking in config/rules.mk;
/cvsroot/mozilla/config/rules.mk,v  <--  rules.mk
new revision: 3.594; previous revision: 3.593
done
Checking in memory/jemalloc/Makefile.in;
/cvsroot/mozilla/memory/jemalloc/Makefile.in,v  <--  Makefile.in
new revision: 1.8; previous revision: 1.7
done
Checking in memory/jemalloc/jemalloc.c;
/cvsroot/mozilla/memory/jemalloc/jemalloc.c,v  <--  jemalloc.c
new revision: 1.12; previous revision: 1.11
done
Comment 31 Ginn Chen 2008-05-05 00:29:11 PDT
There're 2 issues.
1) When compiling thunderbird, it doesn't have MOZ_ENABLE_LIBXUL, therefore XPCOM_LIBS doesn't have LIBXUL_LIBS.
The result is xpcshell failed to be linked, because posix_memalign is missing.

2) On Solaris/SPARC, we don't have alloca(). I didn't notice bug 420678 used it.
Comment 32 Ginn Chen 2008-05-05 01:16:39 PDT
Created attachment 319343 [details] [diff] [review]
fix thunderbird and seamonkey bustage

Move the definition of SOLARIS_JEMALLOC_LDFLAGS to rules.mk, so that it will always apply to xpcshell and *-bin.
Comment 33 Ginn Chen 2008-05-05 01:31:46 PDT
Created attachment 319344 [details] [diff] [review]
fix bustage on Solaris/SPARC

In alloca.h, we have
#define alloca(x)   __builtin_alloca(x)
Comment 34 Jason Evans 2008-05-05 09:22:17 PDT
Comment on attachment 319344 [details] [diff] [review]
fix bustage on Solaris/SPARC

This alloca() trouble is a pain that I don't want to have to deal with on an ongoing basis.  I'm going to modify the code to not use alloca() at all.
Comment 35 Jason Evans 2008-05-05 10:56:43 PDT
Created attachment 319414 [details] [diff] [review]
Remove alloca() call

This patch removes the alloca() call, thus avoiding the need for portability fixes.  Some platforms have alloca.h, and others define alloca() in stdlib.h.  I think we would have needed to add an autoconf test to solve this problem.
Comment 36 Ginn Chen 2008-05-05 22:52:13 PDT
Comment on attachment 319414 [details] [diff] [review]
Remove alloca() call

removal of alloca is great, since it's machine-, compiler- and system-dependent.

Nit: I think we can also remove
#define alloca _alloca
added by bug 420678.
Comment 37 Jason Evans 2008-05-06 07:39:16 PDT
Created attachment 319569 [details] [diff] [review]
Remove alloca() call (and #define)
[Checkin: Comment 39]

Per comment #36, remove the unneeded alloca #define.  Thanks for catching that, Ginn.
Comment 38 Mike Schroepfer 2008-05-07 12:08:13 PDT
Comment on attachment 319569 [details] [diff] [review]
Remove alloca() call (and #define)
[Checkin: Comment 39]

a+ based on Pav's risk assessment and moz2 meeting today
Comment 39 Ginn Chen 2008-05-07 22:37:13 PDT
Checking in jemalloc.c;
/cvsroot/mozilla/memory/jemalloc/jemalloc.c,v  <--  jemalloc.c
new revision: 1.14; previous revision: 1.13
done
Comment 40 Ginn Chen 2008-05-07 23:40:18 PDT
Created attachment 319931 [details] [diff] [review]
patch revised since bug 418016 reverts to libjemalloc.so
Comment 41 Ginn Chen 2008-05-08 03:41:30 PDT
Comment on attachment 319931 [details] [diff] [review]
patch revised since bug 418016 reverts to libjemalloc.so

Sadly it's not working. Because if we compile xpidl with -ljemalloc, xpidl could not be run at compiling time, since dist/libjemalloc.so is a symbolic link.
Comment 42 Ginn Chen 2008-05-08 04:44:30 PDT
Created attachment 319960 [details] [diff] [review]
fix linkage on Solaris

This is the best fix I can imagine right now.

Add dependence of jemalloc to mozjs, since it uses posix_memalign.
Apply SOLARIS_JEMALLOC_LDFLAGS to every binary that we want to use jemalloc.
Comment 43 Jason Evans 2008-05-08 08:07:01 PDT
I discovered a syntax error that slipped into revision 1.12 of jemalloc.c as part of the commit on 30 April 2008:

#  elif (defined(__GNU_C__)

should be:

#  elif (defined(__GNU_C__))
                           ^

I'm a little surprised this hasn't caused any trouble, but if there are no associated build errors, then we can wait to fix it until the patch in bug #422960 is committed.
Comment 44 Alfred Peng 2008-05-09 10:55:17 PDT
This issue still breaks Firefox 3.0 on Solaris. ask for blocker 1.9.
Comment 45 Jason Evans 2008-05-09 11:19:18 PDT
(In reply to comment #44)
> This issue still breaks Firefox 3.0 on Solaris. ask for blocker 1.9.

What is the current problem you are referring to?  If comment #41 describes the issue at hand, perhaps there is a simple solution, such as passing an appropriate RPATH-related flag during linking.
Comment 46 Mike Schroepfer 2008-05-09 12:56:24 PDT
Alfred does comment 45 work for you?
Comment 47 Alfred Peng 2008-05-09 21:43:47 PDT
Jason, the Solaris tinerboxes(http://tinderbox.mozilla.org/Firefox-Ports/) are still broken as comment #41 mentioned. Ginn, any comments for Jason's suggestion?
Comment 48 Ginn Chen 2008-05-11 22:55:38 PDT
Jason, I think RPATH doesn't work for symbolic links to another directories.

Comment 49 Evan Yan 2008-05-13 20:02:49 PDT
*** Bug 433608 has been marked as a duplicate of this bug. ***
Comment 50 Ted Mielczarek [:ted.mielczarek] 2008-05-23 08:46:02 PDT
Comment on attachment 319960 [details] [diff] [review]
fix linkage on Solaris

Could we do this centrally, in config.mk or rules.mk? It sucks to have that block duplicated in every binary's Makefile.
Comment 51 Ginn Chen 2008-05-25 22:39:54 PDT
Ted, I tried, but then we need an exception for xpidl, as I mentioned in comment #41.

Also I think it's necessary to add -ljemalloc for mozjs any way, because it uses posix_memalign().
Comment 52 Ginn Chen 2008-05-26 00:28:21 PDT
Created attachment 322490 [details] [diff] [review]
alternative patch for linkage
[Checkin: Comment 59]

Use a black list rather than a white list.
Is it look better?
Comment 53 Ted Mielczarek [:ted.mielczarek] 2008-05-27 07:10:07 PDT
Comment on attachment 322490 [details] [diff] [review]
alternative patch for linkage
[Checkin: Comment 59]

Yeah, I like that better.
Comment 54 Ted Mielczarek [:ted.mielczarek] 2008-05-27 07:11:01 PDT
Comment on attachment 319960 [details] [diff] [review]
fix linkage on Solaris

Use the other patch.
Comment 55 Mike Schroepfer 2008-05-27 15:09:54 PDT
Ted we should take this on trunk
Comment 56 Ted Mielczarek [:ted.mielczarek] 2008-05-27 15:13:04 PDT
Was that a question or directive?
Comment 57 Ginn Chen 2008-05-27 20:58:31 PDT
It doesn't affect any platform other than Solaris.
It would be better to commit it on 1.9.0 branch.

Otherwise some people who use source tar ball may have trouble to build it on Solaris.

Comment 58 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-05-28 11:28:19 PDT
Comment on attachment 322490 [details] [diff] [review]
alternative patch for linkage
[Checkin: Comment 59]

Approved, please land by noon PDT to make Firefox 3.0.
Comment 59 :Gavin Sharp [email: gavin@gavinsharp.com] 2008-05-28 11:44:02 PDT
mozilla/browser/app/Makefile.in 	1.156
mozilla/config/rules.mk 	3.595
mozilla/js/src/Makefile.in 	3.125
mozilla/xpcom/typelib/xpidl/Makefile.in 	1.45
mozilla/xpcom/typelib/xpt/tools/Makefile.in 	1.33
Comment 60 timeless 2008-07-20 17:33:39 PDT
re comment 43, bug 446302 reports that this is a real issue. i don't understand why this was left unfixed....
Comment 61 Jason Evans 2008-07-20 21:25:57 PDT
(In reply to comment #60)
> re comment 43, bug 446302 reports that this is a real issue. i don't understand
> why this was left unfixed....

It was fixed in changeset 663c51189e98 on 20 June 2008, as per comment #5 in bug #442960.
Comment 62 timeless 2008-07-21 14:23:16 PDT
it was not fixed in CVS. you have a responsibility to all branches you break, not just central.

Note You need to log in before you can comment on or make changes to this bug.