Closed Bug 422055 Opened 12 years ago Closed 12 years ago

Use jemalloc on OpenSolaris

Categories

(Core :: Memory Allocator, defect)

All
OpenSolaris
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: ginnchen+exoracle, Assigned: ginnchen+exoracle)

References

Details

(Whiteboard: [RC2+])

Attachments

(4 files, 10 obsolete files)

10.79 KB, text/html
Details
11.93 KB, patch
ted
: review+
beltzner
: approval1.9+
Details | Diff | Splinter Review
2.19 KB, patch
mtschrep
: approval1.9+
Details | Diff | Splinter Review
4.55 KB, patch
ted
: review+
shaver
: approval1.9+
Details | Diff | Splinter Review
 
Attached patch draft patch (obsolete) — Splinter Review
There's a little hack.
I need to put -ljemalloc above -lpthread, otherwise pthread leaks.
Because -lpthread is in LDFLAGS, and in rules.mk, LDFALGS is above LIBS, so I have to overwrite LDFLAGS.

Maybe we should introduce a new variable for rules.mk, or use WRAP_MALLOC_LIB and move it above LDFLAGS. I don't know if it works on other platform.

I use sysconf() instead of kstat, since kstat uses malloc, and we could not do malloc in malloc_init().
Attached file test result
I've applied the patch to Firefox 3 beta 4.
Here're test results: (all values are median of 3 tests)

It seems we save less memory on SPARC, I guess it relates to the pagesize. It's 4096 on x86, 8192 on SPARC.
We have a little trade off on performance, I hope we can get it back by using PGO.

*****
There's an issue I didn't solve with draft patch.
There's no posix_memalign() in Solaris libc.
So if the user uses LD_PRELOAD, or embeds Firefox in another app, e.g. yelp, we will leak.
Attached patch patch (obsolete) — Splinter Review
Changes:
1. enable it by default in configure.in

2. On Solaris, libc and other malloc lib has memalign but not posix_memalign.
If we use posix_memalign in jemalloc, and other malloc lib or libc is loaded earlier, we will leak. To avoid this, use memalign instead and hide posix_memalign.
Attachment #308596 - Attachment is obsolete: true
Attachment #309044 - Flags: review?(ted.mielczarek)
Comment on attachment 309044 [details] [diff] [review]
patch

+ifdef MOZ_MEMORY
+ifeq ($(OS_ARCH),SunOS)
+LDFLAGS = -L$(DIST)/lib -ljemalloc @LDFLAGS@
+else
+ifneq ($(OS_ARCH),WINNT)
+LIBS += -ljemalloc
+endif
+endif
+endif

I don't like this. First of all, you should use EXPAND_LIBNAME_PATH here, second of all, I don't like the @LDFLAGS@ expansion there. I realize you need to get the ordering correct, but I think there must be a better way to do it.

Also, you'll need review from other people on the js and jemalloc portions of this patch.
Attachment #309044 - Flags: review?(ted.mielczarek) → review-
Right, I don't like it, either.

So, should I add something like SOLARIS_JEMALLOC_LIBS to rules.mk?  
That would be better, I think.
Attached patch patch v3 (obsolete) — Splinter Review
Attachment #309044 - Attachment is obsolete: true
Attachment #309932 - Flags: review?(ted.mielczarek)
Attachment #309932 - Flags: review?(pavlov)
The change from using kstat_*() to sysconf() to get the number of processors looks good.

The patch in 422960 will cause a conflict with the patch in this bug, but resolution should be obvious.

The patch in bug 418016 integrated jemalloc into libxul, which will impact the build-related aspects of the patch in this bug.  Also, earlier comments in this bug refer to "leaking" if multiple memory allocators are used, but I would expect undefined behavior (including likely crashes) if we mix allocator use.  

I'm not very happy about the prospect of burdening all users of posix_memalign() in mozilla with having to conditionally use memalign() instead.  Is this really necessary for standard use cases like yelp?  If it is strictly to make LD_PRELOAD of alternate malloc libraries work, I'm not convinced that it is worth the ongoing maintenance burden.
on windows we do have multiple allocators and it has to work (and it generally  does). on unix multiple allocators should be made to work (the only cases i know of today don't work because they're buggy, and those are IMEs).
My comments are specific to run-time selection of memory allocator, whereas I think you (timeless) are referring to compile-time selection.  I agree that nothing should be done to prevent compile-time selection, but I am questioning the value of requiring developers to write code like the js/src/jsgc.c portion of the patch in this bug everywhere that aligned memory is allocated.  It may be that a simpler solution is to just use memalign() in all cases.
Comment on attachment 309932 [details] [diff] [review]
patch v3

I agree with Jason about the posix_memalign change.  That should be fixed differently. (i.e. not using POSIX_MEMALIGN) (there is code there to do other things already (mmap, oversized allcs, etc).

I don't understand this part:
+#elif defined(MOZ_MEMORY_SOLARIS)
+inline int
+posix_memalign(void **memptr, size_t alignment, size_t size)

Shouldn't be needed.
Attachment #309932 - Flags: review?(pavlov) → review-
Stuart,
I'm trying to hide posix_memalign, just like using __hidden keyword, therefore any problem that links to libjemalloc (or libxul) won't use posix_memalign and libc free together.

Jason,
If js/src/jsgc.c uses memalign() in all cases, should I add inline before posix_memalign in jemalloc.c ?

Ginn: I don't know how the dynamic loader on Solaris works, but on Linux linking will cause everything to use jemalloc's free, malloc, posix_memalign.  That is how things should work.  If we need to hook it up on solaris differently we should do that but we shouldn't hide the symbol.
We want jsgc.c to use posix_memalign, not memalign.  It should be using the posix_memalign from jemalloc on Solaris.
If jsgc.c uses posix_memalign, that would be a problem on Solaris, if any malloc library or evan libc library is loaded earlier.

It would work for Firefox because we put -lxul before any other libraries.
But it wouldn't work for yelp or LD_PRELOAD case.

Here's a blog on this.
http://blogs.sun.com/quenelle/entry/malloc_interposition_can_t_possibly
That basically says "don't miss allocation APIs" as best I can tell, so I'm not sure why having posix_memalign is a problem.
Comparing to jemalloc, libc misses posix_memalign. That's the problem.

e.g. thunderbird will work as "undefined behavior (including likely crashes)" since we didn't put -lxul ahead when linking thunderbird-bin.
Any program that uses libmozjs.so is same.


I don't understand.  If you're saying people using posix_memalign _will_ use jemalloc and people using malloc won't, then I don't understand this patch at all.  
My understanding of Ginn Chen's comments is that if we use posix_memalign() from jemalloc on Solaris, then using LD_PRELOAD to override jemalloc with some other allocator will fail, because jemalloc's posix_memalign() will continue to call into jemalloc, but (for example) free() will call into the replacement allocator.

It just occurred to me that we could solve this problem on Solaris by reversing the way jemalloc implements memalign() and posix_memalign(), so that posix_memalign() calls memalign() rather than the other way around.  That way we can always use posix_memalign(), and if LD_PRELOAD is used to override jemalloc, the posix_memalign() code will call the replacement memalign().

As far as I know, there are no platforms for which this reversal would cause problems, so I expect we can do it unconditionally.
Jason,
I've tested, reversing the way jemalloc implements memalign() and posix_memalign() would make LD_PRELOAD work.

But we need to make sure jemalloc's memalign() is not inlined into posix_memalign().
We can use  __attribute__ ((noinline)) for GNUC and #pragma noinline for Sun Studio.

I hope it will not be a performance overhead.

Another thing is:
If libjemalloc is out of libxul (using -disable-libxul option), we need to add -ljemalloc when linking libmozjs.so or linking any program that uses libmozjs.so.
(In reply to comment #20)
> Another thing is:
> If libjemalloc is out of libxul (using -disable-libxul option), we need to add
> -ljemalloc when linking libmozjs.so or linking any program that uses
> libmozjs.so.
> 
Ignore this part, jemalloc will always be part of libxul.

No matter --enable-libxul or --disable-libxul,
we need to add -lxul for linking programs that uses libmozjs.so.
e.g. xpcshell.
Attached patch patch v4 (obsolete) — Splinter Review
Attachment #309932 - Attachment is obsolete: true
Attachment #309932 - Flags: review?(ted.mielczarek)
Comment on attachment 311348 [details] [diff] [review]
patch v4

Jason, can you give some comments?
Attachment #311348 - Flags: review?(jasone)
Attached patch patch v4a (obsolete) — Splinter Review
Attached is a patch derived from patch v4.

Patch v4 does not compile on Linux, due to a lacking prototype for memalign().  The fix is to reorder memalign() and posix_memalign().

There are some minor issues with error condition handling.  memalign() is not required to do validation of the alignment argument, so I moved that code back into posix_memalign() and added an assertion in memalign().  I also adjusted the MALLOC_XMALLOC code, though that is a pedantic change since the code is disabled anyway.

I don't think we need or want to avoid inlining memalign() on Linux, so I adjusted that code to apply only to Solaris.

Ginn, can you please look over the revised patch and make sure it works correctly on Solaris?
Attachment #311348 - Attachment is obsolete: true
Attachment #311348 - Flags: review?(jasone)
It works great on Solaris. Thank you!

In case someone build it with gcc on Solaris,
I think it would be better if we write it like,

#elif defined(MOZ_MEMORY_SOLARIS)
#if defined(__SUNPRO_C)
void *
memalign(size_t alignment, size_t size);
#pragma no_inline(memalign)
#elif defined(__GNU_C__)
_attribute__((noinline))
#endif
VISIBLE
void *
memalign(size_t alignment, size_t size)
#else
Attachment #312824 - Flags: review?(ted.mielczarek)
Ginn, I think patch v4b addresses your feedback.  Please let me know if I misunderstood your intent.
Attachment #312824 - Attachment is obsolete: true
Attachment #312824 - Flags: review?(ted.mielczarek)
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

Yes, it does.
Thanks.
Attachment #312959 - Flags: review?(ted.mielczarek)
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

r=me on the build bits.
Attachment #312959 - Flags: review?(ted.mielczarek) → review+
Attachment #312959 - Flags: approval1.9?
Comment on attachment 312959 [details] [diff] [review]
patch v4b
[Checkin: Comment 30]

a1.9=beltzner
Attachment #312959 - Flags: approval1.9? → approval1.9+
Checking in configure.in;
/cvsroot/mozilla/configure.in,v  <--  configure.in
new revision: 1.1989; previous revision: 1.1988
done
Checking in browser/app/Makefile.in;
/cvsroot/mozilla/browser/app/Makefile.in,v  <--  Makefile.in
new revision: 1.154; previous revision: 1.153
done
Checking in config/rules.mk;
/cvsroot/mozilla/config/rules.mk,v  <--  rules.mk
new revision: 3.594; previous revision: 3.593
done
Checking in memory/jemalloc/Makefile.in;
/cvsroot/mozilla/memory/jemalloc/Makefile.in,v  <--  Makefile.in
new revision: 1.8; previous revision: 1.7
done
Checking in memory/jemalloc/jemalloc.c;
/cvsroot/mozilla/memory/jemalloc/jemalloc.c,v  <--  jemalloc.c
new revision: 1.12; previous revision: 1.11
done
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: General → jemalloc
QA Contact: general → jemalloc
There're 2 issues.
1) When compiling thunderbird, it doesn't have MOZ_ENABLE_LIBXUL, therefore XPCOM_LIBS doesn't have LIBXUL_LIBS.
The result is xpcshell failed to be linked, because posix_memalign is missing.

2) On Solaris/SPARC, we don't have alloca(). I didn't notice bug 420678 used it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Move the definition of SOLARIS_JEMALLOC_LDFLAGS to rules.mk, so that it will always apply to xpcshell and *-bin.
Attachment #319343 - Flags: review?(ted.mielczarek)
Attached patch fix bustage on Solaris/SPARC (obsolete) — Splinter Review
In alloca.h, we have
#define alloca(x)   __builtin_alloca(x)
Attachment #319344 - Flags: review?(jasone)
Comment on attachment 319344 [details] [diff] [review]
fix bustage on Solaris/SPARC

This alloca() trouble is a pain that I don't want to have to deal with on an ongoing basis.  I'm going to modify the code to not use alloca() at all.
Attached patch Remove alloca() call (obsolete) — Splinter Review
This patch removes the alloca() call, thus avoiding the need for portability fixes.  Some platforms have alloca.h, and others define alloca() in stdlib.h.  I think we would have needed to add an autoconf test to solve this problem.
Attachment #319344 - Attachment is obsolete: true
Attachment #319414 - Flags: review?(ginn.chen)
Attachment #319344 - Flags: review?(jasone)
Comment on attachment 319414 [details] [diff] [review]
Remove alloca() call

removal of alloca is great, since it's machine-, compiler- and system-dependent.

Nit: I think we can also remove
#define alloca _alloca
added by bug 420678.
Attachment #319414 - Flags: review?(ginn.chen)
Attachment #319414 - Flags: review+
Attachment #319414 - Flags: approval1.9?
Per comment #36, remove the unneeded alloca #define.  Thanks for catching that, Ginn.
Attachment #319414 - Attachment is obsolete: true
Attachment #319414 - Flags: approval1.9?
Attachment #319569 - Flags: approval1.9?
Comment on attachment 319569 [details] [diff] [review]
Remove alloca() call (and #define)
[Checkin: Comment 39]

a+ based on Pav's risk assessment and moz2 meeting today
Attachment #319569 - Flags: approval1.9? → approval1.9+
Checking in jemalloc.c;
/cvsroot/mozilla/memory/jemalloc/jemalloc.c,v  <--  jemalloc.c
new revision: 1.14; previous revision: 1.13
done
Attachment #319343 - Attachment is obsolete: true
Attachment #319931 - Flags: review?(ted.mielczarek)
Attachment #319343 - Flags: review?(ted.mielczarek)
Comment on attachment 319931 [details] [diff] [review]
patch revised since bug 418016 reverts to libjemalloc.so

Sadly it's not working. Because if we compile xpidl with -ljemalloc, xpidl could not be run at compiling time, since dist/libjemalloc.so is a symbolic link.
Attachment #319931 - Attachment is obsolete: true
Attachment #319931 - Flags: review?(ted.mielczarek)
Attached patch fix linkage on Solaris (obsolete) — Splinter Review
This is the best fix I can imagine right now.

Add dependence of jemalloc to mozjs, since it uses posix_memalign.
Apply SOLARIS_JEMALLOC_LDFLAGS to every binary that we want to use jemalloc.
Attachment #319960 - Flags: review?(ted.mielczarek)
I discovered a syntax error that slipped into revision 1.12 of jemalloc.c as part of the commit on 30 April 2008:

#  elif (defined(__GNU_C__)

should be:

#  elif (defined(__GNU_C__))
                           ^

I'm a little surprised this hasn't caused any trouble, but if there are no associated build errors, then we can wait to fix it until the patch in bug #422960 is committed.
This issue still breaks Firefox 3.0 on Solaris. ask for blocker 1.9.
Flags: blocking1.9?
(In reply to comment #44)
> This issue still breaks Firefox 3.0 on Solaris. ask for blocker 1.9.

What is the current problem you are referring to?  If comment #41 describes the issue at hand, perhaps there is a simple solution, such as passing an appropriate RPATH-related flag during linking.
Alfred does comment 45 work for you?
Flags: blocking1.9? → blocking1.9-
Whiteboard: [not needed for 1.9]
Jason, the Solaris tinerboxes(http://tinderbox.mozilla.org/Firefox-Ports/) are still broken as comment #41 mentioned. Ginn, any comments for Jason's suggestion?
Jason, I think RPATH doesn't work for symbolic links to another directories.

Duplicate of this bug: 433608
Comment on attachment 319960 [details] [diff] [review]
fix linkage on Solaris

Could we do this centrally, in config.mk or rules.mk? It sucks to have that block duplicated in every binary's Makefile.
Ted, I tried, but then we need an exception for xpidl, as I mentioned in comment #41.

Also I think it's necessary to add -ljemalloc for mozjs any way, because it uses posix_memalign().
Use a black list rather than a white list.
Is it look better?
Attachment #322490 - Flags: review?(ted.mielczarek)
Comment on attachment 322490 [details] [diff] [review]
alternative patch for linkage
[Checkin: Comment 59]

Yeah, I like that better.
Attachment #322490 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 319960 [details] [diff] [review]
fix linkage on Solaris

Use the other patch.
Attachment #319960 - Flags: review?(ted.mielczarek) → review-
Whiteboard: [not needed for 1.9] → [not needed for 1.9?][RC2?][has patch][has ted review]
Ted we should take this on trunk
Was that a question or directive?
It doesn't affect any platform other than Solaris.
It would be better to commit it on 1.9.0 branch.

Otherwise some people who use source tar ball may have trouble to build it on Solaris.

Comment on attachment 322490 [details] [diff] [review]
alternative patch for linkage
[Checkin: Comment 59]

Approved, please land by noon PDT to make Firefox 3.0.
Attachment #322490 - Flags: approval1.9? → approval1.9+
Whiteboard: [not needed for 1.9?][RC2?][has patch][has ted review] → [not needed for 1.9?][RC2+][has patch][has ted review]
mozilla/browser/app/Makefile.in 	1.156
mozilla/config/rules.mk 	3.595
mozilla/js/src/Makefile.in 	3.125
mozilla/xpcom/typelib/xpidl/Makefile.in 	1.45
mozilla/xpcom/typelib/xpt/tools/Makefile.in 	1.33
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Whiteboard: [not needed for 1.9?][RC2+][has patch][has ted review] → [RC2+]
Target Milestone: --- → mozilla1.9
Attachment #312959 - Attachment description: patch v4b → patch v4b [Checkin: Comment 30]
Attachment #319569 - Attachment description: Remove alloca() call (and #define) → Remove alloca() call (and #define) [Checkin: Comment 39]
Attachment #322490 - Attachment description: alternative patch for linkage → alternative patch for linkage [Checkin: Comment 59]
Attachment #319960 - Attachment is obsolete: true
Blocks: 446302
re comment 43, bug 446302 reports that this is a real issue. i don't understand why this was left unfixed....
(In reply to comment #60)
> re comment 43, bug 446302 reports that this is a real issue. i don't understand
> why this was left unfixed....

It was fixed in changeset 663c51189e98 on 20 June 2008, as per comment #5 in bug #442960.
it was not fixed in CVS. you have a responsibility to all branches you break, not just central.
Blocks: 586962
You need to log in before you can comment on or make changes to this bug.