Closed Bug 435683 Opened 17 years ago Closed 14 years ago

libjemalloc deadlocks with LD_PRELOAD libraries

Categories

(Core :: Memory Allocator, defect)

Product:

Component:

Platform:

All

Linux

Type:

defect

Priority:

Not set

Severity:

major

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla2.0b5

Tracking Flags:

Tracking

Status

blocking2.0

---

-

People

(Reporter: josh, Assigned: glandium)

References

Details

Attachments

(2 files, 4 obsolete files)

a possible patch 17 years ago Mike Hommey [:glandium] 2.93 KB, patch		Details \| Diff \| Splinter Review
a possible patch 17 years ago Mike Hommey [:glandium] 2.94 KB, patch		Details \| Diff \| Splinter Review
preloadtest.c 15 years ago Peter Åstrand 1.69 KB, text/plain		Details
Use syscall for mmap and munmap 15 years ago Mike Hommey [:glandium] 2.93 KB, patch	jasone : review+	Details \| Diff \| Splinter Review
Disable ncpus with MOZ_MEMORY_NARENAS_DEFAULT_ONE 15 years ago Mike Hommey [:glandium] 1.19 KB, patch		Details \| Diff \| Splinter Review
Use syscall() for mmap and munmap and disable ncpus in jemalloc to work around deadlocks with LD_PRELOADed libraries 14 years ago Mike Hommey [:glandium] 4.41 KB, patch	jasone : review+ benjamin : approval2.0+	Details \| Diff \| Splinter Review

Reporter

Description

•

17 years ago

Please see http://bugs.gentoo.org/show_bug.cgi?id=213080 and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=475166 for background on this bug. libjemalloc's initialization calls mmap to get memory. aoss overrides mmap so it can handle mmap'ed OSS devices, and the aoss mmap calls calloc. This causes a deadlock. Solving this deadlock seems non-trivial. The analysis in the Gentoo bug points out that using dlopen/dlsym to obtain the real mmap won't work, because those allocate memory. As a workaround, perhaps libjemalloc could have a small pool of global memory with which to satisfy early requests, and then it could call mmap when that pool runs out?

Updated

•

17 years ago

Component: General → jemalloc

Product: Firefox → Core

QA Contact: general → jemalloc

Version: 3.0 Branch → 1.9.0 Branch

Mike Hommey [:glandium]

Assignee

Comment 1

•

17 years ago

The analysis in the gentoo bug is half wrong. The problem is with open(), not mmap(). And the problem is that libjemalloc does open() /proc/cpuinfo to know how many processors there are, which can be dangerous when open is caught by something else than libc... others functions that are used during jemalloc initialization are getenv and readlink. Note it does avoid using open() to read /etc/malloc.conf, which is supposed to be a symbolic link pointing nowhere, where the linked path is the set of malloc options (which just sounds awful). I don't know if this has been done specifically to avoid open() or not, but it sounds like it. The same code as for solaris for malloc_ncpus should just work. I'll test that.

Mike Hommey [:glandium]

Assignee

Comment 2

•

17 years ago

Erf, sysconf seems to do what malloc_ncpus did: #0 0x00007fc6c4c7c384 in __lll_lock_wait () from /lib/libpthread.so.0 #1 0x00007fc6c4c77c5c in _L_lock_1054 () from /lib/libpthread.so.0 #2 0x00007fc6c4c77b30 in pthread_mutex_lock () from /lib/libpthread.so.0 #3 0x00007fc6c597987d in malloc_init_hard () at jemalloc.c:5201 #4 0x00007fc6c597a114 in malloc_init () at jemalloc.c:5180 #5 0x00007fc6c597a50e in malloc (size=140491697409504) at jemalloc.c:5732 #6 0x00007fc6c517278a in __fopen_internal () from /lib/libc.so.6 #7 0x00007fc6c51dcf12 in get_nprocs () from /lib/libc.so.6 #8 0x00007fc6c51ae12d in sysconf () from /lib/libc.so.6

Mike Hommey [:glandium]

Assignee

Comment 3

•

17 years ago

Note this happens not only with aoss, but with libtrash, too (for the same reasons)

Mike Hommey [:glandium]

Assignee

Comment 4

•

17 years ago

Also note it does end up blocking on mmap with aoss, but that is after you make the open non-blocking first.

Mike Hommey [:glandium]

Assignee

Comment 5

•

17 years ago

Attached patch a possible patch (obsolete) — Details — Splinter Review

No idea who should be reviewing this. The idea is to have an arbitrarily sized static buffer to allocate memory from when jemalloc is still initializing itself. realloc() is not taken care of here, but until some LD_PRELOADable library does use realloc in either open or mmap, it should be safe. The gotos are put there to avoid using __builtin_expect and help with branch prediction. The static buffer is 64k. On amd64, dlsym needs 32 bytes, and aoss initialization takes 8k. 64k should be too big already.

Assignee: nobody → mh+mozilla

Status: NEW → ASSIGNED

Mike Hommey [:glandium]

Assignee

Comment 6

•

17 years ago

Attached patch a possible patch (obsolete) — Details — Splinter Review

This one actually builds...

Attachment #327317 - Attachment is obsolete: true

Reed Loden [:reed]

Updated

•

17 years ago

Attachment #327318 - Flags: review?(pavlov)

Attachment #327318 - Flags: review?(jasone)

Comment 7

•

17 years ago

(In reply to comment #5) > Created an attachment (id=327317) [details] > a possible patch > > No idea who should be reviewing this. The idea is to have an arbitrarily sized > static buffer to allocate memory from when jemalloc is still initializing > itself. > try stuart or jasone

Comment 8

•

17 years ago

Comment on attachment 327318 [details] [diff] [review] a possible patch I don't think this patch should be used. First off, if we were to use an approach of this nature, I think it would be better to combine malloc_initialized and malloc_initializing into a single multi-state variable, in order to avoid extra branches during allocation (in the common case, where initialization is completed). Second, we don't actually use ncpus for anything right now, so a simpler solution would be to completely disable all ncpus-related code if MOZ_MEMORY_NARENAS_DEFAULT_ONE is defined. As for the readlink() call for /etc/malloc.conf, we don't really need that, so if it's a problem, we can just remove it.

Attachment #327318 - Flags: review?(jasone) → review-

Comment 9

•

17 years ago

Does aoss really wrap mmap(2) with a function that calls malloc(3)? If so, why not change aoss to use a static allocation buffer instead of modifying malloc?

Comment 10

•

17 years ago

(In reply to comment #1) > Note it does avoid using open() to read > /etc/malloc.conf, which is supposed to be a symbolic link pointing nowhere, > where the linked path is the set of malloc options (which just sounds awful). I > don't know if this has been done specifically to avoid open() or not, but it > sounds like it. This is a performance hack that jemalloc inherited from phkmalloc (the previous FreeBSD malloc implementation). The hack provides a way to get configurable system-wide malloc configuration with only a single system call (readlink() vs. open()+read()+close()). As noted in comment #8, we don't really need this hack in the browser.

Mike Hommey [:glandium]

Assignee

Comment 11

•

17 years ago

Note 2 things: - ncpus does change the number of arenas, - hardsetting ncpus to 1 solves the open() issues, but *not* the mmap() one with aoss, so the static memory pool approach is still necessary. Since the static memory pool is still necessary, there is no need to disable ncpus. Also note that there doesn't seem to be any place where having the two variables as a single multi-state variable would save extra branches during allocation.

Mike Hommey [:glandium]

Assignee

Comment 12

•

17 years ago

(In reply to comment #9) > Does aoss really wrap mmap(2) with a function that calls malloc(3)? If so, why > not change aoss to use a static allocation buffer instead of modifying malloc? 1. Because aoss is not alone doing this kind of stuff 2. Because aoss allocates the memory it will return for the mmap(2) call with malloc.

Stuart Parmenter

Updated

•

17 years ago

Attachment #327318 - Flags: review?(pavlov)

Arun Prasannan (polar)

Comment 13

•

16 years ago

Is this related? (Mozilla/5.0 (X11; U; Linux i686; rv:1.9.1b2pre) Gecko/20081111 SeaMonkey/2.0a2pre) #0 0xb8044f74 in __lll_lock_wait () from /lib/libpthread.so.0 #1 0xb8040878 in _L_lock_95 () from /lib/libpthread.so.0 #2 0xb804026a in pthread_mutex_lock () from /lib/libpthread.so.0 #3 0x08049f89 in malloc_mutex_lock (mutex=0xfffffe00) at jemalloc.c:1396 #4 0x08054ae3 in malloc_init_hard () at jemalloc.c:5379 #5 0x08055745 in malloc_init () at jemalloc.c:5358 #6 0x08055e9a in calloc (num=1, size=20) at jemalloc.c:6094 #7 0xb7d5f31c in _dlerror_run () from /lib/libdl.so.2 #8 0xb7d5ed6c in dlsym () from /lib/libdl.so.2 #9 0xb80749f7 in getLibraryFunction () from /lib/libsafe.so.2 #10 0xb8074e57 in memcpy () from /lib/libsafe.so.2 #11 0x08054b93 in malloc_init_hard () at jemalloc.c:5453 #12 0x08055745 in malloc_init () at jemalloc.c:5358 #13 0x08055e9a in calloc (num=1, size=20) at jemalloc.c:6094 #14 0xb7d5f31c in _dlerror_run () from /lib/libdl.so.2 #15 0xb7d5ed6c in dlsym () from /lib/libdl.so.2 #16 0xb80749f7 in getLibraryFunction () from /lib/libsafe.so.2 #17 0xb8074e57 in memcpy () from /lib/libsafe.so.2 #18 0xb8046737 in __libc_sigaction () from /lib/libpthread.so.0 #19 0xb803d4f1 in __pthread_initialize_minimal_internal () from /lib/libpthread.so.0 #20 0xb803cd48 in _init () from /lib/libpthread.so.0 #21 0xb8088ca0 in call_init () from /lib/ld-linux.so.2 #22 0xb8088e27 in _dl_init_internal () from /lib/ld-linux.so.2 #23 0xb807a84f in _dl_start_user () from /lib/ld-linux.so.2 Works with LD_PRELOAD=/lib/libc.so.6

Arun Prasannan (polar)

Comment 14

•

16 years ago

Oops. Sorry, I didn't mean to commit that. The answer to my question is no; removing libsafe helped.

Mike Hommey [:glandium]

Assignee

Comment 15

•

16 years ago

(In reply to comment #14) > Oops. > > Sorry, I didn't mean to commit that. The answer to my question is no; removing > libsafe helped. The answer is actually yes, libsafe triggers the same deadlock in jemalloc that aoss does.

Arun Prasannan (polar)

Comment 16

•

16 years ago

In which case, bug 462686 is a duplicate?

Comment 18

•

16 years ago

As the poster for bug #462686, I can report that firefox-3.1b2 WORKSFORME even when libsafe is loaded on the system (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2). Thanks to all for the work on this.

Comment 19

•

16 years ago

Like the now marked duplicate bug #462686, the latest version beta version of firefox (firefox-3.1b3) once again fails to load when libsafe is used as an LD_PRELOAD condition. Steps to reproduce: 1) Create the file, ld.so.preload, in /etc and put the line, '/lib/libsafe.so.2' in it. Or put the environmental variable, LD_PRELOAD=/lib/libsafe.so.2, into the shell before executing the firefox-3.1b3 program from the command line. 2) Attempt to start FF 3.1b3 (e.g. LD_PRELOAD=/lib/libsafe.so.2 /usr/lib/firefox-3.1b3/firefox -P beta3.1) Actual results: FF3.1b3 doesn't show, except that top shows that the loading of FF3.1b3 has partially loaded. Expected Results: FF3.1b3 to show up. Workaround: If libsafe is present in the /etc file, ld.so.preload, FF3.1b3 can be started successfully by starting it with the environmental variable LD_PRELOAD to '/lib/libc.so.6' (e.g., LD_PRELOAD=/lib/libc.so.6 /usr/lib/firefox-3.1b3/firefox -P beta3.1).

Comment 20

•

16 years ago

Forgot to provide the stack backtrace from gdb when FF3.1b3 locks up (i.e., FF3.1b3 is invoked without the LD_PRELOAD=/lib/libc.so.6 being used). The backtrace is as follows: (gdb) bt #0 0xb7fa8f74 in __lll_lock_wait () from /lib/libpthread.so.0 #1 0xb7fa4878 in _L_lock_95 () from /lib/libpthread.so.0 #2 0xb7fa426a in pthread_mutex_lock () from /lib/libpthread.so.0 #3 0x0804f79b in ?? () #4 0x08056bc0 in ?? () #5 0x00000000 in ?? () Looks like the problem is, once again, at the __lll_lock_wait function (as was the case with bug #462686).

Comment 21

•

16 years ago

Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b4pre) Gecko/20090319 Shiretoko/3.5b4pre Tested out the b4 pre-build, and it exhibits the same problem as the 3.1b3 builds reported above. The system this was tested on (and all previous reports) was Slackware 12.2.

Comment 22

•

16 years ago

FF3.5b4 still locks up at the same point as FF3.1b3 (see Comment #20) Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b4) Gecko/20090423 Firefox/3.5b4 Work Around: see Comment #19

Comment 23

•

16 years ago

FF3.5rc{1,2} both lock up at the _same_ point as reported above. Would really like to see FF3.5 working with libsafe because _everytime_ I've run FF3.5 without libsafe, my computer locks up after a few hours. And this _only_ occurs when I run FF3.5 (and, previously, FF3.1) without libsafe. Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090616 Firefox/3.5 Work Around: see Comment #19

Karl Tomlinson (:karlt back Feb 10)

Comment 24

•

16 years ago

From http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=475166: #4 0x00002aaaabedf205 in calloc (num=1024, size=8) at jemalloc.c:5215 #5 0x00002aaaaacc853a in ?? () from /usr/lib/libaoss.so #6 0x00002aaaaacc9205 in open () from /usr/lib/libaoss.so #7 0x00002aaaabedcf30 in malloc_init_hard () at jemalloc.c:4931 |open| should be async-signal-safe (according to POSIX). If there is an implementation of open that calls calloc, isn't that a bug?

Comment 28

•

16 years ago

*** Bug 503206 is a duplicate of this ???

Comment 30

•

15 years ago

Any news re this ? . And FYI the Arch forum newbie thread 'Firefox 3.5 will not start' has 2822 hits as of todays date. So affects many ! BW Chris.

Karl Tomlinson (:karlt back Feb 10)

Comment 31

•

15 years ago

Has a bug been filed at http://www.alsa-project.org/ re open in aoss?

Comment 32

•

15 years ago

>Has a bug been filed at http://www.alsa-project.org/ re open in aoss? That doesn't make sense, since the bug is really in libjemalloc. Until this bug is solved, Firefox with libjemalloc will deadlock with pretty much *any* LD_PRELOAD library. In addition to aoss, libsafe, cyclades-serial-client, and libtrash, this also includes libpulsedsp.so which is used by the PulseAudio "padsp", as well as most commercial LD_PRELOAD libraries. This includes typical Sun Ray and ThinLinc environments. This bug is pretty critical, IMHO. RHEL isn't yet building with libjemalloc, but Fedora, Novell SLED as well as the binaries from firefox.com are affected.

Reed Loden [:reed]

Updated

•

15 years ago

blocking2.0: --- → ?

See Also: → http://bugs.gentoo.org/show_bug.cgi?id=213080

Summary: libjemalloc deadlocks with aoss → libjemalloc deadlocks with LD_PRELOAD libraries

Version: 1.9.0 Branch → unspecified

Reed Loden [:reed]

Comment 34

•

15 years ago

Mike, can we get a new patch for jemalloc based on review comments?

Mike Hommey [:glandium]

Assignee

Comment 35

•

15 years ago

(In reply to comment #34) > Mike, can we get a new patch for jemalloc based on review comments? See comment #11

Reed Loden [:reed]

Comment 36

•

15 years ago

Comment on attachment 327318 [details] [diff] [review] a possible patch Very well. Jason, can you take another look at this patch based on Mike's response in comment #11 to your original review?

Attachment #327318 - Flags: review- → review?(jasone)

Comment 37

•

15 years ago

I (In reply to comment #36) > (From update of attachment 327318 [details] [diff] [review]) > Very well. Jason, can you take another look at this patch based on Mike's > response in comment #11 to your original review? Regarding extra branches, malloc_initialized *and* malloc_initializing are checked for every allocation, so indeed there are extra branches for every allocation introduced by this patch. I came up with a solution to the problem last summer that I integrated into standalone jemalloc: http://www.canonware.com/cgi-bin/hg_jemalloc/rev/20ff57483b5f Recursive allocation is handled without any extra branches in the fast path, and no static buffer is used. Regarding aoss calling malloc(3) from inside an mmap(2) system call wrapper, let me state things more bluntly than in comment #9: aoss is messing with low level system call interfaces, and it is aoss's responsibility to assure that it does not directly introduce bootstrapping issues. This is an aoss problem, not a jemalloc problem.

Reed Loden [:reed]

Comment 38

•

15 years ago

(In reply to comment #37) > Regarding aoss calling malloc(3) from inside an mmap(2) system call wrapper, > let me state things more bluntly than in comment #9: aoss is messing with low > level system call interfaces, and it is aoss's responsibility to assure that it > does not directly introduce bootstrapping issues. This is an aoss problem, not > a jemalloc problem. It's been pointed out in this bug and in the various dupes that aoss isn't the only one that has problems with jemalloc and LD_PRELOAD. Do all the libraries affected have the same problem, or are there really two problems here -- one in jemalloc and one in aoss?

Mike Hommey [:glandium]

Assignee

Comment 39

•

15 years ago

(In reply to comment #37) > I came up with a solution to the problem > last summer that I integrated into standalone jemalloc: > > http://www.canonware.com/cgi-bin/hg_jemalloc/rev/20ff57483b5f This changeset changes too many unrelated things, would you mind providing a minimal patch ? > Recursive allocation is handled without any extra branches in the fast path, > and no static buffer is used. > > Regarding aoss calling malloc(3) from inside an mmap(2) system call wrapper, > let me state things more bluntly than in comment #9: aoss is messing with low > level system call interfaces, and it is aoss's responsibility to assure that it > does not directly introduce bootstrapping issues. This is an aoss problem, not > a jemalloc problem. Reality is that there are LD_PRELOAD libraries that are doing it, and that you're not going to instantly fix them. Which means there is no other way than fixing the issue in Firefox if you want your users to stop complaining.

Comment 40

•

15 years ago

I don't know the internal details of aoss, but I trust comment #1 which states that it is open() that is the problem - not mmap(). I would say that at this point, nothing indicates that we have multiple problems. The fundamental bug is that when using jemalloc, LD_PRELOAD libraries can't do much at all. This includes calling dlsym(). Without access to such functions, I don't see how preload libraries have any chance at all of doing their job properly. I think that the problem should be viewed from an API side: Many functions of the POSIX API needs to allocate memory. This includes, but is not limited to, dlsym(). What jemalloc effectively does is to require that open() never allocates anything. AFAIK, POSIX have no such requirements, so the jemalloc demands can't be justified.

Mike Hommey [:glandium]

Assignee

Comment 41

•

15 years ago

(In reply to comment #40) > I don't know the internal details of aoss, but I trust comment #1 which states > that it is open() that is the problem - not mmap(). See comment #4

Comment 42

•

15 years ago

(In reply to comment #38) > It's been pointed out in this bug and in the various dupes that aoss isn't the > only one that has problems with jemalloc and LD_PRELOAD. Do all the libraries > affected have the same problem, or are there really two problems here -- one in > jemalloc and one in aoss? Yes, there are two separate problems: 1) The version of jemalloc in Firefox cannot handle recursive allocation caused by calls to system library interfaces such as sysctl(2). 2) aoss calls malloc(3) from within an mmap(2) wrapper, which is a major problem because it means a low level allocation function is recursing into a high level allocation function. We can (and should) fix (1), but (2) isn't our fault, and it should be fixed at the source of the problem (aoss). I can eventually create a patch for this bug, but I am terribly busy right now (1-week-old son, prepping home for sale, graduating, moving 1000 miles in 2 weeks, starting a new job, etc.), so if this is a burning issue, someone else will have to look through the standalone jemalloc source code and extract the relevant parts.

Comment 43

•

15 years ago

Attached file preloadtest.c — Details

The problem is not specific to malloc(). I've attached a minimal LD_PRELOAD test library which demonstrates that the problem exists with open() as well. With Firefox and jemalloc, it hangs during open of /proc/cpuinfo. If this library needs to be "fixed", please tell me how.

Mike Hommey [:glandium]

Assignee

Comment 44

•

15 years ago

(In reply to comment #43) > Created an attachment (id=415831) [details] > preloadtest.c > > The problem is not specific to malloc(). I've attached a minimal LD_PRELOAD > test library which demonstrates that the problem exists with open() as well. > With Firefox and jemalloc, it hangs during open of /proc/cpuinfo. If this > library needs to be "fixed", please tell me how. Nothing new here, this is what happens with aoss and others: jemalloc initialization calls open, which ends up being the one in the LD_PRELOADed library, which calls dlsym, which calls malloc, which ends up in jemalloc, which locks, since it is initializing.

Karl Tomlinson (:karlt back Feb 10)

Comment 45

•

15 years ago

Comment on attachment 415831 [details] preloadtest.c >int >open(const char *pathname, int flags, ...) >{ > static int (*func) (const char *, int, mode_t) = NULL; > va_list args; > mode_t mode; > > fprintf(stderr, "preloadtest: open: %s\n", pathname); > > if (!func) { > func = (int (*)(const char *, int, mode_t)) dlsym(REAL_LIBC, "open"); |open| must be async-signal-safe. See man 7 signal. If it is not async-signal-safe, then other problems will also result. dlsym is not async-signal-safe, so cannot be called from open. I'm only guessing, but I wonder whether calling dlsym from an __attribute__((constructor)) function and keeping the result may be an option.

Comment 46

•

15 years ago

(In reply to comment #44) > Nothing new here, this is what happens with aoss and others: jemalloc > initialization calls open, which ends up being the one in the LD_PRELOADed > library, which calls dlsym, which calls malloc, which ends up in jemalloc, > which locks, since it is initializing. I know, but since the discussion about mmap() comes up now and then, I wanted to demonstrate by example that it happens with open() as well.

Comment 47

•

15 years ago

(In reply to comment #45) > > func = (int (*)(const char *, int, mode_t)) dlsym(REAL_LIBC, "open"); > > |open| must be async-signal-safe. See man 7 signal. > If it is not async-signal-safe, then other problems will also result. > dlsym is not async-signal-safe, so cannot be called from open. Thanks for pointing this out, but does it have anything todo with this specific problem? Ie, does async-signal-safe also mean that you cannot alloc?

Mike Hommey [:glandium]

Assignee

Comment 48

•

15 years ago

(In reply to comment #42) > 2) aoss calls malloc(3) from within an mmap(2) wrapper, which is a major > problem because it means a low level allocation function is recursing into a > high level allocation function. > > We can (and should) fix (1), but (2) isn't our fault, and it should be fixed at > the source of the problem (aoss). I gave a bit more thought to the above and I would add this: - aoss is not guaranteed to be alone in this case, better be safe and fix (2) than sorry. - glibc deals fine with LD_PRELOADED mmap calling malloc, and most people are only testing against glibc.

Mike Hommey [:glandium]

Assignee

Comment 49

•

15 years ago

There is another deadlock with padsp (pulseaudio wrapper), which is more problematic: padsp calls dlsym to get the original symbols for the functions it diverts, which is the typical use. Except that it does so on demand. What then happens is that something in nsXULStub calls access() before jemalloc is initialized. access() is caught by padsp, which locks a mutex, and resolves the original symbol with dlsym(). dlsym() ends up allocating memory, which triggers jemalloc initialization code, that opens /proc/cpuinfo. And open gets into padsp again, which locks the same mutex as above before resolving the open symbol with dlsym(). Only posting for documentation, I don't think this should be addressed at jemalloc level.

Mike Hommey [:glandium]

Assignee

Comment 50

•

15 years ago

Almost the same problem as comment #49 with artsdsp, except it doesn't deadlock, but crashes. Same succession of events, and it ends up crashing because, as being uninitialized, it tries to call the original open function, which at the time, it null.

Comment 51

•

15 years ago

(In reply to comment #45) > > I'm only guessing, but I wonder whether calling dlsym from an > __attribute__((constructor)) function and keeping the result may be an option. Tried that, however, the constructor function is not called in the case of jemalloc::open() call. Is it because jemalloc is a shared library? GCC document is not very clear about it. BTW, can we fix the jemalloc soon? All user using preload library that overrides open() will see hang! Thanks!

Benjamin Smedberg

Updated

•

15 years ago

blocking2.0: ? → -

Mike Hommey [:glandium]

Assignee

Comment 52

•

15 years ago

(In reply to comment #37) > http://www.canonware.com/cgi-bin/hg_jemalloc/rev/20ff57483b5f This now redirects to a gitweb summary, which doesn't help to know what the diff was originally. Could you either provide the patch or a working url ?

Mike Hommey [:glandium]

Assignee

Comment 53

•

15 years ago

Attached patch Use syscall for mmap and munmap (obsolete) — Details — Splinter Review

A much better approach, which should also work around the padsp and artsdsp problems. It combines the suggestion from comment 8 to disable all ncpus code with MOZ_MEMORY_NARENAS_DEFAULT_ONE, and replaces the calls to mmap() an munmap() from the libc, which, as we saw with e.g. aoss may be overridden with a function calling dlsym, which in turn is calling malloc, with direct calls to the system calls. There may be a need for some additional #ifdefs though, for platforms where the syscall(SYS_m(un)map, ...) form is unsupported.

Attachment #327318 - Attachment is obsolete: true

Attachment #448274 - Flags: review?

Attachment #327318 - Flags: review?(jasone)

Mike Hommey [:glandium]

Assignee

Updated

•

15 years ago

Attachment #448274 - Flags: review?(pavlov)

Attachment #448274 - Flags: review?(jasone)

Attachment #448274 - Flags: review?

Updated

•

15 years ago

Attachment #448274 - Flags: review?(jasone) → review+

Mike Hommey [:glandium]

Assignee

Comment 54

•

15 years ago

Comment on attachment 448274 [details] [diff] [review] Use syscall for mmap and munmap This actually doesn't work on various platforms because of how the mmap syscall works. For example, on x86, the patch as is leads to crash, and the call needs to be modified, or SYS_mmap2 to be used. On scratchbox, only the latter is actually available, apparently (at least in the headers). On s390, the syscall only takes one argument, that is a pointer to a struct containing all the parameters. On alpha, I haven't figured yet how to get something working. And on hurd, there is no mmap syscall. There are also architectures where some checks need to be done on the returned value. All in all, it would probably be better to use SYS_mmap2 when available and to only use the syscall for a given list of architecture.

Attachment #448274 - Flags: review?(pavlov)

Mike Hommey [:glandium]

Assignee

Comment 55

•

15 years ago

FTR, on alpha, it's a bug in glibc's syscall() function.

Mike Hommey [:glandium]

Assignee

Comment 56

•

15 years ago

Attached patch Disable ncpus with MOZ_MEMORY_NARENAS_DEFAULT_ONE (obsolete) — Details — Splinter Review

Would it be okay to land that part already ? I have something that should be working for the other part, but I need to test it on all our (Debian) architectures first.

Attachment #448274 - Attachment is obsolete: true

Attachment #448385 - Flags: review?(jasone)

Updated

•

14 years ago

Blocks: 586962

Mike Hommey [:glandium]

Assignee

Comment 57

•

14 years ago

Attached patch Use syscall() for mmap and munmap and disable ncpus in jemalloc to work around deadlocks with LD_PRELOADed libraries — Details — Splinter Review

This is the patch I've been using for a while on Debian and that works on all Debian supported architectures.

Attachment #448385 - Attachment is obsolete: true

Attachment #465667 - Flags: review?(jasone)

Attachment #448385 - Flags: review?(jasone)

Updated

•

14 years ago

Attachment #465667 - Flags: review?(jasone) → review+

Mike Hommey [:glandium]

Assignee

Comment 58

•

14 years ago

Comment on attachment 465667 [details] [diff] [review] Use syscall() for mmap and munmap and disable ncpus in jemalloc to work around deadlocks with LD_PRELOADed libraries This is very important for Linux systems, where the use of /dev/dsp wrappers is widespread, and currently leads to deadlocks.

Attachment #465667 - Flags: approval2.0?

Benjamin Smedberg

Comment 59

•

14 years ago

Comment on attachment 465667 [details] [diff] [review] Use syscall() for mmap and munmap and disable ncpus in jemalloc to work around deadlocks with LD_PRELOADed libraries a=me for checkin within the next week

Attachment #465667 - Flags: approval2.0? → approval2.0+

Mike Hommey [:glandium]

Assignee

Comment 60

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/2c3644af7354

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla2.0b5

You need to log in before you can comment on or make changes to this bug.