Open Bug 419131 Opened 17 years ago Updated 2 years ago

Use malloc_usable_size() and/or malloc_size() to increase PLArena sizes.

Categories

(NSPR :: NSPR, defect, P5)

Tracking

(Not tracked)

REOPENED

People

(Reporter: moz, Unassigned)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 3 obsolete files)

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9b3) Gecko/2008020511 Firefox/3.0b3 Build Identifier: This is an enhancement. Memory allocated for PLArenas (using malloc) has an overhead associated with it. Most malloc'd memory regions do. The malloc_usable_size() function can be used to discover how big the memory region's overhead is. (See also malloc_size() on Mac OS X.) With that, we can make the PLArenas larger. The result will be better use of the memory that we are already consuming, which is equivalent to a result of reduced memory use. Reproducible: Always See bug 415967.
More than an enhancement, this is a potentially significant footprint and perf buf.
Assignee: wtc → moz
Severity: enhancement → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: footprint, perf
Might be worth an interim NSPR release.
Flags: blocking1.9?
Status: NEW → ASSIGNED
Priority: -- → P3
Flags: tracking1.9? → blocking1.9?
Flags: blocking1.9? → wanted1.9+
Priority: P3 → P2
Can someone suggest a workload that I can use to test the effects of my changes? I need a workload that doesn't take long to run and which mildly stresses the use of PLArenas.
Sunspider is OK, takes ~10 seconds if you run with --runs=1. The JS test suite in the browser, if you exclude slow tests, takes a couple of minutes. bc has a good URL that I can't find now.
Scratch that, misread as JSArenas. Not sure where we use PLArenas, but whatever behaviour motivated this patch would probably point to it. :) (Frames? In which case Tdhtml would work well.)
On OS X, would we want to use malloc_good_size? Does jemalloc provide that on other platforms? The malloc_good_size() function rounds size up to a value that the allo- cator implementation can allocate without adding any padding and returns that rounded up value.
(In reply to comment #5) > Scratch that, misread as JSArenas. Not sure where we use PLArenas, but > whatever behaviour motivated this patch would probably point to it. :) > > (Frames? In which case Tdhtml would work well.) Yup. I'll look at using Tdhtml. Thank you for your help, Mike.
(In reply to comment #6) > On OS X, would we want to use malloc_good_size? Does jemalloc provide that on > other platforms? > > The malloc_good_size() function rounds size up to a value that the allo- > cator implementation can allocate without adding any padding and returns > that rounded up value. malloc_good_size() is a bit like malloc_usable_size(). The difference is that malloc_good_size() will tell you in advance, based on the size of your proposed allocation, how much space the allocation really takes. In contrast, malloc_usable_size() tells how big an already-allocated region really is. malloc_good_size is not available on jemalloc, but malloc_usable_size is. On Mac, I'll use the malloc_size() function as a an exact replacement for malloc_usable_size(). This was done also in bug 415967.
In reply to comment 3, SSL uses PLArenas of various sizes heavily.
Status update: There are some problems with taking measurements of the allocations of all PLArenas. Namely, there are some leaks. I have a plan to address those, and will open some bugs shortly. Meanwhile, I can present some preliminary stats. I collected these by starting the browser, running Tdhtml, and immediately stopping the browser. An unmodified build has: numMallocs: 35390 numReallocs: 0 numFrees: 35385 sumRequested: 5563 sumReal: 7456 hwmRequested: 2142923 hwmReal: 2417936 The interesting number there is the hwmReal, the high water mark for the real memory usage. (Note that 5 memory regions are leaked - more mallocs than frees.) After using malloc_usable_size: numMallocs: 33063 numReallocs: 0 numFrees: 33059 sumRequested: 4964 sumReal: 6432 hwmRequested: 1916729 hwmReal: 2147408 The number of calls to malloc has dropped by 6.5%. The real memory usage has dropped by 11%. These are substantial improvements in memory use. I feel that there is some substantial unnecessary fragmentation happening due to the use of a global free list of arenas. I won't fix that in this bug.
To check for fragmentation that might be happening due to arenas being too small, I collected allocation stats before and after the following change to PL_InitArenaPool: pool->first.base = pool->first.avail = pool->first.limit = (PRUword)PL_ARENA_ALIGN(pool, &pool->first + 1); pool->current = &pool->first; - pool->arenasize = size; + pool->arenasize = PR_MAX(4032,size); #ifdef PL_ARENAMETER memset(&pool->stats, 0, sizeof pool->stats); pool->stats.name = strdup(name); Basically, I put a lower bound of about 4K on the arena sizes of the pool. Without using malloc_usable_size(), the results were: . a 7.5% increase in real memory use . a 28% reduction in calls to malloc() So, for a small increase in memory use, we might get a speed increase. BUT, with malloc_usable_size() the results for that change were: . a 14.5% increase in real memory use . a 22% reduction in calls to malloc() So, the change is much less appealing when malloc_usable_size() is enabled. I include this information because this change is a tempting one to anyone who sees the histogram of allocations by PLArenas from a Tdhtml run. Here is one without malloc_usable_size, and without the above arena size change: ((279,288),4707) ((287,288),1) ((291,304),20) ((303,304),21) ((535,1024),2) ((547,1024),1) ((559,1024),30) ((599,1024),1) ((887,1024),2539) ((1043,1536),5967) ((1047,1536),83) ((1095,1536),3) ((2067,2560),51) ((2071,2560),22) ((2083,2560),14) ((2595,3072),4) ((3619,4096),19244) ((4115,4608),2216) ((4871,5120),741) ((8211,8704),5) ((8215,8704),15) Above, the value ((x,y),z) indicates that z calls to malloc(x) were made, and that the real size of memory reserved by malloc on those calls was y.
Attachment #308307 - Flags: review?(wtc)
Attachment 308307 [details] [diff] needs review. Not sure who is best to review. (?)
Blocks: 422245
Blocks: 414088
No longer blocks: 422245
Comment on attachment 308307 [details] [diff] [review] v1 - tested with Tdhtml, informal browser use The idea of this patch is simple and fine. The section of code that defines the USE_MALLOC_USABLE_SIZE macro needs work. For example, >+# if defined(XP_UNIX) && !IS_MAC >+# include <malloc.h> /* for malloc_usable_size */ >+# define USE_MALLOC_USABLE_SIZE 1 This assumes a nonstandard compilation environment on Unix. Standard Unix (as defined in the Single Unix Specification) doesn't have the <malloc.h> header and doesn't have the malloc_usable_size function. NSPR is also built and used as a stand-alone library, independent of Mozilla. So we must be able to compile NSPR in a standard Unix environment. What does MOZ_MEMORY mean?
(In reply to comment #14) > (From update of attachment 308307 [details] [diff] [review]) > The idea of this patch is simple and fine. The section of code > that defines the USE_MALLOC_USABLE_SIZE macro needs work. I agree. I hope that you'll help me work on it, so that we can make it right for the platforms supported by NSPR. > >+# if defined(XP_UNIX) && !IS_MAC > >+# include <malloc.h> /* for malloc_usable_size */ > >+# define USE_MALLOC_USABLE_SIZE 1 > This assumes a nonstandard compilation environment on Unix. > Standard Unix (as defined in the Single Unix Specification) > doesn't have the <malloc.h> header and doesn't have the > malloc_usable_size function. This is intended to say "if XP_UNIX is defined, which means that I'm being compiled as part of a Mozilla build (where the XP_UNIX macro is defined), and I'm not Mac, then I expect a "malloc.h" file to be somewhere in my standard include directories". I was thinking that if XP_UNIX is not defined, then we don't expect anything. If it is defined, then we have to have a malloc.h somewhere. This implies that we can only create builds where XP_UNIX is defined on platforms which have a malloc.h. Further, compilation will produce a warning if the malloc.h does not have a malloc_usable_size, and linking will produce an error if there is no such size. Suggestions for improvement? > What does MOZ_MEMORY mean? It is the macro that is defined when Mozilla is using "jemalloc", a memory subsystem which always includes malloc_usable_size (but which does not yet supply a function prototype for that).
(In reply to comment #14) > (From update of attachment 308307 [details] [diff] [review]) > >+# if defined(XP_UNIX) && !IS_MAC > >+# include <malloc.h> /* for malloc_usable_size */ > >+# define USE_MALLOC_USABLE_SIZE 1 We might also consider a stricter requirement. Eiher MOZ_MEMORY is defined, or the user must explicitly define USE_MALLOC_USABLE_SIZE=1 in the compilation environment: #ifndef USE_MALLOC_USABLE_SIZE # if defined(MOZ_MEMORY) size_t malloc_usable_size(void *); # define USE_MALLOC_USABLE_SIZE 1 # endif #endif
XP_UNIX does not mean "This is a mozilla build". Other products that mozilla products also define XP_UNIX.
Correction: Other products THAN mozilla products also define XP_UNIX. It is generally defined in any build of NSPR on Unix.
Nelson, the important part, to me, is to get Firefox to use malloc_usable_size. Firefox will have MOZ_MEMORY defined (eventually, on all plaftorms). I would like other platforms to get the benefit of my changes, but I don't know how to determine whether an arbitrary platform has the malloc_usable_size function (or where to find the prototype). One idea is to simply allow the user (person compiling) to tell us that there is a MALLOC_USABLE_SIZE. What do you think of this simplification? #ifndef USE_MALLOC_USABLE_SIZE # ifdef MOZ_MEMORY size_t malloc_usable_size(void *); # define USE_MALLOC_USABLE_SIZE 1 # endif #endif
(In reply to comment #19) > Nelson, the important part, to me, is to get Firefox to use > malloc_usable_size. Firefox will have MOZ_MEMORY defined (eventually, > on all plaftorms). Obviously not all platforms support it. > I would like other platforms to get the benefit of my changes, but I don't > know how to determine whether an arbitrary platform has the > malloc_usable_size function (or where to find the prototype). The tasks of determining what platforms support a feature, and what header files need to be included, and what libraries need to be linked, are among the most time and resource consuming aspects of NSPR development. That is one reason why so little development work is done on NSPR. Very few organizations, and even fewer indivuals, have access to all the necessary systems to make that determination. > One idea is to simply allow the user (person compiling) to tell us that there > is a MALLOC_USABLE_SIZE. What do you think of this simplification? NSPR uses a configuration script that allows the code builder to make such configuration choices. But remember that for every true developer who compiles open source, there are hundreds, maybe thousands, of open source code users who are not developers and who know little more than how to type "gmake". So, a product should not depend too heavily on them for that input. Any feature that relies on the builder to enable it will not be much used. If you want to pursue this subject, of how to get code to build with the optimal combinations of features on various platforms, I suggest you discuss this on one of Mozilla's newsgroups, such as news://news.mozilla.org:119/mozilla.dev.builds > #ifndef USE_MALLOC_USABLE_SIZE > # ifdef MOZ_MEMORY > size_t malloc_usable_size(void *); > # define USE_MALLOC_USABLE_SIZE 1 > # endif > #endif >
(In reply to comment #20) > (In reply to comment #19) > > Nelson, the important part, to me, is to get Firefox to use > > malloc_usable_size. Firefox will have MOZ_MEMORY defined (eventually, > > on all plaftorms). > Obviously not all platforms support it. Sorry, Nelson, my comments were not clear. I meant to say that eventually, the MOZ_MEMORY macro will be defined when we are building mozilla code, regardless of the platform for which we are building. I was not saying that I wanted malloc_usable_size used on all platforms - that obviously can't be done, because, as you say, not all platforms support it. So, when building Firefox MOZ_MEMORY will always be defined (soon). In that case, I want malloc_usable_size to be used. It will always be available when MOZ_MEMORY is defined, because it is a part of the jemalloc code. That latter is activated exactly when MOZ_MEMORY is defined. > The tasks of determining what platforms support a feature, and what header > files need to be included, and what libraries need to be linked, are among > the most time and resource consuming aspects of NSPR development. That is > one reason why so little development work is done on NSPR. Very few > organizations, and even fewer indivuals, have access to all the necessary > systems to make that determination. I understand, and sympathize. > > One idea is to simply allow the user (person compiling) to tell us that there > > is a MALLOC_USABLE_SIZE. What do you think of this simplification? > NSPR uses a configuration script that allows the code builder to make such > configuration choices. But remember that for every true developer who > compiles open source, there are hundreds, maybe thousands, of open source > code users who are not developers and who know little more than how to type > "gmake". So, a product should not depend too heavily on them for that input. > Any feature that relies on the builder to enable it will not be much used. I agree - it will not be much used. But, it will be used by the people who care to investigate the code to find performance improvements. And, my changes will be used by Mozilla. So, will you accept a patch with the simplifying change that I made?
One option is to look up the malloc_usable_size at run time with dlopen and dlsym. We can do that in the InitializeArenas function in plarena.c. Is malloc_usable_size a function in jemalloc? Does Mozilla use jemalloc as a shared library?
(In reply to comment #22) > One option is to look up the malloc_usable_size at run time with > dlopen and dlsym. We can do that in the InitializeArenas function > in plarena.c. Is malloc_usable_size a function in jemalloc? Does > Mozilla use jemalloc as a shared library? This would create a more flexible solution, but presents a similar problem to the compile-time one. Which library contains the malloc_usable_size that we must use depends on whether jemalloc is activated (which MOZ_MEMORY tells us anyway). Some OSs put malloc and friends in libc, others in different libraries (libSystem.B.dylib). Mac OS X uses a different name for malloc_usable_size - it uses malloc_size. Overall, I think that this idea does not resolve enough of the uncertainty about platform differences to be worth its implementation complexity. What do you think about this: #ifndef MALLOC_USABLE_SIZE # ifdef MOZ_MEMORY # define MALLOC_USABLE_SIZE malloc_usable_size # endif #endif #ifdef MALLOC_USABLE_SIZE PR_BEGIN_EXTERN_C size_t MALLOC_USABLE_SIZE(void *); PR_END_EXTERN_C #endif
Attached patch v2 (obsolete) — Splinter Review
Wan-Teh, this one simplifies the preprocessor macro code, and improves documentation on 'FlexibleMalloc'.
Attachment #308307 - Attachment is obsolete: true
Attachment #312204 - Flags: review?(wtc)
Attachment #308307 - Flags: review?(wtc)
Attachment #312204 - Attachment is obsolete: true
Attachment #312207 - Flags: review?(wtc)
Attachment #312204 - Flags: review?(wtc)
Comment on attachment 312207 [details] [diff] [review] v3 (v2 had unnecessary whitespace change) Hi Robin, I still don't understand how we're going to pass the MALLOC_USABLE_SIZE and MOZ_MEMORY macros to this file. Also, NSPR can be used independent of Mozilla, so we try to avoid depending on Mozilla's macros such as MOZ_MEMORY. Can Firefox define MALLOC_USABLE_SIZE as malloc_usable_size for NSPR if MOZ_MEMORY is defined? Ignoring this issue, I think this patch looks good. Here are some nits. 1. Remove PR_BEGIN_EXTERN_C and PR_END_EXTERN_C because this is a C file. 2. In the block comment before FlexibleMalloc, use NULL instead of 0 (again, because this is a C file and NULL is used to represent a null pointer).
(In reply to comment #26) > I still don't understand how we're going to pass the > MALLOC_USABLE_SIZE and MOZ_MEMORY macros to this file. The macros are intended to be defined by the compilation environment (i.e. in the Makefiles of persons compiling NSPR). For example, I compiled NSPR as part of a Firefox build, which means that I issued make -C nsprpub XCFLAGS="-DMALLOC_USABLE_SIZE=malloc_size" from mozilla/$OBJDIR. > Also, NSPR can be used independent of Mozilla, so we > try to avoid depending on Mozilla's macros such as > MOZ_MEMORY. Can Firefox define MALLOC_USABLE_SIZE as > malloc_usable_size for NSPR if MOZ_MEMORY is defined? We don't depend on MOZ_MEMORY. If it is there, then it has meaning; if it is not, then things function as they used to. (That is, unless MALLOC_USABLE_SIZE is defined.) Are you saying that you don't want any mention of MOZ_MEMORY inside NSPR code, and that MALLOC_USABLE_SIZE should be defined in the compilation environment (Makefiles)? > Ignoring this issue, I think this patch looks good. > Here are some nits. > 1. Remove PR_BEGIN_EXTERN_C and PR_END_EXTERN_C because > this is a C file. True, but a C++ compiler might compile it. Many other C files in NSPR use 'extern "C"' via PR_BEGIN/END_EXTERN_C. Are you sure that you want those removed? > 2. In the block comment before FlexibleMalloc, use NULL > instead of 0 (again, because this is a C file and NULL > is used to represent a null pointer). Yes, this is a good idea.
Robin, PR_BEGIN_EXTERN_C is used exclusively in header files (.h). Header files may be included by either c or c++ source files (.c) and may be compiled with either c or c++ compilers. Your patch would add PR_BEGIN_EXTERN_C to a .c source file, which should not be compiled with any c++ compiler.
(In reply to comment #28) > PR_BEGIN_EXTERN_C is used exclusively in header files (.h). You're right. I made a mistake with my find/grep command. Only pr/src/misc/prdtoa.c uses 'extern "C"', and not via PR_BEGIN/END_EXTERN_C. I will immediately supply a new patch.
I would like to emphasize the importance of this small patch by reiterating the performance and memory footprint benefits it gives on Firefox. We might expect similar performance gains for non-Mozilla users of NSPR arena code. From a prior comment: I collected these by starting the browser, running Tdhtml, and immediately stopping the browser. An unmodified build has: numMallocs: 35390 numFrees: 35385 hwmRequested: 2142923 hwmReal: 2417936 The interesting number there is the hwmReal, the high water mark for the real memory usage. After using malloc_usable_size: numMallocs: 33063 numFrees: 33059 hwmRequested: 1916729 hwmReal: 2147408 The number of calls to malloc has dropped by 6.5%. The real memory usage has dropped by 11%. (!) These are substantial improvements in memory use.
Attachment #312207 - Attachment is obsolete: true
Attachment #313025 - Flags: review?(wtc)
Attachment #312207 - Flags: review?(wtc)
review ping
Comment on attachment 313025 [details] [diff] [review] v4 - no PR_BEGIN/END_EXTERN_C Hi Robin, Thanks for your work on this bug. Please address this issue I raised in comment 26: Can Firefox define MALLOC_USABLE_SIZE as malloc_usable_size for NSPR if MOZ_MEMORY is defined? That is, NSPR's configure script or source file should not need to deal with MOZ_MEMORY. I believe this means removing the following from this patch: +** MOZ_MEMORY indicates that the Mozilla memory subsystem, which supplies +** malloc_usable_size, is active. +*/ +#if !defined(MALLOC_USABLE_SIZE) && defined(MOZ_MEMORY) +#define MALLOC_USABLE_SIZE malloc_usable_size +#endif and replacing AC_DEFINE(MOZ_MEMORY) by AC_DEFINE_UNQUOTED(MALLOC_USABLE_SIZE,malloc_usable_size) in mozilla/nsprpub/configure.in. Would this work?
Attachment #313025 - Flags: review?(wtc) → review-
Robin, I just noticed that you already responded to that issue with a question in your comment 27: Are you saying that you don't want any mention of MOZ_MEMORY inside NSPR code, and that MALLOC_USABLE_SIZE should be defined in the compilation environment (Makefiles)? Yes, that's what I'm saying. What I meant by "depend on MOZ_MEMORY" is "check/test for MOZ_MEMORY".
(In reply to comment #33) > Yes, that's what I'm saying. What I meant by "depend on MOZ_MEMORY" > is "check/test for MOZ_MEMORY". I'll make appropriate modifications, and upload a patch, soon. Thanks, Wan-Teh.
Robin, could you also change mozilla/nsprpub/configure.in to define MALLOC_USABLE_SIZE? This is what I have in mind: - Add a new configure option --with-jemalloc - If OS is Linux, define MALLOC_USABLE_SIZE as malloc_usable_size - If OS is Darwin, define MALLOC_USABLE_SIZE as malloc_size - But if --with-jemalloc is specified, override the above and define MALLOC_USABLE_SIZE as malloc_usable_size
So, the idea is to pass '--with-jemalloc' to NSPR when the option of the same name was passed to mozilla/configure.in? Then, the logic for the MALLOC_USABLE_SIZE macro is all in nspr/configure.in, right?
Yes, that's the idea. This way you don't need to pass MALLOC_USABLE_SIZE into NSPR using XCFLAGS as you showed in comment 27. Let's use --enable-jemalloc instead because that's what mozilla/configure.in uses.
Wan-Teh, Adding a '--enable-jemalloc' option to NSPR's configure would suggest (to me) that the user wants NSPR to be compiled with the jemalloc malloc subsystem. But, that's not what you want, is it? You want '--enable-jemalloc' to indicate that NSPR will be linked with a library (the jemalloc code) which contains malloc_usable_size, right? In Mozilla, '--enable-jemalloc' tells mozilla to compile and use the jemalloc subsystem. Having subtly different meanings for that configure option is confusing, IMO. What do you think? The subtle difference is between the making of jemalloc, versus the mere presence of jemalloc at link time. I hope I'm being clear enough. Maybe we should instead intentionally use name different from '--enable-jemalloc'?
Robin, Yes, I want '--enable-jemalloc' to indicate that NSPR will be linked with a library (the jemalloc code) which contains malloc_usable_size I don't know how Mozilla compile and use jemalloc. My suggestion of adding a --enable-jemalloc option to NSPR's configure was purely based on my inspection of Mozilla's configure.in: 6211 dnl ======================================================== 6212 dnl = Enable jemalloc 6213 dnl ======================================================== 6214 MOZ_ARG_ENABLE_BOOL(jemalloc, 6215 [ --enable-jemalloc Replace memory allocator with jemalloc], 6216 MOZ_MEMORY=1, 6217 MOZ_MEMORY=) It implies that --enable-jemalloc and MOZ_MEMORY mean the same thing (to Mozilla). Since your patches use MOZ_MEMORY, I thought NSPR's configure could also use --enable-jemalloc. Is the subtle difference between --enable-jemalloc vs. --with-jemalloc? --with-jemalloc is my original suggestion. I believe Mozilla's configure passes its options to NSPR's configure, so we'd need to replace --enable-jemalloc with --with-jemalloc in _SUBDIR_CONFIG_ARGS in Mozilla's configure.
This bug seems to have stalled due solely to my lack of work on it. It has some compelling stats associated with the patch: see comment#10 and comment#11. More significantly, this change would benefit all NSPR users, not merely Firefox. Who can champion this cause?
This won't be needed once PLArenas are all power-of-two sized.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
This is not is not mutually-exclusive with bug 676457. In Gecko there are callers that calculate a non-power-of-two |size|, and probably many non-Gecko callers that do so too.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
> In Gecko there are > callers that calculate a non-power-of-two |size|, and probably many > non-Gecko callers that do so too. At this point all the callers in Gecko use a power-of-two size, so there's no benefit to be had for Firefox. I won't close the bug though because it's conceivable that someone in the world might still want to do this for non-Gecko consumers, even though it hasn't been touched in four years.

The bug assignee didn't login in Bugzilla in the last 7 months and this bug has priority 'P2'/severity 'major'.
:KaiE, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: moz → nobody
Flags: needinfo?(kaie)
Severity: major → S4
Flags: needinfo?(kaie)
Priority: P2 → P5
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: