Bug 580408 - (jemalloc3) import latest jemalloc changes into our source tree
Status: RESOLVED FIXED
Whiteboard: [MemShrink:P1]
Product: Core
Classification: Components
Component: Memory Allocator
Version: Other Branch
Platform: x86 All
Importance: -- normal with 9 votes
Target Milestone: mozilla16
Assigned To: Mike Hommey [:glandium]
Duplicates: 642822
Depends on: 736959 736963 737084 738176 751482 751511 762661 946878
Blocks: 470217 770612 586962 MatchStartupMem 762445 762446 762448 763920
Reported: 2010-07-20 15:15 PDT by Andreas Gal :gal
Modified: 2014-12-15 15:29 PST
CC List: 41 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
jemalloc testcase (patch to jemalloc tip) (2.08 KB, patch)
2011-08-26 08:25 PDT, Justin Lebar (not reading bugmail)
no flags
WIP (1.25 MB, patch)
2012-02-23 11:36 PST, Mike Hommey [:glandium]
no flags
WIP, based on upstream git 025d86118673f153b6ccd68e49054e58493b57f4 (1.06 MB, patch)
2012-03-13 05:57 PDT, Mike Hommey [:glandium]
no flags
Import jemalloc dev branch (650285d) (1.06 MB, patch)
2012-03-20 00:40 PDT, Mike Hommey [:glandium]
no flags
Build glue for jemalloc2 (11.08 KB, patch)
2012-03-20 00:50 PDT, Mike Hommey [:glandium]
no flags
Move Mozilla fork of jemalloc to memory/mozjemalloc (4.05 KB, patch)
2012-04-02 06:59 PDT, Mike Hommey [:glandium]
no flags
Import jemalloc dev branch (09a0769) (1.09 MB, patch)
2012-04-02 07:00 PDT, Mike Hommey [:glandium]
no flags
Build glue for jemalloc2 (10.81 KB, patch)
2012-04-02 07:06 PDT, Mike Hommey [:glandium]
no flags
Move Mozilla fork of jemalloc to memory/mozjemalloc (3.79 KB, patch)
2012-05-14 01:21 PDT, Mike Hommey [:glandium]
khuey: review+
Import jemalloc 3.0.0 (1.20 MB, patch)
2012-05-14 01:24 PDT, Mike Hommey [:glandium]
khuey: review+
gerv: review+
Glue for jemalloc 3.0.0 (13.14 KB, patch)
2012-05-14 01:35 PDT, Mike Hommey [:glandium]
khuey: review+

Description Andreas Gal :gal 2010-07-20 15:15:35 PDT
http://www.canonware.com/jemalloc/
Comment 1 Benjamin Smedberg [:bsmedberg] 2010-07-20 19:02:25 PDT
jasone has sent us the changes in the past
Comment 2 Paul Biggar 2010-08-12 07:20:26 PDT
jasone:

Some questions (I don't see docs on this, and I've trawled through lots of bugs on this looking for information):

- What version of jemalloc is in mozilla-central/memory/jemalloc?
- What version should I update it to?
- I'm also going to look at using jemalloc on Mac (bug 580404); will this make any difference to the previous 2 questions?

Anything else I need to know about this?

Thanks.
Comment 3 Jason Evans 2010-08-12 11:25:24 PDT
The version of jemalloc in mozilla-central is based on code from FreeBSD:

__FBSDID("$FreeBSD: head/lib/libc/stdlib/malloc.c 180599 2008-07-18 19:35:44Z jasone $");

Since then I have switched over to primarily developing jemalloc here:

http://www.canonware.com/jemalloc/

Merging a newer jemalloc is going to be quite a bit of work, because so much has changed, both in the Mozilla and stand-alone versions of jemalloc.
Comment 4 Paul Biggar 2010-09-07 09:14:14 PDT
I've been doing a lot of archaeology here, to figure out a decent plan for fixing this. It seems that the best "sync" time between the repos is at FreeBSD 181733 and Mozilla b884112e0922 (just after the memory reserve feature was pulled). At that time, the major differences between Mozilla jemalloc and FreeBSD jemalloc were:

- Mozilla half removed the DSS feature
- Mozilla added the DECOMMIT feature
- Mozilla added the PAGEFILE feature
- Mozilla added the VALIDATE feature
- Mozilla added Valgrind support
- Mozilla added the JEMALLOC_USES_MAP_ALIGN feature
- Mozilla #ifdefed off the FILL, UTRACE, XMALLOC and SYSV features
- Mozilla is riddled with platform-dependent #ifdefs

My target is to have jemalloc as a pure vendor dependency. That is, memory/jemalloc/jemalloc.{h,c} will be Mozilla-specific files which reach into memory/jemalloc/vendor/, which will be pulled directly from git master over at canonware.com.

To do this, I intend to merge Mozilla features back to canonware, and vice-versa. Rough plan at the moment is:

- Merge into Mozilla tip some small fixes from FreeBSD which were lost along the way.
- Merge into Mozilla tip the missing DSS feature
- Merge into canonware base (that is, the first revision in the canonware repo, from 2009) the added Mozilla features (DECOMMIT and PAGEFILE).
- Merge the small #ifdef features into Canonware
- From here it's a bit hazy:
  - move the early contents of the canonware repo to Mozilla.
  - move through the canonware repo, a revision at a time, keeping the features building.
  - somehow pull every platform-dependent #ifdef out of the Mozilla code.

At the end I intend to put a big sign on the memory/jemalloc directory instructing future changes to be merged upstream and brought in wholesale, barring civil war or an act of god.

I may also do the same for the FreeBSD repo as for the Mozilla repo, if that is welcome over at FreeBSD.

To simplify matters, are there any optional features that can be removed? You never added DECOMMIT or PAGEFILE support to the Canonware repo; are they useful? Could they just be removed?
Comment 5 Paul Biggar 2010-09-10 08:05:06 PDT
(In reply to comment #4)
> It seems that the best "sync" time between the repos is at FreeBSD
> 181733 and Mozilla b884112e0922 (just after the memory reserve feature was
> pulled). 

Correction: FreeBSD 180599.
Comment 6 Jason Evans 2010-09-12 13:03:37 PDT
I finished integrating OS X support into the stand-alone jemalloc repository (git://canonware.com/jemalloc.git) yesterday.  The VALIDATE feature was required for OS X, so I integrated it as well.

Comments on the various Mozilla-specific jemalloc features:

- DSS is of no use anywhere except for compatibility issues on FreeBSD, and it complicates chunk mapping machinery.  I ripped it out of Mozilla's code, since it was a maintenance burden, and hopelessly broken by bitrot anyway.

- The DECOMMIT feature is critical to Windows support.

- The PAGEFILE feature is unused, and can be removed.  It was part of an earlier strategy for dealing with out-of-memory conditions by preventing VM overcommit.

- The Valgrind support as it currently exists isn't ideal because there are no red zones.  It is possible (though certainly not easy) to add red zone support to jemalloc, but for the time being I would consider porting this feature to the stand-alone jemalloc a lower priority, since it is of limited usefulness anyway.

- The JEMALLOC_USES_MAP_ALIGN feature is not of particular importance.

- The stand-alone jemalloc already has configure-time options for FILL, UTRACE, XMALLOC, and SYSV.

As I see it, there are two primary challenges remaining before jemalloc can be a pure vendor dependency.  The first of these is that Windows support has to be integrated into stand-alone jemalloc.  The second is that the Mozilla build system needs to be modified.  These two issues are actually closely tied together, because it is quite painful to replace the system malloc on Windows.  Mozilla's current strategy relies on source code access for the crt, and I'm told that this is preventing the use of newer MS development environments.  I recently spent some time looking at how tcmalloc solves this problem (in the context of Google Chrome, as it turns out).  The general approach looks sound to me.  For more details, take a look at:

    https://groups.google.com/group/google-perftools/browse_thread/thread/41cd3710af85e57b

This, in combination with a perusal of the tcmalloc source code (http://code.google.com/p/google-perftools/), should serve as an excellent implementation guide.
Comment 7 Paul Biggar 2011-03-18 10:09:37 PDT
*** Bug 642822 has been marked as a duplicate of this bug. ***
Comment 8 Nicholas Nethercote [:njn] 2011-06-29 03:10:28 PDT
(In reply to comment #6)
> 
> - The Valgrind support as it currently exists isn't ideal because there are
> no red zones.  It is possible (though certainly not easy) to add red zone
> support to jemalloc, but for the time being I would consider porting this
> feature to the stand-alone jemalloc a lower priority, since it is of limited
> usefulness anyway.

I just WONTFIXed bug 503249, which was about better Valgrind+jemalloc integration.  Building with --disable-jemalloc is pretty much de rigueur for using Valgrind.  So I wouldn't lose any sleep over that.


> As I see it, there are two primary challenges remaining before jemalloc can
> be a pure vendor dependency.  The first of these is that Windows support has
> to be integrated into stand-alone jemalloc.  The second is that the Mozilla
> build system needs to be modified.  These two issues are actually closely
> tied together, because it is quite painful to replace the system malloc on
> Windows.  Mozilla's current strategy relies on source code access for the
> crt, and I'm told that this is preventing the use of newer MS development
> environments.

I don't understand these issues, but http://blog.kylehuey.com/post/7015378885/migrating-to-msvc-2010 says that it's now possible to build with MSVC 2010.
Comment 9 Justin Lebar (not reading bugmail) 2011-07-07 15:32:48 PDT
One way we could try to measure whether the new version would be helpful is to run Firefox for a while and record malloc()s and free()s.  Then replay the recorded session (from a toy executable) with Mozilla's jemalloc and Canonware's, comparing speed and RSS.
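A minimal sketch of the replay half of this idea, in C. It assumes a hypothetical in-memory trace format (`struct trace_event`, where small integer ids stand in for the pointers seen during recording) and a hypothetical `replay_trace` driver; recording the trace from Firefox is not shown. Building the same driver once against each allocator would let the two runs be timed and their RSS compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trace format: each event either allocates `size` bytes
   under the small integer `id`, or frees whatever `id` currently holds. */
enum trace_op { TRACE_MALLOC, TRACE_FREE };

struct trace_event {
    enum trace_op op;
    size_t id;    /* slot in the live-pointer table */
    size_t size;  /* only used by TRACE_MALLOC */
};

/* Replays `n` events against whichever malloc this binary is linked with.
   `slots` must have room for every id in the trace and start out NULL;
   the trace is assumed well-formed (no id is allocated twice without a free).
   Returns the number of non-zero-sized allocations that failed. */
static size_t replay_trace(const struct trace_event *events, size_t n,
                           void **slots, size_t nslots)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        const struct trace_event *ev = &events[i];
        assert(ev->id < nslots);
        if (ev->op == TRACE_MALLOC) {
            void *p = malloc(ev->size);
            if (p != NULL)
                memset(p, 0xA5, ev->size);  /* touch it, as the app would */
            else if (ev->size != 0)
                failures++;
            slots[ev->id] = p;
        } else {
            free(slots[ev->id]);            /* free(NULL) is a no-op */
            slots[ev->id] = NULL;
        }
    }
    return failures;
}
```

Replaying ids rather than raw addresses keeps the trace independent of where each allocator happens to place its blocks.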
Comment 10 Paul Biggar 2011-07-07 17:32:34 PDT
I'm not going to be able to work on this, but can still advise. I'm pretty sure there's going to be both a speed and a fragmentation benefit to switching to the latest jemalloc (that's not to say we shouldn't test, of course).

The strategy I was using was to pull patches roughly chronologically (alternating between freebsd, mozilla and canonware as the source, depending on which was the next logical commit), and merge, build, test, then continue. This seems like a sensible way to ensure we don't miss anything subtle, but it actually took a huge amount of time and I didn't get very far with it.

So I think a better step forward might just be to go hardcore. Delete the current jemalloc, import the latest one, and see if it can be made to roughly work. If it kinda works, then port the Mozilla changes like DECOMMIT and the recent OSX work to canonware.
Comment 11 Paul Biggar 2011-07-26 16:30:14 PDT
With a very small amount of effort, khuey and I have Firefox standing up with the upstream version of jemalloc, on Mac. See http://hg.mozilla.org/users/khuey_mozilla.com/jemalloc2/ for the code.

Currently, we're focussed on getting this working the whole way on Mac, and getting that committed. From our initial survey:


- standing it up on Windows will be easy, but it won't be worth much without decommit
- my recent changes to Mac integration have not been upstreamed
- apart from that, upstream changes for Mac will be pretty minimal
Comment 12 Justin Lebar (not reading bugmail) 2011-07-26 16:33:13 PDT
I presume the change of component to js-ctypes was a mistake?
Comment 13 Justin Lebar (not reading bugmail) 2011-08-23 14:32:41 PDT
During and after memshrink, pbiggar, khuey, and I discussed this bug.

I'm going to measure to see whether this update will reduce memory usage -- I found, e.g. in bug 636220 comment 17, that fragmentation in jemalloc can result in us wasting large amounts of memory (on the order of 1/3 of RSS).  I've read that the new jemalloc has much better fragmentation performance.

Paul also mentioned that he observed a 7% speedup on sunspider, between the old version of jemalloc in our tree and this version.  So even if the new version doesn't reduce memory usage, it still may be a large win.
Comment 14 Justin Lebar (not reading bugmail) 2011-08-24 08:53:19 PDT
It looks like memory is not being freed on my 10.6 machine.  In a release build, I opened a few google docs tabs, then closed them.  Numbers here are after opening all the tabs, and after closing all the tabs and minimizing memory usage:

after opening
  heap-allocated: 250mb
  resident:       370mb

after closing
  heap-allocated:  85mb
  resident:       350mb
Comment 15 Justin Lebar (not reading bugmail) 2011-08-26 06:47:18 PDT
We should test jemalloc on mac with a toy program to see if it frees up memory there.  If not (as seems likely), it's a bug in jemalloc.
Comment 16 Justin Lebar (not reading bugmail) 2011-08-26 08:25:30 PDT
Created attachment 556034
jemalloc testcase (patch to jemalloc tip)

This is a test program which allocates some memory, touches it, and then frees it.

On both Linux and Mac, jemalloc dutifully frees most of the memory I allocate, when I allocate big chunks (tested 512B, 1024B, 1M).

It seems that there's a constant amount of data left over after allocating N chunks, regardless of size.  That's worrying and worth looking into, but I don't think it explains the behavior I saw in comment 14.
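A rough sketch of the kind of measurement described above, assuming Linux (it reads `/proc/self/statm`, so it does not apply to the Mac measurements discussed later in this bug). The names `current_rss_bytes`, `alloc_touch_free` and `struct rss_sample` are hypothetical; this is not the attached testcase. Each block is touched one byte per page through a `volatile` pointer so the compiler cannot drop the writes and every page actually becomes resident.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size in bytes, read from /proc/self/statm (Linux only).
   Returns -1 if the file is unavailable or unparsable. */
static long current_rss_bytes(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    long total_pages, resident_pages;
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

struct rss_sample {
    long before;  /* RSS before allocating */
    long peak;    /* RSS once every block has been touched */
    long after;   /* RSS after everything was freed */
};

/* Allocates `count` blocks of `size` bytes, touches every page, frees them,
   and records RSS at each stage. Returns 0 on success, -1 on setup failure. */
static int alloc_touch_free(size_t count, size_t size, struct rss_sample *out)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL)
        return -1;

    out->before = current_rss_bytes();
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL)
            break;  /* remaining slots stay NULL from calloc */
        /* One write per page through a volatile pointer, so the stores
           cannot be optimized away and each page becomes resident. */
        for (size_t off = 0; off < size; off += (size_t)pagesz)
            ((volatile char *)blocks[i])[off] = 1;
    }
    out->peak = current_rss_bytes();

    for (size_t i = 0; i < count; i++)
        free(blocks[i]);  /* free(NULL) is a no-op */
    free(blocks);
    out->after = current_rss_bytes();
    return 0;
}
```

Comparing `after` with `before` across block sizes is one way to look for the constant leftover amount mentioned above.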
Comment 17 Justin Lebar (not reading bugmail) 2011-08-26 08:26:44 PDT
To build and run, apply the patch, then

  LINUX: $ make check && test/rss
  MAC:   $ make check && DYLD_FALLBACK_LIBRARY_PATH=lib test/rss
Comment 18 Justin Lebar (not reading bugmail) 2011-08-30 14:46:29 PDT
> It looks like memory is not being freed on my 10.6 machine.

I just spoke with pbiggar about this on IRC.  He's convinced me that MADV_FREE doesn't reduce reported RSS on Mac until memory pressure occurs.  He has a test program (bug 414946 comment 83) which can be used to show RSS decreasing upon memory pressure.  And more to the point, it can also be used in conjunction with the test I wrote to show that, upon memory pressure, my testcase frees about half of its remaining memory.

This does mean that it's a real pain to measure RSS on Mac...
Comment 19 Justin Lebar (not reading bugmail) 2011-08-30 15:04:52 PDT
> In a release build, I opened a few google docs tabs, then closed them.  Numbers here are 
> after opening all the tabs, and after closing all the tabs and minimizing memory usage:

Here's the same testcase (not directly comparable to the old testcase; different documents, and it looks like Google has been changing some things in docs lately) except that before taking each measurement, I ran a program which mallocs and touches 4G of memory.

Before opening tabs:
  heap-allocated:   49mb
  resident:         93mb

After opening tabs:
  heap-allocated:  196mb
  resident:        284mb

After closing tabs:
  heap-allocated:   65mb
  resident:        150mb

Of course, the problem with this methodology is that when I touch 4G of memory, Firefox itself might get swapped out, artificially reducing its RSS.  This doesn't appear to be happening here, or at least, swapping isn't responsible for the entire RSS decrease, because swap file usage increased only 50mb, but RSS decreased by 150mb.

So this is great; it shows that, at least on my system, mac jemalloc2 isn't totally broken.  Now we need to compare it to the old version of jemalloc.
Comment 20 Justin Lebar (not reading bugmail) 2011-08-30 15:20:38 PDT
Comparing with the old jemalloc is a project for another day (I'd also prefer to do it on Linux, where I have better memory introspection and don't have to do this "allocate a bunch of memory to reduce RSS" hack).

But just as a back-of-the-envelope calculation, in bug 636220 (on Linux, old jemalloc), there was a 70mb difference between initial resident and resident after closing tabs.  Here, the difference is 57mb.  That's about 20% less memory.  But this is a completely inaccurate comparison -- the operating systems and testcases are different, and we don't know how noisy the numbers are.
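For reference, the back-of-the-envelope figure can be reconstructed from the numbers in this bug: comment 19 gives 93mb resident before opening tabs and 150mb after closing them, and bug 636220 gave a 70mb difference with the old jemalloc.

```latex
\begin{aligned}
\Delta_{\text{old}} &= 70\ \text{MB} \\
\Delta_{\text{new}} &= 150\ \text{MB} - 93\ \text{MB} = 57\ \text{MB} \\
\frac{\Delta_{\text{old}} - \Delta_{\text{new}}}{\Delta_{\text{old}}} &= \frac{70 - 57}{70} = \frac{13}{70} \approx 0.19
\end{aligned}
```

That is roughly 19%, which the comment rounds to about 20%; the caveats about different operating systems and testcases still apply.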
Comment 21 Justin Lebar (not reading bugmail) 2011-10-14 20:00:43 PDT
Browsing through the source code, it looks like the new jemalloc has a lock-free allocation and deallocation path.

If this works, this would be *huge*, since it should let us get rid of most (all?) of our custom allocators.  The only good reason for managing our own js gc chunks, for example, is that malloc is too slow.  This lockless path looks very fast.

See bug 670596, bug 166701 for examples of the pain we go through due to these custom allocators.
Comment 22 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-10-15 07:04:28 PDT
Do we have data on the locking overhead costs?
Comment 23 Justin Lebar (not reading bugmail) 2011-10-15 08:46:33 PDT
All I have is comment 13: "Paul also mentioned that he observed a 7% speedup on sunspider, between the old version of jemalloc in our tree and this version."
Comment 24 Mike Hommey [:glandium] 2012-01-20 11:53:37 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #22)
> Do we have data on the locking overhead costs?

When I did some profiling a year ago, I saw 12% of startup time spent running the pthread_mutex functions (actual instructions, not even lock contention). I also saw that it spends a lot of time trying to find a pointer where in the end it just gives out the last one that was freed (less true after startup, but during startup, it's pretty massive). I don't know if the new jemalloc has a fastpath for that, but if it does have a lock-free allocation path, that's going to be a massive win across the board.
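To illustrate why a lock-free fast path tends to hand back the most recently freed pointer, here is a simplified per-thread LIFO cache for a single size class. This is a sketch under stated assumptions, not jemalloc's actual thread cache: `CACHE_OBJ_SIZE`, `CACHE_SLOTS` and the `cached_alloc`/`cached_free`/`cache_flush` helpers are hypothetical. Because each thread owns its cache, neither fast path takes a lock; only cache misses and overflows reach the shared allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_OBJ_SIZE 64  /* hypothetical: the one size class cached */
#define CACHE_SLOTS    32  /* hypothetical: max objects parked per thread */

struct thread_cache {
    void  *slots[CACHE_SLOTS];  /* most recently freed object is on top */
    size_t count;
};

/* One cache per thread, so the fast paths below need no lock. */
static _Thread_local struct thread_cache tcache;

static void *cached_alloc(void)
{
    if (tcache.count > 0)
        return tcache.slots[--tcache.count];  /* fast path: last freed */
    return malloc(CACHE_OBJ_SIZE);            /* slow path: shared allocator */
}

static void cached_free(void *p)
{
    assert(tcache.count <= CACHE_SLOTS);
    if (p == NULL)
        return;
    if (tcache.count < CACHE_SLOTS) {
        tcache.slots[tcache.count++] = p;     /* park it for reuse */
        return;
    }
    free(p);                                  /* cache full: give it back */
}

/* Returns every parked object to the shared allocator. */
static void cache_flush(void)
{
    while (tcache.count > 0)
        free(tcache.slots[--tcache.count]);
}
```

The trade-off is that objects parked in a thread's cache stay allocated until they are reused or flushed, so this kind of caching can raise memory usage even while it removes locking overhead.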
Comment 25 Mike Hommey [:glandium] 2012-01-20 12:02:55 PST
If no one picks up this bug in the next couple of weeks, I probably will.
Comment 26 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-01-20 12:37:13 PST
Are you planning to port the Windows stuff, or just do this for Linux?
Comment 27 Mike Hommey [:glandium] 2012-01-20 14:05:03 PST
I'll look at all platforms.
Comment 28 Justin Lebar (not reading bugmail) 2012-01-20 14:12:44 PST
While I'm thinking about it: One piece of work is decommit support, which we use for Windows.

On Linux/Mac, we madvise pages we're not using.  When we go back to use a page, we just need to touch it.  On Windows, we explicitly decommit these pages, then recommit them when we need them again.
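A minimal sketch of the two strategies described above, behind one interface. The helper names (`pages_map`, `pages_purge`, `pages_reuse`) are hypothetical, and the choice of `MADV_DONTNEED` on Linux and `MADV_FREE` on Mac is an assumption made for illustration (comment 18 is the source for Mac using `MADV_FREE`). On POSIX systems reuse needs no call because the next write simply faults a page back in; on Windows the decommitted range has to be recommitted first.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for MAP_ANONYMOUS and madvise on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Maps `len` bytes of fresh read/write memory, or returns NULL. */
static void *pages_map(size_t len)
{
    assert(len > 0);
#ifdef _WIN32
    return VirtualAlloc(NULL, len, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Gives the physical pages behind a page-aligned range back to the OS.
   Returns 0 on success. */
static int pages_purge(void *addr, size_t len)
{
#if defined(_WIN32)
    /* Windows: decommit; the range stays reserved but cannot be touched. */
    return VirtualFree(addr, len, MEM_DECOMMIT) ? 0 : -1;
#elif defined(__APPLE__)
    /* Mac: lazy release; the kernel reclaims the pages when it needs them. */
    return madvise(addr, len, MADV_FREE);
#else
    /* Linux: dropped now; private anonymous pages read back as zeros. */
    return madvise(addr, len, MADV_DONTNEED);
#endif
}

/* Makes a previously purged range usable again. Returns 0 on success. */
static int pages_reuse(void *addr, size_t len)
{
#ifdef _WIN32
    /* Decommitted pages must be recommitted before they are touched. */
    return VirtualAlloc(addr, len, MEM_COMMIT, PAGE_READWRITE) ? 0 : -1;
#else
    /* Nothing to do: the next write simply faults a page back in. */
    (void)addr;
    (void)len;
    return 0;
#endif
}
```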

Exactly why decommit was used, rather than VirtualAlloc(MEM_RESET), was a mystery until I got this e-mail from jasone:

> At a high level, jemalloc does decommit for Windows because Windows doesn't over-commit virtual 
> memory (unlike most Unix OSes).  That said, I think we may have avoided MEM_RESET simply because 
> we couldn't get accurate memory usage statistics with it.  At the time we were testing on XP and 
> Vista, and they had different virtual memory reporting semantics.  It's possible that MEM_RESET 
> would have been a reasonable choice, but we may not have realized that until we already had the 
> decommit code working.

I'm not sure how the over-commit issue affects us, but note that we have the same measurement problem on Mac, with MADV_FREE.  MADV_FREE'd pages shouldn't count against our measured RSS, but they do.  The current hack in our jemalloc is that, before reading RSS, we explicitly decommit all MADV_FREE'd pages.  We'll want this functionality added to jemalloc2, and we could do something similar on Windows.
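One way the double-purge hack described above could be structured: remember every range that was only soft-purged, and force all of them out just before RSS is read. This is a sketch, not the code in our jemalloc; the fixed-size registry and the names `soft_purge` and `hard_purge_all` are hypothetical, and using `MADV_DONTNEED` as the forcing primitive is a Linux-flavoured assumption (which call really forces the release differs per platform). A real allocator must also drop a range from the registry as soon as it is reused, otherwise the hard purge would discard live data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for MAP_ANONYMOUS, MADV_FREE and madvise on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifdef MADV_FREE
#define SOFT_PURGE_ADVICE MADV_FREE
#else
#define SOFT_PURGE_ADVICE MADV_DONTNEED  /* headers without MADV_FREE */
#endif

#define MAX_SOFT_PURGED 64  /* hypothetical registry size */

struct region {
    void  *addr;
    size_t len;
};

/* Ranges that were only soft-purged and may still count towards RSS. */
static struct region soft_purged[MAX_SOFT_PURGED];
static size_t nsoft;

/* Cheap purge: the kernel may reclaim these pages lazily, so remember the
   range for a later hard purge. Returns 0 on success. */
static int soft_purge(void *addr, size_t len)
{
    assert(len > 0);
    if (nsoft == MAX_SOFT_PURGED)  /* registry full: purge hard right away */
        return madvise(addr, len, MADV_DONTNEED);
    /* Fall back to MADV_DONTNEED on kernels that reject MADV_FREE. */
    if (madvise(addr, len, SOFT_PURGE_ADVICE) != 0 &&
        madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    soft_purged[nsoft].addr = addr;
    soft_purged[nsoft].len = len;
    nsoft++;
    return 0;
}

/* Called right before RSS is measured: forces every remembered range out
   now. Returns the number of bytes purged. */
static size_t hard_purge_all(void)
{
    size_t bytes = 0;
    while (nsoft > 0) {
        struct region r = soft_purged[--nsoft];
        if (madvise(r.addr, r.len, MADV_DONTNEED) == 0)
            bytes += r.len;
    }
    return bytes;
}
```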
Comment 29 Justin Lebar (not reading bugmail) 2012-01-20 14:13:38 PST
Mike, we've done a fair bit of hacking on jemalloc recently, so be sure to check out the hg log.
Comment 30 Mike Hommey [:glandium] 2012-02-23 11:36:24 PST
Created attachment 600110
WIP

This imports jemalloc 2.2.5, a set of patches I sent upstream, a few more for build integration, an ugly but working version of malloc_usable_size_in_advance, and minimalistic build glue. Only works on linux and breaks about:memory. Final patch will put it under memory/jemalloc instead of memory/jemalloc2. memory/jemalloc is kept for the moment for jemalloc.h.

From there, we should be able to test memory usage with multiple arenas vs. single arena (which can be triggered by adding --with-one-arena to the relevant ac_configure_args variable in configure.in)
Comment 31 Nicholas Nethercote [:njn] 2012-02-23 14:25:50 PST
> an ugly but working version of malloc_usable_size_in_advance

I realize that malloc_usable_size_in_advance is a pain, and you've provided an obvious and simple implementation, but I think that version will be unacceptably slow.  Every single SQLite allocation uses that function before calling malloc.  It's pretty stupid behaviour from SQLite, but we're stuck with it :(
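Because SQLite asks for the rounded-up size before every allocation, the query itself needs to be cheap. An implementation that, for example, allocates a block just to ask for its usable size and then frees it roughly doubles the allocator work for each SQLite allocation. If the size classes are known, the answer can be computed with a little arithmetic instead. The sketch below assumes a hypothetical size-class table; it is not jemalloc's real one, and `usable_size_in_advance` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size classes, for illustration only (NOT jemalloc's table):
 *   1..512 bytes    -> next multiple of 16
 *   513..4096 bytes -> next power of two (1024, 2048, 4096)
 *   larger          -> next multiple of 4096
 * Returns the class size, or 0 if rounding up would overflow. */
static size_t usable_size_in_advance(size_t size)
{
    size_t cls;
    if (size == 0)
        size = 1;                      /* treat 0 bytes like 1 byte */
    if (size <= 512) {
        cls = (size + 15) & ~(size_t)15;
    } else if (size <= 4096) {
        cls = 1024;
        while (cls < size)
            cls <<= 1;                 /* at most two doublings */
    } else if (size <= SIZE_MAX - 4095) {
        cls = (size + 4095) & ~(size_t)4095;
    } else {
        return 0;                      /* no class can hold this size */
    }
    assert(cls >= size);
    return cls;
}
```

No allocation happens at all, so the cost per query is a few instructions regardless of how the allocator is locked.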
Comment 32 Mike Hommey [:glandium] 2012-02-23 14:45:14 PST
(In reply to Nicholas Nethercote [:njn] from comment #31)
> > an ugly but working version of malloc_usable_size_in_advance
> 
> I realize that malloc_usable_size_in_advance is a pain, and you've provided
> an obvious and simple implementation, but I think that version will be
> unacceptably slow.  Every single SQLite allocation uses that function before
> calling malloc.  It's pretty stupid behaviour from SQLite, but we're stuck
> with it :(

It's only this ugly implementation because I wanted something that works. I had a version using jemalloc internals but it was utterly failing, and my quick attempts at fixing it led to something that kind of works, but doesn't pass TestJemalloc.
Comment 33 Mike Hommey [:glandium] 2012-03-13 05:57:44 PDT
Created attachment 605364
WIP, based on upstream git 025d86118673f153b6ccd68e49054e58493b57f4

This is roughly the same as the previous WIP, but it uses an unpatched jemalloc, from current dev branch on upstream git.
Comment 34 Justin Lebar (not reading bugmail) 2012-03-15 08:27:22 PDT
Comment on attachment 605364
WIP, based on upstream git 025d86118673f153b6ccd68e49054e58493b57f4

Try push: https://tbpl.mozilla.org/?tree=Try&rev=d2db12fea478
Comment 35 Justin Lebar (not reading bugmail) 2012-03-15 08:31:12 PDT
John, if this try push builds successfully, could you run it through the areweslimyet tests?  I'm interested in the RSS numbers, particularly end memory resident settled.
Comment 36 Justin Lebar (not reading bugmail) 2012-03-15 08:39:51 PDT
Hm, I bet AWSY is linux64.  https://tbpl.mozilla.org/?tree=Try&rev=6c356cfa0c87
Comment 37 Andrew McCreight [:mccr8] 2012-03-15 08:52:24 PDT
Yeah, it is just run on a Linux box under somebody's desk.
Comment 38 Mike Hommey [:glandium] 2012-03-15 09:12:32 PDT
If it's possible, it would be interesting to run these builds "normally" and with the JE_MALLOC_CONF environment variable set to "narenas:1".
Comment 39 Justin Lebar (not reading bugmail) 2012-03-15 11:13:21 PDT
These try builds appear to crash on startup.  :)
Comment 40 Mike Hommey [:glandium] 2012-03-19 10:44:41 PDT
At this point, I have a working jemalloc (dev branch), on linux and android, with only a one-liner patch to jemalloc code. Except I get a crash at shutdown:

Program received signal SIGSEGV, Segmentation fault.
arena_salloc (ptr=<optimized out>) at /home/mh/mozilla-central/memory/jemalloc2/src/arena.c:1438
1438			size_t binind = arena_bin_index(chunk->arena, run->bin);
(gdb) bt
#0  arena_salloc (ptr=<optimized out>) at /home/mh/mozilla-central/memory/jemalloc2/src/arena.c:1438
#1  0x00000000004175e4 in isalloc (ptr=0x645050) at include/jemalloc/internal/jemalloc_internal.h:692
#2  free (ptr=0x645050) at /home/mh/mozilla-central/memory/jemalloc2/src/jemalloc.c:1189
#3  0x00007ffff7deda89 in _dl_deallocate_tls () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7bc792d in __free_stacks (limit=41943040) at allocatestack.c:278
#5  0x00007ffff7bc7a39 in queue_stack (stack=<optimized out>) at allocatestack.c:306
#6  __deallocate_stack (pd=0x400000) at allocatestack.c:758
#7  0x00007ffff7bc8e3d in pthread_join (threadid=140737052145408, thread_return=0x7fffffffb838) at pthread_join.c:110
#8  0x00007ffff6c67b56 in PR_JoinThread (thred=0x7ffff6058400) at /home/mh/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:560
#9  0x00007ffff3cd2ca5 in nsThread::Shutdown (this=0x7ffff6128cc0) at /home/mh/mozilla-central/xpcom/threads/nsThread.cpp:503
#10 0x00007ffff3cdc4ad in nsCycleCollector_shutdownThreads () at /home/mh/mozilla-central/xpcom/base/nsCycleCollector.cpp:4094
#11 0x00007ffff3cabe18 in mozilla::ShutdownXPCOM (servMgr=0x7ffff6074508)
    at /home/mh/mozilla-central/xpcom/build/nsXPComInit.cpp:614
#12 0x00007ffff32ed6ad in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0x7fffffffbfd0, __in_chrg=<optimized out>)
    at /home/mh/mozilla-central/toolkit/xre/nsAppRunner.cpp:1124
#13 0x00007ffff32f2d21 in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>)
    at /home/mh/mozilla-central/toolkit/xre/nsAppRunner.cpp:3731
#14 0x0000000000402442 in do_main (argv=0x7fffffffe3e8, argc=2) at /home/mh/mozilla-central/browser/app/nsBrowserApp.cpp:190
#15 main (argc=<optimized out>, argv=<optimized out>) at /home/mh/mozilla-central/browser/app/nsBrowserApp.cpp:277
Comment 41 Mike Hommey [:glandium] 2012-03-19 10:46:13 PDT
Corresponding try build:
https://tbpl.mozilla.org/?tree=Try&rev=d63b01c161cb
Comment 42 Mike Hommey [:glandium] 2012-03-20 00:40:29 PDT
Created attachment 607474
Import jemalloc dev branch (650285d)
Comment 43 Mike Hommey [:glandium] 2012-03-20 00:50:14 PDT
Created attachment 607475
Build glue for jemalloc2
Comment 44 Mike Hommey [:glandium] 2012-03-20 00:51:07 PDT
(In reply to Mike Hommey [:glandium] from comment #42)
> Created attachment 607474
> Import jemalloc dev branch (650285d)

(In reply to Mike Hommey [:glandium] from comment #43)
> Created attachment 607475
> Build glue for jemalloc2

This is the current status, working on Android and Linux. Try build:
https://tbpl.mozilla.org/?tree=Try&rev=953758f86a36
Comment 45 Mike Hommey [:glandium] 2012-03-20 00:52:01 PDT
(In reply to Mike Hommey [:glandium] from comment #44)
> This is the current status, working on Android and Linux. Try build:
> https://tbpl.mozilla.org/?tree=Try&rev=953758f86a36

(This requires patches from bug 736959, bug 736963 and bug 737084)
Comment 46 Mike Hommey [:glandium] 2012-03-20 12:16:45 PDT
Better colors on this try build:
https://tbpl.mozilla.org/?tree=Try&rev=ae26377b6f37

I'll attach corresponding patches tomorrow. Could these be run on AWSY? (see comment 35 and comment 38)
Comment 47 John Schoenick [:johns] 2012-03-21 12:00:07 PDT
I'm running this on AWSY now, appears to be running without issues/crashing - I'll update when it's finished
Comment 48 John Schoenick [:johns] 2012-03-21 14:00:18 PDT
This tested successfully, although some of the memory reporters appear to be returning bogus values. You can see the results on albus (if you're on the network) or areweslimyet.com (if you have the password):

http://albus.mv.mozilla.com:8000/?series=jemalloc2
https://areweslimyet.com/?series=jemalloc

JSON of the memory report:
http://albus.mv.mozilla.com:8000/data/ae26377b6f37ee33921c291c03bd5a719a05b489.json.gz
https://areweslimyet.com/data/ae26377b6f37ee33921c291c03bd5a719a05b489.json.gz
Comment 49 John Schoenick [:johns] 2012-03-21 14:00:57 PDT
That second link should be

https://areweslimyet.com/?series=jemalloc2
Comment 50 Justin Lebar (not reading bugmail) 2012-03-21 14:08:04 PDT
The two most important values here are the purple and light green dots, representing our memory usage after GC before and after closing the benchmark's tabs.

                                   jemalloc1  jemalloc2
RSS: After TP5 [+30s, forced GC]:     ~315MB      343MB
RSS: After TP5, tabs closed [+30s]:   ~145MB      224MB

So neither of these is an improvement.  :-/

John, would you mind running once more, with the JE_MALLOC_CONF environment variable set to "narenas:1"?
Comment 52 Justin Lebar (not reading bugmail) 2012-03-21 19:33:52 PDT
A clear improvement with narenas:1, but still above jemalloc1 fragmentation levels (second line).
 
                                    jemalloc1  jemalloc2  narenas:1
 RSS: After TP5 [+30s, forced GC]:     ~315MB      343MB      310MB
 RSS: After TP5, tabs closed [+30s]:   ~145MB      224MB      190MB

I'm pretty surprised by this, tbh.  AIUI, a lot of work has gone into reducing fragmentation since jemalloc1.  But perhaps our jemalloc2's constants aren't tuned as well as our jemalloc1's constants.  Or maybe we're measuring incorrectly.
Comment 53 Jason Evans 2012-03-21 20:31:13 PDT
Thread caching is possibly to blame.  JE_MALLOC_CONF="narenas:1,tcache:false" will turn it off (in addition to using one arena).  The only other relevant way in which configuration differs is lg_dirty_mult.  The version of jemalloc currently in Firefox uses a hard limit on the number of dirty unused pages that is allowed to accumulate, whereas newer versions scale the limit relative to active pages.
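The difference between the two purge policies can be sketched as two predicates. The function names are hypothetical and the thresholds are simplifications of the behaviour described above: the old code caps the absolute number of dirty unused pages, while the newer code allows dirty pages up to `nactive >> lg_dirty_mult`, i.e. a fixed fraction of the active pages.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Old-style policy: purge once dirty unused pages exceed a fixed cap. */
static bool should_purge_fixed(size_t ndirty, size_t dirty_max)
{
    return ndirty > dirty_max;
}

/* Newer-style policy: the allowance scales with the active pages.
   lg_dirty_mult is the base-2 log of the active:dirty ratio, so with
   lg_dirty_mult = 3 up to one dirty page per 8 active pages is tolerated. */
static bool should_purge_scaled(size_t ndirty, size_t nactive,
                                unsigned lg_dirty_mult)
{
    assert(lg_dirty_mult < sizeof(size_t) * CHAR_BIT);  /* avoid UB shift */
    return ndirty > (nactive >> lg_dirty_mult);
}
```

With a large heap, the scaled limit lets many more dirty pages linger than a small fixed cap would, which is one plausible contributor to the higher RSS numbers in comment 52.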
Comment 54 Justin Lebar (not reading bugmail) 2012-03-21 20:45:05 PDT
Once we get a build up and running and hooked into about:memory, this should be pretty easy to troubleshoot.
Comment 55 Mike Hommey [:glandium] 2012-03-21 23:49:50 PDT
(In reply to Justin Lebar [:jlebar] from comment #52)
> A clear improvement with narenas:1, but still above jemalloc1 fragmentation
> levels (second line).
>  
>                                     jemalloc1  jemalloc2  narenas:1
>  RSS: After TP5 [+30s, forced GC]:     ~315MB      343MB      310MB

This pretty much confirms my intuition about narenas increasing memory consumption.
Comment 56 John Schoenick [:johns] 2012-03-21 23:57:25 PDT
I did another one with "narenas:1,tcache:false", you need to use the 'nocondense' option to see it since it merges the too-close-together points otherwise (and the only-three-datapoints precludes zooming):

https://areweslimyet.com/?series=jemalloc2&nocondense

These are only a few commands to run on my end, so let me know if you want any others tested
Comment 57 Mike Hommey [:glandium] 2012-03-22 01:11:25 PDT
(In reply to Justin Lebar [:jlebar] from comment #52)
> A clear improvement with narenas:1, but still above jemalloc1 fragmentation
> levels (second line).
 
                                    jemalloc1  jemalloc2  narenas:1   +tcache:false
 RSS: After TP5 [+30s, forced GC]:     ~315MB      343MB      310MB      301MB
 RSS: After TP5, tabs closed [+30s]:   ~145MB      224MB      190MB      166MB
Comment 58 Mike Hommey [:glandium] 2012-04-02 06:59:59 PDT
Created attachment 611442
Move Mozilla fork of jemalloc to memory/mozjemalloc
Comment 59 Mike Hommey [:glandium] 2012-04-02 07:00:31 PDT
Created attachment 611443
Import jemalloc dev branch (09a0769)
Comment 60 Mike Hommey [:glandium] 2012-04-02 07:06:22 PDT
Created attachment 611444
Build glue for jemalloc2

This queue keeps current jemalloc under memory/mozjemalloc for the unsupported platforms. Note you need to build with MOZ_JEMALLOC set to 1 in order to enable jemalloc instead of mozjemalloc.
Comment 61 :Ms2ger 2012-04-30 07:05:10 PDT
Comment on attachment 611442
Move Mozilla fork of jemalloc to memory/mozjemalloc

Review of attachment 611442:
-----------------------------------------------------------------

::: toolkit/content/license.html
@@ +1686,5 @@
>  
>      <h1><a id="jemalloc"></a>jemalloc License</h1>
>  
>      <p>This license applies to files in the directory
> +    <span class="path">memory/mozjemalloc/</span>.

I assume this will need to mention the new code as well
Comment 62 Mike Hommey [:glandium] 2012-05-10 03:50:16 PDT
John, could you do another AWSY run with the builds from https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-dc2e9ae667af/ (linux and windows) and https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-ef619df004b2/ (mac) ?

Thanks.
Comment 63 Justin Lebar (not reading bugmail) 2012-05-10 08:18:06 PDT
(In reply to Mike Hommey [:glandium] from comment #62)
> John, could you do another AWSY run with the builds from
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-
> dc2e9ae667af/ (linux and windows) and
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-
> ef619df004b2/ (mac) ?
> 
> Thanks.

AWSY is Linux64 only.  Which is for the best, since without double-purge or "hard" decommit, the Win/Mac numbers should be high.

Do you need any special flags (narenas=1, disable tcache)?
Comment 64 Mike Hommey [:glandium] 2012-05-10 08:45:41 PDT
These builds default to the flags I'm interested in.
Comment 65 John Schoenick [:johns] 2012-05-10 14:36:08 PDT
I ran a test on build dc2e9ae667af; you can see the results here, along with the most recent week of tests for comparison:

https://areweslimyet.com/?series=jemalloc&nocondense

(Some recent builds apparently regressed memory usage, so the graph is somewhat noisy; the new build is the new test on the right.)

As Justin noted, AWSY can only test Linux builds at the moment, so I wasn't able to test the Windows/Mac builds.
Comment 66 Justin Lebar (not reading bugmail) 2012-05-10 14:53:17 PDT
The try build is atop 052109db69ab, which is from May 7. The noise from those regressions didn't start until May 10. So I read this build's 200 MB purple line as a likely regression from May 7's 150 MB level.
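Taking the two graph readings quoted above at face value (both are approximate values read off the AWSY plot), the suspected regression works out to roughly:

```latex
\Delta_{\mathrm{RSS}} \approx 200\,\mathrm{MB} - 150\,\mathrm{MB} = 50\,\mathrm{MB},
\qquad
\frac{50\,\mathrm{MB}}{150\,\mathrm{MB}} \approx 33\%
```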
Comment 67 Mike Hommey [:glandium] 2012-05-14 01:21:38 PDT
Created attachment 623604 [details] [diff] [review]
Move Mozilla fork of jemalloc to memory/mozjemalloc
Comment 68 Mike Hommey [:glandium] 2012-05-14 01:24:17 PDT
Created attachment 623605 [details] [diff] [review]
Import jemalloc 3.0.0

Everything under memory/jemalloc/src is plain jemalloc 3.0.0 release, gotten with the memory/jemalloc/update.sh script.
The license.html changes were discussed with gerv on IRC.
Comment 69 Mike Hommey [:glandium] 2012-05-14 01:35:25 PDT
Created attachment 623607 [details] [diff] [review]
Glue for jemalloc 3.0.0

A few notes:
- This doesn't enable jemalloc 3 by default. There are still issues that need to be solved before we can enable it (double purge for OS X and Windows, an RSS usage regression, missing info for about:memory). However, landing this in its current state will allow more testing and tweaking in separate bugs.
- valloc and memalign are not used from mozalloc on Windows since bug 738176.
- -ENTRY:DllMain was never added to LDFLAGS in mozglue/build because it was set before including rules.mk. It's not a problem, because DllMain is already the default entry.
Comment 70 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-05-21 10:15:39 PDT
Comment on attachment 623605 [details] [diff] [review]
Import jemalloc 3.0.0

Review of attachment 623605 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine.  Gerv, could you sign off on the license bits?
Comment 71 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-06-06 11:55:54 PDT
Comment on attachment 623607 [details] [diff] [review]
Glue for jemalloc 3.0.0

Review of attachment 623607 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +9029,5 @@
> +      fi
> +    done
> +    ac_configure_args="$ac_configure_args --with-mangling=$MANGLED"
> +  fi
> +  unset CONFIG_FILES

What is this for?

@@ +9043,5 @@
> +  cache_file=$_objdir/memory/jemalloc/src/config.cache
> +  AC_OUTPUT_SUBDIRS(memory/jemalloc/src)
> +  cache_file="$_save_cache_file"
> +  ac_configure_args="$_SUBDIR_CONFIG_ARGS"
> +fi

Should we assert that at most one of MOZ_MEMORY is defined if MOZ_JEMALLOC is?

::: memory/jemalloc/Makefile.in
@@ +21,5 @@
> +
> +CSRCS := $(notdir $(wildcard $(srcdir)/src/src/*.c))
> +ifneq ($(OS_TARGET),Darwin)
> +CSRCS := $(filter-out zone.c,$(CSRCS))
> +endif

Yuck.  Can we just list the source files instead?
Comment 72 Mike Hommey [:glandium] 2012-06-06 12:52:24 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #71)
> Comment on attachment 623607 [details] [diff] [review]
> Glue for jemalloc 3.0.0
> 
> Review of attachment 623607 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: configure.in
> @@ +9029,5 @@
> > +      fi
> > +    done
> > +    ac_configure_args="$ac_configure_args --with-mangling=$MANGLED"
> > +  fi
> > +  unset CONFIG_FILES
> 
> What is this for?

It's modified and exported by the MOZ_TREE_FREETYPE codepath (used on android builds). This alters the files sub-configure generates.

@@ +9043,5 @@
> > +  cache_file=$_objdir/memory/jemalloc/src/config.cache
> > +  AC_OUTPUT_SUBDIRS(memory/jemalloc/src)
> > +  cache_file="$_save_cache_file"
> > +  ac_configure_args="$_SUBDIR_CONFIG_ARGS"
> > +fi
> 
> Should we assert that at most one of MOZ_MEMORY is defined if MOZ_JEMALLOC
> is?

How about making the jemalloc sub-configure run only if both are set. Would that work for you?

> ::: memory/jemalloc/Makefile.in
> @@ +21,5 @@
> > +
> > +CSRCS := $(notdir $(wildcard $(srcdir)/src/src/*.c))
> > +ifneq ($(OS_TARGET),Darwin)
> > +CSRCS := $(filter-out zone.c,$(CSRCS))
> > +endif
> 
> Yuck.  Can we just list the source files instead?

I'd rather not have to change it when we import newer versions. It's pretty much guaranteed that zone.c will remain the only exception for a while.
Comment 73 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-06-06 14:02:23 PDT
(In reply to Mike Hommey [:glandium] from comment #72)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #71)
> > Comment on attachment 623607 [details] [diff] [review]
> > Glue for jemalloc 3.0.0
> > 
> > Review of attachment 623607 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: configure.in
> > @@ +9029,5 @@
> > > +      fi
> > > +    done
> > > +    ac_configure_args="$ac_configure_args --with-mangling=$MANGLED"
> > > +  fi
> > > +  unset CONFIG_FILES
> > 
> > What is this for?
> 
> It's modified and exported by the MOZ_TREE_FREETYPE codepath (used on
> android builds). This alters the files sub-configure generates.
> 
> @@ +9043,5 @@
> > > +  cache_file=$_objdir/memory/jemalloc/src/config.cache
> > > +  AC_OUTPUT_SUBDIRS(memory/jemalloc/src)
> > > +  cache_file="$_save_cache_file"
> > > +  ac_configure_args="$_SUBDIR_CONFIG_ARGS"
> > > +fi
> > 
> > Should we assert that at most one of MOZ_MEMORY is defined if MOZ_JEMALLOC
> > is?
> 
> How about making the jemalloc sub-configure run only if both are set. Would
> that work for you?

Yes.

> > ::: memory/jemalloc/Makefile.in
> > @@ +21,5 @@
> > > +
> > > +CSRCS := $(notdir $(wildcard $(srcdir)/src/src/*.c))
> > > +ifneq ($(OS_TARGET),Darwin)
> > > +CSRCS := $(filter-out zone.c,$(CSRCS))
> > > +endif
> > 
> > Yuck.  Can we just list the source files instead?
> 
> I'd rather not have to change it when we import newer versions. It's pretty
> much guaranteed that zone.c will remain the only exception for a while.

Ok, I'll go along with it, even though I don't like it.
