Last Comment Bug 414946 - Use jemalloc on mac
: Use jemalloc on mac
Status: RESOLVED FIXED
[MemShrink:P1][qa-]
:
Product: Core
Classification: Components
Component: Memory Allocator (show other bugs)
: Trunk
: x86 All
: -- normal with 5 votes (vote)
: mozilla10
Assigned To: Paul Biggar
:
Mentors:
: 496873 580404 584446 659694 (view as bug list)
Depends on: 446096 603655 post2.0 651952 659632 670175 670468 670492 677501 703087
Blocks: 427099 427107 584446 586962 659694 693404 694896 714638
  Show dependency treegraph
 
Reported: 2008-01-30 13:27 PST by Robert Sayre
Modified: 2012-01-02 08:54 PST (History)
72 users (show)
benjamin: wanted1.9.1-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
Use jemalloc on OS X (10.31 KB, patch)
2008-07-18 15:25 PDT, Jason Evans
pavlov: review+
Details | Diff | Splinter Review
Build on OS X (14.71 KB, patch)
2010-08-18 05:49 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Port standalone jemalloc for Mac. (4.44 KB, patch)
2010-08-23 09:56 PDT, Paul Biggar
jasone: feedback+
Details | Diff | Splinter Review
Port standalone jemalloc for Mac (with normal lock) (4.44 KB, patch)
2010-08-24 06:11 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Fix OS X build when using --enable-jemalloc (8.26 KB, application/octet-stream)
2010-08-28 21:48 PDT, Jason Evans
no flags Details
Fix OS X build when using --enable-jemalloc (24.72 KB, patch)
2010-08-29 20:17 PDT, Jason Evans
no flags Details | Diff | Splinter Review
Fix OS X build when using --enable-jemalloc (27.07 KB, patch)
2010-08-30 23:16 PDT, Jason Evans
no flags Details | Diff | Splinter Review
Packaging fixes (1.42 KB, patch)
2010-08-31 21:14 PDT, Jason Evans
no flags Details | Diff | Splinter Review
jemalloc for OS X (28.34 KB, patch)
2010-09-07 00:16 PDT, Jason Evans
no flags Details | Diff | Splinter Review
vmmap output from the jemalloc-enabled build after running a couple of days, all tabs closed (788 bytes, text/plain)
2010-09-20 11:25 PDT, Alex Limi (:limi) — Firefox UX Team
no flags Details
vmmap output from the jemalloc-enabled build after running a couple of days, all tabs closed (257.09 KB, text/plain)
2010-09-20 11:38 PDT, Alex Limi (:limi) — Firefox UX Team
no flags Details
mozconfig for 32 bit builds. (1.00 KB, text/plain)
2010-09-23 07:36 PDT, Paul Biggar
no flags Details
Stack trace from running 32-bit builds (60.82 KB, text/plain)
2010-09-23 08:23 PDT, Paul Biggar
no flags Details
Short program to test that the OS reclaims unused pages. (545 bytes, text/plain)
2010-09-24 02:48 PDT, Paul Biggar
no flags Details
Test whether OS X reclaims like we think (2.78 KB, text/plain)
2010-09-24 08:27 PDT, Paul Biggar
no flags Details
jemalloc on OS X (now with about:memory) (37.61 KB, patch)
2010-09-24 09:16 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Output of vmmap, no arguments (263.54 KB, text/plain)
2010-09-28 23:00 PDT, Alex Limi (:limi) — Firefox UX Team
no flags Details
vmmap with -resident argument added (281.60 KB, text/plain)
2010-09-28 23:01 PDT, Alex Limi (:limi) — Firefox UX Team
no flags Details
…and the "heap" command, as requested by Paul (3.78 KB, text/plain)
2010-09-28 23:02 PDT, Alex Limi (:limi) — Firefox UX Team
no flags Details
jemalloc on OS X (now works on 32-bit) (46.43 KB, patch)
2010-10-06 01:43 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
jemalloc on OS X (now with APSL-mandated comments) (48.09 KB, patch)
2010-10-12 09:37 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
jemalloc on OSX (now without copyright issues) (45.79 KB, patch)
2010-10-20 09:35 PDT, Paul Biggar
pavlov: review+
timeless: review-
jasone: feedback+
Details | Diff | Splinter Review
jemalloc for Mac (but only in known versions) (45.63 KB, patch)
2010-12-22 18:39 PST, Paul Biggar
no flags Details | Diff | Splinter Review
Proposed final patch (45.63 KB, patch)
2011-03-23 19:38 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Final 32-bit bug licked (45.13 KB, patch)
2011-05-21 21:36 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
With fixed visiblility behaviour (45.52 KB, patch)
2011-05-22 16:05 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
final version (43.80 KB, patch)
2011-05-26 07:21 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
fixed on 10.7 (17.85 KB, patch)
2011-06-07 17:16 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Fixed on 10.7, 10.8 etc (49.61 KB, patch)
2011-06-13 19:04 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Works on 10.7 and 10.8 (um, probably) too (50.46 KB, patch)
2011-06-19 20:12 PDT, Paul Biggar
pavlov: review+
Details | Diff | Splinter Review
Decommit properly (2.07 KB, patch)
2011-07-15 17:18 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Build with jemalloc disabled on i386, but enabled on x86-64 (7.36 KB, patch)
2011-08-30 21:49 PDT, Paul Biggar
no flags Details | Diff | Splinter Review
Build with jemalloc disabled on i386, but enabled on x86-64 (1.12 KB, patch)
2011-09-01 05:17 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Splinter Review
Rebase on top of glandium's patches (6.62 KB, patch)
2011-09-01 15:56 PDT, Paul Biggar
khuey: review+
Details | Diff | Splinter Review
Don't link sqlite with mozutils on OSX (987 bytes, patch)
2011-09-08 00:03 PDT, Mike Hommey [:glandium]
khuey: review+
Details | Diff | Splinter Review

Description Robert Sayre 2008-01-30 13:27:37 PST
AIUI, this is going to work a bit differently than it does on windows. We'll do something like define malloc/free as mozMalloc/mozFree (along with the C++ new/delete operators).

We'd do it this way in order to avoid the OS X zone malloc API, which can really slow things down in calls to free().
Comment 1 Robert Sayre 2008-01-30 14:01:14 PST
I understand this may not be a space win. If we can pick up some speed, it might be worth it.
Comment 2 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2008-06-23 13:12:50 PDT
Can we bump this back up so we can use the new OOM work for 1.9.1?
Comment 3 Jason Evans 2008-06-23 13:35:32 PDT
Sure, I'll take a look at it this week.
Comment 4 Jason Evans 2008-07-18 15:25:35 PDT
Created attachment 330314 [details] [diff] [review]
Use jemalloc on OS X

Enable jemalloc on OS X, such that it replaces the default system zone.
Adjust various tuning parameters to values that are appropriate across
platforms.
Comment 5 jag (Peter Annema) 2009-03-09 19:18:57 PDT
Why hasn't this been checked in?
Comment 6 Jason Evans 2009-03-09 21:19:23 PDT
We ran into some stability problems with the patch applied.  The patch is now long obsolete, and inappropriate for checkin.
Comment 7 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2009-06-08 05:54:31 PDT
*** Bug 496873 has been marked as a duplicate of this bug. ***
Comment 8 Joel Reymont (:joelr) 2010-05-07 04:32:18 PDT
Is there a Shark profile showing the slowdown from 'free' on the Mac?
Comment 9 Joel Reymont (:joelr) 2010-05-07 07:40:14 PDT
I'm not able to figure out from this bug what precisely pointed the finger at the system allocator being the culprit on the Mac. Is there more background on the Mac side of this issue?

Here's the discussion on the Darwin-Dev mailing list that I started:

http://groups.google.com/group/darwin-dev/browse_thread/thread/ba19e06923e29aaa

It appears that the 10.6 allocator is already quite optimized for heavy multithreading.
Comment 10 Joel Reymont (:joelr) 2010-05-12 08:01:05 PDT
Highly relevant:

http://cocoawithlove.com/2010/05/look-at-how-malloc-works-on-mac.html
Comment 11 Joel Reymont (:joelr) 2010-05-21 09:21:40 PDT
Rather than swap the memory allocator for an entirely different one, would it make sense to first try to allocate Javascript memory from a different malloc zone?
Comment 12 Robert Sayre 2010-05-21 09:37:53 PDT
(In reply to comment #11)
> Rather than swap the memory allocator for an entirely different one, would it
> make sense to first try to allocate Javascript memory from a different malloc
> zone?

This seems like a fine suggestion, but we need jemalloc working on mac, even if it is not a performance win, because we may decide to lean on it more heavily to manage our JS heap.

I wouldn't mind examining both avenues.
Comment 13 Joel Reymont (:joelr) 2010-05-21 09:40:58 PDT
I apologize for missing a large chunk of research here but -why- do we need jemalloc on the mac to manage our JS heap? Please feel free to point me to other bugs where this is explained.
Comment 14 Robert Sayre 2010-05-21 10:41:26 PDT
(In reply to comment #13)
> I apologize for missing a large chunk of research here but -why- do we need
> jemalloc on the mac to manage our JS heap? Please feel free to point me to
> other bugs where this is explained.

We don't need it currently, it's just one idea we might explore as we evolve the GC. Bug 560818
is one example. Further in the future, we may want to reuse a large part of jemalloc if move to a copying collector. 

Also, we've seen plenty of differences in behavior between the system allocator and jemalloc on Linux and Windows, and it will help us diagnose performance problems if we can compare the behavior of the system allocator to jemalloc. The bottom line is that we need jemalloc to work on Mac, even if we don't end up using it.
Comment 15 Joel Reymont (:joelr) 2010-05-21 12:17:44 PDT
Is there a test or sequence of steps in Firefox that I can use to benchmark
memory allocators? I'm looking for a page to open, JS script to run, etc.
Comment 16 Brendan Eich [:brendan] 2010-05-21 12:35:19 PDT
Oh no, benchmarking. SunSpider and V8 are too "industry-standard" benchmarks the major browser JS implementors are tuning for, or at least worrying about due to bench-marketing. They're fairly bogus based on real-world VM-logging studies done by Jan Vitek and his students at Purdue. But we are stuck with them for now.

Another issue than perf is fragmentation. Stuard had some visualizations and a test harness (IIRC schrep helped on it) up on his blog in Firefox 3 beta era, showing victory via jemalloc, and better behavior than other browsers. We need this kind of test infra running 24x7. Is it yet?

/be
Comment 17 Stuart Parmenter 2010-05-21 15:20:17 PDT
Lets just get us using jemalloc everywhere on mac.
Comment 18 Alex Limi (:limi) — Firefox UX Team 2010-08-04 12:06:42 PDT
Another reason (whether it'll help or not, I don't know):
https://bugzilla.mozilla.org/show_bug.cgi?id=584446
Comment 19 Paul Biggar 2010-08-17 03:08:55 PDT
*** Bug 580404 has been marked as a duplicate of this bug. ***
Comment 20 Paul Biggar 2010-08-17 03:22:58 PDT
joelr: I've started looking at this now, let me know if you're still working on it.
Comment 21 Paul Biggar 2010-08-17 03:23:30 PDT
Quoting from pavlov in bug 580404:

<quote>
It should be pretty trivial.  Most of the code runs, the biggest issue is just
how to hook it up to the system.  We've got different ways we use on each OS. 
On Windows, we replace the CRT which works great.  On Linux, we can just build
a dynamic library, link to it, and the dynamic linker takes over.  On Mac, we
wrote code to hook in to the OS's zone system.  Check out:
http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#6380

Basically on the Mac, there is this zone system, which is basically just a list
of things that can allocate.  It has some weird characteristics though, such as
free() walks through each zone and asks if the zone owns the object and if so
then calls the zone's free function.  This means that there are a number of
bugs in MacOS where the OS double frees things, beucase if nothing owns it,
nothing frees it!  We optimized that a bit, so that we could check as well.

Because the Mac dynamic linker doesn't work the same as Linux, it is difficult
to get our code to load fast enough to replace the default zone before other
allocations happen.  We ended up using the linker trick here:
http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/Makefile.in#141
to get it to happen early, but it wasn't quite perfect.

If you can get the stuff above to work well, it would probably be best, but
alternatively you could use #defines or see if GCC's wrap stuff works (like
we're using on Android) to just overwrite allocations throughout all of
Mozilla's code and let the OS/system libraries allocate things in its own
space.

Anyways, I don't think any of this would be a ton of work.  Maybe a week of
trying a few different things and running it through our unit tests?  We had
jemalloc running on Mac, so at worst you're just looking at a little bitrot of
ifdefs in the code.
</quote>
Comment 22 Paul Biggar 2010-08-18 05:49:47 PDT
Created attachment 466996 [details] [diff] [review]
Build on OS X

This makes Firefox build again on Mac. It's not tested to see if it will even run.

It's basically the old patch, provided above by jasone, with a few small build system hacks, and a fix for the bitrot.
Comment 23 Paul Biggar 2010-08-20 04:46:10 PDT
jemalloc doesn't build as a library on the Mac, because PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is not defined. Apparently, OS X doesn't have the right kind of mutexes, so this is non-trivial to fix.

I'm guessing that I can workaround this by using NSPR, so I'm going to try importing directly into the Mozilla sources (bug 580408) first.
Comment 24 Jason Evans 2010-08-21 18:48:54 PDT
PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is a Linux-specific performance hack.  We just need to be able to statically initialize the mutex, which should still be possible on OS X.
Comment 25 Andreas Gal :gal 2010-08-21 20:04:35 PDT
The pthreads spec says this should work:

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
Comment 26 Paul Biggar 2010-08-23 09:56:55 PDT
Created attachment 468335 [details] [diff] [review]
Port standalone jemalloc for Mac.

Coming from a different angle, this makes the standalone jemalloc (from git://canonware.com/jemalloc.git) compile on Mac.

There aren't any tests, so I don't know whether this works.

Not compiled on any other platform.
Comment 27 Jason Evans 2010-08-23 21:27:55 PDT
Comment on attachment 468335 [details] [diff] [review]
Port standalone jemalloc for Mac.

Wow, I'm surprised it took so little to get the source to compile.  The patch looks good to me, other than that we should use PTHREAD_MUTEX_DEFAULT rather than PTHREAD_MUTEX_RECURSIVE.

There's also quite a bit of zone-related code that needs to be integrated for jemalloc to actually do anything useful on OS X.  First though I want to get the code that's already in mozilla-central working.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2010-08-24 05:47:45 PDT
I assume the zone-related code is something like this:
http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_mac.mm?annotate=51407#l610
Comment 29 Paul Biggar 2010-08-24 06:08:03 PDT
(In reply to comment #27)
> Wow, I'm surprised it took so little to get the source to compile.  The patch
> looks good to me, other than that we should use PTHREAD_MUTEX_DEFAULT rather
> than PTHREAD_MUTEX_RECURSIVE.

Is there any way to test this stuff? I'm feel like I'm flying a bit blind.
Comment 30 Paul Biggar 2010-08-24 06:11:34 PDT
Created attachment 468660 [details] [diff] [review]
Port standalone jemalloc for Mac (with normal lock)

Use normal lock.
Comment 31 Jason Evans 2010-08-25 22:47:31 PDT
(In reply to comment #28)
> I assume the zone-related code is something like this:
> http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_mac.mm?annotate=51407#l610

Actually, memory/jemalloc/jemalloc.c already has the zone-related code, but I haven't integrated it with the stand-alone jemalloc yet.
Comment 32 Alex Limi (:limi) — Firefox UX Team 2010-08-28 17:26:57 PDT
Any chance of a tryserver or plain binary build for the Mac? (32-bit or 64-bit)

I'm happy to run it through my standard torture test over the next few days to see if it helps with memory fragmentation and release.
Comment 33 Jason Evans 2010-08-28 17:37:32 PDT
I just got Firefox+jemalloc building and running on OS X a couple of hours ago.  Once I'm a bit more confident about the changes I'll attach a patch for review.
Comment 34 Jason Evans 2010-08-28 21:48:06 PDT
Created attachment 470247 [details]
Fix OS X build when using --enable-jemalloc

This patch fixes the build on OS X when --enable-jemalloc is specified (still not enabled by default).  I built/ran Firefox on OS X and Linux with this patch applied, with no apparent issues.

I know very little about the current memory fragmentation problems on OS X; can someone fill me in on the problems?  Are the problems specific to OS X 10.6?  JavaScript?
Comment 35 Jason Evans 2010-08-29 20:17:29 PDT
Created attachment 470361 [details] [diff] [review]
Fix OS X build when using --enable-jemalloc

This updated patch incorporates/updates the jemalloc_dzone code from the older patches associated with this bug, but leaves the malloc_zones order alone.  The result is that nearly all memory allocation is performed via jemalloc, but the odd allocation that came from the system szone is still safely handled.

I tortured a debug Firefox with this patch incorporated, and saw no stability issues over the course of 45 minutes.  For lack of an easy alternative, I used sunspider to compare speed and memory usage.  Memory usage decreased by ~4%, and speed decreased by ~2%.
Comment 36 Paul Biggar 2010-08-30 03:05:56 PDT
Comment on attachment 470361 [details] [diff] [review]
Fix OS X build when using --enable-jemalloc

I'm afraid I don't know the code well enough to review this yet. Stuart?
Comment 37 Paul Biggar 2010-08-30 03:31:23 PDT
(In reply to comment #34)
> I know very little about the current memory fragmentation problems on OS X; can
> someone fill me in on the problems?  Are the problems specific to OS X 10.6? 
> JavaScript?

The fragmentation problems are documented in bug 584446 (check out the attachments). (Bug 584446 mention tagged pointers, which should be gone when fatvals lands tomorrow I think, on the tracemonkey branch).
Comment 38 Paul Biggar 2010-08-30 09:58:12 PDT
(In reply to comment #35)
> Created attachment 470361 [details] [diff] [review]
> Fix OS X build when using --enable-jemalloc

I pushed this to tryserver, and it doesn't build on most platforms (http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry, search for e41b94886585). I presume this was known.

To test, I'd really like to get a working version to :limi. Can you push to tryserver when working?
Comment 39 Jason Evans 2010-08-30 14:34:41 PDT
(In reply to comment #38)
> I pushed this to tryserver, and it doesn't build on most platforms
> (http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry, search for
> e41b94886585). I presume this was known.

Nope I didn't know about the build problems because I had to get my hg account re-enabled yesterday (now taken care of).  It looks to me like only 2 of the ~14 platforms suffered a build failure for your tryserver push... Am I looking at the wrong info?
Comment 40 Paul Biggar 2010-08-30 15:54:44 PDT
(In reply to comment #39)
> Nope I didn't know about the build problems because I had to get my hg account
> re-enabled yesterday (now taken care of).  It looks to me like only 2 of the
> ~14 platforms suffered a build failure for your tryserver push... Am I looking
> at the wrong info?

No, you're looking at the right info, my mistake (when I wrote the comment only those two had failed, so "most" meant windows and mac).
Comment 41 Ted Mielczarek [:ted.mielczarek] 2010-08-30 16:08:18 PDT
Note that most of those "builds" are actually just test runs. If your build failed on Windows and Mac, then the only thing it succeeded in building was Linux.
Comment 42 Paul Biggar 2010-08-30 16:14:57 PDT
(In reply to comment #41)
> Note that most of those "builds" are actually just test runs. If your build
> failed on Windows and Mac, then the only thing it succeeded in building was
> Linux.

Everything is green except OS X opt and Win2003 opt.
Comment 43 Jason Evans 2010-08-30 23:16:36 PDT
Created attachment 470707 [details] [diff] [review]
Fix OS X build when using --enable-jemalloc

This updated patch probably fixes the build failures on OS X and Windows 2003, though I pushed enough bad intermediate tryserver jobs that it's going to be a while before we know for sure.
Comment 44 Alex Limi (:limi) — Firefox UX Team 2010-08-31 15:20:32 PDT
Starting to test whether it'll solve the issues with memory fragmentation that I was seeing, will know in a few days, I guess.

I'm using this build, let me know if it's not the right one:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-fbbadab94d7d/tryserver-macosx/
Comment 45 Alex Limi (:limi) — Firefox UX Team 2010-08-31 16:42:11 PDT
Since this now has a new patch, I'm clearing that whiteboard value (if this isn't how it works, let me know! :)
Comment 46 Jason Evans 2010-08-31 17:23:58 PDT
No, that build is missing the patch to enable jemalloc on OS X.  You want this one instead:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-c7a064be4e25/tryserver-macosx/
Comment 47 Alex Limi (:limi) — Firefox UX Team 2010-08-31 18:09:26 PDT
Thanks! Good thing I checked. ;)
Comment 48 Alex Limi (:limi) — Firefox UX Team 2010-08-31 18:17:16 PDT
(In reply to comment #46)
> No, that build is missing the patch to enable jemalloc on OS X.  You want this
> one instead:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-c7a064be4e25/tryserver-macosx/

Crashes on load (note that I renamed the app bundle to not conflict with my other installations) with:

Process:         firefox-bin [151]
Path:            /Applications/jemalloc.app/Contents/MacOS/firefox-bin
Identifier:      org.mozilla.minefield
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  launchd [99]

Date/Time:       2010-08-31 18:14:35.723 -0700
OS Version:      Mac OS X 10.6.4 (10F569)
Report Version:  6

Interval Since Last Report:          346128 sec
Crashes Since Last Report:           4
Per-App Crashes Since Last Report:   3
Anonymous UUID:                      28570194-9B5C-405F-A8D5-818F3F658117

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread:  0

Dyld Error Message:
  Library not loaded: @executable_path/libjemalloc.dylib
  Referenced from: /Applications/jemalloc.app/Contents/MacOS/firefox-bin
  Reason: image not found
Comment 49 Nick Thomas [:nthomas] 2010-08-31 21:08:03 PDT
Probably need a packaging change to
  http://hg.mozilla.org/mozilla-central/file/tip/browser/installer/package-manifest.in
to include libjemalloc.dylib in the dmg.
Comment 50 Jason Evans 2010-08-31 21:12:10 PDT
There's some build system magic that is apparently necessary for a library to be included in the final package.  I have another tryserver build in process now that will probably result in a useful package at:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-a5313f943162/tryserver-macosx/
Comment 51 Jason Evans 2010-08-31 21:14:37 PDT
Created attachment 471023 [details] [diff] [review]
Packaging fixes

Some of the changes in this patch may be unnecessary.  Can someone more knowledgeable about the build system comment on which parts to integrate into the main patch?
Comment 52 Jason Evans 2010-09-01 08:39:35 PDT
(In reply to comment #50)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-a5313f943162/tryserver-macosx/

This package does have libjemalloc.dylib in it, but a crash occurs shortly after startup:

#0  0x97f978d8 in szone_free_definite_size ()
#1  0x97f96b88 in free ()
#2  0x97fbafec in fclose ()
#3  0x00dc6104 in nsINIParser_internal::Init ()^M#4  0x0000003f in ?? ()
Previous frame inner to this frame (gdb could not unwind past this frame)

The MinefiledDebug package I built on my own machine last night worked fine...  I'll poke at things some more this evening.
Comment 53 Alex Limi (:limi) — Firefox UX Team 2010-09-01 20:01:27 PDT
Any try server builds available yet? Deadline approaching, so eager to start testing. :)
Comment 54 Jason Evans 2010-09-01 21:15:38 PDT
Is this good enough for testing?:

http://www.canonware.com/~jasone/mozilla/firefox-4.0b5pre.en-US.mac64.dmg
Comment 55 Alex Limi (:limi) — Firefox UX Team 2010-09-01 21:27:26 PDT
Hasn't crashed so far, so at least I can give it a try. I'll report back in a day or so. Thanks! :)
Comment 56 Jason Evans 2010-09-01 23:08:13 PDT
What's the next step?  I can build OS X packages that seem to work, but the tryserver package crashes on startup, and if I copy the .mozconfig used for the tryserver build I get a build error.
Comment 57 Nick Thomas [:nthomas] 2010-09-02 01:35:16 PDT
Hopefully you're just looking at the 64bit builds coming of the 10.6 builder on try, which is using XCode 3.2.1. How does that compare to your machine ? Could you post the build error ?
Comment 58 Jason Evans 2010-09-07 00:16:06 PDT
Created attachment 472560 [details] [diff] [review]
jemalloc for OS X

This patch fixes a couple of zone-related issues as compared to the previous one.  There is still some sort of problem with running a 32-bit 10.5 binary on 10.6 (crashreporter intervenes), but since the only way I can build such binaries right now is via the try server, I haven't had time to dig in any further.

This patch feels really close to being done.
Comment 59 Jason Evans 2010-09-07 00:19:43 PDT
(In reply to comment #57)
> Hopefully you're just looking at the 64bit builds coming of the 10.6 builder on
> try, which is using XCode 3.2.1. How does that compare to your machine ? Could
> you post the build error ?

I was trying to run a 32-bit build (10.5 builder) on 10.6.  I'm running OS X 10.6.4, with XCode 3.2.3.
Comment 60 Jason Evans 2010-09-12 12:30:22 PDT
To summarize:

* I have been using a 64-bit optimized Minefield based on patches attached to this bug for about two weeks, with zero problems.

* The latest patch results in a completely successful tryserver build/test.

* I tested the .dmg packages built by tryserver on my OS X 10.6.4 system, with the following results:
  - Works: 64-bit opt (10.6)
  - Works: 64-bit debug (10.6)
  - Fails: 32-bit opt (10.5)
  - Works: 32-bit debug (10.5)
I fixed several problems regarding malloc zone structure compatibility that showed up in earlier rounds of tryserver testing, but I see nothing useful in the crashes I get from tryserver packages built with the latest patch.  I don't have OS X 10.5, so for me to do further testing of 10.5 builds would require substantial additional effort (namely setting up an additional development environment).  If this bug is a priority, I hope that someone with broader platform access can look into this final niggle.

* We don't have any data I'm aware of that indicates whether jemalloc improves the current fragmentation problem on OS X.  The current version of this patch is certainly solid enough for such measurement.  In the absence of such data, this change is too risky to justify inclusion in a late-stage beta release (though it really should be committed in the long run, even if jemalloc is disabled by default).
Comment 61 Mike Beltzner [:beltzner, not reading bugmail] 2010-09-13 10:29:13 PDT
Thanks for the analysis, Jason - moving target to future.
Comment 62 Paul Biggar 2010-09-13 18:05:13 PDT
(In reply to comment #60)
> I fixed several problems regarding malloc zone structure compatibility that
> showed up in earlier rounds of tryserver testing, but I see nothing useful in
> the crashes I get from tryserver packages built with the latest patch.  I don't
> have OS X 10.5, so for me to do further testing of 10.5 builds would require
> substantial additional effort (namely setting up an additional development
> environment).  If this bug is a priority, I hope that someone with broader
> platform access can look into this final niggle.

I'm on it, with my shiny new mac.


> * We don't have any data I'm aware of that indicates whether jemalloc improves
> the current fragmentation problem on OS X.  The current version of this patch
> is certainly solid enough for such measurement.  In the absence of such data,
> this change is too risky to justify inclusion in a late-stage beta release
> (though it really should be committed in the long run, even if jemalloc is
> disabled by default).

Alexander, can you evaluate this?
Comment 63 Alex Limi (:limi) — Firefox UX Team 2010-09-14 13:32:47 PDT
(In reply to comment #62)
> Alexander, can you evaluate this?

Yes, I'm currently running the build from:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-aa8dd97a92eb/tryserver-macosx64/

…and will report back once I have enough vmmap data + about:memory output to see if it helps fragmentation.

(let me know if this is not the right build to test)
Comment 64 Alex Limi (:limi) — Firefox UX Team 2010-09-14 13:51:32 PDT
Jason,

Turns out, about:memory doesn't report the numbers I need for this comparison with the build mentioned in comment 63.

The entirety of the output looks like this:

Other Information
xpconnect/js/gcchunks      51,380,224
storage/sqlite/pagecache   28,692,808
storage/sqlite/other 1,619,856
images/chrome/used/raw  0
images/chrome/used/uncompressed   271,752
images/chrome/unused/raw   0
images/chrome/unused/uncompressed   0
images/content/used/raw 180,521
images/content/used/uncompressed 287,820
images/content/unused/raw  4,570
images/content/unused/uncompressed  8,192
layout/all  9,522,748
layout/bidi 7,796
gfx/surface/image 556,384
content/canvas/2d_pixel_bytes 0

Note how all the malloc/* data is missing, as well as the topmost "Overview" section.

If you can post a tryserver build with these hooked up (you need to tell it that jemalloc is enabled, according to Vlad), I can get some real numbers on this.
Comment 65 Paul Biggar 2010-09-14 14:04:41 PDT
(In reply to comment #64)
> Note how all the malloc/* data is missing, as well as the topmost "Overview"
> section.
> 
> If you can post a tryserver build with these hooked up (you need to tell it
> that jemalloc is enabled, according to Vlad), I can get some real numbers on
> this.

I'm doing this now.
Comment 66 Paul Biggar 2010-09-14 14:09:18 PDT
(In reply to comment #60)
>   - Fails: 32-bit opt (10.5)

I can reproduce this. Working on it.
Comment 67 Alex Limi (:limi) — Firefox UX Team 2010-09-15 16:24:03 PDT
Paul got me a working build with the malloc + Overview data, so I'm running that for the next days to see if vmmap and that output gives us anything useful.
Comment 68 Alex Limi (:limi) — Firefox UX Team 2010-09-20 11:25:44 PDT
Created attachment 476860 [details]
vmmap output from the jemalloc-enabled build after running a couple of days, all tabs closed

OK, so here's the output from about:memory, and I've attached the vmmap output from the process.

about:memory:

Memory mapped:
682,622,976

Memory in use:
379,272,540

malloc/allocated               379,339,996
malloc/mapped                  682,622,976
malloc/committed               682,622,976
malloc/dirty                     3,194,880
xpconnect/js/gcchunks           82,837,504
storage/sqlite/pagecache        53,719,744
storage/sqlite/other             2,252,296
images/chrome/used/raw0images/chrome/used/uncompressed   597,440
images/chrome/unused/raw0images/chrome/unused/uncompressed  3,072
images/content/used/raw            341,833
images/content/used/uncompressed 1,343,528
images/content/unused/raw            7,398
images/content/unused/uncompressed  16,384
layout/all                         771,736
layout/bidi0gfx/surface/image    1,990,284
content/canvas/2d_pixel_bytes    8,044,872
Comment 69 Alex Limi (:limi) — Firefox UX Team 2010-09-20 11:38:02 PDT
Created attachment 476865 [details]
vmmap output from the jemalloc-enabled build after running a couple of days, all tabs closed

Sorry, attached the wrong file earlier.
Comment 70 Vladimir Vukicevic [:vlad] [:vladv] 2010-09-20 11:40:05 PDT
(In reply to comment #68)> malloc/allocated               379,339,996> malloc/mapped                  682,622,976> malloc/committed               682,622,976I know that on OSX we don't have a way to force the OS to reclaim unused pages, but is the memory actually set up in such a way that the OS will do it under pressure?  That is, above, will the OS take back the 300 or so MB that we're not actually using, if another app needs it?
Comment 71 Alex Limi (:limi) — Firefox UX Team 2010-09-20 22:40:24 PDT
(In reply to comment #70)
> (In reply to comment #68)> malloc/allocated               379,339,996>
> malloc/mapped                  682,622,976> malloc/committed              
> 682,622,976
> I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure?  That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

Josh might know, adding him to CC.
Comment 72 Josh Aas 2010-09-20 22:49:13 PDT
(In reply to comment #71)
> Josh might know, adding him to CC.

I don't know.
Comment 73 Jason Evans 2010-09-21 10:47:03 PDT
(In reply to comment #70)
> (In reply to comment #68)> malloc/allocated               379,339,996>
> malloc/mapped                  682,622,976> malloc/committed              
> 682,622,976I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure?  That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

I think so, but I haven't run experiments to verify this.  See comment #5 in bug #584446 for details.
Comment 74 Paul Biggar 2010-09-23 07:29:42 PDT
(In reply to comment #60)
> * I tested the .dmg packages built by tryserver on my OS X 10.6.4 system, with
> the following results:
>   - Works: 64-bit opt (10.6)
>   - Works: 64-bit debug (10.6)
>   - Fails: 32-bit opt (10.5)
>   - Works: 32-bit debug (10.5)

I'm trying to understand this better.  Are you saying that when you pushed to tryserver, 32bit builds worked, but only the dmgs did not?

For me, everything 32bit fails (opt, build, combinations thereof).
Comment 75 Paul Biggar 2010-09-23 07:36:27 PDT
Created attachment 477924 [details]
mozconfig for 32 bit builds.

This is the mozconfig I'm using to build 32-bit firefox, which fails on load. It varies from the one on the wiki, largely by using i386-apple-darwin9.2.0 as the target, and drawf-2 as the debug symbol type, which I copied from the tryserver logs.

Every variation of --{enable,disable}-{debug,optimize} fails. I'm executing this as objdir/dist/Minefield.app.Contents/MacOS/firefox-bin or objdir/dist/MinefieldDebug.app.Contents/MacOS/firefox-bin.
Comment 76 Paul Biggar 2010-09-23 08:23:59 PDT
Created attachment 477942 [details]
Stack trace from running 32-bit builds

This is pretty much the same stack trace I get from any 32-bit build. The interesting partis here:

44  com.apple.AppKit                0x9021e811 +[NSBundle(NSNibLoading) loadNibFile:externalNameTable:withZone:] + 158
45  XUL                             0x00c99115 nsAppShell::Init() + 821 (nsAppShell.mm:294)

which touches on Zone stuff. In widget/src/cocoa/nsAppShell.mm:

  [NSBundle
    loadNibFile: [NSString stringWithUTF8String:(const char*)nibPath.get()]
    externalNameTable: [NSDictionary 
                          dictionaryWithObject:[NSApplication sharedApplication]
                          forKey:@"NSOwner"]
    withZone:NSDefaultMallocZone()];

Note the NSDefaultMallocZone.
Comment 77 Paul Biggar 2010-09-23 08:30:18 PDT
(In reply to comment #58)
> Created attachment 472560 [details] [diff] [review]
> jemalloc for OS X
> 
> This patch fixes a couple of zone-related issues as compared to the previous
> one.  There is still some sort of problem with running a 32-bit 10.5 binary on
> 10.6 (crashreporter intervenes), but since the only way I can build such
> binaries right now is via the try server, I haven't had time to dig in any
> further.
> 
> This patch feels really close to being done.



Are you sure this is the patch you pushed to tryserver?  It crashes for me in debug since malloc_initialized is only set after both these calls, which assert that it's true.

 #ifdef MOZ_MEMORY_DARWIN
        /* Register the custom zone. */
        malloc_zone_register(create_zone());
 
        /*
         * Convert the default szone to an "overlay zone" that is capable
         * of deallocating szone-allocated objects, but allocating new
         * objects from jemalloc.
         */
        szone2ozone(malloc_default_zone());
 #endif
 
       malloc_initialized = true;


I wonder if I've been trying to debug the wrong problem?
Comment 78 Paul Biggar 2010-09-23 08:47:37 PDT
Just so that we're on the same page, the remaining problems, and the people solving/working on them, are:

vlad:
- we dont know if this solves the fragmentation problem (comment 70)

pbiggar (vlad and jasone were looking at this too):
-  we dont know if the OS reclaims unused pages (comment 73)

pbiggar (need feedback from jasone):
- I can't get any 32bit build to work (comment 75)
- jasone's tryserver build failed for 32bit optimized only (comment 60)

jasone (waiting for feedback from nthomas):
- the tryserver dmgs fail due to a packaging problem (comment 52)
Comment 79 Paul Biggar 2010-09-24 02:48:57 PDT
Created attachment 478211 [details]
Short program to test that the OS reclaims unused pages.

(In reply to comment #78)
> pbiggar (vlad and jasone were looking at this too):
> -  we dont know if the OS reclaims unused pages (comment 73)

Running the attached program (alloc1), we can see that the OS returns the memory. I'm currently using 2.8GB of a 4GB mac, and alloc1 mallocs 1GB then writes into it so the OS actually allocated it. This registers in Activity Monitor as using 1GB. As soon as free() is called, the Activity Monitor shows an extra 1GB of free memory, and lists alloc1 as using 340KB only.

This is on 10.6. I assume this is good enough, and that I don't need to check for 10.5 (and probably can't without a 10.5 OS anyway).
Comment 80 Paul Biggar 2010-09-24 02:55:24 PDT
(In reply to comment #79)
> Running the attached program (alloc1), we can see that the OS returns the
> memory. 

Or I completely misunderstood the problem. My bad. Looking at this properly now.
Comment 81 Paul Biggar 2010-09-24 03:48:49 PDT
(In reply to comment #70)
> (In reply to comment #68)> malloc/allocated               379,339,996>
> malloc/mapped                  682,622,976> malloc/committed              
> 682,622,976
> I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure?  That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

I confess that I don't fully understand what question is being asked. The most obvious have obvious answers (yes and yes):

- are allocations which are never written to reused by the OS
- are allocations which are freed reused by the OS


So that leaves complicated ones:

- if we realloc an allocation to use less space, does the OS reclaim the now unused pages?
- if we only use a small portion of an allocation, is all the allocated space used? (I would think not, but we'll see)

I'll get answers to them now.
Comment 82 Paul Biggar 2010-09-24 06:29:15 PDT
IIn reply to comment #81)
> I confess that I don't fully understand what question is being asked.

Ah, the question is from bug 584446, esp comments 4 and 5. So the question is that if we allocate memory, use it, then no longer need most of it, can we give it back. In the context of jemalloc, this means we take a very large allocation, hand it out to smaller user-space allocations, get those allocations back, and want to return them to the OS. Specifically, we want check whether msync(MS_KILLPAGES) does this.
Comment 83 Paul Biggar 2010-09-24 08:27:16 PDT
Created attachment 478270 [details]
Test whether OS X reclaims like we think

This program allows us test whether memory is being reclaimed. Procedure:

Compile with gcc: gcc -c alloc2.c -o alloc2
Run in the mode you'd like to test in one terminal: ./alloc2 lk
Wait for the action, then run in "normal" mode in other terminal: ./alloc2 ln

See what happens to the swap in the OS X Activity monitor.

This is hardcoded to allocate 2GB, so you should have (a lot) less than 2GB of free RAM to test this. Else change the hard coding.

Results to follow.
Comment 84 Paul Biggar 2010-09-24 09:13:54 PDT
(In reply to comment #83)
> Results to follow.

So it looks like the correct way to reclaim pages is madvise(MADV_FREE). I also tried madvise(MADV_DONTNEED), msync(MS_DEACTIVATE) and msync(MS_KILLPAGES). Tried with both mmap and malloc, and it held up for both.

Note that everything on the internet says the opposite, so this means my results really need to be double-checked. For reference, here's my system info:

Version 10.6.4

$ uname -a
Darwin valrhona.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386 MacBookPro6,2 Darwin

 $ gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)


For reference, the internet disagrees, so this must have changed relatively recently.

http://lists.apple.com/archives/darwin-development/2003/Apr/msg00223.html
http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/ (madvise does nothing)

In our jemalloc code we have this (use msync(MS_KILLPAGES) rather than madvise(MADV_FREE)):

#elif (defined(MOZ_MEMORY_DARWIN))
        msync((void *)((uintptr_t)chunk + (i <<
            pagesize_2pow)), pagesize * npages,
            MS_KILLPAGES);
#else
        madvise((void *)((uintptr_t)chunk + (i <<
            pagesize_2pow)), (npages << pagesize_2pow),
            MADV_FREE);
#endif


My worry here is that different versions of the platforms use different conventions, which would be pretty awkward.
Comment 85 Paul Biggar 2010-09-24 09:16:23 PDT
Created attachment 478290 [details] [diff] [review]
jemalloc on OS X (now with about:memory)
Comment 86 Paul Biggar 2010-09-24 09:19:30 PDT
(In reply to comment #85)
> Created attachment 478290 [details] [diff] [review]
> jemalloc on OS X (now with about:memory)

This is jasone's, with support for seeing jemalloc stats in about:memory. This was the patch used for the build limi discusses in comment 69.
Comment 87 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-09-24 09:28:03 PDT
You might ask on the Apple perf list about the memory discarding stuff.

http://lists.apple.com/mailman/listinfo/perfoptimization-dev
Comment 88 Alex Limi (:limi) — Firefox UX Team 2010-09-24 15:56:10 PDT
*** Bug 584446 has been marked as a duplicate of this bug. ***
Comment 89 Paul Biggar 2010-09-28 12:06:00 PDT
It seems like the linker flag -dead_strip is the problem with the 32-bit optimized build. That's the only difference between a build with --enable-optimize=-O0 --enable-debug and one built --disable-optimize --enable-debug, and the latter works.
Comment 90 Vladimir Vukicevic [:vlad] [:vladv] 2010-09-28 19:59:49 PDT
(In reply to comment #84)
> My worry here is that different versions of the platforms use different
> conventions, which would be pretty awkward.

Well, luckily there aren't that many different versions in play here.. we can easily make the way to free be a runtime choice that we make based on OS version, right?  If you can put together a small standalone test program, we should be able to collect all the data that we need quickly.
Comment 91 Alex Limi (:limi) — Firefox UX Team 2010-09-28 23:00:00 PDT
Created attachment 479315 [details]
Output of vmmap, no arguments

The corresponsing about:memory output after running Paul's new build for a few days:

Memory mapped:
707,788,800

Memory in use:
405,449,436

malloc/allocated                 405,514,844
malloc/mapped                    707,788,800
malloc/committed                 707,788,800
malloc/dirty                       2,555,904
xpconnect/js/gcchunks             99,614,720
storage/sqlite/pagecache          58,766,768
storage/sqlite/other               2,450,704
images/chrome/used/raw0images/chrome/used/uncompressed     881,608
images/chrome/unused/raw0images/chrome/unused/uncompressed   3,072
images/content/used/raw              101,302
images/content/used/uncompressed   1,176,316
images/content/unused/raw            276,447
images/content/unused/uncompressed 4,959,808
layout/all                           721,448
layout/bidi                                0
gfx/surface/image                  7,021,944
content/canvas/2d_pixel_bytes              0
Comment 92 Alex Limi (:limi) — Firefox UX Team 2010-09-28 23:01:24 PDT
Created attachment 479316 [details]
vmmap with -resident argument added

Note that all of the vmmap/heap added this to stdout when run:

2010-09-28 22:56:42.035 vmmap[7915:903] *** Symbolication:  Don't know how to introspect target process's malloc zone named jemalloc_ozone

2010-09-28 22:56:42.039 vmmap[7915:903] *** Symbolication:  Don't know how to introspect target process's malloc zone named jemalloc_zone
Comment 93 Alex Limi (:limi) — Firefox UX Team 2010-09-28 23:02:56 PDT
Created attachment 479318 [details]
…and the "heap" command, as requested by Paul
Comment 94 Paul Biggar 2010-09-29 09:37:55 PDT
(In reply to comment #89)
> It seems like the linker flag -dead_strip is the problem with the 32-bit

Nope, my mistake. The current stack trace I'm getting (with CFQSortArray at the top) occurs on all 32-bit builds with jasone's patch.
Comment 95 Paul Biggar 2010-10-06 01:43:03 PDT
Created attachment 481164 [details] [diff] [review]
jemalloc on OS X (now works on 32-bit)

This fixes the problems with 32-bit, I believe. (Actually, this fixes the problems with running 32-bit build on 10.6, so we're waiting for the verdict from TryServer).

When we compile against 10.5 headers, the malloc_zone_t struct is smaller than when we link against 10.6 headers. However, at run-time on 10.6, we're passed the 10.6 struct, which we copy into a struct which is too small. When OS X checks for the missing fields, we segfault. I'm not sure why it only happened where it did, perhaps it's the first call to memalign.

This fixes it by using the correct size, predicated on the version field.

We copy from the Apple headers to know both sizes no matter which SDK we use, so this may need some legal thing, which I'll look into next.
Comment 96 Paul Biggar 2010-10-07 07:08:22 PDT
(In reply to comment #79)
> This is on 10.6. I assume this is good enough, and that I don't need to check
> for 10.5 (and probably can't without a 10.5 OS anyway).

I've confirmed this on 10.5 too. So on OSX, madvise(MADV_FREE) works.
Comment 97 Paul Biggar 2010-10-07 07:15:44 PDT
(In reply to comment #92)
> Created attachment 479316 [details]
> vmmap with -resident argument added
> 
> Note that all of the vmmap/heap added this to stdout when run:
> 
> 2010-09-28 22:56:42.035 vmmap[7915:903] *** Symbolication:  Don't know how to
> introspect target process's malloc zone named jemalloc_ozone
> 
> 2010-09-28 22:56:42.039 vmmap[7915:903] *** Symbolication:  Don't know how to
> introspect target process's malloc zone named jemalloc_zone

Thanks!

OK, the problem remains that we can't tell if this is helpful. The reason is that we don't supply the right callbacks to introspect the jemalloc heap. I'm looking into this now, but I can't find any docs that describe this, so I'll have to reconstruct from reading the OSX source code.
Comment 98 Jason Evans 2010-10-07 10:51:41 PDT
(In reply to comment #97)
> (In reply to comment #92)
> > Created attachment 479316 [details] [details]
> > vmmap with -resident argument added
>
> OK, the problem remains that we can't tell if this is helpful. The reason is
> that we don't supply the right callbacks to introspect the jemalloc heap. I'm
> looking into this now, but I can't find any docs that describe this, so I'll
> have to reconstruct from reading the OSX source code.

Why isn't RSIZE, as reported by top, in combination with the about:memory stats, adequate?

Re: the introspection callbacks, I know how to write them, but it is quite difficult to get the information out of jemalloc's internal data structures.  Do we really need this?
Comment 99 Paul Biggar 2010-10-07 12:06:27 PDT
(In reply to comment #98)
> Re: the introspection callbacks, I know how to write them, but it is quite
> difficult to get the information out of jemalloc's internal data structures. 
> Do we really need this?

Perhaps not, I'll check it out.

But as a matter of interest, is this actually possible? It seems that heaps and vmmap call introspection algorithms which are part of heaps/vmmap, not part of firefox or jemalloc. Is it possible to actually get introspection to work? I've tried writing a throw-away program that calls an introspection function, and i haven't been able.
Comment 100 Paul Biggar 2010-10-08 05:51:34 PDT
(In reply to comment #98)
> Why isn't RSIZE, as reported by top, in combination with the about:memory
> stats, adequate?

I think it should be, assuming you mean RPRVT?. Just to be sure I'm on the same page, you're saying that the difference between RPRVT in top and "Memory in use" in about:memory indicates the fragmentation?

Limi is away, but I'll assume that the way to test this is to heavily load the browser with things like google docs, leave it, close all the windows, and see the difference described above.
Comment 101 Paul Biggar 2010-10-11 09:15:27 PDT
Here are the memory stats (speed stats to come):

Ultra-short summary:
--------------------
jemalloc is 15-20% better overall.

Short summary:
-------------------
Memory use is reduced 15% at full load and 20% after tabs are closed. Fragmentation is solved. However, some numbers look worse, and it could be perceived that we use _more_ memory, which is not true.


Procedure:
------------

- opened empty firefox
- added tab for about:memory
- added gmail tab
- added google docs tab
- opened 16 google docs documents, each in a separate tab
- waited for tabs to load and then cycled through them
- refreshed about:memory and recorded "Full load" measurement.
- closed each tab, except about:memory
- refreshed about:memory and recorded "closed tabs" measurement.
- launched a compile in the background and waited a while, so that memory would be reclaimed from firefox by the OS
- refreshed about:memory and recorded "after a while" measurement.

Each time measurement consists of:

 - about:memory: "Memory mapped" and "Memory in use"
 - Activity monitor: "Real mem" and "Private mem".

I recorded these for the 4 combinations of jemalloc enabled/disabled and 32-bit/64-bit. Note that the 32-bit measurement was on 10.6, not 10.5, so the 32-bit-jemalloc-disabled result may not be what 10.5 users actually experience. However, 10.6 supposedly brought improvements to fragmentation, so I believe the 10.5 results are going to be unfair to jemalloc, if anything.




Memory use discussion:
---------------------

- By comparing absolute "Private mem" numbers, we can see the total amount of memory that firefox consumes (that is, which is unavailable to other applications without paging). **This is the most important figure**

On both 32-bit and 64-bit, jemalloc is better, 8-9% on 32bit and 15% on 64bit, during "full load". "After a while", jemalloc is much better, 33% on 32-bit and 20% on 64-bit. This is the most important number, and it's a definite, but not huge, win.

The jemalloc "Memory in use" number is way higher than the mac allocator one. Since we allocate the same amount of memory internally, this is a measure of how much overhead jemalloc takes, which is about 30%. However, this is more than accounted for by the fragmentation, and so jemalloc comes out better here.


Fragmentation discussion:
-----------------------

- We can determine fragmentation by comparing "Memory in use" to "Private mem".

On 32-bit, the mac allocator uses way more memory than we are trying to: 50% more during "full load", and 100% more "after a while". In jemalloc, the difference is minimal, about 3%.

On 64-bit, it's just as bad; about 42%, going up to 100% "after a while". Jemalloc has a smaller difference, perhaps 15%, going down to 1% "after a while".

Note that this is just a side show to "Memory use discussion", but it shows that fragmentation was the problem, and jemalloc solves it.


Perceived memory use:
--------------------

- Users are unlikely to know that private memory is the correct metric, and are more likely to look at larger numbers.
- Those who dig deeper are more like to to use "Real mem" and "Memory in use" since they sound correct. 
- The OS doesn't take memory back immediately, so the "closed tab" metric might be used, even though the OS will take back that memory if it's needed. The reporting tools will report the higher numbers until that is taken back.

Those looking at "about:memory" may believe that we are now using much more memory. The jemalloc figures are much higher, perhaps 25% higher on 64 bit.

The "memory in use" figures look really good for the mac allocator, about twice as good as for jemalloc. However, they are just fiction, and in "Private mem", jemalloc always does better.

Those looking at Activity monitor's "real mem" will see no difference between the two allocators.

jemalloc looks much worst in the "closed tabs" segment of the test. It is much slower to return memory, and appears to hold on to 40-60% more memory than the mac allocator. However, it happily hands it back when the OS asks.


Raw numbers:
----------------

Number are in the form:                 Full load  -> closed tabs -> after a while
Mac, 32bit, Private mem:                464 -> 130 -> 120
Mac, 32bit, Memory in use:            303 ->  93 ->   59
jemalloc, 32bit, Private mem:        426 -> 288 ->  91
jemalloc, 32bit, Memory in use:    412 -> 134 -> 103
Mac, 64bit, Private mem:                 789 -> 250 -> 178
Mac, 64bit, Memory in use:            445 -> 165 ->  77
jemalloc, 64bit, Private mem:        693 -> 430 -> 152
jemalloc, 64bit, Memory in use:   605 -> 213 -> 148

Mac, 32bit, Real mem:                    579 -> 420 -> 402
Mac, 32 bit, Memory mapped:       359 -> 353 -> 351
jemalloc, 32bit, Real mem:             581 -> 446 -> 207
jemalloc, 32bit, Memory mapped: 391 -> 409 -> 378
Mac, 64bit, Real mem:                     880 -> 601 -> 220
Mac, 64 bit, Memory mapped:        486 -> 478 -> 466
jemalloc, 64bit, Real mem:             878 -> 622 -> 330
jemalloc, 64bit, Memory mapped: 631 -> 621 -> 570

I experienced some unrelated crashes, and had to repeat some of the tests in various ways, but these numbers were pretty static the whole time, and I have high confidence in them. While it isn't the most scientific test, I believe it's powerful enough to show that the move to jemalloc is warranted.
Comment 102 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-10-11 09:21:39 PDT
Closing tabs may not do what you think it does, due to docshells being cached for fast re-opening including user state (form, dynamic DOM modification, plugin stuff, etc.).  I forget how the timers work there, but it's what users will actually *do*, so you're probably measuring the right thing. ;-)
Comment 103 Paul Biggar 2010-10-11 09:29:02 PDT
(In reply to comment #102)
> Closing tabs may not do what you think it does, due to docshells being cached

> I forget how the timers work there, but it's what users
> will actually *do*, so you're probably measuring the right thing. ;-)

Yeah, this is what I was going for.
Comment 104 Paul Biggar 2010-10-12 06:44:26 PDT
Speed tests:

Raw data:
------------
                            Sunspider      |       v8      |    kraken
jemalloc32:        389.9                    2312       8069
mac32:                386.5                   2334          7774
jemalloc64:        410                       2149           7789
mac64:               381                       2115           7597


Procedure:
------------

These were all measured in-browser, not using the shell, since we haven't go jemalloc working properly for the shell yet. I measured the sunspider one 10 times, and the others 5 times each, and averaged them. They all do multiple runs, but I still hit a lot of variance, hence multiple runs.


Discussion
-----------

This shows that jemalloc slows things down, but not massively. On 32 bit, jemalloc is 1%, 1% and 3.6% slower on SS, v8 and kraken respectively. On 64bit, it's 7% slower, 1.6% faster and 2.5% slower. I think that all of these are in the noise, except the 7%, which I am suspicious of, but which we can at least think of as a reasonably hard bound.
Comment 105 Paul Biggar 2010-10-12 09:37:28 PDT
Created attachment 482570 [details] [diff] [review]
jemalloc on OS X (now with APSL-mandated comments)

The APSL needs some comments detailing modifications and modification dates, for osx_zone_types.h. This is discussed in bug 603655.
Comment 106 Paul Biggar 2010-10-13 05:32:46 PDT
I sent this to tryserver and found no bugs. There are plenty of oranges, but I can't find any that aren't in other builds around the same time, or that aren't intermittent oranges with bugs filed.
Comment 107 Paul Biggar 2010-10-20 09:35:08 PDT
Created attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

This removes any Apple code from the patch. It doesn't need to be checked for legal issues (that's being handled in bug 603655), but it should be checked for readability from the fallout of that bug. (We aren't allowed copy apple's malloc_zone_t struct, as it's under an incompatible license, and so have to work around it).
Comment 108 Jason Evans 2010-10-20 10:13:24 PDT
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

I'll try to take a look at this tonight.
Comment 109 Jason Evans 2010-10-21 09:10:30 PDT
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

Nicely done.  Well structured, and well documented.
Comment 110 jag (Peter Annema) 2010-10-21 11:26:27 PDT
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

Awesome work. Looking forward to seeing this get checked in.

>diff --git a/configure.in b/configure.in

>     else
>         MOZ_OPTIMIZE_FLAGS="-O3 -fno-omit-frame-pointer"
>     fi
>+		MOZ_MEMORY=1
>     _PEDANTIC=
>     CFLAGS="$CFLAGS -fpascal-strings -fno-common"
>     CXXFLAGS="$CXXFLAGS -fpascal-strings -fno-common"

Just something I noticed as I was skimming the patch: use spaces here instead of tabs.
Comment 111 Stuart Parmenter 2010-10-25 17:47:13 PDT
I think we should take this, and continue work on taking upstream jemalloc, which should give us better perf characteristics.
Comment 112 Paul Biggar 2010-10-26 06:34:25 PDT
(In reply to comment #110)
> >+		MOZ_MEMORY=1
> Just something I noticed as I was skimming the patch: use spaces here instead
> of tabs.

Good spot, thanks.
Comment 113 Alex Limi (:limi) — Firefox UX Team 2010-11-19 11:43:03 PST
Sorry to add process noise here, but who makes the call on whether this will make Firefox 4? I care quite a bit about this from a user experience perspective, as low-memory situations are quite common on the Mac (especially after 10.6 shipped), so I want us to be as good with freeing up memory and avoiding fragmentation as we possibly can.

The patch seems complete and working, and I've been running the tryserver builds for a few days without any issues.
Comment 114 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2010-11-19 12:39:52 PST
I am opposed to this feature landing in FF4: switching allocators is incredibly risky at this point in the game, because it can affect all kins of third-party code that is loaded in our process in rather unpredictable ways. This is especially true because allocator errors typically don't show up as recognizable signatures in crash-stats, but as memory corruption errors that crash at unpredictable points later.
Comment 115 Paul Biggar 2010-11-21 16:27:19 PST
(In reply to comment #114)
> I am opposed to this feature landing in FF4: switching allocators is incredibly
> risky at this point in the game

jemalloc has been used on the other tier 1 platforms for two years, which I think means it is used by 90%-ish percent of our users. If anything, this reduces the error paths because we now use the same allocators on all platforms.

Describing it as 'switching allocators' greatly overstates the risk; I would describe it as "removing the less rigorously-tested allocation paths". 


> because it can affect all kind of third-party
> code that is loaded in our process in rather unpredictable ways.

From a safety perspective, memory is memory unless we're writing outside it's bounds. If we are somehow relying on writing outside the bounds of memory at some point, it's less risky to do it on the same allocator on all platforms.

Also, the jemalloc functions which take addresses (free(), realloc(), etc), happily accept any address, check whether it belongs to jemalloc, and pass it on to the system allocator otherwise. This makes it pretty robust in the face of "all kinds of third party code".


> especially true because allocator errors typically don't show up as
> recognizable signatures in crash-stats, but as memory corruption errors that
> crash at unpredictable points later.

I haven't been around long, but I do not believe this has been a real problem with our use of jemalloc on any platform. jemalloc is safe and stable and this doesn't change that.


I think it's pretty clear that there is a large gain here in terms of usability (as evidenced by limi pushing for this), and the risk is not nearly as significant as you think. That said, if there are unexplained crashes, I'd be the first to push for backing-out the jemalloc changeset.
Comment 116 timeless 2010-11-22 02:33:18 PST
allocator mixing interaction is only interesting when you interact with foreign libraries. such libraries tend to be platform specific.

as such claiming that you're unifying codepaths ignores the interesting case. you're switching from a system whose codepaths with foreign libraries work to a system which happens to work with unrelated foreign libraries and is untested with the class of potential foreign libraries.

i'm not personally expressing an opinion in favor/opposed, but I do agree that it isn't just "a safe switch".
Comment 117 Paul Biggar 2010-11-22 03:43:03 PST
(In reply to comment #116)
> you're switching from a system whose codepaths with foreign libraries work to a
> system which happens to work with unrelated foreign libraries and is untested
> with the class of potential foreign libraries.

I tried to explain this above. On the Mac, we know we will be interacting with foreign libraries, and will be asked to free memory which we did not allocate. As such, we check whether jemalloc owns memory before it frees it, and gives it back to the system (de)allocator if it does not. So those foreign libraries are not a risk.


No change is a "safe switch", but I am just pointing out that the switch to jemalloc is not remotely as unsafe as one would think by saying "switching allocators".
Comment 118 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2010-11-22 05:56:15 PST
Things may need to work in the other direction as well, though. If we allocate memory using jemalloc and then pass it to a library which frees it using non-jemalloc free, might that cause problems?

I'm not talking about the code-correctness of jemalloc. I'm talking about the interactions between jemalloc and existing code.

The point about crash-stats is that we cannot easily identify problems after this code has landed. They may show up simply as somewhat additional crashiness, without identifiable signatures. Given that we are past feature freeze and we are trying to reduce the risk from optional changes, I don't think it makes any sense to add risk here. Land it for FF5 and we can get a nice long beta cycle to notice and fix issues.
Comment 119 Robert Sayre 2010-11-22 15:26:08 PST
(In reply to comment #113)
> 
> The patch seems complete and working, and I've been running the tryserver
> builds for a few days without any issues.

Do you see any improvements? My conversations with Paul have lead me to believe that this patch doesn't release memory to the OS, so it might not appear to be different.
Comment 120 timeless 2010-11-22 15:43:24 PST
assuming 10.7 comes out with a different malloc behavior and we ship ff4 with this code, are we likely to crash on launch or do something similar?

the code in szone2ozone leads me to think you're assuming that 10.7+ will be compatible w/ 10.6.

Given that 10.6 was not compatible with 10.5 in this case, I think that style assumption is faulty.

As such this code seems like it's asking for trouble in 10.7 or 10.8 or whatever.

This is a *terrible* approach for something which will ship and be deployed for ages. People are likely to upgrade OS X only to discover our browser doesn't work, and we probably wouldn't be able to upgrade it because we'd crash too early.

It seems safer to fall back to not using jemalloc for malloc versions we don't recognize. But I'm not sure what other assumptions our code tries to make if we think we're built with jemalloc.
Comment 121 Paul Biggar 2010-11-22 15:48:01 PST
(In reply to comment #119)
> My conversations with Paul have lead me to believe
> that this patch doesn't release memory to the OS, so it might not appear to be
> different.


It does release memory to the OS. I was trying to get across a different point, which is that _it can appear_ to use more memory, if you don't know what you're looking at (which is likely, because the correct thing to look at is confusing).

Here is the relevant quote from comment 101:

> jemalloc looks much worst in the "closed tabs" segment of the test. It is much
> slower to return memory, and appears to hold on to 40-60% more memory than the
> mac allocator. However, it happily hands it back when the OS asks.
Comment 122 Paul Biggar 2010-11-22 15:52:23 PST
(In reply to comment #120)
> It seems safer to fall back to not using jemalloc for malloc versions we don't
> recognize. But I'm not sure what other assumptions our code tries to make if we
> think we're built with jemalloc.

Yes, you're right, great catch. I'll add a check for recognized versions (3 and 6), and we won't use jemalloc if we don't recognize the version. Thanks!

(I should point out this doesn't affect the "should this ship in FF4 question").
Comment 123 timeless 2010-11-23 03:00:41 PST
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

I'm going to record my r- since it's actually meaningful.

note that having actually read the code and the way it hooks, with the change promised in comment 122, this might be ok.

the idea is that jemalloc registers with the zone allocator system, so when another library calls free() and friends, it will eventually be directed back to our jemalloc if it was allocated by our jemalloc.

other interesting cases:
* what happens when another interesting zone allocator is brought in by another library.

Probably the simplest version of that question is:
suppose i use a comment 122 based jemalloc for a library and it gets loaded into a comment 122 based jemalloc application on 10.6, will they coexist peacefully?

I haven't spent the time to ensure that nothing interesting would happen for a case like this.

The other variations of this probably involve broken zone impls. As long as we have some provision to handle that...

* is it possible for some library to assume something about how the allocator works which jemalloc doesn't do? (the assumption can be invalid, but that doesn't prevent shipping code from having it) historically some allocators had different alignment behavior (for fun, people can read all the interesting hacks ms has for msvcrt compat)
- i believe that jemalloc isn't evil enough to give out pointers with the lowest bit set, so I can't think of any basic cases offhand
- the easiest way for this stuff to be relevant is for a plugin which calls NPN_Alloc a couple of times in a row, since it controls the cpu of the "main" thread, it could be coded with stupid assumptions (please don't claim plugins don't make stupid assumptions, we have an entire product tracking plugin bugs).

One way of ameliorating these risks would be something like -safe-mode which turns off jemalloc -- this needs to be possible w/o asking the user to use a Terminal, we're talking about Mac users :)
Comment 124 Paul Biggar 2010-11-23 05:33:29 PST
(In reply to comment #123)
> I'm going to record my r- since it's actually meaningful.

> note that having actually read the code and the way it hooks, with the change
> promised in comment 122, this might be ok.

Sure, I'll fix this up and resubmit.


> other interesting cases:
> * what happens when another interesting zone allocator is brought in by another
> library.

You could make this work, but it wouldn't out of the box. Right now, it would fail hard. Basically, we don't support this, nor should we.


> Probably the simplest version of that question is:
> suppose i use a comment 122 based jemalloc for a library and it gets loaded
> into a comment 122 based jemalloc application on 10.6, will they coexist
> peacefully?

No, they won't. I believe it would fail hard. As above, I don't know a good reason to support this.



> The other variations of this probably involve broken zone impls. As long as we
> have some provision to handle that...

I'm not sure what you mean? Basically, it works on 10.5 and 10.6, and I'll make sure it isn't loaded on other versions.

 
> * is it possible for some library to assume something about how the allocator
> works which jemalloc doesn't do? (the assumption can be invalid, but that
> doesn't prevent shipping code from having it) historically some allocators had
> different alignment behavior (for fun, people can read all the interesting
> hacks ms has for msvcrt compat)

Of course, libraries can assume anything. The most dangerous is that its possible to write outside the bounds of some allocated memory, which will fail differently with jemalloc than with the mac allocator. This doesn't bother us because it will now work the same on mac as other platforms (which already use jemalloc). In the case where a Mac-only extension relies on some invalid assumption about the 10.5 or 106. allocator, I don't think we should support this.


> - the easiest way for this stuff to be relevant is for a plugin which calls
> NPN_Alloc a couple of times in a row, since it controls the cpu of the "main"
> thread, it could be coded with stupid assumptions (please don't claim plugins
> don't make stupid assumptions, we have an entire product tracking plugin bugs).

Can you go into more detail here? I'm not sure what you're getting at.

 
> One way of ameliorating these risks would be something like -safe-mode which
> turns off jemalloc -- this needs to be possible w/o asking the user to use a
> Terminal, we're talking about Mac users :)

I think we would need something more concrete in order to support this. I think this is too speculative otherwise. Bear in mind we already use jemalloc on the other tier 1 platforms, without checking for these assumptions. At the very least, adding support for this should move to a different bug.
Comment 125 Paul Biggar 2010-11-23 05:38:36 PST
(In reply to comment #118)
> Things may need to work in the other direction as well, though. If we allocate
> memory using jemalloc and then pass it to a library which frees it using
> non-jemalloc free, might that cause problems?

We overwrite the memory allocation structure (the 'default memory zone") during
initialization. This means that all libraries use the jemalloc allocator from
then on. The problem is that some memory is allocated _before_ we can overwrite
the default memory zone, and are freed later.

What also happens is that some Darwin libraries call the darwin allocators
directly. Then they need to be freed by the darwin libraries, so jemalloc
passes it back. Now in theory, we _could_ pass some memory to a Darwin library
which we have allocated using jemalloc, which then decided to free it. I don't
think this is really realistic because it would fail hard (and we would likely
find it), and because that would be one incredibly misbehaved library. Finally,
if that did happen, I think it's safe to say we would segfault immediately, and
so this would be straightforward to find.

So I think the problem is largely theoretical, and not a real risk.


> I'm not talking about the code-correctness of jemalloc. I'm talking about the
> interactions between jemalloc and existing code.

Sure. I simply wanted to point out that when people think "a new allocator,
that's risky", the usual rules don't apply since we use it everywhere else
already.


> Given that we are past feature
> freeze and we are trying to reduce the risk from optional changes, I don't
> think it makes any sense to add risk here.

I hear you, and disagree with the level of risk. I wouldn't call this a new feature either, it's activating two-year old code which is already used by over 90% of our users.
Comment 126 Paul Biggar 2010-11-23 07:45:34 PST
(In reply to comment #118)
> I'm not talking about the code-correctness of jemalloc. I'm talking about the
> interactions between jemalloc and existing code.

FWIW, the interactions are very simple code, very short and  very easy to understand.
Comment 127 timeless 2010-12-21 05:09:59 PST
the npn_alloc case would probably be something like:

int *array[4];
int *align = (int*)NPN_Alloc(16);
if (!(align & 16)) {
  array[0] = align;
  array[1] = (int*)NPN_Alloc(16);
  array[2] = (int*)NPN_Alloc(16);
  array[3] = (int*)NPN_Alloc(16);

  array[0][20] = 257;
}

if code "knows" that alloc from a given bucket with a certain offset will be returned in a certain sequence. Yes this is very contrived. But having seen people do things like:

int method(Foopy* fop) {return fop[1].x; }

struct Barp {
 Foopy element1;
 Foopy element2;
};

Barp bap;
method(&bap.element1);

... (and defending it as reasonable practice), I wouldn't be surprised if someone did something that was similarly wonky using malloc.

The thing about plugins is that different platforms have different sets of coders. Windows has always had multiple c runtimes, which means users suffered from bad plugin authors over a decade ago, and most of them were weeded out a decade ago (but not all, we still hit this stuff occasionally). Linux for the most part doesn't have much experience with allocator switches, but the cases I can point to are pure failure (mostly IMEs with bad linkage to libc5 or similar).

I'm not sure how much experience vendors for Apple have with allocator switching, my guess is very little.

Note that this comment is merely an answer to the question in comment 124. As long as comment 122 is addressed, I'll be relatively happy (and won't stand in its way). Yes, I'd like to be able to have a way to turn it off in case of emergency, but that's not my call.
Comment 128 Paul Biggar 2010-12-21 20:33:09 PST
(In reply to comment #127)
> ... (and defending it as reasonable practice), I wouldn't be surprised if
> someone did something that was similarly wonky using malloc.

> I'm not sure how much experience vendors for Apple have with allocator
> switching, my guess is very little.

I understand what you're saying, but I don't think an impractical, non-specific (though definitely theoretically possible) potential bug is worth considering. If there were real plugins with this problem, it might be worth evaluating on the merits of the concrete case.

Also, I'm not sure how the plugin stuff works, but I don't know if it would use jemalloc anyway.
Comment 129 Paul Biggar 2010-12-22 18:39:14 PST
Created attachment 499457 [details] [diff] [review]
jemalloc for Mac (but only in known versions)

This no longer assumes that new versions of the zone allocator will be compatible with version 6 (the one that ships with OSX).
Comment 130 Paul Biggar 2010-12-22 18:40:25 PST
This bug is largely done, but I'm going to wait until the start of the FF5 cycle to get it into the tree, due to the risk that bsmedberg points out above.
Comment 131 Paul Biggar 2011-03-23 19:38:07 PDT
Created attachment 521385 [details] [diff] [review]
Proposed final patch

The changes since the last review are pretty minor, but it's been a while, so I just want to get another look.

I intend to commit this in the next few days, assuming that tryserver comes back clean. The plan is to commit in two parts:

Step 1: all of the code, except disabled by omitting MOZ_MEMORY=1 in configure.in. This won't fail, because all the code will still be #defined out. If it does fail, something is wrong, and the backout strategy is revert.

Step 2: Enable jemalloc on mac by adding MOZ_MEMORY=1 in configure.in. This actually turns the code on, and backout strategy is to simply remove the MOZ_MEMORY line, so it's pretty trivial.
Comment 132 Paul Biggar 2011-03-23 19:41:33 PDT
Comment on attachment 521385 [details] [diff] [review]
Proposed final patch

My bad, this needs work. The patch still applied, but it doesn't compile. I'll update after it passes tryserver.
Comment 133 Nicholas Nethercote [:njn] 2011-05-13 00:13:13 PDT
pbiggar, what's the status of this?
Comment 134 Dirkjan Ochtman (:djc) 2011-05-13 00:43:54 PDT
njn, did you see the dependency on bug 651952?
Comment 135 Paul Biggar 2011-05-21 21:36:36 PDT
Created attachment 534271 [details] [diff] [review]
Final 32-bit bug licked

This will hopefully fix the final bug. We'll see on tryserver for sure: http://tbpl.mozilla.org/?tree=Try&rev=5dd3a5f7fbb0

So the status here is that I couldn't figure out the final bug, and gave up after a long time looking at it. I spun it out into a mentored bug (bug 651952) and Dale Kim picked this up, found the bug (a typo! oh the shame), and confirmed it works. Tryserver will tell us if something else has snuck in.

It's only slightly different from the last patch pavlov and jasone reviewed, but I want to get this checked just in case.

I'd love to get a really fast review so that this makes Firefox 7!
Comment 136 Paul Biggar 2011-05-21 21:38:31 PDT
Jesse, you mentioned you had a OSX 10.7 machine? If so, can you check this doesn't cause problems?
Comment 137 Paul Biggar 2011-05-22 16:05:51 PDT
Created attachment 534332 [details] [diff] [review]
With fixed visiblility behaviour

OK, this is the final version I want to push. Dale had ripped out some visibility things that were confusing, this puts them back in (turned off for Mac, as it's a non-ELF platform).

Tryservered just in case: http://tbpl.mozilla.org/?tree=Try&rev=9a4a899ec661
Comment 138 Paul Biggar 2011-05-23 08:16:02 PDT
Some chatter on IRC pointed out that it was better to wait til after the branch before trying to get this in, so that it has much more time on nightlies. Let's do that. Firefox 7 ftw.
Comment 139 Paul Biggar 2011-05-23 08:41:04 PDT
Comment on attachment 534332 [details] [diff] [review]
With fixed visiblility behaviour

Jesse suggested adding an environmental variable to simulate an unsupported version of the OSX allocator. I implemented this, and naturally discovered that our patch doesn't work in this case.

To remind casual readers of context:

- OSX's implementation of malloc looks up the malloc function in a malloc_zone_t struct.
- During Firefox initialization, we overwrite this struct with jemalloc function pointers.
- malloc_zone_t has a version field, and so we only actually know its size and structure for version 3 (OSX 10.5) and version 6 (OSX 10.6)
- we added a check that only overwrites on those known versions

The problem is that we still allocate memory using jemalloc because moz_malloc calls je_malloc/je_calloc/etc directly. If this memory is passed to OSX's realloc, then we segfault.

There are two potential solutions:

  1) add a(nother) layer of indirection to moz_malloc. This would be resilient to future unknown version of OSX, probably.

  2) Remove the check, and unconditionally do the overwriting.

I actually prefer the latter. I don't want yet another layer of indirection, and the code is nasty enough as it is. I don't think we should be turning it in unmaintainable knots to support operating systems not yet available, announced, or possibly even written. Plus, even when 10.8 comes out, if they don't change the interface, it might work anyway.

So I plan on just removing the dynamic version check, and just unconditionally overwrite malloc_zone_t, unless there are objections? Jesse has OSX 10.7, so we'll make that work first of course.
Comment 140 Paul Biggar 2011-05-25 03:43:37 PDT
Jesse reports that the binaries from https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pbiggar@mozilla.com-68afd94effb8/ (latest try, all green) work fine on Lion, but that they don't seem to use jemalloc. Their zone->version is 8. This is a bit weird, so looking into it.
Comment 141 Paul Biggar 2011-05-26 07:21:03 PDT
Created attachment 535331 [details] [diff] [review]
final version

This is the final version. It's been tryservered (all green), and jesse confirmed that the built versions dont crash on Lion. I'd love to get this in early in the cycle so it gets as much Nightly time as possible.

This patch is on top of bug 659694.
Comment 142 Jesse Ruderman 2011-05-26 17:20:00 PDT
> This patch is on top of bug 659694.

Correction: the patch is on top of bug 659632 (which landed today).
Comment 143 Jesse Ruderman 2011-05-26 20:39:17 PDT
> Jesse confirmed that the built versions don't crash on Lion

I think I was testing a version that didn't actually use jemalloc (even on Snow Leopard) for some reason.

When I apply this patch locally on Lion, I get a crash.  See my comment in bug 659694.

When I apply this patch locally on Snow Leopard, it works. about:memory shows different types of measurements than a non-jemalloc build.
Comment 144 Paul Biggar 2011-05-27 07:33:35 PDT
Jesse, can you try the Snow Leopard compiled version on Lion (since that's what we'll be shipping)?
Comment 145 Jesse Ruderman 2011-05-27 12:31:45 PDT
At least part of the confusion was due to Tinderbox debug builds having --enable-trace-malloc [1], which disables jemalloc [2].

[1] http://hg.mozilla.org/build/buildbot-configs/file/f56bf2ed364d/mozilla2/macosx64/mozilla-central/debug/mozconfig
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=417066#c3

I submitted http://tbpl.mozilla.org/?tree=Try&rev=d6c02308ba1b to test without trace-malloc.
Comment 146 Jesse Ruderman 2011-05-27 15:57:56 PDT
10.6: works, but warns:

2011-05-27 15:52:44.504 firefox-bin[22966:61b] *** __NSAutoreleaseNoPool(): Object 0x103660700 of class NSCFString autoreleased with no pool in place - just leaking

10.7: crashes on startup.

Same results with everything I tested (opt32, opt64, debug64).
Comment 147 Ted Mielczarek [:ted.mielczarek] 2011-05-31 05:41:46 PDT
This is a little scary if OS X changes can cause our builds to crash on startup. I think we need a safe fallback scenario for this or we can't reasonably ship it. (Just fixing the 10.7 scenario doesn't necessarily help, because it could break again with 10.8.)
Comment 148 Paul Biggar 2011-05-31 09:25:13 PDT
(In reply to comment #147)
> This is a little scary if OS X changes can cause our builds to crash on
> startup. I think we need a safe fallback scenario for this or we can't
> reasonably ship it.

A safe fallback means that each memory allocation needs one more branch. Not worth it IMHO for an OS that won't arrive for 2-3 years.

> (Just fixing the 10.7 scenario doesn't necessarily help,
> because it could break again with 10.8.)

I disagree with holding back a 15-20% memory improvement because it will break on an OS whose predecessor isn't even released.
Comment 149 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2011-05-31 09:32:54 PDT
In general, using OS internals which are not part of a stable API is just a bad idea. If we're going to trade performance now for future instability later, I'd really like to have that discussion in mozilla.dev.platform, not here. As it stands, I don't think anyone should grant superreview to this sort of patch.
Comment 150 Ted Mielczarek [:ted.mielczarek] 2011-05-31 09:35:03 PDT
Consider that if we had shipped this in Firefox 4, everyone who upgraded to 10.7 would find that their Firefox crashed on startup.
Comment 151 Justin Lebar (not reading bugmail) 2011-05-31 09:44:01 PDT
> A safe fallback means that each memory allocation needs one more branch. Not 
> worth it IMHO for an OS that won't arrive for 2-3 years.

Presumably if we could ensure that this branch was always predicted to take the jemalloc case, the instruction would be pretty cheap.

This might be as simple as ensuring that the jemalloc case lines up with the processor's static branch prediction assumptions [1].

[1] http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts/
Comment 152 Paul Biggar 2011-05-31 09:45:26 PDT
(In reply to comment #150)
> Consider that if we had shipped this in Firefox 4, everyone who upgraded to
> 10.7 would find that their Firefox crashed on startup.

Yeah, good point. Sigh.

I'll see if I can figure out a better solution than what we're using now, and otherwise I'll see what the slowdown of the extra branch is.
Comment 153 Boris Zbarsky [:bz] 2011-05-31 09:50:19 PDT
> Consider that if we had shipped this in Firefox 4, everyone who upgraded to
> 10.7 would find that their Firefox crashed on startup.

And if we hadn't, they would have discovered that all sorts of other stuff is broken after it starts up, no?  That tends to be how Apple OS releases work...

If we can avoid the crashing, great.  But I think that at least our test suites should fail on Mac when jemalloc is not being used, so we _know_ when that happens.
Comment 154 Ryan VanderMeulen [:RyanVM] 2011-05-31 09:55:57 PDT
(In reply to comment #150)
> Consider that if we had shipped this in Firefox 4, everyone who upgraded to
> 10.7 would find that their Firefox crashed on startup.

Is a 3 month release cycle insufficient time to find and fix issues for an upcoming OSX release prior to its release? It seems that Apple gives more than that amount of time for developers to work with pre-release versions of an upcoming release prior to launch.
Comment 155 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2011-05-31 10:20:04 PDT
Apple is notoriously bad at making prerelease builds available in a way which allows us to run full tests and be confident of the results. Sometimes things show up in the final release that weren't in the previews, and the previews are only available to paid ADC members, which makes it very difficult to get widespread community testing.
Comment 156 Paul Biggar 2011-06-07 15:16:55 PDT
*** Bug 659694 has been marked as a duplicate of this bug. ***
Comment 157 Paul Biggar 2011-06-07 17:16:27 PDT
Created attachment 537917 [details] [diff] [review]
fixed on 10.7

This fixes the 10.7 problems. It had a different malloc_zone_t size (version 8), fixed with a simple extension of the same problem on 10.6. It also added memory protections to the memory used by default_zone. mprotect works there.

I'm leaving the review flag off so that lurkers can give it a once over before I ask pavlov.
Comment 158 Paul Biggar 2011-06-07 17:24:45 PDT
(In reply to comment #149)
> In general, using OS internals which are not part of a stable API is just a
> bad idea. If we're going to trade performance now for future instability
> later, I'd really like to have that discussion in mozilla.dev.platform, not
> here.

That's a good point, I'll start that discussion.
Comment 159 Paul Biggar 2011-06-13 19:04:26 PDT
Created attachment 539088 [details] [diff] [review]
Fixed on 10.7, 10.8 etc

This makes Mac builds work on future unknown versions of OSX. If the malloc_zone_t version is > 8 (it is 8 on OSX 10.7), then we have an incompatible version and can't use jemalloc. So we leave the zone allocator alone, meaning that normal allocations will continue to use default OSX allocator.

But we still have to redirect the mozalloc allocator, which calls jemalloc directly. This adds a dynamic branch to each jemalloc allocation which checks whether or not to use jemalloc. This branch is only added on 64bit OSX (32bit will always work with jemalloc). There are tons of branches in there anyway, so an extra perfectly (dynamically) predictable one wont hurt much.

Note that the Mac implementation of posix_memalign (which we'd be using under the new dynamic scheme) is faulty, so this works around that too.

Tryserver when dynamically using jemalloc: http://tbpl.mozilla.org/?tree=Try&rev=75d588494435
Tryserver when dynamically not using jemalloc: http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a
Comment 160 Paul Biggar 2011-06-13 19:39:01 PDT
(In reply to comment #158)
> (In reply to comment #149)
> > In general, using OS internals which are not part of a stable API is just a
> > bad idea. If we're going to trade performance now for future instability
> > later, I'd really like to have that discussion in mozilla.dev.platform, not
> > here.
> 
> That's a good point, I'll start that discussion.

When I went to write the email to dev-platform, I realized you guys were right, and so fixed it (comment 159).

Does anyone still want to me to post to dev-platform? I couldn't figure out a way to word the post which didn't boil down to "we've worked around some hacks, but our workarounds might fail".
Comment 161 Paul Biggar 2011-06-17 14:26:53 PDT
(In reply to comment #159)
> Tryserver when dynamically not using jemalloc:
> http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a

This dynamically checked that the malloc_zone_t version was >6 before turning jemalloc on. In the actual patch, we would use >8. With >6, jemalloc would not be used on 10.6. This led to a problem on 32-bit which would never occur in real life (details below). So I retried with >7 as the check, and manually tested that it works on OSX 10.7 Lion, which it does. I'll go over it with Jesse later to be certain.

Here's the try build: http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a. Please download it and use it, esp on OSX Lion.



+++++++++++++++++++++++++++++++++++++++++++
This means we have a working version of jemalloc on mac which satisfies all the problems discussed in this thread.
+++++++++++++++++++++++++++++++++++++++++++


Here was the problem:

OSX 10.6 uses malloc_zone_t version 6. This includes memalign. However, OSX 10.5 uses version 3, which does not have memalign. As a result, the memalign dynamic checks wouldn't compile on 10.5, so I turned them off statically.

So when we ran 32-bit firefox on 10.6, the dynamic check failed (6>=6), and malloc() tried to run the jemalloc code instead of calling the default OSX allocators. Crash!

So why can't that happen in real life? Because 32-bit will _always_ use jemalloc. Our dynamic condition is for malloc_zone_t version > 8 (8 is used on Lion). 32-bit only runs on 10.5 (version 3) and 10.6 (version 6).
Comment 162 Karl Tomlinson (:karlt) 2011-06-19 19:46:17 PDT
(In reply to comment #161)
> 32-bit only runs on 10.5 (version 3) and 10.6 (version 6).

I'm not clear whether or not this is still relevant,
but I thought Lion would still support 32-bit apps, in which case we'd be using 32-bit libxul, with jemalloc I assume, to run 32-bit plugins.
Comment 163 Paul Biggar 2011-06-19 20:12:27 PDT
Created attachment 540382 [details] [diff] [review]
Works on 10.7 and 10.8 (um, probably) too

And here's the patch. Since the last approved patch:

- mprotect fix on 10.7
- support jemalloc on OSX 10.7
- ability to dynamically turn jemalloc off on OSX 10.8 and beyond
- support dynamically turn off jemalloc using an envvar (NO_MAC_JEMALLOC).
- workaround a bug in posix_memalign on OSX (in non-jemalloc mode)
Comment 164 Paul Biggar 2011-06-19 20:23:33 PDT
(In reply to comment #162)
> (In reply to comment #161)
> > 32-bit only runs on 10.5 (version 3) and 10.6 (version 6).
> 
> I'm not clear whether or not this is still relevant,
> but I thought Lion would still support 32-bit apps, in which case we'd be
> using 32-bit libxul, with jemalloc I assume, to run 32-bit plugins.

Actually, this is a good point - jesse and I were discussing it on Friday. We don't support running 32-bit apps combined with dynamically turning jemalloc off. So 32-bit plugins will probably crash at load-time on 10.8 or later (I think we don't care, please correct me if anyone feels strongly otherwise). But they'll be fine on 10.7, since we will be using jemalloc.

(The questionable thing is what happens if libxul calls memalign on 32-bit, which is legit and expected. In that case, we'll be calling it on version 8 of the malloc_zone_t API, which supports memalign, so we'll be OK.)
Comment 165 Stuart Parmenter 2011-07-05 11:26:10 PDT
Comment on attachment 540382 [details] [diff] [review]
Works on 10.7 and 10.8 (um, probably) too

This looks good, my only real comment is please remove any unnecessary changes (whitespace, etc) to jemalloc.c --

i.e.:
-	ret->root = (void**)base_calloc(1, sizeof(void *) << ret->level2bits[0]);
+	ret->root = (void**)base_calloc(1, sizeof(void *) <<
+	    ret->level2bits[0]);

 zone_destroy(malloc_zone_t *zone)
 {
-
 	/* This function should never be called. */
Comment 167 Paul Biggar 2011-07-09 22:39:39 PDT
I'm about to back this out, due to bugs 670175, 670468 and 670492. I'm just waiting for the build to finish.
Comment 168 Paul Biggar 2011-07-09 23:43:06 PDT
Backed out: http://hg.mozilla.org/mozilla-central/rev/18c8c2ea4caf
Comment 169 Paul Biggar 2011-07-15 17:18:43 PDT
Created attachment 546260 [details] [diff] [review]
Decommit properly

This fixes all the problems reported against jemalloc on Mac:

- bug 670492 reported a 20% increase in memory usage on 10.5. The problem is that madvsise didn't appear to work on 10.5. Instead of using madvise, I used mmap to remap the heap segment as uncommited. The nice way of doing this is just enabling MALLOC_DECOMMIT, so that's what this is (same as bug 669877).

- bug 670175 is fixed by dynamically turning jemalloc off on 10.7. I'll work it out and fix it later (perhaps by upgrading to the newest version of jemalloc).

- bug 670468 is being fixed by thunderbird folks (the TB installer needs to know to install libjemalloc.dylib), so this actually doesn't do anything for them. I'm coordinating with them to find a good landing time.

Performance discussion to follow.
Comment 170 Paul Biggar 2011-07-15 17:55:22 PDT
Performance discussion of the patch in comment 169

Highlights:

  - SunSpider regression of 11%
  - V8 regression of 6%
  - RSS improvement of 44%


Details:

Speed - I did in-browser JS benchmarks (jemalloc/no-jemalloc):

  - SunSpider (258/235) - 11% regression
  - V8 (4667/4941)  - 6% regression
  - Kraken (5045/5129)  - 0% regression

The V8 and Kraken results were noisy, but it's clear there is some V8 regression.

(Note the Talos sunspider and v8 numbers say no change, but dmandelin says they're crap and should be ignored).


Memory - I loaded techcrunch.com, opened the top 11 of so stories in new tabs, then closed them. In each case, I measured (explicit/js/RSS):

 - 'explicit' from about:memory
 - 'js' from about:memory
 - 'Real memory' from the OSX Activity Monitor (this is roughly RSS).
 - I did a CC+GC after each one, and also measured that.

               jemalloc     vs     No jemalloc
Startup   : ( 47/ 24/106)   vs    ( 46/ 23/112)
CC+GC     : ( 51/ 24/102)   vs    ( 48/ 21/108)
Load tabs : (441/208/508)   vs    (398/200/500)
CC+GC     : (411/205/491)   vs    (368/200/471)  - 4% memory regression (noisy)
Close tabs: (225/136/360)   vs    (216/131/432)  - 20% memory improvement
CC+GC     : ( 77/ 38/188)   vs    ( 71/ 34/335)  - 44% memory improvement


The improvements are from measuring RSS (last column in parenetheses).


Talos Tp5 results:

I've included these results for thoroughness, which show a 16% improvement in RSS. I don't really believe them though, because Tp5 takes a snapshot every 20 seconds.

Without jemalloc:
    tp5: 380.27
    tp5_pbytes: 3.5GB
    tp5_rss: 273.9MB
    tp5_shutdown: 858.0

With jemalloc:
    tp5: 414.7
    tp5_pbytes: 3.5GB
    tp5_rss: 230.8MB
    tp5_shutdown: 1186.0




Test it yourself with these builds:

Build with jemalloc: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pbiggar@mozilla.com-933061c83fd2/try-macosx64/

Identical build except no jemalloc:   https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1310279832/

(Note that you can also turn off jmalloc by setting the NO_MAC_JEMALLOC environmental variable)


Talos results for the wary:

with jemalloc: http://tbpl.mozilla.org/?tree=Try&pusher=pbiggar@mozilla.com&rev=933061c83fd2
No jemalloc: http://tbpl.mozilla.org/?tree=Firefox&rev=18c8c2ea4caf
Comment 171 Boris Zbarsky [:bz] 2011-07-15 18:16:47 PDT
> Performance discussion of the patch in comment 169

So overall we have a 10% Tp5 regression on Mac64 (maybe; triggering a few more runs so you have some idea of the noise may not be a bad idea), wins on various Dromaeo DOM tests, no obvious change on Tsspider but you say you did see a regression in the browser?
Comment 172 Paul Biggar 2011-07-15 18:38:59 PDT
(In reply to comment #171)
> > Performance discussion of the patch in comment 169
> 
> So overall we have a 10% Tp5 regression on Mac64 (maybe; triggering a few
> more runs so you have some idea of the noise may not be a bad idea), wins on
> various Dromaeo DOM tests, no obvious change on Tsspider but you say you did
> see a regression in the browser?

I'm calling this a 16% improvement, not a 10% regression. The private bytes (414.7 vs 380.27) show a 10% regression, but the RSS shows a 16% improvement. I believe that RSS is the more accurate figure. jemalloc uses more memory internally (I think this is reflected in the Talos Private bytes, which takes internal figures), but RSS includes fragmentation (which is the whole point of jemalloc).


Tsspider is fiction AFAICT, I measured in the browser.
Comment 173 Boris Zbarsky [:bz] 2011-07-15 19:42:18 PDT
> I'm calling this a 16% improvement, not a 10% regression

The 380.28 vs 414.7 is not "private bytes".  It's the average pageload time for the page set, in milliseconds.  i.e. the main metric Tp5 is supposed to be measuring.  The "private bytes" numbers for those runs are both about 3.5GB, and I agree they're completely useless.

So we're talking 16% reduction in memory usage and 10% slowdown in pageload.  The former is good, but the latter is very bad.  ;)

The latter also seems a bit fishy, given the performance improvements in DOM manipulation and the like (which were expected given what I've seen in profiles wrt Mac malloc).  Hence my proposal that more Tp5 runs be triggered on those two changesets to see what the noise in that Tp5 number is.

> Tsspider is fiction AFAICT

File a bug if it really is, please (as opposed to just being the shell sunspider measurement).  But yes, I know sunspider in browser in shell can have totally different results; it's not that suprising.  I was just making sure I understood the data.
Comment 174 Paul Biggar 2011-07-15 22:38:43 PDT
I'm running a browser built on this, and the performance is atrocious. Everything is slow, and the profiles I'm pulling out all point to jemalloc (specifically arena_bin_malloc_hard). So the Talos pageload times are bang on, and this needs a different solution.
Comment 175 Boris Zbarsky [:bz] 2011-07-15 23:20:51 PDT
Intersting.  I wonder why the DOM tests show different results... maybe they hit free() more; the default mac allocator's free() is kinda slow.
Comment 176 Paul Biggar 2011-08-30 21:49:02 PDT
Created attachment 557087 [details] [diff] [review]
Build with jemalloc disabled on i386, but enabled on x86-64

This disables jemalloc dynamically on OSX 10.7 and greater, and statically on OSX 10.5. The latter part is the majority of this bug, where I needed to add a FAKE_MOZ_MEMORY build symbol to keep building libjemalloc.dylib to keep universal builds happy.
Comment 177 Paul Biggar 2011-08-30 21:50:15 PDT
Try server builds are (finally) green at http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=d10cc13ac69a.
Comment 178 Mike Hommey [:glandium] 2011-08-30 22:17:22 PDT
Comment on attachment 557087 [details] [diff] [review]
Build with jemalloc disabled on i386, but enabled on x86-64

Review of attachment 557087 [details] [diff] [review]:
-----------------------------------------------------------------

I have two problems with this patch:
- it conflicts with bug 680440
- disabling jemalloc on 10.5 by disabling it on i386 is not nice. There are i386 10.6 users. I am one. And since 10.7 is supposedly only 64-bits (I heard the finder is 64-bits only), I guess a lot of the people owning i386 macs (the first batch of intel macs) are going to stick to 10.6 for a while.
Comment 179 Paul Biggar 2011-08-30 22:47:17 PDT
(In reply to Mike Hommey [:glandium] from comment #178)

Thanks for the review Mike.

> - it conflicts with bug 680440
The conflicts with bug 680440 are superficial, AFAICT. Whichever one of us lands second can sort it out, but it shouldn't be hard, especially with the cleanups in your bug. This is pretty common, and I wouldn't think it's a good reason to r- this bug.

> - disabling jemalloc on 10.5 by disabling it on i386 is not nice. There are
> i386 10.6 users. I am one. And since 10.7 is supposedly only 64-bits (I
> heard the finder is 64-bits only), I guess a lot of the people owning i386
> macs (the first batch of intel macs) are going to stick to 10.6 for a while.

jemalloc isn't on for anybody right now. I would welcome a follow-on bug to make jemalloc work with the 10.5 SDK, but holding this up for the very very small subsection of Mac users who use i386 on 10.6 doesn't make much sense since:

- we dont even test for this on tinderbox (my suggestions to do so were not welcomed - most people weren't aware you could do this)
- it's hard to make jemalloc work on i386. Turning on MOZ_MEMORY makes the codebase assume that memalign is available (and I see no good reason to break this assumption).
Comment 180 Andreas Gal :gal 2011-08-30 22:50:59 PDT
I have to agree with Paul here. This patch is a strict improvement over the status quo. Lets file a follow-up for the i386 work. I am sure Paul will gladly own that bug as well until we fixed that, too.
Comment 181 Mike Hommey [:glandium] 2011-08-30 23:08:17 PDT
(In reply to Paul Biggar from comment #179)
> - it's hard to make jemalloc work on i386. Turning on MOZ_MEMORY makes the
> codebase assume that memalign is available (and I see no good reason to
> break this assumption).

Your patch doesn't mention memalign being a problem, but reading the bug backlog, I see it is.
Now, as I see it, your main problem is that you can't build without jemalloc on i386 and with jemalloc on x64 because the universal build merging won't be happy, right? Then bug 677501 will make that problem go away, as jemalloc will be folded in a mozutils library that will exist whether jemalloc is enabled or not. At which point your patch would only be a matter of enabling jemalloc on x64 only.
Comment 182 Paul Biggar 2011-08-30 23:22:32 PDT
Comment on attachment 557087 [details] [diff] [review]
Build with jemalloc disabled on i386, but enabled on x86-64

Yes, I think bug 677501 would make the problem go away. I'd prefer to push ahead with this though, since this is something that I'd like to land while we're still quite early in the release cycle (it's bounced twice already, after originally aiming for FF4, and it's a MemShrink:P1). Obviously, I'd be willing to update bug 677501 if you're kind enough to let this land first :)
Comment 183 Mike Hommey [:glandium] 2011-09-01 05:17:05 PDT
Created attachment 557466 [details] [diff] [review]
Build with jemalloc disabled on i386, but enabled on x86-64

Tested on try:
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=6817f9403b66
(Win debug red is obviously unrelated)
Comment 184 Paul Biggar 2011-09-01 09:36:50 PDT
Comment on attachment 557466 [details] [diff] [review]
Build with jemalloc disabled on i386, but enabled on x86-64

Thanks for updating the patch. It needs slightly more though, esp commens and such; I'll do it this morning.
Comment 185 Paul Biggar 2011-09-01 15:56:54 PDT
Created attachment 557680 [details] [diff] [review]
Rebase on top of glandium's patches

With glandium's recent changes, this patch is much simpler.
Comment 186 Paul Biggar 2011-09-01 15:57:49 PDT
(In reply to Paul Biggar from comment #185)
> Created attachment 557680 [details] [diff] [review]
> Rebase on top of glandium's patches

This is on top of build-system.
Comment 187 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-07 04:23:30 PDT
Comment on attachment 557680 [details] [diff] [review]
Rebase on top of glandium's patches

I was expecting something much scarier :-P
Comment 188 Mike Hommey [:glandium] 2011-09-08 00:03:52 PDT
Created attachment 559060 [details] [diff] [review]
Don't link sqlite with mozutils on OSX

This is the cause of the Mochitests oranges Paul was seeing on try.
Comment 189 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-08 14:05:26 PDT
Comment on attachment 559060 [details] [diff] [review]
Don't link sqlite with mozutils on OSX

Nice.
Comment 190 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-08 14:07:15 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #189)
> Comment on attachment 559060 [details] [diff] [review]
> Don't link sqlite with mozutils on OSX
> 
> Nice.

Also, I hope we don't have any mismatched allocators here!
Comment 191 Paul Biggar 2011-09-08 14:08:25 PDT
Testing glandium's patch with jemalloc here:

https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&tree=Try&rev=4c171cf2ed6e
Comment 192 Mike Hommey [:glandium] 2011-09-08 14:15:15 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #190)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #189)
> > Comment on attachment 559060 [details] [diff] [review]
> > Don't link sqlite with mozutils on OSX
> > 
> > Nice.
> 
> Also, I hope we don't have any mismatched allocators here!

I think it's exactly why it crashed in the first place: the NSS tools are not linked against jemalloc/mozutils, but sqlite, which is loaded by the softokn, which itself is dynamically loaded, initializes jemalloc, so all subsequent allocations go through jemalloc while allocations before loading the softokn have been done with the system malloc. At least that's what i think happened. Another explanation could be that unloading the softoken removes the jemalloc functions from the address space, while they are still registered as zone allocator on the system allocator.

An alternate solution would be to link all NSS libraries against jemalloc/mozutils but that more painful to do.
As jemalloc on mac goes through a step of registration, contrary to other platforms where we actually do active calls into the jemalloc lib, it's just fine that everything is not linked against jemalloc.
Comment 195 Justin Lebar (not reading bugmail) 2011-10-05 05:40:33 PDT
This is a major change which we should track for FF10.

I don't see a TP5-RSS change on 10.6, but perhaps one of our memory stress tests will show an improvement.
Comment 196 Dave Hunt (:davehunt) 2011-10-10 03:59:04 PDT
We have seen an increase in memory usage reported by the endurance tests on all platforms. Investigating in bug 693247. Is there a valid reason we would expect explicit and resident memory to increase after this change?
Comment 197 Justin Lebar (not reading bugmail) 2011-10-10 05:12:46 PDT
> We have seen an increase in memory usage reported by the endurance tests on all platforms. 

This change applies only to Mac 10.6.

Explicit might increase if jemalloc rounds up more aggressively than the mac allocator.

If explicit increases, that could cause a corresponding increase in RSS.  Paul and I also observed that Mac sometimes doesn't release memory that jemalloc released until some other application wants it, in bug 670492.
Comment 198 Nicholas Nethercote [:njn] 2011-11-20 19:27:38 PST
Hmm, moz_malloc_usable_size looks like this:

 size_t
 moz_malloc_usable_size(void *ptr)
 {
     if (!ptr)
         return 0; 

 #if defined(XP_MACOSX)
     return malloc_size(ptr);
 #elif defined(MOZ_MEMORY)
     return malloc_usable_size(ptr);
 #elif defined(XP_WIN)
     return _msize(ptr);
 #else
     return 0;
 #endif  
 }

Surely the MOZ_MEMORY case should precede the XP_MACOSX case?  How does that even work when jemalloc is enabled on Mac?  I would have thought that calling the Mac's malloc_size() on a heap block allocated with jemalloc would be a recipe for an instant crash.  I can't find anywhere that jemalloc somehow overrides malloc_size() with its own version -- MXR tells me that the above function is the only place in the entire codebase where malloc_size() appears.  I'm bamboozled.
Comment 199 Justin Lebar (not reading bugmail) 2011-11-20 19:48:00 PST
> How does that even work when jemalloc is enabled on Mac?

This is actually...not wrong, I think.  It may even be necessary.  It certainly deserves a comment.

jemalloc on mac overrides the default zone's malloc_size function.  See jemalloc's szone2ozone function:

  default_zone->size = (void*)ozone_size; //ozone_size is a function

So calling malloc_size() should eventually call our ozone_size function, if there's any sanity in this world.

It may in fact be necessary to call malloc_size rather than malloc_usable_size.  The mac jemalloc code claims that some objects are allocated with the mac default allocator before jemalloc loads up.  If we called malloc_usable_size on one of them, jemalloc would likely crash.
Comment 200 Paul Biggar 2011-11-20 19:58:22 PST
(In reply to Justin Lebar [:jlebar] from comment #199)
> > How does that even work when jemalloc is enabled on Mac?
> 
> This is actually...not wrong, I think.  It may even be necessary.  It
> certainly deserves a comment.

Yes, this is not wrong. I had the same thoughts as njn about four times, but jlebar is right (especially about the need for a comment).
Comment 201 Nicholas Nethercote [:njn] 2011-11-20 20:38:53 PST
> Yes, this is not wrong. I had the same thoughts as njn about four times, but
> jlebar is right (especially about the need for a comment).

I spun off bug 704045 for this.
Comment 202 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-28 11:49:54 PST
Marking qa- as nothing for QA to verify. Please set to qa+ if this is not the case.

Note You need to log in before you can comment on or make changes to this bug.