Closed Bug 414946 Opened 17 years ago Closed 13 years ago

Use jemalloc on mac

Categories

(Core :: Memory Allocator, defect)

x86
All
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla10
Tracking Status
firefox10 + fixed

People

(Reporter: sayrer, Assigned: paul.biggar)

References

Details

(Whiteboard: [MemShrink:P1][qa-])

Attachments

(9 files, 26 obsolete files)

4.44 KB, patch
Details | Diff | Splinter Review
257.09 KB, text/plain
Details
1.00 KB, text/plain
Details
60.82 KB, text/plain
Details
2.78 KB, text/plain
Details
263.54 KB, text/plain
Details
281.60 KB, text/plain
Details
6.62 KB, patch
khuey
: review+
Details | Diff | Splinter Review
987 bytes, patch
khuey
: review+
Details | Diff | Splinter Review
AIUI, this is going to work a bit differently than it does on windows. We'll do something like define malloc/free as mozMalloc/mozFree (along with the C++ new/delete operators). We'd do it this way in order to avoid the OS X zone malloc API, which can really slow things down in calls to free().
I understand this may not be a space win. If we can pick up some speed, it might be worth it.
Version: unspecified → Trunk
Component: General → jemalloc
QA Contact: general → jemalloc
Assignee: nobody → jasone
Can we bump this back up so we can use the new OOM work for 1.9.1?
Blocks: 427099
Sure, I'll take a look at it this week.
Depends on: 446096
Attached patch Use jemalloc on OS X (obsolete) — Splinter Review
Enable jemalloc on OS X, such that it replaces the default system zone. Adjust various tuning parameters to values that are appropriate across platforms.
Attachment #330314 - Flags: review?(pavlov)
Attachment #330314 - Flags: review?(pavlov) → review+
Flags: wanted1.9.1?
Flags: wanted1.9.1? → wanted1.9.1-
Why hasn't this been checked in?
We ran into some stability problems with the patch applied. The patch is now long obsolete, and inappropriate for checkin.
Assignee: jasone → joelr
Is there a Shark profile showing the slowdown from 'free' on the Mac?
I'm not able to figure out from this bug what precisely pointed the finger at the system allocator being the culprit on the Mac. Is there more background on the Mac side of this issue? Here's the discussion on the Darwin-Dev mailing list that I started: http://groups.google.com/group/darwin-dev/browse_thread/thread/ba19e06923e29aaa It appears that the 10.6 allocator is already quite optimized for heavy multithreading.
Rather than swap the memory allocator for an entirely different one, would it make sense to first try to allocate Javascript memory from a different malloc zone?
(In reply to comment #11) > Rather than swap the memory allocator for an entirely different one, would it > make sense to first try to allocate Javascript memory from a different malloc > zone? This seems like a fine suggestion, but we need jemalloc working on mac, even if it is not a performance win, because we may decide to lean on it more heavily to manage our JS heap. I wouldn't mind examining both avenues.
I apologize for missing a large chunk of research here but -why- do we need jemalloc on the mac to manage our JS heap? Please feel free to point me to other bugs where this is explained.
(In reply to comment #13)
> I apologize for missing a large chunk of research here but -why- do we need
> jemalloc on the mac to manage our JS heap? Please feel free to point me to
> other bugs where this is explained.

We don't need it currently, it's just one idea we might explore as we evolve the GC. Bug 560818 is one example. Further in the future, we may want to reuse a large part of jemalloc if we move to a copying collector. Also, we've seen plenty of differences in behavior between the system allocator and jemalloc on Linux and Windows, and it will help us diagnose performance problems if we can compare the behavior of the system allocator to jemalloc. The bottom line is that we need jemalloc to work on Mac, even if we don't end up using it.
Is there a test or sequence of steps in Firefox that I can use to benchmark memory allocators? I'm looking for a page to open, JS script to run, etc.
Oh no, benchmarking. SunSpider and V8 are two "industry-standard" benchmarks the major browser JS implementors are tuning for, or at least worrying about due to bench-marketing. They're fairly bogus based on real-world VM-logging studies done by Jan Vitek and his students at Purdue. But we are stuck with them for now.

Another issue besides perf is fragmentation. Stuart had some visualizations and a test harness (IIRC schrep helped on it) up on his blog in the Firefox 3 beta era, showing victory via jemalloc, and better behavior than other browsers. We need this kind of test infra running 24x7. Is it yet?

/be
Let's just get us using jemalloc everywhere on mac.
Blocks: 584446
Another reason (whether it'll help or not, I don't know): https://bugzilla.mozilla.org/show_bug.cgi?id=584446
Blocks: 586962
Whiteboard: [needs new patch]
blocking2.0: --- → ?
joelr: I've started looking at this now, let me know if you're still working on it.
blocking2.0: ? → ---
Quoting from pavlov in bug 580404:

<quote>
It should be pretty trivial. Most of the code runs, the biggest issue is just how to hook it up to the system. We've got different ways we use on each OS. On Windows, we replace the CRT which works great. On Linux, we can just build a dynamic library, link to it, and the dynamic linker takes over. On Mac, we wrote code to hook in to the OS's zone system. Check out:

http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#6380

Basically on the Mac, there is this zone system, which is basically just a list of things that can allocate. It has some weird characteristics though, such as free() walks through each zone and asks if the zone owns the object and if so then calls the zone's free function. This means that there are a number of bugs in MacOS where the OS double frees things, because if nothing owns it, nothing frees it! We optimized that a bit, so that we could check as well.

Because the Mac dynamic linker doesn't work the same as Linux, it is difficult to get our code to load fast enough to replace the default zone before other allocations happen. We ended up using the linker trick here:

http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/Makefile.in#141

to get it to happen early, but it wasn't quite perfect.

If you can get the stuff above to work well, it would probably be best, but alternatively you could use #defines or see if GCC's wrap stuff works (like we're using on Android) to just overwrite allocations throughout all of Mozilla's code and let the OS/system libraries allocate things in its own space.

Anyways, I don't think any of this would be a ton of work. Maybe a week of trying a few different things and running it through our unit tests? We had jemalloc running on Mac, so at worst you're just looking at a little bitrot of ifdefs in the code.
</quote>
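(For readers unfamiliar with the zone mechanism pavlov describes, here is a minimal sketch of the ownership check the "overlay zone" approach relies on. The function and variable names below are illustrative stand-ins, not the actual identifiers in memory/jemalloc/jemalloc.c.)

#include <stdbool.h>
#include <stddef.h>
#include <malloc/malloc.h>

/* Hypothetical helpers -- stand-ins for jemalloc's internal routines. */
extern bool jemalloc_owns(void *ptr);
extern void jemalloc_internal_free(void *ptr);

static malloc_zone_t *system_szone;  /* saved before we take over the default zone */

/* free() for our replacement zone: only free what we allocated, and hand
 * anything else back to the original system zone, so allocations made
 * before we hooked in (or by foreign code) are still released safely. */
static void ozone_free(malloc_zone_t *zone, void *ptr)
{
    if (ptr == NULL)
        return;
    if (jemalloc_owns(ptr))
        jemalloc_internal_free(ptr);
    else
        system_szone->free(system_szone, ptr);
}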
Attached patch Build on OS X (obsolete) — Splinter Review
This makes Firefox build again on Mac. It's not tested to see if it will even run. It's basically the old patch, provided above by jasone, with a few small build system hacks, and a fix for the bitrot.
Assignee: joelr → pbiggar
Attachment #330314 - Attachment is obsolete: true
Attachment #466996 - Flags: feedback?(jasone)
jemalloc doesn't build as a library on the Mac, because PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is not defined. Apparently, OS X doesn't have the right kind of mutexes, so this is non-trivial to fix. I'm guessing that I can work around this by using NSPR, so I'm going to try importing directly into the Mozilla sources (bug 580408) first.
PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is a Linux-specific performance hack. We just need to be able to statically initialize the mutex, which should still be possible on OS X.
The pthreads spec says this should work: pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
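(A minimal sketch of that fallback, assuming jemalloc wraps the initializer behind a macro of its own; the macro name here is illustrative, not the one actually used in the tree.)

#include <pthread.h>

/* Use the Linux-only adaptive initializer where it exists, and fall back
 * to the standard static initializer everywhere else (e.g. OS X). */
#if defined(PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP)
#  define MALLOC_MUTEX_INITIALIZER PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
#else
#  define MALLOC_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
#endif

static pthread_mutex_t init_lock = MALLOC_MUTEX_INITIALIZER;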
Coming from a different angle, this makes the standalone jemalloc (from git://canonware.com/jemalloc.git) compile on Mac. There aren't any tests, so I don't know whether this works. Not compiled on any other platform.
Attachment #468335 - Flags: feedback?(jasone)
Comment on attachment 468335 [details] [diff] [review] Port standalone jemalloc for Mac. Wow, I'm surprised it took so little to get the source to compile. The patch looks good to me, other than that we should use PTHREAD_MUTEX_DEFAULT rather than PTHREAD_MUTEX_RECURSIVE. There's also quite a bit of zone-related code that needs to be integrated for jemalloc to actually do anything useful on OS X. First though I want to get the code that's already in mozilla-central working.
Attachment #468335 - Flags: feedback?(jasone) → feedback+
(In reply to comment #27)
> Wow, I'm surprised it took so little to get the source to compile. The patch
> looks good to me, other than that we should use PTHREAD_MUTEX_DEFAULT rather
> than PTHREAD_MUTEX_RECURSIVE.

Is there any way to test this stuff? I feel like I'm flying a bit blind.
Use normal lock.
Attachment #468335 - Attachment is obsolete: true
(In reply to comment #28) > I assume the zone-related code is something like this: > http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_mac.mm?annotate=51407#l610 Actually, memory/jemalloc/jemalloc.c already has the zone-related code, but I haven't integrated it with the stand-alone jemalloc yet.
Any chance of a tryserver or plain binary build for the Mac? (32-bit or 64-bit) I'm happy to run it through my standard torture test over the next few days to see if it helps with memory fragmentation and release.
I just got Firefox+jemalloc building and running on OS X a couple of hours ago. Once I'm a bit more confident about the changes I'll attach a patch for review.
This patch fixes the build on OS X when --enable-jemalloc is specified (still not enabled by default). I built/ran Firefox on OS X and Linux with this patch applied, with no apparent issues. I know very little about the current memory fragmentation problems on OS X; can someone fill me in on the problems? Are the problems specific to OS X 10.6? JavaScript?
Attachment #466996 - Attachment is obsolete: true
Attachment #470247 - Flags: review?(pbiggar)
Attachment #466996 - Flags: feedback?(jasone)
This updated patch incorporates/updates the jemalloc_dzone code from the older patches associated with this bug, but leaves the malloc_zones order alone. The result is that nearly all memory allocation is performed via jemalloc, but the odd allocation that came from the system szone is still safely handled. I tortured a debug Firefox with this patch incorporated, and saw no stability issues over the course of 45 minutes. For lack of an easy alternative, I used sunspider to compare speed and memory usage. Memory usage decreased by ~4%, and speed decreased by ~2%.
Attachment #470247 - Attachment is obsolete: true
Attachment #470361 - Flags: review?(pbiggar)
Attachment #470247 - Flags: review?(pbiggar)
Comment on attachment 470361 [details] [diff] [review] Fix OS X build when using --enable-jemalloc I'm afraid I don't know the code well enough to review this yet. Stuart?
Attachment #470361 - Flags: review?(pbiggar) → review?(pavlov)
(In reply to comment #34)
> I know very little about the current memory fragmentation problems on OS X; can
> someone fill me in on the problems? Are the problems specific to OS X 10.6?
> JavaScript?

The fragmentation problems are documented in bug 584446 (check out the attachments). (Bug 584446 mentions tagged pointers, which should be gone when fatvals lands tomorrow I think, on the tracemonkey branch).
(In reply to comment #35) > Created attachment 470361 [details] [diff] [review] > Fix OS X build when using --enable-jemalloc I pushed this to tryserver, and it doesn't build on most platforms (http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry, search for e41b94886585). I presume this was known. To test, I'd really like to get a working version to :limi. Can you push to tryserver when working?
(In reply to comment #38) > I pushed this to tryserver, and it doesn't build on most platforms > (http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry, search for > e41b94886585). I presume this was known. Nope I didn't know about the build problems because I had to get my hg account re-enabled yesterday (now taken care of). It looks to me like only 2 of the ~14 platforms suffered a build failure for your tryserver push... Am I looking at the wrong info?
(In reply to comment #39) > Nope I didn't know about the build problems because I had to get my hg account > re-enabled yesterday (now taken care of). It looks to me like only 2 of the > ~14 platforms suffered a build failure for your tryserver push... Am I looking > at the wrong info? No, you're looking at the right info, my mistake (when I wrote the comment only those two had failed, so "most" meant windows and mac).
Note that most of those "builds" are actually just test runs. If your build failed on Windows and Mac, then the only thing it succeeded in building was Linux.
(In reply to comment #41) > Note that most of those "builds" are actually just test runs. If your build > failed on Windows and Mac, then the only thing it succeeded in building was > Linux. Everything is green except OS X opt and Win2003 opt.
This updated patch probably fixes the build failures on OS X and Windows 2003, though I pushed enough bad intermediate tryserver jobs that it's going to be a while before we know for sure.
Attachment #470361 - Attachment is obsolete: true
Attachment #470707 - Flags: review?
Attachment #470361 - Flags: review?(pavlov)
Attachment #470707 - Flags: review? → review?(pavlov)
Starting to test whether it'll solve the issues with memory fragmentation that I was seeing, will know in a few days, I guess. I'm using this build, let me know if it's not the right one: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-fbbadab94d7d/tryserver-macosx/
Since this now has a new patch, I'm clearing that whiteboard value (if this isn't how it works, let me know! :)
Whiteboard: [needs new patch]
No, that build is missing the patch to enable jemalloc on OS X. You want this one instead: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-c7a064be4e25/tryserver-macosx/
Thanks! Good thing I checked. ;)
(In reply to comment #46)
> No, that build is missing the patch to enable jemalloc on OS X. You want this
> one instead:
>
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-c7a064be4e25/tryserver-macosx/

Crashes on load (note that I renamed the app bundle to not conflict with my other installations) with:

Process:         firefox-bin [151]
Path:            /Applications/jemalloc.app/Contents/MacOS/firefox-bin
Identifier:      org.mozilla.minefield
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  launchd [99]

Date/Time:       2010-08-31 18:14:35.723 -0700
OS Version:      Mac OS X 10.6.4 (10F569)
Report Version:  6

Interval Since Last Report:         346128 sec
Crashes Since Last Report:          4
Per-App Crashes Since Last Report:  3
Anonymous UUID:                     28570194-9B5C-405F-A8D5-818F3F658117

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread:  0

Dyld Error Message:
  Library not loaded: @executable_path/libjemalloc.dylib
  Referenced from: /Applications/jemalloc.app/Contents/MacOS/firefox-bin
  Reason: image not found
Probably need a packaging change to http://hg.mozilla.org/mozilla-central/file/tip/browser/installer/package-manifest.in to include libjemalloc.dylib in the dmg.
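(Roughly, that would mean adding an entry for the dylib alongside the existing library lines in package-manifest.in; the exact macros and path below are an assumption to check against the neighbouring entries rather than a verified line:)

; hypothetical entry -- mirror whatever the existing .dylib lines look like
@BINPATH@/libjemalloc.dylib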
There's some build system magic that is apparently necessary for a library to be included in the final package. I have another tryserver build in process now that will probably result in a useful package at: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-a5313f943162/tryserver-macosx/
Attached patch Packaging fixes (obsolete) — Splinter Review
Some of the changes in this patch may be unnecessary. Can someone more knowledgeable about the build system comment on which parts to integrate into the main patch?
(In reply to comment #50)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-a5313f943162/tryserver-macosx/

This package does have libjemalloc.dylib in it, but a crash occurs shortly after startup:

#0 0x97f978d8 in szone_free_definite_size ()
#1 0x97f96b88 in free ()
#2 0x97fbafec in fclose ()
#3 0x00dc6104 in nsINIParser_internal::Init ()
#4 0x0000003f in ?? ()
Previous frame inner to this frame (gdb could not unwind past this frame)

The MinefieldDebug package I built on my own machine last night worked fine... I'll poke at things some more this evening.
Any try server builds available yet? Deadline approaching, so eager to start testing. :)
Hasn't crashed so far, so at least I can give it a try. I'll report back in a day or so. Thanks! :)
What's the next step? I can build OS X packages that seem to work, but the tryserver package crashes on startup, and if I copy the .mozconfig used for the tryserver build I get a build error.
Hopefully you're just looking at the 64bit builds coming off the 10.6 builder on try, which is using XCode 3.2.1. How does that compare to your machine? Could you post the build error?
Attached patch jemalloc for OS X (obsolete) — Splinter Review
This patch fixes a couple of zone-related issues as compared to the previous one. There is still some sort of problem with running a 32-bit 10.5 binary on 10.6 (crashreporter intervenes), but since the only way I can build such binaries right now is via the try server, I haven't had time to dig in any further. This patch feels really close to being done.
Attachment #470707 - Attachment is obsolete: true
Attachment #471023 - Attachment is obsolete: true
Attachment #472560 - Flags: review?(pavlov)
Attachment #470707 - Flags: review?(pavlov)
(In reply to comment #57) > Hopefully you're just looking at the 64bit builds coming of the 10.6 builder on > try, which is using XCode 3.2.1. How does that compare to your machine ? Could > you post the build error ? I was trying to run a 32-bit build (10.5 builder) on 10.6. I'm running OS X 10.6.4, with XCode 3.2.3.
Attachment #472560 - Attachment is patch: true
Attachment #472560 - Attachment mime type: application/octet-stream → text/plain
To summarize:

* I have been using a 64-bit optimized Minefield based on patches attached to this bug for about two weeks, with zero problems.

* The latest patch results in a completely successful tryserver build/test.

* I tested the .dmg packages built by tryserver on my OS X 10.6.4 system, with the following results:
  - Works: 64-bit opt (10.6)
  - Works: 64-bit debug (10.6)
  - Fails: 32-bit opt (10.5)
  - Works: 32-bit debug (10.5)

  I fixed several problems regarding malloc zone structure compatibility that showed up in earlier rounds of tryserver testing, but I see nothing useful in the crashes I get from tryserver packages built with the latest patch. I don't have OS X 10.5, so for me to do further testing of 10.5 builds would require substantial additional effort (namely setting up an additional development environment). If this bug is a priority, I hope that someone with broader platform access can look into this final niggle.

* We don't have any data I'm aware of that indicates whether jemalloc improves the current fragmentation problem on OS X. The current version of this patch is certainly solid enough for such measurement. In the absence of such data, this change is too risky to justify inclusion in a late-stage beta release (though it really should be committed in the long run, even if jemalloc is disabled by default).
Thanks for the analysis, Jason - moving target to future.
Target Milestone: --- → Future
(In reply to comment #60) > I fixed several problems regarding malloc zone structure compatibility that > showed up in earlier rounds of tryserver testing, but I see nothing useful in > the crashes I get from tryserver packages built with the latest patch. I don't > have OS X 10.5, so for me to do further testing of 10.5 builds would require > substantial additional effort (namely setting up an additional development > environment). If this bug is a priority, I hope that someone with broader > platform access can look into this final niggle. I'm on it, with my shiny new mac. > * We don't have any data I'm aware of that indicates whether jemalloc improves > the current fragmentation problem on OS X. The current version of this patch > is certainly solid enough for such measurement. In the absence of such data, > this change is too risky to justify inclusion in a late-stage beta release > (though it really should be committed in the long run, even if jemalloc is > disabled by default). Alexander, can you evaluate this?
(In reply to comment #62) > Alexander, can you evaluate this? Yes, I'm currently running the build from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jasone@canonware.com-aa8dd97a92eb/tryserver-macosx64/ …and will report back once I have enough vmmap data + about:memory output to see if it helps fragmentation. (let me know if this is not the right build to test)
Jason,

Turns out, about:memory doesn't report the numbers I need for this comparison with the build mentioned in comment 63. The entirety of the output looks like this:

Other Information
xpconnect/js/gcchunks 51,380,224
storage/sqlite/pagecache 28,692,808
storage/sqlite/other 1,619,856
images/chrome/used/raw 0
images/chrome/used/uncompressed 271,752
images/chrome/unused/raw 0
images/chrome/unused/uncompressed 0
images/content/used/raw 180,521
images/content/used/uncompressed 287,820
images/content/unused/raw 4,570
images/content/unused/uncompressed 8,192
layout/all 9,522,748
layout/bidi 7,796
gfx/surface/image 556,384
content/canvas/2d_pixel_bytes 0

Note how all the malloc/* data is missing, as well as the topmost "Overview" section.

If you can post a tryserver build with these hooked up (you need to tell it that jemalloc is enabled, according to Vlad), I can get some real numbers on this.
(In reply to comment #64) > Note how all the malloc/* data is missing, as well as the topmost "Overview" > section. > > If you can post a tryserver build with these hooked up (you need to tell it > that jemalloc is enabled, according to Vlad), I can get some real numbers on > this. I'm doing this now.
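(For context, the malloc/* rows in about:memory come from asking jemalloc itself for its counters. A rough sketch of what "hooking them up" amounts to, assuming the jemalloc_stats()/jemalloc_stats_t interface exported by the in-tree memory/jemalloc headers; the field names are assumptions to double-check against jemalloc.h:)

#include <stdio.h>
#include "jemalloc.h"  /* in-tree header; assumed to declare jemalloc_stats()
                          and a jemalloc_stats_t with these fields */

static void print_jemalloc_counters(void)
{
    jemalloc_stats_t stats;
    jemalloc_stats(&stats);
    /* These correspond to the malloc/* rows shown in about:memory. */
    printf("malloc/mapped    %zu\n", stats.mapped);
    printf("malloc/allocated %zu\n", stats.allocated);
    printf("malloc/committed %zu\n", stats.committed);
    printf("malloc/dirty     %zu\n", stats.dirty);
}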
(In reply to comment #60) > - Fails: 32-bit opt (10.5) I can reproduce this. Working on it.
Paul got me a working build with the malloc + Overview data, so I'm running that for the next days to see if vmmap and that output gives us anything useful.
OK, so here's the output from about:memory, and I've attached the vmmap output from the process.

about:memory:

Memory mapped: 682,622,976
Memory in use: 379,272,540
malloc/allocated 379,339,996
malloc/mapped 682,622,976
malloc/committed 682,622,976
malloc/dirty 3,194,880
xpconnect/js/gcchunks 82,837,504
storage/sqlite/pagecache 53,719,744
storage/sqlite/other 2,252,296
images/chrome/used/raw 0
images/chrome/used/uncompressed 597,440
images/chrome/unused/raw 0
images/chrome/unused/uncompressed 3,072
images/content/used/raw 341,833
images/content/used/uncompressed 1,343,528
images/content/unused/raw 7,398
images/content/unused/uncompressed 16,384
layout/all 771,736
layout/bidi 0
gfx/surface/image 1,990,284
content/canvas/2d_pixel_bytes 8,044,872
Attachment #476860 - Attachment is obsolete: true
(In reply to comment #68)
> malloc/allocated 379,339,996
> malloc/mapped 682,622,976
> malloc/committed 682,622,976

I know that on OSX we don't have a way to force the OS to reclaim unused pages, but is the memory actually set up in such a way that the OS will do it under pressure? That is, above, will the OS take back the 300 or so MB that we're not actually using, if another app needs it?
(In reply to comment #70)
> (In reply to comment #68)
> > malloc/allocated 379,339,996
> > malloc/mapped 682,622,976
> > malloc/committed 682,622,976
> I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure? That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

Josh might know, adding him to CC.
(In reply to comment #71) > Josh might know, adding him to CC. I don't know.
(In reply to comment #70)
> (In reply to comment #68)
> > malloc/allocated 379,339,996
> > malloc/mapped 682,622,976
> > malloc/committed 682,622,976
> I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure? That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

I think so, but I haven't run experiments to verify this. See comment #5 in bug #584446 for details.
(In reply to comment #60) > * I tested the .dmg packages built by tryserver on my OS X 10.6.4 system, with > the following results: > - Works: 64-bit opt (10.6) > - Works: 64-bit debug (10.6) > - Fails: 32-bit opt (10.5) > - Works: 32-bit debug (10.5) I'm trying to understand this better. Are you saying that when you pushed to tryserver, 32bit builds worked, but only the dmgs did not? For me, everything 32bit fails (opt, build, combinations thereof).
OS: Mac OS X → Windows 7
This is the mozconfig I'm using to build 32-bit firefox, which fails on load. It varies from the one on the wiki, largely by using i386-apple-darwin9.2.0 as the target, and dwarf-2 as the debug symbol type, which I copied from the tryserver logs. Every variation of --{enable,disable}-{debug,optimize} fails. I'm executing this as objdir/dist/Minefield.app/Contents/MacOS/firefox-bin or objdir/dist/MinefieldDebug.app/Contents/MacOS/firefox-bin.
This is pretty much the same stack trace I get from any 32-bit build. The interesting part is here:

44 com.apple.AppKit 0x9021e811 +[NSBundle(NSNibLoading) loadNibFile:externalNameTable:withZone:] + 158
45 XUL 0x00c99115 nsAppShell::Init() + 821 (nsAppShell.mm:294)

which touches on Zone stuff. In widget/src/cocoa/nsAppShell.mm:

  [NSBundle loadNibFile:
      [NSString stringWithUTF8String:(const char*)nibPath.get()]
    externalNameTable:
      [NSDictionary dictionaryWithObject:[NSApplication sharedApplication]
                                  forKey:@"NSOwner"]
    withZone:NSDefaultMallocZone()];

Note the NSDefaultMallocZone.
(In reply to comment #58)
> Created attachment 472560 [details] [diff] [review]
> jemalloc for OS X
>
> This patch fixes a couple of zone-related issues as compared to the previous
> one. There is still some sort of problem with running a 32-bit 10.5 binary on
> 10.6 (crashreporter intervenes), but since the only way I can build such
> binaries right now is via the try server, I haven't had time to dig in any
> further.
>
> This patch feels really close to being done.

Are you sure this is the patch you pushed to tryserver? It crashes for me in debug since malloc_initialized is only set after both these calls, which assert that it's true.

#ifdef MOZ_MEMORY_DARWIN
	/* Register the custom zone. */
	malloc_zone_register(create_zone());

	/*
	 * Convert the default szone to an "overlay zone" that is capable
	 * of deallocating szone-allocated objects, but allocating new
	 * objects from jemalloc.
	 */
	szone2ozone(malloc_default_zone());
#endif
	malloc_initialized = true;

I wonder if I've been trying to debug the wrong problem?
OS: Windows 7 → Windows XP
Just so that we're on the same page, the remaining problems, and the people solving/working on them, are:

vlad:
- we don't know if this solves the fragmentation problem (comment 70)

pbiggar (vlad and jasone were looking at this too):
- we don't know if the OS reclaims unused pages (comment 73)

pbiggar (need feedback from jasone):
- I can't get any 32bit build to work (comment 75)
- jasone's tryserver build failed for 32bit optimized only (comment 60)

jasone (waiting for feedback from nthomas):
- the tryserver dmgs fail due to a packaging problem (comment 52)
(In reply to comment #78) > pbiggar (vlad and jasone were looking at this too): > - we dont know if the OS reclaims unused pages (comment 73) Running the attached program (alloc1), we can see that the OS returns the memory. I'm currently using 2.8GB of a 4GB mac, and alloc1 mallocs 1GB then writes into it so the OS actually allocated it. This registers in Activity Monitor as using 1GB. As soon as free() is called, the Activity Monitor shows an extra 1GB of free memory, and lists alloc1 as using 340KB only. This is on 10.6. I assume this is good enough, and that I don't need to check for 10.5 (and probably can't without a 10.5 OS anyway).
(In reply to comment #79) > Running the attached program (alloc1), we can see that the OS returns the > memory. Or I completely misunderstood the problem. My bad. Looking at this properly now.
(In reply to comment #70)
> (In reply to comment #68)
> > malloc/allocated 379,339,996
> > malloc/mapped 682,622,976
> > malloc/committed 682,622,976
> I know that on OSX we don't have a way to force the OS to reclaim
> unused pages, but is the memory actually set up in such a way that the OS will
> do it under pressure? That is, above, will the OS take back the 300 or so MB
> that we're not actually using, if another app needs it?

I confess that I don't fully understand what question is being asked. The most obvious ones have obvious answers (yes and yes):
- are allocations which are never written to reused by the OS
- are allocations which are freed reused by the OS

So that leaves the complicated ones:
- if we realloc an allocation to use less space, does the OS reclaim the now unused pages?
- if we only use a small portion of an allocation, is all the allocated space used? (I would think not, but we'll see)

I'll get answers to them now.
(In reply to comment #81)
> I confess that I don't fully understand what question is being asked.

Ah, the question is from bug 584446, esp comments 4 and 5. So the question is that if we allocate memory, use it, then no longer need most of it, can we give it back. In the context of jemalloc, this means we take a very large allocation, hand it out to smaller user-space allocations, get those allocations back, and want to return them to the OS. Specifically, we want to check whether msync(MS_KILLPAGES) does this.
This program allows us to test whether memory is being reclaimed.

Procedure:
- Compile with gcc: gcc alloc2.c -o alloc2
- Run in the mode you'd like to test in one terminal: ./alloc2 lk
- Wait for the action, then run in "normal" mode in the other terminal: ./alloc2 ln
- See what happens to the swap in the OS X Activity monitor.

This is hardcoded to allocate 2GB, so you should have (a lot) less than 2GB of free RAM to test this. Else change the hard coding.

Results to follow.
Attachment #478211 - Attachment is obsolete: true
(In reply to comment #83)
> Results to follow.

So it looks like the correct way to reclaim pages is madvise(MADV_FREE). I also tried madvise(MADV_DONTNEED), msync(MS_DEACTIVATE) and msync(MS_KILLPAGES). Tried with both mmap and malloc, and it held up for both. Note that everything on the internet says the opposite, so this means my results really need to be double-checked.

For reference, here's my system info:

Version 10.6.4
$ uname -a
Darwin valrhona.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386 MacBookPro6,2 Darwin
$ gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)

For reference, the internet disagrees, so this must have changed relatively recently.
http://lists.apple.com/archives/darwin-development/2003/Apr/msg00223.html
http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/ (madvise does nothing)

In our jemalloc code we have this (we use msync(MS_KILLPAGES) rather than madvise(MADV_FREE)):

#elif (defined(MOZ_MEMORY_DARWIN))
	msync((void *)((uintptr_t)chunk + (i << pagesize_2pow)),
	    pagesize * npages, MS_KILLPAGES);
#else
	madvise((void *)((uintptr_t)chunk + (i << pagesize_2pow)),
	    (npages << pagesize_2pow), MADV_FREE);
#endif

My worry here is that different versions of the platforms use different conventions, which would be pretty awkward.
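(The alloc2 attachment itself isn't inlined here; for reference, a stripped-down sketch of the kind of probe being described -- touch a big region, hint it back to the kernel, and watch the resident size -- looks like this. The 1GB size, the sleep intervals, and the choice of mmap rather than malloc are arbitrary illustration choices, not the attachment's actual parameters.)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = (size_t)1 << 30;               /* 1 GB */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memset(p, 0xab, len);                       /* force the pages resident */
    printf("touched %zu bytes; check RSIZE in top/Activity Monitor now\n", len);
    sleep(30);

    if (madvise(p, len, MADV_FREE) != 0)        /* hint: pages may be reclaimed */
        perror("madvise");
    printf("madvise(MADV_FREE) done; check RSIZE again under memory pressure\n");
    sleep(30);
    return 0;
}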
Attachment #472560 - Attachment is obsolete: true
Attachment #478290 - Flags: review?
Attachment #472560 - Flags: review?(pavlov)
(In reply to comment #85) > Created attachment 478290 [details] [diff] [review] > jemalloc on OS X (now with about:memory) This is jasone's, with support for seeing jemalloc stats in about:memory. This was the patch used for the build limi discusses in comment 69.
You might ask on the Apple perf list about the memory discarding stuff. http://lists.apple.com/mailman/listinfo/perfoptimization-dev
It seems like the linker flag -dead_strip is the problem with the 32-bit optimized build. That's the only difference between a build with --enable-optimize=-O0 --enable-debug and one built --disable-optimize --enable-debug, and the latter works.
OS: Windows XP → All
(In reply to comment #84) > My worry here is that different versions of the platforms use different > conventions, which would be pretty awkward. Well, luckily there aren't that many different versions in play here.. we can easily make the way to free be a runtime choice that we make based on OS version, right? If you can put together a small standalone test program, we should be able to collect all the data that we need quickly.
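(A sketch of what that runtime choice could look like, assuming the Darwin kernel major version is a good enough proxy for the OS X release; both the version threshold and the uname-based detection are assumptions that the standalone tests would need to confirm.)

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/utsname.h>

/* Discard a range of pages using whichever call the running OS honours. */
static void pages_decommit(void *addr, size_t len)
{
    static int use_madvise = -1;
    if (use_madvise == -1) {
        struct utsname u;
        uname(&u);
        /* Darwin 10.x corresponds to OS X 10.6; threshold is an assumption. */
        use_madvise = (atoi(u.release) >= 10);
    }
    if (use_madvise)
        madvise(addr, len, MADV_FREE);
    else
        msync(addr, len, MS_KILLPAGES);
}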
The corresponding about:memory output after running Paul's new build for a few days:

Memory mapped: 707,788,800
Memory in use: 405,449,436
malloc/allocated 405,514,844
malloc/mapped 707,788,800
malloc/committed 707,788,800
malloc/dirty 2,555,904
xpconnect/js/gcchunks 99,614,720
storage/sqlite/pagecache 58,766,768
storage/sqlite/other 2,450,704
images/chrome/used/raw 0
images/chrome/used/uncompressed 881,608
images/chrome/unused/raw 0
images/chrome/unused/uncompressed 3,072
images/content/used/raw 101,302
images/content/used/uncompressed 1,176,316
images/content/unused/raw 276,447
images/content/unused/uncompressed 4,959,808
layout/all 721,448
layout/bidi 0
gfx/surface/image 7,021,944
content/canvas/2d_pixel_bytes 0
Note that all of the vmmap/heap added this to stdout when run: 2010-09-28 22:56:42.035 vmmap[7915:903] *** Symbolication: Don't know how to introspect target process's malloc zone named jemalloc_ozone 2010-09-28 22:56:42.039 vmmap[7915:903] *** Symbolication: Don't know how to introspect target process's malloc zone named jemalloc_zone
(In reply to comment #89) > It seems like the linker flag -dead_strip is the problem with the 32-bit Nope, my mistake. The current stack trace I'm getting (with CFQSortArray at the top) occurs on all 32-bit builds with jasone's patch.
This fixes the problems with 32-bit, I believe. (Actually, this fixes the problems with running 32-bit build on 10.6, so we're waiting for the verdict from TryServer). When we compile against 10.5 headers, the malloc_zone_t struct is smaller than when we link against 10.6 headers. However, at run-time on 10.6, we're passed the 10.6 struct, which we copy into a struct which is too small. When OS X checks for the missing fields, we segfault. I'm not sure why it only happened where it did, perhaps it's the first call to memalign. This fixes it by using the correct size, predicated on the version field. We copy from the Apple headers to know both sizes no matter which SDK we use, so this may need some legal thing, which I'll look into next.
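(Roughly, the copy becomes predicated on the zone's version field, something like the sketch below. The size constants are assumed to be derived elsewhere from the 10.5 and 10.6 struct layouts; the names here are illustrative, not the patch's actual identifiers.)

#include <stddef.h>
#include <string.h>
#include <malloc/malloc.h>

/* Provided elsewhere (illustrative names): sizes of the 10.5 and 10.6
 * malloc_zone_t layouts, derived from their respective declarations. */
extern const size_t leopard_malloc_zone_t_size;
extern const size_t snow_leopard_malloc_zone_t_size;

static size_t zone_struct_size(const malloc_zone_t *zone)
{
    /* version 3 == 10.5-era zones, version 6 == 10.6-era zones */
    return (zone->version <= 3) ? leopard_malloc_zone_t_size
                                : snow_leopard_malloc_zone_t_size;
}

static void copy_zone(malloc_zone_t *dst, const malloc_zone_t *src)
{
    /* Copy only as many bytes as the zone actually has, so we never read
     * past the end of a 10.5-sized struct when running on 10.6 (or vice
     * versa), regardless of which SDK we were built against. */
    memcpy(dst, src, zone_struct_size(src));
}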
Attachment #478290 - Attachment is obsolete: true
Attachment #481164 - Flags: review?
Attachment #478290 - Flags: review?
(In reply to comment #79) > This is on 10.6. I assume this is good enough, and that I don't need to check > for 10.5 (and probably can't without a 10.5 OS anyway). I've confirmed this on 10.5 too. So on OSX, madvise(MADV_FREE) works.
(In reply to comment #92) > Created attachment 479316 [details] > vmmap with -resident argument added > > Note that all of the vmmap/heap added this to stdout when run: > > 2010-09-28 22:56:42.035 vmmap[7915:903] *** Symbolication: Don't know how to > introspect target process's malloc zone named jemalloc_ozone > > 2010-09-28 22:56:42.039 vmmap[7915:903] *** Symbolication: Don't know how to > introspect target process's malloc zone named jemalloc_zone Thanks! OK, the problem remains that we can't tell if this is helpful. The reason is that we don't supply the right callbacks to introspect the jemalloc heap. I'm looking into this now, but I can't find any docs that describe this, so I'll have to reconstruct from reading the OSX source code.
(In reply to comment #97) > (In reply to comment #92) > > Created attachment 479316 [details] [details] > > vmmap with -resident argument added > > OK, the problem remains that we can't tell if this is helpful. The reason is > that we don't supply the right callbacks to introspect the jemalloc heap. I'm > looking into this now, but I can't find any docs that describe this, so I'll > have to reconstruct from reading the OSX source code. Why isn't RSIZE, as reported by top, in combination with the about:memory stats, adequate? Re: the introspection callbacks, I know how to write them, but it is quite difficult to get the information out of jemalloc's internal data structures. Do we really need this?
(In reply to comment #98)
> Re: the introspection callbacks, I know how to write them, but it is quite
> difficult to get the information out of jemalloc's internal data structures.
> Do we really need this?

Perhaps not, I'll check it out. But as a matter of interest, is this actually possible? It seems that heaps and vmmap call introspection algorithms which are part of heaps/vmmap, not part of firefox or jemalloc. Is it possible to actually get introspection to work? I've tried writing a throw-away program that calls an introspection function, and I haven't been able to.
(In reply to comment #98)
> Why isn't RSIZE, as reported by top, in combination with the about:memory
> stats, adequate?

I think it should be, assuming you mean RPRVT. Just to be sure I'm on the same page, you're saying that the difference between RPRVT in top and "Memory in use" in about:memory indicates the fragmentation?

Limi is away, but I'll assume that the way to test this is to heavily load the browser with things like google docs, leave it, close all the windows, and see the difference described above.
Here are the memory stats (speed stats to come):

Ultra-short summary:
--------------------
jemalloc is 15-20% better overall.

Short summary:
--------------
Memory use is reduced 15% at full load and 20% after tabs are closed. Fragmentation is solved. However, some numbers look worse, and it could be perceived that we use _more_ memory, which is not true.

Procedure:
----------
- opened empty firefox
- added tab for about:memory
- added gmail tab
- added google docs tab
- opened 16 google docs documents, each in a separate tab
- waited for tabs to load and then cycled through them
- refreshed about:memory and recorded "Full load" measurement.
- closed each tab, except about:memory
- refreshed about:memory and recorded "closed tabs" measurement.
- launched a compile in the background and waited a while, so that memory would be reclaimed from firefox by the OS
- refreshed about:memory and recorded "after a while" measurement.

Each time measurement consists of:
- about:memory: "Memory mapped" and "Memory in use"
- Activity monitor: "Real mem" and "Private mem".

I recorded these for the 4 combinations of jemalloc enabled/disabled and 32-bit/64-bit. Note that the 32-bit measurement was on 10.6, not 10.5, so the 32-bit-jemalloc-disabled result may not be what 10.5 users actually experience. However, 10.6 supposedly brought improvements to fragmentation, so I believe the 10.5 results are going to be unfair to jemalloc, if anything.

Memory use discussion:
----------------------
- By comparing absolute "Private mem" numbers, we can see the total amount of memory that firefox consumes (that is, which is unavailable to other applications without paging). **This is the most important figure**

On both 32-bit and 64-bit, jemalloc is better, 8-9% on 32bit and 15% on 64bit, during "full load". "After a while", jemalloc is much better, 33% on 32-bit and 20% on 64-bit. This is the most important number, and it's a definite, but not huge, win.

The jemalloc "Memory in use" number is way higher than the mac allocator one. Since we allocate the same amount of memory internally, this is a measure of how much overhead jemalloc takes, which is about 30%. However, this is more than accounted for by the fragmentation, and so jemalloc comes out better here.

Fragmentation discussion:
-------------------------
- We can determine fragmentation by comparing "Memory in use" to "Private mem".

On 32-bit, the mac allocator uses way more memory than we are trying to: 50% more during "full load", and 100% more "after a while". In jemalloc, the difference is minimal, about 3%.

On 64-bit, it's just as bad; about 42%, going up to 100% "after a while". Jemalloc has a smaller difference, perhaps 15%, going down to 1% "after a while".

Note that this is just a side show to "Memory use discussion", but it shows that fragmentation was the problem, and jemalloc solves it.

Perceived memory use:
---------------------
- Users are unlikely to know that private memory is the correct metric, and are more likely to look at larger numbers.
- Those who dig deeper are more likely to use "Real mem" and "Memory in use" since they sound correct.
- The OS doesn't take memory back immediately, so the "closed tab" metric might be used, even though the OS will take back that memory if it's needed. The reporting tools will report the higher numbers until that is taken back.

Those looking at "about:memory" may believe that we are now using much more memory. The jemalloc figures are much higher, perhaps 25% higher on 64 bit.
The "memory in use" figures look really good for the mac allocator, about twice as good as for jemalloc. However, they are just fiction, and in "Private mem", jemalloc always does better. Those looking at Activity monitor's "real mem" will see no difference between the two allocators. jemalloc looks much worst in the "closed tabs" segment of the test. It is much slower to return memory, and appears to hold on to 40-60% more memory than the mac allocator. However, it happily hands it back when the OS asks. Raw numbers: ---------------- Number are in the form: Full load -> closed tabs -> after a while Mac, 32bit, Private mem: 464 -> 130 -> 120 Mac, 32bit, Memory in use: 303 -> 93 -> 59 jemalloc, 32bit, Private mem: 426 -> 288 -> 91 jemalloc, 32bit, Memory in use: 412 -> 134 -> 103 Mac, 64bit, Private mem: 789 -> 250 -> 178 Mac, 64bit, Memory in use: 445 -> 165 -> 77 jemalloc, 64bit, Private mem: 693 -> 430 -> 152 jemalloc, 64bit, Memory in use: 605 -> 213 -> 148 Mac, 32bit, Real mem: 579 -> 420 -> 402 Mac, 32 bit, Memory mapped: 359 -> 353 -> 351 jemalloc, 32bit, Real mem: 581 -> 446 -> 207 jemalloc, 32bit, Memory mapped: 391 -> 409 -> 378 Mac, 64bit, Real mem: 880 -> 601 -> 220 Mac, 64 bit, Memory mapped: 486 -> 478 -> 466 jemalloc, 64bit, Real mem: 878 -> 622 -> 330 jemalloc, 64bit, Memory mapped: 631 -> 621 -> 570 I experienced some unrelated crashes, and had to repeat some of the tests in various ways, but these numbers were pretty static the whole time, and I have high confidence in them. While it isn't the most scientific test, I believe it's powerful enough to show that the move to jemalloc is warranted.
Closing tabs may not do what you think it does, due to docshells being cached for fast re-opening including user state (form, dynamic DOM modification, plugin stuff, etc.). I forget how the timers work there, but it's what users will actually *do*, so you're probably measuring the right thing. ;-)
(In reply to comment #102) > Closing tabs may not do what you think it does, due to docshells being cached > I forget how the timers work there, but it's what users > will actually *do*, so you're probably measuring the right thing. ;-) Yeah, this is what I was going for.
Speed tests:

Raw data:
---------
            SunSpider |   v8 | kraken
jemalloc32:     389.9 | 2312 |   8069
mac32:          386.5 | 2334 |   7774
jemalloc64:     410   | 2149 |   7789
mac64:          381   | 2115 |   7597

Procedure:
----------
These were all measured in-browser, not using the shell, since we haven't got jemalloc working properly for the shell yet. I measured the sunspider one 10 times, and the others 5 times each, and averaged them. They all do multiple runs, but I still hit a lot of variance, hence multiple runs.

Discussion:
-----------
This shows that jemalloc slows things down, but not massively. On 32 bit, jemalloc is 1%, 1% and 3.6% slower on SS, v8 and kraken respectively. On 64bit, it's 7% slower, 1.6% faster and 2.5% slower. I think that all of these are in the noise, except the 7%, which I am suspicious of, but which we can at least think of as a reasonably hard bound.
Depends on: 603655
The APSL needs some comments detailing modifications and modification dates, for osx_zone_types.h. This is discussed in bug 603655.
Attachment #481164 - Attachment is obsolete: true
Attachment #481164 - Flags: review?
I sent this to tryserver and found no bugs. There are plenty of oranges, but I can't find any that aren't in other builds around the same time, or that aren't intermittent oranges with bugs filed.
Attachment #479318 - Attachment is obsolete: true
Attachment #482570 - Flags: review?(pavlov)
Attachment #482570 - Flags: feedback?(jasone)
This removes any Apple code from the patch. It doesn't need to be checked for legal issues (that's being handled in bug 603655), but it should be checked for readability from the fallout of that bug. (We aren't allowed to copy Apple's malloc_zone_t struct, as it's under an incompatible license, and so have to work around it.)
Attachment #482570 - Attachment is obsolete: true
Attachment #484711 - Flags: review?(pavlov)
Attachment #484711 - Flags: feedback?(jasone)
Attachment #482570 - Flags: review?(pavlov)
Attachment #482570 - Flags: feedback?(jasone)
Comment on attachment 484711 [details] [diff] [review] jemalloc on OSX (now without copyright issues) I'll try to take a look at this tonight.
Comment on attachment 484711 [details] [diff] [review] jemalloc on OSX (now without copyright issues) Nicely done. Well structured, and well documented.
Attachment #484711 - Flags: feedback?(jasone) → feedback+
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

Awesome work. Looking forward to seeing this get checked in.

>diff --git a/configure.in b/configure.in
> else
> MOZ_OPTIMIZE_FLAGS="-O3 -fno-omit-frame-pointer"
> fi
>+	MOZ_MEMORY=1
> _PEDANTIC=
> CFLAGS="$CFLAGS -fpascal-strings -fno-common"
> CXXFLAGS="$CXXFLAGS -fpascal-strings -fno-common"

Just something I noticed as I was skimming the patch: use spaces here instead of tabs.
Attachment #484711 - Flags: review?(pavlov) → review+
I think we should take this, and continue work on taking upstream jemalloc, which should give us better perf characteristics.
(In reply to comment #110) > >+ MOZ_MEMORY=1 > Just something I noticed as I was skimming the patch: use spaces here instead > of tabs. Good spot, thanks.
Sorry to add process noise here, but who makes the call on whether this will make Firefox 4? I care quite a bit about this from a user experience perspective, as low-memory situations are quite common on the Mac (especially after 10.6 shipped), so I want us to be as good with freeing up memory and avoiding fragmentation as we possibly can. The patch seems complete and working, and I've been running the tryserver builds for a few days without any issues.
I am opposed to this feature landing in FF4: switching allocators is incredibly risky at this point in the game, because it can affect all kinds of third-party code that is loaded in our process in rather unpredictable ways. This is especially true because allocator errors typically don't show up as recognizable signatures in crash-stats, but as memory corruption errors that crash at unpredictable points later.
(In reply to comment #114)
> I am opposed to this feature landing in FF4: switching allocators is incredibly
> risky at this point in the game

jemalloc has been used on the other tier 1 platforms for two years, which I think means it is used by 90-ish percent of our users. If anything, this reduces the error paths because we now use the same allocators on all platforms. Describing it as 'switching allocators' greatly overstates the risk; I would describe it as "removing the less rigorously-tested allocation paths".

> because it can affect all kinds of third-party
> code that is loaded in our process in rather unpredictable ways.

From a safety perspective, memory is memory unless we're writing outside its bounds. If we are somehow relying on writing outside the bounds of memory at some point, it's less risky to do it on the same allocator on all platforms.

Also, the jemalloc functions which take addresses (free(), realloc(), etc) happily accept any address, check whether it belongs to jemalloc, and pass it on to the system allocator otherwise. This makes it pretty robust in the face of "all kinds of third party code".

> especially true because allocator errors typically don't show up as
> recognizable signatures in crash-stats, but as memory corruption errors that
> crash at unpredictable points later.

I haven't been around long, but I do not believe this has been a real problem with our use of jemalloc on any platform. jemalloc is safe and stable and this doesn't change that.

I think it's pretty clear that there is a large gain here in terms of usability (as evidenced by limi pushing for this), and the risk is not nearly as significant as you think. That said, if there are unexplained crashes, I'd be the first to push for backing-out the jemalloc changeset.
allocator mixing interaction is only interesting when you interact with foreign libraries. such libraries tend to be platform specific. as such claiming that you're unifying codepaths ignores the interesting case. you're switching from a system whose codepaths with foreign libraries work to a system which happens to work with unrelated foreign libraries and is untested with the class of potential foreign libraries. i'm not personally expressing an opinion in favor/opposed, but I do agree that it isn't just "a safe switch".
(In reply to comment #116) > you're switching from a system whose codepaths with foreign libraries work to a > system which happens to work with unrelated foreign libraries and is untested > with the class of potential foreign libraries. I tried to explain this above. On the Mac, we know we will be interacting with foreign libraries, and will be asked to free memory which we did not allocate. As such, we check whether jemalloc owns memory before it frees it, and gives it back to the system (de)allocator if it does not. So those foreign libraries are not a risk. No change is a "safe switch", but I am just pointing out that the switch to jemalloc is not remotely as unsafe as one would think by saying "switching allocators".
Things may need to work in the other direction as well, though. If we allocate memory using jemalloc and then pass it to a library which frees it using non-jemalloc free, might that cause problems? I'm not talking about the code-correctness of jemalloc. I'm talking about the interactions between jemalloc and existing code. The point about crash-stats is that we cannot easily identify problems after this code has landed. They may show up simply as somewhat additional crashiness, without identifiable signatures. Given that we are past feature freeze and we are trying to reduce the risk from optional changes, I don't think it makes any sense to add risk here. Land it for FF5 and we can get a nice long beta cycle to notice and fix issues.
(In reply to comment #113) > > The patch seems complete and working, and I've been running the tryserver > builds for a few days without any issues. Do you see any improvements? My conversations with Paul have lead me to believe that this patch doesn't release memory to the OS, so it might not appear to be different.
assuming 10.7 comes out with a different malloc behavior and we ship ff4 with this code, are we likely to crash on launch or do something similar? the code in szone2ozone leads me to think you're assuming that 10.7+ will be compatible w/ 10.6. Given that 10.6 was not compatible with 10.5 in this case, I think that style assumption is faulty. As such this code seems like it's asking for trouble in 10.7 or 10.8 or whatever. This is a *terrible* approach for something which will ship and be deployed for ages. People are likely to upgrade OS X only to discover our browser doesn't work, and we probably wouldn't be able to upgrade it because we'd crash too early. It seems safer to fall back to not using jemalloc for malloc versions we don't recognize. But I'm not sure what other assumptions our code tries to make if we think we're built with jemalloc.
(In reply to comment #119)
> My conversations with Paul have lead me to believe
> that this patch doesn't release memory to the OS, so it might not appear to be
> different.

It does release memory to the OS. I was trying to get across a different point, which is that _it can appear_ to use more memory, if you don't know what you're looking at (which is likely, because the correct thing to look at is confusing). Here is the relevant quote from comment 101:

> jemalloc looks much worse in the "closed tabs" segment of the test. It is much
> slower to return memory, and appears to hold on to 40-60% more memory than the
> mac allocator. However, it happily hands it back when the OS asks.
(In reply to comment #120) > It seems safer to fall back to not using jemalloc for malloc versions we don't > recognize. But I'm not sure what other assumptions our code tries to make if we > think we're built with jemalloc. Yes, you're right, great catch. I'll add a check for recognized versions (3 and 6), and we won't use jemalloc if we don't recognize the version. Thanks! (I should point out this doesn't affect the "should this ship in FF4 question").
Comment on attachment 484711 [details] [diff] [review]
jemalloc on OSX (now without copyright issues)

I'm going to record my r- since it's actually meaningful.

note that having actually read the code and the way it hooks, with the change promised in comment 122, this might be ok. the idea is that jemalloc registers with the zone allocator system, so when another library calls free() and friends, it will eventually be directed back to our jemalloc if it was allocated by our jemalloc.

other interesting cases:

* what happens when another interesting zone allocator is brought in by another library. Probably the simplest version of that question is: suppose i use a comment 122 based jemalloc for a library and it gets loaded into a comment 122 based jemalloc application on 10.6, will they coexist peacefully? I haven't spent the time to ensure that nothing interesting would happen for a case like this. The other variations of this probably involve broken zone impls. As long as we have some provision to handle that...

* is it possible for some library to assume something about how the allocator works which jemalloc doesn't do? (the assumption can be invalid, but that doesn't prevent shipping code from having it) historically some allocators had different alignment behavior (for fun, people can read all the interesting hacks ms has for msvcrt compat)
  - i believe that jemalloc isn't evil enough to give out pointers with the lowest bit set, so I can't think of any basic cases offhand
  - the easiest way for this stuff to be relevant is for a plugin which calls NPN_Alloc a couple of times in a row, since it controls the cpu of the "main" thread, it could be coded with stupid assumptions (please don't claim plugins don't make stupid assumptions, we have an entire product tracking plugin bugs).

One way of ameliorating these risks would be something like -safe-mode which turns off jemalloc -- this needs to be possible w/o asking the user to use a Terminal, we're talking about Mac users :)
Attachment #484711 - Flags: review-
(In reply to comment #123)
> I'm going to record my r- since it's actually meaningful.
> note that having actually read the code and the way it hooks, with the change
> promised in comment 122, this might be ok.

Sure, I'll fix this up and resubmit.

> other interesting cases:
> * what happens when another interesting zone allocator is brought in by another
> library.

You could make this work, but it wouldn't out of the box. Right now, it would fail hard. Basically, we don't support this, nor should we.

> Probably the simplest version of that question is:
> suppose i use a comment 122 based jemalloc for a library and it gets loaded
> into a comment 122 based jemalloc application on 10.6, will they coexist
> peacefully?

No, they won't. I believe it would fail hard. As above, I don't know a good reason to support this.

> The other variations of this probably involve broken zone impls. As long as we
> have some provision to handle that...

I'm not sure what you mean? Basically, it works on 10.5 and 10.6, and I'll make sure it isn't loaded on other versions.

> * is it possible for some library to assume something about how the allocator
> works which jemalloc doesn't do? (the assumption can be invalid, but that
> doesn't prevent shipping code from having it) historically some allocators had
> different alignment behavior (for fun, people can read all the interesting
> hacks ms has for msvcrt compat)

Of course, libraries can assume anything. The most dangerous is that it's possible to write outside the bounds of some allocated memory, which will fail differently with jemalloc than with the mac allocator. This doesn't bother us because it will now work the same on mac as other platforms (which already use jemalloc). In the case where a Mac-only extension relies on some invalid assumption about the 10.5 or 10.6 allocator, I don't think we should support this.

> - the easiest way for this stuff to be relevant is for a plugin which calls
> NPN_Alloc a couple of times in a row, since it controls the cpu of the "main"
> thread, it could be coded with stupid assumptions (please don't claim plugins
> don't make stupid assumptions, we have an entire product tracking plugin bugs).

Can you go into more detail here? I'm not sure what you're getting at.

> One way of ameliorating these risks would be something like -safe-mode which
> turns off jemalloc -- this needs to be possible w/o asking the user to use a
> Terminal, we're talking about Mac users :)

I think we would need something more concrete in order to support this. I think this is too speculative otherwise. Bear in mind we already use jemalloc on the other tier 1 platforms, without checking for these assumptions. At the very least, adding support for this should move to a different bug.
(In reply to comment #118)
> Things may need to work in the other direction as well, though. If we allocate
> memory using jemalloc and then pass it to a library which frees it using
> non-jemalloc free, might that cause problems?

We overwrite the memory allocation structure (the "default memory zone") during initialization. This means that all libraries use the jemalloc allocator from then on. The problem is that some memory is allocated _before_ we can overwrite the default memory zone, and is freed later. Also, some Darwin libraries call the Darwin allocators directly; those allocations need to be freed by the Darwin allocator, so jemalloc passes them back.

Now in theory, we _could_ pass some memory that we allocated using jemalloc to a Darwin library, which then decides to free it. I don't think this is really realistic, because it would fail hard (and we would likely find it), and because that would be one incredibly misbehaved library. Finally, if that did happen, I think it's safe to say we would segfault immediately, and so this would be straightforward to find. So I think the problem is largely theoretical, and not a real risk.

> I'm not talking about the code-correctness of jemalloc. I'm talking about the
> interactions between jemalloc and existing code.

Sure. I simply wanted to point out that when people think "a new allocator, that's risky", the usual rules don't apply, since we use it everywhere else already.

> Given that we are past feature
> freeze and we are trying to reduce the risk from optional changes, I don't
> think it makes any sense to add risk here.

I hear you, and I disagree with the level of risk. I wouldn't call this a new feature either; it's activating two-year-old code which is already used by over 90% of our users.
(In reply to comment #118) > I'm not talking about the code-correctness of jemalloc. I'm talking about the > interactions between jemalloc and existing code. FWIW, the interactions are very simple code, very short and very easy to understand.
the NPN_Alloc case would probably be something like:

int *array[4];
int *align = (int*)NPN_Alloc(16);
if (!((uintptr_t)align & 16)) {
  array[0] = align;
  array[1] = (int*)NPN_Alloc(16);
  array[2] = (int*)NPN_Alloc(16);
  array[3] = (int*)NPN_Alloc(16);
  array[0][20] = 257;
}

i.e. code that "knows" that allocations from a given bucket with a certain offset will be returned in a certain sequence. Yes, this is very contrived. But having seen people do things like:

int method(Foopy* fop) { return fop[1].x; }
struct Barp {
  Foopy element1;
  Foopy element2;
};
Barp bap;
method(&bap.element1);

... (and defending it as reasonable practice), I wouldn't be surprised if someone did something that was similarly wonky using malloc. The thing about plugins is that different platforms have different sets of coders. Windows has always had multiple C runtimes, which means users suffered from bad plugin authors over a decade ago, and most of them were weeded out a decade ago (but not all, we still hit this stuff occasionally). Linux for the most part doesn't have much experience with allocator switches, but the cases I can point to are pure failure (mostly IMEs with bad linkage to libc5 or similar). I'm not sure how much experience vendors for Apple have with allocator switching; my guess is very little.

Note that this comment is merely an answer to the question in comment 124. As long as comment 122 is addressed, I'll be relatively happy (and won't stand in its way). Yes, I'd like to have a way to turn it off in case of emergency, but that's not my call.
(In reply to comment #127) > ... (and defending it as reasonable practice), I wouldn't be surprised if > someone did something that was similarly wonky using malloc. > I'm not sure how much experience vendors for Apple have with allocator > switching, my guess is very little. I understand what you're saying, but I don't think an impractical, non-specific (though definitely theoretically possible) potential bug is worth considering. If there were real plugins with this problem, it might be worth evaluating on the merits of the concrete case. Also, I'm not sure how the plugin stuff works, but I don't know if it would use jemalloc anyway.
This no longer assumes that new versions of the zone allocator will be compatible with version 6 (the one that ships with OSX).
Attachment #484711 - Attachment is obsolete: true
Attachment #499457 - Flags: review?
This bug is largely done, but I'm going to wait until the start of the FF5 cycle to get it into the tree, due to the risk that bsmedberg points out above.
Depends on: post2.0
Attached patch Proposed final patch (obsolete) — Splinter Review
The changes since the last review are pretty minor, but it's been a while, so I just want to get another look. I intend to commit this in the next few days, assuming that tryserver comes back clean. The plan is to commit in two parts: Step 1: all of the code, except disabled by omitting MOZ_MEMORY=1 in configure.in. This won't fail, because all the code will still be #defined out. If it does fail, something is wrong, and the backout strategy is revert. Step 2: Enable jemalloc on mac by adding MOZ_MEMORY=1 in configure.in. This actually turns the code on, and backout strategy is to simply remove the MOZ_MEMORY line, so it's pretty trivial.
Attachment #499457 - Attachment is obsolete: true
Attachment #521385 - Flags: review?(pavlov)
Attachment #499457 - Flags: review?
Comment on attachment 521385 [details] [diff] [review] Proposed final patch My bad, this needs work. The patch still applied, but it doesn't compile. I'll update after it passes tryserver.
Attachment #521385 - Flags: review?(pavlov)
Whiteboard: not-ready-for-cedar
Depends on: 651952
pbiggar, what's the status of this?
njn, did you see the dependency on bug 651952?
Attached patch Final 32-bit bug licked (obsolete) — Splinter Review
This will hopefully fix the final bug. We'll see on tryserver for sure: http://tbpl.mozilla.org/?tree=Try&rev=5dd3a5f7fbb0 So the status here is that I couldn't figure out the final bug, and gave up after a long time looking at it. I spun it out into a mentored bug (bug 651952) and Dale Kim picked this up, found the bug (a typo! oh the shame), and confirmed it works. Tryserver will tell us if something else has snuck in. It's only slightly different from the last patch pavlov and jasone reviewed, but I want to get this checked just in case. I'd love to get a really fast review so that this makes Firefox 7!
Attachment #521385 - Attachment is obsolete: true
Attachment #534271 - Flags: review?(pavlov)
Jesse, you mentioned you had a OSX 10.7 machine? If so, can you check this doesn't cause problems?
Attached patch With fixed visibility behaviour (obsolete) — Splinter Review
OK, this is the final version I want to push. Dale had ripped out some visibility things that were confusing; this puts them back in (turned off for Mac, as it's a non-ELF platform). Tryservered just in case: http://tbpl.mozilla.org/?tree=Try&rev=9a4a899ec661
Attachment #534271 - Attachment is obsolete: true
Attachment #534332 - Flags: review?(pavlov)
Attachment #534271 - Flags: review?(pavlov)
Some chatter on IRC pointed out that it was better to wait until after the branch before trying to get this in, so that it has much more time on nightlies. Let's do that. Firefox 7 ftw.
Comment on attachment 534332 [details] [diff] [review] With fixed visibility behaviour

Jesse suggested adding an environment variable to simulate an unsupported version of the OSX allocator. I implemented this, and naturally discovered that our patch doesn't work in this case.

To remind casual readers of the context:
- OSX's implementation of malloc looks up the malloc function in a malloc_zone_t struct.
- During Firefox initialization, we overwrite this struct with jemalloc function pointers.
- malloc_zone_t has a version field, and so we only actually know its size and structure for version 3 (OSX 10.5) and version 6 (OSX 10.6).
- We added a check that only overwrites on those known versions.

The problem is that we still allocate memory using jemalloc, because moz_malloc calls je_malloc/je_calloc/etc. directly. If this memory is passed to OSX's realloc, then we segfault.

There are two potential solutions:
1) Add a(nother) layer of indirection to moz_malloc. This would be resilient to future unknown versions of OSX, probably.
2) Remove the check, and unconditionally do the overwriting.

I actually prefer the latter. I don't want yet another layer of indirection, and the code is nasty enough as it is. I don't think we should be tying it into unmaintainable knots to support operating systems not yet available, announced, or possibly even written. Plus, even when 10.8 comes out, if they don't change the interface, it might work anyway.

So I plan on just removing the dynamic version check and unconditionally overwriting malloc_zone_t, unless there are objections? Jesse has OSX 10.7, so we'll make that work first, of course.
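For readers following along, a minimal sketch of the zone-overwrite step described above, assuming the known layouts are versions 3 and 6. The je_zone_* names are illustrative stand-ins for jemalloc's entry points, not the exact symbols in the patch, and real code also has to cope with the zone memory being write-protected on newer OS releases (see later comments):

#include <malloc/malloc.h>

/* Illustrative prototypes standing in for jemalloc's zone functions. */
void  *je_zone_malloc(malloc_zone_t *zone, size_t size);
void  *je_zone_calloc(malloc_zone_t *zone, size_t num, size_t size);
void  *je_zone_realloc(malloc_zone_t *zone, void *ptr, size_t size);
void   je_zone_free(malloc_zone_t *zone, void *ptr);
size_t je_zone_size(malloc_zone_t *zone, const void *ptr);

static void
replace_default_zone(void)
{
    malloc_zone_t *zone = malloc_default_zone();

    /* Only overwrite layouts we actually know about:
     * version 3 (OSX 10.5) and version 6 (OSX 10.6). */
    if (zone->version != 3 && zone->version != 6)
        return;

    zone->malloc  = je_zone_malloc;
    zone->calloc  = je_zone_calloc;
    zone->realloc = je_zone_realloc;
    zone->free    = je_zone_free;
    zone->size    = je_zone_size;  /* size() is how the system decides
                                    * whether a pointer belongs to us */
}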
Attachment #534332 - Flags: review?(pavlov)
Jesse reports that the binaries from https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pbiggar@mozilla.com-68afd94effb8/ (latest try, all green) work fine on Lion, but that they don't seem to use jemalloc. Their zone->version is 8. This is a bit weird, so looking into it.
Depends on: 659632
Blocks: 659694
Attached patch final version (obsolete) — Splinter Review
This is the final version. It's been tryservered (all green), and Jesse confirmed that the built versions don't crash on Lion. I'd love to get this in early in the cycle so it gets as much Nightly time as possible. This patch is on top of bug 659694.
Attachment #534332 - Attachment is obsolete: true
Attachment #535331 - Flags: review?(pavlov)
> This patch is on top of bug 659694. Correction: the patch is on top of bug 659632 (which landed today).
> Jesse confirmed that the built versions don't crash on Lion I think I was testing a version that didn't actually use jemalloc (even on Snow Leopard) for some reason. When I apply this patch locally on Lion, I get a crash. See my comment in bug 659694. When I apply this patch locally on Snow Leopard, it works. about:memory shows different types of measurements than a non-jemalloc build.
Jesse, can you try the Snow Leopard compiled version on Lion (since that's what we'll be shipping)?
At least part of the confusion was due to Tinderbox debug builds having --enable-trace-malloc [1], which disables jemalloc [2]. [1] http://hg.mozilla.org/build/buildbot-configs/file/f56bf2ed364d/mozilla2/macosx64/mozilla-central/debug/mozconfig [2] https://bugzilla.mozilla.org/show_bug.cgi?id=417066#c3 I submitted http://tbpl.mozilla.org/?tree=Try&rev=d6c02308ba1b to test without trace-malloc.
10.6: works, but warns:
2011-05-27 15:52:44.504 firefox-bin[22966:61b] *** __NSAutoreleaseNoPool(): Object 0x103660700 of class NSCFString autoreleased with no pool in place - just leaking

10.7: crashes on startup.

Same results with everything I tested (opt32, opt64, debug64).
This is a little scary if OS X changes can cause our builds to crash on startup. I think we need a safe fallback scenario for this or we can't reasonably ship it. (Just fixing the 10.7 scenario doesn't necessarily help, because it could break again with 10.8.)
(In reply to comment #147) > This is a little scary if OS X changes can cause our builds to crash on > startup. I think we need a safe fallback scenario for this or we can't > reasonably ship it. A safe fallback means that each memory allocation needs one more branch. Not worth it IMHO for an OS that won't arrive for 2-3 years. > (Just fixing the 10.7 scenario doesn't necessarily help, > because it could break again with 10.8.) I disagree with holding back a 15-20% memory improvement because it will break on an OS whose predecessor isn't even released.
In general, using OS internals which are not part of a stable API is just a bad idea. If we're going to trade performance now for future instability later, I'd really like to have that discussion in mozilla.dev.platform, not here. As it stands, I don't think anyone should grant superreview to this sort of patch.
Consider that if we had shipped this in Firefox 4, everyone who upgraded to 10.7 would find that their Firefox crashed on startup.
> A safe fallback means that each memory allocation needs one more branch. Not > worth it IMHO for an OS that won't arrive for 2-3 years. Presumably if we could ensure that this branch was always predicted to take the jemalloc case, the instruction would be pretty cheap. This might be as simple as ensuring that the jemalloc case lines up with the processor's static branch prediction assumptions [1]. [1] http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts/
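For illustration only (not taken from the patch), a sketch of how such a branch could be biased so the jemalloc path is the predicted one. __builtin_expect is the GCC/Clang hint; osx_use_jemalloc, je_malloc and moz_malloc_sketch are made-up names:

#include <stdlib.h>

extern int osx_use_jemalloc;      /* decided once at startup, never changes */
void *je_malloc(size_t size);     /* illustrative jemalloc entry point */

void *
moz_malloc_sketch(size_t size)
{
    /* Bias the branch towards the jemalloc case. */
    if (__builtin_expect(osx_use_jemalloc, 1))
        return je_malloc(size);
    return malloc(size);          /* fall back to the system allocator */
}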
(In reply to comment #150) > Consider that if we had shipped this in Firefox 4, everyone who upgraded to > 10.7 would find that their Firefox crashed on startup. Yeah, good point. Sigh. I'll see if I can figure out a better solution than what we're using now, and otherwise I'll see what the slowdown of the extra branch is.
> Consider that if we had shipped this in Firefox 4, everyone who upgraded to > 10.7 would find that their Firefox crashed on startup. And if we hadn't, they would have discovered that all sorts of other stuff is broken after it starts up, no? That tends to be how Apple OS releases work... If we can avoid the crashing, great. But I think that at least our test suites should fail on Mac when jemalloc is not being used, so we _know_ when that happens.
(In reply to comment #150) > Consider that if we had shipped this in Firefox 4, everyone who upgraded to > 10.7 would find that their Firefox crashed on startup. Is a 3 month release cycle insufficient time to find and fix issues for an upcoming OSX release prior to its release? It seems that Apple gives more than that amount of time for developers to work with pre-release versions of an upcoming release prior to launch.
Apple is notoriously bad at making prerelease builds available in a way which allows us to run full tests and be confident of the results. Sometimes things show up in the final release that weren't in the previews, and the previews are only available to paid ADC members, which makes it very difficult to get widespread community testing.
Attached patch fixed on 10.7 (obsolete) — Splinter Review
This fixes the 10.7 problems. 10.7 has a different malloc_zone_t size (version 8), fixed with a simple extension of the same fix used for 10.6. 10.7 also added memory protections to the memory used by default_zone; mprotect works around that. I'm leaving the review flag off so that lurkers can give it a once-over before I ask pavlov.
Attachment #535331 - Attachment is obsolete: true
Attachment #535331 - Flags: review?(pavlov)
(In reply to comment #149) > In general, using OS internals which are not part of a stable API is just a > bad idea. If we're going to trade performance now for future instability > later, I'd really like to have that discussion in mozilla.dev.platform, not > here. That's a good point, I'll start that discussion.
Attached patch Fixed on 10.7, 10.8 etc (obsolete) — Splinter Review
This makes Mac builds work on future unknown versions of OSX.

If the malloc_zone_t version is > 8 (it is 8 on OSX 10.7), then we have an incompatible version and can't use jemalloc. So we leave the zone allocator alone, meaning that normal allocations will continue to use the default OSX allocator. But we still have to redirect the mozalloc allocator, which calls jemalloc directly. This adds a dynamic branch to each jemalloc allocation which checks whether or not to use jemalloc. This branch is only added on 64-bit OSX (32-bit will always work with jemalloc). There are tons of branches in there anyway, so an extra, perfectly (dynamically) predictable one won't hurt much.

Note that the Mac implementation of posix_memalign (which we'd be using under the new dynamic scheme) is faulty, so this works around that too.

Tryserver when dynamically using jemalloc: http://tbpl.mozilla.org/?tree=Try&rev=75d588494435
Tryserver when dynamically not using jemalloc: http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a
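A hedged sketch of what the one-time decision could look like. The cutoff (version 8 on 10.7) and the NO_MAC_JEMALLOC override mentioned later in this bug come from the discussion here, but the helper name and exact logic are illustrative, not the patch itself:

#include <malloc/malloc.h>
#include <stdlib.h>

int osx_use_jemalloc = 0;

static void
decide_allocator(void)
{
    malloc_zone_t *zone = malloc_default_zone();

    /* Known zone layouts: 3 (10.5), 6 (10.6), 8 (10.7).  Anything newer
     * is treated as incompatible and we stay on the system allocator. */
    osx_use_jemalloc = (zone->version <= 8) && !getenv("NO_MAC_JEMALLOC");
}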
Attachment #537917 - Attachment is obsolete: true
(In reply to comment #158) > (In reply to comment #149) > > In general, using OS internals which are not part of a stable API is just a > > bad idea. If we're going to trade performance now for future instability > > later, I'd really like to have that discussion in mozilla.dev.platform, not > > here. > > That's a good point, I'll start that discussion. When I went to write the email to dev-platform, I realized you guys were right, and so fixed it (comment 159). Does anyone still want to me to post to dev-platform? I couldn't figure out a way to word the post which didn't boil down to "we've worked around some hacks, but our workarounds might fail".
(In reply to comment #159) > Tryserver when dynamically not using jemalloc: > http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a This dynamically checked that the malloc_zone_t version was >6 before turning jemalloc on. In the actual patch, we would use >8. With >6, jemalloc would not be used on 10.6. This led to a problem on 32-bit which would never occur in real life (details below). So I retried with >7 as the check, and manually tested that it works on OSX 10.7 Lion, which it does. I'll go over it with Jesse later to be certain. Here's the try build: http://tbpl.mozilla.org/?tree=Try&rev=5238f0ae274a. Please download it and use it, esp on OSX Lion. +++++++++++++++++++++++++++++++++++++++++++ This means we have a working version of jemalloc on mac which satisfies all the problems discussed in this thread. +++++++++++++++++++++++++++++++++++++++++++ Here was the problem: OSX 10.6 uses malloc_zone_t version 6. This includes memalign. However, OSX 10.5 uses version 3, which does not have memalign. As a result, the memalign dynamic checks wouldn't compile on 10.5, so I turned them off statically. So when we ran 32-bit firefox on 10.6, the dynamic check failed (6>=6), and malloc() tried to run the jemalloc code instead of calling the default OSX allocators. Crash! So why can't that happen in real life? Because 32-bit will _always_ use jemalloc. Our dynamic condition is for malloc_zone_t version > 8 (8 is used on Lion). 32-bit only runs on 10.5 (version 3) and 10.6 (version 6).
Whiteboard: not-ready-for-cedar
(In reply to comment #161) > 32-bit only runs on 10.5 (version 3) and 10.6 (version 6). I'm not clear whether or not this is still relevant, but I thought Lion would still support 32-bit apps, in which case we'd be using 32-bit libxul, with jemalloc I assume, to run 32-bit plugins.
And here's the patch. Since the last approved patch:
- mprotect fix on 10.7
- support for jemalloc on OSX 10.7
- ability to dynamically turn jemalloc off on OSX 10.8 and beyond
- support for dynamically turning jemalloc off using an env var (NO_MAC_JEMALLOC)
- workaround for a bug in posix_memalign on OSX (in non-jemalloc mode)
Attachment #539088 - Attachment is obsolete: true
Attachment #540382 - Flags: review?(pavlov)
(In reply to comment #162) > (In reply to comment #161) > > 32-bit only runs on 10.5 (version 3) and 10.6 (version 6). > > I'm not clear whether or not this is still relevant, > but I thought Lion would still support 32-bit apps, in which case we'd be > using 32-bit libxul, with jemalloc I assume, to run 32-bit plugins. Actually, this is a good point - jesse and I were discussing it on Friday. We don't support running 32-bit apps combined with dynamically turning jemalloc off. So 32-bit plugins will probably crash at load-time on 10.8 or later (I think we don't care, please correct me if anyone feels strongly otherwise). But they'll be fine on 10.7, since we will be using jemalloc. (The questionable thing is what happens if libxul calls memalign on 32-bit, which is legit and expected. In that case, we'll be calling it on version 8 of the malloc_zone_t API, which supports memalign, so we'll be OK.)
Whiteboard: [MemShrink:P1]
Comment on attachment 540382 [details] [diff] [review] Works on 10.7 and 10.8 (um, probably) too

This looks good. My only real comment is please remove any unnecessary changes (whitespace, etc.) to jemalloc.c -- i.e.:

-	ret->root = (void**)base_calloc(1, sizeof(void *) << ret->level2bits[0]);
+	ret->root = (void**)base_calloc(1, sizeof(void *) <<
+	    ret->level2bits[0]);

 zone_destroy(malloc_zone_t *zone)
 {
-	/* This function should never be called. */
Attachment #540382 - Flags: review?(pavlov) → review+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: Future → mozilla8
Depends on: 670175
Blocks: 670468
Depends on: 670492
I'm about to back this out, due to bugs 670175, 670468 and 670492. I'm just waiting for the build to finish.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla8 → ---
No longer blocks: 670468
Depends on: 670468
Attached patch Decommit properly (obsolete) — Splinter Review
This fixes all the problems reported against jemalloc on Mac:

- Bug 670492 reported a 20% increase in memory usage on 10.5. The problem is that madvise didn't appear to work on 10.5. Instead of using madvise, I used mmap to remap the heap segment as uncommitted (a sketch of the idea follows this comment). The nice way of doing this is just enabling MALLOC_DECOMMIT, so that's what this is (same as bug 669877).
- Bug 670175 is fixed by dynamically turning jemalloc off on 10.7. I'll work it out and fix it later (perhaps by upgrading to the newest version of jemalloc).
- Bug 670468 is being fixed by the Thunderbird folks (the TB installer needs to know to install libjemalloc.dylib), so this patch actually doesn't do anything for them. I'm coordinating with them to find a good landing time.

Performance discussion to follow.
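A minimal sketch of the decommit idea in the first bullet, assuming POSIX mmap semantics (this is not the actual MALLOC_DECOMMIT code, and the helper name is illustrative): instead of madvise(), re-map the range so the pages stop counting against RSS until they are committed again.

#include <stdlib.h>
#include <sys/mman.h>

static void
decommit_pages(void *addr, size_t size)
{
    /* MAP_FIXED over the existing range throws away the old pages;
     * PROT_NONE leaves them reserved but not committed. */
    if (mmap(addr, size, PROT_NONE,
             MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0) == MAP_FAILED)
        abort();
}

/* Re-committing the range would mmap it again with PROT_READ | PROT_WRITE. */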
Attachment #540382 - Attachment is obsolete: true
Attachment #546260 - Flags: review?(pavlov)
Performance discussion of the patch in comment 169

Highlights:
- SunSpider regression of 11%
- V8 regression of 6%
- RSS improvement of 44%

Details:

Speed - I did in-browser JS benchmarks (jemalloc/no-jemalloc):
- SunSpider (258/235) - 11% regression
- V8 (4667/4941) - 6% regression
- Kraken (5045/5129) - 0% regression

The V8 and Kraken results were noisy, but it's clear there is some V8 regression. (Note the Talos SunSpider and V8 numbers say no change, but dmandelin says they're crap and should be ignored.)

Memory - I loaded techcrunch.com, opened the top 11 or so stories in new tabs, then closed them. In each case, I measured (explicit/js/RSS):
- 'explicit' from about:memory
- 'js' from about:memory
- 'Real memory' from the OSX Activity Monitor (this is roughly RSS)
- I did a CC+GC after each one, and also measured that.

jemalloc vs no jemalloc:
Startup   : ( 47/ 24/106) vs ( 46/ 23/112)
CC+GC     : ( 51/ 24/102) vs ( 48/ 21/108)
Load tabs : (441/208/508) vs (398/200/500)
CC+GC     : (411/205/491) vs (368/200/471) - 4% memory regression (noisy)
Close tabs: (225/136/360) vs (216/131/432) - 20% memory improvement
CC+GC     : ( 77/ 38/188) vs ( 71/ 34/335) - 44% memory improvement

The improvements are from measuring RSS (last column in parentheses).

Talos Tp5 results: I've included these results for thoroughness; they show a 16% improvement in RSS. I don't really believe them though, because Tp5 takes a snapshot every 20 seconds.

Without jemalloc:
tp5: 380.27
tp5_pbytes: 3.5GB
tp5_rss: 273.9MB
tp5_shutdown: 858.0

With jemalloc:
tp5: 414.7
tp5_pbytes: 3.5GB
tp5_rss: 230.8MB
tp5_shutdown: 1186.0

Test it yourself with these builds:
Build with jemalloc: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pbiggar@mozilla.com-933061c83fd2/try-macosx64/
Identical build except no jemalloc: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1310279832/
(Note that you can also turn off jemalloc by setting the NO_MAC_JEMALLOC environment variable.)

Talos results for the wary:
With jemalloc: http://tbpl.mozilla.org/?tree=Try&pusher=pbiggar@mozilla.com&rev=933061c83fd2
No jemalloc: http://tbpl.mozilla.org/?tree=Firefox&rev=18c8c2ea4caf
> Performance discussion of the patch in comment 169 So overall we have a 10% Tp5 regression on Mac64 (maybe; triggering a few more runs so you have some idea of the noise may not be a bad idea), wins on various Dromaeo DOM tests, no obvious change on Tsspider but you say you did see a regression in the browser?
(In reply to comment #171) > > Performance discussion of the patch in comment 169 > > So overall we have a 10% Tp5 regression on Mac64 (maybe; triggering a few > more runs so you have some idea of the noise may not be a bad idea), wins on > various Dromaeo DOM tests, no obvious change on Tsspider but you say you did > see a regression in the browser? I'm calling this a 16% improvement, not a 10% regression. The private bytes (414.7 vs 380.27) show a 10% regression, but the RSS shows a 16% improvement. I believe that RSS is the more accurate figure. jemalloc uses more memory internally (I think this is reflected in the Talos Private bytes, which takes internal figures), but RSS includes fragmentation (which is the whole point of jemalloc). Tsspider is fiction AFAICT, I measured in the browser.
> I'm calling this a 16% improvement, not a 10% regression The 380.27 vs 414.7 is not "private bytes". It's the average pageload time for the page set, in milliseconds, i.e. the main metric Tp5 is supposed to be measuring. The "private bytes" numbers for those runs are both about 3.5GB, and I agree they're completely useless. So we're talking a 16% reduction in memory usage and a 10% slowdown in pageload. The former is good, but the latter is very bad. ;) The latter also seems a bit fishy, given the performance improvements in DOM manipulation and the like (which were expected given what I've seen in profiles wrt Mac malloc). Hence my proposal that more Tp5 runs be triggered on those two changesets to see what the noise in that Tp5 number is. > Tsspider is fiction AFAICT File a bug if it really is, please (as opposed to just being the shell sunspider measurement). But yes, I know sunspider in browser and in shell can have totally different results; it's not that surprising. I was just making sure I understood the data.
I'm running a browser built on this, and the performance is atrocious. Everything is slow, and the profiles I'm pulling out all point to jemalloc (specifically arena_bin_malloc_hard). So the Talos pageload times are bang on, and this needs a different solution.
Attachment #546260 - Flags: review?(pavlov)
Interesting. I wonder why the DOM tests show different results... maybe they hit free() more; the default Mac allocator's free() is kinda slow.
This disables jemalloc dynamically on OSX 10.7 and greater, and statically on OSX 10.5. The latter is the bulk of this patch, where I needed to add a FAKE_MOZ_MEMORY build symbol so that libjemalloc.dylib is still built, which keeps universal builds happy.
Attachment #546260 - Attachment is obsolete: true
Attachment #557087 - Flags: review?(ted.mielczarek)
Comment on attachment 557087 [details] [diff] [review] Build with jemalloc disabled on i386, but enabled on x86-64

Review of attachment 557087 [details] [diff] [review]:
-----------------------------------------------------------------

I have two problems with this patch:
- it conflicts with bug 680440
- disabling jemalloc on 10.5 by disabling it on i386 is not nice. There are i386 10.6 users. I am one. And since 10.7 is supposedly 64-bit only (I heard the Finder is 64-bit only), I guess a lot of the people owning i386 Macs (the first batch of Intel Macs) are going to stick to 10.6 for a while.
Attachment #557087 - Flags: review?(ted.mielczarek) → review-
(In reply to Mike Hommey [:glandium] from comment #178)

Thanks for the review Mike.

> - it conflicts with bug 680440

The conflicts with bug 680440 are superficial, AFAICT. Whichever one of us lands second can sort it out, but it shouldn't be hard, especially with the cleanups in your bug. This is pretty common, and I wouldn't think it's a good reason to r- this bug.

> - disabling jemalloc on 10.5 by disabling it on i386 is not nice. There are
> i386 10.6 users. I am one. And since 10.7 is supposedly only 64-bits (I
> heard the finder is 64-bits only), I guess a lot of the people owning i386
> macs (the first batch of intel macs) are going to stick to 10.6 for a while.

jemalloc isn't on for anybody right now. I would welcome a follow-on bug to make jemalloc work with the 10.5 SDK, but holding this up for the very, very small subsection of Mac users who use i386 on 10.6 doesn't make much sense, since:
- we don't even test for this on tinderbox (my suggestions to do so were not welcomed - most people weren't aware you could do this)
- it's hard to make jemalloc work on i386. Turning on MOZ_MEMORY makes the codebase assume that memalign is available (and I see no good reason to break this assumption).
I have to agree with Paul here. This patch is a strict improvement over the status quo. Lets file a follow-up for the i386 work. I am sure Paul will gladly own that bug as well until we fixed that, too.
Attachment #557087 - Flags: review- → review?(ted.mielczarek)
(In reply to Paul Biggar from comment #179) > - it's hard to make jemalloc work on i386. Turning on MOZ_MEMORY makes the > codebase assume that memalign is available (and I see no good reason to > break this assumption). Your patch doesn't mention memalign being a problem, but reading the bug backlog, I see it is. Now, as I see it, your main problem is that you can't build without jemalloc on i386 and with jemalloc on x64 because the universal build merging won't be happy, right? Then bug 677501 will make that problem go away, as jemalloc will be folded in a mozutils library that will exist whether jemalloc is enabled or not. At which point your patch would only be a matter of enabling jemalloc on x64 only.
Comment on attachment 557087 [details] [diff] [review] Build with jemalloc disabled on i386, but enabled on x86-64 Yes, I think bug 677501 would make the problem go away. I'd prefer to push ahead with this though, since this is something that I'd like to land while we're still quite early in the release cycle (it's bounced twice already, after originally aiming for FF4, and it's a MemShrink:P1). Obviously, I'd be willing to update bug 677501 if you're kind enough to let this land first :)
Attachment #557087 - Flags: review?(ted.mielczarek) → review?(mh+mozilla)
Depends on: 677501
Tested on try: https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=6817f9403b66 (Win debug red is obviously unrelated)
Attachment #557466 - Flags: review?(ted.mielczarek)
Comment on attachment 557466 [details] [diff] [review] Build with jemalloc disabled on i386, but enabled on x86-64

Thanks for updating the patch. It needs slightly more work though, especially comments and such; I'll do it this morning.
Attachment #557466 - Flags: review?(ted.mielczarek)
With glandium's recent changes, this patch is much simpler.
Attachment #557087 - Attachment is obsolete: true
Attachment #557466 - Attachment is obsolete: true
Attachment #557680 - Flags: review?(khuey)
Attachment #557087 - Flags: review?(mh+mozilla)
(In reply to Paul Biggar from comment #185) > Created attachment 557680 [details] [diff] [review] > Rebase on top of glandium's patches This is on top of build-system.
Comment on attachment 557680 [details] [diff] [review] Rebase on top of glandium's patches I was expecting something much scarier :-P
Attachment #557680 - Flags: review?(khuey) → review+
This is the cause of the Mochitests oranges Paul was seeing on try.
Attachment #559060 - Flags: review?(khuey)
Comment on attachment 559060 [details] [diff] [review] Don't link sqlite with mozutils on OSX Nice.
Attachment #559060 - Flags: review?(khuey) → review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #189) > Comment on attachment 559060 [details] [diff] [review] > Don't link sqlite with mozutils on OSX > > Nice. Also, I hope we don't have any mismatched allocators here!
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #190)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #189)
> > Comment on attachment 559060 [details] [diff] [review]
> > Don't link sqlite with mozutils on OSX
> >
> > Nice.
>
> Also, I hope we don't have any mismatched allocators here!

I think that's exactly why it crashed in the first place: the NSS tools are not linked against jemalloc/mozutils, but sqlite (which is loaded by the softokn, which is itself dynamically loaded) initializes jemalloc, so all subsequent allocations go through jemalloc while allocations made before the softokn was loaded used the system malloc. At least, that's what I think happened. Another explanation could be that unloading the softoken removes the jemalloc functions from the address space while they are still registered as the zone allocator with the system allocator.

An alternate solution would be to link all NSS libraries against jemalloc/mozutils, but that is more painful to do. As jemalloc on Mac goes through a registration step, contrary to other platforms where we actually make direct calls into the jemalloc lib, it's just fine that everything is not linked against jemalloc.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
This is a major change which we should track for FF10. I don't see a TP5-RSS change on 10.6, but perhaps one of our memory stress tests will show an improvement.
We have seen an increase in memory usage reported by the endurance tests on all platforms. Investigating in bug 693247. Is there a valid reason we would expect explicit and resident memory to increase after this change?
> We have seen an increase in memory usage reported by the endurance tests on all platforms. This change applies only to Mac 10.6. Explicit might increase if jemalloc rounds up more aggressively than the mac allocator. If explicit increases, that could cause a corresponding increase in RSS. Paul and I also observed that Mac sometimes doesn't release memory that jemalloc released until some other application wants it, in bug 670492.
Whiteboard: [MemShrink:P1][inbound] → [MemShrink:P1]
Hmm, moz_malloc_usable_size looks like this:

size_t moz_malloc_usable_size(void *ptr)
{
  if (!ptr)
    return 0;

#if defined(XP_MACOSX)
  return malloc_size(ptr);
#elif defined(MOZ_MEMORY)
  return malloc_usable_size(ptr);
#elif defined(XP_WIN)
  return _msize(ptr);
#else
  return 0;
#endif
}

Surely the MOZ_MEMORY case should precede the XP_MACOSX case? How does that even work when jemalloc is enabled on Mac? I would have thought that calling the Mac's malloc_size() on a heap block allocated with jemalloc would be a recipe for an instant crash. I can't find anywhere that jemalloc somehow overrides malloc_size() with its own version -- MXR tells me that the above function is the only place in the entire codebase where malloc_size() appears. I'm bamboozled.
> How does that even work when jemalloc is enabled on Mac?

This is actually...not wrong, I think. It may even be necessary. It certainly deserves a comment.

jemalloc on Mac overrides the default zone's malloc_size function. See jemalloc's szone2ozone function:

  default_zone->size = (void*)ozone_size; // ozone_size is a function

So calling malloc_size() should eventually call our ozone_size function, if there's any sanity in this world.

It may in fact be necessary to call malloc_size rather than malloc_usable_size. The Mac jemalloc code claims that some objects are allocated with the Mac default allocator before jemalloc loads up. If we called malloc_usable_size on one of them, jemalloc would likely crash.
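To make the delegation concrete, a hedged sketch of what a size hook like ozone_size does; je_isalloc and orig_szone are illustrative names standing in for jemalloc's ownership check and its saved copy of the original system zone, not the exact symbols:

#include <malloc/malloc.h>

size_t je_isalloc(const void *ptr);   /* 0 if jemalloc does not own ptr */
extern malloc_zone_t orig_szone;      /* saved copy of the original zone */

static size_t
ozone_size(malloc_zone_t *zone, const void *ptr)
{
    size_t ret;
    (void)zone;

    ret = je_isalloc(ptr);
    if (ret == 0) {
        /* Allocated before jemalloc took over the default zone:
         * ask the original system zone for the size instead. */
        ret = orig_szone.size(&orig_szone, ptr);
    }
    return ret;
}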
(In reply to Justin Lebar [:jlebar] from comment #199) > > How does that even work when jemalloc is enabled on Mac? > > This is actually...not wrong, I think. It may even be necessary. It > certainly deserves a comment. Yes, this is not wrong. I had the same thoughts as njn about four times, but jlebar is right (especially about the need for a comment).
> Yes, this is not wrong. I had the same thoughts as njn about four times, but > jlebar is right (especially about the need for a comment). I spun off bug 704045 for this.
Marking qa- as nothing for QA to verify. Please set to qa+ if this is not the case.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][qa-]