Last Comment Bug 651952 - Fix final jemalloc-on-mac bug
: Fix final jemalloc-on-mac bug
Status: RESOLVED FIXED
[mentor=pbiggar@mozilla.com]
:
Product: Core
Classification: Components
Component: Memory Allocator (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal with 1 vote (vote)
: ---
Assigned To: Dale Kim
:
Mentors:
Depends on:
Blocks: 414946
  Show dependency treegraph
 
Reported: 2011-04-21 12:47 PDT by Paul Biggar
Modified: 2011-05-23 07:04 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Fixes up a bad pointer set in szone2ozone for introspection. (44.90 KB, patch)
2011-05-21 20:27 PDT, Dale Kim
no flags Details | Diff | Splinter Review

Description Paul Biggar 2011-04-21 12:47:27 PDT
Filing this as a good bug for a talented newcomer. This is very self-contained, has a big upside, but may be challenging. You'll need to have good C debugging skills for this bug, and of course a Mac.

Bug 414946 is fixing jemalloc on mac. The patch worked when it was written, but a bug has crept in on 32-bit platforms, which cause Firefox to crash on start. This component is jemalloc, the memory allocator used by firefox for all its low-level memory allocation, so it's difficult to know whether the bug is in jemalloc itself or if it's in some code that does memory allocation, but I suspect the former.

When running firefox in gdb:

    gdb dist/NightlyDebug.app/Contents/MacOS/firefox-bin


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x00000000
0x00005001 in _mh_dylib_header ()
(gdb) bt
#0  0x00005001 in _mh_dylib_header ()
#1  0x97e41646 in malloc_good_size ()
#2  0x95f98522 in __CFStringChangeSizeMultiple ()
#3  0x95f98e1f in CFStringAppendCharacters ()
#4  0x95fb4d34 in resolveAbsoluteURLString ()
#5  0x95f96591 in CFURLCopyAbsoluteURL ()
#6  0x95fde741 in _CFBundleAddPreferredLprojNamesInDirectory ()
#7  0x95fde230 in _CFBundleGetLanguageSearchList ()
#8  0x95fde07a in CFBundleCopyResourceURL ()
#9  0x95fe8654 in CFBundleGetLocalInfoDictionary ()
#10 0x95fe85b2 in CFBundleGetValueForInfoDictionaryKey ()
#11 0x96e192f7 in CGSServerPort ()
#12 0x96e19127 in CGSScoreboard ()
#13 0x96e18f07 in initCGDisplayState ()
#14 0x96e18d2b in initCGDisplayMappings ()
#15 0x96e1855c in cg_setup ()
#16 0x97e558a0 in pthread_once ()
#17 0x96e18519 in CGSInitialize ()
#18 0x96e34fad in CGSInputModifierKeyState ()
#19 0x9076adbe in GetCurrentKeyModifiers ()
#20 0x9076ad8e in GetCurrentEventKeyModifiers ()
#21 0x00039886 in XRE_main (argc=1, argv=0xbfffe8ac, aAppData=0x4731080) at ../../../toolkit/xre/nsAppRunner.cpp:3128


Typically malloc_good_size would call zone->good_size, but something must be set wrong. In the past, I had a similar problem with this where the size of the zone structs were different on 10.5 than 10.6, so there may be a similar problem here.

szone2ozone and create_zone in jemalloc.c are probably pretty local to the problem, so start there, and be sure to read the comment labelled MALLOC_ZONE_T_SNOW_LEOPARD_NOTE.


I got a different stack trace some time back, so it may be that the frame pointers are getting overwritten and gdb is being confused. But since CFStringChangeSizeMultiple can actually call malloc_good_size, it is probably right.

Finally, a good tactic to figure the problem out would be to use hg to bisect between the revisions where it worked (when I posted the patch) and when it didn't (for example today).
Comment 1 Dale Kim 2011-05-21 20:27:29 PDT
Created attachment 534269 [details] [diff] [review]
Fixes up a bad pointer set in szone2ozone for introspection.
Comment 2 Dale Kim 2011-05-21 20:31:47 PDT
The patch is basically the patch from Paul for bug 414946.  A few changes for getting the build to succeed, but the actual bug fix is a 1 byte change in szone2ozone:

       zone->introspect = (malloc_introspection_t*)(ozone_introspect);
Comment 3 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-05-21 20:35:13 PDT
Awesome Dale!

Paul, can you review this?
Comment 4 Paul Biggar 2011-05-21 21:24:32 PDT
Wow, I am ashamed :( Great job Dale!

I've sent this to tryserver: http://tbpl.mozilla.org/?tree=Try&rev=5dd3a5f7fbb0

The differences between this patch and the last one reviewed in bug 651952 are not huge. It's kinda borderline whether we need a new review, so I'm going to be conservative and ask pavlov for one.
Comment 5 Dale Kim 2011-05-21 21:30:33 PDT
My biggest concern is the build fix changes I made.  There are some gcc function aliasing and visibility stuff that I ended up removing, but I have no idea how that will affect other platforms.
Comment 6 Paul Biggar 2011-05-21 21:42:03 PDT
(In reply to comment #5)
> My biggest concern is the build fix changes I made.  There are some gcc
> function aliasing and visibility stuff that I ended up removing, but I have
> no idea how that will affect other platforms.

OK, I see them. I reviewed the patch which brought them in, so I can fix them.
Comment 7 Paul Biggar 2011-05-23 07:04:02 PDT
Although we don't quite have jemalloc-on-mac in the tree yet, this particular bug is fixed, so marking it as such. Further jemalloc-on-mac goodness in bug 414946.

Note You need to log in before you can comment on or make changes to this bug.