Firefox's fork of jemalloc is configured for a minimum of 2 bytes allocation and alignment. The Linux ABI specifies that memory returned by malloc and friends is word aligned at minimum. (4 bytes on 32-bit, 8 bytes on 64-bit.) GCC knows this, and its value range propagation pass will carry this information around from malloc and company to things like strlen or strcpy. When gcc decides to inline the string functions, if it knows that the memory they're operating on is word aligned, it will skip the usual byte-at-a-time preamble and start with word-at-a-time accesses. This will cause a crash if you allocate e.g. two bytes and jemalloc decides to place them in the last two bytes of a page before an unmapped region. I'm fairly certain this is what causes the occasional crash in glGetString() with Mesa -- Mesa's _make_extension_string() is the only caller of the static function get_extension_override() (and gcc will inline static functions that are called once). get_extension_override() will return a calloc()ed "\0" if there are no extension overrides (which is always the case if the MESA_EXTENSION_OVERRIDE environment variable is not set, i.e if you are not a Mesa developer). _make_extension_string() calls strlen() on the value returned by get_extension_override(), and gcc inlines this call. If jemalloc places the calloc()ed "\0" in the last 2 bytes of a page before an unmapped page (4 bytes on 64-bit systems), then glxtest will crash.
This is actually a C language requirement, not Linux ABI thing. From the draft C1X standard (N1570.pdf): 7.22.3 paragraph 1 "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)." And Defect Report #75: Question Item 12 - alignment of allocated memory Is a piece of memory allocated by malloc required to be aligned suitably for any type, or only for those types that will fit into the space? For example, following the assignment: void *vp = malloc (1); is it required that (void *)(int *)vp compare equal to vp (assuming that sizeof(int) > 1), or is it permissible for vp to be a value not suitably aligned to point to an int? Response Subclause 7.10.3 requires allocated memory to be suitably aligned for any type, so they must compare equal. (7.10.3 is the section in the C89 standard, 7.22.3 is the same section in the C1X draft.)
Hm. jemalloc says > /* Minimum alignment of allocations is 2^QUANTUM_2POW_MIN bytes. */ > # define QUANTUM_2POW_MIN 4 http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#429 But I agree, it looks like they mean "minimum alignment of non-tiny allocations is 2^QUANTUM_2POW_MIN bytes", and that tiny allocations are aligned to 2^TINY_MIN_POW == 2 bytes. Are you sure jemalloc is returning addresses that are only 2-byte aligned? I'm not sure we'd want to change jemalloc's behavior to be compliant here; we'd have to see how many tiny allocations we make. (I actually just started collecting this data, so that should be easy...)
A quick test with gdb attached to a running Firefox 7 instance: (gdb) call malloc(2) $6 = (void *) 0x7f3c5633a4ca (gdb) print 0x7f3c5633a4ca % 8 $7 = 2 (gdb) call malloc(2) $8 = (void *) 0x7f3c5633a4ce (gdb) print 0x7f3c5633a4ce % 8 $9 = 6 (gdb) call malloc(2) $10 = (void *) 0x7f3c5633a4d0 (gdb) print 0x7f3c5633a4d0 % 8 $11 = 0 (gdb) call malloc(2) $12 = (void *) 0x7f3c5633a4d2 (gdb) print 0x7f3c5633a4d2 % 8 $13 = 2 And you don't have a choice in the matter -- you're implementing an interface specified by the C standard, the other half of which is implemented by the compiler itself. And this isn't something that you can work around inside Firefox, because your malloc is used by every shared library that Firefox loads.
> And you don't have a choice in the matter -- you're implementing an interface specified by the C > standard, the other half of which is implemented by the compiler itself. Well, we *could* decree that all shared libraries Firefox uses which do string operations on something malloc'ed must make sure that they've malloc'ed at least a word. This is obviously not a problem in most of the shared libraries we use. I'm not saying we should do that, but if changing the alignment of our malloc caused our RSS to increase by say, 20%, I think we'd have to think hard about the alternatives... In any case, I instrumented a run of starting up and shutting down the browser on my Linux-64 box. There were 340624 malloc's. Broken down by size: malloc(1) 365 0.11% malloc(2) 874 0.26% malloc(3) 1046 0.31% malloc(4) 3845 1.13% malloc(5) 2689 0.79% malloc(6) 1331 0.39% malloc(7) 950 0.28% TOTAL 11100 3.26% Currently malloc(5 - 7) all get rounded up to malloc(8). So the only change would be rounding the malloc(4) and belows up to 8 bytes. I'll go out on a limb here and say it seems very unlikely that the additional few bytes in these few small malloc's would make a difference. There's certainly no real space difference, and I doubt we use these tiny allocations enough for there to be a cache difference. I'll write a patch.
I wonder if we need to change this on Windows, or just on Linux.
Firefox's malloc won't override the malloc used by DLLs on Windows due to the way dynamic linking works there, so you only have to worry about how the Microsoft C compiler compiles Firefox itself. (Assuming cl.exe does these kinds of optimizations. Or starts doing them in the future.)
We should be fine on Windows without this change, I think.
Talos shows that this is, if anything, a memory *win*.
Comment on attachment 564286 [details] [diff] [review] Patch v1 I'm not sure who to have review this...
Comment on attachment 564286 [details] [diff] [review] Patch v1 This patch fixes a bug that can be the cause of a lot of mysterious crashiness on linux. See for example https://bugs.freedesktop.org/show_bug.cgi?id=40423 which is how it was discovered. Please consider it for aurora/beta branches so we get the stability benefit into the hands of users faster.
Can we see how this patch affects the crash rate on nightlies, or do we not have sufficient coverage? I'm a bit wary of taking this on beta; it's a significant change and may have unintended consequences. If we need to take it on Aurora so we can see how it affects crashes for a larger audience, I'd be OK with that. Note that this change affects Linux only on branches, but affects Linux and Mac 10.6 on trunk, since jemalloc for 10.6 was landed recently.
Comment on attachment 564286 [details] [diff] [review] Patch v1 We decided we don't want this in Aurora and Beta and will let it ride up through the normal process. If you disagree, please renominate.
Is this something QA can verify?
Not easy to verify it's fixed as it relies on undefined behavior, but if you have the same setup as in https://bugs.freedesktop.org/show_bug.cgi?id=40423#c6, you can check in about:support if you still get an error like "GLXtest process failed (received signal 11)" in the Graphics section: that would mean the bug is not fixed.
I can't seem to open that link (throbber just keeps spinning trying to connect).