Last Comment Bug 691003 - jemalloc's minimum alignment must be word sized on Linux
: jemalloc's minimum alignment must be word sized on Linux
Product: Core
Classification: Components
Component: Memory Allocator (show other bugs)
: unspecified
: x86_64 Linux
-- normal (vote)
: mozilla10
Assigned To: Justin Lebar (not reading bugmail)
: Mike Hommey [:glandium]
Depends on:
  Show dependency treegraph
Reported: 2011-09-30 22:52 PDT by Nicholas Miell
Modified: 2012-01-04 13:02 PST (History)
10 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Patch v1 (3.33 KB, patch)
2011-10-03 12:00 PDT, Justin Lebar (not reading bugmail)
khuey: review+
christian: approval‑mozilla‑aurora-
christian: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description User image Nicholas Miell 2011-09-30 22:52:52 PDT
Firefox's fork of jemalloc is configured for a minimum of 2 bytes allocation and alignment.

The Linux ABI specifies that memory returned by malloc and friends is word aligned at minimum. (4 bytes on 32-bit, 8 bytes on 64-bit.)

GCC knows this, and its value range propagation pass will carry this information around from malloc and company to things like strlen or strcpy.

When gcc decides to inline the string functions, if it knows that the memory they're operating on is word aligned, it will skip the usual byte-at-a-time preamble and start with word-at-a-time accesses.

This will cause a crash if you allocate e.g. two bytes and jemalloc decides to place them in the last two bytes of a page before an unmapped region.

I'm fairly certain this is what causes the occasional crash in glGetString() with Mesa -- Mesa's _make_extension_string() is the only caller of the static function get_extension_override() (and gcc will inline static functions that are called once). get_extension_override() will return a calloc()ed "\0" if there are no extension overrides (which is always the case if the MESA_EXTENSION_OVERRIDE environment variable is not set, i.e if you are not a Mesa developer). _make_extension_string() calls strlen() on the value returned by get_extension_override(), and gcc inlines this call. If jemalloc places the calloc()ed "\0" in the last 2 bytes of a page before an unmapped page (4 bytes on 64-bit systems), then glxtest will crash.
Comment 1 User image Nicholas Miell 2011-10-01 13:55:56 PDT
This is actually a C language requirement, not Linux ABI thing.

From the draft C1X standard (N1570.pdf): 7.22.3 paragraph 1 "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)."

And Defect Report #75: 
Item 12 - alignment of allocated memory
Is a piece of memory allocated by malloc required to be aligned suitably for any type, or only for those types that will fit into the space? For example, following the assignment:
void *vp = malloc (1);
is it required that (void *)(int *)vp compare equal to vp (assuming that sizeof(int) > 1), or is it permissible for vp to be a value not suitably aligned to point to an int?
Subclause 7.10.3 requires allocated memory to be suitably aligned for any type, so they must compare equal.

(7.10.3 is the section in the C89 standard, 7.22.3 is the same section in the C1X draft.)
Comment 2 User image Justin Lebar (not reading bugmail) 2011-10-01 19:03:03 PDT
Hm.  jemalloc says

> /* Minimum alignment of allocations is 2^QUANTUM_2POW_MIN bytes. */
> #  define QUANTUM_2POW_MIN      4

But I agree, it looks like they mean "minimum alignment of non-tiny allocations is 2^QUANTUM_2POW_MIN bytes", and that tiny allocations are aligned to 2^TINY_MIN_POW == 2 bytes.

Are you sure jemalloc is returning addresses that are only 2-byte aligned?

I'm not sure we'd want to change jemalloc's behavior to be compliant here; we'd have to see how many tiny allocations we make.  (I actually just started collecting this data, so that should be easy...)
Comment 3 User image Nicholas Miell 2011-10-01 19:25:34 PDT
A quick test with gdb attached to a running Firefox 7 instance:
(gdb) call malloc(2)
$6 = (void *) 0x7f3c5633a4ca
(gdb) print 0x7f3c5633a4ca % 8
$7 = 2
(gdb) call malloc(2)
$8 = (void *) 0x7f3c5633a4ce
(gdb) print 0x7f3c5633a4ce % 8
$9 = 6
(gdb) call malloc(2)
$10 = (void *) 0x7f3c5633a4d0
(gdb) print 0x7f3c5633a4d0 % 8
$11 = 0
(gdb) call malloc(2)
$12 = (void *) 0x7f3c5633a4d2
(gdb) print 0x7f3c5633a4d2 % 8
$13 = 2

And you don't have a choice in the matter -- you're implementing an interface specified by the C standard, the other half of which is implemented by the compiler itself. And this isn't something that you can work around inside Firefox, because your malloc is used by every shared library that Firefox loads.
Comment 4 User image Justin Lebar (not reading bugmail) 2011-10-01 19:47:12 PDT
> And you don't have a choice in the matter -- you're implementing an interface specified by the C 
> standard, the other half of which is implemented by the compiler itself.

Well, we *could* decree that all shared libraries Firefox uses which do string operations on something malloc'ed must make sure that they've malloc'ed at least a word.  This is obviously not a problem in most of the shared libraries we use.  I'm not saying we should do that, but if changing the alignment of our malloc caused our RSS to increase by say, 20%, I think we'd have to think hard about the alternatives...

In any case, I instrumented a run of starting up and shutting down the browser on my Linux-64 box.  There were 340624 malloc's.  Broken down by size:

malloc(1)   365     0.11%
malloc(2)   874     0.26%
malloc(3)   1046    0.31%
malloc(4)   3845    1.13%
malloc(5)   2689    0.79%
malloc(6)   1331    0.39%
malloc(7)   950     0.28%
TOTAL       11100   3.26%

Currently malloc(5 - 7) all get rounded up to malloc(8).  So the only change would be rounding the malloc(4) and belows up to 8 bytes.  I'll go out on a limb here and say it seems very unlikely that the additional few bytes in these few small malloc's would make a difference.  There's certainly no real space difference, and I doubt we use these tiny allocations enough for there to be a cache difference.

I'll write a patch.
Comment 5 User image Justin Lebar (not reading bugmail) 2011-10-01 19:50:38 PDT
I wonder if we need to change this on Windows, or just on Linux.
Comment 6 User image Nicholas Miell 2011-10-01 19:53:52 PDT
Firefox's malloc won't override the malloc used by DLLs on Windows due to the way dynamic linking works there, so you only have to worry about how the Microsoft C compiler compiles Firefox itself. (Assuming cl.exe does these kinds of optimizations. Or starts doing them in the future.)
Comment 7 User image Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-10-01 19:54:57 PDT
We should be fine on Windows without this change, I think.
Comment 8 User image Justin Lebar (not reading bugmail) 2011-10-03 12:00:48 PDT
Created attachment 564286 [details] [diff] [review]
Patch v1
Comment 9 User image Justin Lebar (not reading bugmail) 2011-10-03 12:25:06 PDT
Comment 10 User image Justin Lebar (not reading bugmail) 2011-10-03 22:04:38 PDT
Talos shows that this is, if anything, a memory *win*.
Comment 11 User image Justin Lebar (not reading bugmail) 2011-10-03 22:07:20 PDT
Comment on attachment 564286 [details] [diff] [review]
Patch v1

I'm not sure who to have review this...
Comment 12 User image Justin Lebar (not reading bugmail) 2011-10-05 11:08:53 PDT
Comment 13 User image Ed Morley [:emorley] 2011-10-06 03:49:09 PDT
Comment 14 User image Benoit Jacob [:bjacob] (mostly away) 2011-10-06 04:56:23 PDT
Comment on attachment 564286 [details] [diff] [review]
Patch v1

This patch fixes a bug that can be the cause of a lot of mysterious crashiness on linux. See for example which is how it was discovered. Please consider it for aurora/beta branches so we get the stability benefit into the hands of users faster.
Comment 15 User image Justin Lebar (not reading bugmail) 2011-10-06 07:13:53 PDT
Can we see how this patch affects the crash rate on nightlies, or do we not have sufficient coverage?

I'm a bit wary of taking this on beta; it's a significant change and may have unintended consequences.  If we need to take it on Aurora so we can see how it affects crashes for a larger audience, I'd be OK with that.

Note that this change affects Linux only on branches, but affects Linux and Mac 10.6 on trunk, since jemalloc for 10.6 was landed recently.
Comment 16 User image christian 2011-10-06 14:41:30 PDT
Comment on attachment 564286 [details] [diff] [review]
Patch v1

We decided we don't want this in Aurora and Beta and will let it ride up through the normal process. If you disagree, please renominate.
Comment 17 User image Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-28 14:37:13 PST
Is this something QA can verify?
Comment 18 User image Benoit Jacob [:bjacob] (mostly away) 2011-12-30 16:10:53 PST
Not easy to verify it's fixed as it relies on undefined behavior, but if you have the same setup as in, you can check in about:support if you still get an error like "GLXtest process failed (received signal 11)" in the Graphics section: that would mean the bug is not fixed.
Comment 19 User image Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-04 13:02:09 PST
I can't seem to open that link (throbber just keeps spinning trying to connect).

Note You need to log in before you can comment on or make changes to this bug.