Closed Bug 678977 Opened 13 years ago Closed 13 years ago

Teach sqlite to use jemalloc directly when applicable

Categories

(Toolkit :: Storage, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: khuey, Assigned: n.nethercote)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files, 5 obsolete files)

Attached patch Patch (obsolete) — Splinter Review
This uses sqlite's pluggable allocator stuff to plugin in jemalloc when MOZ_MEMORY is defined.  It might need some more cleverness on OS X.  I used the fallible mozalloc allocators since that seems appropriate here (since sqlite handles memory allocation failures).

njn, could you run your slop detector stuff at this?  I expect it might mask the 33016 byte allocations from https://bugzilla.mozilla.org/show_bug.cgi?id=676189#c15 by rounding them up to 64 K :-/, but it should handle the rest properly, and I expect it'll cause the sqlite memory reporter numbers to be more accurate.
Attachment #553164 - Flags: feedback?(nnethercote)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #0)
> njn, could you run your slop detector stuff at this?  I expect it might mask
> the 33016 byte allocations from
> https://bugzilla.mozilla.org/show_bug.cgi?id=676189#c15 by rounding them up
> to 64 K :-/

That's 36 K of course :-P
Attached patch Patch (obsolete) — Splinter Review
Including math.h helps.
Attachment #553164 - Attachment is obsolete: true
Attachment #553164 - Flags: feedback?(nnethercote)
Attachment #553259 - Flags: feedback?(nnethercote)
imma let you finish, but when this is ready for review, please look over https://hg.mozilla.org/mozilla-central/raw-file/tip/storage/style.txt
Comment on attachment 553259 [details] [diff] [review]
Patch

Review of attachment 553259 [details] [diff] [review]:
-----------------------------------------------------------------

Some drive-by comments, I'm in the process of measuring the difference.

::: storage/src/mozStorageService.cpp
@@ +322,5 @@
> +int jemalloc_rounding_calculator(int size)
> +{
> +  // We could use bit twiddling hacks here, but it seems unnecessary
> +  if (size == 0) {
> +    return 1;

That should be 2.  I even wrote a test program to check it :)

@@ +329,5 @@
> +    return pow(2, ceil(log((float)size) / log(2.0)));
> +  } else if (size <= 512) {
> +    // Round up to a 16 byte boundary
> +    if (size % 16)
> +      return size + 16 - (size % 16);

A macro or function to factor out this repeated rounding code would be good.

@@ +345,5 @@
> +    if (size % (1024 * 1024))
> +      return size + 1024 * 1024 - (size % (1024 * 1024));
> +    return size;
> +  }
> +}

FWIW here's the equivalent function I wrote in DMD, which has more branches but no power-of-two round-ups.  I find it easier to read.

static
SizeT getJemallocSize(SizeT n)
{
   // Small/Tiny: 2, 4, 8 (bytes)
   if      (n <= 2) n = 2;
   else if (n <= 4) n = 4;
   else if (n <= 8) n = 8;

   // Small/Quantum-spaced: 16, 32, 48, ..., 480, 496, 512 (bytes)
   else if (n <= 512) n = VG_ROUNDUP(n, 16);

   // Small/Sub-page: 1, 2 (KiB)
   else if (n <= 2*1024) n = VG_ROUNDUP(n, 1024);

   // Large: 4, 8, 12, ..., 1012, 1016, 1020 (KiB)
   else if (n <= 1024*1024) n = VG_ROUNDUP(n, 4*1024);

   // Huge: 1, 2, 3, ... (MiB)
   else n = VG_ROUNDUP(n, 1024*1024);

   return n;
}
Did you test this patch?  When I applied it I got this:

  firefox: /home/njn/moz/mi1/db/sqlite3/src/sqlite3.c:17913: sqlite3Malloc: Assertion `((((char*)(p) - (char*)0)&7)==0)' failed.

The problem was that a 4-byte allocation was only 4-aligned.  I guess that hasn't showed up before because we always added the 8 bytes at the start, and a 12-byte request is rounded up to 16 bytes and that is presumably always 8-aligned.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> Did you test this patch?  When I applied it I got this:
> 
>   firefox: /home/njn/moz/mi1/db/sqlite3/src/sqlite3.c:17913: sqlite3Malloc:
> Assertion `((((char*)(p) - (char*)0)&7)==0)' failed.
> 
> The problem was that a 4-byte allocation was only 4-aligned.  I guess that
> hasn't showed up before because we always added the 8 bytes at the start,
> and a 12-byte request is rounded up to 16 bytes and that is presumably
> always 8-aligned.

Yep, I built and ran fine on Windows (http://tbpl.allizom.org/?usebuildbot=1&tree=Try&rev=ebda3a0ff95e).

Maybe jemalloc's alignment behavior is different on different platforms?  That seems illogical.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> Did you test this patch?  When I applied it I got this:
> 
>   firefox: /home/njn/moz/mi1/db/sqlite3/src/sqlite3.c:17913: sqlite3Malloc:
> Assertion `((((char*)(p) - (char*)0)&7)==0)' failed.
> 
> The problem was that a 4-byte allocation was only 4-aligned.  I guess that
> hasn't showed up before because we always added the 8 bytes at the start,
> and a 12-byte request is rounded up to 16 bytes and that is presumably
> always 8-aligned.

I can't reproduce this, but it should be easy enough to fix, just make '8' the minimum size returned from jemalloc_rounding_calculator.
Ok, I finally got DMD and my annotated copy of Firefox in a good enough shape to test this.  Here's a sample report prior to the patch:

==20837== Unreported: 34,816 (cumulative: 11,474,380) bytes in 17 heap block(s) in record 107 of 14896:
==20837==  Requested bytes unreported: 136 / 34,952
==20837==  Slop      bytes unreported: 34,680 / 34,680
==20837==    at 0x4029F90: malloc (vg_replace_malloc.c:235)
==20837==    by 0x40AC58A: sqlite3MemMalloc (sqlite3.c:14291)
==20837==    by 0x40ACFA8: mallocWithAlarm (sqlite3.c:17841)
==20837==    by 0x40AD0A5: sqlite3Malloc (sqlite3.c:17875)
==20837==    by 0x40AD0F3: sqlite3_malloc (sqlite3.c:17893)
==20837==    by 0x40B822D: pcache1ResizeHash (sqlite3.c:33833)
==20837==    by 0x40B89AC: pcache1Fetch (sqlite3.c:34175)
==20837==    by 0x40B7526: sqlite3PcacheFetch (sqlite3.c:33093)

This report is for a case where we want to allocate 2048 bytes, but add 8 for the size and so end up allocating 4096.  There are 17 such blocks, so our memory usage is:

- useful:     17 * 2048 = 34,816 bytes
- accounting: 17 * 8    = 136    bytes
- slop:       17 * 2040 = 34,680 bytes

With the patch applied, there was no such record -- DMD doesn't produce records for allocation contexts that have been completely reported.  In fact, there weren't any SQLite records produced by DMD.

So, this patch has two effects:

- It fixes the minor clownshoes cases (of size 1024, 2048, 32768) from bug 676189 comment 15, saving some memory usage by avoiding slop.  (A tiny bit more memory is also saved due to not having to store an 8-byte size per heap block.)

- It gets rid of all SQLite dark matter -- even in the 33016 case, because that size is pre-emptively rounded up using xRoundup() to 36864, and so SQLite's own measurements include that slop.  But it's still slop, and the most significant part of the slop.

So, the patch works as intended, on my Linux64 machine at least.  I'll add some review comments for it shortly.
Comment on attachment 553259 [details] [diff] [review]
Patch

Review of attachment 553259 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageService.cpp
@@ +318,5 @@
>  sqlite3_vfs* ConstructTelemetryVFS();
> +
> +namespace {
> +
> +int jemalloc_rounding_calculator(int size)

This needs a very detailed comment.  Things to cover:

- Why we're using jemalloc (to avoid some clownshoes).

- Why we need a roundup function (because SQLite requires it, icky though it is).

- How you derived this function (from the comment at the top of jemalloc.c that describes its size classes).

- Consequences if this function ever underestimates how much rounding jemalloc does (no correctness problems, but some wasted memory, because jemalloc will round-up beyond what this function thinks it will).

- Consequences if this function ever overestimates how much rounding jemalloc does (no correctness problems, but some wasted memory, because the excessive rounding-up will be done before the request is made of jemalloc).

- What happens on platforms where jemalloc isn't defined (SQLite uses its own malloc/free wrappers, which add 8 bytes of size info and so can cause clownshoes).

- Anything else important I haven't thought of.

- Why we round up requests smaller than 8 to 8, even though jemalloc can do 2-byte and 4-byte allocations (because SQLite requires 8-byte alignment).

Also, what happens when our copy of jemalloc gets updated -- can we do anything to ensure this function will be updated along with it if necessary?  I'd be much happier if you can think of way to provide such a guarantee.

@@ +325,5 @@
> +  if (size == 0) {
> +    return 1;
> +  } else if (size <= 32) {
> +    // Round up to a power of two
> +    return pow(2, ceil(log((float)size) / log(2.0)));

Thinking some more, I really prefer my version which doesn't require math.h.  Can you change this one accordingly?

I also like that mine has inline comments showing the size classes (and their jemalloc names).

@@ +375,5 @@
> +const sqlite3_mem_methods allocator = {
> +  &sqlite_malloc_wrapper,
> +  &moz_free,
> +  &sqlite_realloc_wrapper,
> +  &sqlite_malloc_usable_size_wrapper,

Are the three wrappers necessary because the sqlite signatures use |int| instead of |size_t|?

@@ +378,5 @@
> +  &sqlite_realloc_wrapper,
> +  &sqlite_malloc_usable_size_wrapper,
> +  &jemalloc_rounding_calculator,
> +  &sqlite_malloc_init,
> +  &sqlite_malloc_uninit,

sqlite_malloc_shutdown would appear to be the idiomatic name for this.
Attachment #553259 - Flags: feedback?(nnethercote) → feedback+
Blocks: DarkMatter
Depends on: DMD
(In reply to Nicholas Nethercote [:njn] from comment #9)
> So, this patch has two effects:
> 
> - It fixes the minor clownshoes cases (of size 1024, 2048, 32768) from bug
> 676189 comment 15, saving some memory usage by avoiding slop.  (A tiny bit
> more memory is also saved due to not having to store an 8-byte size per heap
> block.)
> 
> - It gets rid of all SQLite dark matter -- even in the 33016 case, because
> that size is pre-emptively rounded up using xRoundup() to 36864, and so
> SQLite's own measurements include that slop.  But it's still slop, and the
> most significant part of the slop.

Well, the interesting question here is whether we've moved the waste from the allocator level to the sqlite level, or whether sqlite will make use of that extra space now that it knows about it.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #11)
> (In reply to Nicholas Nethercote [:njn] from comment #9)
>
> Well, the interesting question here is whether we've moved the waste from
> the allocator level to the sqlite level, or whether sqlite will make use of
> that extra space now that it knows about it.

SQLite does, in fact, try to make use of extra space that the memory allocator gives it.  How successful it is in using that extra space depends on context, of course.

For example, when SQLite is preparing an SQL statement, it first allocates an array to hold the opcodes as they are generated.  This uses the classic trick of starting with a small allocation, then doubling the size of the allocation when the array needs to grow.  After all opcodes have been generated, any left-over space at the end of the opcode array is reused for other auxiliary data structures needed by the prepared statement, thus avoiding extra mallocs.

Or, whenever SQLite needs to allocate a hash table, it requests a certain size, but after the allocation completes, SQLite asks how big the allocation is really, and uses the entire allocation for its hash table rather than just the amount of space requested.

If SQLite needs to store at 1100 byte string or blob, it requests 1100 bytes of memory from the memory allocator.  If it gets back 2048 bytes, it remembers that fact, but it still only uses the first 1100 bytes because that is the size of the string/blob it needs to store.  There isn't really anything we can do with those extra 948 bytes at the end.  If the size of the string grows to 1300 bytes, SQLite knows that the allocation is still big enough to hold it and so avoids a call to realloc() - but that realloc() was probably going to be a no-op anyhow so it is not clear how much work this saves.

The biggest user of memory in SQLite is that page cache.  Each page cache allocation is for a number of bytes that are the page size of the underlying database file (always a power of two) plus a header of about 100 bytes or so.  If your memory allocator rounds that up to the next power of two, the extra bytes are unused.  So in this case, nothing is saved.  And since this case is the largest user of memory in SQLite by far, I fear that your efforts here might be for naught.

We (the SQLite team) are currently working on a change to the page cache that will cause it to allocate multiple pages at once, with the multiplier chosen so that the total memory allocation size is close to but not exceeding some power of two.  This might help reduce unused space.  If the changes work out, we'll check them into an experimental branch so that y'all can test them.  Whom do we notify when we get these changes working?  Will it suffice to add a comment to this ticket?
(In reply to D. Richard Hipp from comment #12)
> The biggest user of memory in SQLite is that page cache.  Each page cache
> allocation is for a number of bytes that are the page size of the underlying
> database file (always a power of two) plus a header of about 100 bytes or
> so.  If your memory allocator rounds that up to the next power of two, the
> extra bytes are unused.  So in this case, nothing is saved.  And since this
> case is the largest user of memory in SQLite by far, I fear that your
> efforts here might be for naught.
> 
> We (the SQLite team) are currently working on a change to the page cache
> that will cause it to allocate multiple pages at once, with the multiplier
> chosen so that the total memory allocation size is close to but not
> exceeding some power of two.  This might help reduce unused space.  If the
> changes work out, we'll check them into an experimental branch so that y'all
> can test them.  Whom do we notify when we get these changes working?  Will
> it suffice to add a comment to this ticket?

Yes, that will be sufficient.

Thanks for your explanation of the status here.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #13)
> (In reply to D. Richard Hipp from comment #12)
> > The biggest user of memory in SQLite is that page cache.  Each page cache
> > allocation is for a number of bytes that are the page size of the underlying
> > database file (always a power of two) plus a header of about 100 bytes or
> > so.  If your memory allocator rounds that up to the next power of two, the
> > extra bytes are unused.  So in this case, nothing is saved.  And since this
> > case is the largest user of memory in SQLite by far, I fear that your
> > efforts here might be for naught.
> > 
> > We (the SQLite team) are currently working on a change to the page cache
> > that will cause it to allocate multiple pages at once, with the multiplier
> > chosen so that the total memory allocation size is close to but not
> > exceeding some power of two.  This might help reduce unused space.  If the
> > changes work out, we'll check them into an experimental branch so that y'all
> > can test them.  Whom do we notify when we get these changes working?  Will
> > it suffice to add a comment to this ticket?
> 
> Yes, that will be sufficient.
> 
> Thanks for your explanation of the status here.

Please see 

    http://www.sqlite.org/sqlite3-63597097ee.tar.gz

That tarball contains sqlite3.c and sqlite3.h files with a modified page cache which (we think) will do a much better job of avoiding wasted space on jemalloc allocations.  In order to take advantage of the new page cache changes, you'll need to compile with

    -DSQLITE_PAGECACHE_BLOCKALLOC

Without that macro, the behavior is unchanged from the SQLite that you currently have.

If you are able to test out this new SQLite pager and let us know if it help, that would be great.  If you tell us that this change does reduce memory usage in FF, we'll fold it into the trunk.
njn, can you test that build with your tools and report on its performance?
Yes, I'll take a look.  Thanks, Richard!
A drive by nit in your patch, Richard:

** This size (pgsz+200) bytes is not allocated efficiently by some
** implementations of malloc. In particular, some implementations are only
** able to allocate blocks of memory chunks of 2^N bytes, where N is some
** integer value.

That's not quite right.  jemalloc, like most allocators, is able to allocate lots of non-2^N sizes.  The particular problem case is that it rounds 32KB+200  to 36KB.  You'll probably get some waste with just about any allocator for 32KB+200.
With both khuey and Richard's changes applied, I see zero slop bytes in all of SQLite :)  Great work!

I've attached the top 10 records produced by DMD for one Firefox run, in case it's of interest.  The pagecache allocations dominate, unsurprisingly.

Since this bug is for using sqlite's pluggable allocator, I've filed bug 681184 for importing the new SQLite version once it's officially ready.
BTW, I've confirmed that there is zero slop with the two patches applied.  I haven't confirmed that we're actually using less memory.  That's difficult because the SQLite numbers are rather non-deterministic.

I'm happy to take the improvement on faith, since it seems very unlikely we'd get rid of the slop but then use the requested memory in such a way that we end up not seeing a reduction.  Does that make sense?
Are we making 32k or 36k allocations for the page cache?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> Are we making 32k or 36k allocations for the page cache?

==3969== Unreported: 9,912,320 (cumulative: 23,175,248) bytes in 20 heap block(s) in record 2 of 16370:
==3969==  Requested bytes unreported: 9,912,320 / 9,912,320
==3969==  Slop      bytes unreported: 0 / 0
==3969==    at 0x4029F90: malloc (vg_replace_malloc.c:235)
==3969==    by 0x403B090: moz_malloc (mozalloc.cpp:120)
==3969==    by 0x746FD67: mozilla::storage::(anonymous namespace)::sqlite_malloc_wrapper(int) (mozStorageService.cpp:353)
==3969==    by 0x40ADAFA: mallocWithAlarm (sqlite3.c:18349)
==3969==    by 0x40ADB95: sqlite3Malloc (sqlite3.c:18382)
==3969==    by 0x40B9506: pcache1AllocPage (sqlite3.c:35902)
==3969==    by 0x40BA2AC: pcache1Fetch (sqlite3.c:36459)
==3969==    by 0x40B8716: sqlite3PcacheFetch (sqlite3.c:35101)

9912320 / 20 is... 495616, which is 484KiB.  Interesting.
> BTW, I've confirmed that there is zero slop with the two patches applied.

Oh, and with khuey's patch applied we'll always get zero slop reported by DMD, because xRoundup is always called before xAlloc.  It's possible that the rounded-up space won't actually be used.


> 9912320 / 20 is... 495616, which is 484KiB.  Interesting.

Reading the code, that's how the new option works -- it allocates 15 page cache blocks at once:  15 * 33024 = 495360, plus 48 bytes for a header, which gets rounded up to 495616.  In other words, the allocation quantum is made much bigger, which minimizes slop at the cost of possibly allocating more than we'll ever need.  (That was possible before, but the max over-allocation was 32KiB, now it's 484KiB.)  Hmm.

Here's a sample about:memory after starting up, with khuey's patch from this bug applied, but with SQLITE_PAGECACHE_BLOCKALLOC disabled:

├───3,369,256 B (08.10%) -- storage
│   └──3,369,256 B (08.10%) -- sqlite
│      ├────960,960 B (02.31%) -- places.sqlite
│      │    ├──830,056 B (02.00%) -- cache-used [4]
│      │    ├───84,216 B (00.20%) -- schema-used [4]
│      │    └───46,688 B (00.11%) -- stmt-used [4]
│      ├────786,144 B (01.89%) -- other
│      ├────527,912 B (01.27%) -- extensions.sqlite
│      │    ├──363,952 B (00.87%) -- cache-used
│      │    ├──151,872 B (00.37%) -- stmt-used
│      │    └───12,088 B (00.03%) -- schema-used
│      ├────447,136 B (01.07%) -- chromeappsstore.sqlite
│      │    ├──331,712 B (00.80%) -- cache-used
│      │    ├──108,832 B (00.26%) -- stmt-used
│      │    └────6,592 B (00.02%) -- schema-used
│      ├────315,072 B (00.76%) -- webappsstore.sqlite
│      │    ├──199,648 B (00.48%) -- cache-used
│      │    ├──108,832 B (00.26%) -- stmt-used
│      │    └────6,592 B (00.02%) -- schema-used
│      ├────151,000 B (00.36%) -- content-prefs.sqlite
│      │    ├──132,840 B (00.32%) -- cache-used
│      │    ├───14,000 B (00.03%) -- stmt-used
│      │    └────4,160 B (00.01%) -- schema-used
│      ├────111,504 B (00.27%) -- permissions.sqlite
│      │    ├───99,824 B (00.24%) -- cache-used
│      │    ├────9,568 B (00.02%) -- stmt-used
│      │    └────2,112 B (00.01%) -- schema-used
│      └─────69,528 B (00.17%) -- formhistory.sqlite
│            ├──66,808 B (00.16%) -- cache-used
│            ├───2,720 B (00.01%) -- schema-used
│            └───────0 B (00.00%) -- stmt-used

And here's the same output after enabling SQLITE_PAGECACHE_BLOCKALLOC:

├───8,512,168 B (19.05%) -- storage
│   └──8,512,168 B (19.05%) -- sqlite
│      ├──6,219,680 B (13.92%) -- other
│      ├────960,784 B (02.15%) -- places.sqlite
│      │    ├──830,056 B (01.86%) -- cache-used [4]
│      │    ├───84,328 B (00.19%) -- schema-used [4]
│      │    └───46,400 B (00.10%) -- stmt-used [4]
│      ├────445,424 B (01.00%) -- chromeappsstore.sqlite
│      │    ├──331,712 B (00.74%) -- cache-used
│      │    ├──107,072 B (00.24%) -- stmt-used
│      │    └────6,640 B (00.01%) -- schema-used
│      ├────313,360 B (00.70%) -- webappsstore.sqlite
│      │    ├──199,648 B (00.45%) -- cache-used
│      │    ├──107,072 B (00.24%) -- stmt-used
│      │    └────6,640 B (00.01%) -- schema-used
│      ├────241,176 B (00.54%) -- urlclassifier3.sqlite
│      │    ├──136,984 B (00.31%) -- stmt-used
│      │    ├───99,824 B (00.22%) -- cache-used
│      │    └────4,368 B (00.01%) -- schema-used
│      ├────150,712 B (00.34%) -- content-prefs.sqlite
│      │    ├──132,840 B (00.30%) -- cache-used
│      │    ├───13,712 B (00.03%) -- stmt-used
│      │    └────4,160 B (00.01%) -- schema-used
│      ├────111,504 B (00.25%) -- permissions.sqlite
│      │    ├───99,824 B (00.22%) -- cache-used
│      │    ├────9,568 B (00.02%) -- stmt-used
│      │    └────2,112 B (00.00%) -- schema-used
│      └─────69,528 B (00.16%) -- formhistory.sqlite
│            ├──66,808 B (00.15%) -- cache-used
│            ├───2,720 B (00.01%) -- schema-used
│            └───────0 B (00.00%) -- stmt-used

Ouch.  That's a 5MB increase in SQLite memory usage, because we have multiple DBs.  (BTW, looks like those 484KiB allocations are mostly being charged to "storage/sqlite/other".)  I officially revoke my "on faith" remark from comment 19... I don't think this fix is going to work :(
Attached patch patch, v2 (obsolete) — Splinter Review
I was bored and decided to steal this one.  sdwilsh, I've tried to follow the storage code style guidelines.
Assignee: khuey → nnethercote
Attachment #553259 - Attachment is obsolete: true
Attachment #555646 - Flags: review?(khuey)
Attachment #555646 - Flags: feedback?(sdwilsh)
Comment on attachment 555646 [details] [diff] [review]
patch, v2

I wonder if we should move the roundup stuff into jemalloc?

Also, sdwilsh is the peer here, so rearranging r/f.
Attachment #555646 - Flags: review?(sdwilsh)
Attachment #555646 - Flags: review?(khuey)
Attachment #555646 - Flags: feedback?(sdwilsh)
Attachment #555646 - Flags: feedback?(khuey)
Attached patch patch, v3 (obsolete) — Splinter Review
This version moves the jemalloc roundup function into jemalloc.c, which is 100x less sleazy.

Asking review from khuey for the jemalloc and build system changes, and sdwilsh for the storage changes.
Attachment #555646 - Attachment is obsolete: true
Attachment #555646 - Flags: review?(sdwilsh)
Attachment #555646 - Flags: feedback?(khuey)
Attachment #555914 - Flags: review?(khuey)
Attachment #555914 - Flags: review?(sdwilsh)
Comment on attachment 555914 [details] [diff] [review]
patch, v3

>diff --git a/build/unix/gnu-ld-scripts/jemalloc-standalone-linkage-version-script b/build/unix/gnu-ld-scripts/jemalloc-standalone-linkage-version-script
>--- a/build/unix/gnu-ld-scripts/jemalloc-standalone-linkage-version-script
>+++ b/build/unix/gnu-ld-scripts/jemalloc-standalone-linkage-version-script
>+		je_malloc_usable_size_in_advance;

You need a similar change to http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.def to make this work on Windows.


>diff --git a/memory/jemalloc/jemalloc.c b/memory/jemalloc/jemalloc.c
>--- a/memory/jemalloc/jemalloc.c
>+++ b/memory/jemalloc/jemalloc.c
>@@ -6204,16 +6204,51 @@ free(void *ptr)
> 
> /*
>  * End malloc(3)-compatible functions.
>  */
> /******************************************************************************/
> /*
>  * Begin non-standard functions.
>  */
>+static inline int
>+roundup_2N(size_t n, size_t a)
>+{
>+	return (n + a - 1) & ~(a - 1);
>+}
>+
>+/* This was added for use by SQLite. */
>+size_t
>+je_malloc_usable_size_in_advance(size_t n)
>+{
>+	/* Small/Tiny: 2, 4, 8 bytes */
>+	if (n <= 2)
>+		return 2;
>+	if (n <= 4)
>+		return 4;
>+	if (n <= 8)
>+		return 8;
>+
>+	/* Small/Quantum-spaced: 16, 32, 48, ..., 480, 496, 512 bytes */
>+	if (n <= 512)
>+		return roundup_2N(n, 16);
>+
>+	/* Small/Sub-page: 1, 2 kB */
>+	if (n <= 2*1024)
>+		return roundup_2N(n, 1024);
>+
>+	/* Large: 4, 8, 12, ..., 1012, 1016, 1020 kB */
>+	if (n <= 1024*1024)
>+		return roundup_2N(n, 4*1024);
>+
>+	/* Huge: 1, 2, 3, ... MB */
>+	return roundup_2N(n, 1024*1024);
>+}
>+
>+
> #ifdef MOZ_MEMORY_ANDROID
> size_t
> malloc_usable_size(void *ptr)
> #else
> size_t
> malloc_usable_size(const void *ptr)
> #endif
> {

Are tabs really the prevailing style here?  /me cries.

>diff --git a/storage/src/Makefile.in b/storage/src/Makefile.in
>--- a/storage/src/Makefile.in
>+++ b/storage/src/Makefile.in
>+# TODO: we do this in crashreporter and xpcom/base too, should be centralized

Yeah, we really should do that.

>diff --git a/storage/src/mozStorageService.cpp b/storage/src/mozStorageService.cpp
>--- a/storage/src/mozStorageService.cpp
>+++ b/storage/src/mozStorageService.cpp
>@@ -309,24 +309,114 @@ Service::~Service()
>- 
>+
>+#ifdef MOZ_MEMORY
>+
>+#  if defined(XP_WIN) || defined(SOLARIS) || defined(ANDROID) || defined(XP_MACOSX)
>+#    include "jemalloc.h"
>+#  elif defined(XP_LINUX)
>+// jemalloc is directly linked into firefox-bin; libxul doesn't link
>+// with it.  So if we tried to use jemalloc_stats directly here, it
>+// wouldn't be defined.  Instead, we don't include the jemalloc header
>+// and weakly link against jemalloc_stats.

You copied and pasted the comment a bit too hard.

>+namespace {
>+
>+// By default, SQLite tracks the size of all its heap blocks by adding an extra
>+// 8 bytes at the start of the block to hold the size.  Unfortunately, this
>+// causes a lot of 2^N-sized allocations to be rounded up by jemalloc
>+// allocator, wasting memory.  For example, a request for 1024 bytes has 8
>+// bytes added, becoming a request for 1032 bytes, and jemalloc rounds this up
>+// to 2048 bytes, wasting 1012 bytes.  (See bug 676189 for more details.)
>+//
>+// So we register jemalloc as the malloc implementation, which avoids this
>+// 8-byte overhead, and thus a lot of waste.  This requires us to provide a
>+// function, sqliteMemRoundup(), which computes the actual size that will be
>+// allocated for a given request.  SQLite uses this function before all
>+// allocations, and may be able to use any excess bytes caused by the rounding.
>+//
>+// Nb: the wrappers for moz_malloc, moz_realloc and moz_malloc_usable_size are

Total nit, 'NB' prevails in Mozilla code.

>+// necessary because the sqlite_mem_methods type signatures differ slightly
>+// from the standard ones -- they use int instead of size_t.  But we don't need
>+// a wrapper for moz_free.
>+
>+void* sqliteMemMalloc(int n)
>+{
Attachment #555914 - Flags: review?(khuey) → review+
Ugh, I'm getting some seg faults in $OBJ/netwerk/test.  I can reproduce them on my Linux64 box:

  [bayou:~/moz/mi9/d64/netwerk/test] make check
  Running STS Parser Tests tests...
  Segmentation fault (core dumped)
  make: *** [check] Error 139

or if I run the individual test by hand:

  [bayou:~/moz/mi9/d64/netwerk/test] XPCOM_DEBUG_BREAK=stack-and-abort /home/njn/
  moz/mi9/d64/dist/bin/run-mozilla.sh ../../dist/bin/TestSTSParser
  Running STS Parser Tests tests...
  Segmentation fault (core dumped)

The stack trace at crash time is this:

#2  0x0000000000000000 in ?? ()
#3  0x00007ffff65d8eba in mozilla::storage::(anonymous namespace)::sqliteMemRoundup (n=<value optimised out>)
    at /home/njn/moz/mi9/storage/src/mozStorageService.cpp:370
#4  0x00007ffff398e6eb in mallocWithAlarm (n=64, pp=0x7fffffffcfb8)
    at /home/njn/moz/mi9/db/sqlite3/src/sqlite3.c:17789

So, a similar linking problem to what I was seeing earlier, before I added the |NS_VISIBILITY_DEFAULT __attribute__((weak))| magic in mozStorageService.cpp.
Sigh.
I'll try to get to this review tomorrow, but I can't make any promises because there's a lot to check.  Should be done within a week if I don't get it done by the end of the day tomorrow.
> You need a similar change to
> http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.def
> to make this work on Windows.

Even with that change I still get build errors, but only in opt builds:

   Creating library xul.lib and object xul.exp
mozStorageService.obj : error LNK2001: unresolved external symbol _je_malloc_usable_size_in_advance
xul.dll : fatal error LNK1120: 1 unresolved externals


http://tbpl.allizom.org/php/getParsedLog.php?id=6134686&full=1
(In reply to Nicholas Nethercote [:njn] from comment #27)
> 
>   [bayou:~/moz/mi9/d64/netwerk/test] XPCOM_DEBUG_BREAK=stack-and-abort
> /home/njn/
>   moz/mi9/d64/dist/bin/run-mozilla.sh ../../dist/bin/TestSTSParser
>   Running STS Parser Tests tests...
>   Segmentation fault (core dumped)

I tried calling jemalloc_stats from sqliteMemRoundup, and it crashes too.  So it appears this is a pre-existing problem with this kind of linking, but we didn't have tests that exercised jemalloc_stats in this way.

After talking with khuey, the problem seems to be that on Linux, jemalloc is linked into firefox-bin, not libxul.so, but TestSTSParser only gets to the libxul.so, so je_malloc_usable_size_in_advance isn't defined.  Comparing it to zero avoids the problem, amusingly enough.
I thought a better alternative would be to define moz_malloc_usable_size_in_advance in mozalloc.{cpp,h}, but I can't get that to work either.  My attempted patch is attached, I get this linking error on Linux:

make -C memory libs
make[5]: Entering directory `/home/njn/moz/mi9/d64/memory'
make -C jemalloc libs
make[6]: Entering directory `/home/njn/moz/mi9/d64/memory/jemalloc'
/home/njn/moz/mi9/d64/config/nsinstall -R -m 644 libjemalloc.a libjemalloc.a.desc ../../dist/lib
make[6]: Leaving directory `/home/njn/moz/mi9/d64/memory/jemalloc'
make -C mozalloc libs
make[6]: Entering directory `/home/njn/moz/mi9/d64/memory/mozalloc'
/home/njn/moz/mi9/d64/config/nsinstall -D ../../dist/sdk/lib
mozalloc.cpp
c++ -o mozalloc.o -c   -DOSTYPE=\"Linux2.6\" -DOSARCH=Linux -I../../xpcom -I/home/njn/moz/mi9/memory/mo
zalloc -I. -I../../dist/include -I../../dist/include/nsprpub  -I/home/njn/moz/mi9/d64/dist/include/nspr
 -I/home/njn/moz/mi9/d64/dist/include/nss       -fPIC  -fno-rtti -fno-exceptions -Wall -Wpointer-arith
-Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-off
setof -Wno-variadic-macros -Werror=return-type -pedantic -Wno-long-long -fno-strict-aliasing -std=gnu++
0x -pthread -ffunction-sections -fdata-sections -pipe  -DDEBUG -D_DEBUG -DTRACING -g -fno-omit-frame-po
inter -O   -DMOZILLA_CLIENT -include ../../mozilla-config.h -MD -MF .deps/mozalloc.pp /home/njn/moz/mi9
/memory/mozalloc/mozalloc.cpp
rm -f libmozalloc.so
/usr/bin/python2.7 /home/njn/moz/mi9/config/pythonpath.py -I../../config /home/njn/moz/mi9/config/expan
dlibs_exec.py --uselist --  c++  -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -
Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-ma
cros -Werror=return-type -pedantic -Wno-long-long -fno-strict-aliasing -std=gnu++0x -pthread -ffunction
-sections -fdata-sections -pipe  -DDEBUG -D_DEBUG -DTRACING -g -fno-omit-frame-pointer -O -fPIC -shared
 -Wl,-z,defs -Wl,--gc-sections -Wl,-h,libmozalloc.so -o libmozalloc.so  mozalloc.o mozalloc_abort.o moz
alloc_oom.o    -lpthread  -Wl,--icf=safe   -Wl,-rpath-link,/home/njn/moz/mi9/d64/dist/bin -Wl,-rpath-li
nk,/usr/local/lib    -ldl
/usr/bin/ld.gold.real: /home/njn/moz/mi9/d64/memory/mozalloc/mozalloc.o: in function moz_malloc_usable_
size_in_advance:/home/njn/moz/mi9/memory/mozalloc/mozalloc.cpp:282: error: undefined reference to 'je_m
alloc_usable_size_in_advance'
collect2: ld returned 1 exit status


At this point, I'm really tempted to put je_malloc_usable_size_in_advance back into mozStorageService.cpp.
You need to define je_malloc_usable_size_in_advance weak as jemalloc_stats in xpcom/base/nsMemoryReporterManager.cpp. It would probably be better to start doing that in jemalloc.h, though.

Note that jemalloc-standalone-linkage-version-script will be going away soon.
Depends on: 680440
Comment on attachment 555914 [details] [diff] [review]
patch, v3

Review of attachment 555914 [details] [diff] [review]:
-----------------------------------------------------------------

r=sdwilsh with comments addressed.

::: memory/jemalloc/jemalloc.c
@@ +6216,5 @@
> +}
> +
> +/* This was added for use by SQLite. */
> +size_t
> +je_malloc_usable_size_in_advance(size_t n)

I feel that this method is critically important to be correct, but I see zero tests added to make sure this behaves correctly.  I know writing native code tests sucks a bit, but can you please add some here?  I don't need to review that (but you should get someone else to do so).

::: storage/src/mozStorageService.cpp
@@ +391,5 @@
> +  &sqliteMemSize,
> +  &sqliteMemRoundup,
> +  &sqliteMemInit,
> +  &sqliteMemShutdown,
> +  nsnull

nit: we always use NULL when talking with SQLite.
Attachment #555914 - Flags: review?(sdwilsh) → review+
Also forgot to mention: global-nit: all global methods should be prefixed with `::` to indicate they are in the global namespace (`::moz_malloc`, for instance).
(In reply to Shawn Wilsher :sdwilsh from comment #33)
> > +/* This was added for use by SQLite. */
> > +size_t
> > +je_malloc_usable_size_in_advance(size_t n)
> 
> I feel that this method is critically important to be correct

It's not.  My patch v2 had a big explanation of what happened if it either over- or under-estimated the size, and the conclusion is that at worst some space could be wasted.  (I removed that comment when I moved the function into jemalloc because it didn't make sense to have it within jemalloc.)

But I can add some tests.
I think you might want to use jemalloc's constants here instead of hardcoding the values.  Some constants have different values on different platforms.  For example,

 #ifdef MOZ_MEMORY_WINDOWS
 #define        TINY_MIN_2POW           1
 #else
 #define TINY_MIN_2POW           (sizeof(void*) == 8 ? 3 : 2)
 #endif

2^TINY_MIN_2POW is the minimum allocation size.
Whiteboard: [MemShrink:P2]
Attached patch patch, v4Splinter Review
This patch adds a C++ unit test for je_malloc_usable_size_in_advance().  It also rewrites that function to avoid magic numbers.  jlebar, I just need you to review those changes, the rest has been r+'d by khuey and sdwilsh.  Thanks!
Attachment #555914 - Attachment is obsolete: true
Attachment #556442 - Attachment is obsolete: true
Attachment #571547 - Flags: review?(justin.lebar+bug)
The problem in comment 27 has gone away, I'm guessing due to recent changes in how jemalloc is built on Linux.

The build problem on Windows opt builds from comment 29 is still present.  (It's not a problem on Windows debug, sigh.)  Here's the full command and output:


d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/try-w32/build/config/pythonpath.py -I../../config /e/builds/moz2_slave/try-w32/build/config/expandlibs_exec.py --uselist -- link -NOLOGO -DLL -OUT:xul.dll -PDB:xul.pdb -SUBSYSTEM:WINDOWS  dlldeps-xul.obj nsStaticXULComponents.obj nsDllMain.obj nsGFXDeps.obj dlldeps-zlib.obj nsUnicharUtils.obj nsBidiUtils.obj nsRDFResource.obj   ./module.res -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH  -DEBUG -DEBUGTYPE:CV -DEBUG -OPT:REF   -LIBPATH:../../dist/lib -NODEFAULTLIB:msvcrt -NODEFAULTLIB:msvcrtd -NODEFAULTLIB:msvcprt -NODEFAULTLIB:msvcprtd -DEFAULTLIB:mozcrt  ../../toolkit/xre/xulapp_s.lib  ../../staticlib/components/necko.lib ../../staticlib/components/uconv.lib ../../staticlib/components/i18n.lib ../../staticlib/components/chardet.lib ../../staticlib/components/jar50.lib ../../staticlib/components/startupcache.lib ../../staticlib/components/pref.lib ../../staticlib/components/htmlpars.lib ../../staticlib/components/imglib2.lib ../../staticlib/components/gkgfx.lib ../../staticlib/components/gklayout.lib ../../staticlib/components/docshell.lib ../../staticlib/components/embedcomponents.lib ../../staticlib/components/webbrwsr.lib ../../staticlib/components/nsappshell.lib ../../staticlib/components/txmgr.lib ../../staticlib/components/commandlines.lib ../../staticlib/components/toolkitcomps.lib ../../staticlib/components/pipboot.lib ../../staticlib/components/pipnss.lib ../../staticlib/components/appcomps.lib ../../staticlib/components/jsreflect.lib ../../staticlib/components/composer.lib ../../staticlib/components/jetpack_s.lib ../../staticlib/components/telemetry.lib ../../staticlib/components/jsdebugger.lib ../../staticlib/components/storagecomps.lib ../../staticlib/components/jsctypes.lib ../../staticlib/components/jsperf.lib ../../staticlib/components/gkplugin.lib ../../staticlib/components/windowsproxy.lib ../../staticlib/components/jsd.lib ../../staticlib/components/autoconfig.lib ../../staticlib/components/auth.lib ../../staticlib/components/cookie.lib ../../staticlib/components/permissions.lib ../../staticlib/components/universalchardet.lib ../../staticlib/components/rdf.lib ../../staticlib/components/windowds.lib ../../staticlib/components/places.lib ../../staticlib/components/tkautocomplete.lib ../../staticlib/components/satchel.lib ../../staticlib/components/pippki.lib ../../staticlib/components/imgicon.lib ../../staticlib/components/gkwidget.lib ../../staticlib/components/accessibility.lib ../../staticlib/components/spellchecker.lib ../../staticlib/components/zipwriter.lib ../../staticlib/components/services-crypto.lib ../../staticlib/jsipc_s.lib ../../staticlib/domipc_s.lib ../../staticlib/domplugins_s.lib ../../staticlib/mozipc_s.lib ../../staticlib/mozipdlgen_s.lib ../../staticlib/ipcshell_s.lib ../../staticlib/gfx2d.lib ../../staticlib/gfxipc_s.lib ../../staticlib/hal_s.lib ../../staticlib/xpcom_core.lib ../../staticlib/ucvutil_s.lib ../../staticlib/chromium_s.lib ../../staticlib/mozreg_s.lib ../../staticlib/thebes.lib ../../staticlib/ycbcr.lib ../../staticlib/angle.lib   ../../media/libjpeg/jpeg3250.lib ../../media/libpng/png.lib ../../gfx/qcms/mozqcms.lib e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/mozjs.lib e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/crmf.lib         e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/smime3.lib         e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/ssl3.lib         e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/nss3.lib         e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/nssutil3.lib ../../gfx/cairo/cairo/src/mozcairo.lib  ../../gfx/cairo/libpixman/src/mozlibpixman.lib ../../gfx/harfbuzz/src/mozharfbuzz.lib ../../gfx/ots/src/mozots.lib  ../../dist/lib/mozsqlite3.lib  ../../modules/zlib/src/mozz.lib   e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/nspr4.lib e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/plc4.lib e:/builds/moz2_slave/try-w32/build/obj-firefox/dist/lib/plds4.lib  ../../dist/lib/mozalloc.lib kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib shell32.lib ole32.lib uuid.lib version.lib winspool.lib comdlg32.lib imm32.lib winmm.lib wsock32.lib msimg32.lib shlwapi.lib psapi.lib ws2_32.lib dbghelp.lib wininet.lib  usp10.lib oleaut32.lib   
   Creating library xul.lib and object xul.exp
mozStorageService.obj : error LNK2019: unresolved external symbol _je_malloc_usable_size_in_advance referenced in function "int __cdecl mozilla::storage::`anonymous namespace'::sqliteMemRoundup(int)" (?sqliteMemRoundup@?A0x84a2b40d@storage@mozilla@@YAHH@Z)
xul.dll : fatal error LNK1120: 1 unresolved externals
You need to add the function to memory/mozutils/mozutils.def.in
> + (arena_malloc_small() can be inaccurate because
> +  it's already chosen the bin by the time this fix-up
> +  occurs.

Nit: Close paren.

> +  size_t sizes[] = {

I might take a more brute-force approach here.  These specific numbers are tuned for jemalloc's current settings, so won't necessarily catch bugs if, for example, we halved the quantum size in jemalloc.

You could instead do something like check every size from 0 to 32 * 1024 (something much larger than a page), then check page boundary and page boundary +/- 1 byte for everything up to 32mb.

> +  size_t n = sizeof(sizes) / sizeof(sizes[0]);

Nit: size_t n = PR_ARRAY_SIZE(sizes);
Attachment #571547 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #40)
> Nit: size_t n = PR_ARRAY_SIZE(sizes);

Shouldn't we use ArrayLength() instead of the old macros? (btw, PR_ARRAY_SIZE is only used in nspr)
Heh,

mfbt/Util.h:

template<typename T, size_t N>
size_t
ArrayLength(T (&arr)[N])
{
    return N;
}

Sure, sounds good.
Just to summarize the benefits of this patch: 

- SQlite previously under-reported its memory usage by ~1.10--1.20x.  It now reports it accurately.

- The previously unreported memory was almost entirely slop bytes.  This patch reduces 10--25% of those slop bytes.  

See bug 676189 comment 10 and bug 676189 comment 15 for more details and numbers.
https://hg.mozilla.org/mozilla-central/rev/bc4375104b6f
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Crashes on x86_64-pc-linux-gnu:

#0  0x0000000000000000 in ?? ()

#1  0x00007ffff5d4f956 in mozilla::storage::(anonymous namespace)::sqliteMemRoundup (n=<optimized out>)
    at /var/tmp/mozilla-central/storage/src/mozStorageService.cpp:370
#2  0x00007ffff721e360 in mallocWithAlarm (n=64, pp=0x7fffffff8e68) at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:18270
#3  0x00007ffff721e42f in sqlite3Malloc (n=64) at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:18314
#4  0x00007ffff721e489 in sqlite3MallocZero (n=64) at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:18578
#5  0x00007ffff721e4e6 in pthreadMutexAlloc (iType=<optimized out>) at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:17460
#6  0x00007ffff721377b in sqlite3MutexAlloc (id=<optimized out>) at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:16774
#7  0x00007ffff7228675 in sqlite3_initialize () at /var/tmp/mozilla-central/db/sqlite3/src/sqlite3.c:108071
#8  0x00007ffff5d517ca in mozilla::storage::Service::initialize (this=0x7fffe8f54d60)
    at /var/tmp/mozilla-central/storage/src/mozStorageService.cpp:418
#9  0x00007ffff5d51bb5 in mozilla::storage::Service::getSingleton () at /var/tmp/mozilla-central/storage/src/mozStorageService.cpp:244
#10 0x00007ffff5d4f2cc in mozilla::storage::ServiceConstructor (aOuter=<optimized out>, aIID=..., aResult=0x7fffffff91b0)
   from /var/tmp/mozilla-central/moz-build-dir/dist/bin/libxul.so
#11 0x00007ffff60a02ad in mozilla::GenericFactory::CreateInstance (this=<optimized out>, aOuter=<optimized out>, aIID=<optimized out>, 
    aResult=<optimized out>) at /var/tmp/mozilla-central/moz-build-dir/xpcom/build/GenericFactory.cpp:48
#12 0x00007ffff60eec0b in nsComponentManagerImpl::CreateInstance (this=<optimized out>, aClass=..., aDelegate=0x0, aIID=..., aResult=0x7fffffff91b0)
    at /var/tmp/mozilla-central/xpcom/components/nsComponentManager.cpp:1212
#13 0x00007ffff60f127d in nsComponentManagerImpl::GetService (this=0x7ffff031e340, aClass=..., aIID=..., result=0x7fffffff9230)
    at /var/tmp/mozilla-central/xpcom/components/nsComponentManager.cpp:1505
#14 0x00007ffff5acfebb in nsJSCID::GetService (this=0x7fffe8f54d00, _retval=<optimized out>)
    at /var/tmp/mozilla-central/js/xpconnect/src/XPCJSID.cpp:821
#15 0x00007ffff6114b8d in NS_InvokeByIndex_P (that=<optimized out>, methodIndex=<optimized out>, paramCount=<optimized out>, params=<optimized out>)
    at /var/tmp/mozilla-central/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:195
#16 0x00007ffff5af2954 in CallMethodHelper::Invoke (this=0x7fffffff9490) at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:2907
#17 0x00007ffff5af8446 in CallMethodHelper::Call (this=0x7fffffff9490) at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:2207
#18 0x00007ffff5af8cfe in XPCWrappedNative::CallMethod (ccx=..., mode=<optimized out>)
    at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedNative.cpp:2173
#19 0x00007ffff5affe27 in XPC_WN_CallMethod (cx=0x7ffff719fc00, argc=1, vp=<optimized out>)
    at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1554
#20 0x00007ffff631b7ed in js::CallJSNative (cx=0x7ffff719fc00, native=0x7ffff5affb4e <XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)>, 
    args=...) at /var/tmp/mozilla-central/js/src/jscntxtinlines.h:297
#21 0x00007ffff63383a8 in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=js::NO_CONSTRUCT)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:629
#22 0x00007ffff632cddc in js::Interpret (cx=0x7ffff719fc00, entryFrame=0x7fffe9d006f0, interpMode=js::JSINTERP_NORMAL)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:3948
#23 0x00007ffff6337b4a in js::RunScript (cx=0x7ffff719fc00, script=0x7fffe9c3e1a8, fp=0x7fffe9d006f0)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:584
#24 0x00007ffff633845f in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:647
#25 0x00007ffff62d48ec in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0x7ffff719fc00)
   from /var/tmp/mozilla-central/moz-build-dir/dist/bin/libxul.so
#26 js_fun_call (cx=0x7ffff719fc00, argc=0, vp=0x7fffe9d006c8) at /var/tmp/mozilla-central/js/src/jsfun.cpp:1761
#27 0x00007ffff62d4a15 in js_fun_apply (cx=0x7ffff719fc00, argc=1, vp=0x7fffe9d006c8) from /var/tmp/mozilla-central/moz-build-dir/dist/bin/libxul.so
#28 0x00007ffff631b7ed in js::CallJSNative (cx=0x7ffff719fc00, native=0x7ffff62d4943 <js_fun_apply(JSContext*, unsigned int, JS::Value*)>, args=...)
    at /var/tmp/mozilla-central/js/src/jscntxtinlines.h:297
#29 0x00007ffff63383a8 in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=js::NO_CONSTRUCT)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:629
#30 0x00007ffff632cddc in js::Interpret (cx=0x7ffff719fc00, entryFrame=0x7fffe9d00650, interpMode=js::JSINTERP_NORMAL)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:3948
#31 0x00007ffff6337b4a in js::RunScript (cx=0x7ffff719fc00, script=0x7fffe9c2df28, fp=0x7fffe9d00650)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:584
#32 0x00007ffff633845f in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:647
#33 0x00007ffff6338ce0 in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0x7ffff719fc00) at /var/tmp/mozilla-central/js/src/jsinterp.h:148
#34 js::Invoke (cx=0x7ffff719fc00, thisv=..., fval=..., argc=<optimized out>, argv=<optimized out>, rval=0x7fffffffa610)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:679
#35 0x00007ffff6338e3a in js::InvokeGetterOrSetter (cx=0x7ffff719fc00, obj=<optimized out>, fval=..., argc=0, argv=0x0, rval=0x7fffffffa610)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:716
#36 0x00007ffff63522f4 in js::Shape::get (this=<optimized out>, cx=0x7ffff719fc00, receiver=0x7fffe9c43090, obj=<optimized out>, 
    pobj=0x7fffe9c43090, vp=0x7fffffffa610) at /var/tmp/mozilla-central/js/src/jsscopeinlines.h:279
#37 0x00007ffff63528e1 in js_NativeGetInline (cx=0x7ffff719fc00, receiver=0x7fffe9c43090, obj=0x7fffe9c43090, pobj=0x7fffe9c43090, 
    shape=0x7fffe9cb2d00, getHow=<optimized out>, vp=0x7fffffffa610) at /var/tmp/mozilla-central/js/src/jsobj.cpp:5757
#38 0x00007ffff6357b29 in js_GetPropertyHelperInline (cx=0x7ffff719fc00, obj=0x7fffe9c43090, receiver=0x7fffe9c43090, id=..., getHow=1, 
    vp=0x7fffffffa610) at /var/tmp/mozilla-central/js/src/jsobj.cpp:5937
#39 0x00007ffff6357b78 in js_GetPropertyHelper (cx=<optimized out>, obj=<optimized out>, id=<optimized out>, getHow=<optimized out>, 
    vp=<optimized out>) at /var/tmp/mozilla-central/js/src/jsobj.cpp:5946
#40 0x00007ffff632a9ec in js::Interpret (cx=0x7ffff719fc00, entryFrame=0x7fffe9d003b0, interpMode=js::JSINTERP_NORMAL)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:3484
#41 0x00007ffff6337b4a in js::RunScript (cx=0x7ffff719fc00, script=0x7fffe9000d78, fp=0x7fffe9d003b0)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:584
#42 0x00007ffff633845f in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:647
#43 0x00007ffff62d4b76 in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0x7ffff719fc00)
   from /var/tmp/mozilla-central/moz-build-dir/dist/bin/libxul.so
#44 js_fun_apply (cx=0x7ffff719fc00, argc=<optimized out>, vp=0x7fffe9d00368) at /var/tmp/mozilla-central/js/src/jsfun.cpp:1817
#45 0x00007ffff631b7ed in js::CallJSNative (cx=0x7ffff719fc00, native=0x7ffff62d4943 <js_fun_apply(JSContext*, unsigned int, JS::Value*)>, args=...)
    at /var/tmp/mozilla-central/js/src/jscntxtinlines.h:297
#46 0x00007ffff63383a8 in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=js::NO_CONSTRUCT)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:629
#47 0x00007ffff632cddc in js::Interpret (cx=0x7ffff719fc00, entryFrame=0x7fffe9d00228, interpMode=js::JSINTERP_NORMAL)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:3948
#48 0x00007ffff6337b4a in js::RunScript (cx=0x7ffff719fc00, script=0x7fffe9c95d78, fp=0x7fffe9d00228)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:584
#49 0x00007ffff633845f in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:647
#50 0x00007ffff628d964 in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0x7ffff719fc00) at /var/tmp/mozilla-central/js/src/jsinterp.h:148
#51 array_extra (cx=0x7ffff719fc00, mode=FOREACH, args=...) at /var/tmp/mozilla-central/js/src/jsarray.cpp:3427
#52 0x00007ffff628dd8b in array_forEach (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsarray.cpp:3489
#53 0x00007ffff631b7ed in js::CallJSNative (cx=0x7ffff719fc00, native=0x7ffff628dd66 <array_forEach(JSContext*, uintN, JS::Value*)>, args=...)
    at /var/tmp/mozilla-central/js/src/jscntxtinlines.h:297
#54 0x00007ffff63383a8 in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=js::NO_CONSTRUCT)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:629
#55 0x00007ffff632cddc in js::Interpret (cx=0x7ffff719fc00, entryFrame=0x7fffe9d00048, interpMode=js::JSINTERP_NORMAL)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:3948
#56 0x00007ffff6337b4a in js::RunScript (cx=0x7ffff719fc00, script=0x7fffe9cd4790, fp=0x7fffe9d00048)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:584
#57 0x00007ffff633845f in js::InvokeKernel (cx=0x7ffff719fc00, args=..., construct=<optimized out>)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:647
#58 0x00007ffff6338ce0 in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0x7ffff719fc00) at /var/tmp/mozilla-central/js/src/jsinterp.h:148
#59 js::Invoke (cx=0x7ffff719fc00, thisv=..., fval=..., argc=<optimized out>, argv=<optimized out>, rval=0x7fffffffb640)
    at /var/tmp/mozilla-central/js/src/jsinterp.cpp:679
#60 0x00007ffff6267769 in JS_CallFunctionValue (cx=0x7ffff719fc00, obj=<optimized out>, fval=..., argc=3, argv=0x7fffffffb730, rval=0x7fffffffb640)
    at /var/tmp/mozilla-central/js/src/jsapi.cpp:5168
#61 0x00007ffff5aec6a9 in nsXPCWrappedJSClass::CallMethod (this=<optimized out>, wrapper=<optimized out>, methodIndex=<optimized out>, 
    info=0x7fffebf15b08, nativeParams=0x7fffffffbb60) at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedJSClass.cpp:1533
#62 0x00007ffff5ae3aaf in nsXPCWrappedJS::CallMethod (this=0x7fffe915ad00, methodIndex=3, info=0x7fffebf15b08, params=0x7fffffffbb60)
    at /var/tmp/mozilla-central/js/xpconnect/src/XPCWrappedJS.cpp:553
#63 0x00007ffff61159e2 in PrepareAndDispatch (self=0x7fffe9159e40, methodIndex=<optimized out>, args=<optimized out>, gpregs=0x7fffffffbc40, 
    fpregs=0x7fffffffbc70) at /var/tmp/mozilla-central/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:153
#64 0x00007ffff6114ca3 in SharedStub () from /var/tmp/mozilla-central/moz-build-dir/dist/bin/libxul.so
#65 0x00007ffff4d75cdd in nsXREDirProvider::DoStartup (this=0x7fffffffbe50) at /var/tmp/mozilla-central/toolkit/xre/nsXREDirProvider.cpp:741
#66 0x00007ffff4d6f42e in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>)
    at /var/tmp/mozilla-central/toolkit/xre/nsAppRunner.cpp:3423
#67 0x0000000000401f23 in do_main (argv=0x7fffffffe558, argc=1, 
    exePath=0x7fffffffc400 "/var/tmp/mozilla-central/moz-build-dir/dist/bin/libxpcom.so")
    at /var/tmp/mozilla-central/browser/app/nsBrowserApp.cpp:198
#68 main (argc=<optimized out>, argv=<optimized out>) at /var/tmp/mozilla-central/browser/app/nsBrowserApp.cpp:281

(gdb) up
#1  0x00007ffff5d4f956 in mozilla::storage::(anonymous namespace)::sqliteMemRoundup (n=<optimized out>)
    at /var/tmp/mozilla-central/storage/src/mozStorageService.cpp:370
370       n = je_malloc_usable_size_in_advance(n);
I normally use:
export LDFLAGS="-Wl,-O1,--hash-style=gnu,--as-needed,--no-keep-memory,--gc-sections,--icf=all,--icf-iterations=3"
When I delete --gc-section from the LDFLAGS above, Firefox no longer crashes.
AIUI --gc-sections tells ld to strip out functions in a library which aren't used.  je_malloc_usable_size isn't used within jemalloc (currently), and we build jemalloc as a separate static library, so this could explain it.
Thanks for the diagnosis, Justin.  If --gc-sections is a non-standard build option, is it reasonable to ignore this crash?
I don't think this is a show-stopper, no.  I'm a bit concerned about Linux distros which might build with this flag.  I don't know if it's common.  Bug 699395 should fix this problem, but we'd have to land it before we branch...

(I don't understand why this is only a problem for je_malloc_usable_size_in_advance.  The public malloc() function also isn't called from within jemalloc, but it's clearly not GC'ed, even though it's declared in the same way as je_malloc_usable_size.  Maybe it's called from some other function in the library which is explicitly exported?  Or maybe the fact that jemalloc's malloc() is replacing the system function means that it's not GC'ed?)

Glandium understands this stuff much better, so I'm curious if he has anything to add.
Note that we do add --gc-sections in configure when ld doesn't fuck up debugging symbols with it. So in practice, it should always be there. Does ld have a different behaviour when putting it twice on the same command line?

Also note that --gc-sections doesn't GC things that are exported.

What I can think of, though, is that maybe the firefox binary wasn't rebuilt, though that should have been fixed by bug 695989. Octoploid, did you make -f client.mk or did you selectively rebuild stuff?
(In reply to Mike Hommey [:glandium] from comment #51)
> Note that we do add --gc-sections in configure when ld doesn't **** up
> debugging symbols with it. So in practice, it should always be there. Does
> ld have a different behaviour when putting it twice on the same command line?
> 
> Also note that --gc-sections doesn't GC things that are exported.
> 
> What I can think of, though, is that maybe the firefox binary wasn't
> rebuilt, though that should have been fixed by bug 695989. Octoploid, did
> you make -f client.mk or did you selectively rebuild stuff?

I always run "make -f client.mk" and firefox-bin is rebuild properly. 
It just doesn't contain the symbol:

library % nm -D libxul.so |grep je_malloc_usable_size_in_advance
                 w je_malloc_usable_size_in_advance

bin % nm -D firefox |grep je_malloc_usable_size_in_advance
bin %
Using --gc-sections when linking firefox is the problem:

 % /usr/bin/python2.7 /var/tmp/mozilla-central/config/pythonpath.py -I../../config /var/tmp/mozilla-central/config/expandlibs_exec.py --uselist -- c++ -o firefox -fno-rtti -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -pedantic -Wno-long-long -march=native -ffunction-sections -fdata-sections -fvisibility-inlines-hidden -fpermissive -Wno-delete-non-virtual-dtor -fno-exceptions -fno-strict-aliasing -std=gnu++0x -pthread -pipe -DNDEBUG -DTRIMMED -O0 -fomit-frame-pointer nsBrowserApp.o -lpthread -Wl,-O1,--hash-style=gnu,--as-needed,--gc-sections,--print-gc-sections,--no-keep-memory,--icf=all,--icf-iterations=3 -Wl,--icf=safe -Wl,-rpath-link,/var/tmp/mozilla-central/moz-build-dir/dist/bin -Wl,-rpath-link,/usr/lib -Wl,--whole-archive /var/tmp/mozilla-central/moz-build-dir/dist/lib/libmozutils.a -Wl,--no-whole-archive -rdynamic -L../../dist/bin -L../../dist/lib /var/tmp/mozilla-central/moz-build-dir/dist/lib/libxpcomglue.a -ldl 2>&1 | grep je_malloc_usable_size_in_advance
        
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.0/../../../../x86_64-pc-linux-gnu/bin/ld: removing unused section from '.text.je_malloc_usable_size_in_advance' in file '../../dist/lib/libmozutils.a(jemalloc.o)'
When I add #include "jemalloc.h" to browser/app/nsBrowserApp.cpp
and a dummy je_malloc_usable_size_in_advance(4); to main() 
the problem also goes away.
Ah, now that i think of it, we only add --gc-sections for libraries.
Note that I'm inclined to think your problem is a bug in the linker. We're linking with -rdynamic, and we explicitely export this symbol, the linker shouldn't consider it unused.
Consider:
 % cat check.c
__attribute__((visibility("default"))) void fun () {}
int main () {  }

 % gcc -ffunction-sections -rdynamic -Wl,--gc-sections,--print-gc-sections check.c 2>&1 | grep .text.fun
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.0/../../../../x86_64-pc-linux-gnu/bin/ld: removing unused section from '.text.fun' in file '/tmp/cc22IotY.o'

 % gcc -rdynamic -Wl,--gc-sections,--print-gc-sections check.c 2>&1 | grep .text.fun
 %    

So, maybe jemalloc should be build without the "-ffunction-sections" flag?
What I'm saying is that this behaviour should be IMHO considered a bug in the linker. Just don't set --gc-sections yourself. Our build system does it where it makes sense anyways.
(In reply to Mike Hommey [:glandium] from comment #56)
> Note that I'm inclined to think your problem is a bug in the linker. We're
> linking with -rdynamic, and we explicitely export this symbol, the linker
> shouldn't consider it unused.

It turned out that you're right. The bug was already fixed in gold recently.
See: http://thread.gmane.org/gmane.comp.gnu.binutils/55376
Hi, we use XULRunner to embed Gecko in SWT's Browser widget, and see this same problem with XULRunner 10.0.2 on Linux (downloaded from http://ftp.mozilla.org/pub/mozilla.org/xulrunner/releases/ ).  Navigating to any url crashes as a result of hitting sqliteMemRoundup(), which attempts to call the not-included je_malloc_usable_size_in_advance() function.

The previous comments indicate that a linker bug is to blame for this situation.  Regardless, unless I'm missing something, it seems like the XULRunner 10 Linux binary that's downloadable from mozilla.org is now useless for embedding.  I'm not speaking from the perspective of a Linux distro that can apply the fix to ld and re-build a fixed XULRunner, but as an embedder whose users are expected to ship a XULRunner binary downloaded from mozilla.org.  Is there any possibility of this situation being fixed somehow (and if so, hopefully in 10.0.x)?
(In reply to Grant Gayed from comment #61)
> Hi, we use XULRunner to embed Gecko in SWT's Browser widget, and see this
> same problem with XULRunner 10.0.2 on Linux (downloaded from
> http://ftp.mozilla.org/pub/mozilla.org/xulrunner/releases/ ).  Navigating to
> any url crashes as a result of hitting sqliteMemRoundup(), which attempts to
> call the not-included je_malloc_usable_size_in_advance() function.

See bug 720682.
Depends on: 720682
You need to log in before you can comment on or make changes to this bug.