Last Comment Bug 696162 - Fix jsgcchunk's AllocGCChunk to be more efficient and to avoid potential problems on Mac 10.7
: Fix jsgcchunk's AllocGCChunk to be more efficient and to avoid potential prob...
Status: RESOLVED FIXED
[qa-]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla12
Assigned To: Justin Lebar (not reading bugmail)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 694896
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-20 12:00 PDT by Justin Lebar (not reading bugmail)
Modified: 2012-03-29 16:49 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
WIP v1 (7.46 KB, patch)
2011-11-08 21:09 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Untested WIP v2 (9.84 KB, patch)
2011-11-09 09:05 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
WIP v3 (9.79 KB, patch)
2011-11-09 14:24 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1 (10.80 KB, patch)
2011-11-22 13:56 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v2 (8.39 KB, patch)
2012-01-23 19:45 PST, Justin Lebar (not reading bugmail)
igor: review+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2011-10-20 12:00:57 PDT
In bug 694896, we're fixing jemalloc's chunk allocator so that it works on 10.7.

The js engine uses basically the same code to allocate chunks, so whatever fixes we take there are likely to apply to js as well.  These fixes only apply to Linux and Mac; they don't apply to Windows.

It's not clear whether the hang in bug 694896 applies to the js engine, since js uses vm_allocate, while jemalloc uses mmap.  But unless the changes slow jemalloc down, we should probably make them in js too, to try to avoid the hang and to try to speed up chunk allocation.
Comment 1 Justin Lebar (not reading bugmail) 2011-10-20 12:16:27 PDT
I'm curious if you guys know why we use vm_allocate rather than mmap on mac.  If it's better, we could switch jemalloc to vm_allocate.
Comment 2 Justin Lebar (not reading bugmail) 2011-11-08 21:09:18 PST
Created attachment 573075 [details] [diff] [review]
WIP v1
Comment 3 Justin Lebar (not reading bugmail) 2011-11-08 21:12:30 PST
https://tbpl.mozilla.org/?tree=Try&rev=7c557912adf0
Comment 4 Justin Lebar (not reading bugmail) 2011-11-09 07:30:42 PST
I was hoping to see some motion on our benchmarks, but I didn't.  Anyway we should probably still do this, because it avoids the potential infinite loop in AllocChunk on Linux and Mac (recall that the equivalent loop in jemalloc was causing us to spin on 10.7).
Comment 5 Igor Bukanov 2011-11-09 07:44:45 PST
(In reply to Justin Lebar [:jlebar] from comment #4)
> I was hoping to see some motion on our benchmarks, but I didn't.  Anyway we
> should probably still do this, because it avoids the potential infinite loop
> in AllocChunk on Linux and Mac (recall that the equivalent loop in jemalloc
> was causing us to spin on 10.7).

If the benchmark will show a win in the bug 697970 the GC would not need to use aligned chunks and could use a straightforward mmap calls without alignment loops.
Comment 6 Justin Lebar (not reading bugmail) 2011-11-09 07:50:07 PST
Pushed a new patch to try which implements a Windows optimization, per Terrence's suggestion in another bug:

On Windows (previously, everywhere), we do the following:

 1. map 2mb.
 2. unmap the whole 2mb.
 3. map 1mb inside where the 2mb was, aligned to a chunk boundary.
 4. If (3) didn't succeed, presumably because another thread jumped in between (2) and (3) and mapped into the space we were planning to use, then start over from (1).

Terrence's insight was that in step (1), we don't need to commit any memory, since we're going to immediately unmap it anyway.  We'll see if this has any effect on benchmark scores, but I doubt it will.

It may also be worth experimenting with an optimistic/pessimistic scheme on Windows similar to what this patch implements for *nix.  If we're feeling optimistic (because optimistic was the right choice last time), allocate a 1mb chunk.  If it's aligned, great, use it.  If not, fall back to the slow algorithm above.

https://tbpl.mozilla.org/?tree=Try&rev=ceb3d91fec2d
Comment 7 Justin Lebar (not reading bugmail) 2011-11-09 09:05:37 PST
Created attachment 573221 [details] [diff] [review]
Untested WIP v2
Comment 8 Mozilla RelEng Bot 2011-11-09 09:30:30 PST
Try run for 038665fdaad7 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=038665fdaad7
Results (out of 2 total builds):
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-038665fdaad7
Comment 9 Terrence Cole [:terrence] 2011-11-09 09:46:50 PST
Comment on attachment 573221 [details] [diff] [review]
Untested WIP v2

Not a review, but a few quick comments to tag some bugs I noticed that might impact the test results.

>- * vim: set ts=4 sw=4 et tw=99 ft=cpp:
>+ * vim: set ts=4 sw=4 sts=4 et tw=70 ft=cpp:

Was this intentional?

>+/* Last time we tried to allocate a chunk, did mmap immediately return
>+ * a pointer with the correct alignment? */
>+static bool sLastMmapWasUnaligned = false;

If you are going to use a global variable, common to multiple ifdef sections, just put it at the top of the file.  BUT, as it is just use to select a fast path, it's not clear that passing a value between cores is going to be a net win over just doing 3 syscalls in all cases.  We will need to see how this stacks up on a many-threaded allocation micro-benchmark.  Also note that even though reads and writes to bool will be atomic, you still need to ensure that all possible R/W orderings makes sense in the case you use it in.  (I didn't check this.)

>-    p = MapPages(NULL, ChunkSize);
>+
>+    if (ChunkAddrToOffset(p) == 0) {
>+        /* Optimistic case worked! */
>+        return p;
>+    }

Update sLastMmapWasUnaligned? (and elsewhere in this function)

>+    sLastMmapWasUnlaligned = true;
>+    UnmapPages(p, ChunkSize);
>+    p = MapPages(FindChunkStart(p), ChunkSize);

I think you want FindChunkStart(p + ChunkSize) here, if you are looking for the next aligned chunk.

>+    if (sLastMmapWasUnlaligned) {

Please at least compile test before posting patches?
Comment 10 Justin Lebar (not reading bugmail) 2011-11-09 10:15:42 PST
> Please at least compile test before posting patches?

It's sometimes useful for me to post work-in-progress patches to Bugzilla so that I can pass the patches around to my various VMs for testing.  I try to be explicit when patches are a work in progress so that people aren't surprised when they don't work, but I'm sorry if I wasn't clear enough in this case.

> >+ * vim: set ts=4 sw=4 sts=4 et tw=70 ft=cpp:
> Was this intentional?

Yes; the file doesn't have width-99 comments.

> If you are going to use a global variable, common to multiple ifdef sections, just put 
> it at the top of the file.  BUT, as it is just use to select a fast path, it's not 
> clear that passing a value between cores is going to be a net win over just doing 3 
> syscalls in all cases.

Note that although this global variable is common to two ifdef sections, it's not common to all of them.  But let's hold off on nits until I actually finish this patch, if you don't mind.

jemalloc uses thread-local storage for this value, but before I ported that here, I wanted to ask about whether locks are held while allocating gc chunks, etc.  It sounds like there's no locking at all, so we should just use TLS here.

> Update sLastMmapWasUnaligned? (and elsewhere in this function)

In the chunk you cited, the code is run only if sLastMmapWasUnaligned is false, so I don't think I need to update the variable.  I'm not sure where else you're thinking sLastMmapWasUnaligned should be updated.

> >+    p = MapPages(FindChunkStart(p), ChunkSize);
> I think you want FindChunkStart(p + ChunkSize) here, if you are looking for the next aligned 
> chunk.

FindChunkStart rounds up, right?

static inline void *
FindChunkStart(void *p)
{
    jsuword addr = reinterpret_cast<jsuword>(p);
    addr = (addr + ChunkMask) & ~ChunkMask;
    return reinterpret_cast<void *>(addr);
}
Comment 11 Terrence Cole [:terrence] 2011-11-09 10:39:00 PST
(In reply to Justin Lebar [:jlebar] from comment #10)
> > Please at least compile test before posting patches?
> 
> It's sometimes useful for me to post work-in-progress patches to Bugzilla so
> that I can pass the patches around to my various VMs for testing.  I try to
> be explicit when patches are a work in progress so that people aren't
> surprised when they don't work, but I'm sorry if I wasn't clear enough in
> this case.

Interesting, I think most people use WIP patches to communicate what they are working on earlier than a full review so that they can get unofficial feedback on their basic direction.  That said, even if you are using it for remote storage, I would still recommend compile testing so that you don't have to manually fix up on all your VM's.  

> > >+ * vim: set ts=4 sw=4 sts=4 et tw=70 ft=cpp:
> > Was this intentional?
> 
> Yes; the file doesn't have width-99 comments.

SM style is <100 chars per line.

> > If you are going to use a global variable, common to multiple ifdef sections, just put 
> > it at the top of the file.  BUT, as it is just use to select a fast path, it's not 
> > clear that passing a value between cores is going to be a net win over just doing 3 
> > syscalls in all cases.
> 
> Note that although this global variable is common to two ifdef sections,
> it's not common to all of them.  But let's hold off on nits until I actually
> finish this patch, if you don't mind.
> 
> jemalloc uses thread-local storage for this value, but before I ported that
> here, I wanted to ask about whether locks are held while allocating gc
> chunks, etc.  It sounds like there's no locking at all, so we should just
> use TLS here.
> 
> > Update sLastMmapWasUnaligned? (and elsewhere in this function)
> 
> In the chunk you cited, the code is run only if sLastMmapWasUnaligned is
> false, so I don't think I need to update the variable.  I'm not sure where
> else you're thinking sLastMmapWasUnaligned should be updated.

Ah yes, you are right.  Generally, booleans should be the positive case, so that making them false is not a dual-negative.  E.g. "aligned" is easier to parse than "not unaligned".
 
> > >+    p = MapPages(FindChunkStart(p), ChunkSize);
> > I think you want FindChunkStart(p + ChunkSize) here, if you are looking for the next aligned 
> > chunk.
> 
> FindChunkStart rounds up, right?

I assumed you had recycled Chunk::fromAddress here, sorry.
 
> static inline void *
> FindChunkStart(void *p)
> {
>     jsuword addr = reinterpret_cast<jsuword>(p);
>     addr = (addr + ChunkMask) & ~ChunkMask;
>     return reinterpret_cast<void *>(addr);
> }
Comment 12 Justin Lebar (not reading bugmail) 2011-11-09 10:50:39 PST
> SM style is <100 chars per line.

Is SM style to have 100-character-long lines in comments?  I don't see that in jsapi.h.

> E.g. "aligned" is easier to parse than "not unaligned".

Agreed; I'll change it.
Comment 13 Andrew McCreight [:mccr8] 2011-11-09 10:53:09 PST
Spidermonkey style says "Comments should fit within 80 columns".  Code is 99 columns.

https://wiki.mozilla.org/JavaScript:SpiderMonkey:C%2B%2B_Coding_Style
Comment 14 Justin Lebar (not reading bugmail) 2011-11-09 11:18:18 PST
Do we have TLS plumbing hiding somewhere in the JS engine?
Comment 15 Mozilla RelEng Bot 2011-11-09 11:20:31 PST
Try run for ceb3d91fec2d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ceb3d91fec2d
Results (out of 62 total builds):
    success: 61
    warnings: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-ceb3d91fec2d
Comment 16 Emanuel Hoogeveen [:ehoogeveen] 2011-11-09 14:05:46 PST
(In reply to Justin Lebar [:jlebar] from comment #6)
> It may also be worth experimenting with an optimistic/pessimistic scheme on
> Windows similar to what this patch implements for *nix.  If we're feeling
> optimistic (because optimistic was the right choice last time), allocate a
> 1mb chunk.  If it's aligned, great, use it.  If not, fall back to the slow
> algorithm above.
Could you instead allocate 2MiB, check if it's aligned, and if it is return two allocated chunks? That would cut out the 'unmap the unaligned 1MiB, map 2MiB' fallback step, but I have no idea if it's possible in the current code. It would also be worse if 2MiB is significantly less likely to be aligned than 1MiB, I guess.
Comment 17 Justin Lebar (not reading bugmail) 2011-11-09 14:08:35 PST
> Could you instead allocate 2MiB, check if it's aligned, and if it is return two allocated chunks? 

This would be easy to do in the chunk allocation code.  I can't speak to whether this would be easy for the JS engine to use.

> It would also be worse if 2MiB is significantly less likely to be aligned than 1MiB, I guess.

I'm not sure what you mean.  But in order to do this, you'd have to be able to return either 1MB or 2MB from the function.
Comment 18 Mozilla RelEng Bot 2011-11-09 14:20:32 PST
Try run for bceea3535d74 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=bceea3535d74
Results (out of 62 total builds):
    success: 58
    warnings: 4
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-bceea3535d74
Comment 19 Justin Lebar (not reading bugmail) 2011-11-09 14:24:02 PST
Created attachment 573323 [details] [diff] [review]
WIP v3

Still need to add thread-local storage.
Comment 20 Emanuel Hoogeveen [:ehoogeveen] 2011-11-09 15:31:34 PST
(In reply to Justin Lebar [:jlebar] from comment #17)
> > It would also be worse if 2MiB is significantly less likely to be aligned than 1MiB, I guess.
> 
> I'm not sure what you mean.
Well, I'm not sure how mapping works on the OS level. But if RAM (address space?) is fragmented such that there's lots of unaligned 2MiB holes, then asking for 1MiB might return an aligned section but asking for 2MiB will never do so.
Comment 21 Justin Lebar (not reading bugmail) 2011-11-09 15:44:20 PST
Ah, I think I understand what you're saying.  You're comparing the following two algorithms?

Algorithm A:
  A.1 Map 1mb.  If that's aligned, return it.  Otherwise, continue to step A.2.
  A.2 Map 2mb.  Unmap it, then try to map 1mb inside.

Algorithm B:
  B.1 Map 2mb.  If it's aligned, return the full 2mb.  Otherwise, continue to step B.2.
  B.2 Unmap it, then try to map 1mb inside.

Now that I write it out, a difficulty with algorithm B is that JS kind of has to treat this as one chunk, not two, since when it unmaps the 2mb chunk, it has to do so all at once.  (Windows requires that there be a 1:1 mapping between map and unmap operations.)
Comment 22 Emanuel Hoogeveen [:ehoogeveen] 2011-11-09 16:45:58 PST
Aah, fair enough then (annoying limitation though).
Comment 23 Mozilla RelEng Bot 2011-11-10 20:40:29 PST
Try run for 37d374ec9db0 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=37d374ec9db0
Results (out of 278 total builds):
    success: 232
    warnings: 44
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-37d374ec9db0
Comment 24 Mozilla RelEng Bot 2011-11-11 14:10:25 PST
Try run for fe46d3bea0cf is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=fe46d3bea0cf
Results (out of 62 total builds):
    success: 33
    warnings: 27
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-fe46d3bea0cf
Comment 25 Mozilla RelEng Bot 2011-11-17 18:30:34 PST
Try run for d736254a4fc0 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=d736254a4fc0
Results (out of 47 total builds):
    exception: 1
    success: 24
    warnings: 20
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-d736254a4fc0
Comment 26 Justin Lebar (not reading bugmail) 2011-11-22 09:10:43 PST
Oh, it's failing on Windows XP because __declspec(thread) doesn't work there with DLLs.  Awesome.

I'll see if the Windows API is any better here.

http://msdn.microsoft.com/en-us/library/9w1sdazb%28v=vs.80%29.aspx (see community content at the bottom.  I also debugged the crash locally.)
Comment 27 Mozilla RelEng Bot 2011-11-22 13:40:30 PST
Try run for 46c8c24deef0 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=46c8c24deef0
Results (out of 62 total builds):
    success: 59
    warnings: 1
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-46c8c24deef0
Comment 28 Justin Lebar (not reading bugmail) 2011-11-22 13:56:36 PST
Created attachment 576249 [details] [diff] [review]
Patch v1
Comment 29 Igor Bukanov 2011-11-23 04:02:55 PST
(In reply to Justin Lebar [:jlebar] from comment #28)
> Created attachment 576249 [details] [diff] [review] [diff] [details] [review]
> Patch v1

I do not see why do you need a thread-local to store the flag if the last chunk was aligned. For example, do you have data that show that if another thread would get aligned chunk, then this would not influence a chance to get an aligned chunk on this thread?
Comment 30 Igor Bukanov 2011-11-23 04:31:01 PST
(In reply to Igor Bukanov from comment #29)
> (In reply to Justin Lebar [:jlebar] from comment #28)
> > Created attachment 576249 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > Patch v1
> 
> I do not see why do you need a thread-local to store the flag if the last
> chunk was aligned. For example, do you have data that show that if another
> thread would get aligned chunk, then this would not influence a chance to
> get an aligned chunk on this thread?

Another observation is that without cooperation with jemalloc and other mmap users in the browser trying to predict the result of the mmap may backfire. Also a partial unmap on Linux/Map may in a longer run harm the kernel page allocator.

So I would prefer to see a patch that always try to allocate first 1-MB chunks at is it was before. Then in another bug we should measure how often that fails and what heuristics can help to improve on that.
Comment 31 Justin Lebar (not reading bugmail) 2011-11-24 09:46:14 PST
> I do not see why do you need a thread-local to store the flag if the last chunk was 
> aligned.

It's not clear to me that tls is necessary here.  I'm just matching the jemalloc code.

The three options for storing this boolean are:

* Global boolean synchronized with locks or atomic ops.  With no evidence, this seems like overkill to me.

* Global boolean without locks.  This seems like a reasonable thing to do, since it's just a hint.  Terrence suggested earlier that he's concerned about the overhead of extra cache coherency traffic of this compared to TLS, but I suspect that, in a situation where we're allocating so many blocks, this would easily be lost in the noise.

* TLS.  Like you suggested, it seems to me that the OS probably serializes mmap calls, so TLS doesn't make a lot of sense here except as a performance optimization.


> Another observation is that without cooperation with jemalloc and other mmap users in the 
> browser trying to predict the result of the mmap may backfire.

Agreed.  Although I'm not sure whether we'd hit these pathological cases in practice.  The case that really matters is when we're in a tight loop allocating lots of blocks, and in that case, the hint should be reasonably accurate.

None of the changes I've made here have shown up in any benchmark, leading me to believe that this code isn't so critical to performance, at least for our common benchmarks.  My main concern is to get rid of the loop on *nix, where it's been known to be a problem.  So if you think we should diverge from jemalloc and get rid of the optimistic/pessimistic prediction code, I'm happy to do that.
Comment 32 Igor Bukanov 2011-11-24 10:56:47 PST
(In reply to Justin Lebar [:jlebar] from comment #31)
> > I do not see why do you need a thread-local to store the flag if the last chunk was 
> > aligned.
> 
> It's not clear to me that tls is necessary here.  I'm just matching the
> jemalloc code.

Lets not copy unclear parts of jemalloc heuristics tuned for malloc, not GC allocation patterns, into the GC. Besides, if really need to match jemalloc, we should call jemalloc chunk allocator code directly when compiled with MOZ_MEMORY.

> * Global boolean without locks.  This seems like a reasonable thing to do,
> since it's just a hint.  Terrence suggested earlier that he's concerned
> about the overhead of extra cache coherency traffic of this compared to TLS,
> but I suspect that, in a situation where we're allocating so many blocks,
> this would easily be lost in the noise.

The idea of that heuristic is to minimize the number of the system calls. Using thread-local or global flag is not about optimizing the access to the flag.  Rather it is about using different heuristics that could save a few rather expensive system calls. Compared with that the cache coherency implications should be noise.

> * TLS.  Like you suggested, it seems to me that the OS probably serializes
> mmap calls, so TLS doesn't make a lot of sense here except as a performance
> optimization.

The chunk allocation pattern for a particular thread is very different from the pattern when looked from the global point of view. I want to emphasis it again that this is not about thread-local optimization.

> So if you think we should diverge
> from jemalloc and get rid of the optimistic/pessimistic prediction code, I'm
> happy to do that.

This is exactly what I suggest. I.e. lets copy fixes for the bugs (like avoiding that loop on mmap-equipped platforms) or copy clear improvements (like not-committing memory on Windows when tiring oversized allocation), but leave unclear optimizations.
Comment 33 Mozilla RelEng Bot 2011-11-28 14:40:24 PST
Try run for a96c51d86b4b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=a96c51d86b4b
Results (out of 249 total builds):
    exception: 4
    success: 222
    warnings: 20
    failure: 3
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-a96c51d86b4b
Comment 34 Igor Bukanov 2012-01-22 04:55:43 PST
Comment on attachment 576249 [details] [diff] [review]
Patch v1

The proposed changes have landed as a part of another bug.
Comment 35 Justin Lebar (not reading bugmail) 2012-01-22 05:25:11 PST
Could you please link me to the bug?  I don't see relevant changes in inbound.
Comment 36 Justin Lebar (not reading bugmail) 2012-01-23 19:39:06 PST
Comment on attachment 576249 [details] [diff] [review]
Patch v1

Re-opening and re-requesting review, because I can't find any evidence that this was fixed.  There's been only one change to jsgcchunk.cpp since October, and it was cosmetic.

Igor, maybe you closed the wrong bug?
Comment 37 Justin Lebar (not reading bugmail) 2012-01-23 19:45:28 PST
Created attachment 590982 [details] [diff] [review]
Patch v2
Comment 38 Justin Lebar (not reading bugmail) 2012-01-23 19:46:47 PST
Patch v2 gets rid of TLS, per comment 32.
Comment 39 Igor Bukanov 2012-01-24 02:47:12 PST
Comment on attachment 590982 [details] [diff] [review]
Patch v2

Review of attachment 590982 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I thought this was landed some time ago. r+ with the comments below addressed.

::: js/src/jsgcchunk.cpp
@@ +318,2 @@
>      void *p;
> +    bool wasAligned;

This variable is no longer used.

@@ +376,5 @@
> +     * method.
> +     *
> +     * Notice that we have to unmap before we remap, due to Windows's
> +     * restriction that there be a 1:1 mapping between VM alloc and dealloc
> +     * operations. */

Nit: style for multi-line comment is always to put opening /* and closing */ on their own lines and stay within 78 char per line in the comment text.

@@ +395,5 @@
> +inline static void *
> +AllocChunkSlow()
> +{
> +    /* Map space for two chunks, then unmap around the result so we're left with
> +     * space for one chunk. */

Nit: fix comment style per above

@@ +397,5 @@
> +{
> +    /* Map space for two chunks, then unmap around the result so we're left with
> +     * space for one chunk. */
> +
> +    char *p = reinterpret_cast<char*>(MapPages(NULL, ChunkSize * 2));

If mapPages returns void *, the  cast here should be static_cast.

@@ +431,5 @@
> +     * we called AllocChunk() on this thread, the fast path would have failed.
> +     * But here in the js engine, we always try the fast path before falling
> +     * back to the slow path, because it's not clear that jemalloc's heuristic
> +     * is helpful to us.
> +     */

Fix comment style

@@ +447,5 @@
> +
> +    /* We allocated a chunk, but not at the correct alignment.  Try to extend
> +     * the tail end of the chunk and then unmap the beginning so that we have an
> +     * aligned chunk.  If that fails, do the slow version of AllocChunk. */
> +

Fix comment style

@@ +457,5 @@
> +        return p;
> +    }
> +
> +    /* Extension failed.  Clean up, then revert to the reliable-but-expensive
> +     * method. */

Fix comment style
Comment 40 Justin Lebar (not reading bugmail) 2012-01-24 07:49:23 PST
Thanks, Igor.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7fa7c10803e
Comment 41 Igor Bukanov 2012-01-24 07:55:51 PST
(In reply to Justin Lebar [:jlebar] from comment #40)
> Thanks, Igor.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/e7fa7c10803e

This still contains  * */ as a comment terminator at https://hg.mozilla.org/integration/mozilla-inbound/rev/e7fa7c10803e#l1.157 . Could you fix land a quick fix for that?
Comment 42 Justin Lebar (not reading bugmail) 2012-01-24 07:58:50 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebf0c9794a34
Comment 43 Matt Brubeck (:mbrubeck) 2012-01-24 09:02:52 PST
Backed out on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd472718b66e

because of a build error on Win64:
https://tbpl.mozilla.org/php/getParsedLog.php?id=8790765&tree=Mozilla-Inbound
e:/builds/moz2_slave/m-in-w64/build/js/src/jsgcchunk.cpp(74) : error C2061: syntax error : identifier 'uint32'
Comment 44 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-24 09:25:19 PST
That should be uint32_t these days.  I'm a bit surprised that built anywhere, actually.  I'd be interested in knowing what header was providing a definition of that type, because nothing should be, and having a definition is a bit dangerous for underlying-type confusion.
Comment 45 Justin Lebar (not reading bugmail) 2012-01-24 11:28:16 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/04b4225269a4, but I failed to change the reinterpret_casts to static_casts.  :-/  I'll make the chance once I confirm that this push stuck.
Comment 46 Justin Lebar (not reading bugmail) 2012-01-24 13:37:23 PST
static_cast followup (hopefully the last followup for this bug!)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a4bc12a0d98
Comment 49 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-29 15:44:25 PDT
Is this something QA can verify?
Comment 50 Terrence Cole [:terrence] 2012-03-29 16:32:13 PDT
No, not really.  This code has been rewritten anyway.
Comment 51 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-29 16:49:21 PDT
qa- as per comment 50 -- we won't be verifying this one.

Note You need to log in before you can comment on or make changes to this bug.