Closed Bug 1005849 Opened 10 years ago Closed 10 years ago

js::gc::MapAlignedPages can miss usable regions in low-memory conditions

Categories

(Core :: JavaScript: GC, defect)

32 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: away, Assigned: ehoogeveen)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(5 files, 10 obsolete files)

5.41 KB, patch
ehoogeveen
: review+
ehoogeveen
: checkin+
Details | Diff | Splinter Review
8.45 KB, patch
terrence
: review+
Details | Diff | Splinter Review
9.16 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
17.03 KB, patch
terrence
: review+
Details | Diff | Splinter Review
12.83 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
Sister bug to bug 1005844. For this one I don't have the crash reports to prove it, but from code inspection it that seems GC can hit the same issue: if there are no 2MB free regions, then allocation fails even if there were hundreds of suitably-aligned 1MB regions.

(Actually it seems even more likely than the jemalloc case. MapAlignedPages goes directly to 2MB carving without trying 1MB first)
Whiteboard: [MemShrink]
Component: JavaScript Engine → JavaScript: GC
In bug 1005844 I added a third pass to the jemalloc allocation logic to deal with this scenario (not yet reviewed by a jemalloc peer). In this bug I'd like to do the same, but first I think it would be good to make the existing logic match jemalloc [1] more closely. This patch does that, though I didn't abstract the page trimming logic as jemalloc does [2].

The only functional change is that with this patch, we first try to allocate a chunk of the desired size directly, and only fall back to the slow path if it isn't already aligned. I don't know what the success rate of this fast path will be - but jemalloc does it, and I can certainly imagine prior aligned allocations leaving aligned chunks in their wake. Terrence, what do you think?

(I checked that this compiles on Windows and Linux, but it hasn't gone through Try yet)

[1] http://dxr.mozilla.org/mozilla-central/source/memory/jemalloc/src/src/chunk_mmap.c#142
[2] http://dxr.mozilla.org/mozilla-central/source/memory/jemalloc/src/src/chunk_mmap.c#85
Assignee: nobody → emanuel.hoogeveen
Status: NEW → ASSIGNED
Attachment #8421777 - Flags: review?(terrence)
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment on attachment 8421777 [details] [diff] [review]
Part 1: Refactor GC allocation logic to match jemalloc3 more closely.

Review of attachment 8421777 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense. r=me

::: js/src/gc/Memory.h
@@ +25,5 @@
>      size_t systemAllocGranularity() { return allocGranularity; }
>  
>      // Allocate or deallocate pages from the system with the given alignment.
>      void *mapAlignedPages(size_t size, size_t alignment);
> +    void *mapAlignedPagesSlow(size_t size, size_t alignment);

Please move the new signature down to the private: section, just below decommitEnabled().
Attachment #8421777 - Flags: review?(terrence) → review+
Thanks for the quick review! Carrying forward r=terrence.

I'll send this through try when that starts to respond again - in particular, I'd like to do a before/after comparison on Talos to see if it moves the needle at all. Assuming no problems turn up, I'd like to land part 1 to see if it affects AWFY (the Windows 8 machine) or AWSY at all.

I don't expect that adding a third pass will affect anything in terms of performance (if it were needed during our tests, we would be OOMing currently), but the jemalloc variant is still waiting on review anyway. I'll probably still post a part 2 here tomorrow, so we'll see how things work out on the timing.

(In reply to Terrence Cole [:terrence] from comment #2)
> Please move the new signature down to the private: section, just below
> decommitEnabled().

Done.
Attachment #8422151 - Flags: review+
I'd be surprised if this affected Talos or AWFY -- if I've understood correctly, this will only have an effect if we're close to running out of virtual memory, in which case it'll keep us alive longer. And AWSY is Linux64-only, so nothing will happen there.
Part 1 doesn't add the new pass yet. js::gc::MapAlignedPages currently jumps straight to allocating a 2MB chunk, then cutting it down to 1MB - part 1 changes this to match jemalloc3, which means it'll try to allocate 1MB first in case the block it gets back is already aligned. So if that works, it'll save a few system calls and potentially fill a 1MB gap - if it doesn't, it wastes a map and unmap. This affects both Windows and Linux.

Something similar goes for bug 1005844: mozjemalloc currently 1) takes a fast path where it just allocates the requested size, then 2) takes a slower path where it deallocates and tries to allocate at the nearest aligned address, then 3) allocates a 2MB chunk, then cuts that down to 1MiB. Part 1 in that bug switches Windows over to the jemalloc3 path, which cuts out step 2. Other OSes are already using that logic.
Attachment #8421777 - Attachment is obsolete: true
Looks nice and green on try: https://tbpl.mozilla.org/?tree=Try&rev=f0d710772af1

As far as Talos goes it looks like a wash: http://compare-talos.mattn.ca/?oldRevs=e708912bf96f&newRev=23a8cf038873&server=graphs.mozilla.org&submit=true

There's still some things marked as significant, but they look pretty random and I think they'd go away with enough retriggers. I'm marking this checkin-needed and leave-open so we can see if it does anything on AWFY while I work on part 2.
This adds the new pass. The idea is that even though there may not be any 2MB chunks left, the allocator may still be able to give us 1MB chunks that are either already aligned, or *can* be aligned by allocating in the nearest aligned location.

On Windows we can't explicitly tell the allocator 'only give me addresses higher/lower than this bound', and even on Linux the address parameter is only a hint, so we temporarily leave unalignable chunks allocated to force the system allocator to give us a new chunk.

This is more straightforward on Windows, where we know it hands out addresses in increasing order and map/unmap calls must be matched. On Linux we have to try both directions, because we don't know if the heap grows up or down. This is a slow path, so maybe it doesn't matter, but it would be nice if we could find out which direction is correct at runtime and only try that one.

In any case, I tested this on try with another patch on top to make us *always* use this path, and it looks green: https://tbpl.mozilla.org/?tree=Try&rev=f53aaf91d124 - however, I haven't tested this in the low-memory situation that it's meant for.

Unlike in bug 1005844, I split out the logic to align chunks into its own function. There's less abstraction around the OS page mapping functions in the GC, so it looks cleaner - but I'm also considering using it in a part 3 (more on that when I post it).
Attachment #8423077 - Flags: review?(terrence)
Attachment #8422151 - Flags: checkin+
The main problem with the slow path of part 3 is that we only use it when we're in such a low memory or high fragmentation situation that the normal slow path fails. Reaching a high fragmentation state eventually may be inevitable, but it would be nice if we could delay it somewhat.

Consider the following situation, where each x represents a 512kB chunk of allocated space, and each | represents a 512kB chunk of unallocated space:
 1    2   3
x|||xx||xx||||||||...

In this case, the OS allocator will likely give us the address of 1 each time, which is unaligned. We fall back to the slow path, get back the address of 3 and move on. 1 and 2 remain unused forever, even though 1 *could* be aligned and 2 is *already* aligned. In this case it would be nice if we made an attempt to align 1 at some point - that would not only give us that bit of address space, but it would also make the fast path succeed thereafter.

This patch tries to do this without slowing down the general case by simply attempting to align a fast path chunk every time the fast path has failed 8 times. This won't help the situation where 1 (in the example above) *can't* be aligned - in that case we'll just have to keep using the slow path until address space starvation forces us into the last ditch path, unless some allocation with a smaller alignment happens to fill the gap for us. With any luck, neither situation comes up very often.

I arbitrarily set tryToAlign to 8, and I'm happy to take suggestions on naming ;) I haven't had time to test this yet.
Attachment #8423101 - Flags: review?(terrence)
Comment on attachment 8423077 [details] [diff] [review]
Part 2: Add a 'last ditch' pass to the GC's allocator to find alignable chunks when all else fails.

Review of attachment 8423077 [details] [diff] [review]:
-----------------------------------------------------------------

r=me. Looks good -- doubly so if it's already passing try. Lots of comments, but mostly style nits.

::: js/src/gc/Memory.cpp
@@ +88,5 @@
>      return p;
>  }
>  
> +void *
> +SystemPageAllocator::mapAlignedPagesLastDitch(size_t size, size_t alignment)

A general comment about the algorithm being used here would be quite nice -- something like the first paragraph from the comment on this patch.

@@ +90,5 @@
>  
> +void *
> +SystemPageAllocator::mapAlignedPagesLastDitch(size_t size, size_t alignment)
> +{
> +    const int max_attempts = 8;

This should be MaxAttempts to match SpiderMonkey style.

@@ +91,5 @@
> +void *
> +SystemPageAllocator::mapAlignedPagesLastDitch(size_t size, size_t alignment)
> +{
> +    const int max_attempts = 8;
> +    void *p, *temp_maps[max_attempts];

Each declaration on its own line, please.
Rename temp_maps => tempMaps to match SM style.

@@ +93,5 @@
> +{
> +    const int max_attempts = 8;
> +    void *p, *temp_maps[max_attempts];
> +
> +    int attempt;

Please initialize attempt to 0 here instead of the for-loop initializer. This will make it clearer that the use below the loop is always valid, because otherwise the reader requires non-local knowledge that MaxAttempts is always > 0. Of course, naming MaxAttempts with an initial capital is a good clue, but no reason to even require that.

@@ +94,5 @@
> +    const int max_attempts = 8;
> +    void *p, *temp_maps[max_attempts];
> +
> +    int attempt;
> +    for (attempt = 0; attempt < max_attempts; attempt++) {

++attempt, rather than attempt++.

In the days before optimizing compilers, attempt++ would unconditionally sink the prior value before incrementing so that it could be assigned later -- with ++attempt, of course, the prior value is unobservable so doesn't need to be preserved. These days it isn't a performance concern anymore, but it's still traditional to use ++i in loops: more likely to match older code and there's no reason to prefer the postfix version.

Also, I find that putting the ++/-- in front is easier to read in general: the operator is smaller and easier to miss than the name, especially if there are lots of trailing _ in the code. Making the operator break the alignment serves to highlight it instead of hiding it.

@@ +104,5 @@
> +        if (p)
> +            break;
> +        /* Failure here indicates a race with another thread, so try again. */
> +        if (!VirtualAlloc(temp_maps[attempt], size, MEM_RESERVE, PAGE_READWRITE))
> +            attempt--;

--attempt;

@@ +347,5 @@
>  
> +void *
> +SystemPageAllocator::mapAlignedPagesLastDitch(size_t size, size_t alignment)
> +{
> +    const int max_attempts = 8;

The same comments I made for the windows variant apply for this function as well.

@@ +351,5 @@
> +    const int max_attempts = 8;
> +    void *p, *temp_maps[max_attempts];
> +
> +    int prot = PROT_READ | PROT_WRITE;
> +    int flags = MAP_PRIVATE | MAP_ANON;

It looks like prot and flags are only used once, so you should eliminate them by just passing the constants to MapMemory directly.

@@ +375,5 @@
> +void *
> +SystemPageAllocator::tryAlignChunk(void *unaligned, size_t size, size_t alignment)
> +{
> +    int prot = PROT_READ | PROT_WRITE;
> +    int flags = MAP_PRIVATE | MAP_ANON;

Here having these split out makes more sense, but you should make them const int and make the names Prot and Flags.

@@ +380,5 @@
> +    size_t offset = uintptr_t(unaligned) % alignment;
> +
> +    /* Try to make an aligned chunk by allocating after the end. */
> +    void *desired = (void *)(uintptr_t(unaligned) + size);
> +    void *result = mmap(desired, alignment - offset, prot, flags, -1, 0);

These mmap calls should be using the MapMemory function so we don't break ia64 again.

@@ +392,5 @@
> +        unmapPages(result, alignment - offset);
> +
> +    /* Try to make an aligned chunk by allocating before the start. */
> +    desired = (void *)(uintptr_t(unaligned) - offset);
> +    result = mmap(desired, offset, prot, flags, -1, 0);

Ditto.
Attachment #8423077 - Flags: review?(terrence) → review+
Comment on attachment 8423101 [details] [diff] [review]
Part 3: Try to align allocated chunk when fast path has failed 8 times.

Review of attachment 8423101 [details] [diff] [review]:
-----------------------------------------------------------------

This is a neat idea! However, before we settle on "8" attempts, let's collect some numbers. Please add counters to SystemPageAllocator for each allocation path and bump them each time one of them succeeds. Then in ~SystemPageAllocator printf them. |mach build binaries| should be quite fast for changes here, so trying a few different numbers on octane (http://octane-benchmark.googlecode.com/svn/latest/index.html) shouldn't take too long. In particular, does dropping it to "every time we miss the fast path" increase our fast path hit rate in the long term? Some attempts on gregor-wagner.com/tmp/mem would also not be amiss, but those take longer to run and will be harder to interpret, so use your discretion.

::: js/src/gc/Memory.h
@@ +61,5 @@
>      size_t              allocGranularity;
> +
> +    // Whenever the allocation fast path has failed tryToAlign times in a row,
> +    // we want to try and align the chunk we got back.
> +    size_t              tryToAlignCounter;

fastPathMissCount

@@ +62,5 @@
> +
> +    // Whenever the allocation fast path has failed tryToAlign times in a row,
> +    // we want to try and align the chunk we got back.
> +    size_t              tryToAlignCounter;
> +    static const size_t tryToAlign = 8;

MaxFastPathMisses
Attachment #8423101 - Flags: review?(terrence)
Thanks for the reviews! Some additional notes maybe worth considering:

1) Currently we use MEM_COMMIT on Windows except when we're sure we don't need to. We could switch instead to making a single call with MEM_COMMIT at the end - this seems like a choice between optimizing for the faster cases (we're only making at most a few calls, so passing in MEM_COMMIT avoids having to make another call at the end) and optimizing for the slower cases (we're making a fair few calls anyway, so make those faster on average by only committing at the end).

2) For Linux: Instead of leaving memory temporarily mapped to ensure the next call will give us a new block each time, we could use the first parameter to mmap to request an address above (or below) a specific range. MapMemory already does this for ia64 by requesting an address near 0x0000070000000000 (of course, this bug isn't really relevant for a 64-bit process). However, I'm not sure what 'near' means in this context - the Linux man page just says 'the mapping will be created at a nearby page boundary' [1]. We also currently don't know in what direction mmap hands out addresses, so we don't know whether to ask for a higher or lower address each time.

[1] http://man7.org/linux/man-pages/man2/mmap.2.html
Attached patch Crude counters (obsolete) — Splinter Review
I implemented these really crude, Windows-only counters ... but I'm only seeing mapAlignedPages get called 3 times. Ever. It's the same regardless of whether I just open and close the browser, run MemBench or run octane. I'd think my counters are broken, but sometimes the fast path works twice, sometimes the slow path works twice.. but the total is always 3. Could you tell me what I'm doing wrong?
Flags: needinfo?(terrence)
Comment on attachment 8423511 [details] [diff] [review]
Crude counters

Argh, of course, I'm an idiot. I just had to realize it *after* posting that. There must be multiple SystemPageAllocators, and by calling fopen with "w" of course I'm only logging *one* of them, replacing whatever was there before. I even realized this, and then forgot. I haven't fired up a build with this fixed yet, but let me post this before too many people realize my error :P
Attachment #8423511 - Attachment is obsolete: true
Flags: needinfo?(terrence)
I got the counters working, but this is turning out to be quite difficult to measure. There aren't really all that many calls in total - three consecutive runs of Octane generate around 1k calls in total, and there's a lot of variance. I *think* the ratio of fast path successes to slow path ones is correlated with the amount of successes on the tryAlignChunk path - I was seeing fast path successes outweigh slow path ones about 2:1 when setting MaxFastPathMisses to 0 (just running it every time), and closer to the other way around with MaxFastPathMisses set to 8. But I'd have to get a much larger sample to be sure, and I'm not sure how it drops off.

Considering we don't actually call this function very often (~350 times for a full run of Octane), maybe it's worth it to just call tryAlignChunk every time? It doesn't succeed very often, but when it does it seems to help the fast path a fair bit. I'm not sure with how much confidence I can say that though. Just calling it 50% of the time on a fast path fail may be just as good. Unfortunately I can't really run Octane in a loop for an hour, and the crude logging makes doing a lot of individual runs annoying. I'll fiddle with it some more tomorrow.
I'd still like to see if I can make an informed decision on part 3, but there's no reason to hold this one up. Asking for a quick re-review given the change to MapMemory.

(In reply to Terrence Cole [:terrence] from comment #10)
> A general comment about the algorithm being used here would be quite nice --
> something like the first paragraph from the comment on this patch.

Done. I did duplicate one of the comments between the Windows and Linux versions of the function, but they're far enough apart that I think it makes sense.

> This should be MaxAttempts to match SpiderMonkey style.

> Each declaration on its own line, please.
> Rename temp_maps => tempMaps to match SM style.

Done and done. The names were leftover from the jemalloc-style patch :) I didn't realize that constants should start with a capital though, noted.

> Please initialize attempt to 0 here instead of the for-loop initializer.

Done. I did this initially because I preferred having it all in one place when the variable declaration was at the top of the function - but that's not an issue in C++.

> ++attempt, rather than attempt++.

Done.

> It looks like prot and flags are only used once, so you should eliminate
> them by just passing the constants to MapMemory directly.

I decided to add these flags as default parameters to MapMemory instead, since everything that uses MapMemory currently wants the same ones. Let me know what you think.

> @@ +375,5 @@
> > +void *
> > +SystemPageAllocator::tryAlignChunk(void *unaligned, size_t size, size_t alignment)
> > +{
> > +    int prot = PROT_READ | PROT_WRITE;
> > +    int flags = MAP_PRIVATE | MAP_ANON;
> 
> Here having these split out makes more sense, but you should make them const
> int and make the names Prot and Flags.

Done.

> These mmap calls should be using the MapMemory function so we don't break
> ia64 again.

Unchanged, as discussed on IRC (needs the address parameter of mmap and uses address close to previous result).
Attachment #8423077 - Attachment is obsolete: true
Attachment #8423839 - Flags: review?(terrence)
Hugh Nougher e-mailed me with a great idea on how to align quickly. I need to think about how it'll work with mmap, but it should work nicely for VirtualAlloc. I'm not sure whether to cancel review or do this on top of part 2. It'll shuffle things around a bit.
Comment on attachment 8423839 [details] [diff] [review]
Part 2 v2: Add a 'last ditch' pass to the GC's allocator to find alignable chunks when all else fails.

Cancelling review for now. I implemented the Windows version of Hugh's idea, and it's quite nice, although I lost a bit of the abstraction in the process. I'll think about how to get it back (if it's worth it) and the Linux equivalent next.
Attachment #8423839 - Flags: review?(terrence)
Attached patch Part 2 v3 WIP (obsolete) — Splinter Review
Wew, spent way too long on this. Implementing the Unix version turned out to be a pain in the ass. I had to rewrite pretty much all of the patch and the code got a lot more complicated (and fragile, I fear). This is still WIP since it seems to crash on startup on Linux - I'll look into that after I get some rest. I realize it's weekend now, but if you have a moment to look at the general interface that'd be nice.
Attachment #8423101 - Attachment is obsolete: true
Attachment #8423839 - Attachment is obsolete: true
Attachment #8424291 - Flags: feedback?(terrence)
While working on the new pass I found I was getting confused because of the differences between VirtualAlloc and mmap. In particular, the fact that mmap
(1) Uses its address parameter as only a hint and
(2) Returns MAP_FAILED instead of null on failure
were wholly unhelpful. jemalloc abstracts these details out and after working on the Unix path (which got complicated) I've come to agree with them. This patch does the following:

1) Adds default parameters to MapMemory, allowing callers to avoid having to pass everything if they just want the defaults (as most callers do).
2) Makes MapMemory return null on failure, instead of MAP_FAILED. This is useful in several places where we check for alignment - it turns out that we usually want to take the same path regardless of whether the chunk we got back was already aligned, or if allocation failed. It's also consistent with VirtualAlloc's behavior. Technically it's possible to enable mmap to return null as a valid address - but not without mucking around in your system config, and I really doubt we care.
3) Adds a MapMemoryAt function which takes an address parameter. This function explicitly fails if the address returned wasn't the address you requested. If we ever come up with a scenario where getting an address in the right sort of area is desirable (and just passing null won't do), I think we should just make a new wrapper function for it - MapMemoryAtRoughly or something. Having to check the returned address and unmap if different was just a pain in the ass for my purposes (in an already hairy bit of code).
4) Adds equivalent MapMemory and MapMemoryAt functions for the Windows path. These don't really do much, but I thought it was a bit nicer to have similar interfaces.
5) Converts all callers over to the new functions.
Attachment #8424375 - Flags: review?(terrence)
This adds the new path for both Windows and Unix.

The Unix path deals with not knowing which direction to choose by making both paths equivalent, then using a heuristic to choose between them (it still tries the other one on failure). Trying one direction and failing creates threading issues, and neither path is fool proof in the first place - another allocator could free up some chunk near the beginning of the address space while we're busy and mess up our heuristic. I've done my best to make the new path as fail safe as possible, but the interface still isn't great.

I suggest reading the Windows path first to see how the new path works (I'm not sure I explained it very well) before diving into the Unix shenanigans. The patch builds on both Windows and Linux, and I ran Octane on both without issue, but it definitely needs a run on try (along with at least a Linux-specific stress test of the last ditch pass).
Attachment #8424291 - Attachment is obsolete: true
Attachment #8424291 - Flags: feedback?(terrence)
Attachment #8424376 - Flags: review?(terrence)
Try looks great: https://tbpl.mozilla.org/?tree=Try&rev=e16e0eb7fb26
And so does the stress test of the last ditch pass: https://tbpl.mozilla.org/?tree=Try&rev=62e810b61608

I'm feeling a lot more confident now that I got it right :)
I worked on the interface a bit more today and came up with this. This is a better match for C-style jemalloc, and I think it's less of a footgun besides. Sorry about the churn, but at least it's the weekend ;)
Attachment #8424376 - Attachment is obsolete: true
Attachment #8424376 - Flags: review?(terrence)
Attachment #8424484 - Flags: review?(terrence)
Comment on attachment 8424375 [details] [diff] [review]
Part 2: Some interface changes to make mapping memory less error prone.

Review of attachment 8424375 [details] [diff] [review]:
-----------------------------------------------------------------

Great! r=me

::: js/src/gc/Memory.cpp
@@ +245,5 @@
> +     * used (which isn't usually what you want, as this overrides existing
> +     * mappings), so check that the address we got is the address we wanted.
> +     */
> +    if (region != desired) {
> +        JS_ALWAYS_TRUE(0 == munmap(region, length));

Please read bug 1008613, in particular comment 8 and so that here.

@@ +277,5 @@
>       * If the allocated memory doesn't have its upper 17 bits clear, consider it
>       * as out of memory.
>       */
>      if ((uintptr_t(region) + (length - 1)) & 0xffff800000000000) {
>          JS_ALWAYS_TRUE(0 == munmap(region, length));

Ditto here.
Attachment #8424375 - Flags: review?(terrence) → review+
Thanks for the review! I decided to split this off into its own patch. I changed the assert to |MOZ_ASSERT_IF(munmap(...), errno == ENOMEM)|, and while I was there I changed all the asserts over to their MOZ_ equivalents (as it's my understanding that these are being changed over incrementally, with the JS_ ones just being an alias at this point).
Attachment #8425130 - Flags: review?(terrence)
Comment on attachment 8425130 [details] [diff] [review]
Part 2b: Allow munmap to fail if errno == ENOMEM and switch over to MOZ_ASSERT variants.

Review of attachment 8425130 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/gc/Memory.cpp
@@ +179,5 @@
>  
>  void
>  SystemPageAllocator::unmapPages(void *p, size_t size)
>  {
> +    MOZ_ASSERT_IF(munmap((caddr_t)p, size), errno == ENOMEM);

Let's leave the solaris code alone: it likely has different behavior from linux.

@@ +247,5 @@
>       * used (which isn't usually what you want, as this overrides existing
>       * mappings), so check that the address we got is the address we wanted.
>       */
>      if (region != desired) {
> +        MOZ_ASSERT_IF(munmap(region, length), errno == ENOMEM);

I don't think this does what you want it to: MOZ_ASSERT and MOZ_ASSERT_IF bodies are completely removed in non-DEBUG builds, so opt builds would never release memory. Note, compilers will remove branches to empty bodies, so the straightforward code below will suffice.

if (munmap(p, size)) {
    MOZ_ASSERT(errno == ENOMEM);
}

@@ +279,5 @@
>       * If the allocated memory doesn't have its upper 17 bits clear, consider it
>       * as out of memory.
>       */
>      if ((uintptr_t(region) + (length - 1)) & 0xffff800000000000) {
> +        MOZ_ASSERT_IF(munmap(region, length), errno == ENOMEM);

Ditto.

@@ +340,5 @@
>  
>  void
>  SystemPageAllocator::unmapPages(void *p, size_t size)
>  {
> +    MOZ_ASSERT_IF(munmap(p, size), errno == ENOMEM);

Ditto.

@@ +429,5 @@
>      size_t total_size; // Total allocated size
>  
>      pa_start = (void *)(uintptr_t(p) & ~(page_size - 1));
>      total_size = ((uintptr_t(p) + length) & ~(page_size - 1)) + page_size - uintptr_t(pa_start);
> +    MOZ_ASSERT_IF(munmap(pa_start, total_size), errno == ENOMEM);

Ditto.
Attachment #8425130 - Flags: review?(terrence) → review+
Thanks! Carrying forward r=terrence.

(In reply to Terrence Cole [:terrence] from comment #27)
> > +    MOZ_ASSERT_IF(munmap((caddr_t)p, size), errno == ENOMEM);
> 
> Let's leave the solaris code alone: it likely has different behavior from
> linux.

Okay, reverted (also removed the include).

> > +        MOZ_ASSERT_IF(munmap(region, length), errno == ENOMEM);
> 
> I don't think this does what you want it to: [...]

Whoops, that's a silly mistake. Fixed.
Attachment #8425130 - Attachment is obsolete: true
Attachment #8425636 - Flags: review+
Comment on attachment 8424484 [details] [diff] [review]
Part 3 v2: Add a 'last ditch' pass to the GC's allocator to find alignable chunks when all else fails.

Review of attachment 8424484 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, I think I follow all of that and I think I actually understand most of it. With this much complex code though, we need tests. Please add a jsapi-test that interleave calls to these functions with manual mmap/VirtualAlloc calls such that each of the paths here will get exercised on TBPL.

::: js/src/gc/Memory.cpp
@@ +55,5 @@
>      /* Special case: If we want allocation alignment, no further work is needed. */
>      if (alignment == allocGranularity)
>          return p;
>  
> +    if (uintptr_t(p) % alignment) {

Please add a static inline IsAligned(void *p, uintptr_t alignment) to the top of this file and use it everywhere we are currently open-coding the modulus.

Secondly, we can trivially restructure this to remove a level of indentation. The is-aligned case is the fast path, so make this branch into |if (IsAligned(p, alignment)) return p;| and continue with the fallback after the early return.
Attachment #8424484 - Flags: review?(terrence)
(In reply to Terrence Cole [:terrence] from comment #29)
> Okay, I think I follow all of that and I think I actually understand most of
> it.

Great! :) I'll start writing some tests.

> Please add a static inline IsAligned(void *p, uintptr_t alignment) to the
> top of this file and use it everywhere we are currently open-coding the
> modulus.

I went with OffsetFromAligned, since we need the actual offset in a number of places. IsUnaligned would also work (IsAligned would have to invert the condition, so it couldn't return an integer), but looked a bit strange to me.

> Secondly, we can trivially restructure this to remove a level of
> indentation. The is-aligned case is the fast path, so make this branch into
> |if (IsAligned(p, alignment)) return p;| and continue with the fallback
> after the early return.

Good point, I've applied this locally.

I'll upload a new patch after writing some tests.
Updated to fix your comments. My tests (coming up) also uncovered a couple of issues:

1) The direction wasn't being flipped correctly on the first attempt - it was flipping the direction, then trying the opposite of the -new- direction, effectively trying the same direction twice.

I also made it not return an aligned chunk when the direction was wrong if the direction just flipped. I saw this showing up fairly frequently in cppunit on Try after adding some logging on try, and each time, trying the other direction (instead of just returning the aligned chunk we found) succeeded. This should be an edge case in practice, only affecting the first few allocations, so I think it's probably fine either way.

2) The slow path on Linux was overallocating by too much, and then aligning down instead of up - I changed this to be more like the Windows code and take the growth direction into account. Thinking about it now, that latter part might not be necessary - there should only ever be one aligned region in there now. But maybe being explicit about taking the direction into account isn't a bad thing - I don't think it makes the code much worse. Your call.
Attachment #8424484 - Attachment is obsolete: true
Attachment #8427970 - Flags: review?(terrence)
This adds a new JSAPI test, testGCAllocator.cpp, that tests all the behaviors that the allocation logic was designed for. It sets up a region of memory in specific ways, then checks that the GC allocator returns the correct address in each case.

It doesn't really test for OOMy edge cases - that would require either taking up all the address space first, hooking mmap/munmap to only return addresses from a specific region, or making MapMemory/MapMemoryAt into member functions and creating a derived class to do the same thing - all of which seemed like a pain.

Since it does test all the main behaviors (and encovered some problems) I hope this is good enough :)
Attachment #8427978 - Flags: review?(terrence)
Oh, and the new JSAPI test looks good on try: https://tbpl.mozilla.org/?tree=Try&rev=7be0d991f528
Comment on attachment 8427978 [details] [diff] [review]
Part 4: Test the new allocation logic.

Review of attachment 8427978 [details] [diff] [review]:
-----------------------------------------------------------------

Brilliant work! I love the use of a string to define the expected behavior: gives a great visual overview of how the allocator works in each case.  r=me.

::: js/src/jsapi-tests/testGCAllocator.cpp
@@ +47,5 @@
> +static const size_t Alignment = 2 * Chunk;
> +static const int MaxTempChunks = 4096;
> +static const size_t StagingSize = 16 * Chunk;
> +
> +bool addressesGrowUp() {

{ on new line. And on the other functions in this file.

@@ +66,5 @@
> +};
> +
> +bool testGCAllocatorUp(const size_t PageSize) {
> +    const size_t UnalignedSize = StagingSize + Alignment - PageSize;
> +    void * chunkPool[MaxTempChunks];

No space between * and chunkPool.

@@ +194,5 @@
> +    // large enough to hold strlen(str) chunks of Chunk bytes.
> +    int len = strlen(str);
> +    int i;
> +    // Find the index of the desired address.
> +    for (i = 0; i < len && str[i] != 'o'; i++);

I <3 semicolon terminated loops :-). Please use ++i though for consistency.
Attachment #8427978 - Flags: review?(terrence) → review+
Comment on attachment 8427970 [details] [diff] [review]
Part 3 v3: Add a 'last ditch' pass to the GC's allocator to find alignable chunks when all else fails.

Review of attachment 8427970 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/gc/Memory.cpp
@@ +24,5 @@
> + * the region starting at p (as we assert that allocation size is an integer
> + * multiple of the alignment).
> + */
> +static inline size_t
> +OffsetFromAligned(void *p, size_t alignment)

Nice. I was thinking open-code the places where you need the actual alignment, but reading the code I like this approach much better than what I had in mind.

@@ +67,5 @@
>      /* Special case: If we want allocation alignment, no further work is needed. */
>      if (alignment == allocGranularity)
>          return p;
>  
> +    if (OffsetFromAligned(p, alignment) == 0)

I also like how "not offset from alignment" reads (e.g. !OffsetFromAlignment), but I'm fine with it this way too.
Attachment #8427970 - Flags: review?(terrence) → review+
Thanks! Carrying forward r=terrence.

(In reply to Terrence Cole [:terrence] from comment #34)
> I love the use of a string to define the expected behavior:
> gives a great visual overview of how the allocator works in each case.

I was very happy with it as well - you should have seen what I had for each case originally, it wasn't pretty (I had the memory layout in a comment above each test instead).

> > +bool addressesGrowUp() {
> 
> { on new line. And on the other functions in this file.

Done.

> > +    void * chunkPool[MaxTempChunks];
> 
> No space between * and chunkPool.

Whoops, done.

> > +    for (i = 0; i < len && str[i] != 'o'; i++);
> 
> I <3 semicolon terminated loops :-). Please use ++i though for consistency.

Argh, muscle memory.
Attachment #8427978 - Attachment is obsolete: true
Attachment #8428087 - Flags: review+
Thanks for the reviews!

(In reply to Terrence Cole [:terrence] from comment #35)
> > +    if (OffsetFromAligned(p, alignment) == 0)
> 
> I also like how "not offset from alignment" reads (e.g.
> !OffsetFromAlignment), but I'm fine with it this way too.

I often feel like |if (!fooBar ...| makes it too easy to miss the negation because of the similarity between ( and !. Maybe that's just being paranoid though.
> I <3 semicolon terminated loops :-)

Really? I detest them... they're easy to mis-read, and there's a clearer alternative -- for(...) {} -- that's almost as short.
(In reply to Nicholas Nethercote [:njn] from comment #38)
> Really? I detest them... they're easy to mis-read, and there's a clearer
> alternative -- for(...) {} -- that's almost as short.

I'm fine with s/;/ {}/ personally. I just preferred it to 
for (i = 0; i < len; ++i) {
    if (str[i] == 'o')
        break;
}
... or what have you. I feel like the indentation makes it fairly clear regardless though :)
David, if this (and bug 1005844 if it gets the go-ahead) lands and sticks, is it something you'd want on Aurora/ESR 31? I don't think the code in question changes much, so in that sense these changes are fairly self-contained, but of course they affect virtually all allocation in the browser (though if they were broken in the sense of returning an unaligned address, or null where there wasn't before, I'd expect things to have blown up on try).
Flags: needinfo?(dmajor)
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #40)
> David, if this (and bug 1005844 if it gets the go-ahead) lands and sticks,
> is it something you'd want on Aurora/ESR 31? I don't think the code in
> question changes much, so in that sense these changes are fairly
> self-contained, but of course they affect virtually all allocation in the
> browser (though if they were broken in the sense of returning an unaligned
> address, or null where there wasn't before, I'd expect things to have blown
> up on try).

The two changes have the potential to be huge wins, and I really want to see them go in, but it's a lot of new code with tricky manipulations in a critical module. The benefit would have to justify the risk. If the improvements are really as big as I expect, then I would support a fast-track, but we need to get some numbers from the wild to back it up. (And release management would have the final say of course)
Flags: needinfo?(dmajor)
That makes sense. The only number that I expect will benefit immediately is vsize-max-contiguous - and not at startup, but over time during a long browsing session. The changes here won't cause us to use less memory, real or virtual - but they should help us stave off fragmentation, and survive a lot longer once memory does become fragmented. That will hopefully affect the number of OOM crashes we get, at least the ones that aren't the result of a big burst of allocation - but I imagine it'll take some time before we have numbers on that.
This is a very nice bug to fix, but backporting this to Aurora feels over-aggressive to me :/
I realize it's rather late to be saying this, but: is there a reason we don't allocate JS GC memory on the heap with memalign()? That would allow us to avoid duplicating all this logic from jemalloc...
Terrence and I were talking about the code duplication on IRC a few days ago - I wasn't aware that there was such a method. I copied all the logic in this bug over to bug 1005844, so if that gets the green light and we can access memalign() from the JSAPI..
It would have the added benefit of reducing "heap-unclassified", because the JS memory reporters cover the JS heap perfectly :)
(In reply to Nicholas Nethercote [:njn] from comment #38)
> > I <3 semicolon terminated loops :-)
> 
> Really? I detest them... they're easy to mis-read, and there's a clearer
> alternative -- for(...) {} -- that's almost as short.

I figured you would be the first to object. ;-)

We should add this case to the style guide.

(In reply to Nicholas Nethercote [:njn] from comment #43)
> This is a very nice bug to fix, but backporting this to Aurora feels
> over-aggressive to me :/

I strongly agree.

(In reply to Nicholas Nethercote [:njn] from comment #45)
> I realize it's rather late to be saying this, but: is there a reason we
> don't allocate JS GC memory on the heap with memalign()? That would allow us
> to avoid duplicating all this logic from jemalloc...

My understanding was that memalign was not available on all platforms and we wanted to support non-jemalloc builds. Although, now that I'm thinking about it, since memalign is going to be available on all of our tier 1 platforms, we could just use it and have a dead-simple chunk allocator to fallback to for tier-n>1 platforms.
\o/ Note that the patch I just attached to bug 1005844 goes back to the old, simpler logic for the non-Windows path, so if we switch over to using memalign, one of the checks in the JSAPI test added in this bug will need to be ifdeffed out for Linux.

And no worries about the uplift, I just thought I'd ask now while 31 is still on Aurora :)
Blocks: 1008613
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: