Closed Bug 1068684 Opened 5 years ago Closed 5 years ago

SharedArrayBuffer length should not bias so heavily toward asm.js

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking


RESOLVED FIXED
mozilla36

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

There are two aspects of SharedArrayBuffer that exist only so that SharedArrayBuffer can be used with asm.js:

- the length is restricted to be a valid asm.js length, i.e.,
  at least 4K AND (power of 2 OR multiple of 16M); a rough check of
  this rule is sketched after this list

- on 64-bit systems a full 4GB is reserved so that bounds checking
  can be done using signal handlers
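
For reference, a rough standalone sketch of the length rule in the first bullet (an illustration only, not the engine's actual IsValidAsmJSHeapLength check; the name is made up):

    #include <stdint.h>

    static bool IsAsmJSFriendlyLength(uint64_t len) {
        const uint64_t fourKB = 4 * 1024;
        const uint64_t sixteenMB = 16 * 1024 * 1024;
        if (len < fourKB)
            return false;                            // must be at least 4K
        bool powerOfTwo = (len & (len - 1)) == 0;    // exactly one bit set
        bool multipleOf16M = (len % sixteenMB) == 0;
        return powerOfTwo || multipleOf16M;
    }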

Those are managed at construction time, since a SharedArrayBuffer cannot easily be changed after creation (unlike an ArrayBuffer, its contents cannot be copied elsewhere).

SharedArrayBuffer can be useful without the asm.js bias, and it would probably be better to ask for the asm.js behavior when it is needed rather than making it the only option available.

As to how one asks for that behavior, the obvious choices are (a) an extra argument to the SharedArrayBuffer constructor or (b) a separate constructor.

The alternative to asking for the behavior is to somehow make it possible to change the buffer after creation, e.g. by going via some file mapping; I'm not sure how that would work out in practice on all platforms, or whether it is even reasonably possible.
For the length restriction: you can definitely remove those: asm.js will just fail link-time validation if given a bad length (as with any other oddly-sized ArrayBuffer).

As for the 4gb mapping: is there really any measurable problem with this?  We've got *tons* of address space on 64-bit and the cost of VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I measured.
(In reply to Luke Wagner [:luke] from comment #1)
> For the length restriction: you can definitely remove those: asm.js will
> just fail link-time validation if given a bad length (as with any other
> oddly-sized ArrayBuffer).

Yes.

> As for the 4gb mapping: is there really any measurable problem with this? 
> We've got *tons* of address space on 64-bit and the cost of
> VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I
> measured.

I don't know, I haven't measured yet, and I'm generally a little paranoid (though I'm willing to believe that it won't be a problem in practice).  What did you measure, exactly?

We've discussed this a little bit over email or IRC, if I remember correctly, and some other options were mentioned there.  In particular, if the length is not acceptable to asm.js then there's no need for the extra page preceding the array buffer space, nor is there any need for the 4GB reservation.  But exposing that behavior in the specification seems unreasonable.
Here is what I found (text copied from an old email):

So first, for clean pages (no writes after mmap), Linux is basically O(1) for mmap/mprotect/munmap (usually < 6us).  For dirty pages, mprotect and munmap become O(n), with munmap being much slower than mprotect.  For reference, for 1GB, mprotect takes 4ms and munmap takes 52ms.
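
(For concreteness, here is a minimal sketch, assuming POSIX/Linux, of the reserve-then-unprotect pattern these timings refer to; the helper name is hypothetical, not engine code:)

    #include <sys/mman.h>
    #include <stddef.h>

    // Reserve a large range with PROT_NONE (any access faults, so bounds
    // checks can be handled in a signal handler), then make only the prefix
    // that is logically part of the buffer readable/writable.
    static void *ReserveHeap(size_t reservedBytes, size_t accessibleBytes) {
        void *base = mmap(nullptr, reservedBytes, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base == MAP_FAILED)
            return nullptr;
        if (accessibleBytes &&
            mprotect(base, accessibleBytes, PROT_READ | PROT_WRITE) != 0) {
            munmap(base, reservedBytes);
            return nullptr;
        }
        return base;
    }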

For Windows, things are more complicated.  First of all, almost everything is O(n), but the scaling is different for clean vs. dirty and for reserved vs. committed.  VirtualAlloc(MEM_RESERVE) (the memory range is reserved, but not touchable) is the cheapest, taking 131us for 35GB.  VirtualAlloc(MEM_COMMIT) is more expensive, taking 250us for 1GB.  VirtualProtect takes about 1.3ms for 1GB.  VirtualUnmap'ing a 35GB region containing 1GB of clean mapped pages takes 1.6ms.  That is all clean memory.  For dirty memory, the VirtualProtect takes 33ms, and the VirtualUnmap takes 88ms.

-----

Here I was focusing on 35gb reserved / 1gb committed; scaling that down to 4gb reserved / 1mb committed should make Windows a bunch cheaper.  Given we don't expect a high volume of SABs (like, no one should be using SharedInt32Array(3) as their vec3 type), this seems to all be within a pretty reasonable range.
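
(The corresponding sketch, assuming the Windows APIs quoted above; again the helper is illustrative rather than actual engine code:)

    #include <windows.h>

    // Reserve the full range cheaply (MEM_RESERVE claims address space but
    // no physical pages), then commit only the prefix that will be used;
    // the commit/protect steps are the O(n) parts in the numbers above.
    static void *ReserveHeapWin(SIZE_T reservedBytes, SIZE_T committedBytes) {
        void *base = VirtualAlloc(nullptr, reservedBytes, MEM_RESERVE, PAGE_NOACCESS);
        if (!base)
            return nullptr;
        if (committedBytes &&
            !VirtualAlloc(base, committedBytes, MEM_COMMIT, PAGE_READWRITE)) {
            VirtualFree(base, 0, MEM_RELEASE);
            return nullptr;
        }
        return base;
    }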
> As for the 4gb mapping: is there really any measurable problem with this? 
> We've got *tons* of address space on 64-bit and the cost of
> VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I
> measured.

Check out bug 1008613 comment 8 for one problem -- oddly enough, munmap() started failing with ENOMEM in a fuzz test case where vsize reached 127 GiB.
Were there any other extenuating circumstances producing an OOM at such a relatively low vsize for 64-bit?  I mean I can allocate 16Tb right now without a problem.
> Were there any other extenuating circumstances producing an OOM at such a
> relatively low vsize for 64-bit?

Not that I know of.

> I mean I can allocate 16Tb right now without a problem.

The failure was in munmap(). Can you deallocate that memory without problem too? It's common to not check the return value of munmap().

I don't have strong opinions here. I just thought it worth mentioning this weird case.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> > I mean I can allocate 16Tb right now without a problem.
> 
> The failure was in munmap(). Can you deallocate that memory without problem
> too? It's common to not check the return value of munmap().

Yep, instantly also.  I think, under normal circumstances, these 4gb reservations are just allocating one tiny datum to describe "this range is claimed" and the size of the range doesn't matter until you start committing pages.  That's why I was guessing that the fuzzcase was, in addition to doing all the SAB allocations, doing a bunch with the GC heap and thereby allocating a bunch of virtual-address-range descriptors.
Bug 1073934 is probably an instance of where we run into the map/unmap exhaustion(?) problem in practice.  The failures are all on Linux, so far.
Bug 1073934 goes into some detail and explains why we're seeing these problems when there are many live SharedArrayBuffer objects that are all mapped to 4GB (short version: Linux-64 has a 128TB address space limitation, and we're up against that).  There is a little turbulence in the page mapping table when the kernel is rearranging things, and thus various page mapping calls can knock the process temporarily over the limit; there is no reason to believe that it is munmap per se that is the problem.
Perhaps we can just track the number of mapped 4gb regions (SAB and asm.js AB) and throw if there is ever more than 128tb worth (~32,000).  I can't imagine someone really needing that many SABs in practice and this seems easier than the alternative (which I think means stopping all SAB-sharing threads so that we can atomically swap in the 4gb region).

To wit, I was thinking of doing something analogous on 32-bit: keeping track of total AB space usage and OOMing if an allocation would leave us in a situation where there is less than, say, 100mb address space remaining (b/c once we get close, the browser gets really crashy).
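
(A hypothetical sketch of that 32-bit accounting, just to make the idea concrete; the numbers and names are placeholders, not a real patch:)

    #include <atomic>
    #include <stdint.h>

    static std::atomic<uint64_t> totalReservedBytes{0};
    static const uint64_t usableAddressSpace = 2ull * 1024 * 1024 * 1024;  // assume ~2gb usable on 32-bit
    static const uint64_t headroom = 100 * 1024 * 1024;                    // keep ~100mb free

    // Returns false (so the caller can GC and retry, or throw) if the
    // reservation would leave less than `headroom` of address space.
    static bool NoteReservation(uint64_t bytes) {
        uint64_t newTotal = totalReservedBytes.fetch_add(bytes) + bytes;
        if (newTotal + headroom > usableAddressSpace) {
            totalReservedBytes.fetch_sub(bytes);
            return false;
        }
        return true;
    }

    static void NoteRelease(uint64_t bytes) {
        totalReservedBytes.fetch_sub(bytes);
    }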
(In reply to Luke Wagner [:luke] from comment #10)
> Perhaps we can just track the number of mapped 4gb regions (SAB and asm.js
> AB) and throw if there is ever more than 128tb worth (~32,000).  I can't
> imagine someone really needing that many SABs in practice and this seems
> easier than the alternative (which I think means stopping all
> SAB-sharing threads so that we can atomically swap in the 4gb region).

It's not quite that simple because we're also running out of virtual memory map entries for the process; see new comments on bug 1073934.

Also, I don't know what the situation is on Mac OS X and Windows; surely there are limits there too.

Mac OS X: https://developer.apple.com/library/mac/documentation/performance/conceptual/managingmemory/articles/aboutmemory.html
Says that the per-process VM limit on 64-bit systems is 18 exabytes, which is several orders of magnitude more than what it is on Linux - probably good enough.  I've found no mention of other limits.

Windows: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.85%29.aspx
In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per process.
(In reply to Lars T Hansen [:lth] from comment #11)
> It's not quite that simple because we're also running out of virtual memory
> map entries for the process; see new comments on bug 1073934.

I see roughly the same numeric limit, though, in bug 1073934 comment 13: ~32,000.

> Windows:
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.
> 85%29.aspx
> In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per
> process.

Whoa, but check out Windows Vista Home Basic/Premium: 8/16gb resp.  wtf!  Good thing (judging by http://stats.unity3d.com/web/os.html) no one uses Vista...

So there seem to be two limits: virtual address space (which shouldn't cause munmap/mprotect to fail) and address-space descriptors (which seem limited to ~32,000, at least on Linux).  Even a rather conservative limit of 1000 seems like it'd both be enough and keep us far from danger.

Btw, for the problem of trying to get the GC to trigger, we have the OutOfMemoryCallback (see jsapi.h) which does a synchronous mega-gc and is what lets us not, e.g., OOM when you visit a big asm.js app on 32-bit and hit reload: when we fail the allocation, we sync-gc, then try again.  The same could be applied if one of our conservative limits fails.

One last note: it seems like non-asm.js could also benefit from asm.js-style bounds-check elimination when it has singleton information about the SAB (which it should).
Blocks: 1073934
(In reply to Luke Wagner [:luke] from comment #12)
> (In reply to Lars T Hansen [:lth] from comment #11)
> 
> > Windows:
> > http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.
> > 85%29.aspx
> > In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per
> > process.
> 
> Whoa, but check out Windows Vista Home Basic/Premium: 8/16gb resp.  wtf! 

Those look like limits on physical memory, not address space.  The table is confusing, but the way I read it, even Vista users get 8TB of virtual address space per process.
(In reply to Luke Wagner [:luke] from comment #12)
> 
> So there seem to be two limits: virtual address space (which shouldn't
> cause munmap/mprotect to fail) and address-space descriptors (which seem
> limited to ~32,000, at least on Linux).  Even a rather conservative limit of
> 1000 seems like it'd both be enough and keep us far from danger.

Yes.  That gives us 4TB in the worst case.

> Btw, for the problem of trying to get the GC to trigger, we have the
> OutOfMemoryCallback (see jsapi.h) which does a synchronous mega-gc and is
> what lets us not, e.g., OOM when you visit a big asm.js app on 32-bit and
> hit reload: when we fail the allocation, we sync-gc, then try again.  The
> same could be applied if one of our conservative limits fails.

I agree.

I'll cook something up along those lines.
Jon, could you take a look at the GC bits to see if I've bungled something?

All, with this patch Gary's test case from bug 1073934 passes as it should, and debugger runs show the special case GC kicking in a number of times.

(Also see my latest note on bug 1073934 where I argue that we may need to do this on 32-bit Linux as well, in order to limit the number of live mappings.)
Attachment #8500399 - Flags: feedback?(jcoppeard)
Comment on attachment 8500399 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit

Review of attachment 8500399 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Lars,

In general this looks good, but I do have some comments:

1) I don't think you need to wait for background sweeping to finish since shared array buffers are finalized on the main thread (the shared array buffer object class lacks the JSCLASS_BACKGROUND_FINALIZE flag).

2) As I read it you actually want to GC all zones, not just the zone of the current compartment, so you want to call PrepareForFullGC() rather than PrepareZoneForGC().

3) Let's not add another entry point in GCRuntime for running a GC.  Can you move the body of collectAndSweep() into JSContext::gcForReason()?  Also this should probably be called fullGC() or something like that to discriminate it from JS::GCForReason() which requires you to select the zones first.
Attachment #8500399 - Flags: feedback?(jcoppeard) → feedback+
(In reply to Jon Coppeard (:jonco) from comment #16)
> Comment on attachment 8500399 [details] [diff] [review]
> Trigger a synchronous GC when the count of SAB exceeds a set limit
> 
> Review of attachment 8500399 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks for quick feedback.

> 1) I don't think you need to wait for background sweeping to finish since
> shared array buffers are finalized on the main thread (the shared array
> buffer object class lacks the JSCLASS_BACKGROUND_FINALIZE flag).

Maybe.  If I don't wait, I risk that a fast-allocating test case exhausts resources before sweeping manages to free up space, and I really want to guarantee that that does not happen.  The only way to guarantee it is to force the sweeping to finish, I think.

(This really is an extraordinary circumstance; we do not expect to see programs that have a thousand live SABs in the wild, but we can't afford to crash like we do now.)

> 2) As I read it you actually want to GC all zones, not just the zone of the
> current compartment, so you want to call PrepareForFullGC() rather than
> PrepareZoneForGC().

I'll look into that.  For this patch I was just trying to GC the one zone.

> 3) Let's not add another entry point in GCRuntime for running a GC.  Can you
> move the body of collectAndSweep() into JSContext::gcForReason()?

I don't think so, but I'll look again; perhaps I messed something up previously.  The GCs basically did not happen because there were guards on whether the zone had been prepped for GC (it had not).  At least the pure gc() callback failed for that reason.

> Also this
> should probably be called fullGC() or something like that to discriminate it
> from JS::GCForReason() which requires you to select the zones first.

Well, again, I was just trying to do the one zone here, but once we've settled the other details I'm happy to discuss naming.
Have you considered using the largeAllocationFailureCallback to do the synchronous collection?  It calls into Gecko where we fire the low-memory notification which:
 - flushes a bunch of caches including the back-forward cache (so unroots window trees that aren't visible but reachable via Back/Forward)
 - does a synchronous GC/CC/GC trio (which is necessary if you want to clear up garbage rooted by dead windows)

I added the largeAllocationFailureCallback to mitigate 32-bit OOMs during reload/navigation.  It is currently called via pod_(calloc|realloc)CanGC in the onOutOfMemoryCanGC path.
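
(In outline the pattern is the one sketched below; this is a self-contained illustration of "fail, clean up synchronously, retry once", not SpiderMonkey's actual signatures:)

    #include <cstdlib>
    #include <functional>

    // Hook registered by the embedding; in Gecko it fires the low-memory
    // notification (cache flushes plus a synchronous GC/CC/GC).
    static std::function<void()> largeAllocationFailureHook;

    static void *AllocateWithRetry(std::size_t bytes) {
        void *p = std::malloc(bytes);
        if (!p && largeAllocationFailureHook) {
            largeAllocationFailureHook();   // synchronous cleanup
            p = std::malloc(bytes);         // retry once afterwards
        }
        return p;
    }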
(In reply to Luke Wagner [:luke] from comment #18)
> Have you considered using the largeAllocationFailureCallback to do the
> synchronous collection?

Not under that name, see below.

> It calls into Gecko where we fire the low-memory
> notification which:
>  - flushes a bunch of caches including the back-forward cache (so unroots
> window trees that aren't visible but reachable via Back/Forward)
>  - does a synchronous GC/CC/GC trio (which is necessary if you want to clear
> up garbage rooted by dead windows)
> 
> I added the largeAllocationFailureCallback to mitigate 32-bit OOMs during
> reload/navigation.  It is currently called via pod_(calloc|realloc)CanGC in
> the onOutOfMemoryCanGC path.

That's a better hint than you provided earlier, so maybe I'll be luckier finding it this time.

As I commented to Jon above, I am skeptical that lazy sweeping will necessarily keep up, so anything that forces everything to a grinding halt is fine with me.  This path looks promising.
Much simplified thanks to suggestions from you both.  The only new complication here is that the shell needs to set up the callback, or Gary's test case in bug 1073934 will be no happier.
Attachment #8500399 - Attachment is obsolete: true
Attachment #8511061 - Flags: review?(luke)
Attachment #8511061 - Flags: review?(jcoppeard)
Comment on attachment 8511061 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit, v2

Review of attachment 8511061 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8511061 - Flags: review?(jcoppeard) → review+
Comment on attachment 8511061 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit, v2

Review of attachment 8511061 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SharedArrayObject.cpp
@@ +96,5 @@
>      // Enforced by SharedArrayBufferObject::New(cx, length).
>      MOZ_ASSERT(IsValidAsmJSHeapLength(length));
>  
>  #ifdef JS_CODEGEN_X64
> +    if (++numLive == maxLive) {

I think you want >= here.  Otherwise, it seems that, in the time before numLive is decremented below, another SAB could be allocated (that would succeed because ++numLive would be > maxLive) and then all subsequent SAB allocations would succeed.
Attachment #8511061 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #22)
> Comment on attachment 8511061 [details] [diff] [review]
> Trigger a synchronous GC when the count of SAB exceeds a set limit, v2
> 
> Review of attachment 8511061 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/SharedArrayObject.cpp
> @@ +96,5 @@
> >      // Enforced by SharedArrayBufferObject::New(cx, length).
> >      MOZ_ASSERT(IsValidAsmJSHeapLength(length));
> >  
> >  #ifdef JS_CODEGEN_X64
> > +    if (++numLive == maxLive) {
> 
> I think you want >= here.  Otherwise, it seems that, in the time before
> numLive is decremented below, another SAB could be allocated (that would
> succeed because ++numLive would be > maxLive) and then all subsequent SAB
> allocations would succeed.

Good point.
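
(Roughly, the agreed-on shape with the comparison fixed as suggested; a sketch with illustrative names, not the patch itself:)

    #include <atomic>
    #include <stdint.h>

    static std::atomic<uint32_t> numLive{0};
    static const uint32_t maxLive = 1000;   // conservative cap from the discussion above

    static void NoteLiveSAB() {
        // >= rather than ==: if a racing allocation pushes the count past
        // the threshold before a finalizer decrements it, the slow path
        // still fires for every subsequent allocation.
        if (++numLive >= maxLive) {
            // fire the large-allocation-failure / synchronous-GC machinery here
        }
    }

    static void NoteDeadSAB() {
        --numLive;
    }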
Assignee: nobody → lhansen
Won't this allow asm.js to access past the apparent end of the SAB (before it hits the end of the rounded-up page)?  This could be fixed by offsetting the base of the SAB so that the end lines up with a page boundary.
(In reply to Luke Wagner [:luke] from comment #26)
> Won't this allow asm.js to access past the apparent end of the SAB (before
> it hits the end of the rounded-up page)?

I don't understand how that could be.  Observe that the buffer will always start on a page boundary; this patch does not change that.  All this patch does is remove the assumption that the amount of memory requested is a multiple of the page size, and round the request up to an integral number of pages.

That roundup is not reflected in the number of bytes logically available, nor should it be.

asm.js will still perform a test that the buffer it is handed has a logical size that is an integral number of pages (plus other things).
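
(That is, something like the sketch below: the mapping grows to a whole number of pages while the logical byteLength stays at the requested value. Names are illustrative, not the patch's:)

    #include <stddef.h>

    struct BufferSizes {
        size_t logicalLength;   // what byteLength reports
        size_t mappedLength;    // what actually gets mapped, page-aligned
    };

    static BufferSizes ComputeSizes(size_t requested, size_t pageSize) {
        BufferSizes s;
        s.logicalLength = requested;
        s.mappedLength = ((requested + pageSize - 1) / pageSize) * pageSize;
        return s;
    }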
Flags: needinfo?(luke)
(In reply to Lars T Hansen [:lth] from comment #28)
> That roundup is not reflected in the number of bytes logically available,
> nor should it be.

I'm thinking about x64, where page protection determines what is out-of-bounds.
Flags: needinfo?(luke)
Attachment #8511166 - Flags: review?(luke) → review+
Hi Lars, sorry, I had to back this out for bustage in ASAN builds like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3434568&repo=mozilla-inbound
Flags: needinfo?(lhansen)
(In reply to Carsten Book [:Tomcat] from comment #31)
> Hi Lars, sorry, I had to back this out for bustage in ASAN builds like
> https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3434568&repo=mozilla-inbound

That was careless on my part...
Flags: needinfo?(lhansen)
I'll file a separate bug to move the test cases over to jit-tests.  I don't exactly remember why these tests were not placed there in the first place.
Attachment #8514979 - Flags: review?(jdemooij)
Keywords: leave-open
Comment on attachment 8514979 [details] [diff] [review]
Remove a test case that tested for the restriction that is now gone

Review of attachment 8514979 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8514979 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/677c24618e33
https://hg.mozilla.org/mozilla-central/rev/3ce12a4c3afb
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Depends on: 1120063