Closed Bug 1068684 Opened 5 years ago Closed 5 years ago

SharedArrayBuffer length should not bias so heavily toward asm.js

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking


RESOLVED FIXED
mozilla36

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

There are two aspects of SharedArrayBuffer that exist only so that SharedArrayBuffer can be used with asm.js:

- the length is restricted to be a valid asm.js length, i.e.,
  at least 4K AND (power of 2 OR multiple of 16M); a rough check of
  this rule is sketched after this list

- on 64-bit systems a full 4GB is reserved so that bounds checking
  can be done using signal handlers
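
For reference, a rough standalone sketch of the length rule in the first bullet (an illustration only, not the engine's actual IsValidAsmJSHeapLength check; the name is made up):

    #include <stdint.h>

    static bool IsAsmJSFriendlyLength(uint64_t len) {
        const uint64_t fourKB = 4 * 1024;
        const uint64_t sixteenMB = 16 * 1024 * 1024;
        if (len < fourKB)
            return false;                            // must be at least 4K
        bool powerOfTwo = (len & (len - 1)) == 0;    // exactly one bit set
        bool multipleOf16M = (len % sixteenMB) == 0;
        return powerOfTwo || multipleOf16M;
    }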

Those are managed at construction time, since a SharedArrayBuffer cannot easily be changed after creation (unlike an ArrayBuffer, its contents cannot be copied elsewhere).

SharedArrayBuffer can be useful without the asm.js bias, and it would probably be better to ask for the asm.js behavior when it is needed rather than making it the only option available.

As to how one asks for that behavior, the obvious choices are (a) an extra argument to the SharedArrayBuffer constructor or (b) a separate constructor.

The alternative to asking for the behavior is to somehow make it possible to change the buffer after creation, e.g. by going via some file mapping; I'm not sure how that would work out in practice on all platforms, or whether it is even reasonably possible.
For the length restriction: you can definitely remove those: asm.js will just fail link-time validation if given a bad length (as with any other oddly-sized ArrayBuffer).

As for the 4gb mapping: is there really any measurable problem with this?  We've got *tons* of address space on 64-bit and the cost of VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I measured.
(In reply to Luke Wagner [:luke] from comment #1)
> For the length restriction: you can definitely remove those: asm.js will
> just fail link-time validation if given a bad length (as with any other
> oddly-sized ArrayBuffer).

Yes.

> As for the 4gb mapping: is there really any measurable problem with this? 
> We've got *tons* of address space on 64-bit and the cost of
> VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I
> measured.

I don't know, I haven't measured yet, and I'm generally a little paranoid (though I'm willing to believe that it won't be a problem in practice).  What did you measure, exactly?

We've discussed this a little bit over email or IRC, if I remember correctly, and some other options were mentioned there.  In particular, if the length is not acceptable to asm.js then there's no need for the extra page preceding the array buffer space, nor is there any need for the 4GB reservation.  But exposing that behavior in the specification seems unreasonable.
Here is what I found (text copied from an old email):

So first, for clean pages (no writes after mmap), Linux is basically O(1) for mmap/mprotect/munmap (usually < 6us).  For dirty pages, mprotect and munmap become O(n), with munmap being much slower than mprotect.  For reference, for 1GB, mprotect takes 4ms and munmap takes 52ms.
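
(For concreteness, here is a minimal sketch, assuming POSIX/Linux, of the reserve-then-unprotect pattern these timings refer to; the helper name is hypothetical, not engine code:)

    #include <sys/mman.h>
    #include <stddef.h>

    // Reserve a large range with PROT_NONE (any access faults, so bounds
    // checks can be handled in a signal handler), then make only the prefix
    // that is logically part of the buffer readable/writable.
    static void *ReserveHeap(size_t reservedBytes, size_t accessibleBytes) {
        void *base = mmap(nullptr, reservedBytes, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base == MAP_FAILED)
            return nullptr;
        if (accessibleBytes &&
            mprotect(base, accessibleBytes, PROT_READ | PROT_WRITE) != 0) {
            munmap(base, reservedBytes);
            return nullptr;
        }
        return base;
    }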

For Windows, things are more complicated.  First of all, almost everything is O(n), but the scaling is different for clean vs. dirty and for reserved vs. committed.  VirtualAlloc(MEM_RESERVE) (the memory range is reserved, but not touchable) is the cheapest, taking 131us for 35GB.  VirtualAlloc(MEM_COMMIT) is more expensive, taking 250us for 1GB.  VirtualProtect takes about 1.3ms for 1GB.  VirtualUnmap'ing a 35GB region containing 1GB of clean mapped pages takes 1.6ms.  That is all clean memory.  For dirty memory, the VirtualProtect takes 33ms, and the VirtualUnmap takes 88ms.

-----

Here I was focusing on 35gb reserved / 1gb committed; scaling that down to 4gb reserved / 1mb committed should make Windows a bunch cheaper.  Given we don't expect a high volume of SABs (like, no one should be using SharedInt32Array(3) as their vec3 type), this seems to all be within a pretty reasonable range.
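
(The corresponding sketch, assuming the Windows APIs quoted above; again the helper is illustrative rather than actual engine code:)

    #include <windows.h>

    // Reserve the full range cheaply (MEM_RESERVE claims address space but
    // no physical pages), then commit only the prefix that will be used;
    // the commit/protect steps are the O(n) parts in the numbers above.
    static void *ReserveHeapWin(SIZE_T reservedBytes, SIZE_T committedBytes) {
        void *base = VirtualAlloc(nullptr, reservedBytes, MEM_RESERVE, PAGE_NOACCESS);
        if (!base)
            return nullptr;
        if (committedBytes &&
            !VirtualAlloc(base, committedBytes, MEM_COMMIT, PAGE_READWRITE)) {
            VirtualFree(base, 0, MEM_RELEASE);
            return nullptr;
        }
        return base;
    }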
> As for the 4gb mapping: is there really any measurable problem with this? 
> We've got *tons* of address space on 64-bit and the cost of
> VirtualAlloc(4gb)+VirtualProtect(few mb) was very low on all platforms I
> measured.

Check out bug 1008613 comment 8 for one problem -- oddly enough, munmap() started failing with ENOMEM in a fuzz test case where vsize reached 127 GiB.
Were there any other extenuating circumstances producing an OOM at such a relatively low vsize for 64-bit?  I mean I can allocate 16Tb right now without a problem.
> Were there any other extenuating circumstances producing an OOM at such a
> relatively low vsize for 64-bit?

Not that I know of.

> I mean I can allocate 16Tb right now without a problem.

The failure was in munmap(). Can you deallocate that memory without problem too? It's common to not check the return value of munmap().

I don't have strong opinions here. I just thought it worth mentioning this weird case.
(In reply to Nicholas Nethercote [:njn] from comment #6)
> > I mean I can allocate 16Tb right now without a problem.
> 
> The failure was in munmap(). Can you deallocate that memory without problem
> too? It's common to not check the return value of munmap().

Yep, instantly also.  I think, under normal circumstances, these 4gb reservations are just allocating one tiny datum to describe "this range is claimed" and the size of the range doesn't matter until you start committing pages.  That's why I was guessing that the fuzzcase was, in addition to doing all the SAB allocations, doing a bunch with the GC heap and thereby allocating a bunch of virtual-address-range descriptors.
Bug 1073934 is probably an instance of where we run into the map/unmap exhaustion(?) problem in practice.  The failures are all on Linux, so far.
Bug 1073934 goes into some detail and explains why we're seeing these problems when there are many live SharedArrayBuffer objects that are all mapped to 4GB (short version: Linux-64 has a 128TB address space limitation, and we're up against that).  There is a little turbulence in the page mapping table when the kernel is rearranging things, and thus various page mapping calls can knock the process temporarily over the limit; there is no reason to believe that it is munmap per se that is the problem.
Perhaps we can just track the number of mapped 4gb regions (SAB and asm.js AB) and throw if there is ever more than 128tb worth (~32,000).  I can't imagine someone really needing that many SABs in practice and this seems easier than the alternative (which I think means stopping all SAB-sharing threads so that we can atomically swap in the 4gb region).

To wit, I was thinking of doing something analogous on 32-bit: keeping track of total AB space usage and OOMing if an allocation would leave us in a situation where there is less than, say, 100mb address space remaining (b/c once we get close, the browser gets really crashy).
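
(A hypothetical sketch of that 32-bit accounting, just to make the idea concrete; the numbers and names are placeholders, not a real patch:)

    #include <atomic>
    #include <stdint.h>

    static std::atomic<uint64_t> totalReservedBytes{0};
    static const uint64_t usableAddressSpace = 2ull * 1024 * 1024 * 1024;  // assume ~2gb usable on 32-bit
    static const uint64_t headroom = 100 * 1024 * 1024;                    // keep ~100mb free

    // Returns false (so the caller can GC and retry, or throw) if the
    // reservation would leave less than `headroom` of address space.
    static bool NoteReservation(uint64_t bytes) {
        uint64_t newTotal = totalReservedBytes.fetch_add(bytes) + bytes;
        if (newTotal + headroom > usableAddressSpace) {
            totalReservedBytes.fetch_sub(bytes);
            return false;
        }
        return true;
    }

    static void NoteRelease(uint64_t bytes) {
        totalReservedBytes.fetch_sub(bytes);
    }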
(In reply to Luke Wagner [:luke] from comment #10)
> Perhaps we can just track the number of mapped 4gb regions (SAB and asm.js
> AB) and throw if there is ever more than 128tb worth (~32,000).  I can't
> imagine someone really needing that many SABs in practice and this seems
> easier than the alternative (which I think means stopping all
> SAB-sharing threads so that we can atomically swap in the 4gb region).

It's not quite that simple because we're also running out of virtual memory map entries for the process; see new comments on bug 1073934.

Also, I don't know what the situation is on Mac OS X and Windows; surely there are limits there too.

Mac OS X: https://developer.apple.com/library/mac/documentation/performance/conceptual/managingmemory/articles/aboutmemory.html
Says that the per-process VM limit on 64-bit systems is 18 exabytes, which is several orders of magnitude more than what it is on Linux - probably good enough.  I've found no mention of other limits.

Windows: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.85%29.aspx
In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per process.
(In reply to Lars T Hansen [:lth] from comment #11)
> It's not quite that simple because we're also running out of virtual memory
> map entries for the process; see new comments on bug 1073934.

I see roughly the same numeric limit, though, in bug 1073934 comment 13: ~32,000.

> Windows:
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.
> 85%29.aspx
> In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per
> process.

Whoa, but check out Windows Vista Home Basic/Premium: 8/16gb resp.  wtf!  Good thing (judging by http://stats.unity3d.com/web/os.html) no one uses Vista...

So there seem to be two limits: virtual address space (which shouldn't cause munmap/mprotect to fail) and address-space descriptors (which seem limited to ~32,000, at least on Linux).  Even a rather conservative limit of 1000 seems like it'd both be enough and keep us far from danger.

Btw, for the problem of trying to get the GC to trigger, we have the OutOfMemoryCallback (see jsapi.h) which does a synchronous mega-gc and is what lets us not, e.g., OOM when you visit a big asm.js app on 32-bit and hit reload: when we fail the allocation, we sync-gc, then try again.  The same could be applied if one of our conservative limits fails.

One last note: it seems like non-asm.js could also benefit from asm.js-style bounds-check elimination when it has singleton information about the SAB (which it should).
Blocks: 1073934
(In reply to Luke Wagner [:luke] from comment #12)
> (In reply to Lars T Hansen [:lth] from comment #11)
> 
> > Windows:
> > http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.
> > 85%29.aspx
> > In Windows 8.1 the limit is 128TB per process.  Previously it was 8TB per
> > process.
> 
> Whoa, but check out Windows Vista Home Basic/Premium: 8/16gb resp.  wtf! 

Those look like limits on physical memory, not address space.  The table is confusing, but the way I read it, even Vista users get 8TB of virtual address space per process.
(In reply to Luke Wagner [:luke] from comment #12)
> 
> So there seem to be two limits: virtual address space (which shouldn't
> cause munmap/mprotect to fail) and address-space descriptors (which seem
> limited to ~32,000, at least on Linux).  Even a rather conservative limit of
> 1000 seems like it'd both be enough and keep us far from danger.

Yes.  That gives us 4TB in the worst case.

> Btw, for the problem of trying to get the GC to trigger, we have the
> OutOfMemoryCallback (see jsapi.h) which does a synchronous mega-gc and is
> what lets us not, e.g., OOM when you visit a big asm.js app on 32-bit and
> hit reload: when we fail the allocation, we sync-gc, then try again.  The
> same could be applied if one of our conservative limits fails.

I agree.

I'll cook something up along those lines.
Jon, could you take a look at the GC bits to see if I've bungled something?

All, with this patch Gary's test case from bug 1073934 passes as it should, and debugger runs show the special case GC kicking in a number of times.

(Also see my latest note on bug 1073934 where I argue that we may need to do this on 32-bit Linux as well, in order to limit the number of live mappings.)
Attachment #8500399 - Flags: feedback?(jcoppeard)
Comment on attachment 8500399 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit

Review of attachment 8500399 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Lars,

In general this looks good, but I do have some comments:

1) I don't think you need to wait for background sweeping to finish since shared array buffers are finalized on the main thread (the shared array buffer object class lacks the JSCLASS_BACKGROUND_FINALIZE flag).

2) As I read it you actually want to GC all zones, not just the zone of the current compartment, so you want to call PrepareForFullGC() rather than PrepareZoneForGC().

3) Let's not add another entry point in GCRuntime for running a GC.  Can you move the body of collectAndSweep() into JSContext::gcForReason()?  Also this should probably be called fullGC() or something like that to discriminate it from JS::GCForReason() which requires you to select the zones first.
Attachment #8500399 - Flags: feedback?(jcoppeard) → feedback+
(In reply to Jon Coppeard (:jonco) from comment #16)
> Comment on attachment 8500399 [details] [diff] [review]
> Trigger a synchronous GC when the count of SAB exceeds a set limit
> 
> Review of attachment 8500399 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks for quick feedback.

> 1) I don't think you need to wait for background sweeping to finish since
> shared array buffers are finalized on the main thread (the shared array
> buffer object class lacks the JSCLASS_BACKGROUND_FINALIZE flag).

Maybe.  If I don't wait, I risk that a fast-allocating test case exhausts resources before sweeping manages to free up space, and I really want to guarantee that that does not happen.  The only way to guarantee it is to force the sweeping to finish, I think.

(This really is an extraordinary circumstance; we do not expect to see programs that have a thousand live SABs in the wild, but we can't afford to crash like we do now.)

> 2) As I read it you actually want to GC all zones, not just the zone of the
> current compartment, so you want to call PrepareForFullGC() rather than
> PrepareZoneForGC().

I'll look into that.  For this patch I was just trying to GC the one zone.

> 3) Let's not add another entry point in GCRuntime for running a GC.  Can you
> move the body of collectAndSweep() into JSContext::gcForReason()?

I don't think so, but I'll look again; perhaps I messed something up previously.  The GCs basically did not happen because there were guards on whether the zone had been prepped for GC (it had not).  At least the pure gc() callback failed for that reason.

> Also this
> should probably be called fullGC() or something like that to discriminate it
> from JS::GCForReason() which requires you to select the zones first.

Well, again, I was just trying to do the one zone here, but once we've settled the other details I'm happy to discuss naming.
Have you considered using the largeAllocationFailureCallback to do the synchronous collection?  It calls into Gecko where we fire the low-memory notification which:
 - flushes a bunch of caches including the back-forward cache (so unroots window trees that aren't visible but reachable via Back/Forward)
 - does a synchronous GC/CC/GC trio (which is necessary if you want to clear up garbage rooted by dead windows)

I added the largeAllocationFailureCallback to mitigate 32-bit OOMs during reload/navigation.  It is currently called via pod_(calloc|realloc)CanGC in the onOutOfMemoryCanGC path.
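
(In outline the pattern is the one sketched below; this is a self-contained illustration of "fail, clean up synchronously, retry once", not SpiderMonkey's actual signatures:)

    #include <cstdlib>
    #include <functional>

    // Hook registered by the embedding; in Gecko it fires the low-memory
    // notification (cache flushes plus a synchronous GC/CC/GC).
    static std::function<void()> largeAllocationFailureHook;

    static void *AllocateWithRetry(std::size_t bytes) {
        void *p = std::malloc(bytes);
        if (!p && largeAllocationFailureHook) {
            largeAllocationFailureHook();   // synchronous cleanup
            p = std::malloc(bytes);         // retry once afterwards
        }
        return p;
    }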
(In reply to Luke Wagner [:luke] from comment #18)
> Have you considered using the largeAllocationFailureCallback to do the
> synchronous collection?

Not under that name, see below.

> It calls into Gecko where we fire the low-memory
> notification which:
>  - flushes a bunch of caches including the back-forward cache (so unroots
> window trees that aren't visible but reachable via Back/Forward)
>  - does a synchronous GC/CC/GC trio (which is necessary if you want to clear
> up garbage rooted by dead windows)
> 
> I added the largeAllocationFailureCallback to mitigate 32-bit OOMs during
> reload/navigation.  It is currently called via pod_(calloc|realloc)CanGC in
> the onOutOfMemoryCanGC path.

That's a better hint than you provided earlier, so maybe I'll be luckier finding it this time.

As I commented to Jon above, I am skeptical that lazy sweeping will necessarily keep up, so anything that forces everything to a grinding halt is fine with me.  This path looks promising.
Much simplified thanks to suggestions from you both.  The only new complication here is that the shell needs to set up the callback, or Gary's test case in bug 1073934 will be no happier.
Attachment #8500399 - Attachment is obsolete: true
Attachment #8511061 - Flags: review?(luke)
Attachment #8511061 - Flags: review?(jcoppeard)
Comment on attachment 8511061 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit, v2

Review of attachment 8511061 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8511061 - Flags: review?(jcoppeard) → review+
Comment on attachment 8511061 [details] [diff] [review]
Trigger a synchronous GC when the count of SAB exceeds a set limit, v2

Review of attachment 8511061 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SharedArrayObject.cpp
@@ +96,5 @@
>      // Enforced by SharedArrayBufferObject::New(cx, length).
>      MOZ_ASSERT(IsValidAsmJSHeapLength(length));
>  
>  #ifdef JS_CODEGEN_X64
> +    if (++numLive == maxLive) {

I think you want >= here.  Otherwise, it seems that, in the time before numLive is decremented below, another SAB could be allocated (that would succeed because ++numLive would be > maxLive) and then all subsequent SAB allocations would succeed.
Attachment #8511061 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #22)
> Comment on attachment 8511061 [details] [diff] [review]
> Trigger a synchronous GC when the count of SAB exceeds a set limit, v2
> 
> Review of attachment 8511061 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/SharedArrayObject.cpp
> @@ +96,5 @@
> >      // Enforced by SharedArrayBufferObject::New(cx, length).
> >      MOZ_ASSERT(IsValidAsmJSHeapLength(length));
> >  
> >  #ifdef JS_CODEGEN_X64
> > +    if (++numLive == maxLive) {
> 
> I think you want >= here.  Otherwise, it seems that, in the time before
> numLive is decremented below, another SAB could be allocated (that would
> succeed because ++numLive would be > maxLive) and then all subsequent SAB
> allocations would succeed.

Good point.
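
(Roughly, the agreed-on shape with the comparison fixed as suggested; a sketch with illustrative names, not the patch itself:)

    #include <atomic>
    #include <stdint.h>

    static std::atomic<uint32_t> numLive{0};
    static const uint32_t maxLive = 1000;   // conservative cap from the discussion above

    static void NoteLiveSAB() {
        // >= rather than ==: if a racing allocation pushes the count past
        // the threshold before a finalizer decrements it, the slow path
        // still fires for every subsequent allocation.
        if (++numLive >= maxLive) {
            // fire the large-allocation-failure / synchronous-GC machinery here
        }
    }

    static void NoteDeadSAB() {
        --numLive;
    }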
Assignee: nobody → lhansen
Won't this allow asm.js to access past the apparent end of the SAB (before it hits the end of the rounded-up page)?  This could be fixed by offsetting the base of the SAB so that the end lines up with a page boundary.
(In reply to Luke Wagner [:luke] from comment #26)
> Won't this allow asm.js to access past the apparent end of the SAB (before
> it hits the end of the rounded-up page)?

I don't understand how that could be.  Observe that the buffer will always start on a page boundary; this patch does not change that.  All this patch does is remove the assumption that the amount of memory requested is a multiple of the page size, and round the request up to an integral number of pages.

That roundup is not reflected in the number of bytes logically available, nor should it be.

asm.js will still perform a test that the buffer it is handed has a logical size that is an integral number of pages (plus other things).
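
(That is, something like the sketch below: the mapping grows to a whole number of pages while the logical byteLength stays at the requested value. Names are illustrative, not the patch's:)

    #include <stddef.h>

    struct BufferSizes {
        size_t logicalLength;   // what byteLength reports
        size_t mappedLength;    // what actually gets mapped, page-aligned
    };

    static BufferSizes ComputeSizes(size_t requested, size_t pageSize) {
        BufferSizes s;
        s.logicalLength = requested;
        s.mappedLength = ((requested + pageSize - 1) / pageSize) * pageSize;
        return s;
    }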
Flags: needinfo?(luke)
(In reply to Lars T Hansen [:lth] from comment #28)
> That roundup is not reflected in the number of bytes logically available,
> nor should it be.

I'm thinking about x64, where page protection determines what is out-of-bounds.
Flags: needinfo?(luke)
Attachment #8511166 - Flags: review?(luke) → review+
Hi Lars, sorry, I had to back this out for bustage in ASAN builds like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3434568&repo=mozilla-inbound
Flags: needinfo?(lhansen)
(In reply to Carsten Book [:Tomcat] from comment #31)
> Hi Lars, sorry, I had to back this out for bustage in ASAN builds like
> https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3434568&repo=mozilla-inbound

That was careless on my part...
Flags: needinfo?(lhansen)
I'll file a separate bug to move the test cases over to jit-tests.  I don't exactly remember why these tests were not placed there in the first place.
Attachment #8514979 - Flags: review?(jdemooij)
Keywords: leave-open
Comment on attachment 8514979 [details] [diff] [review]
Remove a test case that tested for the restriction that is now gone

Review of attachment 8514979 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8514979 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/677c24618e33
https://hg.mozilla.org/mozilla-central/rev/3ce12a4c3afb
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Depends on: 1120063