Closed Bug 1230162 Opened 9 years ago Closed 5 years ago

Address space exhaustion on Windows 7 x64 opt caused by SAB allocation policy

Categories

(Core :: JavaScript Engine, defect, P3)

x86_64
Windows 7
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Bug 1176214 generalized existing TypedArray test cases to also apply to shared memory (when the TA is mapping a SharedArrayBuffer).  A good example is tests/ecma_6/TypedArray/fill.js, notice how it adds constructors for TA-on-SAB when appropriate.

That test case will then end up allocating a lot of SABs within its main loop, about 70 of them, times eight constructors (should have been nine - Uint8ClampedArray was left off by accident), ie more than 500 such buffers.

The allocation policy on Win64 is to map 6442459126 bytes per segment, ie slightly more than 6GB, on the assumption that they will be used for asm.js.

I'm surprised that it's 6GB and not 4GB, but that's not the central point here.  The point is that, modulo aggressive GC, we'll have a peak allocation of more than 3TB during one such run.  This is already a lot, but it turns into spurious failures during test runs when multiple test threads are run in parallel; running jstests.py -j4 (the default) on my home system I can reliably provoke test failures because VirtualAlloc fails.  This happens on try, too (https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb0756adcf10, the "p" failure on Win8 x64 opt).

There might be multiple work items here.  The test failure can perhaps be fixed by forcing some kind of serialization (not sure).  We should also not inflate allocations that are unsuitable for asm.js, it's an easy fix even if it's not fully general.
The 6GB allocation is caused by a 4GB heap plus a 2GB "immediate range" on this platform.
On IRC, :ehoogeveen notes that Win7 has the same 8TB per-process limit as Vista: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.85%29.aspx#memory_limits.

(This is not what's biting us in this case, but at least it won't let us go "Bah, Vista.")
Summary: Address space exhaustion on Windows x64 opt caused by SAB allocation policy → Address space exhaustion on Windows 7 x64 opt caused by SAB allocation policy
This patch will neuter the tests that have the problematic behavior (and probably more), but it would be better to fix the allocation behavior.
What about this mitigation:
 - like ArrayBuffer, initial SAB simply malloc()s, no guard region
 - a SAB has a bool "has been postMessage()d" that starts false and flips to true when postMessage()d
 - when asm.js is linked and the SAB does not have a guard region already:
   - if the bool is false, create a guard region and copy over (ArrayBuffer::prepareForAsmJS), which mutates the SAB, but that's ok, it's not shared yet
   - if the bool is true, fail linking
?  I think this limitation will be easy to work around in practice and existing SAB code won't hit it.
(In reply to Lars T Hansen [:lth] from comment #5)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cfdf76e4a09

This is just a test to see if avoiding overallocation when we can is enough to make the tests pass.

I like Luke's idea and will look into it in detail.
Assignee: nobody → lhansen
This stops the test case from failing: it allocates large buffers only when the requested size is valid for asm.js, otherwise it allocates smaller buffers.  I'd like to land this as a stopgap but keep the bug open and to implement a more general solution later.
Attachment #8695779 - Flags: review?(luke)
Comment on attachment 8695779 [details] [diff] [review]
allocate less, when we can

Review of attachment 8695779 [details] [diff] [review]:
-----------------------------------------------------------------

Hah, very creative.
Attachment #8695779 - Flags: review?(luke) → review+
Keywords: leave-open
Depends on: 1234397
Assignee: lhansen → nobody
Priority: -- → P3
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
Flags: needinfo?(sdetar)
There's been no activity because shared memory has been turned off due to Spectre / Meltdown, and the 40% of our users who are still on Windows 7 have not had a chance to run into it...  It may be that we will "fix" this by instead duing bug 1507759, but until we know for sure we should leave this open.
Flags: needinfo?(sdetar)

I'm going to close that because we have various policy hacks in place for overallocation (generalizations of Luke's patch above) and they will almost certainly hide this problem effectively. We're also working on other mitigations that will also apply to the present bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Assignee: nobody → lhansen
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: