Closed Bug 1337561 Opened 3 years ago Closed 3 years ago
64-bit browser runs out of memory loading a Wasm module on a computer with 128GB of RAM, and there is no callstack
5.91 KB, patch
|Details | Diff | Splinter Review|
1.33 KB, patch
|Details | Diff | Splinter Review|
1018 bytes, patch
|Details | Diff | Splinter Review|
2.79 KB, patch
|Details | Diff | Splinter Review|
Oh, also, at the time of the OOM, process monitor shows browser memory usage to be about 1GB, so this does not seem to be an actual OOM, but something else.
Luke, I wonder if this is from bug 1334933 - maybe our JIT memory gets fragmented because we take random chunks and big wasm allocations no longer fit :( I can check that tomorrow. Thoughts?
Also reported bug 1337585 which occurs on the same test case (although note slightly different .html urls to raise the situation), possibly not related root causes in any way, but linking since they happen on the same page.
Note that we're doing a 67 MB (!!) code allocation for the wasm module. That's half our quota on 32-bit and 10% on 64-bit.
If we have to support 70 MB allocations, picking pages at random fragments way too much. This patch instead uses a cursor that starts at page 0. Then we randomly skip a page to keep some amount of randomness, and if we free a page we move the cursor back to that position. I added logging and we now mostly stay within the first few MB when loading websites. Even if I reduce the limit on 64-bit to 128 MB I can open a few websites and still load the wasm module without problems.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Attachment #8834791 - Flags: review?(luke)
FWIW, both (moz)jemalloc and the GC use 1MiB-aligned 1MiB chunks, so having an allocation that straddles two 1MiB regions fragments the address space for their purposes. Could we try to make sure the resulting allocation doesn't straddle more chunks than it needs to? This would limit the randomness of allocations where (size % 1MiB) is close to 1MiB (if it's 0, a 1MiB-aligned allocation would be best). But if the existing patch practically eliminates the problem, maybe the above is moot.
Comment on attachment 8834791 [details] [diff] [review] Patch Review of attachment 8834791 [details] [diff] [review]: ----------------------------------------------------------------- Seems like a nice simple heuristic. I wonder if, on 64-bit, we could have a much bigger MaxCodePagesPerProcess, though -- like 1gb -- given that 2^47 is so much bigger.
Attachment #8834791 - Flags: review?(luke) → review+
Jukka: 67mb is enormous (>3x bigger than previous big engines). Is this an opt build?
Yes, it is an optimized build, for -Oz. -O3 builds of UE4 would be about 10MB bigger than -Oz, but we haven't seen big performance diff between -O3 and -Oz, so have been opting for -Oz instead. Though the build does have function names intact to be able to profile performance. Those cost 8MB in added disk size. I don't know why it would grow to 67MB in memory, the disk size of the uncompressed .wasm is 41MB. Optimized -Oz builds without function names will be 33MB. Developers need to be able to develop too, so even unoptimized builds need to work, otherwise it will hurt iteration times and not allow people to debug easily. (Currently -O0 and -O1 builds of Wasm do not work, because they run into some kind of limit as well, it's been on my todo to do a test case of that to file)
Thanks for the quick review. Should we file a follow-up bug to investigate the code size? (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #6) > But if the existing patch practically eliminates the problem, maybe the > above is moot. Note that we reserve our code memory under JS_Init, so what we do in allocate() does not affect what's available for GC/jemalloc. We have to see how it holds up on 32-bit when there are multiple websites running wasm, but for now I think it makes sense to pick the first available chunk from our range so we have a large continuous range of pages available near the end for other code allocations.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/867bd491574a Fix executable page allocator to avoid fragmenting the JIT code space. r=luke
Hey, could we also change the error message to say "wasm code size too large (was xxx MB)" instead of a "out of memory" (which does not have a callstack)? I spent a good hour or so trying to figure out what exactly was the source. It sounds like the fix still leaves chance for people not to be able to run large unoptimized codebases, so having a clear error message would be helpful to pinpoint if this needs to be revisited later. If there is a good understanding that .wasm disk size is not even expected to correlate tightly to the reported 67MB above, then no issue there. Otoh if it does sound like something fishy, then might be worth to check.
Bug 1282063 also wants error messages to be more fine-grained (in that case, a stack frame that is way too large).
I filed bgu 1337723 for the code allocation OOM.
(In reply to Jan de Mooij [:jandem] from comment #10) > Note that we reserve our code memory under JS_Init, so what we do in > allocate() does not affect what's available for GC/jemalloc. Aah, interesting. I knew a change happened there recently but I hadn't seen the code yet. It looks like that region won't necessarily be 1MiB-aligned, but that should waste at most one chunk.
Jukka: ok, 41mb (33mb after stripping names; names doing allocate code memory so this part isn't hurting here) .wasm; wow, that is just simply much bigger than the previous Unity .wasms I was looking at (~14mb).
One difference is that Unity links against the game directly and does DCE to eliminate unused bits of the engine, whereas the UE4 engine is the whole engine for UE4 "Blueprints" projects (which are essentially data-only projects), so there is no DCE opportunity, but on the upside, they can then share (and cache) the same .wasm across multiple projects without having to rebuild.
Unfortunately I still see the OOM occur :( , although now it's much more rare, and only happens perhaps after reloading the page 10 to 20 times (which I do often now as I'm iterating on the demos).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I wonder if we should GC/discard wasm code more aggressively? 640 MB is quite a lot on 64-bit, maybe there's fragmentation going on but still.
Do we collect any data that could help diagnose issues on a remote machine? A fragmentation map of the JIT area would be really cool, but for starters I'd settle for an analogue of "vsize-max-contiguous".
Jukka: could this just be another manifestation of bug 1337585 and what's really happening is that, on each reload, because the console is open, the old windows are being kept alive and thus keeping their code allocations alive and so you're simply exhausting JIT memory?
Oh, that's certainly possible. The OOM never occurs on first page load. Let's check this again once bug 1337585 is fixed.
Well the bug that's now duped against (console open leaks everything) has been open since 2014; may take some prodding to get it fixed. In the meantime, can you repro without console open?
Ok, looks like the OOM is still occurring :/ Tested on both 64-bit and 32-bit Nightlies while keeping console window closed so that it won't cause console leaks, and with 266GB of free hard disk space available on C:\, so that low disk space is ruled out (which was a possible issue in bug 1337585)
I feel like there's a lot of speculation going on and it would be better to just look at this in a debugger. The STR make it sound like this is easy to reproduce, but I haven't had any luck with a Win64 nightly. Is there anything missing from the STR?
How many iterations does it take to OOM on 32-bit and 64-bit?
Here is another test case: https://s3.amazonaws.com/mozilla-games/tmp/2017-02-21-SunTemple/SunTemple.html 64-bit FF Nightly refused to load on the sixth run (clicked reload icon five times after initial page load) 32-bit FF Nightly refused to load on the second run (clicked reload once after initial page load)
This might be the same issue as bug 1341090, where the executable memory is saturated, but this time across several zones (since there are reloads), making the fix there inefficient. Out of curiosity, would using the shiny new meta directive asking for a new process help here? It doesn't seem to be used on the demo in comment 28, and I wonder if that would trigger creation of a new process every time we reload the page. Of course, this doesn't help on browsers/versions that don't support the meta directive, but it would be a workaround on Nightly at least.
The new HTTP header does not affect 64-bit browser builds, it is ignored there. For 32-bit browser builds, it might have an effect (although S3 does not allow adding custom HTTP headers).
So, at least for 64-bit, it seems like we could have a much bigger virtual address space reservation for code. 32-bit is harder given such a huge allocation.
(In reply to Luke Wagner [:luke] from comment #31) > So, at least for 64-bit, it seems like we could have a much bigger virtual > address space reservation for code. 32-bit is harder given such a huge > allocation. That sounds reasonable to me. It would make it harder to simplify jumps on 64-bit if we go larger than INT32_MAX, but the code supports that still. Regarding 32-bit, if the limit is 128 MB and we have allocations larger than 60 MB there's not much we can do. Benjamin and I were talking about some GC heuristics but it's still pretty fragile on 32-bit.
(In reply to Jan de Mooij [:jandem] from comment #32) Yeah, INT32_MAX seems pretty generous and way better than 640mb. For 32-bit: with multi-e10s, I was thinking that a process could maintain a bool that says "I'm fragmented" and once that bool was set, the parent would try to never reuse that process again for subsequent loads. One simple way to set that bool would be after a code allocation failed (so that reloading again would work). But we could also be more pro-active and maintain a high-water mark in the code allocator and set the bool if it got beyond a certain threshold.
(In reply to Jan de Mooij [:jandem] from comment #32) Oh, another shorter-term idea for 32-bit: when code allocation fails, could we call the cx->runtime->largeAllocationFailureCallback and retry (like pod_callocCanGC does)? The large allocation callback does a synchronous purging GC/CC/GC which would hopefully blow away all JIT code and let the retry succeed.
(In reply to Luke Wagner [:luke] from comment #33) > For 32-bit: with multi-e10s, I was thinking that a process could maintain a > bool that says "I'm fragmented" and once that bool was set, the parent would > try to never reuse that process again for subsequent loads. Bug 1305091 :) Maybe we should hook that up to our code... Using the allocation failure callback is a great idea.
(In reply to Jan de Mooij [:jandem] from comment #35) > Bug 1305091 :) Maybe we should hook that up to our code... Oh sweet, that sounds perfect!
Woohoo, the largeAllocationFailureCallback looks like it does the trick, from my initial testing, even with the 128mb limit.
To minimize risk for branch landing, this patch just adds the retry case to wasm allocations. Jukka: could you try out this patch?
Comment on attachment 8840025 [details] [diff] [review] use-large-alloc-failure-callback Review of attachment 8840025 [details] [diff] [review]: ----------------------------------------------------------------- Great, thanks!
Attachment #8840025 - Flags: review?(jdemooij) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/7771cc7440c7 Baldr: call largeAllocationCallback and retry if executable allocation fails (r=jandem)
Oh right, and this patch bumps the reserved space on 64-bit to 1gb.
Attachment #8840153 - Flags: review?(jdemooij)
Attachment #8840153 - Flags: review?(jdemooij) → review+
Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 years ago
Resolution: --- → FIXED
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8fad3aa8fd46 Increase executable code allocation size on 64-bit (r=jandem)
Approval Request Comment [Feature/Bug causing the regression]: bug 1334933 [User impact if declined]: large wasm workloads OOM [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: simple patch that (1) increases a quota limit, (2) adds one more use of a callback that is already used for similar OOM situations in ArrayBuffer
Comment on attachment 8840452 [details] [diff] [review] branch.patch more address space for wasm code on 64bit
Setting qe-verify- based on Luke's assessment on manual testing needs (see Comment 44) and the fact that this fix has automated coverage.
The STR from comment 0 was based on Wasm 0x0D prerelease version, a STR URL with release version 0x01 is available here: https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html Verified the fix to work on that URL in 32-bit FF Nightly and 64-bit FF Nightly. Opened the page and reloaded it 10 times in a row. However we do still get OOMs easily in Wasm compilation if one happens to have the web page console open when reloading, which is very common when developing web pages. That is bug 1337585 and bug 1084605.
Status: RESOLVED → VERIFIED
Attachment #8840025 - Flags: feedback?(jujjyl) → feedback+
Dunno what happened in the beta branch, but https://hg.mozilla.org/releases/mozilla-beta/annotate/tip/js/src/jit/ProcessExecutableMemory.cpp#l393 / 53.0b4 still allocates 640Mb and not 1Gb.
Ugh, that's because status-firefox53 was set to fixed back in comment 18 and never got reset when the bug was reopened and the final patch was approved for Aurora. Pushing with carried over approvals since we clearly don't want to ship Firefox 53 (and *only* Firefox 53) with this discrepancy. https://hg.mozilla.org/releases/mozilla-beta/rev/a3d1f3dbc5bcae03bbe1340ff3711ca3b416030b
You need to log in before you can comment on or make changes to this bug.