Closed Bug 941837 Opened 6 years ago Closed 6 years ago

js::gc::MapAlignedPages causes virtual address space fragmentation on Windows

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect
Not set

Tracking


VERIFIED FIXED
mozilla28
Tracking Status
firefox25 --- wontfix
firefox26 + verified
firefox27 --- verified
firefox28 --- verified
firefox-esr17 --- wontfix
firefox-esr24 - wontfix
b2g18 --- wontfix
b2g-v1.1hd --- wontfix
b2g-v1.2 --- fixed

People

(Reporter: dmajor, Assigned: terrence)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [MemShrink])

Attachments

(2 files, 2 obsolete files)

During the recent OOM tree closure we noticed an alternating pattern in VM usage on releng machines: 1MB used / 1MB free / 1MB used / etc.

An xperf trace shows that these allocations come from js::gc::MapAlignedPages, via GCHelperThread::threadLoop. The code wants an allocation of 1MB aligned at 1MB, so it performs a 2MB VirtualAlloc to guarantee alignment, then frees the 2MB and performs a 1MB VirtualAlloc.

http://hg.mozilla.org/mozilla-central/file/c7cbfa315d46/js/src/gc/Memory.cpp#l55

The alignment math takes the result from the 2MB allocation and pushes it up to the next strictly-greater aligned value. But in the case where the result was already 1MB aligned, this leaves a gap of 1MB. I think we should adjust the pointer only if it wasn't already aligned.
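The described round-up logic can be sketched in isolation (this is an illustrative reconstruction, not the actual Memory.cpp code; the name `RoundUp*` and the fixed 1 MiB constant are assumptions for the example):

```cpp
#include <cassert>
#include <cstdint>

// Chunk alignment under discussion: 1 MiB.
constexpr uintptr_t kAlignment = uintptr_t(1) << 20;

// Buggy version: always advances to the next strictly-greater aligned
// address. When |base| is already aligned, this skips past it and leaves
// a 1 MiB hole between the returned region and the previous one.
uintptr_t RoundUpBuggy(uintptr_t base) {
    uintptr_t offset = base % kAlignment;
    return base + (kAlignment - offset);
}

// Fixed version: leave an already-aligned pointer alone, adjust otherwise.
uintptr_t RoundUpFixed(uintptr_t base) {
    uintptr_t offset = base % kAlignment;
    return offset ? base + (kAlignment - offset) : base;
}
```

With an already-aligned base, the buggy variant wastes a full chunk-sized hole, which produces exactly the used/free/used 1 MB alternation seen in the VM map.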
Attached patch jsalign (obsolete) — Splinter Review
Haven't tested yet. The math would be easier if we could be certain that alignment is always a power of 2.
Whiteboard: [MemShrink]
Why such a strict alignment?  GC tricks?
Flags: needinfo?(wmccloskey)
AIUI, there's something about being able to look up a header from any address; I don't know if the 1MB number is merely the number we decided on. But this is definitely not a good pattern! Some other possibilities here are:

* reserve memory 32 or 64MB at a time, and then commit/free it in 1MB chunks internally
* instead of hardcoding 1MB, use the natural allocation granularity of Windows, which is 64k on x86 (see SYSTEM_INFO.dwAllocationGranularity)
* Don't use magic alignment to find the header: instead use VirtualQuery to ask Windows what the base address is of the current address and use that
(In reply to Nathan Froyd (:froydnj) from comment #2)
> Why such a strict alignment?  GC tricks?

Correct.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> AIUI, there's something about being able to look up a header from any
> address; I don't know if the 1MB number is merely the number we decided on.
> But this is definitely not a good pattern! Some other possibilities here are:
> 
> * reserve memory 32 or 64MB at a time, and then commit/free it in 1MB chunks
> internally

B2G is already complaining about us reserving 1MiB at a time. (Note, these are at the JSRuntime granularity, so even reserving decommitted at those sizes will run into trouble until we can fix JSRuntime-per-worker.)

> * instead of hardcoding 1MB, use the natural allocation granularity of
> Windows, which is 64k on x86 (see SYSTEM_INFO.dwAllocationGranularity)

Overhead is proportional to ChunkSize. As I recall 64K happened to work out terribly, but I'd have to re-do the math.

> * Don't use magic alignment to find the header: instead use VirtualQuery to
> ask Windows what the base address is of the current address and use that

If we out-of-line any code in this path, we take a tremendous performance hit. Making the chunk size dynamic at all runs into the same issues.

What we used to do was allocate in a loop until Windows happened to give us a 1MiB block -- I don't really think the new technique is worse, even with this awkward side-effect. David's suggestion that we round down is the right one: I really, really should have done that in the first place.
Assignee: nobody → terrence
Great find, David! This patch should also reduce our computation overhead to grab a new block in common cases.
Attachment #8336343 - Flags: review?(dmajor)
Oh, sorry, I did not realize you attached a patch too, David! Rounding down would certainly work too.

Bill, which of the two patches here do you think is the better approach?
(In reply to Terrence Cole [:terrence] from comment #6)
> Oh, sorry, I did not realize you attached a patch too, David! Rounding down
> would certainly work too.
> 
> Bill, which of the two patches here do you think is the better approach?

I think my patch has the same results as yours (if I did my math right). It only rounds down on the "already aligned" case -- rounding down in general is probably not correct.
And it looks like this happens on unix too:
    void *front = (void *)(uintptr_t(region) + (alignment - offset));
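The quoted unix line has the same off-by-a-chunk problem: when `offset` is zero, `front` advances by a full `alignment` instead of staying at `region`. A minimal sketch of the over-allocate-and-trim pattern with the corrected math (this is an illustrative reconstruction of the posix path, not the actual SpiderMonkey code; the function name is assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <sys/mman.h>

// Over-allocate by |alignment|, then unmap the misaligned front and the
// tail, leaving an aligned |size|-byte region.
void *MapAlignedSketch(size_t size, size_t alignment) {
    size_t reqSize = size + alignment;
    void *region = mmap(nullptr, reqSize, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return nullptr;

    uintptr_t offset = uintptr_t(region) % alignment;
    // Corrected math: only move forward when the base is misaligned.
    void *front = (void *)(uintptr_t(region) + (offset ? alignment - offset : 0));
    void *end = (void *)(uintptr_t(front) + size);

    if (front != region)
        munmap(region, uintptr_t(front) - uintptr_t(region));
    size_t tail = uintptr_t(region) + reqSize - uintptr_t(end);
    if (tail)
        munmap(end, tail);
    return front;
}
```

On posix the trim is cheap because `munmap` can release an arbitrary sub-range; on Windows, `VirtualFree` cannot split a reservation, which is why the code there frees the whole 2MB block and re-allocates at the computed address.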
Comment on attachment 8336343 [details] [diff] [review]
reduce_windows_fragmentation-v0.diff

I don't really have any JS experience so I'll send this review to Bill.
Attachment #8336343 - Flags: review?(dmajor) → review?(wmccloskey)
Let's use AlignBytes from jsutil.h. I don't think there's any reason we'll ever use a non-power of two alignment.

Also, bug 671702 contains a pretty exhaustive discussion of our previous attempt to switch to 64KB chunks. It's always worth revisiting old decisions, of course.
(In reply to Bill McCloskey (:billm) from comment #10)
> Let's use AlignBytes from jsutil.h. I don't think there's any reason we'll
> ever use a non-power of two alignment.

Great! I didn't know we had an existing template for that. Using % twice does feel a bit like cheating though ;-).

> Also, bug 671702 contains a pretty exhaustive discussion of our previous
> attempt to switch to 64KB chunks. It's always worth revisiting old
> decisions, of course.

Thanks for the link!
Attachment #8336310 - Attachment is obsolete: true
Attachment #8336343 - Attachment is obsolete: true
Attachment #8336343 - Flags: review?(wmccloskey)
Attachment #8336447 - Flags: review?(wmccloskey)
(In reply to Terrence Cole [:terrence] from comment #11)
> Great! I didn't know we had an existing template for that. Using % twice
> does feel a bit like cheating though ;-).

It's odd that it uses % at all given that it asserts IsPowerOfTwo. Should be able to just do bit math.
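For comparison, here is a sketch of an AlignBytes-style helper in both forms (a reconstruction for illustration; the real jsutil.h signature may differ). Because the alignment is asserted to be a power of two, the double-`%` expression and the single-mask bit math are equivalent:

```cpp
#include <cassert>
#include <cstddef>

static bool IsPowerOfTwo(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Modulo form: the outer % makes the adjustment zero when |bytes| is
// already aligned, avoiding the round-past-alignment bug.
size_t AlignBytesModulo(size_t bytes, size_t alignment) {
    assert(IsPowerOfTwo(alignment));
    return bytes + (alignment - bytes % alignment) % alignment;
}

// Bit-math form: round up by adding alignment-1, then mask off the low bits.
size_t AlignBytesBitMath(size_t bytes, size_t alignment) {
    assert(IsPowerOfTwo(alignment));
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

As noted later in the thread, an optimizing compiler reduces the modulo form to the bit-math form when the alignment is a compile-time constant, so the distinction matters mainly when the alignment crosses a translation-unit boundary.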
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

Review of attachment 8336447 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I don't know why AlignBytes needs to use % at all. I'd rather use it than anything else though.

Thanks very much for tracking this down, David!
Attachment #8336447 - Flags: review?(wmccloskey) → review+
Flags: needinfo?(wmccloskey)
Attached file test.cpp
Seems modern compilers are smart enough to figure it out on their own:

g++ -O3 =>
0000000000400750 <_Z4testm>:
  400750:       48 89 f8                mov    %rdi,%rax
  400753:       ba 24 08 40 00          mov    $0x400824,%edx
  400758:       be 01 00 00 00          mov    $0x1,%esi
  40075d:       48 f7 d8                neg    %rax
  400760:       25 ff ff 0f 00          and    $0xfffff,%eax
  400765:       48 8d 0c 38             lea    (%rax,%rdi,1),%rcx
  400769:       48 8b 3d e0 08 20 00    mov    0x2008e0(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  400770:       31 c0                   xor    %eax,%eax
  400772:       e9 69 fe ff ff          jmpq   4005e0 <__fprintf_chk@plt>
  400777:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40077e:       00 00 

clang++ =>
0000000000400710 <_Z4testm>:
  400710:       89 fa                   mov    %edi,%edx
  400712:       f7 da                   neg    %edx
  400714:       48 81 e2 ff ff 0f 00    and    $0xfffff,%rdx
  40071b:       48 01 fa                add    %rdi,%rdx
  40071e:       48 8b 3d 2b 09 20 00    mov    0x20092b(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  400725:       be 24 08 40 00          mov    $0x400824,%esi
  40072a:       30 c0                   xor    %al,%al
  40072c:       e9 bf fe ff ff          jmpq   4005f0 <fprintf@plt>
  400731:       66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  400738:       0f 1f 84 00 00 00 00 
  40073f:       00
That's because the compiler can inline the call and see that |alignment| is a constant. In our code it can't because alignment is passed in from a different translation unit (jsgc.cpp). Although I guess the unified sources might change that.

It really doesn't matter though. Two extra % ops is nothing here.
(In reply to Bill McCloskey (:billm) from comment #15)
> That's because the compiler can inline the call and see that |alignment| is
> a constant. In our code it can't because alignment is passed in from a
> different translation unit (jsgc.cpp). Although I guess the unified sources
> might change that.
> 
> It really doesn't matter though. Two extra % ops is nothing here.

D'oh! Right you are. If I read both parameters from getenv, I get this code for clang++:

0000000000400710 <_Z4testmm>:
  400710:       48 89 f8                mov    %rdi,%rax
  400713:       31 d2                   xor    %edx,%edx
  400715:       48 f7 f6                div    %rsi
  400718:       48 89 f0                mov    %rsi,%rax
  40071b:       48 29 d0                sub    %rdx,%rax
  40071e:       31 d2                   xor    %edx,%edx
  400720:       48 f7 f6                div    %rsi
  400723:       48 01 fa                add    %rdi,%rdx
  400726:       48 8b 3d 23 09 20 00    mov    0x200923(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  40072d:       be 64 08 40 00          mov    $0x400864,%esi
  400732:       30 c0                   xor    %al,%al
  400734:       e9 b7 fe ff ff          jmpq   4005f0 <fprintf@plt>
  400739:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
Yeah, I would definitely expect that if the value is known at compile time. I thought that wasn't the case here, but it turns out that the caller passes a constant, and MSVC pushes it down into the callee. Impressive!

   44 68706203 8b1d6c608468    mov     ebx,dword ptr [mozjs!_imp__VirtualAlloc (6884606c)]
   44 68706209 56              push    esi
   63 6870620a 6a04            push    4
   63 6870620c 6800200000      push    2000h
   63 68706211 6800002000      push    200000h
   63 68706216 6a00            push    0
   63 68706218 ffd3            call    ebx
   64 6870621a 85c0            test    eax,eax
   64 6870621c 742e            je      mozjs!js::gc::MapAlignedPages+0x5c (6870624c)
   67 6870621e 6800800000      push    8000h
   67 68706223 8bf0            mov     esi,eax
   67 68706225 6a00            push    0
   67 68706227 56              push    esi
   67 68706228 ff1568608468    call    dword ptr [mozjs!_imp__VirtualFree (68846068)]
   67 6870622e 8bc6            mov     eax,esi
   68 68706230 6a04            push    4
   68 68706232 25ffff0f00      and     eax,0FFFFFh
I reckon this is worth backporting to Aurora, maybe even Beta, given that it's such a simple change.
Major kudos.
(In reply to Terrence Cole [:terrence] from comment #20)
> https://tbpl.mozilla.org/?tree=Try&rev=bbb9fe129495

The XP Debug JP purple is not from you.

SHIP IT
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #22)
> (In reply to Terrence Cole [:terrence] from comment #20)
> > https://tbpl.mozilla.org/?tree=Try&rev=bbb9fe129495
> 
> The XP Debug JP purple is not from you.
> 
> SHIP IT

Open the tree?
See #jsapi? ;)
Landing on a CLOSED TREE with Ryan's blessing:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e05ba13f8bfa
Just out of interest, do we know when this behavior that causes the fragmentation was introduced? I wonder if that aligns with any of our regressions in OOM crash volume.
I landed this bug in Jan 2012: almost 2 years ago.
(In reply to Terrence Cole [:terrence] from comment #27)
> I landed this bug in Jan 2012: almost 2 years ago.

Thanks. FYI, this doesn't directly relate to the regressions we investigate(d) in bug 837835, but as this bug was already present before the regressions happened, it's always possible that it played a part (e.g. by its symptoms becoming worse due to other changes).
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 720439.

User impact if declined: Increased probability of OOM.

Testing completed (on m-c, etc.): Try and M-C builds as well as http://people.mozilla.org/~sguo/mochimem/viewer.html?url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1385065562/mozilla-central_win7-ix-debug_test-mochitest-browser-chrome-bm69-tests1-windows-build184.txt.gz&url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dmajor@mozilla.com-a9643c5b477b/try-win32-debug/try_win7-ix-debug_test-mochitest-browser-chrome-bm69-tests1-windows-build1420.txt.gz

Risk to taking this patch (and alternatives if risky): Low.

String or IDL/UUID changes made by this patch: None.
Attachment #8336447 - Flags: approval-mozilla-beta?
Attachment #8336447 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/e05ba13f8bfa
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

Talked this over in stability meeting this morning. We're going to take this to beta because it is very likely to help us with some memory issues and there's been a significant increase in OOM-caused crashes since FF25 (as we now know). This patch should help, and we hope it will be obvious if it causes other issues - should that occur, we can back out in a week for final beta.
Attachment #8336447 - Flags: approval-mozilla-beta?
Attachment #8336447 - Flags: approval-mozilla-beta+
Attachment #8336447 - Flags: approval-mozilla-aurora?
Attachment #8336447 - Flags: approval-mozilla-aurora+
I am on the fully updated beta (26) channel, and after the last few days I can say this fix did not help with my virtual memory fragmentation issue -- the progression is here: https://bugzilla.mozilla.org/show_bug.cgi?id=930797#c17 . "vsize-max-contiguous" still goes to zero every day or two, which inevitably crashes FF. I am able to avoid crashes only by pre-emptively exiting FF (at the first sign of visual corruption). Any guesses what could be done about this?
(In reply to User Dderss from comment #34)

vsize-max-contiguous can be misleading in isolation. It's true that fragmentation can decrease max-contiguous, but high memory usage can also decrease max-contiguous. We need to be more careful before blaming fragmentation (developers too!).

In bug 930797 comment 17, your vsize grows to very high levels (approaching 4gb) so I think the real problem is high memory usage rather than fragmentation. The code change in this bug might delay your crash, but it won't fix it. I would be interested in seeing an xperf trace of your scenario. If you are willing to collect this data, I will try to write up instructions in the next couple of days. In any case, let's move the discussion back to bug 930797.
(In reply to User Dderss from comment #34)
> I am on fully updated beta (26) channel, and after last few days I can say
> this fix did not help with my virtual memory fragmentation issue -- the
> progression is here: https://bugzilla.mozilla.org/show_bug.cgi?id=930797#c17


Let's keep the discussion of your issue in that bug, as it obviously has causes different from what's being discussed here.
Memory fragmentation is one piece of the puzzle of the memory issues Firefox has been seeing on different people's computers, and one "easy" case of it has been solved here, so the fix is worthwhile. As David says, it's not a silver bullet that fixes everything - in your case, it might not have much of an influence at all. We know we need to keep investigating.
And there's pretty surely not a silver bullet (i.e. one single patch that will fix everything) for the memory issues at all anyhow, esp. as the problems didn't arrive all at once, so they won't be fixed all at once. Your problem might be mostly one issue, another user's problems might be due to a completely different issue - and some issues might be eased with the fix here as well.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #32)
> Should we consider this for ESR24 as well?

Terrence, do we want to backport this VirtualAlloc fix to ESR 24?
Flags: needinfo?(terrence)
Yes, absolutely; I thought that Ryan had already pushed to all branches. Good catch!
Flags: needinfo?(terrence)
Can't push without approval!
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Extremely low risk patch which greatly reduces heap fragmentation.
User impact if declined: More OOMs than otherwise.
Fix Landed on Version: 26, 27, 28
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #8336447 - Flags: approval-mozilla-esr24?
David, can you please confirm this is fixed in Firefox 26, 27, and 28?
Flags: needinfo?(dmajor)
(In reply to Anthony Hughes, QA Mentor (:ashughes) [unavailable until Jan 2, 2014] from comment #41)
> David, can you please confirm this is fixed in Firefox 26, 27, and 28?

Confirmed based on comment 30 and 32, plus I examined v26 under a debugger to make sure.
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #42)

Thanks for your help, David.
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

I appreciate that this landed in 26 and has a significant impact but it doesn't meet the criteria for uplift to ESR24 branch and there's no indication from users that this should be given an exception. 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process
Attachment #8336447 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24-
Blocks: defrag