Closed Bug 941837 Opened 6 years ago Closed 6 years ago

js::gc::MapAlignedPages causes virtual address space fragmentation on Windows

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect
Not set

Tracking


VERIFIED FIXED
mozilla28
Tracking Status
firefox25 --- wontfix
firefox26 + verified
firefox27 --- verified
firefox28 --- verified
firefox-esr17 --- wontfix
firefox-esr24 - wontfix
b2g18 --- wontfix
b2g-v1.1hd --- wontfix
b2g-v1.2 --- fixed

People

(Reporter: dmajor, Assigned: terrence)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [MemShrink])

Attachments

(2 files, 2 obsolete files)

During the recent OOM tree closure we noticed an alternating pattern in VM usage on releng machines: 1MB used / 1MB free / 1MB used / etc.

An xperf trace shows that these allocations come from js::gc::MapAlignedPages, via GCHelperThread::threadLoop. The code wants an allocation of 1MB aligned at 1MB, so it performs a 2MB VirtualAlloc to guarantee alignment, then frees the 2MB and performs a 1MB VirtualAlloc.

http://hg.mozilla.org/mozilla-central/file/c7cbfa315d46/js/src/gc/Memory.cpp#l55

The alignment math takes the result from the 2MB allocation and pushes it up to the next strictly-greater aligned value. But in the case where the result was already 1MB aligned, this leaves a gap of 1MB. I think we should adjust the pointer only if it wasn't already aligned.
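The described round-up logic can be sketched in isolation (this is an illustrative reconstruction, not the actual Memory.cpp code; the name `RoundUp*` and the fixed 1 MiB constant are assumptions for the example):

```cpp
#include <cassert>
#include <cstdint>

// Chunk alignment under discussion: 1 MiB.
constexpr uintptr_t kAlignment = uintptr_t(1) << 20;

// Buggy version: always advances to the next strictly-greater aligned
// address. When |base| is already aligned, this skips past it and leaves
// a 1 MiB hole between the returned region and the previous one.
uintptr_t RoundUpBuggy(uintptr_t base) {
    uintptr_t offset = base % kAlignment;
    return base + (kAlignment - offset);
}

// Fixed version: leave an already-aligned pointer alone, adjust otherwise.
uintptr_t RoundUpFixed(uintptr_t base) {
    uintptr_t offset = base % kAlignment;
    return offset ? base + (kAlignment - offset) : base;
}
```

With an already-aligned base, the buggy variant wastes a full chunk-sized hole, which produces exactly the used/free/used 1 MB alternation seen in the VM map.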
Attached patch jsalign (obsolete) — Splinter Review
Haven't tested yet. The math would be easier if we could be certain that alignment is always a power of 2.
Whiteboard: [MemShrink]
Why such a strict alignment?  GC tricks?
Flags: needinfo?(wmccloskey)
AIUI, there's something about being able to look up a header from any address; I don't know if the 1MB number is merely the number we decided on. But this is definitely not a good pattern! Some other possibilities here are:

* reserve memory 32 or 64MB at a time, and then commit/free it in 1MB chunks internally
* instead of hardcoding 1MB, use the natural allocation granularity of Windows, which is 64k on x86 (see SYSTEM_INFO.dwAllocationGranularity)
* Don't use magic alignment to find the header: instead use VirtualQuery to ask Windows what the base address is of the current address and use that
(In reply to Nathan Froyd (:froydnj) from comment #2)
> Why such a strict alignment?  GC tricks?

Correct.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> AIUI, there's something about being able to look up a header from any
> address; I don't know if the 1MB number is merely the number we decided on.
> But this is definitely not a good pattern! Some other possibilities here are:
> 
> * reserve memory 32 or 64MB at a time, and then commit/free it in 1MB chunks
> internally

B2G is already complaining about us reserving 1MiB at a time. (Note, these are at the JSRuntime granularity, so even reserving decommitted at those sizes will run into trouble until we can fix JSRuntime-per-worker.)

> * instead of hardcoding 1MB, use the natural allocation granularity of
> Windows, which is 64k on x86 (see SYSTEM_INFO.dwAllocationGranularity)

Overhead is proportional to ChunkSize. As I recall 64K happened to work out terribly, but I'd have to re-do the math.

> * Don't use magic alignment to find the header: instead use VirtualQuery to
> ask Windows what the base address is of the current address and use that

If we out-of-line any code in this path, we take a tremendous performance hit. Making the chunk size dynamic at all runs into the same issues.

What we used to do was allocate in a loop until Windows happened to give us a 1MiB block -- I don't really think the new technique is worse, even with this awkward side-effect. David's suggestion that we round down is the right one: I really, really should have done that in the first place.
Assignee: nobody → terrence
Great find, David! This patch should also reduce our computation overhead to grab a new block in common cases.
Attachment #8336343 - Flags: review?(dmajor)
Oh, sorry, I did not realize you attached a patch too, David! Rounding down would certainly work too.

Bill, which of the two patches here do you think is the better approach?
(In reply to Terrence Cole [:terrence] from comment #6)
> Oh, sorry, I did not realize you attached a patch too, David! Rounding down
> would certainly work too.
> 
> Bill, which of the two patches here do you think is the better approach?

I think my patch has the same results as yours (if I did my math right). It only rounds down on the "already aligned" case -- rounding down in general is probably not correct.
And it looks like this happens on unix too:
    void *front = (void *)(uintptr_t(region) + (alignment - offset));
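The quoted unix line has the same off-by-a-chunk problem: when `offset` is zero, `front` advances by a full `alignment` instead of staying at `region`. A minimal sketch of the over-allocate-and-trim pattern with the corrected math (this is an illustrative reconstruction of the posix path, not the actual SpiderMonkey code; the function name is assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <sys/mman.h>

// Over-allocate by |alignment|, then unmap the misaligned front and the
// tail, leaving an aligned |size|-byte region.
void *MapAlignedSketch(size_t size, size_t alignment) {
    size_t reqSize = size + alignment;
    void *region = mmap(nullptr, reqSize, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return nullptr;

    uintptr_t offset = uintptr_t(region) % alignment;
    // Corrected math: only move forward when the base is misaligned.
    void *front = (void *)(uintptr_t(region) + (offset ? alignment - offset : 0));
    void *end = (void *)(uintptr_t(front) + size);

    if (front != region)
        munmap(region, uintptr_t(front) - uintptr_t(region));
    size_t tail = uintptr_t(region) + reqSize - uintptr_t(end);
    if (tail)
        munmap(end, tail);
    return front;
}
```

On posix the trim is cheap because `munmap` can release an arbitrary sub-range; on Windows, `VirtualFree` cannot split a reservation, which is why the code there frees the whole 2MB block and re-allocates at the computed address.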
Comment on attachment 8336343 [details] [diff] [review]
reduce_windows_fragmentation-v0.diff

I don't really have any JS experience so I'll send this review to Bill.
Attachment #8336343 - Flags: review?(dmajor) → review?(wmccloskey)
Let's use AlignBytes from jsutil.h. I don't think there's any reason we'll ever use a non-power of two alignment.

Also, bug 671702 contains a pretty exhaustive discussion of our previous attempt to switch to 64KB chunks. It's always worth revisiting old decisions, of course.
(In reply to Bill McCloskey (:billm) from comment #10)
> Let's use AlignBytes from jsutil.h. I don't think there's any reason we'll
> ever use a non-power of two alignment.

Great! I didn't know we had an existing template for that. Using % twice does feel a bit like cheating though ;-).

> Also, bug 671702 contains a pretty exhaustive discussion of our previous
> attempt to switch to 64KB chunks. It's always worth revisiting old
> decisions, of course.

Thanks for the link!
Attachment #8336310 - Attachment is obsolete: true
Attachment #8336343 - Attachment is obsolete: true
Attachment #8336343 - Flags: review?(wmccloskey)
Attachment #8336447 - Flags: review?(wmccloskey)
(In reply to Terrence Cole [:terrence] from comment #11)
> Great! I didn't know we had an existing template for that. Using % twice
> does feel a bit like cheating though ;-).

It's odd that it uses % at all given that it asserts IsPowerOfTwo. Should be able to just do bit math.
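For comparison, here is a sketch of an AlignBytes-style helper in both forms (a reconstruction for illustration; the real jsutil.h signature may differ). Because the alignment is asserted to be a power of two, the double-`%` expression and the single-mask bit math are equivalent:

```cpp
#include <cassert>
#include <cstddef>

static bool IsPowerOfTwo(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Modulo form: the outer % makes the adjustment zero when |bytes| is
// already aligned, avoiding the round-past-alignment bug.
size_t AlignBytesModulo(size_t bytes, size_t alignment) {
    assert(IsPowerOfTwo(alignment));
    return bytes + (alignment - bytes % alignment) % alignment;
}

// Bit-math form: round up by adding alignment-1, then mask off the low bits.
size_t AlignBytesBitMath(size_t bytes, size_t alignment) {
    assert(IsPowerOfTwo(alignment));
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

As noted later in the thread, an optimizing compiler reduces the modulo form to the bit-math form when the alignment is a compile-time constant, so the distinction matters mainly when the alignment crosses a translation-unit boundary.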
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

Review of attachment 8336447 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I don't know why AlignBytes needs to use % at all. I'd rather use it than anything else though.

Thanks very much for tracking this down, David!
Attachment #8336447 - Flags: review?(wmccloskey) → review+
Flags: needinfo?(wmccloskey)
Attached file test.cpp
Seems modern compilers are smart enough to figure it out on their own:

g++ -O3 =>
0000000000400750 <_Z4testm>:
  400750:       48 89 f8                mov    %rdi,%rax
  400753:       ba 24 08 40 00          mov    $0x400824,%edx
  400758:       be 01 00 00 00          mov    $0x1,%esi
  40075d:       48 f7 d8                neg    %rax
  400760:       25 ff ff 0f 00          and    $0xfffff,%eax
  400765:       48 8d 0c 38             lea    (%rax,%rdi,1),%rcx
  400769:       48 8b 3d e0 08 20 00    mov    0x2008e0(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  400770:       31 c0                   xor    %eax,%eax
  400772:       e9 69 fe ff ff          jmpq   4005e0 <__fprintf_chk@plt>
  400777:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40077e:       00 00 

clang++ =>
0000000000400710 <_Z4testm>:
  400710:       89 fa                   mov    %edi,%edx
  400712:       f7 da                   neg    %edx
  400714:       48 81 e2 ff ff 0f 00    and    $0xfffff,%rdx
  40071b:       48 01 fa                add    %rdi,%rdx
  40071e:       48 8b 3d 2b 09 20 00    mov    0x20092b(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  400725:       be 24 08 40 00          mov    $0x400824,%esi
  40072a:       30 c0                   xor    %al,%al
  40072c:       e9 bf fe ff ff          jmpq   4005f0 <fprintf@plt>
  400731:       66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  400738:       0f 1f 84 00 00 00 00 
  40073f:       00
That's because the compiler can inline the call and see that |alignment| is a constant. In our code it can't because alignment is passed in from a different translation unit (jsgc.cpp). Although I guess the unified sources might change that.

It really doesn't matter though. Two extra % ops is nothing here.
(In reply to Bill McCloskey (:billm) from comment #15)
> That's because the compiler can inline the call and see that |alignment| is
> a constant. In our code it can't because alignment is passed in from a
> different translation unit (jsgc.cpp). Although I guess the unified sources
> might change that.
> 
> It really doesn't matter though. Two extra % ops is nothing here.

D'oh! Right you are. If I read both parameters from getenv, I get this code for clang++:

0000000000400710 <_Z4testmm>:
  400710:       48 89 f8                mov    %rdi,%rax
  400713:       31 d2                   xor    %edx,%edx
  400715:       48 f7 f6                div    %rsi
  400718:       48 89 f0                mov    %rsi,%rax
  40071b:       48 29 d0                sub    %rdx,%rax
  40071e:       31 d2                   xor    %edx,%edx
  400720:       48 f7 f6                div    %rsi
  400723:       48 01 fa                add    %rdi,%rdx
  400726:       48 8b 3d 23 09 20 00    mov    0x200923(%rip),%rdi        # 601050 <stdout@@GLIBC_2.2.5>
  40072d:       be 64 08 40 00          mov    $0x400864,%esi
  400732:       30 c0                   xor    %al,%al
  400734:       e9 b7 fe ff ff          jmpq   4005f0 <fprintf@plt>
  400739:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
Yeah, I would definitely expect that if the value is known at compile time. I thought that wasn't the case here, but it turns out that the caller passes a constant, and MSVC pushes it down into the callee. Impressive!

   44 68706203 8b1d6c608468    mov     ebx,dword ptr [mozjs!_imp__VirtualAlloc (6884606c)]
   44 68706209 56              push    esi
   63 6870620a 6a04            push    4
   63 6870620c 6800200000      push    2000h
   63 68706211 6800002000      push    200000h
   63 68706216 6a00            push    0
   63 68706218 ffd3            call    ebx
   64 6870621a 85c0            test    eax,eax
   64 6870621c 742e            je      mozjs!js::gc::MapAlignedPages+0x5c (6870624c)
   67 6870621e 6800800000      push    8000h
   67 68706223 8bf0            mov     esi,eax
   67 68706225 6a00            push    0
   67 68706227 56              push    esi
   67 68706228 ff1568608468    call    dword ptr [mozjs!_imp__VirtualFree (68846068)]
   67 6870622e 8bc6            mov     eax,esi
   68 68706230 6a04            push    4
   68 68706232 25ffff0f00      and     eax,0FFFFFh
I reckon this is worth backporting to Aurora, maybe even Beta, given that it's such a simple change.
Major kudos.
(In reply to Terrence Cole [:terrence] from comment #20)
> https://tbpl.mozilla.org/?tree=Try&rev=bbb9fe129495

The XP Debug JP purple is not from you.

SHIP IT
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #22)
> (In reply to Terrence Cole [:terrence] from comment #20)
> > https://tbpl.mozilla.org/?tree=Try&rev=bbb9fe129495
> 
> The XP Debug JP purple is not from you.
> 
> SHIP IT

Open the tree?
See #jsapi? ;)
Landing on a CLOSED TREE with Ryan's blessing:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e05ba13f8bfa
Just out of interest, do we know when this behavior that causes the fragmentation was introduced? I wonder if that aligns with any of our regressions in OOM crash volume.
I landed this bug in Jan 2012: almost 2 years ago.
(In reply to Terrence Cole [:terrence] from comment #27)
> I landed this bug in Jan 2012: almost 2 years ago.

Thanks. FYI, this doesn't directly relate to the regressions we investigate(d) in bug 837835, but as this bug was already present before the regressions happened, it's always possible that it played a part (e.g. by its symptoms becoming worse due to other changes).
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 720439.

User impact if declined: Increased probability of OOM.

Testing completed (on m-c, etc.): Try and M-C builds as well as http://people.mozilla.org/~sguo/mochimem/viewer.html?url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/1385065562/mozilla-central_win7-ix-debug_test-mochitest-browser-chrome-bm69-tests1-windows-build184.txt.gz&url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dmajor@mozilla.com-a9643c5b477b/try-win32-debug/try_win7-ix-debug_test-mochitest-browser-chrome-bm69-tests1-windows-build1420.txt.gz

Risk to taking this patch (and alternatives if risky): Low.

String or IDL/UUID changes made by this patch: None.
Attachment #8336447 - Flags: approval-mozilla-beta?
Attachment #8336447 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/e05ba13f8bfa
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

Talked this over in stability meeting this morning. We're going to take this to beta because it is very likely to help us with some memory issues and there's been a significant increase in OOM-caused crashes since FF25 (as we now know). This patch should help, and we hope it will be obvious if it causes other issues - should that occur, we can back out in a week for final beta.
Attachment #8336447 - Flags: approval-mozilla-beta?
Attachment #8336447 - Flags: approval-mozilla-beta+
Attachment #8336447 - Flags: approval-mozilla-aurora?
Attachment #8336447 - Flags: approval-mozilla-aurora+
I am on the fully updated beta (26) channel, and after the last few days I can say this fix did not help with my virtual memory fragmentation issue -- the progression is here: https://bugzilla.mozilla.org/show_bug.cgi?id=930797#c17 . "vsize-max-contiguous" still goes to zero every day or two, which inevitably crashes FF. I am able to avoid crashes only by pre-emptively exiting FF (at the first sign of visual corruption). Any guesses what could be done about this?
(In reply to User Dderss from comment #34)

vsize-max-contiguous can be misleading in isolation. It's true that fragmentation can decrease max-contiguous, but high memory usage can also decrease max-contiguous. We need to be more careful before blaming fragmentation (developers too!).

In bug 930797 comment 17, your vsize grows to very high levels (approaching 4gb) so I think the real problem is high memory usage rather than fragmentation. The code change in this bug might delay your crash, but it won't fix it. I would be interested in seeing an xperf trace of your scenario. If you are willing to collect this data, I will try to write up instructions in the next couple of days. In any case, let's move the discussion back to bug 930797.
(In reply to User Dderss from comment #34)
> I am on fully updated beta (26) channel, and after last few days I can say
> this fix did not help with my virtual memory fragmentation issue -- the
> progression is here: https://bugzilla.mozilla.org/show_bug.cgi?id=930797#c17


Let's keep the discussion of your issue in that bug, as it obviously has causes different from what's being discussed here.
Memory fragmentation is one piece of the puzzle of the memory issues Firefox has been seeing on different people's computers, and one "easy" case of it has been solved here, so the fix is worthwhile. As David says, it's not a silver bullet that fixes everything - in your case, it might not have much of an influence at all. We know we need to keep investigating.
And there's pretty surely not a silver bullet (i.e. one single patch that will fix everything) for the memory issues at all anyhow, esp. as the problems didn't arrive all at once, so they won't be fixed all at once. Your problem might be mostly one issue, another user's problems might be due to a completely different issue - and some issues might be eased with the fix here as well.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #32)
> Should we consider this for ESR24 as well?

Terrence, do we want to backport this VirtualAlloc fix to ESR 24?
Flags: needinfo?(terrence)
Yes, absolutely; I thought that Ryan had already pushed to all branches. Good catch!
Flags: needinfo?(terrence)
Can't push without approval!
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Extremely low risk patch which greatly reduces heap fragmentation.
User impact if declined: More OOMs than otherwise.
Fix Landed on Version: 26, 27, 28
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #8336447 - Flags: approval-mozilla-esr24?
David, can you please confirm this is fixed in Firefox 26, 27, and 28?
Flags: needinfo?(dmajor)
(In reply to Anthony Hughes, QA Mentor (:ashughes) [unavailable until Jan 2, 2014] from comment #41)
> David, can you please confirm this is fixed in Firefox 26, 27, and 28?

Confirmed based on comment 30 and 32, plus I examined v26 under a debugger to make sure.
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #42)

Thanks for your help, David.
Comment on attachment 8336447 [details] [diff] [review]
reduce_heap_fragmentation-v1.diff

I appreciate that this landed in 26 and has a significant impact but it doesn't meet the criteria for uplift to ESR24 branch and there's no indication from users that this should be given an exception. 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process
Attachment #8336447 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24-
Blocks: defrag