Closed Bug 1441473 Opened 2 years ago Closed Last year

ARM64 mmap loop is slow and will defeat ASLR

Categories

(Core :: JavaScript: GC, defect, P1)

ARM64
All
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: lth, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [geckoview:fxr:p2][arm64:m3])

Because ARM64 has 48-bit addresses and the JS engine basically requires them to be 47-bit, there is a workaround at the lowest level of our memory allocator that ensures that the JS engine will not see addresses with bit 47 set.  The workaround is a loop that iterates over a known address area and attempts to allocate 1MB aligned chunks in that area.

This algorithm has two problems:

- heap addresses become somewhat predictable since the starting address
  of the area is a known constant and the loop iterates predictably from low to
  high addresses every time a chunk is requested; my guess is that for a
  sufficiently large request (such as an ArrayBuffer) the address can be made
  almost completely predictable and certainly it can be made to span a range
  of predictable addresses

- the loop will tend toward O(n^2) behavior, with the unit of operation being
  a mmap call; if all our blocks are the same size this will basically be a
  first-fit allocator with little fragmentation, but we'll still step across
  all the blocks that are allocated and incur significant overhead for that,
  and there's really no reason to believe all our blocks are the same size,
  so we might fragment significantly and thus make the search even longer

Well-known block management and blacklisting algorithms might be brought to bear on these problems, and we should try to play nice with ASLR even if that means we implement our own.
Jon, this needs an owner.
Flags: needinfo?(jcoppeard)
Priority: P3 → P1
Target Milestone: --- → mozilla61
P1 means we want this for 61, right? I'd like to take this, I'll try to start committing some time to it (and related cleanup) in the next few days.
That would be excellent, thanks!

I flagged it for 61 because: some distros do build for arm64 even if we don't; the more functionality we add / bugs we fix for arm64 the more attractive building for it will be; and the more likely the mmap issues are to bite those distros.  I don't care too much if performance is bad but I do worry about aslr.
Thanks for taking this Emanuel.
Flags: needinfo?(jcoppeard)
Emanuel, have you had a chance to look at this mmap issue?

I'm tagging this bug with [geckoview:crow] because the Firefox Reality VR browser (aka "Crow") will be the first official Mozilla builds we ship for ARM64.
Flags: needinfo?(emanuel.hoogeveen)
Whiteboard: [geckoview:crow]
I apologize for the delay here, life got in the way :( Leaving the needinfo for myself to remind me to prioritize this.
Out of curiosity and since it affects sparc64 as well, what is the intended way to fix this?

The extended address spaces are also available on x86_64 now in the Linux kernel, although the kernel knows a way to limit the address space for affected processes.
Optional for 62 per cpeterson, but still a high priority.
(In reply to Jason Orendorff [:jorendorff] from comment #8)
> Optional for 62 per cpeterson, but still a high priority.

Yes. The first release of the Firefox Reality VR browser will ship with 62 (or maybe 63) on 32-bit ARM only. So we don't need to fix this ARM64 issue in 62, but it would be good to fix in 63 or 64 because the following Firefox Reality release will include ARM64.
Whiteboard: [geckoview:crow] → [geckoview:fxr]
See Also: → 1461480
After discussing this with Lars, I'm fairly convinced that this is also the reason why TSan does not start up properly on ARM64. When I start the jsshell, it seems to loop forever and I was able to catch this backtrace:


Thread 1 "js" received signal SIGINT, Interrupt.
__tsan::MetaMap::FreeRange (this=this@entry=0x26fee08 <__tsan::ctx_placeholder+8>, proc=proc@entry=0xffffb6e00008, p=<optimized out>, sz=sz@entry=8192) at compiler-rt/lib/tsan/rtl/tsan_sync.cc:90
90          u32 idx = *meta;
(gdb) bt
#0  __tsan::MetaMap::FreeRange (this=this@entry=0x26fee08 <__tsan::ctx_placeholder+8>, proc=proc@entry=0xffffb6e00008, p=<optimized out>, sz=sz@entry=8192) at compiler-rt/lib/tsan/rtl/tsan_sync.cc:90
#1  0x00000000004b29c4 in __tsan::MetaMap::ResetRange (this=0x26fee08 <__tsan::ctx_placeholder+8>, proc=0xffffb6e00008, p=281473741635584, p@entry=281473741488128, sz=<optimized out>, sz@entry=1048576) at compiler-rt/lib/tsan/rtl/tsan_sync.cc:164
#2  0x0000000000442f84 in __interceptor_munmap (addr=0xffffb6600000, sz=1048576) at compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:785
#3  0x0000000001416804 in js::gc::MapMemory (length=<optimized out>, prot=3, flags=34, fd=-1, offset=0) at js/src/gc/Memory.cpp:602
#4  js::gc::MapAlignedPages (size=1048576, alignment=1048576) at js/src/gc/Memory.cpp:628
#5  0x0000000001370510 in js::gc::Chunk::allocate (rt=0xffff71b00000) at js/src/gc/Allocator.cpp:675
#6  js::gc::GCRuntime::getOrAllocChunk (this=0xffff71b00720, lock=...) at js/src/gc/Allocator.cpp:602
#7  0x0000000001418604 in js::Nursery::allocateNextChunk (this=0xffff71b030c0, chunkno=<optimized out>, lock=...) at js/src/gc/Nursery.cpp:1110
#8  0x00000000014182b0 in js::Nursery::init (this=0xffff71b030c0, maxNurseryBytes=<optimized out>, lock=...) at js/src/gc/Nursery.cpp:166
#9  0x0000000001377a58 in js::gc::GCRuntime::init (this=0xffff71b00720, maxbytes=<optimized out>, maxNurseryBytes=16777216) at js/src/gc/GC.cpp:1293
#10 0x0000000000fd24d0 in JSRuntime::init (this=0xffff71b00000, cx=<optimized out>, maxbytes=33554432, maxNurseryBytes=16777216) at js/src/vm/Runtime.cpp:215
#11 0x0000000000eef058 in js::NewContext (maxBytes=33554432, maxNurseryBytes=16777216, parentRuntime=<optimized out>) at js/src/vm/JSContext.cpp:149
#12 0x0000000000cccc74 in JS_NewContext (maxbytes=33554432, maxNurseryBytes=<optimized out>, parentRuntime=0x0) at js/src/jsapi.cpp:472
#13 0x00000000004df104 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:9259



I'm not sure if TSan will help us fix the original problem (random GC crashes, maybe due to races), but for now, this bug also blocks the proper use of TSan on ARM64.
I should add that it is not a given that fixing this to be performant and ASLR-aware will allow TSan to run, since TSan's replacement mmap may still hand out addresses that we can't use; but at the moment our mmap/munmap search loop really makes TSan grind to a halt.  Presumably the TSan mmap replacement violates assumptions in our code.
[geckoview:fxr:p2] because Firefox Reality 1.0 will not include ARM64 support.
Whiteboard: [geckoview:fxr] → [geckoview:fxr:p2]
Marking as fix-optional for 63, but still P1 as this is a blocker for fuzzing with TSan on ARM64. (comment 10)
Jon, do you know who might be able to look at this issue?
I think this issue is not restricted to ARM64 but to any platform which might be able to allocate pointer on 48 bits or more.
Flags: needinfo?(jcoppeard)
ARM64 is the only tier1 platform affected by this at the moment (there's noise that x64 may be a problem in the future), but several tier3 platforms are also affected, see ifdefs in the code.

ARM64 performance will likely suffer if this is not fixed.
I'm really sorry for my tardiness on this issue, I've been struggling to adjust to some changes in my life. I have a patch almost ready to clean up the GC allocation functions (the ones in gc/Memory.cpp) that I hope to submit soon. That's more of a step 0 as it doesn't remove the mmap loop, but I'm hoping that it will make things more readable overall.
I would be interested to test that patch on sparc64, too :-).
Whiteboard: [geckoview:fxr:p2] → [geckoview:fxr:p2][arm64:m3]
See Also: 1461480
I'm working on this in bug 1502733. That bug was originally going to just be cleanup, but the patch in that bug seems to have made things marginally slower on ARM64, causing timeouts, so I decided to go ahead and fix this issue there.

Part 2 in that bug removes the mmap loop, opting instead to take advantage of the large address space on 64-bit platforms by choosing (aligned) addresses randomly within it. This should have a very high chance of succeeding on the first try assuming we haven't run out of physical memory, and choosing addresses at random within the 47-bit address space gives us the strongest possible ASLR (short of reducing alignment restrictions).

Unfortunately the patch is currently running into failures on aarch64, the very platform it's meant to fix. We seem to be getting segmentation faults, but I don't have any stacks for them yet. Hopefully we can figure out what's going wrong soon; I'll try to land it soon after the 65 train leaves.
Depends on: 1502733
Flags: needinfo?(emanuel.hoogeveen)
Flags: needinfo?(jcoppeard)

Emanuel, now that your fixes for bug 1502733 have landed, can we resolve this bug as fixed?

Flags: needinfo?(emanuel.hoogeveen)

Yes, this is done. Fixed by bug 1502733.

Status: NEW → RESOLVED
Closed: Last year
Flags: needinfo?(emanuel.hoogeveen)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.