Closed Bug 1484835 Opened Last year Closed 10 months ago

write real jit unwind handling on aarch64 windows

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

ARM64
All
enhancement

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: froydnj, Assigned: dmajor)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We disabled the jit unwind handling for AArch64 Windows so we could get things working.  This bug is about re-enabling it, which will likely require a good deal more cleverness than "blanket this region with a single exception handler" strategy we employ on x86-64.
Priority: -- → P3
Blocks: 1495879
Does the "blanket this region with a single exception handler" scheme not work on AArch64 Windows?  Because if we actually have to for-real unwind, then yes, it will take significantly more effort.  Talking to Chakra, it sounds like the easiest way is to preserve fp (bug 1426134) and then you can write somewhat-simple unwind logic with their mostly-undocumented JIT API.
(In reply to Luke Wagner [:luke] from comment #1)
> Does the "blanket this region with a single exception handler" scheme not
> work on AArch64 Windows?  Because if we actually have to for-real unwind,
> then yes, it will take significantly more effort.  Talking to Chakra, it
> sounds like the easiest way is to preserve fp (bug 1426134) and then you can
> write somewhat-simple unwind logic with their mostly-undocumented JIT API.

For inexplicable reasons (memory usage?), RUNTIME_FUNCTION, which we use here:

https://searchfox.org/mozilla-central/source/js/src/jit/ProcessExecutableMemory.cpp#105-110

and fill in here:

https://searchfox.org/mozilla-central/source/js/src/jit/ProcessExecutableMemory.cpp#135-145

is significantly different on AArch64 Windows vs. x86-ish.  The AArch64 RUNTIME_FUNCTION looks like it actually expects to cover a bona fide function:

typedef struct _IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY {
    DWORD BeginAddress;
    union {
        DWORD UnwindData;
        struct {
            DWORD Flag : 2;
            DWORD FunctionLength : 11;
            DWORD RegF : 3;
            DWORD RegI : 4;
            DWORD H : 1;
            DWORD CR : 2;
            DWORD FrameSize : 9;
        } DUMMYSTRUCTNAME;
    } DUMMYUNIONNAME;
} IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY, * PIMAGE_ARM64_RUNTIME_FUNCTION_ENTRY;

Always preserving FP on AArch64 is a lot more reasonable (so many registers!) than x86-ish processors, so that's probably the right way forward.  What's the mostly-undocumented JIT API?  RtlAddFunctionTable and friends?  Or something else?
Ah, weird.  I can see why it would be nice to nail down enough of the ABI that one could unwind declaratively rather than calling some arbitrary callback.  But what determines whether the union's UnwindData vs. DUMMYSTRUCTNAME members are accessed?  I wonder if UnwindData is the escape hatch and we can use that.  (There's still the question of where EndAddress goes if UnwindData is used.)

Yeah, RtlAddFunctionTable and friends were, last I looked, rather undocumented.
There's some documentation at https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=vs-2017#arm64-exception-handling-information .

It looks like if DUMMYSTRUCTNAME.Flag == 0, then UnwindData is interpreted as an RVA to an exception information record.

Although instead of a direct EndAddress, it has an 18-bit field for FunctionLength/4, so we'd need a separate entry for each megabyte of code.
(Or perhaps RtlInstallFunctionTableCallback could save us a bit of trouble by only dealing with these as needed.)
Luke -- I don't know how helpful it will be, but ARM's implementation of their ABI logic in C++ is here: https://git.linaro.org/arm/vixl.git/tree/src/aarch64/abi-aarch64.h.
I'm (slowly and intermittently) poking at this. Three options I'm exploring:

* 1024 separate RtlAddFunctionTable calls, one for each megabyte of the region. (We're not supposed to let region boundaries split an epilogue, but given that we're not doing proper unwinding, maybe we can sneak by.) I know the repetition sucks, but at least it keeps us on similar codepaths to existing tested code.

* An RtlInstallFunctionTableCallback that, when called, returns a RUNTIME_FUNCTION that does an analogous thing to the x86_64 code.

* An RtlInstallFunctionTableCallback that calls breakpad right then and there. This seems like the easiest option, but I'd want to convince myself that it's safe, e.g. NT isn't holding locks while using the callback. And also that the callback is only called when an exception happens, and not e.g. someone is RtlVirtualUnwind'ing us. ARM64 doesn't seem to have a concept of UNW_FLAG_EHANDLER etc.
Agreed that the third bullet seems attractive.  We might even be able to switch the x64 path to use it, to get the full testing weight of x64 and reduce platform-specific differences.  I worry that we might miss (in normal automated testing) rare cases that call the callback when not handling an exception.

Is the idea for the second bullet that the PGET_RUNTIME_FUNCTION_CALLBACK would simply return a RUNTIME_FUNCTION that narrowly contained whatever the given pc was and so this could be considered a laziness optimization of the first bullet?

For the first bullet: agreed that it seems fine to arbitrarily slice functions overlapping the 1mb boundary, given that our RUNTIME_FUNCTION is simply calling breakpad.  We could also create these lazily, in jit::AllocateExecutableMemory() (which is a coarse-grained page allocator).  But this seems like more work than the second bullet.
> * An RtlInstallFunctionTableCallback that calls breakpad right then and there.

Well, I just realized that function table callbacks aren't passed enough context to be able to call Breadkpad meaningfully -- no exception record, etc. So that option seems to be out.
Note: I changed the conditionals related to the JIT exception handler in bug 1514209, disabling the code on aarch64 for now.
Sounds like dmajor is looking at this.
Assignee: nobody → dmajor

I think I've found a path that can make this work. From comment 7:

  • An RtlInstallFunctionTableCallback that, when called, returns a
    RUNTIME_FUNCTION that does an analogous thing to the x86_64 code.

I was worried that the 1M size limit in the xdata record would prevent the RUNTIME_FUNCTION from being able to say that it applies to the code that just crashed.

It turns out this is not a problem because the length field is not checked at all. I can set it to zero and it still works. This tells me that Windows is doing its address bookkeeping entirely based on the parameters to RtlInstallFunctionTableCallback, which are large enough to cover our JIT regions.

So we can bake a single generic RUNTIME_FUNCTION into the first page of the region, and the callback can just return a pointer to it. It's a little more indirect than the x86_64 one, but it doesn't have to worry about an EndAddress field.

Locally I made JIT code be super crashy (by reverting bug 1501269) and got the exception to flow all the way to breakpad with a successful report at Socorro. (I also disabled E10S to work around bug 1517729.)

Before review I want to do some more testing to make sure this doesn't blow things up horribly.

For lack of arm64 test coverage, I applied comment 12 to x86_64 as well, and sent it through try. Green: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=ad91e236b6be95b99e48320d4f271b3968b904d3

Back on arm64, I confirmed that this setup doesn't cause Breakpad to get wrongly invoked when someone calls RtlVirtualUnwind with UNW_FLAG_NHANDLER, which was one of my concerns in comment 7.

Luke: what do you think about landing the RtlInstallFunctionTableCallback approach for both arm64 and x86_64, so that the x86_64 install base could tell us if something's wrong? Otherwise I'm worried that if there's a subtle problem, it'll go undetected due to the currently-low install base of arm64.

Oh, and just to double-check: the JIT regions live basically forever, right? I ask because there's no such thing as RtlUninstallFunctionTableCallback.

Flags: needinfo?(luke)

I think that's a great idea to improve symmetry and testing between x86_64 / arm64, so yes.

The JIT region almost lives forever; technically the code is realized when JS_ShutDown() is called:

We already don't do this if there are any live JSRuntimes (as a hack to paper over leaking situations). I'm not sure what would go wrong if we simply never released the machine code. It's not like we support a sequence of init->shutdown->init->shutdown (libraryInitState asserts this, even) or we'll need the memory for something else after shutdown. Jan, can you think any reason we need to release the executable code region?

Flags: needinfo?(luke) → needinfo?(jdemooij)
Because the .xdata format on ARM64 can only encode sizes up to 1M (much too small for our JIT code regions), we register a function table callback to provide RUNTIME_FUNCTIONs at runtime. Windows doesn't seem to care about the size fields on RUNTIME_FUNCTIONs that are created in this way, so the same RUNTIME_FUNCTION can work for any address in the region. We'll set up a generic one in RegisterExecutableMemory and the callback can just return a pointer to it.

(In reply to Luke Wagner [:luke] from comment #14)

We already don't do this if there are any live JSRuntimes (as a hack to paper over leaking situations). I'm not sure what would go wrong if we simply never released the machine code. It's not like we support a sequence of init->shutdown->init->shutdown (libraryInitState asserts this, even) or we'll need the memory for something else after shutdown. Jan, can you think any reason we need to release the executable code region?

Not releasing would be OK I think (as long as leak checking tools don't complain). At that point the pages should be just reserved, not committed.

However we should keep the asserts on that path. We could rename ReleaseProcessExecutableMemory for that and add a comment there documenting why we're doing it this way.

Flags: needinfo?(jdemooij)

I feel uneasy about changing those lifetimes, mostly out of unfamiliarity with the code.

What if we made the function table callback more aware of what's going on? Instead of using the allocation region as the callback's Context parameter, we could give it the ProcessExecutableMemory object. The callback could then test initialized().

Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d932b82695c
Extend the Windows JIT unwind handler to ARM64 r=luke
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.