Closed Bug 1533808 Opened 9 months ago Closed 8 months ago

Unrecognized opcode sequence assertion while initializing the IOInterposer on Windows 10 x64 debug sandboxed child processes

Categories

(Core :: mozglue, enhancement, P1)

62 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: gregtatum, Assigned: handyman)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I'm breaking out this issue into its own bug. Bug 1529125 hooks the IOInterposer into the creation of the content process. This means that more threads will likely be registered with the interposer, so it is most likely receiving new operations it hasn't been able to collect yet.

I can't land that code yet, as it's running into assertions on Windows 10 x64 debug builds.

In Bug 1529125, Aaron Klotz said:

This is nsWindowsDllInterceptor telling us that the instruction sequence at the beginning of the function being hooked is something that we do not yet support.

Seeing a disassembly (or even better, a minidump) of the function in question would be helpful here.

Assuming the line numbers in PoisonIOInterposerWin.cpp haven't changed, NtReadFile would be the function whose disassembly I'd want to look at (preferably side by side with the bytes of the function).

Florian replied with:

I'm seeing this assertion on this try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b83546bbafd008708e18df6602cec454d58989f2&searchStr=Windows%2C10%2Cx64%2Cdebug&selectedJob=231325635

The minidump c96d5b4b-fe2e-43dc-9569-219142cd4bfc.dmp might be what you need.

ntdll.dll version is 6.2.17134.254

Here is a full try run from Bug 1529125 that shows the errors happening on Windows 10 x64 debug builds.

:aklotz I'm moving the needinfo from the other bug, to this one. For some background information, this work is for helping support the performance team with their investigation, especially around startup performance. We have new FileIO support for the Firefox Profiler front-end, and we're working on making this information really useful now.

Flags: needinfo?(aklotz)
Blocks: 1529125

Yep, I'm excited about this! I'll follow up as soon as I can. Leaving the ni? set for now.

Looking at the circumstances around these failures, I suspect that your issue is a bad interaction with the sandbox.

The sandbox hooks the same functions that PoisionIOInterposer does. If the sandbox sets those hooks before we set ours, then I could see us having some problems there.

Yes, we don't currently support mov r64, imm64, which is written there by the sandbox.

A naive patch that adds this support fixes the assertion, but it is less than ideal for a couple of reasons:

  • It adds an extra, redundant jmp;
  • It requires the interceptor to use a 10-byte patch, which we need to avoid. (Those are only there to work around third-party code and should not be used for our own stuff).

The ideal solution here is that the interceptor recognizes the

mov rXX, imm64
jmp rXX

Instruction sequence and understands how to substitute a different immediate value into the mov instruction.

Component: XPCOM → mozglue
Flags: needinfo?(aklotz)
Summary: Unrecognized opcode sequence while running the IOInterposer on Windows 10 x64 debug builds → Unrecognized opcode sequence assertion while initializing the IOInterposer on Windows 10 x64 debug sandboxed child processes

I may have waded too deep into this one. I'm not sure I know enough of about the Windows environment and assembly to be able to fix this one myself. Do you know anyone that may have some capacity to take this one on :aklotz? I'm not sure if we can do an easy fix first, and then follow-up with the ideal solution. It would at least unblock the other work.

Flags: needinfo?(aklotz)

Let me see if we can find some time to look at this. Aaron's pretty busy right now but maybe we have other devs who can tackle this. What's the priority here?

Flags: needinfo?(aklotz) → needinfo?(gtatum)

This is pretty high on the Profiler's team's priorities, as it's blocking part of our quarterly OKRs of delivering solid FileIO metrics. It is also affecting other performance team's investigations into IO usage. So I would say P1 on our end. Any help would be greatly appreciated.

Flags: needinfo?(gtatum)

(In reply to Jim Mathies [:jimm] from comment #6)

Let me see if we can find some time to look at this. Aaron's pretty busy right now but maybe we have other devs who can tackle this. What's the priority here?

Any luck finding someone who has time to work on this?

As Greg said in comment 7, this is pretty important. It blocks us from fixing initialization of the IOInterposer in content processes (ie. the patch in bug 1529125), and blocks our attempts to introduce regression tests for the correctness of the IO interposer, and for startup main thread I/O.

If we can't find someone to properly fix this, we'll need to discuss workarounds.

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Priority: -- → P2

Due to coming changes involving the IOInterposer, the WindowsDllInterceptor may be set up later than the sandbox. The sandbox hooks some of the same functions, so the Interceptor is running into its hooks instead of the original implementations it anticipated. This patch allows it to recognize and efficiently patch those hooks when that happens.

So the deal is, interceptor now gets the chromium sandbox trampoline function [1]:

oldFn:
mov rXX, sandboxFn /* imm64 */
jmp rXX

and it wants to replace the reference to it with:

trampFn:
mov rXX, interceptorFn /* imm64 */
jmp rXX

and to write:

replacementFn:
mov rXX, sandboxFn /* imm64 /
jmp rXX
/
the replacement fn would normally jump back to the rest of oldFn, but there is no more */

and to set:

oldFnPtr = replacementFn

where oldFnPtr is a variable that interceptorFn can use to call the original that it replaced.

This patch instead follows Aaron's suggestion and writes:

oldFnPtr = sandboxFn

and skips writing replacementFn altogether, which is faster and cleaner.

This is only a complete solution if we are giving up the idea of sandboxing the ability to write to executable memory in content (which is at best far off as it would mean moving JIT out-of-process).

We lucked out in that our trampoline uses movabs r11, imm64 (0x49 0xbb) while Chromiums uses mov r8, imm64 (0x48 0xb8) so we can tell the difference in Clear without keeping extra metadata.

We need to maintain that we are destroyed in the proper (now, reversed) order.

[1] https://searchfox.org/mozilla-central/rev/b3ac60ff061c7891e77c26b73b61804aa1a8f682/security/sandbox/chromium/sandbox/win/src/resolver_64.cc#24

Assignee: nobody → davidp99
Priority: P2 → P1
Blocks: 1540135
Pushed by daparks@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/88df194b6f92
Recognize Chromium-sandboxed methods in WindowsDllInterceptor r=aklotz
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
You need to log in before you can comment on or make changes to this bug.