Closed Bug 1649279 Opened 4 years ago Closed 4 years ago

MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence, at mozilla/interceptor/PatcherDetour.h:1262 from xpcshell trying to run httpd for tests - Can't run any mochitests on Windows 10 with a debug build

Categories

(Core :: mozglue, defect, P1)

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox80 --- wontfix

People

(Reporter: Gijs, Assigned: handyman)

References

(Blocks 1 open bug)

Details

STR:

  1. win10 x64 machine
  2. mozconfig:
mk_add_options MOZ_DEBUG=1
mk_add_options MOZ_OBJDIR="d:/builds/frontend-debug/"

ac_add_options --disable-compile-environment
ac_add_options --enable-debug
ac_add_options --enable-artifact-builds
  3. ./mach build && ./mach package
  4. ./mach mochitest --appname=dist browser/base/content/test/performance/browser_startup.js (or any other test in this dir, AFAICT)

ER:

test runs

AR:

 1:47.93 DLL blocklist was unable to intercept AppInit DLLs.
 1:47.93 Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at /builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/PatcherDetour.h:1262

Running with ./mach run works fine. It's not clear to me what's going on here, or how to investigate. :-(

It looks like this is actually happening inside xpcshell.exe, not inside Firefox. This has meant that I haven't really been able to debug this, as the --debugger switch only affects the app, not the xpcshell binary we use to run tests.

:aklotz, any idea what would cause this?

Flags: needinfo?(aklotz)
Summary: MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence, at mozilla/interceptor/PatcherDetour.h:1262 - Can't run performance tests on Windows 10 with a debug artifact build → MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence, at mozilla/interceptor/PatcherDetour.h:1262 from xpcshell trying to run httpd for tests - Can't run any mochitests on Windows 10 with a debug build

I'm going to bounce this over to David, but in the meantime, Gijs, could you please give us the build number of your Win10 installation?

Flags: needinfo?(aklotz) → needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(davidp99)

I'm on a Windows Insider build, in the "Dev Channel", formerly the "fast" ring, version 2004, build 20152.1000.

Flags: needinfo?(gijskruitbosch+bugs)

In case it's helpful, I commented out the assert and replaced it with:

          printf("Got 0x%x followed by 0x%x\n", (*origBytes), (origBytes[1]));

which prints

Got 0xba followed by 0x70

I can't really tell from the file what this code is supposed to do at a high level; it's patching something, but I'm not sure what. I assume it's based on some windows DLL? I can probably attach whatever it is, or run debugging patches if that's helpful...
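For anyone repeating this experiment, the two-byte printf can be widened to show a longer run of bytes at the patch site. The following is only a minimal sketch: FormatBytes is a hypothetical helper, not something in PatcherDetour.h, and it assumes the bytes are reachable through a plain byte pointer, which may not match how the interceptor actually exposes them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the Mozilla interceptor): formats
// `count` bytes starting at `bytes` as space-separated lowercase hex.
std::string FormatBytes(const uint8_t* bytes, size_t count) {
  assert(bytes != nullptr || count == 0);
  std::string out;
  char buf[4];  // two hex digits plus the terminating NUL
  for (size_t i = 0; i < count; ++i) {
    std::snprintf(buf, sizeof(buf), "%02x", bytes[i]);
    if (i != 0) {
      out += ' ';
    }
    out += buf;
  }
  return out;
}
```

A call such as printf("Got %s\n", FormatBytes(ptr, 16).c_str()); would then print the first 16 bytes on one line, where ptr is an assumed raw pointer to the function being patched.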

Thanks :gijs. I wasn't able to reproduce the failure on the latest Windows but it turns out I'm actually able to run the latest dev preview in a VM (for a short time anyway) so I should be able to take it from here.

FYI, the DLL interceptor writes a "trampoline" into the beginning of some DLL-exported functions (see TestDllInterceptor.cpp for a list) -- it JMPs to our replacement code, which may JMP back to the original DLL code (after reproducing the instructions that we overwrote with our first JMP). I haven't gotten to the xpcshell part yet, but you would quickly find that debuggers and hijacking assembly code at runtime do not mix well. The DLL interceptor really wants printf debugging. My first step will likely be to do something similar, but printing the entire trampoline region, because there is probably some assembly code the DLL interceptor doesn't recognize or misunderstands. Designing reliable diagnostics to simplify this is tough because, e.g., failures tend to show up as crashes.
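To make the "first JMP" concrete, here is a rough sketch of how a near jump could be encoded on x86/x86-64. It covers only the 5-byte form (opcode 0xE9 followed by a 32-bit displacement measured from the end of the instruction). EncodeJmpRel32 is a made-up name for illustration, and this is not the interceptor's own code, which presumably has to handle other jump forms and targets that are out of rel32 range.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not Mozilla's implementation.
// Encodes "JMP rel32" placed at address `from` so that it lands on `to`.
// The displacement is relative to the address just past the 5-byte
// instruction, i.e. from + 5.
std::array<uint8_t, 5> EncodeJmpRel32(uint64_t from, uint64_t to) {
  const int64_t delta =
      static_cast<int64_t>(to) - static_cast<int64_t>(from + 5);
  // A rel32 jump can only reach targets within roughly +/- 2 GiB.
  assert(delta >= INT32_MIN && delta <= INT32_MAX);
  const uint32_t rel = static_cast<uint32_t>(static_cast<int32_t>(delta));
  std::array<uint8_t, 5> out = {
      0xE9,                               // JMP rel32 opcode
      static_cast<uint8_t>(rel),          // displacement, little-endian
      static_cast<uint8_t>(rel >> 8),
      static_cast<uint8_t>(rel >> 16),
      static_cast<uint8_t>(rel >> 24)};
  return out;
}
```

Overwriting the first five bytes of a function with such a JMP is what makes it necessary to understand the original instructions at that spot: whole instructions have to be copied elsewhere before they are clobbered, so an unrecognized byte sequence leaves the patcher with no safe way to proceed.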

Assignee: nobody → davidp99
Severity: -- → S3
Flags: needinfo?(davidp99)
Priority: -- → P1

:gijs, are you still seeing this? I realized I got the latest Windows 10 build from the fast channel, which is now 20161.1000 (you had 20152), and I can't reproduce the failure. This would make sense if a DLL we were hooking had been changed by MS in 20152 to something we can't handle, then changed back in 20161. If you have been updated to the latest Windows build and still see this, then maybe we've got a bigger incompatibility.

Flags: needinfo?(gijskruitbosch+bugs)

Yes, this appears to work again on 20161. Huh. I guess we can close it and reopen if it returns?

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(gijskruitbosch+bugs)
Resolution: --- → WORKSFORME
Blocks: 1668057