browser_utility_crashReporter.js and browser_utility_audioDecodeCrash.js fail to crash/pass when run on windows 11
Categories
(Core :: IPC, defect, P2)
Tracking
()
People
(Reporter: jmaher, Assigned: gerard-majax)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(1 obsolete file)
currently we run on windows 10, we are looking to get tests running on windows 11 in the new year. There are some failures that are new and unexpected.
You can see both the errors in this try push.
I am not sure how to decipher this, but I am happy to hack and try other things if there are suggestions.
Reporter | ||
Comment 1•2 years ago
|
||
:jld, any suggestions on how to get more info or fix this?
also there might be a related failure on xpcshell (here). this crashes on shutdown in toolkit/crashreporter/test/unit/test_crash_terminator.js
.
Comment 2•2 years ago
|
||
I don't know much about this, but :gerard-majax is the de facto owner of the Utility process so might have some ideas.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 3•2 years ago
|
||
Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPropertyBag2.getPropertyAsAString]
This suggest something is not right with how processes are crashing: https://searchfox.org/mozilla-central/rev/aa1bb4f0ca8bfda4b117d1befca47b72d5dd6d5d/ipc/glue/test/browser/head.js#266-267 we dont have a dumpID ?
Reporter | ||
Comment 4•2 years ago
|
||
oh interesting; Maybe there is something with the OS version, or some expected value from a library/dll when crashing that windows 11 has different or some permissions thing.
Reporter | ||
Comment 5•2 years ago
|
||
if you were going to push to try, you can reproduce it with:
./mach try fuzzy --worker-override="win10-64-2004=gecko-t/win11-64-2009-alpha" --no-artifact -q 'test-windows10' --rebuild 2 ipc/glue/test/browser/
likewise I could poke around a bit tomorrow or test patches out.
Assignee | ||
Comment 6•2 years ago
|
||
I dont know anything about Windows 11, it will take me a while before I can investigate since this requires me to setup a VM ...
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
After much failed attempts with the TC loaner, I setup a Win11 21H2 local VM and it looks like I can repro the issue. :jmaher
was unable to repro, and his VM was 22H2. The TC workers are also on 21H2.
Reporter | ||
Comment 8•2 years ago
|
||
maybe it is worth considering 22H2? :Jonathan?
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #8)
maybe it is worth considering 22H2? :Jonathan?
Working on this today!
Comment 10•2 years ago
|
||
We have identified the root cause for the test failure in bug 1800898 and are working on a fix. Since 21H2 seems to have version-specific bugs, it is quite interesting for us to have the tests run on it at the moment. Could we wait that 21H2-specific bugs are fixed before moving to 22H2?
Comment 11•2 years ago
|
||
r=handyman
Comment 12•2 years ago
•
|
||
In order to give us more time to prepare a proper patch for the general problem as stated in bug 1800898, I propose a patch that will bypass the specific instance of the problem that we have identified on our Windows 11 21H2 workers, although this will not fix the general problem.
The patch implements a tail-call optimization for detoured code. When we recognize that the last instruction we hook is a call, instead of using a call
instruction that would return to our detoured code, we can simulate a call with the push
and jmp
instructions, and push a return address that will return directly to the original code.
The problematic hooked function in Windows 11 21H2 workers is as follows:
BaseThreadInitThunk
180015540 48 83 ec 28 SUB RSP,0x28
180015544 85 c9 TEST ECX,ECX
180015546 75 17 JNZ LAB_18001555f
180015548 49 8b c8 MOV RCX,R8
18001554b e8 30 ff 00 00 CALL BaseThreadInitXfgThunk
180015550 8b c8 MOV ECX,EAX
Without the optimization, the detoured code looks like this on the Windows 11 21H2 workers:
0:006> u poi(mozglue!stub_BaseThreadInitThunk)
000001d4`d84a0010 4883ec28 sub rsp,28h
000001d4`d84a0014 85c9 test ecx,ecx
000001d4`d84a0016 740e je 000001d4`d84a0026
000001d4`d84a0018 ff2500000000 jmp qword ptr [000001d4`d84a001e]
000001d4`d84a001e ... 00007ff8`a99e555f (KERNEL32!BaseThreadInitThunk+0x1f)
000001d4`d84a0026 498bc8 mov rcx,r8
000001d4`d84a0029 ff1502000000 call qword ptr [000001d4`d84a0031]
000001d4`d84a002f eb08 jmp 000001d4`d84a0039
000001d4`d84a0031 ... 00007ff8`a99f5480 (KERNEL32!BaseThreadInitXfgThunk)
000001d4`d84a0039 ff2500000000 jmp qword ptr [000001d4`d84a003f]
000001d4`d84a003f ... 00007ff8`a99e5550 (KERNEL32!BaseThreadInitThunk+0x10)
With the optimization, it now looks like this:
0:006> u poi(mozglue!stub_BaseThreadInitThunk)
00000126`bc7c0010 4883ec28 sub rsp,28h
00000126`bc7c0014 85c9 test ecx,ecx
00000126`bc7c0016 740e je 00000126`bc7c0026
00000126`bc7c0018 ff2500000000 jmp qword ptr [00000126`bc7c001e]
00000126`bc7c001e ... 00007ff8`a99e555f (KERNEL32!BaseThreadInitThunk+0x1f)
00000126`bc7c0026 498bc8 mov rcx,r8
00000126`bc7c0029 ff3506000000 push qword ptr [00000126`bc7c0035]
00000126`bc7c002f ff2508000000 jmp qword ptr [00000126`bc7c003d]
00000126`bc7c0035 ... 00007ff8`a99e5550 (KERNEL32!BaseThreadInitThunk+0x10)
00000126`bc7c003d ... 00007ff8`a99f5480 (KERNEL32!BaseThreadInitXfgThunk)
Because the address pushed on the stack is part of the original code for BaseThreadInitThunk
, it benefits from the unwinding information embedded in kernel32.dll
. Callees thus manage to do proper stack unwinding and all structured exception handlers get called.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 13•2 years ago
•
|
||
The changes described in comment 12 are actually not compatible with Intel CET's shadow stack security feature. Hopefully I have this feature on my machine which helped me realize this. So it's better to not integrate these changes and fix the general problem instead.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
I'll just leave here some notes from [:gerard-majax] for reproducing the test in a running browser:
start browser, open browser console
const utilityProcessTest = () => { return Cc["@mozilla.org/utility-process-test;1"].createInstance( Ci.nsIUtilityProcessTest ); } const ProcessTools = Cc["@mozilla.org/processtools-service;1"].getService( Ci.nsIProcessToolsService );
then open
about:processes
then in the console,utilityProcessTest().startProcess();
and you should see a utility process appearing in theabout:processes
then issueProcessTools.crash(x);
with x the PID of the process, and it should crash
andabout:crashes
should have a crash to submit
The patch for bug 1800898 is ready and will fix this bug, it is undergoing code review at the moment.
Comment 15•2 years ago
|
||
The code review is over and we can land the patch on Monday when soft code freeze ends.
Comment 16•2 years ago
|
||
This was fixed with bug 1800898: https://treeherder.mozilla.org/jobs?revision=5be752c9618c22f1d89d4c559ecd3a59f560ee55&repo=try
Description
•