Closed Bug 1914160 Opened 5 months ago Closed 5 months ago

Starting a profile on a windows debug build asserts with "Unrecognized opcode sequence"

Categories

(Core :: DLL Services, defect, P1)

defect

Tracking

()

RESOLVED FIXED
131 Branch
Tracking Status
firefox-esr115 - wontfix
firefox-esr128 131+ fixed
firefox129 - wontfix
firefox130 - fixed
firefox131 + fixed

People

(Reporter: emilio, Assigned: yannis)

Details

Attachments

(4 files, 3 obsolete files)

[16780] Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at D:/moz/gecko/obj-debug/dist/include\mozilla/interceptor/PatcherDetour.h:1207
#01: mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::CreateTrampoline (D:\moz\gecko\obj-debug\dist\include\mozilla\interceptor\PatcherDetour.h:1207)
#02: mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::AddHook (D:\moz\gecko\obj-debug\dist\include\mozilla\interceptor\PatcherDetour.h:477)
#03: mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddHook (D:\moz\gecko\toolkit\xre\dllservices\mozglue\nsWindowsDllInterceptor.h:435)
#04: mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,long (*)(HINSTANCE__ *)>::InitOnceCallback (D:\moz\gecko\toolkit\xre\dllservices\mozglue\nsWindowsDllInterceptor.h:203)
#05: RtlRunOnceExecuteOnce[C:\WINDOWS\SYSTEM32\ntdll.dll +0xa37f5]
#06: InitOnceExecuteOnce[C:\WINDOWS\System32\KERNELBASE.dll +0xabcf1]
#07: mozilla::WindowsStackWalkInitialization (D:\moz\gecko\toolkit\xre\dllservices\mozglue\WindowsStackWalkInitialization.cpp:57)
#08: locked_profiler_start (D:\moz\gecko\tools\profiler\core\platform.cpp:6566)
#09: profiler_start (D:\moz\gecko\tools\profiler\core\platform.cpp:6710)
#10: nsProfiler::StartProfiler (D:\moz\gecko\tools\profiler\gecko\nsProfiler.cpp:171)
#11: XPTC__InvokebyIndex (D:\moz\gecko\xpcom\reflect\xptcall\md\win32\xptcinvoke_asm_x86_64.asm:99)
#12: CallMethodHelper::Call (D:\moz\gecko\js\xpconnect\src\XPCWrappedNative.cpp:1174)
#13: XPCWrappedNative::CallMethod (D:\moz\gecko\js\xpconnect\src\XPCWrappedNative.cpp:1120)
#14: XPC_WN_CallMethod (D:\moz\gecko\js\xpconnect\src\XPCWrappedNativeJSOps.cpp:966)
#15: CallJSNative (D:\moz\gecko\js\src\vm\Interpreter.cpp:487)
#16: js::InternalCallOrConstruct (D:\moz\gecko\js\src\vm\Interpreter.cpp:581)
#17: js::Interpret (D:\moz\gecko\js\src\vm\Interpreter.cpp:3422)
#18: js::RunScript (D:\moz\gecko\js\src\vm\Interpreter.cpp:459)
#19: js::InternalCallOrConstruct (D:\moz\gecko\js\src\vm\Interpreter.cpp:613)
#20: js::Call (D:\moz\gecko\js\src\vm\Interpreter.cpp:680)
#21: js::fun_call (D:\moz\gecko\js\src\vm\JSFunction.cpp:1052)
#22: CallJSNative (D:\moz\gecko\js\src\vm\Interpreter.cpp:487)
#23: js::InternalCallOrConstruct (D:\moz\gecko\js\src\vm\Interpreter.cpp:581)
#24: js::Interpret (D:\moz\gecko\js\src\vm\Interpreter.cpp:3422)
#25: js::RunScript (D:\moz\gecko\js\src\vm\Interpreter.cpp:459)
#26: js::InternalCallOrConstruct (D:\moz\gecko\js\src\vm\Interpreter.cpp:613)
#27: js::Call (D:\moz\gecko\js\src\vm\Interpreter.cpp:680)
#28: js::BoundFunctionObject::call (D:\moz\gecko\js\src\vm\BoundFunctionObject.cpp:72)
#29: CallJSNative (D:\moz\gecko\js\src\vm\Interpreter.cpp:487)
#30: js::Call (D:\moz\gecko\js\src\vm\Interpreter.cpp:680)
#31: JS::Call (D:\moz\gecko\js\src\vm\CallAndConstruct.cpp:119)
#32: mozilla::dom::EventListener::HandleEvent (D:\moz\gecko\obj-debug\dom\bindings\EventListenerBinding.cpp:62)
#33: mozilla::dom::EventListener::HandleEvent<mozilla::dom::EventTarget *> (D:\moz\gecko\obj-debug\dist\include\mozilla\dom\EventListenerBinding.h:65)
#34: mozilla::EventListenerManager::HandleEventSingleListener (D:\moz\gecko\dom\events\EventListenerManager.cpp:1341)
#35: mozilla::EventListenerManager::HandleEventWithListenerArray (D:\moz\gecko\dom\events\EventListenerManager.cpp:1662)
#36: mozilla::EventListenerManager::HandleEventInternal (D:\moz\gecko\dom\events\EventListenerManager.cpp:1559)
#37: mozilla::EventTargetChainItem::HandleEvent (D:\moz\gecko\dom\events\EventDispatcher.cpp:368)
#38: mozilla::EventTargetChainItem::HandleEventTargetChain (D:\moz\gecko\dom\events\EventDispatcher.cpp:645)
#39: mozilla::EventDispatcher::Dispatch (D:\moz\gecko\dom\events\EventDispatcher.cpp:1221)
#40: mozilla::EventDispatcher::DispatchDOMEvent (D:\moz\gecko\dom\events\EventDispatcher.cpp:1370)
#41: mozilla::PresShell::HandleDOMEventWithTarget (D:\moz\gecko\layout\base\PresShell.cpp:9179)
#42: nsContentUtils::DispatchXULCommand (D:\moz\gecko\dom\base\nsContentUtils.cpp:6928)
#43: mozilla::dom::XULButtonElement::OnPointerClicked (D:\moz\gecko\dom\xul\XULButtonElement.cpp:662)
#44: mozilla::dom::XULButtonElement::PostHandleEvent (D:\moz\gecko\dom\xul\XULButtonElement.cpp:596)
#45: mozilla::EventTargetChainItem::PostHandleEvent (D:\moz\gecko\dom\events\EventDispatcher.cpp:488)
#46: mozilla::EventTargetChainItem::HandleEventTargetChain (D:\moz\gecko\dom\events\EventDispatcher.cpp:612)
#47: mozilla::EventTargetChainItem::HandleEventTargetChain (D:\moz\gecko\dom\events\EventDispatcher.cpp:688)
#48: mozilla::EventDispatcher::Dispatch (D:\moz\gecko\dom\events\EventDispatcher.cpp:1221)
#49: mozilla::PresShell::EventHandler::DispatchEventToDOM (D:\moz\gecko\layout\base\PresShell.cpp:9048)
#50: mozilla::PresShell::EventHandler::DispatchEvent (D:\moz\gecko\layout\base\PresShell.cpp:8586)
#51: mozilla::PresShell::EventHandler::HandleEventWithCurrentEventInfo (D:\moz\gecko\layout\base\PresShell.cpp:8517)

Yannis seems you might be familiar with this code according to blame?

Flags: needinfo?(yjuglaret)

Yes, this would be caused by failing to hook LdrUnloadDll, because our hooking code does not recognize the opcodes that this function uses in your specific version of ntdll.dll. Can you confirm the version number for this file and maybe share it with me?

Flags: needinfo?(yjuglaret)
Attached file ntdll.dll

Does this work?

Flags: needinfo?(yjuglaret)

Version is 10.0.26100.1542, fwiw (this is from Windows 11 Insiders)

Yes, thank you. The problematic opcode should be the last one in this sequence:

40 56                   push rsi
48 83 EC 30             sub  rsp, 0x30
33 F6                   xor  esi, esi
40 38 35 19 4B 1B 00    cmp  byte ptr [rip + 0x1b4b19], sil

I'll push a patch to handle this opcode.

If you run ./mach cppunittest, you should currently run into an error in TestDllInterceptor.cpp. Can you confirm that it is the case?

Flags: needinfo?(yjuglaret) → needinfo?(emilio.alvarez96)

Wrong emilio FWIW ;-)

But yes:

TEST-PASS | WindowsDllInterceptor | Could detour LdrLoadDll from ntdll.dll
TEST-SKIPPED | WindowsDllInterceptor | Will not attempt to execute patched LdrLoadDll.
[25428] Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at D:/moz/gecko/obj-debug/dist/include\mozilla/interceptor/PatcherDetour.h:1207
#01: mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::CreateTrampoline (D:\moz\gecko\obj-debug\dist\include\mozilla\interceptor\PatcherDetour.h:1207)
#02: mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::AddHook (D:\moz\gecko\obj-debug\dist\include\mozilla\interceptor\PatcherDetour.h:477)
#03: mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddHook (D:\moz\gecko\obj-debug\dist\include\nsWindowsDllInterceptor.h:435)
#04: mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,long (*)(HINSTANCE__ *)>::InitOnceCallback (D:\moz\gecko\obj-debug\dist\include\nsWindowsDllInterceptor.h:203)
#05: RtlRunOnceExecuteOnce (C:\WINDOWS\SYSTEM32\ntdll.dll + 0xa37f5)
#06: InitOnceExecuteOnce (C:\WINDOWS\System32\KERNELBASE.dll + 0xabcf1)
#07: TestHook<long (*)(HINSTANCE__ *),10,bool (*)(long)> (D:\moz\gecko\toolkit\xre\dllservices\tests\TestDllInterceptor.cpp:348)
#08: wmain (D:\moz\gecko\toolkit\xre\dllservices\tests\TestDllInterceptor.cpp:1454)
#09: __scrt_common_main_seh (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288)
#10: BaseThreadInitThunk (C:\WINDOWS\System32\KERNEL32.DLL + 0x2dbe7)
#11: RtlUserThreadStart (C:\WINDOWS\SYSTEM32\ntdll.dll + 0x85a6c)
mozcrash checking C:\msys64\tmp\tmp4cefgw5y for minidumps...
TEST-UNEXPECTED-FAIL | TestDllInterceptor.exe | test failed with return code 2147483651
Flags: needinfo?(emilio.alvarez96)

This patch adds support for the following instruction:

cmp byte ptr [rip+0x1b4b19], sil

This instruction was found in the 13 first bytes of
LdrUnloadDll in ntdll version 10.0.26100.1542.

It also adds support for the following variant of xor:

xor r/m32, r32

This variant was used by clang to generate code based on the inline
assembly that this patch adds in tests.

Assignee: nobody → yjuglaret
Status: NEW → ASSIGNED

(In reply to Emilio Cobos Álvarez (:emilio) from comment #6)

Wrong emilio FWIW ;-)

Oops - thanks for the clarification! Can you confirm that the proposed patch fixes both the test and your original use case on your machine?

Flags: needinfo?(emilio)

[Tracking Requested - why for this release]: This is a S3 severity bug (blocks non-critical functionality) that impacts all existing builds of Firefox and Thunderbird when running on the recent Windows 11 Insider Preview 10.0.26120.1542. The impact of this bug is that when we are recording a performance profile (or if the background hang reporter detects a hang in Nightly), the main process (and other processes) of Firefox or Thunderbird is at risk of running into a deadlock situation. If at the point where the profiler (or BHR) interrupts a thread, the thread is currently in the part of DLL unloading that requires holding the LdrpInvertedFunctionTableSRWLock SRW lock, then the process where that thread lives will run into a deadlock situation.

Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b824030b6b67 Add support for rex-prefixed rip-relative cmp and a variant of xor. r=handyman,win-reviewers

Landing to have this code bake in Nightly. Then if :emilio confirms that this fixes the issue (high confidence), we can start to consider uplifts.

That patch does fix the issue, thanks!

Flags: needinfo?(emilio)
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 131 Branch

I don't think we need to track this for all the things given that Win11 24H2 isn't shipping widely yet and doesn't even have a confirmed launch date yet. Seems like a reasonable ride-along candidate for a 130 dot release and next cycle's ESR builds, though. It grafts cleanly across the board at least.

The patch landed in nightly and beta is affected.
:yannis, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox130 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(yjuglaret)
Attachment #9421201 - Flags: approval-mozilla-beta?
Attachment #9421201 - Attachment description: Bug 1914160 - Add support for rex-prefixed rip-relative cmp and a variant of xor. → WIP: Bug 1914160 - Add support for rex-prefixed rip-relative cmp and a variant of xor.
Attachment #9421201 - Flags: approval-mozilla-release?
Flags: needinfo?(yjuglaret)
Attachment #9421201 - Flags: approval-mozilla-release?
Attachment #9421201 - Attachment is obsolete: true
Attachment #9421201 - Flags: approval-mozilla-beta?
Attachment #9421205 - Flags: approval-mozilla-beta?
Attachment #9421205 - Attachment is obsolete: true
Attachment #9421205 - Flags: approval-mozilla-beta?

Sorry for all the noise -- I'll add the uplift request when 130.0 is released.

Attachment #9422834 - Flags: approval-mozilla-beta?
Attachment #9422834 - Attachment is obsolete: true
Attachment #9422834 - Flags: approval-mozilla-beta?

This patch adds support for the following instruction:

cmp byte ptr [rip+0x1b4b19], sil

This instruction was found in the 13 first bytes of
LdrUnloadDll in ntdll version 10.0.26100.1542.

It also adds support for the following variant of xor:

xor r/m32, r32

This variant was used by clang to generate code for the new function
IsEqualToGlobalValue that this patch adds to our tests.

Original Revision: https://phabricator.services.mozilla.com/D219861

Attachment #9422840 - Flags: approval-mozilla-release?

release Uplift Approval Request

  • User impact if declined: Risk of running into a deadlock when using the Firefox Profiler. Only known to impact users running Windows 11 from the Insider Preview Dev Channel at the moment. Could impact regular Windows 11 users at some point in the future.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: -
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Only changes behavior for the specific faulty instruction, for which we added rigorous testing.
  • String changes made/needed: No
  • Is Android affected?: no
Attachment #9422834 - Attachment is obsolete: false
Attachment #9422834 - Flags: approval-mozilla-beta?
Attachment #9422834 - Attachment is obsolete: true
Attachment #9422834 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

This patch adds support for the following instruction:

cmp byte ptr [rip+0x1b4b19], sil

This instruction was found in the 13 first bytes of
LdrUnloadDll in ntdll version 10.0.26100.1542.

It also adds support for the following variant of xor:

xor r/m32, r32

This variant was used by clang to generate code for the new function
IsEqualToGlobalValue that this patch adds to our tests.

Original Revision: https://phabricator.services.mozilla.com/D219861

Attachment #9423546 - Flags: approval-mozilla-esr128?

esr128 Uplift Approval Request

  • User impact if declined: Risk of running into a deadlock when using the Firefox Profiler. Only known to impact users running Windows 11 from the Insider Preview Dev Channel at the moment. Could impact regular Windows 11 users at some point in the future.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: -
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Only changes behavior for the specific faulty instruction, for which we added rigorous testing.
  • String changes made/needed: No
  • Is Android affected?: no
Attachment #9423546 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
Attachment #9422840 - Flags: approval-mozilla-release? → approval-mozilla-release+

This fix will be included in our planned 130 dot release, thanks.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: