Open Bug 1400169 Opened 2 years ago Updated 2 years ago
Crash in Call
Hook With SEH
[Tracking Requested - why for this release]: this crash signature is starting to rise on windows 10 for users with the creator's update with 32bit versions of firefox since 2017-09-12 (that would be patch tuesday). This bug was filed from the Socorro interface and is report bp-554e999d-b52f-4828-a604-4c3210170913. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 @0x43445a30 1 user32.dll CallHookWithSEH 2 user32.dll __fnHkINLPMSG 3 ntdll.dll KiUserCallbackDispatcher 4 ntdll.dll KiUserApcDispatcher 5 user32.dll IsDialogMessageW 6 user32.dll DialogBox2 7 user32.dll InternalDialogBox 8 user32.dll DialogBoxIndirectParamAorW 9 user32.dll DialogBoxIndirectParamW 10 mozglue.dll mozilla::detail::MutexImpl::unlock() mozglue/misc/Mutex_windows.cpp:33 11 xul.dll nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&, unsigned int, nsTimerImpl::Callback&&) xpcom/threads/nsTimerImpl.cpp:232 12 mozglue.dll mozilla::detail::MutexImpl::unlock() mozglue/misc/Mutex_windows.cpp:33 13 xul.dll nsFilePicker::ShowFilePicker(nsString const&) widget/windows/nsFilePicker.cpp:534 14 xul.dll nsFilePicker::ShowW(short*) widget/windows/nsFilePicker.cpp:620 15 xul.dll nsFilePicker::Show(short*) widget/windows/nsFilePicker.cpp:648 16 xul.dll AsyncShowFilePicker::Run() widget/nsBaseFilePicker.cpp:86 17 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1418 18 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:96 19 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 20 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 21 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156 22 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:271 23 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp:283 24 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:4567 25 xul.dll XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4747 26 xul.dll XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4842 27 xul.dll mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/Bootstrap.cpp:45 28 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115 29 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253 30 kernel32.dll BaseThreadInitThunk 31 ntdll.dll __RtlUserThreadStart 32 ntdll.dll _RtlUserThreadStart
The volume of crashes in 56 seems very low. Track 56- for now. Feel free to nominate again if you don't agree.
tagging this as regression to get more eyeballs on it (though it might not technically be our regression)
According to https://support.mozilla.org/en-US/questions/1176159 this is possibly related to Immunet Anti-virus. Worth checking out.
We can try blocklisting. David can you also investigate the issue, seems like a crash related to this antivirus and file picking. Adam, can you also try contacting the company and directing them to this bug?
(ni? adam re: Comment #4)
I believe this is related to DLLs injecting into our process but I think the issue is probably more general than Immunet AV. I haven't been able to come up with anything actionable. First, on Immunet AV, I tried 32-bit Nightly with it installed (Immunet's installer says it chose a 64-bit install). I was unable to get it to crash when spawned from either the chrome process (Ctrl-O) or proxied from the content process (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/file). I also don't think I see anything related to Immunet (or really AV in general) in the modules or correlations sections of the crash-stats, although maybe they don't appear in crash-stats if they have been unloaded? -- The rest of this comment is deep in the debugging weeds -- As for the actual crash, I believe (pretty strongly but definitely without a proof) that the issue is DLL(s) registering for hooks and not cleaning them up (that is, calling SetWindowsHookEx but freeing the handler they register without calling UnhookWindowsHookEx). I dont have a repro for this bug but I have stepped through the behavior when it _doesn't_ crash and the point of failure seems to be a dynamic call to a function pointer stored in a table. Indeed, it does two lookups and the first one is strong evidence of the second. The first lookup, in KiUserCallbackDispatcher, looks to be bringing up the table of function hooks for a specific hook type given by the idHook parameter of SetWindowHookEx . The killer here is the names of the functions it looks up -- in our crashes the name of the function is "__fnHkINLPMSG", which looks like gibberish, until you see the names of other functions in the table, like "__fnHkINLPMOUSEHOOKSTRUCTEX". This starts to make it pretty clear that the name of the function is a mangled idHook type and our crashes are almost certainly due to hooking the "WH_MSGFILTER" type. If that weren't convincing enough, the docs for WH_MSGFILTER are: "Installs a hook procedure that monitors messages generated as a result of an input event in a dialog box, message box, menu, or scroll bar." I believe it is then crashing calling someones bad WH_MSGFILTER handler. Its more likely that it doesn't exist than that the handler crashed, given the call stack. Another genuine possibility is that the offending application hooked up its handlers with the wrong bitness (ie a 64-bit handler in our 32-bit process). I don't know anything about how this might happen tho. Note that the type of the handler explains why this is happening when we post a dialog -- it may be the rare opportunity for the system to call this handler. Given this explanation, its not a surprise that there are other crashes under CallHookWithSEH that are probably other injections failing to run or to clean up after themselves. For example,  is a different crash that seems to be due to a WH_MOUSE handler.  is a crash-stats search that looks for __fnHkINLPMSG in the call stack and  is the graph showing the spike in this crash on Sep 12. The spike, strangely, is only in release and the esr, despite the release having gone out 2 weeks prior. So I suspect something changed out in the wild -- something like a bad AV release -- I just haven't found it. ---  https://msdn.microsoft.com/en-us/library/windows/desktop/ms644990(v=vs.85).aspx  https://crash-stats.mozilla.com/report/index/3c3955ed-d66c-4f9b-9cda-971d10170921  https://crash-stats.mozilla.com/signature/?proto_signature=~__fnHkINLPMSG&signature=CallHookWithSEH&date=%3E%3D2017-08-21T23%3A46%3A11.000Z&date=%3C2017-09-21T23%3A46%3A11.000Z  https://crash-stats.mozilla.com/signature/?proto_signature=~__fnHkINLPMSG&signature=CallHookWithSEH&date=%3E%3D2017-08-21T23%3A46%3A11.000Z&date=%3C2017-09-21T23%3A46%3A11.000Z#graphs
This is a demo app, hacked up from this github repo , that I used to explore SetWindowsHookEx. I was able to use this to demonstrate most of the stuff in my last comment. For example, the method is the callback for the hook for WH_MSGFILTER events and the method is only called when we use dialogs, not menus (specially handled) or scrollbars (due to APZ) or other stuff. What I got wrong was that this handler calls LoadLibrary, so any loaded DLL shouldn't be easily unloaded. However, the crashes seem to happen immediately after the jump -- the topmost valid stack entry is a "call eax" and eax contains the location that we crash at -- so its probably a bad function pointer. And I still don't understand how this stuff works with 32/64-bit inconsistencies.  https://github.com/proteansec/visual-studio-projects
Just an update that I've reached out and will update when I hear back.
They have someone that can help us look into this issue. I've added :jimm to the email chain.
This was a known issue and something addressed in their last release 6.0.4, which is available on their website.
Component: Widget: Win32 → Other
Product: Core → External Software Affecting Firefox
Version: 55 Branch → unspecified
Do you know what the issue was? Is there anything we can do on our side?
Orphaned windows hooks is the current theory, not much to go on here.
Whiteboard: [main process crash] → [main process crash][inj+]
You need to log in before you can comment on or make changes to this bug.