Open Bug 1400169 Opened 2 years ago Updated 2 years ago

Crash in CallHookWithSEH

Categories

(External Software Affecting Firefox :: Other, defect, P3, critical)

x86
Windows
defect

Tracking

(firefox-esr52 wontfix, firefox55 wontfix, firefox56- wontfix, firefox57 wontfix, firefox58 ?)

Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 - wontfix
firefox57 --- wontfix
firefox58 --- ?

People

(Reporter: philipp, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [main process crash][inj+])

Crash Data

Attachments

(1 file)

[Tracking Requested - why for this release]:
this crash signature is starting to rise on windows 10 for users with the creator's update with 32bit versions of firefox since 2017-09-12 (that would be patch tuesday).

This bug was filed from the Socorro interface and is 
report bp-554e999d-b52f-4828-a604-4c3210170913.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 		@0x43445a30 	
1 	user32.dll 	CallHookWithSEH 	
2 	user32.dll 	__fnHkINLPMSG 	
3 	ntdll.dll 	KiUserCallbackDispatcher 	
4 	ntdll.dll 	KiUserApcDispatcher 	
5 	user32.dll 	IsDialogMessageW 	
6 	user32.dll 	DialogBox2 	
7 	user32.dll 	InternalDialogBox 	
8 	user32.dll 	DialogBoxIndirectParamAorW 	
9 	user32.dll 	DialogBoxIndirectParamW 	
10 	mozglue.dll 	mozilla::detail::MutexImpl::unlock() 	mozglue/misc/Mutex_windows.cpp:33
11 	xul.dll 	nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&, unsigned int, nsTimerImpl::Callback&&) 	xpcom/threads/nsTimerImpl.cpp:232
12 	mozglue.dll 	mozilla::detail::MutexImpl::unlock() 	mozglue/misc/Mutex_windows.cpp:33
13 	xul.dll 	nsFilePicker::ShowFilePicker(nsString const&) 	widget/windows/nsFilePicker.cpp:534
14 	xul.dll 	nsFilePicker::ShowW(short*) 	widget/windows/nsFilePicker.cpp:620
15 	xul.dll 	nsFilePicker::Show(short*) 	widget/windows/nsFilePicker.cpp:648
16 	xul.dll 	AsyncShowFilePicker::Run() 	widget/nsBaseFilePicker.cpp:86
17 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1418
18 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
19 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
20 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
21 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
22 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:271
23 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:283
24 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4567
25 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4747
26 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4842
27 	xul.dll 	mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/Bootstrap.cpp:45
28 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
29 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
30 	kernel32.dll 	BaseThreadInitThunk 	
31 	ntdll.dll 	__RtlUserThreadStart 	
32 	ntdll.dll 	_RtlUserThreadStart
The volume of crashes in 56 seems very low. Track 56- for now. Feel free to nominate again if you don't agree.
tagging this as regression to get more eyeballs on it (though it might not technically be our regression)
Keywords: regression
According to https://support.mozilla.org/en-US/questions/1176159 this is possibly related to Immunet Anti-virus. Worth checking out.
We can try blocklisting. David can you also investigate the issue, seems like a crash related to this antivirus and file picking.

Adam, can you also try contacting the company and directing them to this bug?
Flags: needinfo?(jmathies)
(ni? adam re: Comment #4)
Flags: needinfo?(astevenson)
I believe this is related to DLLs injecting into our process but I think the issue is probably more general than Immunet AV.  I haven't been able to come up with anything actionable.

First, on Immunet AV, I tried 32-bit Nightly with it installed (Immunet's installer says it chose a 64-bit install).  I was unable to get it to crash when spawned from either the chrome process (Ctrl-O) or proxied from the content process (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/file).  I also don't think I see anything related to Immunet (or really AV in general) in the modules or correlations sections of the crash-stats, although maybe they don't appear in crash-stats if they have been unloaded?

-- The rest of this comment is deep in the debugging weeds --

As for the actual crash, I believe (pretty strongly but definitely without a proof) that the issue is DLL(s) registering for hooks and not cleaning them up (that is, calling SetWindowsHookEx but freeing the handler they register without calling UnhookWindowsHookEx).  I dont have a repro for this bug but I have stepped through the behavior when it _doesn't_ crash and the point of failure seems to be a dynamic call to a function pointer stored in a table.  Indeed, it does two lookups and the first one is strong evidence of the second.  The first lookup, in KiUserCallbackDispatcher, looks to be bringing up the table of function hooks for a specific hook type given by the idHook parameter of SetWindowHookEx [1].  The killer here is the names of the functions it looks up -- in our crashes the name of the function is "__fnHkINLPMSG", which looks like gibberish, until you see the names of other functions in the table, like "__fnHkINLPMOUSEHOOKSTRUCTEX".  This starts to make it pretty clear that the name of the function is a mangled idHook type and our crashes are almost certainly due to hooking the "WH_MSGFILTER" type.  If that weren't convincing enough, the docs for WH_MSGFILTER are: "Installs a hook procedure that monitors messages generated as a result of an input event in a dialog box, message box, menu, or scroll bar."  I believe it is then crashing calling someones bad WH_MSGFILTER handler.  Its more likely that it doesn't exist than that the handler crashed, given the call stack.

Another genuine possibility is that the offending application hooked up its handlers with the wrong bitness (ie a 64-bit handler in our 32-bit process).  I don't know anything about how this might happen tho.

Note that the type of the handler explains why this is happening when we post a dialog -- it may be the rare opportunity for the system to call this handler.

Given this explanation, its not a surprise that there are other crashes under CallHookWithSEH that are probably other injections failing to run or to clean up after themselves.  For example, [2] is a different crash that seems to be due to a WH_MOUSE handler.  [3] is a crash-stats search that looks for __fnHkINLPMSG in the call stack and [4] is the graph showing the spike in this crash on Sep 12.  The spike, strangely, is only in release and the esr, despite the release having gone out 2 weeks prior.  So I suspect something changed out in the wild -- something like a bad AV release -- I just haven't found it.

---

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms644990(v=vs.85).aspx
[2] https://crash-stats.mozilla.com/report/index/3c3955ed-d66c-4f9b-9cda-971d10170921
[3] https://crash-stats.mozilla.com/signature/?proto_signature=~__fnHkINLPMSG&signature=CallHookWithSEH&date=%3E%3D2017-08-21T23%3A46%3A11.000Z&date=%3C2017-09-21T23%3A46%3A11.000Z
[4] https://crash-stats.mozilla.com/signature/?proto_signature=~__fnHkINLPMSG&signature=CallHookWithSEH&date=%3E%3D2017-08-21T23%3A46%3A11.000Z&date=%3C2017-09-21T23%3A46%3A11.000Z#graphs
Flags: needinfo?(davidp99)
Blocks: injecteject
Flags: needinfo?(jmathies)
This is a demo app, hacked up from this github repo [1], that I used to explore SetWindowsHookEx.  I was able to use this to demonstrate most of the stuff in my last comment.  For example, the method is the callback for the hook for WH_MSGFILTER events and the method is only called when we use dialogs, not menus (specially handled) or scrollbars (due to APZ) or other stuff.  What I got wrong was that this handler calls LoadLibrary, so any loaded DLL shouldn't be easily unloaded.  However, the crashes seem to happen immediately after the jump -- the topmost valid stack entry is a "call eax" and eax contains the location that we crash at -- so its probably a bad function pointer.  And I still don't understand how this stuff works with 32/64-bit inconsistencies.

[1] https://github.com/proteansec/visual-studio-projects
Just an update that I've reached out and will update when I hear back.
Flags: needinfo?(astevenson)
They have someone that can help us look into this issue. I've added :jimm to the email chain.
Whiteboard: [main process crash]
This was a known issue and something addressed in their last release 6.0.4, which is available on their website.
Priority: -- → P3
Component: Widget: Win32 → Other
Product: Core → External Software Affecting Firefox
Version: 55 Branch → unspecified
OS: Windows 10 → Windows
Do you know what the issue was? Is there anything we can do on our side?
Flags: needinfo?(jmathies)
Orphaned windows hooks is the current theory, not much to go on here.
Flags: needinfo?(jmathies)
Blocks: 1435797
No longer blocks: injecteject
Whiteboard: [main process crash] → [main process crash][inj+]
You need to log in before you can comment on or make changes to this bug.