Closed Bug 725503 Opened 8 years ago Closed 4 years ago

crash in radhslib.dll from Naomi Internet Filter

Categories

(Core :: Networking, defect, critical)

12 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox12 + wontfix
firefox14 - ---

People

(Reporter: marcia, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-6f25c7a6-3635-417b-a9c1-757922120208 .
============================================================= 

Seen while looking at Aurora crash stats but happens across all versions.

https://crash-stats.mozilla.com/report/list?signature=radhslib.dll@0x3b6f

Bug 519348 was filed back in the day to block something similar to this.

Correlations:

radhslib.dll@0x3b6f|EXCEPTION_ACCESS_VIOLATION_READ (13 crashes)
    100% (13/13) vs.   0% (25/13378) radhslib.dll
    100% (13/13) vs.   0% (25/13378) radprlib.dll

https://crash-stats.mozilla.com/report/index/73963a76-3c8f-44c9-8775-c2e912120204

Frame 	Module 	Signature [Expand] 	Source
0 	radhslib.dll 	radhslib.dll@0x3b6f 	
1 		@0x5f0e0009 	
2 	nspr4.dll 	SocketRead 	nsprpub/pr/src/io/prsocket.c:657
3 	xul.dll 	nsSocketInputStream::Read 	netwerk/base/src/nsSocketTransport2.cpp:362
4 	xul.dll 	nsHttpConnection::OnWriteSegment 	netwerk/protocol/http/nsHttpConnection.cpp:1052
5 	xul.dll 	nsHttpTransaction::WritePipeSegment 	netwerk/protocol/http/nsHttpTransaction.cpp:544
6 		@0x7fff 	
7 	xul.dll 	nsHttpTransaction::WriteSegments 	netwerk/protocol/http/nsHttpTransaction.cpp:574
8 	xul.dll 	nsXULScrollFrame::DoLayout 	layout/generic/nsGfxScrollFrame.cpp:1292
9 	xul.dll 	nsHttpConnection::OnSocketReadable 	netwerk/protocol/http/nsHttpConnection.cpp:1084
10 	xul.dll 	nsHttpConnection::OnInputStreamReady 	netwerk/protocol/http/nsHttpConnection.cpp:1204
11 	xul.dll 	nsSocketInputStream::OnSocketReady 	netwerk/base/src/nsSocketTransport2.cpp:255
12 	xul.dll 	nsSocketTransport::OnSocketReady 	netwerk/base/src/nsSocketTransport2.cpp:1574
13 	xul.dll 	nsSocketTransportService::DoPollIteration 	netwerk/base/src/nsSocketTransportService2.cpp:759
14 	xul.dll 	nsSocketTransportService::Run 	netwerk/base/src/nsSocketTransportService2.cpp:642
15 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
16 	xul.dll 	nsThreadStartupEvent::`scalar deleting destructor' 	
17 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
18 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c:308
19 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
20 	msvcr80.dll 	_callthreadstartex 	f:\\sp\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:348
21 	msvcr80.dll 	_threadstartex 	f:\\sp\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:326
22 	kernel32.dll 	BaseThreadStart
(In reply to Marcia Knous [:marcia] from comment #0) 
> Bug 519348 was filed back in the day to block something similar to this.
radhslib.dll is still blocklisted. See http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsWindowsDllBlocklist.cpp#119
Maybe there's something that bypasses it like with VKsaver (see bug 614966 comment 55).
It's #200 top browser crasher in 11.0 and #9 in 12.0b1.
Keywords: topcrash
Can we get some exploratory QA around this to ensure there haven't been any recent regressions in the blocklist? The DLL is still unversioned, and I've looked around Google to try to find new versions of the Naomi Internet Filter without success. We should also see if there's a correlation to a new version of Windows, which could have changed behavior out from under us.
(In reply to Alex Keybl [:akeybl] from comment #3)
> We should also see if there's a correlation to a new version of Windows, which
> could have changed behavior out from under us.
It's correlated to Windows XP:
    100% (111/111) vs.  10% (2240/21798) wtsapi32.dll
         60% (67/111) vs.   2% (461/21798) 5.1.2600.2180 (SP2)
         40% (44/111) vs.   4% (907/21798) 5.1.2600.5512 (SP3)

> Can we get some exploratory QA around this to ensure there haven't been any
> recent regressions in the blocklist?
Based on correlations per module version, the DLL blocklist is ineffective whatever the version (11.0 or 12.0) for:
* Accelerator.dll
* GoogleDesktopNetwork3.dll
* HOOK.DLL
* radhslib.dll
* vksaver.dll

It's not a startup crash that could have made it spike.

It's not caused by the Beta population (#139 in 11.0b8).

Fx 12 is more sensitive to this DLL for an unknown reason.
Thanks Scoobidiver. I'm going to talk to smooney about this one. I'm not sure that the right path forward here would be to try to find root cause for why this crash signature jumped up, as opposed to fixing our DLL blocklist code.
When I install the software, I immediately get a startup crash in Firefox (and many other apps, including IE and Visual Studio:

 	radprlib.dll!003e001c() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for radprlib.dll]	
>	mozutils.dll!je_free(void * ptr)  Line 6260 + 0xc1 bytes	C
 	nspr4.dll!_MD_CURRENT_THREAD()  Line 310	C
 	nspr4.dll!PR_GetEnv(const char * var)  Line 84	C
 	xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData)  Line 2801 + 0x1b bytes	C++
 	ntdll.dll!_ZwReadFile@36()  + 0xc bytes	
 	kernel32.dll!_ReadFile@20()  + 0x67 bytes	
 	xul.dll!nsLocalFile::`vftable'()  + 0x11b bytes	C++

Although the stack seems to be corrupted.
Note that the crash happens in radprlib.dll for me, not in radhslib.dll which is what we have blacklisted.
Both radprlib.dll and radhslib.dll are injected into our process before XRE_SetupDllBlocklist has executed, which means that our DLL blocklist is ineffective here.
They are indeed both loaded before __tmainCRTStartup, which suggests to me that they're injected by a system DLL of some sort.  Note that these are not regular AppInit_DLLs.  Here's their load sequence:

'firefox.exe': Loaded 'C:\Program Files\Mozilla Firefox\firefox.exe', Symbols loaded.
'firefox.exe': Loaded 'C:\WINDOWS\system32\ntdll.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\kernel32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\user32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\gdi32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\WinSxS\x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.4027_x-ww_e69378d0\msvcr80.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\msvcrt.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\imm32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\advapi32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\rpcrt4.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\secur32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\Program Files\rnamfler\radprlib.dll', Binary was not built with debug information.
'firefox.exe': Loaded 'C:\WINDOWS\system32\oleaut32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\ole32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\Program Files\rnamfler\radhslib.dll', Binary was not built with debug information.
'firefox.exe': Loaded 'C:\WINDOWS\system32\wtsapi32.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\winsta.dll', Symbols loaded (source information stripped).
'firefox.exe': Loaded 'C:\WINDOWS\system32\netapi32.dll', Symbols loaded (source information stripped).
Dependency Walker also fails to detect what code is loading this DLL.  I broke on LdrLoadDll in VC++, and it seems like it is the kernel which calls LdrLoadDll as the callback of some APC operations (given the fact that KiUserApcDispatcher seems to be the first frame on the stack when radprlib.dll gets loaded.)

This is scary, I don't know what they're doing.

But here's why we crash.  One of these two DLLs is hooking LoadLibraryExW.  We end up calling it with this call stack:

>	kernel32.dll!_LoadLibraryExW@12() 	
 	version.dll!_GetFileVersionInfoSizeW@8()  + 0x31 bytes	
 	xul.dll!CrashReporter::SetExceptionHandler(nsILocalFile * aXREDirectory=0x0211b100, bool force=false)  Line 733	C++
 	xul.dll!XRE_main(int argc=0x00000001, char * * argv=0x003940d0, const nsXREAppData * aAppData=0x004034d0)  Line 2768 + 0x1d bytes	C++
 	firefox.exe!wmain(int argc=0x00404030, wchar_t * * argv=0x00404038)  Line 107 + 0x599 bytes	C++
 	firefox.exe!pre_cpp_init()  Line 338 + 0x27 bytes	C
 	firefox.exe!__tmainCRTStartup()  Line 594 + 0x17 bytes	C
 	kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes	

The first 5 bytes of this function are:

7C801AF5 FF 25 1E 00 08 5F    jmp         dword ptr ds:[5F08001Eh]

Which denotes a trampoline (similar to the one we generate for LdrLoadDll.)  The target of the jmp does not fall into a module, so it's either JIT code or the uncompressed code for one of these DLLs (both of these DLLs are compressed.)  Here's the code at the target:

5F070F5A  push        edi  
5F070F5B  push        edx  
5F070F5C  push        ecx  
5F070F5D  push        eax  
5F070F5E  mov         edi,5F07000Ah  
5F070F63  mov         edx,dword ptr [esp+10h]  
5F070F67  mov         ecx,118h  
5F070F6C  xor         eax,eax  
5F070F6E  lock cmpxchg dword ptr [edi+1],edx  
5F070F73  je          5F070F7E  
5F070F75  add         edi,0Eh  
5F070F78  xor         eax,eax  
5F070F7A  loop        5F070F6E  
5F070F7C  jmp         5F070F82  
5F070F7E  mov         dword ptr [esp+10h],edi  
5F070F82  pop         eax  
5F070F83  pop         ecx  
5F070F84  pop         edx  
5F070F85  pop         edi  
5F070F86  jmp         003E001C  

The target of the jmp there is in radprlib.dll:

003E001C  push        ebp 

Which crashes.  It seems like the page does not have the execute bit set.

Given all this, I don't know if we can do anything to block this DLL.  The only thing which I can think of is to verify that none of LoadLibrary(Ex)(A/W) functions are hooked, but I don't know if we want to go to such great lengths.  Benjamin, what do you think?
(In reply to Alex Keybl [:akeybl] from comment #3)
> Can we get some exploratory QA around this to ensure there haven't been any
> recent regressions in the blocklist?

Since Ehsan's investigation suggests that we may not be able to do anything about this for FF12 (and the blocklist issue doesn't appear to be a recent regression), Let's get some QA testing to find out what change in FF12 caused this crash to jump up significantly. Once we know that, we can decide whether to back that change out or just take the hit.
FWIW, the crash that I reproduced locally is different with comment 0.  I have not been able to reproduce the other crash, but my analysis leading to the conclusion that we can't do much to blacklist this remains unchanged.
It seems to happen more often after 12.0a1/20120112.
Component: Extension Compatibility → Networking
OS: Windows NT → Windows XP
Product: Firefox → Core
QA Contact: extension.compatibility → networking
Summary: crash radhslib.dll → crash in radhslib.dll from Naomi Internet Filter
Adding Anthony to help with the request in Comment 11.
Sheila asked me to help in terms of when this regressed in crash numbers.

So far, I see that 11 beta had 15-30 crashes of the top signature of this library (radhslib.dll@0x3b6f) while 12 beta has 100-180. Now the numbers for 11 release are even way below what 12 beta is showing, so there seems to be some regression between 11 and 12 at least.

Data from 12 Nightly is too low to make accurate conclusions (not even one crash every day for the most part of the cycle), but it look like there could possibly have been an increase on 2012-01-28 (from 1 every few days to a couple a day). Still, it's hard to pinpoint this more than saying the frequency seems to have regressed between 11 and 12.

Still, crashing in that DLL is nothing really new, this has been around in multiple versions before 12, only the frequency became worse in 12 somehow.
Yeah, I don't think we want to start trying to beat the arms race of trying to prevent other DLLs from hooking functions, as evil as this all is. Is it possible that we are using DEP settings which are interfering with libraries which try to uncompress themself?
So I wasn't able to crash but I did hit a horrible user experience...

1) Install and start Firefox 12.0b2
2) Download and install Naomi Internet Filter 3.2.90 from Softpedia
3) Started Naomi and set up a password
4) Restart Firefox
5) Went to news.google.ca and opened an article with "Prostitution" in the title
> Firefox hung
6) Force quit Firefox and try restarting it
> Firefox wouldn't start...no errors or UI or anything (it just does nothing)
7) Restart Windows and start Firefox
> Firefox wouldn't start...no errors or UI or anything (it just does nothing)
8) Uninstall Firefox and remove profiles
9) Reinstall Firefox and start Firefox
> Firefox wouldn't start...no errors or UI or anything (it just does nothing)
10) Disable Naomi and try to start Firefox
> Firefox wouldn't start...no errors or UI or anything (it just does nothing)
11) Uninstall Naomi
> Firefox now starts

Again, I never experienced a crash so I'm not sure this is helpful to the bug. It's definitely a horrible user experience.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #16)
> Yeah, I don't think we want to start trying to beat the arms race of trying
> to prevent other DLLs from hooking functions, as evil as this all is. Is it
> possible that we are using DEP settings which are interfering with libraries
> which try to uncompress themself?

Yes, that is entirely possible, but I don't know what we can do to fix this?  Really if such code doesn't use VirtualProtect to modify the protection bits of the uncompressed code to make them executable, that's their bug, not ours...

But still the fact remains that users would end up hitting a UX like comment 17 or like what I experienced (Firefox wouldn't start at all without any explanations.)
Also, note that once this turns into a startup crash, we don't have visibility info the crash data at all since the crash happens *before* breakpad is registered, so the crash reporter never gets a chance to start itself up...  :(
So people pointed me to <> which suggests that Chromium blacklists these DLLs.  I looked a bit into what Chromium does, and here's a summary.

Chromium does DLL blacklisting only for its "target processes" (the sandbox processes, for example renderers).  What it does is that it creates the process paused, and intercepts NtMapViewOfSection and NtUnmapViewOfSection (the code lives in sandbox/src/target_interceptions.cc).  In the interceptors, it checks whether the thing being mapped is a PE image, in which case it matches it against the blacklist info that it has, and fails the Map call if it does.

Firstly, this is a possibly better blacklisting method than what we have (intercepting NtLoadDll) and I think I may want to do that.  However, I don't see how Chromium would manage to do this soon enough that it would catch these DLLs.  Perhaps processes which are created in the paused state do not run *any* code so no modules will be loaded at all by that time (but how would you intercept stuff in ntdll.dll if that is the case?)  I don't have a Windows machine now so I can't verify.  Brian, do you know the answer?

If creating a paused process gives us a chance to intercept ntdll before these DLLs are loaded, we can create a simple stub which does that for firefox.exe.  That would be a lot better than the blacklisting mechanism that we have today, but I don't know if that would be safe enough to take on branches, etc.

Benjamin, what do you think?
(In reply to Ehsan Akhgari [:ehsan] from comment #20)
> So people pointed me to <> which suggests that Chromium blacklists these
> DLLs.  I looked a bit into what Chromium does, and here's a summary.

The link is http://code.google.com/p/chromium/issues/detail?id=12517.
> Brian, do you know the answer?

Sorry, I do not.
While testing this I see just about the same thing as Anthony does in Comment 17, except after restarting Firefox I never get the browser to launch.
Marcia, out of curiosity, could you please test Chrome on that VM as well?  If it can run and render pages successfully, could you look and see which one of its processes has these DLLs loaded, if any?  Thanks!
Sure, will do.

(In reply to Ehsan Akhgari [:ehsan] from comment #24)
> Marcia, out of curiosity, could you please test Chrome on that VM as well? 
> If it can run and render pages successfully, could you look and see which
> one of its processes has these DLLs loaded, if any?  Thanks!
I just did some additional testing and my experience has been that as soon as I install Naomi Internet filter, I cannot launch any browser - IE, Chrome, Firefox, etc on this Windows XP VM.

As Anthony notes once I uninstall it everything seems to launch fine.
(In reply to Marcia Knous [:marcia] from comment #26)
> I just did some additional testing and my experience has been that as soon
> as I install Naomi Internet filter, I cannot launch any browser - IE,
> Chrome, Firefox, etc on this Windows XP VM.
> 
> As Anthony notes once I uninstall it everything seems to launch fine.

I find it weird that we're having so much different of an experience than our users - they have to be able to use Firefox 10, WinXP SP 2/3, and Naomi Internet Filter at the same time in order to still show up in our crash data. Sadly I can't find any versions of the software besides 3.2.90.
I'm going to track for FF13 and FF14 in case we feel we can make some lower risk changes to our blocklisting techniques in that timeframe.
Perhaps I wontfix'd this prematurely. We're also seeing FF12 crashes in bug 738661 related to the Covenant internet filtering service. Including some networking folks to see if we made any fundamental changes in FF12 that would cause internet filtering to break.
Do we know why this shows up in Socorro as a plugin crash (plugin icon is next to crash)?
(In reply to Marcia Knous [:marcia] from comment #30)
> Do we know why this shows up in Socorro as a plugin crash (plugin icon is
> next to crash)?
When there is at least one plugin crash, it displays the plugin icon. In 12.0b2, there's one crash report classified as a plugin crash: bp-57025fd8-d430-4769-b915-03eaa2120323.
(In reply to Alex Keybl [:akeybl] from comment #29)
> Perhaps I wontfix'd this prematurely. We're also seeing FF12 crashes in bug
> 738661 related to the Covenant internet filtering service. Including some
> networking folks to see if we made any fundamental changes in FF12 that
> would cause internet filtering to break.

As far as my investigations here show, this has nothing to do with networking.
Removing qawanted for now RE comment 28. Please re-add if there is something specific QA can do to move this bug forward.
Keywords: qawanted
(In reply to Ehsan Akhgari [:ehsan] from comment #32)
> As far as my investigations here show, this has nothing to do with
> networking.

You haven't been able to reproduce the top crash signature though, correct? I find it very fishy that two internet filtering applications have major crash spikes in FF12.
That's right, I have not reproduced the topcrash signature yet.
Ok, so sounds like we can't reproduce it but the trend so far is that it will be top crash throughout beta and potentially release. Any others ideas out there how we might be able to track this down?
(In reply to Sheila Mooney from comment #36)
> Ok, so sounds like we can't reproduce it but the trend so far is that it
> will be top crash throughout beta and potentially release. Any others ideas
> out there how we might be able to track this down?

I'm waiting to hear from Benjamin about comment 20.
The process is created in a suspended state at http://code.google.com/searchframe#OAMlx_jo-ck/src/sandbox/src/target_process.cc&l=152 using the CREATE_SUSPENDED flag to CreateProcess. The actual hooking is done here: http://code.google.com/searchframe#OAMlx_jo-ck/src/sandbox/src/target_interceptions.cc&l=18

I don't know about DllMain procedures, but I presume that they are not run.

I'm a bit skeptical that changing our startup sequence to use a stub would be a net win: it will undoubtedly have some startup time cost, and it may have other side effects. I believe that some parts of the chromium sandbox require Windows XP SP3, and that may affect our compatibility.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #38)
> The process is created in a suspended state at
> http://code.google.com/searchframe#OAMlx_jo-ck/src/sandbox/src/
> target_process.cc&l=152 using the CREATE_SUSPENDED flag to CreateProcess.
> The actual hooking is done here:
> http://code.google.com/searchframe#OAMlx_jo-ck/src/sandbox/src/
> target_interceptions.cc&l=18

Right.

> I don't know about DllMain procedures, but I presume that they are not run.

Why do you think that DllMain would not get run?

> I'm a bit skeptical that changing our startup sequence to use a stub would
> be a net win: it will undoubtedly have some startup time cost, and it may
> have other side effects. I believe that some parts of the chromium sandbox
> require Windows XP SP3, and that may affect our compatibility.

About the platform compatibility, the stub would not do any sand-boxing itself, so that may not be a concern.  But there's definitely the risk of regressions, and also a startup cost.

But in general, there's no good solution to this problem, any path that we decide to take (including not taking any actions) will suck in one way or the other.  :(
See bug 733892 comment 48 for an idea to change the way the DLL blocklisting is done.
I believe that the code which loads dependent libraries runs in user mode in the newly created process. So by pausing the process before any user code runs, it hasn't had the opportunity to run anything including the dependent library loading code.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #41)
> I believe that the code which loads dependent libraries runs in user mode in
> the newly created process. So by pausing the process before any user code
> runs, it hasn't had the opportunity to run anything including the dependent
> library loading code.

One gotcha is that the stub executable cannot protect itself against DLL injection as effectively, so we should be extremely careful about what the stub executable does.  For example it should link to as few system DLLs as possible (e.g., not user32.dll) and it should run the minimum amount of code necessary to do its task (we'd probably need to avoid linking against libc etc).
It doesn't sound like we're going to do anything about this bug for FF12 unless the investigation in bug 738661 ends up resolving this crash as well. We've tried to blocklist radhslib.dll in the past for unrelated reasons, and we're unable to reproduce the crash users are running into here.

We will, however, continue to track for FF13 and FF14 so that we can continue discussion of how to handle DLL blocklisting for injected libraries.
What's the next step for hardening our ability to blocklist DLLs in the future? Is this the right bug to track that work?
(In reply to Alex Keybl [:akeybl] from comment #44)
> What's the next step for hardening our ability to blocklist DLLs in the
> future? Is this the right bug to track that work?
I think it's bug 750601.
(In reply to Alex Keybl [:akeybl] from comment #44)
> What's the next step for hardening our ability to blocklist DLLs in the
> future? Is this the right bug to track that work?

Perhaps starting up Firefox paused and do what Chrome does for its renderer processes (see comment 20)?  That can potentially hurt startup though, so I'm not entirely convinced if we wanna go there.
Depends on: 750601
It's #38 top browser crasher in 13.0.1, but #16 in 14.0b11.
What's up with the plan to harden the DLL blocklist?
There is currently no credible plan to harden the DLL blocklist.
It's now a low volume crash in 17.0 and above.
Keywords: topcrash
Keywords: topcrash
Oh, marking as topcrash because it's #17 on 18.0b4 desktop.
Not a topcrash any more now, #53 on 17.0.1, #42 on 18.0b7, even lower in early 18.0 release data.
Keywords: topcrash
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.