Closed Bug 677247 Opened 9 years ago Closed 9 years ago

Shutdown crashes in desktop Fennec builds on Windows

Categories

(Firefox for Android Graveyard :: General, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Firefox 8

People

(Reporter: martijn.martijn, Assigned: glandium)

References

Details

(Keywords: crash, regression)

Attachments

(2 files)

I'm seeing shutdown crashes in desktop Fennec builds on Windows, one after the profile manager disappears and one after closing Fennec totally.

This regressed between 2011-08-04 and 2011-08-05:
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-08-04+00%3A00%3A00&enddate=2011-08-05+08%3A00%3A00
I guess a regression from bug 675396?

Stacktrace from debug build:
 	10a7fb3a()	
>	nspr4.dll!_PR_NativeRunThread(void * arg=0x04d3a0f0)  Line 426 + 0xf bytes	C
 	nspr4.dll!pr_root(void * arg=0x04d3a0f0)  Line 122 + 0xf bytes	C
 	msvcr90d.dll!_callthreadstartex()  Line 348 + 0xf bytes	C
 	msvcr90d.dll!_threadstartex(void * ptd=0x04d507c0)  Line 331	C
 	kernel32.dll!773ded6c() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!772a37f5() 	
 	ntdll.dll!772a37c8()
Keywords: crash
This probably means something is not properly shutdown. Note that Desktop firefox doesn't have that problem, so that would be fennec specific code.
A nightly doesn't even start, here :(
(In reply to Mike Hommey [:glandium] from comment #2)
> A nightly doesn't even start, here :(

I must have done something wrong when extracting the nightly.

I can reproduce the crash, and it happens in the dll blocklist code.
I don't get the same stack trace at all, though :-/

In my case, ntdll!LdrShutdownProcess calls ntdll!LdrpCallInitRoutine, which calls ntshrui.dll!__CRT_INIT, which after several calls into ntshrui.dll ends up calling LoadLibraryExW... which calls the dll blocklist function which, at that time, is unloaded.
I see three possible solutions:
- Don't call XPCOMGlueShutdown(). This would effectively leave the libraries loaded and should avoid the problem altogether.
- Move the DLL blocklist in the fennec executable, so that the patched_LdrLoadDll function is always in memory, even after XPCOMGlueShutdown() is called.
- Unhook LdrLoadDll when unloading xul.dll. This could be done in a destructor of the WindowsDllInterceptor class.

Somehow, I think the cleanest approach would be the latter. Ehsan, what do you think?
(In reply to comment #5)
> I see three possible solutions:
> - Don't call XPCOMGlueShutdown(). This would effectively leave the libraries
> loaded and should avoid the problem altogether.
> - Move the DLL blocklist in the fennec executable, so that the
> patched_LdrLoadDll function is always in memory, even after XPCOMGlueShutdown()
> is called.
> - Unhook LdrLoadDll when unloading xul.dll. This could be done in a destructor
> of the WindowsDllInterceptor class.
> 
> Somehow, I think the cleanest approach would be the latter. Ehsan, what do you
> think?

I agree.  The third options seems to be the right choice.  Are you willing to work on the patch, or should I?
(In reply to Ehsan Akhgari [:ehsan] from comment #6)
> I agree.  The third options seems to be the right choice.  Are you willing
> to work on the patch, or should I?

I will.
I first thought I would actually restore the hooked function in their original state, but it's just easier, for now (considering it would be better to have a fix before the aurora merge), to just make them go through the trampoline.
This means a function can't be hooked again once a hook is unregistered, since it will contain a jump in its prologue, and we don't support jumps in the function prologue.

I have various ideas how we could improve the dll interceptor, but I think that can wait.
Attachment #552077 - Flags: review?(ehsan)
This is a pretty limited test, but allows to check that the hooks are properly registered/unregistered.
Attachment #552079 - Flags: review?(ehsan)
Comment on attachment 552077 [details] [diff] [review]
Unregister Dll hooks when WindowsDllInterceptor is destructed

Please provide an #error path for the target CPU in case it's none of the detected ones.  r=me with that.
Attachment #552077 - Flags: review?(ehsan) → review+
(In reply to Mike Hommey [:glandium] from comment #8)
> Created attachment 552077 [details] [diff] [review]
> Unregister Dll hooks when WindowsDllInterceptor is destructed
> 
> I first thought I would actually restore the hooked function in their
> original state, but it's just easier, for now (considering it would be
> better to have a fix before the aurora merge), to just make them go through
> the trampoline.
> This means a function can't be hooked again once a hook is unregistered,
> since it will contain a jump in its prologue, and we don't support jumps in
> the function prologue.
> 
> I have various ideas how we could improve the dll interceptor, but I think
> that can wait.

Oh, forgot to mention.  Can you document this fact at least?
Comment on attachment 552079 [details] [diff] [review]
Test hook unregistration

This looks amazing!  Thanks a lot for writing this test :-)
Attachment #552079 - Flags: review?(ehsan) → review+
Comment on attachment 552077 [details] [diff] [review]
Unregister Dll hooks when WindowsDllInterceptor is destructed

In theory, this could affect Firefox 7.
Attachment #552077 - Flags: approval-mozilla-aurora?
Assignee: nobody → mh+mozilla
Attachment #552079 - Flags: approval-mozilla-aurora?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Backed out the test because if crashes on win debug.
http://hg.mozilla.org/mozilla-central/rev/86abf721b3fe
(Forgot to remove the test file before landing, but that doesn't really matter)

It turns out the crash happens when leaving the hook, which returns to the wrong address. It only happens when building with no optimization, and it also (quite obviously, actually) happens when reverting the change from this bug. Which is why i only backed out the test.
(In reply to Mike Hommey [:glandium] from comment #15)
> Backed out the test because if crashes on win debug.
> http://hg.mozilla.org/mozilla-central/rev/86abf721b3fe
> (Forgot to remove the test file before landing, but that doesn't really
> matter)
> 
> It turns out the crash happens when leaving the hook, which returns to the
> wrong address. It only happens when building with no optimization, and it
> also (quite obviously, actually) happens when reverting the change from this
> bug. Which is why i only backed out the test.

And it's due to the fact that GetVersionExA is WINAPI, thus stdcall, and the function pointer for the original function was not marked as stdcall... so when returning from the function, the hook wouldn't adjust the stack pointer accordingly and wouldn't get the right return address.
And returning from the hook also returns with the wrong stack pointer if the hook itself is not stdcall.
Landed a fixup and re-enabled the test:
http://hg.mozilla.org/mozilla-central/rev/42f7ed136034
(In reply to comment #17)
> Landed a fixup and re-enabled the test:
> http://hg.mozilla.org/mozilla-central/rev/42f7ed136034

Looks great, thanks for debugging the issue :)
Verified fixed in today's build, thanks.
Status: RESOLVED → VERIFIED
Attachment #552077 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #552079 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Severity: normal → critical
You need to log in before you can comment on or make changes to this bug.