Closed
Bug 677247
Opened 14 years ago
Closed 14 years ago
Shutdown crashes in desktop Fennec builds on Windows
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
Firefox 8
People
(Reporter: martijn.martijn, Assigned: glandium)
References
Details
(Keywords: crash, regression)
Attachments
(2 files)
|
2.54 KB,
patch
|
ehsan.akhgari
:
review+
blassey
:
approval-mozilla-aurora-
|
Details | Diff | Splinter Review |
|
5.79 KB,
patch
|
ehsan.akhgari
:
review+
blassey
:
approval-mozilla-aurora-
|
Details | Diff | Splinter Review |
I'm seeing shutdown crashes in desktop Fennec builds on Windows, one after the profile manager disappears and one after closing Fennec totally.
This regressed between 2011-08-04 and 2011-08-05:
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-08-04+00%3A00%3A00&enddate=2011-08-05+08%3A00%3A00
I guess a regression from bug 675396?
Stacktrace from debug build:
10a7fb3a()
> nspr4.dll!_PR_NativeRunThread(void * arg=0x04d3a0f0) Line 426 + 0xf bytes C
nspr4.dll!pr_root(void * arg=0x04d3a0f0) Line 122 + 0xf bytes C
msvcr90d.dll!_callthreadstartex() Line 348 + 0xf bytes C
msvcr90d.dll!_threadstartex(void * ptd=0x04d507c0) Line 331 C
kernel32.dll!773ded6c()
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
ntdll.dll!772a37f5()
ntdll.dll!772a37c8()
| Assignee | ||
Comment 1•14 years ago
|
||
This probably means something is not properly shutdown. Note that Desktop firefox doesn't have that problem, so that would be fennec specific code.
| Assignee | ||
Comment 2•14 years ago
|
||
A nightly doesn't even start, here :(
| Assignee | ||
Comment 3•14 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #2)
> A nightly doesn't even start, here :(
I must have done something wrong when extracting the nightly.
I can reproduce the crash, and it happens in the dll blocklist code.
| Assignee | ||
Comment 4•14 years ago
|
||
I don't get the same stack trace at all, though :-/
In my case, ntdll!LdrShutdownProcess calls ntdll!LdrpCallInitRoutine, which calls ntshrui.dll!__CRT_INIT, which after several calls into ntshrui.dll ends up calling LoadLibraryExW... which calls the dll blocklist function which, at that time, is unloaded.
| Assignee | ||
Comment 5•14 years ago
|
||
I see three possible solutions:
- Don't call XPCOMGlueShutdown(). This would effectively leave the libraries loaded and should avoid the problem altogether.
- Move the DLL blocklist in the fennec executable, so that the patched_LdrLoadDll function is always in memory, even after XPCOMGlueShutdown() is called.
- Unhook LdrLoadDll when unloading xul.dll. This could be done in a destructor of the WindowsDllInterceptor class.
Somehow, I think the cleanest approach would be the latter. Ehsan, what do you think?
Comment 6•14 years ago
|
||
(In reply to comment #5)
> I see three possible solutions:
> - Don't call XPCOMGlueShutdown(). This would effectively leave the libraries
> loaded and should avoid the problem altogether.
> - Move the DLL blocklist in the fennec executable, so that the
> patched_LdrLoadDll function is always in memory, even after XPCOMGlueShutdown()
> is called.
> - Unhook LdrLoadDll when unloading xul.dll. This could be done in a destructor
> of the WindowsDllInterceptor class.
>
> Somehow, I think the cleanest approach would be the latter. Ehsan, what do you
> think?
I agree. The third options seems to be the right choice. Are you willing to work on the patch, or should I?
| Assignee | ||
Comment 7•14 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #6)
> I agree. The third options seems to be the right choice. Are you willing
> to work on the patch, or should I?
I will.
| Assignee | ||
Comment 8•14 years ago
|
||
I first thought I would actually restore the hooked function in their original state, but it's just easier, for now (considering it would be better to have a fix before the aurora merge), to just make them go through the trampoline.
This means a function can't be hooked again once a hook is unregistered, since it will contain a jump in its prologue, and we don't support jumps in the function prologue.
I have various ideas how we could improve the dll interceptor, but I think that can wait.
Attachment #552077 -
Flags: review?(ehsan)
| Assignee | ||
Comment 9•14 years ago
|
||
This is a pretty limited test, but allows to check that the hooks are properly registered/unregistered.
Attachment #552079 -
Flags: review?(ehsan)
Comment 10•14 years ago
|
||
Comment on attachment 552077 [details] [diff] [review]
Unregister Dll hooks when WindowsDllInterceptor is destructed
Please provide an #error path for the target CPU in case it's none of the detected ones. r=me with that.
Attachment #552077 -
Flags: review?(ehsan) → review+
Comment 11•14 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #8)
> Created attachment 552077 [details] [diff] [review]
> Unregister Dll hooks when WindowsDllInterceptor is destructed
>
> I first thought I would actually restore the hooked function in their
> original state, but it's just easier, for now (considering it would be
> better to have a fix before the aurora merge), to just make them go through
> the trampoline.
> This means a function can't be hooked again once a hook is unregistered,
> since it will contain a jump in its prologue, and we don't support jumps in
> the function prologue.
>
> I have various ideas how we could improve the dll interceptor, but I think
> that can wait.
Oh, forgot to mention. Can you document this fact at least?
Comment 12•14 years ago
|
||
Comment on attachment 552079 [details] [diff] [review]
Test hook unregistration
This looks amazing! Thanks a lot for writing this test :-)
Attachment #552079 -
Flags: review?(ehsan) → review+
| Assignee | ||
Comment 13•14 years ago
|
||
| Assignee | ||
Comment 14•14 years ago
|
||
Comment on attachment 552077 [details] [diff] [review]
Unregister Dll hooks when WindowsDllInterceptor is destructed
In theory, this could affect Firefox 7.
Attachment #552077 -
Flags: approval-mozilla-aurora?
| Assignee | ||
Updated•14 years ago
|
Assignee: nobody → mh+mozilla
| Assignee | ||
Updated•14 years ago
|
Attachment #552079 -
Flags: approval-mozilla-aurora?
| Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 15•14 years ago
|
||
Backed out the test because if crashes on win debug.
http://hg.mozilla.org/mozilla-central/rev/86abf721b3fe
(Forgot to remove the test file before landing, but that doesn't really matter)
It turns out the crash happens when leaving the hook, which returns to the wrong address. It only happens when building with no optimization, and it also (quite obviously, actually) happens when reverting the change from this bug. Which is why i only backed out the test.
| Assignee | ||
Comment 16•14 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #15)
> Backed out the test because if crashes on win debug.
> http://hg.mozilla.org/mozilla-central/rev/86abf721b3fe
> (Forgot to remove the test file before landing, but that doesn't really
> matter)
>
> It turns out the crash happens when leaving the hook, which returns to the
> wrong address. It only happens when building with no optimization, and it
> also (quite obviously, actually) happens when reverting the change from this
> bug. Which is why i only backed out the test.
And it's due to the fact that GetVersionExA is WINAPI, thus stdcall, and the function pointer for the original function was not marked as stdcall... so when returning from the function, the hook wouldn't adjust the stack pointer accordingly and wouldn't get the right return address.
And returning from the hook also returns with the wrong stack pointer if the hook itself is not stdcall.
| Assignee | ||
Comment 17•14 years ago
|
||
Landed a fixup and re-enabled the test:
http://hg.mozilla.org/mozilla-central/rev/42f7ed136034
Comment 18•14 years ago
|
||
(In reply to comment #17)
> Landed a fixup and re-enabled the test:
> http://hg.mozilla.org/mozilla-central/rev/42f7ed136034
Looks great, thanks for debugging the issue :)
| Reporter | ||
Comment 19•14 years ago
|
||
Verified fixed in today's build, thanks.
Status: RESOLVED → VERIFIED
Updated•14 years ago
|
Attachment #552077 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Updated•14 years ago
|
Attachment #552079 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Updated•14 years ago
|
Severity: normal → critical
You need to log in
before you can comment on or make changes to this bug.
Description
•