Closed
Bug 791784
Opened 12 years ago
Closed 11 years ago
[xul!PL_DHashTableEnumerate] crash on FireFox close
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
mozilla29
People
(Reporter: bagoficha, Assigned: mozilla-bugs)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 3 obsolete files)
58.64 KB,
application/octet-stream
|
Details | |
3.60 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
crash on FireFox close.
unfortunately i can't reproduce this crash :(
===
FOLLOWUP_IP:
xul!PL_DHashTableEnumerate+7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708]
709ce567 0fbf4208 movsx eax,word ptr [edx+8]
NTGLOBALFLAG: 400
APPLICATION_VERIFIER_FLAGS: 0
FAULTING_THREAD: 00000fc8
BUGCHECK_STR: APPLICATION_FAULT_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ
PRIMARY_PROBLEM_CLASS: NULL_CLASS_PTR_DEREFERENCE
DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE
LAST_CONTROL_TRANSFER: from 708cb25a to 709ce567
STACK_TEXT:
148df968 708cb25a 00000000 709367d0 148df97c xul!PL_DHashTableEnumerate+0x7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708]
148df984 7122536f 00000000 148df9b8 11378740 xul!nsBaseHashtable<nsCStringHashKey,nsCString,nsCString>::EnumerateRead+0x1f [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\dist\include\nsbasehashtable.h @ 190]
148df9e8 7125721a 0201d5e0 00000000 148dfa30 xul!CrashReporter::WriteExtraData+0x61 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1890]
148dfa10 7125b9c8 0201d580 148dfa30 148dfa3c xul!CrashReporter::WriteExtraForMinidump+0x45 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1923]
148dfa48 70e6a56f 00000000 11378740 148dfa6c xul!CrashReporter::OnChildProcessDumpRequested+0x48 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1990]
148dfa8c 70f9d0a2 142ed600 11378740 02979e10 xul!google_breakpad::CrashGenerationServer::HandleDumpRequest+0x60 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 820]
148dfa9c 772c11ae 11378740 00000000 63b3599b xul!google_breakpad::CrashGenerationServer::OnDumpRequest+0x13 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 759]
148dfadc 772a4945 148dfb40 02979e10 029732f8 ntdll!RtlpTpWaitCallback+0x94
148dfb04 772a43fd 148dfb40 02973358 63b35f23 ntdll!TppWaitpExecuteCallback+0x11b
148dfc64 76a73677 02969148 148dfcb0 77289f42 ntdll!TppWorkerThread+0x572
148dfc70 77289f42 02969148 63b35ff7 00000000 kernel32!BaseThreadInitThunk+0xe
148dfcb0 77289f15 772a3e59 02969148 00000000 ntdll!__RtlUserThreadStart+0x70
148dfcc8 00000000 772a3e59 02969148 00000000 ntdll!_RtlUserThreadStart+0x1b
===
===
0:004> lmvm xul
start end module name
708c0000 717ea000 xul C (private pdb symbols) C:\ProgramData\dbg\sym\xul.pdb\5CACCCF9F0CB448E944E7BF78A69E3FC2\xul.pdb
Loaded symbol image file: xul.dll
Image path: C:\Program Files (x86)\Mozilla Firefox\xul.dll
Image name: xul.dll
Timestamp: Thu Sep 06 05:15:39 2012 (5047F93B)
CheckSum: 00000000
ImageSize: 00F2A000
File version: 15.0.1.4631
Product version: 15.0.1.4631
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0000.04b0
CompanyName: Mozilla Foundation
ProductName: Firefox
InternalName: libxul
OriginalFilename: libxul
ProductVersion: 15.0.1
FileVersion: 15.0.1
FileDescription: 15.0.1
LegalCopyright: License: MPL 1.1/GPL 2.0/LGPL 2.1
LegalTrademarks: Mozilla
Comments: Mozilla
0:004> lmvm firefox
start end module name
000e0000 001c1000 firefox C (private pdb symbols) C:\ProgramData\dbg\sym\firefox.pdb\25CB5ECD8C2B44ACBBF8262DE59E837E2\firefox.pdb
Loaded symbol image file: firefox.exe
Image path: C:\Program Files (x86)\Mozilla Firefox\firefox.exe
Image name: firefox.exe
Timestamp: Thu Sep 06 05:17:57 2012 (5047F9C5)
CheckSum: 00000000
ImageSize: 000E1000
File version: 15.0.1.4631
Product version: 15.0.1.0
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0000.04b0
CompanyName: Mozilla Corporation
ProductName: Firefox
InternalName: Firefox
OriginalFilename: firefox.exe
ProductVersion: 15.0.1
FileVersion: 15.0.1
FileDescription: Firefox
LegalCopyright: ©Firefox and Mozilla Developers; available under the MPL 2 license.
LegalTrademarks: Firefox is a Trademark of The Mozilla Foundation.
Comments: Firefox is a Trademark of The Mozilla Foundation.
===
"binary/octet-stream" instead "text/plain"
Attachment #661869 -
Attachment is obsolete: true
Comment 2•12 years ago
|
||
This is interesting, probably a race condition with a plugin crashing during shutdown, but I thought we fixed this in bug 773665. It's possible that there's still a race there. In any case I don't think this report needs to remain private; should this minidump remain private, since it might contain passwords in the stack data?
Component: General → Breakpad Integration
Product: Firefox → Toolkit
First I would like to attach a full memory dump and set the bug as "private", but then decided that the mini-dump enough to determine the cause.
If it is important, change it to "public". This dump doesn't have private data.
Comment 4•12 years ago
|
||
Thanks. Opening up so we can get more eyes on it.
Group: core-security
Status: UNCONFIRMED → NEW
Ever confirmed: true
look at thread-0
===
KERNELBASE!Sleep
xul!google_breakpad::CrashGenerationServer::~CrashGenerationServer
xul!google_breakpad::CrashGenerationServer::`scalar deleting destructor'
xul!CrashReporter::OOPDeinit
xul!XRE_main
firefox!wmain
firefox!__tmainCRTStartup
kernel32!BaseThreadInitThunk
ntdll!__RtlUserThreadStart
ntdll!_RtlUserThreadStart
===
in sources:
==
XRE::XRE_main
{
....
//UnsetExceptionHandler
{
...
delete crashReporterAPIData_Hash; // <-- !!! #1
crashReporterAPIData_Hash = nsnull; // <-- !!!
...
//OOPDeinit()
{
...
delete crashServer; // <-- !!! thread-0 now sleeps here
crashServer = NULL;
}
}
}
==
and we have the following:
the CrashServer (thread-4) recieved incoming request and processes it,
but thread-0 at the same time freeing important pointers (#1).
So, thread-4 use the "crashReporterAPIData_Hash" after it has been freed.
I think that if the first CrashServer shutdown, and then remove the exception hooks and free pointers, then it should work correct.
Updated•12 years ago
|
Severity: normal → critical
Crash Signature: [@ google_breakpad::CrashGenerationServer::~CrashGenerationServer()]
Keywords: crash
Comment 6•12 years ago
|
||
Alex, I believe you are correct. Are you interested in writing the patch?
I'm not ready yet :)
Just yesterday the first time opened the source code, have not even tried to build the project. Better to do it to someone who already understands this.
Comment 8•12 years ago
|
||
This may have been fixed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1013
I have a patch queue to update to the latest Breakpad source, I'm still sorting out the last few issues. Once I have that done I can give you Try builds to test with.
Reporter | ||
Comment 10•12 years ago
|
||
FireFox 16.0.2, same problem
Assignee | ||
Comment 11•11 years ago
|
||
The problem still exists in Firefox 26.0. It is easy to reproduce with the inadvertent assistance of ThinApp, a virtualization product from VMware. (I am a ThinApp engineer.) ThinApp's hooks and injected threads can change an application's timing enough to expose serialization problems.
To reproduce the problem, I captured Firefox 26.0 and Flash 11.9.900.170 on Win7-32, and set Firefox's home page to
http://www.adobe.com/software/flash/about
Launch the virtualized Firefox, wait for Adobe's brief video to start playing, then close the browser's window. In most tests, Flash incurs a memory access exception during termination. Mozilla code linked with Flash routes the exception to Firefox, which breaks because of the race described in comment #5.
To fix the problem, I added serialization in nsExceptionHandler.cpp to bail out of dump processing after UnsetExceptionHandler has been called. Could one of the source maintainers vet and apply my fix? The new vs old (26.0) diff is
238,241d237
< // Avoid a race during application termination.
< static Mutex* dumpSafetyLock;
< static bool isSafeToDump = false;
<
1075,1083d1070
< // Initialize the flag and mutex used to avoid dump processing
< // once browser termination has begun.
< NS_ASSERTION(!dumpSafetyLock, "Shouldn't have a lock yet");
< // Do not deallocate this lock while it is still possible for
< // isSafeToDump to be tested on another thread.
< dumpSafetyLock = new Mutex("dumpSafetyLock");
< MutexAutoLock lock(*dumpSafetyLock);
< isSafeToDump = true;
<
1340,1344d1326
< if (isSafeToDump) {
< MutexAutoLock lock(*dumpSafetyLock);
< isSafeToDump = false;
< }
<
2204,2215d2185
< // Bail out if dumping is not safe. This can occur before
< // initialization is complete, or after termination has begun.
< if (!isSafeToDump)
< return;
<
< // The prior test ensures that the mutex is initialized. Now test
< // the flag again under mutex protection to avoid a thread race, then
< // hold the mutex until the current dump request is complete.
< MutexAutoLock lock(*dumpSafetyLock);
< if (!isSafeToDump)
< return;
Assignee | ||
Comment 12•11 years ago
|
||
Does a friendly reminder violate Bugzilla etiquette? More than two weeks since I offered a fix, there's no indication that anyone has noticed.
Flags: needinfo?
Updated•11 years ago
|
Flags: needinfo? → needinfo?(ted)
Comment 13•11 years ago
|
||
Art, please attach a patch and then request review. See https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch
But it doesn't look like a lock is necessary here, can we just use an atomic?
Updated•11 years ago
|
Flags: needinfo?(ted)
Assignee | ||
Comment 14•11 years ago
|
||
The lock seems appropriate, since OnChildProcessDumpRequested() needs to block UnsetExceptionHandler() for the relatively long time that the current dump is processing. Is there an Atomic idiom that supports this without a sleep loop?
Having retrieved the source previously from a tarball, I see I'll need to install hg and mq to create the patch; and get acquainted with the Mozilla try server. Figuring out a Litmus test may be interesting, since the natural repro environment is with Firefox and Flash packaged as a ThinApp.
Assignee | ||
Comment 15•11 years ago
|
||
Please advise on next steps, as I am a newbie.
Comment 16•11 years ago
|
||
Comment on attachment 8356880 [details] [diff] [review]
Add serialization to avoid thread race during Firefox shutdown
I don't think this is properly threadsafe yet. In particular in OnChildProcessDumpRequested, you check isSafeToDump outside of the lock. Is there any reason we can't do this within the lock?
Currently in SetExceptionHandler you create the lock and set isSafeToDump at the end of the method, which means that there is a small window where the exception handler is already set but there is no lock. It seems that if you create the lock first (before initializing gExceptionHandler), then everything later could just be wrapped in the lock.
Assignee | ||
Comment 17•11 years ago
|
||
I updated the patch accordingly, after validating my test case.
I had been worried that OnChildProcessDumpRequested() could be called early during Firefox initialization, but I think that was a false alarm. I now think OnChildProcessDumpRequested() is called only from CrashGenerationServer::HandleDumpRequest(), via the dump_callback_ member; that the CrashGenerationServer object is constructed only by OOPInit(); that the path to OOPInit() is
PluginModuleParent::NP_Initialize
PluginModuleParent::InitializeInjector
InjectCrashReporterIntoProcess
OOPInit
which probably occurs comfortably after SetExceptionHandler() runs.
Attachment #8356880 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #8357337 -
Flags: review?(ted)
Updated•11 years ago
|
Attachment #8357337 -
Flags: review?(ted) → review+
Comment 18•11 years ago
|
||
Art, thanks for the patch! I'm going to do a try run to make sure that all of our automated tests pass, and then you or I can set the checkin-needed keyword to get this pushed to the main tree.
Assignee: nobody → mozilla-bugs
Assignee | ||
Comment 19•11 years ago
|
||
That's great news. I look forward to hearing the results of your automated tests.
Comment 20•11 years ago
|
||
Keywords: checkin-needed
Comment 21•11 years ago
|
||
Keywords: checkin-needed
Comment 22•11 years ago
|
||
Assignee | ||
Comment 23•11 years ago
|
||
The leaks are intentional, since the mutex must be valid and held during dump requests that occur during Firefox shutdown. Is there a later time during Firefox shutdown when we know that a dump request will not arrive, or be pending, from a terminating plugin?
Assignee | ||
Comment 24•11 years ago
|
||
The fix now destroys the mutex in UnsetExceptionHandler(), following the OOPDeinit() call. My ThinApp test case still works.
Attachment #8357337 -
Attachment is obsolete: true
Comment 25•11 years ago
|
||
Comment on attachment 8359392 [details] [diff] [review]
Add serialization to avoid thread race during Firefox shutdown
Looks good to me. I'll do another try run with debug builds this time for leak checking ;-)
Attachment #8359392 -
Flags: review+
Comment 28•11 years ago
|
||
Keywords: checkin-needed
Comment 29•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in
before you can comment on or make changes to this bug.
Description
•