Closed Bug 791784 Opened 12 years ago Closed 11 years ago

[xul!PL_DHashTableEnumerate] crash on FireFox close

Categories

(Toolkit :: Crash Reporting, defect)

15 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: bagoficha, Assigned: mozilla-bugs)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 3 obsolete files)

Attached file mini-dump (obsolete) —
crash on FireFox close. unfortunately i can't reproduce this crash :( === FOLLOWUP_IP: xul!PL_DHashTableEnumerate+7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708] 709ce567 0fbf4208 movsx eax,word ptr [edx+8] NTGLOBALFLAG: 400 APPLICATION_VERIFIER_FLAGS: 0 FAULTING_THREAD: 00000fc8 BUGCHECK_STR: APPLICATION_FAULT_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ PRIMARY_PROBLEM_CLASS: NULL_CLASS_PTR_DEREFERENCE DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE LAST_CONTROL_TRANSFER: from 708cb25a to 709ce567 STACK_TEXT: 148df968 708cb25a 00000000 709367d0 148df97c xul!PL_DHashTableEnumerate+0x7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708] 148df984 7122536f 00000000 148df9b8 11378740 xul!nsBaseHashtable<nsCStringHashKey,nsCString,nsCString>::EnumerateRead+0x1f [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\dist\include\nsbasehashtable.h @ 190] 148df9e8 7125721a 0201d5e0 00000000 148dfa30 xul!CrashReporter::WriteExtraData+0x61 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1890] 148dfa10 7125b9c8 0201d580 148dfa30 148dfa3c xul!CrashReporter::WriteExtraForMinidump+0x45 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1923] 148dfa48 70e6a56f 00000000 11378740 148dfa6c xul!CrashReporter::OnChildProcessDumpRequested+0x48 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1990] 148dfa8c 70f9d0a2 142ed600 11378740 02979e10 xul!google_breakpad::CrashGenerationServer::HandleDumpRequest+0x60 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 820] 148dfa9c 772c11ae 11378740 00000000 63b3599b xul!google_breakpad::CrashGenerationServer::OnDumpRequest+0x13 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 759] 148dfadc 772a4945 148dfb40 02979e10 029732f8 ntdll!RtlpTpWaitCallback+0x94 148dfb04 772a43fd 148dfb40 02973358 63b35f23 ntdll!TppWaitpExecuteCallback+0x11b 148dfc64 76a73677 02969148 148dfcb0 77289f42 ntdll!TppWorkerThread+0x572 148dfc70 77289f42 02969148 63b35ff7 00000000 kernel32!BaseThreadInitThunk+0xe 148dfcb0 77289f15 772a3e59 02969148 00000000 ntdll!__RtlUserThreadStart+0x70 148dfcc8 00000000 772a3e59 02969148 00000000 ntdll!_RtlUserThreadStart+0x1b === === 0:004> lmvm xul start end module name 708c0000 717ea000 xul C (private pdb symbols) C:\ProgramData\dbg\sym\xul.pdb\5CACCCF9F0CB448E944E7BF78A69E3FC2\xul.pdb Loaded symbol image file: xul.dll Image path: C:\Program Files (x86)\Mozilla Firefox\xul.dll Image name: xul.dll Timestamp: Thu Sep 06 05:15:39 2012 (5047F93B) CheckSum: 00000000 ImageSize: 00F2A000 File version: 15.0.1.4631 Product version: 15.0.1.4631 File flags: 0 (Mask 3F) File OS: 4 Unknown Win32 File type: 2.0 Dll File date: 00000000.00000000 Translations: 0000.04b0 CompanyName: Mozilla Foundation ProductName: Firefox InternalName: libxul OriginalFilename: libxul ProductVersion: 15.0.1 FileVersion: 15.0.1 FileDescription: 15.0.1 LegalCopyright: License: MPL 1.1/GPL 2.0/LGPL 2.1 LegalTrademarks: Mozilla Comments: Mozilla 0:004> lmvm firefox start end module name 000e0000 001c1000 firefox C (private pdb symbols) C:\ProgramData\dbg\sym\firefox.pdb\25CB5ECD8C2B44ACBBF8262DE59E837E2\firefox.pdb Loaded symbol image file: firefox.exe Image path: C:\Program Files (x86)\Mozilla Firefox\firefox.exe Image name: firefox.exe Timestamp: Thu Sep 06 05:17:57 2012 (5047F9C5) CheckSum: 00000000 ImageSize: 000E1000 File version: 15.0.1.4631 Product version: 15.0.1.0 File flags: 0 (Mask 3F) File OS: 4 Unknown Win32 File type: 2.0 Dll File date: 00000000.00000000 Translations: 0000.04b0 CompanyName: Mozilla Corporation ProductName: Firefox InternalName: Firefox OriginalFilename: firefox.exe ProductVersion: 15.0.1 FileVersion: 15.0.1 FileDescription: Firefox LegalCopyright: ©Firefox and Mozilla Developers; available under the MPL 2 license. LegalTrademarks: Firefox is a Trademark of The Mozilla Foundation. Comments: Firefox is a Trademark of The Mozilla Foundation. ===
Attached file mini-dump
"binary/octet-stream" instead "text/plain"
Attachment #661869 - Attachment is obsolete: true
This is interesting, probably a race condition with a plugin crashing during shutdown, but I thought we fixed this in bug 773665. It's possible that there's still a race there. In any case I don't think this report needs to remain private; should this minidump remain private, since it might contain passwords in the stack data?
Component: General → Breakpad Integration
Product: Firefox → Toolkit
First I would like to attach a full memory dump and set the bug as "private", but then decided that the mini-dump enough to determine the cause. If it is important, change it to "public". This dump doesn't have private data.
Thanks. Opening up so we can get more eyes on it.
Group: core-security
Status: UNCONFIRMED → NEW
Ever confirmed: true
look at thread-0 === KERNELBASE!Sleep xul!google_breakpad::CrashGenerationServer::~CrashGenerationServer xul!google_breakpad::CrashGenerationServer::`scalar deleting destructor' xul!CrashReporter::OOPDeinit xul!XRE_main firefox!wmain firefox!__tmainCRTStartup kernel32!BaseThreadInitThunk ntdll!__RtlUserThreadStart ntdll!_RtlUserThreadStart === in sources: == XRE::XRE_main { .... //UnsetExceptionHandler { ... delete crashReporterAPIData_Hash; // <-- !!! #1 crashReporterAPIData_Hash = nsnull; // <-- !!! ... //OOPDeinit() { ... delete crashServer; // <-- !!! thread-0 now sleeps here crashServer = NULL; } } } == and we have the following: the CrashServer (thread-4) recieved incoming request and processes it, but thread-0 at the same time freeing important pointers (#1). So, thread-4 use the "crashReporterAPIData_Hash" after it has been freed. I think that if the first CrashServer shutdown, and then remove the exception hooks and free pointers, then it should work correct.
Severity: normal → critical
Crash Signature: [@ google_breakpad::CrashGenerationServer::~CrashGenerationServer()]
Keywords: crash
Alex, I believe you are correct. Are you interested in writing the patch?
I'm not ready yet :) Just yesterday the first time opened the source code, have not even tried to build the project. Better to do it to someone who already understands this.
This may have been fixed upstream: http://code.google.com/p/google-breakpad/source/detail?r=1013 I have a patch queue to update to the latest Breakpad source, I'm still sorting out the last few issues. Once I have that done I can give you Try builds to test with.
FireFox 16.0.1, problem still exists
FireFox 16.0.2, same problem
The problem still exists in Firefox 26.0. It is easy to reproduce with the inadvertent assistance of ThinApp, a virtualization product from VMware. (I am a ThinApp engineer.) ThinApp's hooks and injected threads can change an application's timing enough to expose serialization problems. To reproduce the problem, I captured Firefox 26.0 and Flash 11.9.900.170 on Win7-32, and set Firefox's home page to http://www.adobe.com/software/flash/about Launch the virtualized Firefox, wait for Adobe's brief video to start playing, then close the browser's window. In most tests, Flash incurs a memory access exception during termination. Mozilla code linked with Flash routes the exception to Firefox, which breaks because of the race described in comment #5. To fix the problem, I added serialization in nsExceptionHandler.cpp to bail out of dump processing after UnsetExceptionHandler has been called. Could one of the source maintainers vet and apply my fix? The new vs old (26.0) diff is 238,241d237 < // Avoid a race during application termination. < static Mutex* dumpSafetyLock; < static bool isSafeToDump = false; < 1075,1083d1070 < // Initialize the flag and mutex used to avoid dump processing < // once browser termination has begun. < NS_ASSERTION(!dumpSafetyLock, "Shouldn't have a lock yet"); < // Do not deallocate this lock while it is still possible for < // isSafeToDump to be tested on another thread. < dumpSafetyLock = new Mutex("dumpSafetyLock"); < MutexAutoLock lock(*dumpSafetyLock); < isSafeToDump = true; < 1340,1344d1326 < if (isSafeToDump) { < MutexAutoLock lock(*dumpSafetyLock); < isSafeToDump = false; < } < 2204,2215d2185 < // Bail out if dumping is not safe. This can occur before < // initialization is complete, or after termination has begun. < if (!isSafeToDump) < return; < < // The prior test ensures that the mutex is initialized. Now test < // the flag again under mutex protection to avoid a thread race, then < // hold the mutex until the current dump request is complete. < MutexAutoLock lock(*dumpSafetyLock); < if (!isSafeToDump) < return;
Does a friendly reminder violate Bugzilla etiquette? More than two weeks since I offered a fix, there's no indication that anyone has noticed.
Flags: needinfo?
Flags: needinfo? → needinfo?(ted)
Art, please attach a patch and then request review. See https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch But it doesn't look like a lock is necessary here, can we just use an atomic?
Flags: needinfo?(ted)
The lock seems appropriate, since OnChildProcessDumpRequested() needs to block UnsetExceptionHandler() for the relatively long time that the current dump is processing. Is there an Atomic idiom that supports this without a sleep loop? Having retrieved the source previously from a tarball, I see I'll need to install hg and mq to create the patch; and get acquainted with the Mozilla try server. Figuring out a Litmus test may be interesting, since the natural repro environment is with Firefox and Flash packaged as a ThinApp.
Please advise on next steps, as I am a newbie.
Comment on attachment 8356880 [details] [diff] [review] Add serialization to avoid thread race during Firefox shutdown I don't think this is properly threadsafe yet. In particular in OnChildProcessDumpRequested, you check isSafeToDump outside of the lock. Is there any reason we can't do this within the lock? Currently in SetExceptionHandler you create the lock and set isSafeToDump at the end of the method, which means that there is a small window where the exception handler is already set but there is no lock. It seems that if you create the lock first (before initializing gExceptionHandler), then everything later could just be wrapped in the lock.
I updated the patch accordingly, after validating my test case. I had been worried that OnChildProcessDumpRequested() could be called early during Firefox initialization, but I think that was a false alarm. I now think OnChildProcessDumpRequested() is called only from CrashGenerationServer::HandleDumpRequest(), via the dump_callback_ member; that the CrashGenerationServer object is constructed only by OOPInit(); that the path to OOPInit() is PluginModuleParent::NP_Initialize PluginModuleParent::InitializeInjector InjectCrashReporterIntoProcess OOPInit which probably occurs comfortably after SetExceptionHandler() runs.
Attachment #8356880 - Attachment is obsolete: true
Attachment #8357337 - Flags: review?(ted)
Attachment #8357337 - Flags: review?(ted) → review+
Art, thanks for the patch! I'm going to do a try run to make sure that all of our automated tests pass, and then you or I can set the checkin-needed keyword to get this pushed to the main tree.
Assignee: nobody → mozilla-bugs
That's great news. I look forward to hearing the results of your automated tests.
The leaks are intentional, since the mutex must be valid and held during dump requests that occur during Firefox shutdown. Is there a later time during Firefox shutdown when we know that a dump request will not arrive, or be pending, from a terminating plugin?
The fix now destroys the mutex in UnsetExceptionHandler(), following the OOPDeinit() call. My ThinApp test case still works.
Attachment #8357337 - Attachment is obsolete: true
Comment on attachment 8359392 [details] [diff] [review] Add serialization to avoid thread race during Firefox shutdown Looks good to me. I'll do another try run with debug builds this time for leak checking ;-)
Attachment #8359392 - Flags: review+
try passed, fingers crossed
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: