[xul!PL_DHashTableEnumerate] crash on FireFox close

RESOLVED FIXED in mozilla29

Status

()

defect
--
critical
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: bagoficha, Assigned: mozilla-bugs)

Tracking

({crash})

15 Branch
mozilla29
x86_64
Windows 7
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(2 attachments, 3 obsolete attachments)

Posted file mini-dump (obsolete) —
crash on FireFox close.

unfortunately i can't reproduce this crash :(

===
FOLLOWUP_IP: 
xul!PL_DHashTableEnumerate+7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708]
709ce567 0fbf4208        movsx   eax,word ptr [edx+8]

NTGLOBALFLAG:  400

APPLICATION_VERIFIER_FLAGS:  0

FAULTING_THREAD:  00000fc8

BUGCHECK_STR:  APPLICATION_FAULT_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ

PRIMARY_PROBLEM_CLASS:  NULL_CLASS_PTR_DEREFERENCE

DEFAULT_BUCKET_ID:  NULL_CLASS_PTR_DEREFERENCE

LAST_CONTROL_TRANSFER:  from 708cb25a to 709ce567

STACK_TEXT:  
148df968 708cb25a 00000000 709367d0 148df97c xul!PL_DHashTableEnumerate+0x7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\xpcom\build\pldhash.cpp @ 708]
148df984 7122536f 00000000 148df9b8 11378740 xul!nsBaseHashtable<nsCStringHashKey,nsCString,nsCString>::EnumerateRead+0x1f [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\dist\include\nsbasehashtable.h @ 190]
148df9e8 7125721a 0201d5e0 00000000 148dfa30 xul!CrashReporter::WriteExtraData+0x61 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1890]
148dfa10 7125b9c8 0201d580 148dfa30 148dfa3c xul!CrashReporter::WriteExtraForMinidump+0x45 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1923]
148dfa48 70e6a56f 00000000 11378740 148dfa6c xul!CrashReporter::OnChildProcessDumpRequested+0x48 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\nsexceptionhandler.cpp @ 1990]
148dfa8c 70f9d0a2 142ed600 11378740 02979e10 xul!google_breakpad::CrashGenerationServer::HandleDumpRequest+0x60 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 820]
148dfa9c 772c11ae 11378740 00000000 63b3599b xul!google_breakpad::CrashGenerationServer::OnDumpRequest+0x13 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\toolkit\crashreporter\google-breakpad\src\client\windows\crash_generation\crash_generation_server.cc @ 759]
148dfadc 772a4945 148dfb40 02979e10 029732f8 ntdll!RtlpTpWaitCallback+0x94
148dfb04 772a43fd 148dfb40 02973358 63b35f23 ntdll!TppWaitpExecuteCallback+0x11b
148dfc64 76a73677 02969148 148dfcb0 77289f42 ntdll!TppWorkerThread+0x572
148dfc70 77289f42 02969148 63b35ff7 00000000 kernel32!BaseThreadInitThunk+0xe
148dfcb0 77289f15 772a3e59 02969148 00000000 ntdll!__RtlUserThreadStart+0x70
148dfcc8 00000000 772a3e59 02969148 00000000 ntdll!_RtlUserThreadStart+0x1b
===

===
0:004> lmvm xul
start    end        module name
708c0000 717ea000   xul      C (private pdb symbols)  C:\ProgramData\dbg\sym\xul.pdb\5CACCCF9F0CB448E944E7BF78A69E3FC2\xul.pdb
    Loaded symbol image file: xul.dll
    Image path: C:\Program Files (x86)\Mozilla Firefox\xul.dll
    Image name: xul.dll
    Timestamp:        Thu Sep 06 05:15:39 2012 (5047F93B)
    CheckSum:         00000000
    ImageSize:        00F2A000
    File version:     15.0.1.4631
    Product version:  15.0.1.4631
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0000.04b0
    CompanyName:      Mozilla Foundation
    ProductName:      Firefox
    InternalName:     libxul
    OriginalFilename: libxul
    ProductVersion:   15.0.1
    FileVersion:      15.0.1
    FileDescription:  15.0.1
    LegalCopyright:   License: MPL 1.1/GPL 2.0/LGPL 2.1
    LegalTrademarks:  Mozilla
    Comments:         Mozilla

0:004> lmvm firefox
start    end        module name
000e0000 001c1000   firefox  C (private pdb symbols)  C:\ProgramData\dbg\sym\firefox.pdb\25CB5ECD8C2B44ACBBF8262DE59E837E2\firefox.pdb
    Loaded symbol image file: firefox.exe
    Image path: C:\Program Files (x86)\Mozilla Firefox\firefox.exe
    Image name: firefox.exe
    Timestamp:        Thu Sep 06 05:17:57 2012 (5047F9C5)
    CheckSum:         00000000
    ImageSize:        000E1000
    File version:     15.0.1.4631
    Product version:  15.0.1.0
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0000.04b0
    CompanyName:      Mozilla Corporation
    ProductName:      Firefox
    InternalName:     Firefox
    OriginalFilename: firefox.exe
    ProductVersion:   15.0.1
    FileVersion:      15.0.1
    FileDescription:  Firefox
    LegalCopyright:   ©Firefox and Mozilla Developers; available under the MPL 2 license.
    LegalTrademarks:  Firefox is a Trademark of The Mozilla Foundation.
    Comments:         Firefox is a Trademark of The Mozilla Foundation.
===
Posted file mini-dump
"binary/octet-stream" instead "text/plain"
Attachment #661869 - Attachment is obsolete: true
This is interesting, probably a race condition with a plugin crashing during shutdown, but I thought we fixed this in bug 773665. It's possible that there's still a race there. In any case I don't think this report needs to remain private; should this minidump remain private, since it might contain passwords in the stack data?
Component: General → Breakpad Integration
Product: Firefox → Toolkit
First I would like to attach a full memory dump and set the bug as "private", but then decided that the mini-dump enough to determine the cause.
If it is important, change it to "public". This dump doesn't have private data.
Thanks. Opening up so we can get more eyes on it.
Group: core-security
Status: UNCONFIRMED → NEW
Ever confirmed: true
look at thread-0
===
KERNELBASE!Sleep
xul!google_breakpad::CrashGenerationServer::~CrashGenerationServer
xul!google_breakpad::CrashGenerationServer::`scalar deleting destructor'
xul!CrashReporter::OOPDeinit
xul!XRE_main
firefox!wmain
firefox!__tmainCRTStartup
kernel32!BaseThreadInitThunk
ntdll!__RtlUserThreadStart
ntdll!_RtlUserThreadStart
===

in sources:
==
XRE::XRE_main
{
  ....
  //UnsetExceptionHandler
  {
     ...
     delete crashReporterAPIData_Hash;   // <-- !!! #1
     crashReporterAPIData_Hash = nsnull; // <-- !!!
     ...

     //OOPDeinit()
     {
        ...
        delete crashServer;  // <-- !!! thread-0 now sleeps here
        crashServer = NULL;
     }
  }
}
==

and we have the following:
the CrashServer (thread-4) recieved incoming request and processes it,
but thread-0 at the same time freeing important pointers (#1).
So, thread-4 use the "crashReporterAPIData_Hash" after it has been freed.

I think that if the first CrashServer shutdown, and then remove the exception hooks and free pointers, then it should work correct.
Severity: normal → critical
Crash Signature: [@ google_breakpad::CrashGenerationServer::~CrashGenerationServer()]
Keywords: crash
Alex, I believe you are correct. Are you interested in writing the patch?
I'm not ready yet :)
Just yesterday the first time opened the source code, have not even tried to build the project. Better to do it to someone who already understands this.
This may have been fixed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1013

I have a patch queue to update to the latest Breakpad source, I'm still sorting out the last few issues. Once I have that done I can give you Try builds to test with.
FireFox 16.0.1, problem still exists
FireFox 16.0.2, same problem
The problem still exists in Firefox 26.0.  It is easy to reproduce with the inadvertent assistance of ThinApp, a virtualization product from VMware.  (I am a ThinApp engineer.)  ThinApp's hooks and injected threads can change an application's timing enough to expose serialization problems.

To reproduce the problem, I captured Firefox 26.0 and Flash 11.9.900.170 on Win7-32, and set Firefox's home page to

  http://www.adobe.com/software/flash/about

Launch the virtualized Firefox, wait for Adobe's brief video to start playing, then close the browser's window.  In most tests, Flash incurs a memory access exception during termination.  Mozilla code linked with Flash routes the exception to Firefox, which breaks because of the race described in comment #5.

To fix the problem, I added serialization in nsExceptionHandler.cpp to bail out of dump processing after UnsetExceptionHandler has been called.  Could one of the source maintainers vet and apply my fix?  The new vs old (26.0) diff is

238,241d237
< // Avoid a race during application termination.
< static Mutex* dumpSafetyLock;
< static bool isSafeToDump = false;
<
1075,1083d1070
<   // Initialize the flag and mutex used to avoid dump processing
<   // once browser termination has begun.
<   NS_ASSERTION(!dumpSafetyLock, "Shouldn't have a lock yet");
<   // Do not deallocate this lock while it is still possible for
<   // isSafeToDump to be tested on another thread.
<   dumpSafetyLock = new Mutex("dumpSafetyLock");
<   MutexAutoLock lock(*dumpSafetyLock);
<   isSafeToDump = true;
<
1340,1344d1326
<   if (isSafeToDump) {
<     MutexAutoLock lock(*dumpSafetyLock);
<     isSafeToDump = false;
<   }
<
2204,2215d2185
<   // Bail out if dumping is not safe.  This can occur before
<   // initialization is complete, or after termination has begun.
<   if (!isSafeToDump)
<     return;
<
<   // The prior test ensures that the mutex is initialized.  Now test
<   // the flag again under mutex protection to avoid a thread race, then
<   // hold the mutex until the current dump request is complete.
<   MutexAutoLock lock(*dumpSafetyLock);
<   if (!isSafeToDump)
<     return;
Does a friendly reminder violate Bugzilla etiquette?  More than two weeks since I offered a fix, there's no indication that anyone has noticed.
Flags: needinfo?
Flags: needinfo? → needinfo?(ted)
Art, please attach a patch and then request review. See https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch

But it doesn't look like a lock is necessary here, can we just use an atomic?
Flags: needinfo?(ted)
The lock seems appropriate, since OnChildProcessDumpRequested() needs to block UnsetExceptionHandler() for the relatively long time that the current dump is processing.  Is there an Atomic idiom that supports this without a sleep loop?

Having retrieved the source previously from a tarball, I see I'll need to install hg and mq to create the patch; and get acquainted with the Mozilla try server.  Figuring out a Litmus test may be interesting, since the natural repro environment is with Firefox and Flash packaged as a ThinApp.
Please advise on next steps, as I am a newbie.
Comment on attachment 8356880 [details] [diff] [review]
Add serialization to avoid thread race during Firefox shutdown

I don't think this is properly threadsafe yet. In particular in OnChildProcessDumpRequested, you check isSafeToDump outside of the lock. Is there any reason we can't do this within the lock?

Currently in SetExceptionHandler you create the lock and set isSafeToDump at the end of the method, which means that there is a small window where the exception handler is already set but there is no lock. It seems that if you create the lock first (before initializing gExceptionHandler), then everything later could just be wrapped in the lock.
I updated the patch accordingly, after validating my test case.

I had been worried that OnChildProcessDumpRequested() could be called early during Firefox initialization, but I think that was a false alarm.  I now think OnChildProcessDumpRequested() is called only from CrashGenerationServer::HandleDumpRequest(), via the dump_callback_ member; that the CrashGenerationServer object is constructed only by OOPInit(); that the path to OOPInit() is

  PluginModuleParent::NP_Initialize
    PluginModuleParent::InitializeInjector
      InjectCrashReporterIntoProcess
        OOPInit

which probably occurs comfortably after SetExceptionHandler() runs.
Attachment #8356880 - Attachment is obsolete: true
Attachment #8357337 - Flags: review?(ted)
Attachment #8357337 - Flags: review?(ted) → review+
Art, thanks for the patch! I'm going to do a try run to make sure that all of our automated tests pass, and then you or I can set the checkin-needed keyword to get this pushed to the main tree.
Assignee: nobody → mozilla-bugs
That's great news.  I look forward to hearing the results of your automated tests.
The leaks are intentional, since the mutex must be valid and held during dump requests that occur during Firefox shutdown.  Is there a later time during Firefox shutdown when we know that a dump request will not arrive, or be pending, from a terminating plugin?
The fix now destroys the mutex in UnsetExceptionHandler(), following the OOPDeinit() call.  My ThinApp test case still works.
Attachment #8357337 - Attachment is obsolete: true
Comment on attachment 8359392 [details] [diff] [review]
Add serialization to avoid thread race during Firefox shutdown

Looks good to me. I'll do another try run with debug builds this time for leak checking ;-)
Attachment #8359392 - Flags: review+
try passed, fingers crossed
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ea4cb98da617
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in before you can comment on or make changes to this bug.