Closed Bug 44621 Opened 25 years ago Closed 25 years ago

crash on shutdown, mac-only

Categories

(Core :: XUL, defect, P3)

PowerPC
Mac System 8.6
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: waterson, Assigned: danm.moz)

Details

(Keywords: crash, platform-parity, regression, Whiteboard: [nsbeta2+])

Attachments

(2 files)

Mac crashes on shutdown with the following stack: Calling chain using A6/R1 links Back chain ISA Caller 00000000 PPC 1BCE0600 0BB5C800 PPC 1BCC63B4 main+001E4 0BB5C790 PPC 1AFB2B0C NS_ShutdownXPCOM(nsIServiceManager*)+0021C 0BB5C660 PPC 1AF4F4D8 nsServiceManager::ShutdownGlobalServiceManager(nsIServiceManager **)+00038 0BB5C610 PPC 1AF4E280 nsServiceManagerImpl::Release()+0008C 0BB5C5D0 PPC 1AF4E0B4 nsServiceManagerImpl::~nsServiceManagerImpl()+0004C 0BB5C590 PPC 1AF448A4 nsObjectHashtable::~nsObjectHashtable()+0002C 0BB5C550 PPC 1AF44ADC nsObjectHashtable::Reset()+00020 0BB5C510 PPC 1AF441D4 nsHashtable::Reset(int (*)(nsHashKey*, void*, void*), void*)+000 40 0BB5C4C0 PPC 1B0B88BC PL_HashTableEnumerateEntries+00060 0BB5C450 PPC 1AF440BC _hashEnumerateRemove(PLHashEntry*, int, void*)+00038 0BB5C410 PPC 1AF4DEC0 DeleteEntry(nsHashKey*, void*, void*)+0002C 0BB5C3D0 PPC 1B3AB3D8 nsAppShellService::Release()+0008C 0BB5C390 PPC 1B3AB000 nsAppShellService::~nsAppShellService()+000C0 0BB5C320 PPC 1B3B59E4 nsWebShellWindow::Close()+0001C 0BB5C2E0 PPC 1B3BC8A0 nsWebShellWindow::Destroy()+00318 0BB5C230 PPC 1B3DEBAC nsXULWindow::Destroy()+00848 0BB5C0D0 PPC 1B3C2B98 nsGetInterface::~nsGetInterface()+12DA0 0BB5C080 PPC 1AB3FA18 nsWindow::Release()+00014 0BB5C040 PPC 1AB65EAC nsBaseWidget::Release()+0008C 0BB5C000 PPC 1AB6AEFC nsMacWindow::~nsMacWindow()+000F4 0BB5BFA0 PPC 1AB82978 nsChildWindow::~nsChildWindow()+00044 0BB5BF60 PPC 1AB3F6E8 nsWindow::~nsWindow()+002D0 0BB5BED0 PPC 1AB5D140 nsMenuBar::Release()+0008C 0BB5BE90 PPC 1AB5D904 nsMenuBar::~nsMenuBar()+0010C 0BB5BE40 PPC 1AC77C88 nsXULElement::Release()+0008C 0BB5BE00 PPC 1AC771DC nsXULElement::~nsXULElement()+002F4 0BB5BD70 PPC 1AF48024 nsSupportsArray::Release()+0008C 0BB5BD30 PPC 1AF47D5C nsSupportsArray::~nsSupportsArray()+0002C 0BB5BCF0 PPC 1AF4828C nsSupportsArray::DeleteArray()+00020 0BB5BCB0 PPC 1AF48F5C nsSupportsArray::Clear()+00044 0BB5BC70 PPC 1AC77C88 nsXULElement::Release()+0008C 0BB5BC30 PPC 1AC771DC nsXULElement::~nsXULElement()+002F4 0BB5BBA0 PPC 1AF48024 nsSupportsArray::Release()+0008C 0BB5BB60 PPC 1AF47D5C nsSupportsArray::~nsSupportsArray()+0002C 0BB5BB20 PPC 1AF4828C nsSupportsArray::DeleteArray()+00020 0BB5BAE0 PPC 1AF48F5C nsSupportsArray::Clear()+00044 0BB5BAA0 PPC 1AC77C88 nsXULElement::Release()+0008C 0BB5BA60 PPC 1AC771DC nsXULElement::~nsXULElement()+002F4 0BB5B9D0 PPC 1AF48024 nsSupportsArray::Release()+0008C 0BB5B990 PPC 1AF47D5C nsSupportsArray::~nsSupportsArray()+0002C 0BB5B950 PPC 1AF4828C nsSupportsArray::DeleteArray()+00020 0BB5B910 PPC 1AF48F5C nsSupportsArray::Clear()+00044 0BB5B8D0 PPC 1AC77C88 nsXULElement::Release()+0008C 0BB5B890 PPC 1AC76F6C nsXULElement::~nsXULElement()+00084 0BB5B800 PPC 1AC922C0 nsXULElement::Slots::~Slots()+00188 0BB5B710 PPC 1ABD3684 CompositeDataSourceImpl::Release()+000F8 0BB5B6C0 PPC 1B308B78 nsChromeUIDataSource::Release()+0008C Closing log
adding some keywords
Keywords: crash, pp, regression
This doesn't happen all of the time. However, it does for anything complicated. If I just start and exit, I usually don't see this. If I run all the bloaturls, I see it 100% of the time (unless I see another crash).
sfraser has been poking around in this area most recently. i don't know what else has changed. cc'ing saari too. sfraser, if it's not you, kick it back to us.
Assignee: trudelle → sfraser
[note that sfraser is gone until 7.10. If this is urgent, it may be worth reassigning.]
Putting on [nsbeta2+] radar for beta2 fix.
Whiteboard: [nsbeta2+]
Although I can't explain precisely why it crashed, I do have a fix for it that fixes code that was wrong, so I'm taking this bug from sfraser. I'd have to guess that the crash was Mac-only because of ordering issues.
Assignee: sfraser → dbaron
dbaron: I think that the only thing that should be necessary here is the refcount stabilization before deleting the object. I looked a bit deeper into the chrome registry datasource, and am really, really confused by what the ownership model is here. The chrome registry's mCompositeDataSource is an observer of mChromeUIDataSource: that seems to be backwards. The mChromeUIDataSource is the datasource that gets registered with the RDF service; so presumably, it will want to be notified of changes in the *composite* datasource that it "owns", right? hyatt: when you get back, can you help us untangle this?
Ok, please ignore my previous comments. I read the code a bit more and now understand what's going on. dbaron's fix is starting to smell better all the time ;-).
Ok, so dbaron and I looked at it a bit more. dbaron's fix (or something like it) is the right thing to do: it makes the composite datasource able to handle fancy ownership models like the chrome UI Datasource is throwing at it. I re-wrote the patch to get rid of the threadsafety stuff (which is bogus on this object, anyway). Will attach...
I'm still seeing the crash occasionally with waterson's new patch (I've been using his Mac, so we're both using the same machine...). However, I'm certainly not seeing it 100% of the time anymore. I've only seen it 2 or 3 times. So I think even if we can't figure the rest out we should probably check this fix in, because it seems like the right thing to do and because it fixes most of the crashes. (We should also make sure it triggers the threadsafety assertion if used on a different thread.)
Hrmph. Tried applying these diffs to my Linux build and I hit the threadsafety assertions, resolving "chrome:" URLs from the necko thread. So, I think I'll reneg and say that doing a properly thread-safe AddRef()/ReleaSe() is probably the right thing to do here. I think dbaron's original patch was probably pretty close...
Simon, as I mentioned today, I'm really stuck on this. Could you have a look?
Assignee: dbaron → sfraser
setting milestone
Target Milestone: --- → M17
I see this crash always after loading just an FTP directory listing URL.
Status: NEW → ASSIGNED
So the crash happens at nsChromeUIDataSource::~nsChromeUIDataSource() { mComposite->RemoveObserver(this); since mComposite has already been deleted. mComposite goes away when the chrome registry is deleted, which happens before the nsChromeUIDataSource goes away in this case.
The chromeUIDataSource is being held onto by a chain of references that lead back to menu items on Mac, so that's why this bug is Mac-only. Assigning to hyatt, since he did nsChromeUIDataSource and has a comment about mComposite ownership in there.
Try that again
Assignee: sfraser → hyatt
Status: ASSIGNED → NEW
So is the CompositeDataSourceImpl that's releasing the nsChromeUIDataSource not the mComposite member of the nsChromeUIDataSource?
Whiteboard: [nsbeta2+] → [nsbeta2+] 7/20 fix
.
Assignee: hyatt → danm
Whiteboard: [nsbeta2+] 7/20 fix → [nsbeta2+] fix in hand, awaiting approvals
Whiteboard: [nsbeta2+] fix in hand, awaiting approvals → [nsbeta2+] fix in hand; might fail approval
The nsChromeUIDataSource was holding a weak reference (not an nsIWeakReference; a simple pointer) to its own data source (mComposite). This object was deleted before the nsChromeUIDataSource if some UI element held a strong reference to the nsChromeUIDataSource, and that element lived longer than the nsChromeRegistry. This happens on the Mac because of its methusalistic menubar. Problem fixed by converting mComposite to a strong reference. I don't see this mentioned anywhere above: this bug happens (happened) most reliably on application quit after the chrome registry was registered (so, delete the file "Component Registry", launch Mozilla, quit.) I claim that's fixed. There seems to remain another crash on shutdown; it happens somewhere in a FrameManager destructor. This is a different bug (I haven't made a new bug for it since I haven't found a reliable way to reproduce it). But if while verifying you notice a crash on shutdown, check the stack trace before re-opening this one. (And of course make a new bug, if you can figure out how to make that other crash happen reliably.)
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Whiteboard: [nsbeta2+] fix in hand; might fail approval → [nsbeta2+]
danm: I take it that you verified that making that a strong ref didn't introduce cycles, and therefore cause lots of leaks?
It does introduce a cycle, but the cycle is explicitly broken when the chrome UI data source realizes it wants to be deleted.
I've been running without shutdown crashes on mac for the past two weeks. Marking verified fixed. Reopen if anyone can reproduce.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: