Created attachment 376492 [details] [diff] [review] proposed fix Whenever I try to use Venkman, I crash in the nsMsgPropertyEnumerator. Venkman is getting all the properties of a header, including the enumerator, and when it gets garbage collected, it crashes. The fix is to make sure that the property enumerator holds a reference to the hdr, and to make sure the rowCellCursor is freed before the hdr.
I see the crash in multiple trees, and have been seeing it for a while. It might have to do with some of the view patches I've been running with, but I suspect not. It happens whenever new mail arrives. The problem might be in Mork, actually. Mork has its own ref-counting system, and I had to drag it kicking and screaming into the xpcom world. I think the row cell cursor stuff might not have been dragged into the xpcom world correctly, since up until now, no one was using it. I'll poke around that a little.
FWIW, Mork does addref the row cell cursor correctly, so that's not it. And if I run Venkman w/o this patch, I do keep crashing. I'll try to find some time to come up with an explanation of what's going on.
There is something funny going on with the mork row references, that manifests itself in much more serious situations than this including several top crashers, such as bug 480090. I guess I am still hoping that if the root cause gets solved in that blocker bug, then perhaps this patch won't be needed. Or if you can develop a reproducible test case from nsMsgPropertyEnumerator issues, then that could be used to understand the underlying problem that might also be used to address those other bugs. But maybe that is hoping for too much. Just in case anyone is wondering, the crash from bug 480090 started occurring before the patch creating nsMsgPropertyEnumerator landed.
If I'm running Venkman, the msg hdr gets destroyed before the property enumerator (normally I think this doesn't happen since the db caches msg hdrs so you somehow need to get a msg hdr that's not cached - I don't know how that's happening, but we have to deal with it). The msg hdr does a release ref on the mdb row, which only has a single XPCom reference to it, so it gets deleted. But the row cell cursor is still holding onto it, though it doesn't hold an xpcom reference. Mork has its own ref counting system internally, and we only use XPCOM references for the things that get returned outside of Mork. I'm a little leery of pushing xpcom ref counting semantics further into Mork (e.g., making mork row cell cursor hold an xpcom reference to the row object) because I really don't want to introduce reference cycles/leaks into Mork. But I can try...
Thanks for the analysis. Do you think that may also be the root cause of some of our more common mork crashers as well?
(In reply to comment #6) > Thanks for the analysis. Do you think that may also be the root cause of some > of our more common mork crashers as well? I suspect the other mork crashers are less subtle (e.g., bug 480090). I believe that's a simple case of our non-mork objects (e.g., the offline imap operations) holding references to mork memory past the time the mork db has been closed. If you think of Mork as using memory pools, and closing the db as freeing the pool, then you can see why that's bad. For msg hdrs, we solve this by having the nsMsgDatabase wrapper around Mork db's keep track of all the extant msg hdrs in a hash table, and clearing their mdb row pointers before we close the mork db. I suppose one solution for the offline imap operation issue is to put them in the same hash table, and rejigger the class hierarchy of those objects a little so that they both implement an interface that has a clear method, and they'd clear their mdb rows when we iterate through the hash table. Or they could share a base class that has the mdb row...That would not fix the property enumerator issue, though, since that's not a case of the db getting closed. I did try tweaking Mork to use a bit more xp com ref counting, and it does fix the crash, but it also caused some assertions in the case that used to crash that made me a bit nervous...so I'm still inclined to go with some variant of my patch - I'll see if I can simplify it a little.
Created attachment 379992 [details] [diff] [review] for posterity, mork xpcom ref-counting tweak this is a patch that fixes the crash, but generated a slightly disturbing assertion.
Created attachment 380460 [details] Unit test that will crash This unit test will crash when run, and is fixed by bienvenu's patch. It crashes in the row object. I'm not sure if this is something we want to promote though, since trying to do a database operation after a forceDBClosed is not really valid.
(In reply to comment #9) > Created an attachment (id=380460) [details] > Unit test that will crash > > This unit test will crash when run, and is fixed by bienvenu's patch. It > crashes in the row object. I'm not sure if this is something we want to promote > though, since trying to do a database operation after a forceDBClosed is not > really valid. What does not promoting it amount to? The semantics of forceDBClosed aren't really compatible with our JS extension world, so we need to wrap everything in foam rubber... naive/lazy JS code shouldn't be able to crash the system. (I think it's okay for orphaned objects to go into a state where they simply throw exceptions no matter what you try to do to them, but no crasing.)
pinging for sr - I'd like to land this before I start dealing with the forceDBClosed issue, to avoid conflicts.
Comment on attachment 376492 [details] [diff] [review] proposed fix This is causing test_index_messages.js to hang if I apply the patch from bug 438922 to my tree as well (this makes sure we clean up after finishing each unit test). This makes me nervous. Unfortunately, the patch that would get me more info with check-interactive is broken (bug 384339) so I can't get more info at the moment :-(
there shouldn't be any property enumerators extant at shut-down time - the only code that uses them in the test cases uses stack variables.
Do any of the property enumerators ever get exposed at all to XPConnect? It can and will hang onto everything it sees until a GC pass happens. Perhaps forcing a GC collection in that shutdown helper patch?
the property enumerator is accessible via xpconnect, but only venkman, which has a lovely habit of enumerating the properties for fun, actually accesses the property enumerator via xpconnect (other than rkent's extensions, which may also access the property enumerator)
There is an existing unit test for the propertyEnumerator that accesses it via xpconnect.
Standard8, could that have been the test that hung? I don't think the test_index_messages test exposes the property enumerator to xpconnect, but if it copy/moves messages, it will cause a property enumerator to be created and destroyed in c++.
Comment on attachment 376492 [details] [diff] [review] proposed fix I guess I hadn't checked the patch in the way I thought I had - I've just tried reproducing and I can reproduce the hang without or without your patch (but with my shutdown patch). So its obviously not this patch that is the issue. > nsCOMPtr<nsIMdbEnv> m_mdbEnv; > nsCOMPtr<nsIMdbStore> m_mdbStore; >+ nsRefPtr<nsMsgHdr> m_hdr; Can you add a short note here saying that we keep hold of it to stop m_hdr being deleted and tidying up the other objects that this enumerator requires? sr=Standard8
Attachment #376492 - Flags: superreview?(bugzilla) → superreview+
fix checked in with comment added.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.