Eliminate PLDHashtable in nsMsgDatabase

RESOLVED FIXED in Thunderbird 55.0

Status

MailNews Core
Database
RESOLVED FIXED
5 years ago
4 months ago

People

(Reporter: jcranmer, Assigned: jcranmer)

Tracking

Trunk
Thunderbird 55.0

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 4 obsolete attachments)

(Assignee)

Description

5 years ago
Old, outdated, hard-to-read.
(Assignee)

Comment 1

5 years ago
Created attachment 728244 [details] [diff] [review]
Kill it dead

Hey look, I also killed a few static_cast<nsMsgHdr*>! :-)

This requires the memory reporter to apply properly. I also did some code cleanup regarding the ownership of nsIMdbRow by nsMsgHdr and eliminated a few return values to reflect operation infallibility.
Attachment #728244 - Flags: review?(kent)

Comment 2

5 years ago
"This requires the memory reporter to apply properly"

I'm not sure what you mean by that. As you say, it does not apply as given. What do I have to do to get it to apply?

I've occasionally whined about big bugs with little motivation listed, so I'll try to preclude my fellow whiners by giving a motivation for this, which I only understood after reading the patch. Please correct me if I am incorrect:

PLDHashTable is a low-level implementation that is not the preferred hash table for common uses where standard types are stored. This bug changes the PLDHashTables uses in nsMsgDatabase to type-specific variants, resulting in cleaner, easier to read code.
(Assignee)

Comment 3

5 years ago
(In reply to Kent James (:rkent) from comment #2)
> "This requires the memory reporter to apply properly"
> 
> I'm not sure what you mean by that. As you say, it does not apply as given.
> What do I have to do to get it to apply?

Bug 480843. There's no real dependency, just the fact that two patches in my queue touch the same code, and I only decided to change this after getting sick of manipulating PLDHashTables in that bug.

> PLDHashTable is a low-level implementation that is not the preferred hash
> table for common uses where standard types are stored. This bug changes the
> PLDHashTables uses in nsMsgDatabase to type-specific variants, resulting in
> cleaner, easier to read code.

I also take the time to fix up some of the ownership: nsMsgHdr now explicitly uses an nsCOMPtr for its nsIMdbRow--it always held a strong reference to it, now we use nsCOMPtr to explicitly manage it.

Comment 4

4 years ago
Comment on attachment 728244 [details] [diff] [review]
Kill it dead

Review of attachment 728244 [details] [diff] [review]:
-----------------------------------------------------------------

Overall the patch is fine, but has bit rotted a bit since my review was delayed. I'm going to r- just so we discuss my few questions, but I probably won't give the next version much scrutiny before r+

I am suggesting that we cleanup at least one of memory management oddities, namely doing an addref in "new nsMsgHdr"

::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
@@ +462,5 @@
> +    hdr->GetMessageKey(&key);
> +  // TODO: Implement LRU replacement.
> +  if (m_cachedHeaders.Count() > m_cacheSize)
> +    ClearHdrCache();
> +  nsIMsgDBHdr *bar = hdr;

Why do you need this temporary instead of just using hdr directly?

@@ +563,5 @@
>  }
>  
> +void nsMsgDatabase::ClearHdrCache()
> +{
> +  if (m_cachedHeaders.IsInitialized())

Is there any reason why we can't just initialize m_cachedHeaders to the default value in the initializer (like you do for the inuse cache) and get rid of all of these .IsInitialized() checks?

@@ +597,5 @@
> +  m_headersInUse.Put(key, hdr);
> +
> +  // the hash table won't add ref, we'll do it ourselves
> +  // stand for the addref that CreateMsgHdr normally does.
> +  NS_ADDREF(hdr);

This is really odd, non-standard behavior. Why do this here instead of the more conventional way of doing the addref in CreateMsgHdr?

Also note that the comment in CreateMsgHdr now says:
// don't need to addref here; GetHdrFromUseCache addrefs.

which is wrong, if the odd behavior is kept, the addref is in AddHdrToUseCache.

@@ +627,5 @@
>    nsMsgHdr *msgHdr = new nsMsgHdr(this, hdrRow);
>    if(!msgHdr)
>      return NS_ERROR_OUT_OF_MEMORY;
>    msgHdr->SetMessageKey(key);
>    // don't need to addref here; GetHdrFromUseCache addrefs.

As I said earlier, this comment is incorrect, and I would prefer if the addref was done here, in the conventional place. I don't expect *msgHdr = new ... to return an addrefed value.

@@ +4029,3 @@
>    {
>      nsresult rv = InitRefHash();
>      if (NS_FAILED(rv))

Unlike the normal NS_ERROR_FAILURE from this method, a failure at this point is something that the devs should know about. Could we at least make this NS_ENSURE_SUCCESS(rv, rv) so that we see a warning if this is failing? Otherwise this is a completely silent failure.

@@ +4037,5 @@
> +}
> +
> +void nsMsgDatabase::AddRefToHash(nsCString &reference, nsMsgKey threadId)
> +{
> +  if (m_msgReferences.IsInitialized())

Is there any good reason why we are not initializing here if needed? Previously this was a complex pain, but now it is relatively easy to add. It was not clear to me from looking a couple of levels deep that we could guarantee initialization.

@@ +4052,5 @@
>  
>    for (int32_t i = 0; i < numReferences; i++)
>    {
>      nsAutoCString reference;
>      

Inserted line of just spaces? Please trim.

@@ +4105,2 @@
>  
>    // Populate table with references of existing messages

Is there some reason that you are removing the null check here?
In my own code, I would use NS_ENSURE_STATE(enumerator) here, as that is the shortest available version. That has not caught on in mailnews, but I am happy for you to start a trend :)

@@ +4487,5 @@
>    return NS_OK; // it's not an error not to find a msg hdr.
>  }
>  
>  nsIMsgDBHdr *nsMsgDatabase::GetMsgHdrForSubject(nsCString &subject)
>  {

This method AFAICT has no users. Perhaps it makes more sense to just remove it instead of fixing it.
Attachment #728244 - Flags: review?(kent) → review-
(Assignee)

Comment 5

4 years ago
(In reply to Kent James (:rkent) from comment #4)
> I am suggesting that we cleanup at least one of memory management oddities,
> namely doing an addref in "new nsMsgHdr"

I originally avoided doing that to minimize regressions and avoid making the patch even larger, but I'm not against doing that in this patch.
> 
> ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
> @@ +462,5 @@
> > +    hdr->GetMessageKey(&key);
> > +  // TODO: Implement LRU replacement.
> > +  if (m_cachedHeaders.Count() > m_cacheSize)
> > +    ClearHdrCache();
> > +  nsIMsgDBHdr *bar = hdr;
> 
> Why do you need this temporary instead of just using hdr directly?

I honestly have no idea what I was doing here.

> @@ +563,5 @@
> >  }
> >  
> > +void nsMsgDatabase::ClearHdrCache()
> > +{
> > +  if (m_cachedHeaders.IsInitialized())
> 
> Is there any reason why we can't just initialize m_cachedHeaders to the
> default value in the initializer (like you do for the inuse cache) and get
> rid of all of these .IsInitialized() checks?

The original reason, IIRC, was to honor the cache size if a non-custom value is set later. Looking in more detail in the code, this isn't an issue.

> @@ +4037,5 @@
> > +}
> > +
> > +void nsMsgDatabase::AddRefToHash(nsCString &reference, nsMsgKey threadId)
> > +{
> > +  if (m_msgReferences.IsInitialized())
> 
> Is there any good reason why we are not initializing here if needed?
> Previously this was a complex pain, but now it is relatively easy to add. It
> was not clear to me from looking a couple of levels deep that we could
> guarantee initialization.

InitRefHash is a fairly expensive method. Looking at the code, this may be a performance enhancement: we only fill the table if we need to grab a reference to it. If we're trying to add an entry and it doesn't exist, then it would be filled properly when we lazily create the table at use.

> @@ +4105,2 @@
> >  
> >    // Populate table with references of existing messages
> 
> Is there some reason that you are removing the null check here?
> In my own code, I would use NS_ENSURE_STATE(enumerator) here, as that is the
> shortest available version. That has not caught on in mailnews, but I am
> happy for you to start a trend :)

It is my understanding that ::operator new is now infallible in mozilla code, so it would abort instead of return NULL.
Joshua, I know you're quite busy, so just touching base before this possibly bitrots horribly ...

Is there any reason we might want this for TB31?

(This probably should bake well and ride the train if we go out for TB31)
Flags: needinfo?(Pidgeot18)
(Assignee)

Comment 7

4 years ago
(In reply to Wayne Mery (:wsmwk) from comment #6)
> Joshua, I know you're quite busy, so just touching base before this possibly
> bitrots horribly ...
> 
> Is there any reason we might want this for TB31?
> 
> (This probably should bake well and ride the train if we go out for TB31)

The main benefit of this is cleaning up some old code, so it's not at all a priority.
Flags: needinfo?(Pidgeot18)

Comment 8

6 months ago
Jorg, would these pointer games be something for you? There was a review done, the patch just needs polishing.
Flags: needinfo?(jorgk)
OS: Windows 7 → All
Hardware: x86_64 → All

Comment 9

6 months ago
Hey, Aceman, you went into clean-up mode, but I'm still trying to fix bugs our users will benefit from.
Flags: needinfo?(jorgk)
If this reduces failure points or helps the memory reporters be more accurate (which they currently do not) - and I don't know that it will - then it will be a win for users, just not user facing.
Status: ASSIGNED → NEW
Whiteboard: [patchlove]

Comment 11

6 months ago
Created attachment 8839599 [details] [diff] [review]
853881-kill-pldhash.patch - refreshed, some (not all!) compile problems fixed.

Refreshed to latest trunk. Sadly there are a few things that didn't compile here.
Most importantly classes nsDataHashtable and nsInterfaceHashtable don't have methods 'Init' and 'IsInitialised' any more. Looks like the need to be initialised in the constructor now.

The worst error is in nsMsgDatabase::CreateMsgHdr():
nsMsgDatabase.cpp(685): error C2664: 'nsMsgHdr::nsMsgHdr(const nsMsgHdr &)': cannot convert argument 2 from 'nsIMdbRow *' to 'already_AddRefed<nsIMdbRow>'

One option to fix this was to revert these changes:
-    nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow);
+    nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow);

-    nsIMdbRow     *m_mdbRow;
+    nsCOMPtr<nsIMdbRow> m_mdbRow;

-nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow)
+nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow)

The second option was to use a static cast. No idea whether that's correct or not.

Surprisingly TB even runs with this patch ;-)

(In reply to :aceman from comment #8)
> Jorg, would these pointer games be something for you? There was a review
> done, the patch just needs polishing.
That's extreme wishful thinking. Apart from being a pain to refresh, the underlying technology has changed and some stuff doesn't compile any more, most likely since M-C tightened some screws. Aceman, would you like to lend a hand here before we ask Kent for help?
Flags: needinfo?(acelists)

Comment 12

6 months ago
Yes, I can look at the new version.

Comment 13

6 months ago
Created attachment 8839731 [details] [diff] [review]
853881-kill-pldhash.patch (v2).

Hi Kent, I've refreshed the patch and solved all the compile problems.

However, I inserted a static cast as marked with XXX. I have addressed some of the issues from your comment #4 (removed 'bar' and 'IsInitialised' had to be removed anyway since the base class doesn't have such a method (any more?)).

There are a few comments left re. refcounting and error handling.

Since this abandoned code fell into my lap and I spent a few hours getting it into shape, could I ask you for your collaboration here to make the adjustments you require yourself. Sorry for the unusual and/or outrageous request, but the idea here is to save Joshua's work, and instead of going through a few loops your direct input would be much appreciated.

Aceman, I solved the compile problems myself, and I'm asking Kent to finish it off.
Attachment #8839599 - Attachment is obsolete: true
Flags: needinfo?(acelists)
Attachment #8839731 - Flags: review?(rkent)

Comment 14

6 months ago
OK, then. But I played with the patch and saw some compile warnigs:

1. reorder variables in the constructor (according to the order in nsMsgDatabase.h):

In file included from tbird-bin/dist/include/nsMailDatabase.h:10:0,
                 from mailnews/db/msgdb/src/nsMsgDatabase.cpp:10:
tbird-bin/dist/include/nsMsgDatabase.h: In constructor ‘nsMsgDatabase::nsMsgDatabase()’:
tbird-bin/dist/include/nsMsgDatabase.h:438:48: warning: ‘nsMsgDatabase::m_headersInUse’ will be initialized after [-Wreorder]
   nsDataHashtable<nsMsgKeyHashKey, nsMsgHdr *> m_headersInUse;
                                                ^
tbird-bin/dist/include/nsMsgDatabase.h:434:12: warning:   ‘uint32_t nsMsgDatabase::m_cacheSize’ [-Wreorder]
   uint32_t m_cacheSize;
            ^
mailnews/db/msgdb/src/nsMsgDatabase.cpp:924:1: warning:   when initialized here [-Wreorder]
 nsMsgDatabase::nsMsgDatabase()
 ^

2. not sure yet what to do with this one. Do we have a problem if hdr is not added to cache? Can we just ignore the return value?

mailnews/db/msgdb/src/nsMsgDatabase.cpp: In member function ‘void nsMsgDatabase::AddHdrToCache(nsIMsgDBHdr*, nsMsgKey)’:
mailnews/db/msgdb/src/nsMsgDatabase.cpp:505:55: warning: ignoring return value of ‘bool nsBaseHashtable<KeyClass, DataType, UserDataType>::Put(nsBaseHashtable<KeyClass, DataType, UserDataType>::KeyType, const UserDataType&, const fallible_t&) [with KeyClass = mozilla::mailnews::MsgKeyHashKey; DataType = nsCOMPtr<nsIMsgDBHdr>; UserDataType = nsIMsgDBHdr*; nsBaseHashtable<KeyClass, DataType, UserDataType>::KeyType = const unsigned int&; nsBaseHashtable<KeyClass, DataType, UserDataType>::fallible_t = mozilla::fallible_t]’, declared with attribute warn_unused_result [-Wunused-result]
   m_cachedHeaders.Put(key, hdr, mozilla::fallible_t());
                                                       ^
3. looks like this function just needs removing:

mailnews/db/msgdb/src/nsMsgDatabase.cpp: At global scope:
mailnews/db/msgdb/src/nsMsgDatabase.cpp:3950:1: warning: ‘void msg_DHashFreeStringKey(PLDHashTable*, PLDHashEntryHdr*)’ defined but not used [-Wunused-function]
 msg_DHashFreeStringKey(PLDHashTable* aTable, PLDHashEntryHdr* aEntry)

Comment 15

6 months ago
Aceman, can you be a charm and fix those warnings for me and then r?rkent again.

Re. item 2:
We have two options: Either put a MOZ_ASSEERT(???, "Logic error: Entry already in cache");
or explicitly discard the return value. This is the code that got removed:

-nsresult nsMsgDatabase::AddHdrToCache(nsIMsgDBHdr *hdr, nsMsgKey key) // do we want key? We could get it from hdr
-{
-  if (m_bCacheHeaders)
-  {
-    if (!m_cachedHeaders)
-      m_cachedHeaders = PL_NewDHashTable(&gMsgDBHashTableOps, (void *) nullptr, sizeof(struct MsgHdrHashElement), m_cacheSize );
-    if (m_cachedHeaders)
-    {
-      if (key == nsMsgKey_None)
-        hdr->GetMessageKey(&key);
-      if (m_cachedHeaders->entryCount > m_cacheSize)
-        ClearHdrCache(true);
-      PLDHashEntryHdr *entry = PL_DHashTableOperate(m_cachedHeaders, (void *) key, PL_DHASH_ADD);
-      if (!entry)
-        return NS_ERROR_OUT_OF_MEMORY; // XXX out of memory
-
-      MsgHdrHashElement* element = reinterpret_cast<MsgHdrHashElement*>(entry);
-      element->mHdr = hdr;
-      element->mKey = key;
-      NS_ADDREF(hdr);     // make the cache hold onto the header
-      return NS_OK;
-    }
-  }
-  return NS_ERROR_FAILURE;

So as far as I can see, the entry is just added, duplicate or not, hmm. What do you read here?
Whiteboard: [patchlove]

Comment 16

6 months ago
If it is a duplicate and it can't be added, we could silently ignore it.
I rather asked if it can't be added to cache due to out of memory or something. 
The original code returns with NS_ERROR_OUT_OF_MEMORY.

But it seems we freely empty the whole cache if it is full so it seems it's not critical if an item isn't in the cache (can't be added). I would just ignore the return value of the m_cachedHeaders.Put(). In the new code AddHdrToCache is void so we never check if it succeeded. It probably doesn't matter.

Comment 17

6 months ago
Created attachment 8839830 [details] [diff] [review]
853881-kill-pldhash.patch (v2.1)

Solved the 3 compile warnings.
Attachment #8839731 - Attachment is obsolete: true
Attachment #8839731 - Flags: review?(rkent)
Attachment #8839830 - Flags: review?(rkent)

Comment 18

5 months ago
Comment on attachment 8839830 [details] [diff] [review]
853881-kill-pldhash.patch (v2.1)

Review of attachment 8839830 [details] [diff] [review]:
-----------------------------------------------------------------

It still applies and compiles, and I ran it and nothing seemed to blow up. Also looked over the code and did not see any obvious issues. Many of us have looked at this, I think this is enough, so let's just land it.

r+=me

Comment 19

5 months ago
Thanks for looking over it again, but in comment #4 you had some issues which are *not* resolved, let me repeat them here:

==== Quote from comment #4 ====

I am suggesting that we cleanup at least one of memory management oddities, namely doing an addref in "new nsMsgHdr".

@@ +597,5 @@
> +  m_headersInUse.Put(key, hdr);
> +
> +  // the hash table won't add ref, we'll do it ourselves
> +  // stand for the addref that CreateMsgHdr normally does.
> +  NS_ADDREF(hdr);

This is really odd, non-standard behavior. Why do this here instead of the more conventional way of doing the addref in CreateMsgHdr?

Also note that the comment in CreateMsgHdr now says:
// don't need to addref here; GetHdrFromUseCache addrefs.

which is wrong, if the odd behavior is kept, the addref is in AddHdrToUseCache.

@@ +627,5 @@
>    nsMsgHdr *msgHdr = new nsMsgHdr(this, hdrRow);
>    if(!msgHdr)
>      return NS_ERROR_OUT_OF_MEMORY;
>    msgHdr->SetMessageKey(key);
>    // don't need to addref here; GetHdrFromUseCache addrefs.

As I said earlier, this comment is incorrect, and I would prefer if the addref was done here, in the conventional place. I don't expect *msgHdr = new ... to return an addrefed value.

==== End Quote from comment #4 ====

There is this additional issue where I had to add a static cast:

+  // XXX Compile problem here: Added static_cast.
+  nsMsgHdr *msgHdr = new nsMsgHdr(this, static_cast<already_AddRefed<nsIMdbRow>>(hdrRow));


I noticed that I also haven't addressed these:

==== Quote from comment #4 ====

@@ +4029,3 @@
>    {
>      nsresult rv = InitRefHash();
>      if (NS_FAILED(rv))

Unlike the normal NS_ERROR_FAILURE from this method, a failure at this point is something that the devs should know about. Could we at least make this NS_ENSURE_SUCCESS(rv, rv) so that we see a warning if this is failing? Otherwise this is a completely silent failure.

@@ +4487,5 @@
>  nsIMsgDBHdr *nsMsgDatabase::GetMsgHdrForSubject(nsCString &subject)
>  {

This method AFAICT has no users. Perhaps it makes more sense to just remove it instead of fixing it.

==== End Quote from comment #4 ====

While I can address the last two issues, I think I need your input on the ones above.
Flags: needinfo?(rkent)

Comment 20

5 months ago
Actually, what is the benefit of
-    nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow);
+    nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow);
-    nsIMdbRow     *m_mdbRow;
+    nsCOMPtr<nsIMdbRow> m_mdbRow;
-nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow)
+nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow)

which makes this
+  // XXX Compile problem here: Added static_cast.
+  nsMsgHdr *msgHdr = new nsMsgHdr(this, static_cast<already_AddRefed<nsIMdbRow>>(hdrRow));
necessary?

Looking at the caller of nsMsgDatabase::CreateMsgHdr() where the static cast was added, the get the rows from:
m_mdbStore->GetRow()
mRowCursor->NextRow(), mRowCursor->PrevRow()
m_mdbStore->NewRow()
m_mdbStore->FindRow()
so I guess those rows are already AddRef'ed, so the static cast doesn't do any damage.

So with this conclusion there are still the issues I repeated in comment #19 from comment #4, that is, the odd behaviour that "new nsMsgHdr" returns an AddRef'ed object where the AddRef happens in AddHdrToUseCache.

OK, writing these comments, I had to look into the issues further, let me come up with another patch. So far I've only refreshed it and made it compile, so now I have to understand the issues :-(
Flags: needinfo?(rkent)

Comment 21

5 months ago
(In reply to Jorg K (GMT+1) from comment #20)
> Looking at the caller of nsMsgDatabase::CreateMsgHdr() where the static cast
> was added, they get the rows from:
> m_mdbStore->GetRow()
> mRowCursor->NextRow(), mRowCursor->PrevRow()
> m_mdbStore->NewRow() and m_mdbStore->NewRowWithOid()
> m_mdbStore->FindRow()
> so I guess those rows are already AddRef'ed, so the static cast doesn't do
> any damage.

OK, just for a bit of homework I looked at all those Mork functions:
morkStore::GetRow() calls morkRow::AcquireRowObject()/morkRow::AcquireRowObject and that does an AddRef() on the row it returns.
morkTableRowCursor::PrevRow() and morkTableRowCursor::NextRow() also run the acquire, same for morkStore::NewRowWithOid() and morkStore::NewRow(). And also morkStore::FindRow().

Comment 22

5 months ago
I spent a lot of time trying to understand the issues here:

1) Why is it safe to do that static cast, answer in comment #21.
   I added a comment to that effect.
2) Added NS_ENSURE_SUCCESS(rv, rv); after InitRefHash(); as Kent had requested.
3) Remove unused function nsMsgDatabase::GetMsgHdrForSubject()
Also:
4) Shifted the NS_ADDREF(hdr); from nsMsgDatabase::AddHdrToUseCache()
   to a NS_ADDREF(*result = msgHdr) in nsMsgDatabase::CreateMsgHdr().

In the end I tested with the patch and at first it crashed at shutdown, upon further testing it crashed after visiting a few folders, local folders, news folders, IMAP folders. The crash is always at:
morkRowObject.cpp#572 - MORK_ASSERT(row->mRow_Object == this); with an access violation.

Row is not null, but row->mRow_Object can't be accessed. Stack:
>	xul.dll!morkRowObject::CloseRowObject(morkEnv * ev) Line 572	C++
 	xul.dll!morkRowObject::~morkRowObject() Line 61	C++
 	[External Code]	
 	xul.dll!morkObject::Release() Line 35	C++
 	xul.dll!morkRowObject::Release() Line 88	C++
 	xul.dll!nsMsgHdr::~nsMsgHdr() Line 108	C++
 	[External Code]	
 	xul.dll!nsMsgHdr::Release() Line 18	C++

I've also undone 4) and it still blows up big time.

Form comment #8:
> the patch just needs polishing
Haha, that was just wishful thinking. And debugging it now that the original author is not available, won't be much fun at all.

Aceman, can you apply version 2.1 locally and bash some folders and see whether it crashes for you.
Flags: needinfo?(acelists)

Comment 23

5 months ago
Yes, I get the same crash:
#04: morkRowObject::CloseRowObject(morkEnv*) (TB-hg/db/mork/src/morkRowObject.cpp:572)
#05: morkRowObject::~morkRowObject() (TB-hg/db/mork/src/morkNode.h:254)
#06: morkRowObject::~morkRowObject() (TB-hg/db/mork/src/morkRowObject.cpp:63)
#07: morkObject::Release() (TB-hg/db/mork/src/morkObject.cpp:35 (discriminator 5))
#08: morkRowObject::Release() (TB-hg/db/mork/src/morkRowObject.cpp:88)
#09: nsMsgHdr::~nsMsgHdr() (TB-hg/tbird-bin/dist/include/nsTArray.h:397)
#10: nsMsgHdr::~nsMsgHdr() (TB-hg/mailnews/db/msgdb/src/nsMsgHdr.cpp:108)
#11: nsMsgHdr::Release() (TB-hg/mailnews/db/msgdb/src/nsMsgHdr.cpp:18 (discriminator 5))
Flags: needinfo?(acelists)

Comment 24

4 months ago
Comment on attachment 8839830 [details] [diff] [review]
853881-kill-pldhash.patch (v2.1)

This is an obsolete r? from an earlier patch.

Looking at the crash issues associated with the row, really the attempted improvements in row refcount handling is not really related to the core of this patch, which is to remove the pldhash stuff. We should really just return that row handling to its earlier state and try to land the majority of the patch.

On other issues with the patch, I review the older patch and saw the same issues I commented on, but they are not all that important compared to just getting this landed.
Attachment #8839830 - Flags: review?(rkent)

Comment 25

4 months ago
Created attachment 8860312 [details] [diff] [review]
853881-kill-pldhash.patch (v2.1) - refreshed on Windows

Same as (2.1) but refreshed on Windows.
Attachment #8839830 - Attachment is obsolete: true

Comment 26

4 months ago
Created attachment 8860334 [details] [diff] [review]
853881-kill-pldhash.patch (v3).

OK, this doesn't crash any more.

As Kent suggested, I reverted some of the ref-counting business.

Also addressed here:
- removed nsMsgDatabase::GetMsgHdrForSubject()
- Added NS_ENSURE_SUCCESS(rv, rv) after InitRefHash().
- Shifted Addref from AddHdrToUseCache() to GetHdrFromUseCache()
  for more standard behaviour as requested by Kent.

I hope the interdiff will show the changes.

I guess this is good to go now after a try run.

Comment 27

4 months ago
Yep, interdiff is our friend here and shows exactly what I mentioned:
https://bugzilla.mozilla.org/attachment.cgi?oldid=8860312&action=interdiff&newid=8860334&headers=1

Let's give this a good workout:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a835ed3abc23d3ea8a2304dd09531bb5d12e1b6b

Oops, Mac didn't compile due to bug 1358444, so here's another one for Mac:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=c99ae0b96aaf675ce033c52819fdf28fb16f874a

BTW, landing that on try gave me the opportunity to look over the changes in a different format than "Splinter review", so I think they look good now.

Comment 28

4 months ago
Mac still didn't compile, so here another one:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=1f22ce277fd2c30c0cea4124f6948bcef0722817

Comment 29

4 months ago
Mac still didn't compile, so here another one:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=aa9bafa6fb5f1aa7f351057c24d9ed7617cfd7b8

Windows + Linux Opt looking good so far ;-)
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a835ed3abc23d3ea8a2304dd09531bb5d12e1b6b

Comment 30

4 months ago
Created attachment 8860382 [details] [diff] [review]
1358444-comma2.patch (v4).

More. This time some real beauty in MIME.

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=84483de178bf0d6655f13a3a759ab1858efd32f2
Attachment #8860334 - Attachment is obsolete: true

Comment 31

4 months ago
Comment on attachment 8860382 [details] [diff] [review]
1358444-comma2.patch (v4).

Oops, attached to wrong bug, the try is still relevant.
Attachment #8860382 - Attachment is obsolete: true

Updated

4 months ago
Attachment #8860334 - Attachment is obsolete: false

Comment 32

4 months ago
Created attachment 8860569 [details] [diff] [review]
853881-kill-pldhash.patch (v4) [landed in comment #34]

Looking at this some more. New try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a28e9f8022faa8ab7167294560c099cb4069355b

Comment 33

4 months ago
I'm going to land (v4) in the next comment.

For the record:

(v4) has the original code:

 nsMsgHdr::~nsMsgHdr()
 {
   if (m_mdbRow)
   {
     if (m_mdb)
     {
       NS_RELEASE(m_mdbRow);
       m_mdb->RemoveHdrFromUseCache(this, m_messageKey);
     }
   }
   NS_IF_RELEASE(m_mdb);
 }

(v3) had:

 nsMsgHdr::~nsMsgHdr()
 {
   if (m_mdb)
     m_mdb->RemoveHdrFromUseCache(this, m_messageKey);
   NS_IF_RELEASE(m_mdb);
 }

(v3) and (v4) do not crash :-), but it appears that (v3) does less free'ing.

The following (v3-crash) crashes in exactly the same way (v2.1) crashes, see comment #22 and comment #23. 'row' was basically free'ed already when accessing 'row->mRow_Object':
https://dxr.mozilla.org/comm-central/rev/5e4e889f13eb1fd5091f60921a1566c661f2c630/db/mork/src/morkRowObject.cpp#572
So a case of double-free:

 nsMsgHdr::~nsMsgHdr()
 {
   NS_IF_RELEASE(m_mdbRow); <==== added
   if (m_mdb)
     m_mdb->RemoveHdrFromUseCache(this, m_messageKey);
   NS_IF_RELEASE(m_mdb);
 }

Aha!!

Let's look again at (2.1). That has a tricky ref-counting scheme where 
-    nsIMdbRow     *m_mdbRow;
+    nsCOMPtr<nsIMdbRow> m_mdbRow;
m_mdbRow is ref counted, this is apparently the same as (v3-crash) since m_mdbRow will be released when ref-counted down in nsMsgHdr::~nsMsgHdr(), only that it's already been free'ed.

So Kent, with this research, do you understand the situation?

Can you make sense of the code in nsMsgHdr::~nsMsgHdr()? I went for the non-crashing original version. In fact, like this, nsMsgHdr.cpp/nsMsgHdr.h don't change at all in this patch, apart from some type adjustments.
Flags: needinfo?(rkent)

Comment 34

4 months ago
https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 55.0

Updated

4 months ago
Attachment #8860569 - Attachment description: 853881-kill-pldhash.patch (v4). → 853881-kill-pldhash.patch (v4) [landed in comment #34]

Comment 35

4 months ago
Hmm, after this push I see Mozmill failures on Windows, but not Linux or Mac (Mac has other failures now, bug 1358711).

So another try with this backed out:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=91e16cded787765b69b00208cb161400db74f669

Looks like the crash happens in:
https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1492843819/comm-central_win7_ix-debug_test-mozmill-bm112-tests1-windows-build19.txt.gz
TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test_selection_extension

I'll run it locally.

Comment 36

4 months ago
Test passes locally, so let's see what the next Daily does.

Comment 37

4 months ago
The Daily run had the same Mozmill failure,
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=62bfaab2731fc47ee7ed26aefa001e63c4f2c371&selectedJob=93529413

the try with this backed out was green.
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=91e16cded787765b69b00208cb161400db74f669

I have another try on Windows which is green:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a835ed3abc23d3ea8a2304dd09531bb5d12e1b6b
but that tried (v3), for the difference see comment #33.

Aceman, can you make any sense of this?
Flags: needinfo?(acelists)

Comment 38

4 months ago
I only see a failure in mozmill\folder-display\test-mail-views.js | test_verify_saved_mail_view .
The test-selection.js isn't even in the log. Is TB crashing before that test?
Flags: needinfo?(acelists)

Comment 39

4 months ago
Hmm, are we looking at the same log?

https://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-22-03-02-07-comm-central/comm-central_win7_ix_test-mozmill-bm126-tests1-windows-build18.txt.gz

04:27:07     INFO -  TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-mail-views.js | test_verify_saved_mail_view
04:27:07     INFO -  Timeout: bridge.execFunction("1d7ea19e-274e-11e7-9660-002590c5f2bf", bridge.registry["{c5fef04f-61af-4bb1-8e21-092fdea3ac65}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"])
04:27:07  WARNING -  TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed

Well, I was looking in

https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1492843819/comm-central_win7_ix-debug_test-mozmill-bm112-tests1-windows-build19.txt.gz

01:44:07     INFO -  TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test_selection_extension
(snip)
01:46:08     INFO -  mozcrash INFO | Copy/paste: folder-display C:\slave\test\build\tests\mozmill\mozmillprofile\minidumps\116ffb50-fac0-4846-9afe-d3ddfb5a5cc9.dmp C:\slave\test\build\symbols
01:46:08     INFO -  Timeout: bridge.execFunction("e0b8bc8f-2736-11e7-951f-002590c5411b", bridge.registry["{aea58b0f-faba-4ec6-b59d-f144899822e9}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"])
01:46:08  WARNING -  TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed

But there is also:

https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32/1492843819/comm-central_win7_ix_test-mozmill-bm112-tests1-windows-build16.txt.gz

01:29:56     INFO -  TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-message-commands-on-msgstore.js | test_mark_messages_replied
01:29:56     INFO -  Timeout: bridge.execFunction("5f7802e1-2735-11e7-af6e-002590c669e3", bridge.registry["{90d5f1a7-2a4b-406e-b1b0-e914e62050ff}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"])
01:29:56  WARNING -  TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed

Great, so the crash is moving around.

Does the code in comment #33 make any sense to you? You've worked on aspects of the database.
Flags: needinfo?(acelists)

Comment 40

4 months ago
I can't see any of those failures locally, even running whole folder-display folder passes fine (on Linux). So it's not sure this patch caused the problem.

I'm not familiar with the code in question. I'd think jcranmer would know best as the author of the rework.
Flags: needinfo?(acelists)

Comment 41

4 months ago
I ran the whole folder with
  mozmake SOLO_TEST=folder-display mozmill-one
and I do see the crash:

[5224] ###!!! ASSERTION: row->mRow_Object == this: 'Error', file c:/mozilla-source/comm-central/db/mork/src/morkConfig.cpp, line 24
#01: morkRowObject::CloseRowObject (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:573)
#02: morkRowObject::~morkRowObject (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:61)
#03: morkRowObject::`scalar deleting destructor'[c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\dist\bin\xul.dll +0x77cd24]
#04: morkObject::Release (c:\mozilla-source\comm-central\db\mork\src\morkobject.cpp:35)
#05: morkRowObject::Release (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:88)
#06: nsMsgHdr::~nsMsgHdr (c:\mozilla-source\comm-central\mailnews\db\msgdb\src\nsmsghdr.cpp:111)
#07: nsMsgHdr::`scalar deleting destructor'[c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\dist\bin\xul.dll +0xa0f6ec]
#08: nsMsgHdr::Release (c:\mozilla-source\comm-central\mailnews\db\msgdb\src\nsmsghdr.cpp:18)
#09: ReleaseObjects (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nscomarray.cpp:272)
#10: nsCOMArray_base::Clear (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nscomarray.cpp:282)
#11: nsArrayBase::~nsArrayBase (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nsarray.cpp:41)
#12: nsArrayCC::DeleteCycleCollectable (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nsarray.cpp:50)
#13: SnowWhiteKiller::~SnowWhiteKiller (c:\mozilla-source\comm-central\mozilla\xpcom\base\nscyclecollector.cpp:2656)
#14: nsCycleCollector::FreeSnowWhite (c:\mozilla-source\comm-central\mozilla\xpcom\base\nscyclecollector.cpp:2840)
#15: AsyncFreeSnowWhite::Run (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcjsruntime.cpp:147)
#16: nsThread::ProcessNextEvent (c:\mozilla-source\comm-central\mozilla\xpcom\threads\nsthread.cpp:1271)
#17: XPTC__InvokebyIndex (c:\mozilla-source\comm-central\mozilla\xpcom\reflect\xptcall\md\win32\xptcinvoke_asm_x86_64.asm:99)
#18: CallMethodHelper::Call (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednative.cpp:1331)
#19: XPCWrappedNative::CallMethod (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednative.cpp:1296)
#20: XPC_WN_CallMethod (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednativejsops.cpp:983)

It's the crash we already know. At least now I can try some options locally.

Comment 42

4 months ago
https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371 (comment #34)
https://hg.mozilla.org/comm-central/rev/5c3191edbbd4fe552450b582d520ba9be4cd431b

The follow-up takes us to (v3) or (v2.1) as far as the DTOR of nsMsgHdr is concerned. That should avoid crashes due to double-freeing the same DB row twice. Gotta love Mork :-(

Comment 43

4 months ago
Created attachment 8860683 [details] [diff] [review]
853881-follow-up.patch [landed in comment #42]

Comment 44

4 months ago
"Can you make sense of the code in nsMsgHdr::~nsMsgHdr()?"

I've tried to understand the mork reference counting before with little success. It was all converted from very early Mozilla code, and (as we say here) trying to "fix" things usually causes more problems than it solves.
Flags: needinfo?(rkent)
Depends on: 1360873
You need to log in before you can comment on or make changes to this bug.