Bug 1677202 Comment 16 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Hi,

Thank you for your comment.

(In reply to :aceman from comment #15)
> Comment on attachment 9188190 [details] [diff] [review]
> Try counting the database pointer left in the cache due to early XPCOM
> shutdown and avoid accessing them later.
> 
> Review of attachment 9188190 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for analyzing this tough problem.
> 
> ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
> @@ +83,5 @@
> > +  // some entries cannot be removed even if we tried.
> > +  // So we we count such unremoved entries and ignore them.  Bug 1677202 
> > +  NS_WARNING_ASSERTION(
> > +    !m_dbCache.Length() ||
> > +    m_dbCache.Length() <= unremoved_db_from_cache_due_to_unavailable_XPCOM,
> 
> Does this actually hide the strange "some msg dbs left open" message in some
> cases? Do we just hide it, but the underlying problem of XPCOM shutdown
> still exists?
> 

It avoids the "some msg dbs left open" messages ALL THE TIME in my local testing and past try server runs.
More importantly, now it avoids touching invalid pointers and thus avoids the assertion abort,
and we can now print the offending DB's pathname. (See my comment on pathname in the next comment section.)

Now, why this matters is this.
In DEBUG build that prints memory leaks at the end had aborted prematurely if this assertion is hit.
We could not obtain memory leak information at all due the abort.
However, with this patch, the DEBUG version could terminate successfully and all the nice dumps produced at exit time is produced in full. (My local log immediately became larger thanks to this.) 

> @@ +119,5 @@
> > +             (void *) pMessageDB);
> > +      fflush(stdout);
> > +      printf("p->m_dbFile=%p, ", (void *) pMessageDB->m_dbFile);
> > +      fflush(stdout);
> > +      printf("%s\n", pMessageDB->m_dbFile->HumanReadablePath().get());
> 
> So we print out the folder path, but is it useful for anything? Will it be
> one of the last accessed folders? But we can't do anything about it.
> 
Yes, right now, we can't do anything. I am printing this in order for anyone to
to figure out the underlying problem of unclosed(?) dbfile at program shutdown.
In each test, where such an unclosed db is left before shutdown, the path name is printed. 
A serious debugger should be able to figure out what folder is left unclosed (and hopefully where it is not closed, etc.).

> @@ +887,5 @@
> > +//
> > +// This does not work. Depending on timing, m_dbCache.Length() accesses already
> > +// released heap.
> > +//
> > +// void nsMsgDBService::RemoveFromCacheDuringShutdown(nsMsgDatabase* pMessageDB){
> 
> Adding commented out code for documentation purposes?

Yes. I tried a few tricks which did not work. Accessing m_dbCache.Length() did not work during SHUTDOWN phase. :-(

> 
> @@ +1137,5 @@
> > +    // comm/mail/test/browser/keyboard/browser_spacehit.js is gone.
> > +    // But the problem with the following test persists.
> > +    // comm/mail/test/browser/multiple-identities/browser_displayNames.js
> > +    //
> > +    // m_mdbEnv->Release();  //??? is this right?
> 
> So when do we release this now if not here?

Hm... I wonder if this ~nsMsgDatabase() is called only once in TB's execution or is called many times.
If it is called many times, I need to see why m_mdbEnv->Release() crashed. But I am afraid that the crash happens only during SHUTDOWN  and we may need to treat this Release() specially. Either we may not call it for now to avoid the mysterious crash() , especially if _nsMsgDatabase() only once (?)

[But I have a suspicion that maybe with the counting hack, this may not abort any more. 
Unfortunately, I cannot test this easily now. For the past few weeks, I  cannot test local repo due to a corruption caused by an HG problem right now. The fix that uses |hg pull|  mentioned in  Bug 1720302 does not work. It does a lot of I/O and the repo seems to become pristine. However, as soon as I try to do something using HG MQUEUE the phantom modification appears and I could not apply changes, etc. :-(
Trying to submit a job is also screwed up due to the same reason. I want to remove unwanted local patches, but any operation that touches the local repo refuses to operate due to the same issue.
Maybe I should try the script mentioned in gist in the same bug, but I am a bit afraid that maybe MQUEUE is not really compatible with the fixes proposed now. Ugh...
Hi,

Thank you for your comment.

(In reply to :aceman from comment #15)
> Comment on attachment 9188190 [details] [diff] [review]
> Try counting the database pointer left in the cache due to early XPCOM
> shutdown and avoid accessing them later.
> 
> Review of attachment 9188190 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for analyzing this tough problem.
> 
>  ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
>  @@ +83,5 @@
> > +  // some entries cannot be removed even if we tried.
> > +  // So we we count such unremoved entries and ignore them.  Bug 1677202 
> > +  NS_WARNING_ASSERTION(
> > +    !m_dbCache.Length() ||
> > +    m_dbCache.Length() <= unremoved_db_from_cache_due_to_unavailable_XPCOM,
> 
> Does this actually hide the strange "some msg dbs left open" message in some
> cases? Do we just hide it, but the underlying problem of XPCOM shutdown
> still exists?
> 

It avoids the "some msg dbs left open" messages ALL THE TIME in my local testing and past try server runs.
More importantly, now it avoids touching invalid pointers and thus avoids the assertion abort,
and we can now print the offending DB's pathname. (See my comment on pathname in the next comment section.)

Now, why this matters is this.
In DEBUG build that prints memory leaks at the end had aborted prematurely if this assertion is hit.
We could not obtain memory leak information at all due the abort.
However, with this patch, the DEBUG version could terminate successfully and all the nice dumps produced at exit time is produced in full. (My local log immediately became larger thanks to this.) 

>  @@ +119,5 @@
> > +             (void *) pMessageDB);
> > +      fflush(stdout);
> > +      printf("p->m_dbFile=%p, ", (void *) pMessageDB->m_dbFile);
> > +      fflush(stdout);
> > +      printf("%s\n", pMessageDB->m_dbFile->HumanReadablePath().get());
> 
> So we print out the folder path, but is it useful for anything? Will it be
> one of the last accessed folders? But we can't do anything about it.
> 
Yes, right now, we can't do anything. I am printing this in order for anyone to
to figure out the underlying problem of unclosed(?) dbfile at program shutdown.
In each test, where such an unclosed db is left before shutdown, the path name is printed. 
A serious debugger should be able to figure out what folder is left unclosed (and hopefully where it is not closed, etc.).

> @@ +887,5 @@
> > +//
> > +// This does not work. Depending on timing, m_dbCache.Length() accesses already
> > +// released heap.
> > +//
> > +// void nsMsgDBService::RemoveFromCacheDuringShutdown(nsMsgDatabase* pMessageDB){
> 
> Adding commented out code for documentation purposes?

Yes. I tried a few tricks which did not work. Accessing m_dbCache.Length() did not work during SHUTDOWN phase. :-(

> 
>  @@ +1137,5 @@
> > +    // comm/mail/test/browser/keyboard/browser_spacehit.js is gone.
> > +    // But the problem with the following test persists.
> > +    // comm/mail/test/browser/multiple-identities/browser_displayNames.js
> > +    //
> > +    // m_mdbEnv->Release();  //??? is this right?
> 
> So when do we release this now if not here?

Hm... I wonder if this ~nsMsgDatabase() is called only once in TB's execution or is called many times.
If it is called many times, I need to see why m_mdbEnv->Release() crashed. But I am afraid that the crash happens only during SHUTDOWN  and we may need to treat this Release() specially. Either we may not call it for now to avoid the mysterious crash() , especially if _nsMsgDatabase() only once (?)

[But I have a suspicion that maybe with the counting hack, this may not abort any more. 
Unfortunately, I cannot test this easily now. For the past few weeks, I  cannot test local repo due to a corruption caused by an HG problem right now. The fix that uses |hg pull|  mentioned in  Bug 1720302 does not work. It does a lot of I/O and the repo seems to become pristine. However, as soon as I try to do something using HG MQUEUE the phantom modification appears and I could not apply changes, etc. :-(
Trying to submit a job is also screwed up due to the same reason. I want to remove unwanted local patches, but any operation that touches the local repo refuses to operate due to the same issue.
Maybe I should try the script mentioned in gist in the same bug, but I am a bit afraid that maybe MQUEUE is not really compatible with the fixes proposed now. Ugh...

Back to Bug 1677202 Comment 16