Closed Bug 758688 Opened 12 years ago Closed 12 years ago

Thunderbird Bloat & Mozmill tests - Assertion failure: !connections[i]->ConnectionReady(), mozStorageService.cpp:852

Categories

(Thunderbird :: General, defect)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 15.0

People

(Reporter: standard8, Assigned: espindola)

References

Details

Attachments

(2 files, 1 obsolete file)

Since bug 711076 landed, we're seeing:

Assertion failure: !connections[i]->ConnectionReady(), at ../../../../mozilla/storage/src/mozStorageService.cpp:852

This is on all our mozmill and bloat tests.

Unfortunately the stack isn't useful:

0  linux-gate.so + 0x424
    eip = 0x0045e424   esp = 0xbfef64e0   ebp = 0xbfef64f8   ebx = 0x00000870
    esi = 0x00000002   edi = 0x00c7cff4   eax = 0x00000000   ecx = 0x00000870
    edx = 0x00000006   efl = 0x00000202
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::storage::Service::Observe [mozStorageService.cpp : 852 + 0x2f]
    eip = 0x01d6b211   esp = 0xbfef6500   ebp = 0xbfef6558
    Found by: previous frame's frame pointer
 2  libxul.so!nsObserverList::NotifyObservers [nsObserverList.cpp : 99 + 0xe]
    eip = 0x023441cc   esp = 0xbfef6560   ebp = 0xbfef6598   ebx = 0x02fa77a0
    esi = 0x00000004   edi = 0xbfef658c
    Found by: call frame info
 3  libxul.so!nsObserverService::NotifyObservers [nsObserverService.cpp : 149 + 0xa]
    eip = 0x023445ff   esp = 0xbfef65a0   ebp = 0xbfef65d8   ebx = 0x02fa77a0
    esi = 0x02a8032e   edi = 0x00000000
    Found by: call frame info
 4  libxul.so!mozilla::ShutdownXPCOM [nsXPComInit.cpp : 585 + 0x10]
    eip = 0x02334e18   esp = 0xbfef65e0   ebp = 0xbfef6668   ebx = 0x02fa77a0
    esi = 0xbfef6628   edi = 0xbfef6638
    Found by: call frame info
 5  libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp : 1103 + 0x7]
    eip = 0x011686ea   esp = 0xbfef6670   ebp = 0xbfef66a8   ebx = 0x02fa77a0
    esi = 0x088dc810   edi = 0xbfef6754
    Found by: call frame info
 6  libxul.so!XREMain::XRE_main [nsAppRunner.cpp : 3864 + 0x12]
    eip = 0x011710ff   esp = 0xbfef66b0   ebp = 0xbfef66f8   ebx = 0x02fa77a0
    esi = 0xbfef6734   edi = 0x00000000
    Found by: call frame info
 7  libxul.so!XRE_main [nsAppRunner.cpp : 3918 + 0xe]
    eip = 0x01171234   esp = 0xbfef6700   ebp = 0xbfef6838   ebx = 0x08057724
    esi = 0xbfef6734   edi = 0xbfef790c
    Found by: call frame info
 8  thunderbird-bin!main [nsMailApp.cpp : 144 + 0x16]
    eip = 0x0804987c   esp = 0xbfef6840   ebp = 0xbfef7938   ebx = 0x08057724
    esi = 0x000000ff   edi = 0xbfef790c
    Found by: call frame info
 9  libc-2.11.so + 0x16bb5
    eip = 0x00afebb6   esp = 0xbfef7940   ebp = 0xbfef79c8
    Found by: previous frame's frame pointer
(In reply to Mark Banner (:standard8) from comment #0)
> This is on all our mozmill and bloat tests.

Correction: all our debug mozmill & bloat builds.
Do you have instructions on how to reproduce this locally?
It would probably be easier if the assertion said what the database name was that's not closed; can that be added to the patch?
I was able to reproduce it.
Assignee: nobody → respindola
Status: NEW → ASSIGNED
It is the connection

var gDBConnection = null;

created by

this.__defineGetter__("DBConn", function() {
  if (gDBConnWithPendingTransaction)
    return gDBConnWithPendingTransaction;

  if (!gDBConnection)
    gDBConnection = getDBConnection();
  gDBConnWithPendingTransaction = gDBConnection;
  gDBConnection.beginTransaction();
  executeSoon(function() {
    gDBConnWithPendingTransaction.commitTransaction();
    gDBConnWithPendingTransaction = null;
  });
  return gDBConnection;
});

That is never being closed.

I don't see any of the usual profile-* or xpcom-* observers in here. Where is it supposed to be closed?

The backtrace that created the db:


0 getDBConnection() ["file:///home/espindola/comm-central/obj-x86_64-unknown-linux-gnu/mozilla/dist/bin/components/imContacts.js":52]
    this = [object BackstagePass @ 0x7f2692702320 (native @ 0x7f269bad0108)]
1 anonymous() ["file:///home/espindola/comm-central/obj-x86_64-unknown-linux-gnu/mozilla/dist/bin/components/imContacts.js":125]
    this = [object BackstagePass @ 0x7f2692702320 (native @ 0x7f269bad0108)]
2 anonymous() ["file:///home/espindola/comm-central/obj-x86_64-unknown-linux-gnu/mozilla/dist/bin/components/imContacts.js":1212]
    this = [object Object]
3 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>
4 anonymous() ["file:///home/espindola/comm-central/obj-x86_64-unknown-linux-gnu/mozilla/dist/bin/components/imCore.js":300]
    this = [object Object]
5 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>
6 anonymous() ["chrome://messenger/content/chat/chat-messenger-overlay.js":850]
    this = [object Object]
7 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>
8 anonymous() ["file:///home/espindola/comm-central/obj-x86_64-unknown-linux-gnu/mozilla/dist/bin/components/msgAsyncPrompter.js":58]
    this = [object Object]
9 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>


I will attach the debugging patch Is used to track this.
cc:ing florian, king of instant messaging in Thunderbird.
Attached patch close the connection (obsolete) — Splinter Review
It is not clear which of the init methods could be used, so I just added the addObserver call in the getter. Let me know if there is a better place.

How are thunderbird patches pushed to try?
Attachment #627358 - Flags: review?(mbanner)
Comment on attachment 627358 [details] [diff] [review]
close the connection

you should really ask florian for a review
Attachment #627358 - Flags: review?(mbanner) → review?(florian)
Comment on attachment 627358 [details] [diff] [review]
close the connection

The code looks good, but I have 2 questions:
- why close and not asyncClose? The documentation says it's illegal to call close on connections where asynchronous statements have been executed.
- why is this needed at all? I've never seen any documentation saying that it's required to call either close or asyncClose on all connections before shutdown, so I (apparently mistakenly) assumed that connections were closed automatically in a destructor after the JS garbage collector has dropped the last reference to the connection (I would expect that to happen at shutdown when the JS scope where the connection is held in a global variable is destroyed).

(In reply to Rafael Ávila de Espíndola (:espindola) from comment #8)

> It is not clear which of the init methods could be used, so I just added the
> addObserver call in the getter. Let me know if there is a better place.

This seems the best place to do it.

> How are thunderbird patches pushed to try?

https://wiki.mozilla.org/Thunderbird/Infrastructure/TryServer
I've just pushed this to try despite the questions as if it succeeds it'll give us a pointer that we're heading in the right direction:

https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=d6b219c7b5e2
(In reply to Florian Quèze from comment #10)
> - why is this needed at all? I've never seen any documentation saying that
> it's required to call either close or asyncClose on all connections before
> shutdown, so I (apparently mistakenly) assumed that connections were closed
> automatically in a destructor after the JS garbage collector has dropped the
> last reference to the connection (I would expect that to happen at shutdown
> when the JS scope where the connection is held in a global variable is
> destroyed).

From Bug 711076 comment 4:
> It is a sanity check. Making sure connections are closed makes sure that all
> relevant statements are finalized and the corresponding threads are shut
> down.
> 
> A thread that sometimes survives too long is the cause of bug 673017 for example.
The try build passed, but failed for in the same way for an xpcshell test, I'll file a separate bug for that.
> The code looks good, but I have 2 questions:
> - why close and not asyncClose? The documentation says it's illegal to call
> close on connections where asynchronous statements have been executed.

There were no async statements in this case, so using just close seemed better. Using asyncClose should work too.

> - why is this needed at all? I've never seen any documentation saying that
> it's required to call either close or asyncClose on all connections before
> shutdown, so I (apparently mistakenly) assumed that connections were closed
> automatically in a destructor after the JS garbage collector has dropped the
> last reference to the connection (I would expect that to happen at shutdown
> when the JS scope where the connection is held in a global variable is
> destroyed).
> 

For regular (non async) connections that could be made to work, but for async ones we have to make sure all pending statements are executed or discarded early in the shutdown process so that we can skip parts of it (including GC) in release builds.
The imContact code seems to be calling createStatement, followed by executeAsync on that statement:

http://hg.mozilla.org/comm-central/annotate/f2729b390e7b/chat/components/src/imContacts.js#l399

According to the interfaces, I believe that executes as async? Although I'm not that familiar with how that module works for us.

If you can do a patch with async, then I think I'd be prepared to r+ it based on the comments earlier in the bug and the fact that florian is out today.
Attached patch use asyncCloseSplinter Review
Attachment #627358 - Attachment is obsolete: true
Attachment #627358 - Flags: review?(florian)
Attachment #627697 - Flags: review?(mbanner)
Comment on attachment 627697 [details] [diff] [review]
use asyncClose

This patch with asyncClose is OK with me if it fixes the unit test failure. (Thanks for fixing this!)

(In reply to Rafael Ávila de Espíndola (:espindola) from comment #14)

> > - why is this needed at all? I've never seen any documentation saying that
> > it's required to call either close or asyncClose on all connections before
> > shutdown, so I (apparently mistakenly) assumed that connections were closed
> > automatically in a destructor after the JS garbage collector has dropped the
> > last reference to the connection (I would expect that to happen at shutdown
> > when the JS scope where the connection is held in a global variable is
> > destroyed).
> > 
> 
> For regular (non async) connections that could be made to work, but for
> async ones we have to make sure all pending statements are executed or
> discarded early in the shutdown process so that we can skip parts of it
> (including GC) in release builds.

Is the requirement for calling close/asyncClose before shutdown something new (with the ongoing work to shutdown release builds with exit and skipping the GC maybe?) or something that always existed and that I missed?
Attachment #627697 - Flags: review+
> Is the requirement for calling close/asyncClose before shutdown something
> new (with the ongoing work to shutdown release builds with exit and skipping
> the GC maybe?) or something that always existed and that I missed?

The close side is new. A missing asyncClose could keep a thread running past xpcom-shutdown-threads.
Attachment #627697 - Flags: review?(mbanner)
https://tbpl.mozilla.org/?tree=Thunderbird-Trunk&rev=f566c0fb497e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Mark Banner (:standard8) from comment #15)
> The imContact code seems to be calling createStatement, followed by
> executeAsync on that statement:

It's worth noting that this creates a _synchronous_ statement and then uses it asynchronously.  The synchronous creation will acquire the DB connection's mutex on the main thread which may potentially have to block for the operations of the async I/O thread because of the use of executeAsync.  createAsyncStatement should be used instead of createStatement if the statement is only being used asynchronously.

Skimming the rest of the file, there are some other issues in the use of mozStorage:
- the magic DBConn getter that attempts to implement transactions is going to race any use of executeAsync.
- executeStep is used without reset (as required by the .idl docs) in a number of places which can make commits return BUSY until GC runs, causing storage code to go into a tight loop, etc.
I don't know if it is related but for several days now (can easily be the 25-05 as standard8 filed this) TB does not exit properly. It seems to crash and waits some time for the debugger (it is a debug build) and then exits. It happens 100% at each close (which I do 100x per day on this test build).

This is 'bt' from gdb on linux:
#0  0xb7553a8c in nanosleep () from /lib/libc.so.6
#1  0xb75538c4 in sleep () from /lib/libc.so.6
#2  0xb4aa0207 in ah_crap_handler (signum=6) at /var/SSD/TB-hg/mozilla/toolkit/xre/nsSigHandlers.cpp:87
#3  0xb4aa4ccc in nsProfileLock::FatalSignalHandler (signo=6, info=0xbfc4a86c, context=0xbfc4a8ec) at /var/SSD/TB-hg/tbird-bin/mozilla/toolkit/profile/nsProfileLock.cpp:190
#4  <signal handler called>
#5  0xb7776749 in raise () from /lib/libpthread.so.0
#6  0xb563299d in Observe (this=0xaf9884c0, aTopic=<optimized out>) at /var/SSD/TB-hg/mozilla/storage/src/mozStorageService.cpp:852
#7  mozilla::storage::Service::Observe (this=0xaf9884c0, aTopic=0xb5e5ec6e "xpcom-shutdown-threads") at /var/SSD/TB-hg/mozilla/storage/src/mozStorageService.cpp:819
#8  0xb5b89bac in nsObserverList::NotifyObservers (this=0xa5a2287c, aSubject=0x0, aTopic=0xb5e5ec6e "xpcom-shutdown-threads", someData=0x0)
    at /var/SSD/TB-hg/mozilla/xpcom/ds/nsObserverList.cpp:99
#9  0xb5b8a493 in NotifyObservers (someData=0x0, aTopic=0xb5e5ec6e "xpcom-shutdown-threads", aSubject=0x0, this=0xb2d1bb50) at /var/SSD/TB-hg/mozilla/xpcom/ds/nsObserverService.cpp:149
#10 nsObserverService::NotifyObservers (this=0xb2d1bb50, aSubject=0x0, aTopic=0xb5e5ec6e "xpcom-shutdown-threads", someData=0x0)
    at /var/SSD/TB-hg/mozilla/xpcom/ds/nsObserverService.cpp:138
#11 0xb5b7a60a in mozilla::ShutdownXPCOM (servMgr=0xb7230704) at /var/SSD/TB-hg/mozilla/xpcom/build/nsXPComInit.cpp:585
#12 0xb4a93bcf in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0xb725f10c, __in_chrg=<optimized out>) at /var/SSD/TB-hg/mozilla/toolkit/xre/nsAppRunner.cpp:1104
#13 0xb4a9b1eb in XREMain::XRE_main (this=0xbfc4ade4, argc=4, argv=0xbfc4c074, aAppData=0xb7201180) at /var/SSD/TB-hg/mozilla/toolkit/xre/nsAppRunner.cpp:3885
#14 0xb4a9b3d1 in XRE_main (argc=4, argv=0xbfc4c074, aAppData=0xb7201180, aFlags=0) at /var/SSD/TB-hg/mozilla/toolkit/xre/nsAppRunner.cpp:3939
#15 0x08049e32 in do_main (argv=0xbfc4c074, argc=4, exePath=0xbfc4afb0 "/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/") at /var/SSD/TB-hg/mail/app/nsMailApp.cpp:111
#16 main (argc=4, argv=0xbfc4c074) at /var/SSD/TB-hg/mail/app/nsMailApp.cpp:200
aceman, are you getting an assertion/abort line in the console just before the hang? (and have you clobbered as well).

If you are, and its the same as you're getting here, I'd suggest you apply attachment 627338 [details] [diff] [review] and see if you can find out what's causing it. We'll probably want to deal with it in a new bug.
I can't answer that. How do I know the assertion?
I can watch the linux console. Will I see the word 'assertion' in the tons of output?
(In reply to :aceman from comment #23)
> I can't answer that. How do I know the assertion?
> I can watch the linux console. Will I see the word 'assertion' in the tons
> of output?

When it hangs, have a look at a few lines from the log just before it hangs - there should be a line there similar to comment 0.
This is several lines before termination (I see 2 assertions in there):

###!!! ASSERTION: Oops!  You're asking for a weak reference to an object that doesn't support that.: 'factoryPtr', file /var/SSD/TB-hg/tbird-bin/mozilla/xpcom/build/nsWeakReference.cpp, line 77
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIObserverService.removeObserver]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: resource://calendar/modules/calUtils.jsm -> file:///var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/calendar-js/calCalendarManager.js :: ccm_shutdown :: line 165"  data: no]
************************************************************
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIObserverService.removeObserver]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: resource://calendar/modules/calUtils.jsm -> file:///var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/calendar-js/calCalendarManager.js :: ccm_shutdown :: line 165"  data: no]
************************************************************
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIObserverService.removeObserver]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: file:///var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/components/steelApplication.js :: app_observe :: line 622"  data: no]
************************************************************
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /var/SSD/TB-hg/mozilla/xpcom/base/nsExceptionService.cpp, line 166
Assertion failure: !connections[i]->ConnectionReady(), at /var/SSD/TB-hg/mozilla/storage/src/mozStorageService.cpp:852

Program ./thunderbird (pid = 20005) received signal 6.
Stack:
__kernel_rt_sigreturn+0x00000000 [linux-gate.so.1 +0x0000040C]
UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0131D604]
UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0131DEEB]
UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0130E0D2]
UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x00226CAF]
UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0022E2CB]
XRE_main+0x000000A7 [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0022E4B1]
UNKNOWN [./thunderbird +0x00001E32]
Sleeping for 300 seconds.
Blocks: 760243
> Assertion failure: !connections[i]->ConnectionReady(), at
> /var/SSD/TB-hg/mozilla/storage/src/mozStorageService.cpp:852
> 
> Program ./thunderbird (pid = 20005) received signal 6.
> Stack:
> __kernel_rt_sigreturn+0x00000000 [linux-gate.so.1 +0x0000040C]
> UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0131D604]
> UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0131DEEB]
> UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0130E0D2]
> UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x00226CAF]
> UNKNOWN [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so +0x0022E2CB]
> XRE_main+0x000000A7 [/var/SSD/TB-hg/tbird-bin/mozilla/dist/bin/libxul.so
> +0x0022E4B1]
> UNKNOWN [./thunderbird +0x00001E32]
> Sleeping for 300 seconds.


That is the one. Can you try patching in 

https://bug758688.bugzilla.mozilla.org/attachment.cgi?id=627338

or posting instructions on how to reproduce the crash?
Thanks to standard8 it was found out that the crash goes away when the Lightning extension is disabled so I filed bug 760243 for it.
Target Milestone: --- → Thunderbird 15.0
You need to log in before you can comment on or make changes to this bug.