+++ This bug was initially created as a clone of Bug #704984 +++ See bug 704984 comment 25 and other comments in that bug. See bug 711786 for discussion on whether it is allowed to call nsISocketTransport::IsAlive off the socket transport thread. I looked at the code in nsIMAPProtocol that calls IsAlive(). It is very similar, at least in concept, to the HTTP connection manager. There are many IMAP connections open to the same server and this code is looking for one that is alive and that isn't busy doing something else, to send the IMAP "request" to. In HTTP, this happens all in the socket transport thread; in IMAP, this happens on the main thread. To me, it seems very wrong that the IMAP code is executing off the socket transport thread. At the same time, I don't have any clue about how much effort it would be to make the IMAP execute on the socket transport thread.
I don't think we can do any fixes in IMAP protocol implementation like this in 2 days before Aurora merge. We need a quick fix. I think the quick fix would be to re-enable nsSocketTransport::IsAlive() be accessible on the main thread too (bug 711786). I'll comment on that bug.
I can try to fix this in the imap code - I'm not convinced that changing nsSocketTransport::IsAlive is easier or safer at this point.
Created attachment 582828 [details] [diff] [review] Making nsSocketTransport::IsAlive concurent-robust for SSL sockets For now leaving w/o an r?. If David comes with a better fix for IMAP, we may obsolete this patch. However, this patch is a more general solution that could fix any other inappropriate calls from any other protocol.
I've got a potential fix coded and am rebuilding so I can test it.
Created attachment 582862 [details] [diff] [review] move isAlive to imap thread This is what I was thinking for the imap code - it moves the check for isAlive to the imap thread, and retries the url if that check fails. This means we detect the socket failure later, which is sub-optimal in the case of all the cached connections going bad at once - in that case, we try the url on each of the cached connections until we find out that each is bad, and then we open a new connection. But in theory, it should work. It's a bit hard to test because it's hard to make all the cached connections to go bad at once.
I've just been running with a breakpoint set in the retry code, and I've hit it a couple times and the retry seemed to work, so this patch works at least somewhat.
Comment on attachment 582862 [details] [diff] [review] move isAlive to imap thread For anyone interested, here soon (an hour or two up) will be a try build containing this patch, so check if it helps fixing the deadlock please: https://firstname.lastname@example.org/ (For reference, the try run: http://hg.mozilla.org/try-comm-central/rev/e281fd419c4a)
Comment on attachment 582862 [details] [diff] [review] move isAlive to imap thread Ok, lets get this out and give it a try.
http://hg.mozilla.org/comm-central/rev/75840841cc21 pushed to trunk. If NSS/Necko decides to keep the ability to call isAlive from the UI thread, then we could think about reverting this, but the deadlock makes the trunk fairly unusable for me.
Checked into aurora: http://hg.mozilla.org/releases/comm-aurora/rev/df426b0ca043
this seems to hang me when switching folders, bug 733731