Closed Bug 711787 Opened 13 years ago Closed 13 years ago

Deadlock in nsImapProtocol::CanHandleUrl caused by calling nsISocketTransport::IsAlive off the socket transport thread

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
All
defect
Not set
critical

Tracking

(thunderbird11+ fixed)

RESOLVED FIXED
Thunderbird 12.0
Tracking Status
thunderbird11 + fixed

People

(Reporter: briansmith, Assigned: Bienvenu)

References

(Depends on 1 open bug)

Details

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #704984 +++

See bug 704984 comment 25 and other comments in that bug.

See bug 711786 for discussion on whether it is allowed to call nsISocketTransport::IsAlive off the socket transport thread.

I looked at the code in nsIMAPProtocol that calls IsAlive(). It is very similar, at least in concept, to the HTTP connection manager. There are many IMAP connections open to the same server and this code is looking for one that is alive and that isn't busy doing something else, to send the IMAP "request" to.

In HTTP, this happens all in the socket transport thread; in IMAP, this happens on the main thread.

To me, it seems very wrong that the IMAP code is executing off the socket transport thread. At the same time, I don't have any clue about how much effort it would be to make the IMAP execute on the socket transport thread.
I don't think we can do any fixes in IMAP protocol implementation like this in 2 days before Aurora merge.

We need a quick fix.

I think the quick fix would be to re-enable nsSocketTransport::IsAlive() be accessible on the main thread too (bug 711786).  I'll comment on that bug.
I can try to fix this in the imap code - I'm not convinced that changing nsSocketTransport::IsAlive is easier or safer at this point.
Assignee: nobody → dbienvenu
For now leaving w/o an r?.  If David comes with a better fix for IMAP, we may obsolete this patch.

However, this patch is a more general solution that could fix any other inappropriate calls from any other protocol.
I've got a potential fix coded and am rebuilding so I can test it.
This is what I was thinking for the imap code - it moves the check for isAlive to the imap thread, and retries the url if that check fails. This means we detect the socket failure later, which is sub-optimal in the case of all the cached connections going bad at once - in that case, we try the url on each of the cached connections until we find out that each is bad, and then we open a new connection. But in theory, it should work. It's a bit hard to test because it's hard to make all the cached connections to go bad at once.
I've just been running with a breakpoint set in the retry code, and I've hit it a couple times and the retry seemed to work, so this patch works at least somewhat.
Comment on attachment 582862 [details] [diff] [review]
move isAlive to imap thread

For anyone interested, here soon (an hour or two up) will be a try build containing this patch, so check if it helps fixing the deadlock please:

https://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/honzab.moz@firemni.cz-e281fd419c4a/


(For reference, the try run: http://hg.mozilla.org/try-comm-central/rev/e281fd419c4a)
Attachment #582862 - Flags: review?(mbanner)
Blocks: 713253
Comment on attachment 582862 [details] [diff] [review]
move isAlive to imap thread

Ok, lets get this out and give it a try.
Attachment #582862 - Flags: review?(mbanner) → review+
http://hg.mozilla.org/comm-central/rev/75840841cc21 pushed to trunk. If NSS/Necko decides to keep the ability to call isAlive from the UI thread, then we could think about reverting this, but the deadlock makes the trunk fairly unusable for me.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #582862 - Flags: approval-comm-aurora?
Attachment #582862 - Flags: approval-comm-aurora? → approval-comm-aurora+
Checked into aurora:

http://hg.mozilla.org/releases/comm-aurora/rev/df426b0ca043
Target Milestone: --- → Thunderbird 12.0
Depends on: 733731
this seems to hang me when switching folders, bug 733731
No longer depends on: 733731
Regressions: 733731
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: