Closed Bug 312775 (imapdeadlock) Opened 20 years ago Closed 19 years ago

deadlock between the ui thread and the imap code is causing a hang

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Bienvenu, Assigned: Bienvenu)

References

Details

(Keywords: fixed1.8.1)

Attachments

(13 files, 2 obsolete files)

proposed fix 20 years ago David :Bienvenu 1.73 KB, patch	mscott : superreview+ mscott : approval1.8rc1+	Details \| Diff \| Splinter Review
additional patch to try 20 years ago David :Bienvenu 2.05 KB, patch		Details \| Diff \| Splinter Review
Another backtrace 19 years ago Kai Engert [:KaiE:] 15.17 KB, text/plain		Details
trunk patch 19 years ago David :Bienvenu 4.13 KB, patch		Details \| Diff \| Splinter Review
New backtrace 19 years ago Kai Engert [:KaiE:] 15.72 KB, text/plain		Details
remove monitor in CreateNewLIneFromSocket 19 years ago David :Bienvenu 1.22 KB, patch		Details \| Diff \| Splinter Review
backtrace 2006-02-14 19 years ago Kai Engert [:KaiE:] 21.16 KB, patch		Details \| Diff \| Splinter Review
backtrace 2006-02-14 second 19 years ago Kai Engert [:KaiE:] 16.55 KB, text/plain		Details
don't make IDLE grab the monitor 19 years ago David :Bienvenu 767 bytes, patch		Details \| Diff \| Splinter Review
fix that doesn't regress exiting 19 years ago David :Bienvenu 12.14 KB, patch		Details \| Diff \| Splinter Review
account manager part of change 19 years ago David :Bienvenu 786 bytes, patch		Details \| Diff \| Splinter Review
Patch: Combines the previous 2 19 years ago Kai Engert [:KaiE:] 7.90 KB, patch	mscott : superreview+	Details \| Diff \| Splinter Review
backtrace 2006-02-24 19 years ago Kai Engert [:KaiE:] 14.64 KB, text/plain		Details
backtrace 2006-03-14 19 years ago Kai Engert [:KaiE:] 14.48 KB, text/plain		Details
Backtrace 2006-05-18 19 years ago Dave Miller [:justdave] 7.08 KB, text/plain		Details

David :Bienvenu

Assignee

Description

•

20 years ago

Neil is reporting a hang in the imap code when his laptop gets out of range of his wireless network. The stack traces are roughly as follows: NTDLL! 77f88f13() NTDLL! 77f87f26() nsImapIncomingServer::ConnectionTimeOut(nsImapIncomingServer * const 0x01010101, nsIImapProtocol * 0x064dd070) line 671 + 21 bytes nsImapIncomingServer::GetImapConnection(nsImapIncomingServer * const 0x01010101, nsIEventQueue * 0x00f07f10, nsIImapUrl * 0x032f0470, nsIImapProtocol * * 0x0012fd6c) line 738 + 10 bytes nsImapIncomingServer::RetryUrl(nsImapIncomingServer * const 0x04bfbebc, nsIImapUrl * 0x032f0470) line 485 + 40 bytes XPTC_InvokeByIndex(nsISupports * 0x04bfbebc, unsigned int 16, unsigned int 1, nsXPTCVariant * 0x032f40a8) line 102 EventHandler(PLEvent * 0x07671028) line 562 + 22 bytes PL_HandleEvent(PLEvent * 0x07671028) line 689 PL_ProcessPendingEvents(PLEventQueue * 0x10044af9) line 624 _md_TimerProc(HWND__ * 0x002a00e0, unsigned int 275, unsigned int 0, unsigned long 654316727) line 1013 + 6 bytes USER32! 77e4158f() USER32! 77e3c01a() USER32! 77e41e7e() nsAppStartup::Run(nsAppStartup * const 0x016f0188) line 207 + 48 bytes main1(int 0, char * * 0x00264398, nsISupports * 0x00000000) line 1251 main(int 1, char * * 0x00264398) line 1736 + 22 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 7c598989() and the other thread with IMAP in the stack: NTDLL! 77f88f13() NTDLL! 77f87f26() nsImapProtocol::TellThreadToDie(nsImapProtocol * const 0x00000000, int 0) line 1039 nsImapProtocol::CreateNewLineFromSocket(nsImapProtocol * const 0x7ffd5000) line 4294 nsImapServerResponseParser::GetNextLineForParser(nsImapServerResponseParser * const 0x7ffd5000, char * * 0x064dd1fc) line 130 + 12 bytes nsIMAPGenericParser::AdvanceToNextLine(nsIMAPGenericParser * const 0x7ffd5000) line 204 nsImapServerResponseParser::ParseIMAPServerResponse(nsImapServerResponseParser * const 0x7ffd5000, const char * 0x00000000, int 1162626121) line 244 4d4c544e() [

David :Bienvenu

Assignee

Comment 1

•

20 years ago

Attached patch proposed fix — Details — Splinter Review

Neil, can you try this patch out? I've removed the acquiring of the server monitor in nsImapIncomingServer::ConnectionTimeOut - it was only being used to protect the connection cache, I believe. I switched to using RemoveConnection(), which grabs the server monitor for a much shorter time.

David :Bienvenu

Assignee

Updated

•

20 years ago

Attachment #199885 - Flags: superreview?(mscott)

David :Bienvenu

Assignee

Comment 2

•

20 years ago

Scott, can you try running with this patch as well? If it fixes Neil's problem, I'm going to want to land this on the branch as well (Neil says the problem occurs on the branch as well)

Status: NEW → ASSIGNED

Scott MacGregor

Comment 3

•

20 years ago

will do.

neil@parkwaycc.co.uk

Comment 4

•

20 years ago

I'm not actually sure how long a dropout I need to trigger this bug (my biff check interval is one minute), but I had one overnight and one while I was away from the PC each for at least two minutes, with no ill effects with this patch.

David :Bienvenu

Assignee

Comment 5

•

20 years ago

Great, thx, Neil. Scott, can you sr this so I can get it into the trunk - should I just check it into the branch as well, at the same time?

Scott MacGregor

Updated

•

20 years ago

Attachment #199885 - Flags: superreview?(mscott)

Attachment #199885 - Flags: superreview+

Attachment #199885 - Flags: approval1.8rc1+

David :Bienvenu

Assignee

Comment 6

•

20 years ago

fixed on trunk.

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

neil@parkwaycc.co.uk

Comment 7

•

20 years ago

Sorry, my patched build hung this morning, same stacks :-(

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

David :Bienvenu

Assignee

Comment 8

•

20 years ago

Attached patch additional patch to try — Details — Splinter Review

Neil, can you try this patch as well? The previous patch was checked in. With this patch on top of that one, I would at least expect the stack trace to change... the change is to only set the m_lastActiveTime from the UI thread, so we don't need a monitor, and thus can't get blocked. But I suspect we might see a stack trace with CanHandleUrl on it instead.

neil@parkwaycc.co.uk

Comment 9

•

20 years ago

Comment on attachment 200459 [details] [diff] [review] additional patch to try Sorry for the delay, but my PC crashed 20 times in the last month, so it's only now that I'm sure that this attachment fixes the given stack. I know this because I've now got a new hang ;-) Stack trace is: NTDLL! 77f88f13() NTDLL! 77f87f26() nsImapIncomingServer::GetImapConnection(nsImapIncomingServer * const 0x042482a8, nsIEventQueue * 0x00ea4f10, nsIImapUrl * 0x041b48d8, nsIImapProtocol * * 0x0012fd6c) line 744 + 50 bytes nsImapIncomingServer::RetryUrl(nsImapIncomingServer * const 0x02f2236c, nsIImapUrl * 0x041b48d8) line 485 + 40 bytes XPTC_InvokeByIndex(nsISupports * 0x02f2236c, unsigned int 16, unsigned int 1, nsXPTCVariant * 0x04221100) line 102 EventHandler(PLEvent * 0x032640c8) line 562 + 22 bytes PL_HandleEvent(PLEvent * 0x032640c8) line 689 PL_ProcessPendingEvents(PLEventQueue * 0x10044af9) line 624 _md_TimerProc(HWND__ * 0x00010088, unsigned int 275, unsigned int 0, unsigned long 28605342) line 1013 + 6 bytes USER32! 77e4158f() USER32! 77e3c01a() USER32! 77e41e7e() nsAppStartup::Run(nsAppStartup * const 0x01a7c378) line 207 + 48 bytes main1(int 0, char * * 0x00262638, nsISupports * 0x00000000) line 1251 main(int 3, char * * 0x00262638) line 1736 + 22 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 7c598989() The line of code in GetImapConnection is rv = connection->CanHandleUrl(aImapUrl, &canRunUrlImmediately, &canRunButBusy); The IMAP thread is still TellThreadToDie() trying to call CloseStreams().

David :Bienvenu

Assignee

Comment 10

•

19 years ago

the second patch is on the 1.8.1 branch, as well as on the trunk.

David :Bienvenu

Assignee

Comment 11

•

19 years ago

*** Bug 326314 has been marked as a duplicate of this bug. ***

Kai Engert [:KaiE:]

Updated

•

19 years ago

Alias: imapdeadlock

OS: Windows XP → All

Kai Engert [:KaiE:]

Comment 12

•

19 years ago

Attached file Another backtrace — Details

Just ran into this again and thought I'd post another detailed backtrace.

David :Bienvenu

Assignee

Comment 13

•

19 years ago

is it possible that backtrace is missing a thread? I don't see any two threads blocked on the same monitor - with Neil's stacks, I do, so I'll try to work on his case.

David :Bienvenu

Assignee

Comment 14

•

19 years ago

nsImapProtocol::TellThreadToDie grabs an AutoMonitor on itself first thing, and holds onto it until it returns. It causes to be called functions that need a monitor on the server (nsImapIncomingServer::RemoveConnection, in particular). Meanwhile, functions that have a monitor on the server call functions that require a monitor on the protocol object. nsImapIncomingServer::GetImapConnection has a monitor on itself, and then iterates over the protocol objects calling ::CanHandleUrl, which grabs a monitor on the protocol objects. Deadly embrace-a-gogo. Lesson: nsCAutoMonitor considered harmful, and we need to limit the scope of the AutoMonitors.

David :Bienvenu

Assignee

Comment 15

•

19 years ago

Attached patch trunk patch — Details — Splinter Review

Kai, this is a patch against the tip of the trunk that might help - please try it out and let me know...I've run with it for a bit and things seem to work ok, but I was never able to reproduce the problem. There are more changes I can make, but this one is easier and safer than the other changes I have in mind.

Kai Engert [:KaiE:]

Comment 16

•

19 years ago

> is it possible that backtrace is missing a thread? I don't see any two threads > blocked on the same monitor - with Neil's stacks, I do, so I'll try to work on > his case. Maybe I missed one. Note the stacktrace in the other bug 326314 I filed had the threads blocked on the same monitor. > Kai, this is a patch against the tip of the trunk that might help - please try > it out and let me know... I'm currently working with the 1.8 branch, but your patch applies there, too! I just rebuilt and will use it over the next days. Today I ran into the problem only once, so we'll see how soon I'm confident this patch helps.

Kai Engert [:KaiE:]

Comment 17

•

19 years ago

No, this patch does not fix it. At least it does not on the 1.8 branch. David, does it make sense that I try a trunk build with this patch applied?

David :Bienvenu

Assignee

Comment 18

•

19 years ago

either the trunk or the 1.8.1 branch - the 1.8 branch is missing the second patch above, so another option would be to try applying that patch as well to the 1.8 branch. Whichever way you go, if it still happens with all the patches applied, if you could attach backtraces, I can continue from there. Thx!

Kai Engert [:KaiE:]

Comment 19

•

19 years ago

I was wrong. I made a mistake, and when I posted my previous comment, my build did not yet contain your patch! Now I'm sure, I'm running a MOZILLA_1_8_BRANCH build, and it contains all the patches attached to this bug. I will report once I know more.

Kai Engert [:KaiE:]

Comment 20

•

19 years ago

Attached file New backtrace — Details

It just happened again, here are new stacks. This one looks more obvious, 3 threads blocked on same lock.

David :Bienvenu

Assignee

Comment 21

•

19 years ago

Yes, the 3 threads are trying to get the lock on the imap incoming server, and nsImapProtocol::CreateNewLineFromSocket() grabs a monitor on the protocol object before calling TellThreadToDie(). Then nsImapIncomingServer::GetImapConnection has a lock on the server and calls nsImapProtocol::CanHandleUrl, which tries to get a lock on the protocol object. I'm not sre if nsImapProtocol::CreateNewLineFromSocket needs to grab the monitor it does, for as long as it does. I'll try to figure that out...

Status: REOPENED → ASSIGNED

David :Bienvenu

Assignee

Comment 22

•

19 years ago

Attached patch remove monitor in CreateNewLIneFromSocket — Details — Splinter Review

Kai, here's a patch to try, on top of the previous ones. It should apply to the 1.8 branch as well. I've tested it briefly w/o a problem, but I don't hit the changed code very often. Let me know how it works, and attach a new stack trace if it hangs, thx!

Kai Engert [:KaiE:]

Comment 23

•

19 years ago

Yes, your additional patch applies fine.

Kai Engert [:KaiE:]

Comment 24

•

19 years ago

Attached patch backtrace 2006-02-14 — Details — Splinter Review

With your latest patch applied in addition, it ran stable for a while, until I switched my router off, and it hung again. These are new stacks. They are not as obvious to me as the previous ones, hopefully it makes sense to you. I will continue to run and update more stacks.

David :Bienvenu

Assignee

Comment 25

•

19 years ago

we're in Idle() on the same protocol object that's trying to do a pseudo interrupt message load. Idle() is blocked wating for data, and hold a monitor for the protocol object while it does so. I'm not sure how to fix that...I'll have to think about it. Idle() is blocked way down in the pipe code waiting for data that's not going to come...it should get killed because the socket is dead but that doesn't always happen.

Kai Engert [:KaiE:]

Comment 26

•

19 years ago

Attached file backtrace 2006-02-14 second — Details

David :Bienvenu

Assignee

Comment 27

•

19 years ago

Attached patch don't make IDLE grab the monitor — Details — Splinter Review

Kai, here's an additional change to try. It has real potential to re-introduce a hang on shutdown, however - https://bugzilla.mozilla.org/show_bug.cgi?id=246909

David :Bienvenu

Assignee

Comment 28

•

19 years ago

It definitely re-introduces bug 246909 so I'm going to need to figure out how not to regress that - but for now, if you don't have empty trash or cleanup inbox on exit set, you probably won't see bug 246909.

Kai Engert [:KaiE:]

Comment 29

•

19 years ago

> but for now, if you don't have empty trash or cleanup > inbox on exit set I use neither, and with all the patches applied, this build has been running very well until now!!!

David :Bienvenu

Assignee

Comment 30

•

19 years ago

Attached patch fix that doesn't regress exiting (obsolete) — Details — Splinter Review

Kai, can you try this cumulative fix to make sure things still work for you with it? It shouldn't regress bug 246909 because I made it so we won't try to go idle when shutting down, and I think it should still work for you. Scott, I had to add to an interface, but it's not one I would expect any extensions to touch with a 10 foot pole.

Attachment #212783 - Flags: superreview?(mscott)

neil@parkwaycc.co.uk

Comment 31

•

19 years ago

Comment on attachment 212783 [details] [diff] [review] fix that doesn't regress exiting >+ attribute boolean shuttingDown; I don't actually see anyone setting this... >+ PRBool shuttingDown; >+ (void) imapServer->GetShuttingDown(&shuttingDown); >+ if (!shuttingDown) >+ (void) imapServer->GetUseIdle(&m_useIdle); >+ else >+ m_useIdle = PR_FALSE; Would it be worth moving the shutting down check into GetUseIdle, thus avoiding the interface change?

David :Bienvenu

Assignee

Comment 32

•

19 years ago

Attached patch account manager part of change (obsolete) — Details — Splinter Review

Thx, Neil, forgot the base/src part of the patch. Since the account manager only has an nsIImapIncomingServer, the interface change is still required (though I could just have a setter...). I thought about putting it in GetUseIdle, but since I'd still need an interface change, this seemed about the same.

Attachment #212847 - Flags: superreview?(mscott)

Kai Engert [:KaiE:]

Comment 33

•

19 years ago

Some more feedback about the previous set of patches, prior to 2006-02-22. I am running two instances of SeaMonkey all the time, two profiles. One work work, one for private stuff. Both profiles have a different IMAP+SSL account on separate servers open all the time. The work profile did not hang since you added the latest patch on 2006-02-15. However, when my network trouble occurred, it could happen the application ended up in a state, where it still was responsive to user input, but loading data from the network did not work (after restoring the network). Even switching the application to offline and back online did not work. I was required to restart the application, but that worked fine, no hang. In my private profile I have "cleanup inbox on exit" enabled, and I ran into your predicted hang once, but no more often. I am now building a new optimized build of the latest 1.8 branch plus both patches from 2006-02-22 applied and will use that as my new primary application.

Kai Engert [:KaiE:]

Comment 34

•

19 years ago

Attached patch Patch: Combines the previous 2 — Details — Splinter Review

There was a typo in your patch, I fixed that and produced a combined patch.

Attachment #212783 - Attachment is obsolete: true

Attachment #212847 - Attachment is obsolete: true

Attachment #212865 - Flags: review?(mscott)

Attachment #212783 - Flags: superreview?(mscott)

Attachment #212847 - Flags: superreview?(mscott)

Kai Engert [:KaiE:]

Updated

•

19 years ago

Attachment #212865 - Flags: review?(mscott) → superreview?(mscott)

David :Bienvenu

Assignee

Comment 35

•

19 years ago

Thx, Kai, what was the typo?

Kai Engert [:KaiE:]

Comment 36

•

19 years ago

There was a ; after IMPL_GET_SET and gcc bailed out with a syntax error.

David :Bienvenu

Assignee

Comment 37

•

19 years ago

thx, I'm always doing that. Fixed in my tree. Re networking activity stopping, does that include non-imap networking activity, like http in messages, or rss feeds? If so, I bet one of the necko threads is blocked, like the socket transport thread or the dns lookup thread - backtraces might be interesting.

Kai Engert [:KaiE:]

Comment 38

•

19 years ago

I think the network hang was limited to imap, and other networking still working. But I might be wrong. If I see it again, I'll test.

David :Bienvenu

Assignee

Comment 39

•

19 years ago

if limited to imap, it would be most likely limited to a particular folder or folders, so that a folder you hadn't opened should be OK.

Kai Engert [:KaiE:]

Comment 40

•

19 years ago

Attached file backtrace 2006-02-24 — Details

Yes, I confirm. The stalled network is limited to some imap folders only. I just encountered this situation again while the application has still been running from yesterday. I restored the broken network connection after the night, the new mail check automatically started. Multiple folders had new messages. I could click some and got a message display, but the hourglass remained, not being able to click messages. In another folder I didn't get an updated message list display. I was able to open new folders and click, read and delete messages. Then I attached with the debugger, produced the attached stack listing, and detached from the application, keeping it running. To my surprise, after another while, all folders have now recovered and work correctly. The application is still running and I'm posting this bugzilla comment with it.

neil@parkwaycc.co.uk

Comment 41

•

19 years ago

(In reply to comment #32) >Thx, Neil, forgot the base/src part of the patch. Since the account manager >only has an nsIImapIncomingServer, the interface change is still required Ah, that explains things, thanks!

Scott MacGregor

Updated

•

19 years ago

Attachment #212865 - Flags: superreview?(mscott) → superreview+

Kai Engert [:KaiE:]

Comment 42

•

19 years ago

Attached file backtrace 2006-03-14 — Details

I'm running 1.8 branch from yesterday, with the latest patch from this bug (and my patch from bug 111384). I just run into a cleanupOnExit related hang, stack attached.

David :Bienvenu

Assignee

Comment 43

•

19 years ago

yep, I'm seeing that too, which is why I haven't got this stuff checked in yet :-( I think the url to cleanup the inbox is failing but the account manager code never gets notified that the url finished/failed.

Kai Engert [:KaiE:]

Comment 44

•

19 years ago

There seems to be a general wake-up problem somewhere. I just had a situation when I saw two new mail messages, which were junk, but not yet marked as such. I clicked them, but they didn't load, mail stalled, although the application was still responsive. Then I sent a mail message. This suddenly released the stalled situation, my junk messages got downloaded, marked as junked, and moved to the junk folder.

Dave Miller [:justdave]

Comment 45

•

19 years ago

Attached file Backtrace 2006-05-18 — Details

I've been getting a lot of hangs in Thunderbird lately, and the symptoms sound similar to this bug, except that my UI is totally unresponsive once it hangs (can't bring backgrounded windows to the front or anything when it happens). I'm on OS X. gdb backtraces attached.

Serge Gautherie (:sgautherie)

Updated

•

19 years ago

Blocks: 283823

David :Bienvenu

Assignee

Comment 46

•

19 years ago

fixed on trunk and branch (or at least, it should be much better). My worry now is about the hang on exit...

Status: ASSIGNED → RESOLVED

Closed: 20 years ago → 19 years ago

Keywords: fixed1.8.1

Resolution: --- → FIXED

Serge Gautherie (:sgautherie)

Updated

•

19 years ago

No longer blocks: 283823

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → MailNews Core

You need to log in before you can comment on or make changes to this bug.