Last Comment Bug 403603 - Over One minute IMAP online copy or move retried forever and mail FLOODING!
: Over One minute IMAP online copy or move retried forever and mail FLOODING!
Status: RESOLVED FIXED
[ready to land]
: qawanted
Product: MailNews Core
Classification: Components
Component: Networking: IMAP (show other bugs)
: Trunk
: x86 All
: -- normal with 1 vote (vote)
: ---
Assigned To: David :Bienvenu
:
Mentors:
Depends on: 410692
Blocks:
  Show dependency treegraph
 
Reported: 2007-11-13 03:30 PST by Reinin Oyama
Modified: 2010-05-14 13:40 PDT (History)
9 users (show)
standard8: blocking‑thunderbird3-
mozilla: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
rc1+
rc1-fixed


Attachments
OK! (2.33 KB, patch)
2007-12-30 18:20 PST, Reinin Oyama
mozilla: review-
Details | Diff | Review
IMAP4 Server log showing the bug in effect (4.04 KB, text/plain)
2008-04-09 02:34 PDT, Paul Smith
no flags Details
wip (33.89 KB, patch)
2010-04-26 15:41 PDT, David :Bienvenu
no flags Details | Diff | Review
proposed fix (34.04 KB, patch)
2010-04-27 08:58 PDT, David :Bienvenu
standard8: review+
standard8: superreview+
Details | Diff | Review

Description Reinin Oyama 2007-11-13 03:30:56 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ja; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Build Identifier: HEAD

If IMAP online Copy or Move timeouted, thunderbird retries it forever.
And mail FLOODING!
And IMAP server is saturated.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Actual Results:  
I made a patch!

Index: mailnews/imap/src/nsImapProtocol.cpp
===================================================================
RCS file: /cvsroot/mozilla/mailnews/imap/src/nsImapProtocol.cpp,v
retrieving revision 1.671
diff -u -8 -p -r1.671 nsImapProtocol.cpp
--- mailnews/imap/src/nsImapProtocol.cpp	9 Nov 2007 15:07:23 -0000	1.671
+++ mailnews/imap/src/nsImapProtocol.cpp	13 Nov 2007 11:04:01 -0000
@@ -311,17 +311,17 @@ static PRBool gFetchByChunks = PR_TRUE;
 static PRInt32 gMaxChunkSize = 40960;
 static PRBool gInitialized = PR_FALSE;
 static PRBool gHideUnusedNamespaces = PR_TRUE;
 static PRBool gHideOtherUsersFromList = PR_FALSE;
 static PRBool gUseEnvelopeCmd = PR_FALSE;
 static PRBool gUseLiteralPlus = PR_TRUE;
 static PRBool gExpungeAfterDelete = PR_FALSE;
 static PRBool gCheckDeletedBeforeExpunge = PR_FALSE; //bug 235004
-static PRInt32 gResponseTimeout = 60;
+static PRInt32 gResponseTimeout = 8400;
 static nsCStringArray gCustomDBHeaders;
 
 nsresult nsImapProtocol::GlobalInitialization()
 {
     gInitialized = PR_TRUE;
     nsresult rv;
     nsCOMPtr<nsIPrefBranch> prefBranch(do_GetService(NS_PREFSERVICE_CONTRACTID, &rv));
     NS_ENSURE_SUCCESS(rv, rv);
@@ -1596,18 +1596,23 @@ PRBool nsImapProtocol::RetryUrl()
   nsCOMPtr <nsIImapMockChannel> saveMockChannel;
   m_runningUrl->GetMockChannel(getter_AddRefs(saveMockChannel));
   ReleaseUrlState(PR_TRUE);
   nsresult rv;
   nsCOMPtr<nsIImapIncomingServer> imapServer  = do_QueryReferent(m_server, &rv);
   kungFuGripImapUrl->SetMockChannel(saveMockChannel);
   if (NS_SUCCEEDED(rv))
     imapServer->RemoveConnection(this);
-  if (m_imapServerSink)
-    m_imapServerSink->RetryUrl(kungFuGripImapUrl);
+  if (m_imapServerSink){
+    nsImapAction imapaction;
+    kungFuGripImapUrl->GetImapAction(&imapaction);
+    if(imapaction != nsIImapUrl::nsImapOnlineCopy && imapaction != nsIImapUrl::nsImapOnlineMove){
+        m_imapServerSink->RetryUrl(kungFuGripImapUrl);
+    }
+  }
   return (m_imapServerSink != nsnull); // we're running a url (the same url)
 }
 
 // ignoreBadAndNOResponses --> don't throw a error dialog if this command results in a NO or Bad response
 // from the server..in other words the command is "exploratory" and we don't really care if it succeeds or fails.
 void nsImapProtocol::ParseIMAPandCheckForNewMail(const char* commandString, PRBool aIgnoreBadAndNOResponses)
 {
     if (commandString)
Comment 1 Magnus Melin 2007-11-13 09:12:50 PST
Please add the patch as an attachment (use the "Add an attachment" link). Then ask review by setting the r? flag of the patch to the email address of a suitable reviewer like David.
Comment 2 Reinin Oyama 2007-12-30 18:20:14 PST
Created attachment 294922 [details] [diff] [review]
OK!
Comment 3 David :Bienvenu 2007-12-31 01:07:18 PST
Comment on attachment 294922 [details] [diff] [review]
OK!

We don't want to not retry online move/copies at all - we simply want to retry them once, which is what I thought the code did...
Comment 4 Dario Corti 2008-01-06 10:15:58 PST
I confirm the issue, if tried to copy a 180mb folder in another and I had to kill TB when the new folder was over 1gb..... (around 750 messages copied in 4k+ messages, over and over).
Comment 5 Paul Smith 2008-04-09 02:34:36 PDT
Created attachment 314551 [details]
IMAP4 Server log showing the bug in effect
Comment 6 Paul Smith 2008-04-09 02:43:30 PDT
We get this all the time as well. It's totally reproducible for us.

I've uploaded a server log showing the bug take effect.

It's only an extract as the same thing repeated itself over and over. It did it 20 times before I killed the server.

Description of log:
Session 0x28 started a copy at 9:48:54
At 9:49.55, Thunderbird started a new session (0x2a), got the message UIDs/flags, and started another copy at 9:49:57

At 9:50:43, session 0x28's copy finished

At 9:51:02, Thunderbird started a new session (0x2b), got the message details and started another copy at 9:51:03

At 9:52:07, Thunderbird started a new session (0x2d), got the message details and started another copy at 9:52:09

At 9:52:21, session 0x2a's copy finished


My thoughts:
- retrying a command like 'copy' is potentially not a good idea anyway. Even if you just retry it once, you end up with two copies of the messages if they both succeed. Commands like store/fetch etc are OK to retry as they have no side effects.

- retrying the command after it has received a successful response back from a previous attempt of the command is a bad idea.

- maybe the timeout should be dynamic - it will probably take longer to copy 1000 messages than it takes to copy 1 message. 

- maybe Thunderbird should break down copies into smaller 'chunks' as an option

Comment 7 Aaron Wormus 2008-08-21 05:31:05 PDT
This bug is effecting me as well - each time I try to copy several email messages from one folder to another the emails are duplicated MANY times.
Comment 8 Wayne Mery (:wsmwk, NI for questions) 2009-01-18 08:49:01 PST
Reinin, do you also see this using trunk?  And if so, is there a way forward?

(bug 391853, bug 410692 are possible duplicates)
Comment 9 Michael Baffoni 2009-02-08 11:37:08 PST
I agree, bug 410692 appears to be a dupe of this one, please dupe it.

Please note my comments in 410692; with current TB3 capability of storing commands on a per folder basis, we are getting into a never-ending loop with the move/copy commands that timeout because they are never removed from the execute queue.
Comment 10 Wayne Mery (:wsmwk, NI for questions) 2009-03-18 14:33:24 PDT
bienvenu has targeted bug 410692 for 3.0b3 so not duping. Let's see if that fixes this one
Comment 11 Mark Banner (:standard8) 2009-05-10 11:57:24 PDT
(In reply to comment #10)
> bienvenu has targeted bug 410692 for 3.0b3 so not duping. Let's see if that
> fixes this one

If anyone can still reproduce on latest trunk (ftp://ftp.mozilla.org/pub/thunderbird/nightly/) please let us know asap.

Marking as not blocking as I believe bug 410692 and others I've seen have covered this already. Please re-request blocking if that's not the case (and provide steps to repeat).
Comment 12 Michael Baffoni 2009-05-10 13:25:08 PDT
Well, it is different now.  My understanding of the fix is that it increased the timeout greatly (even dynamically ?) so that copies of emails (etc.) should no longer hit the timeout, and therefore shouldn't hit failure until the operation actually finished.

The timeout does seem longer, and now one copy does not seem to create many times the number of messages in a single copy/move operation.  However, I'm still seeing the long moves as a failure, so it is like it is still hitting the 60 second timeout, but it isn't immediately retrying the failed command.  With storage of the command in the IMAP folder, if I leave the folder, and come back to it, it will retry the operation.

So although now it is not _flooding_, I'm still seeing multiple copies created from a single move/copy operation (with revisits of the source folder).

Is this behavior sufficiently different to be it's own bug and "fix" this one, or do you want to keep this here?
Comment 13 Pehr Söderman 2009-09-21 14:53:03 PDT
This bug is not dead yet. I just ran into it with 3.0b3. The worst part is that the tool that could previous clean up after these errors (Remove Duplicate Messages (alternative)) seems to trigger the bug also if the destination folder is in IMAP and the original number of duplicates large enough.

I am not sure what is the best way to handle this case, but my feeling is that to offer a list of pending IMAP commands (like for example mail.app does) and allow the user to cancel commands or let them run instead would be better. The current behavior is... Not acceptable. Reposting potentially dangerous commands are not a good idea.

Now, time to remove 1860000 messages from my mail server :(
Comment 14 Paul Smith 2009-09-22 01:02:59 PDT
My view is that Thunderbird should simply not retry commands (especially those such as COPY, SEARCH, APPEND). By all means, tell the user that the action is taking a long time, but an automatic silent retry is inappropriate.

The only reason a retry WOULD be appropriate is if the IMAP protocol allowed servers to ignore commands - which it doesn't. So, if Thunderbird sends a command, then either the server will act on it, report an error, or there's a bug with the server. In the first case, a retry is harmful, in the second case, Thunderbird will (or at least should) tell the user the error, and in the third case, it should just tell the user that nothing is happening, and let the user decide what to do (which may involve them contacting the server vendor for help).

I'm not sure what the rationale behind the automatic retry is, but I can't think of a good enough reason for it.
Comment 15 Dan Mosedale (:dmose) 2010-02-12 16:22:27 PST
It's not clear to me from the comments that this is still happening on 3.0.x (though it might well be!).  If people here are still seeing it, please comment here in detail and renominate.
Comment 16 Pehr Söderman 2010-02-12 23:50:17 PST
Just tested. This bug is certainly alive and ready to destroy your IMAP folders in 3.0.1.
Comment 17 David :Bienvenu 2010-03-08 09:10:08 PST
Pehr, what steps are you using to reproduce the problem? Move/copy of a large message between imap folders on the same server? Or a lot of large messages? Are you getting a timeout error, or a server error?
Comment 18 Pehr Söderman 2010-03-08 09:58:32 PST
Running 3.0.3 now, against hMailServer 5.2.1-B361.

To reproduce the bug follow the following steps: 

1: Create an IMAP folder with too many messages to copy before the timeout (10 000 is more than enough at my system).
2: Create a new IMAP folder, using the same IMAP account.
3: Choose all the messages, move them to the new folder.
4: Thunderbird will be sluggish but work.
5: Wait...

Result:
In the background thunderbird will post new copy operations regularly, creating multiple copies of the original 10 0000 messages in the destination folder, but never removing them from the original folder.
Comment 19 Pehr Söderman 2010-03-09 07:33:01 PST
If I read the patch in Bug 409259 (which supposedly fixes the related bugs) correctly it increases the timeout to #messages/40 seconds for large amounts of messages (over 2400). It's not impossible that my IMAP server can't mannage an average 40/second as many of the mails in my archive are rather large. And once the bug triggers the first time the load on the server makes it certain the bug will trigger again.
Comment 20 Chris 2010-04-23 10:33:16 PDT
This bug still exists in 3.0.4

When moving approx. 1000 messages from one folder to another, the copy operations are being continuously retried and the destination folder is filled with duplicates. Currently up to 19000.

Messages are of a variety of sizes 2k-1M, depending on attachment size.

This is a common operation for many users, as they may archive their incoming messages (as I do) to monthly subfolders.
Comment 21 David :Bienvenu 2010-04-23 20:44:36 PDT
I've hacked our fake imap server to simulate timeouts, so I can probably use that to work on the imap code not to do retries in this case, but I also have to worry about the offline imap playback code, to make the sure the offline playback operation is removed.

Unfortunately, our fakeserver code doesn't allow multiple connections or cleanup particular well after itself, which makes it a bit tricky to use for testing this.
Comment 22 David :Bienvenu 2010-04-26 15:41:05 PDT
Created attachment 441633 [details] [diff] [review]
wip

This WIP does several things:

Makes nsImapProtocol::m_connectionStatus an nsresult, and propagates it back to the url listener OnStopRunningUrl call. This allows us to detect timeouts.

Makes imap not internally retry move/copies.

Propagates NO/BAD server responses back to OnStopRunningUrl via a new error code, NS_MSG_ERROR_IMAP_COMMAND_FAILED. In general, retrying commands that failed on the server is not useful (there are exceptions, e.g., temporary bad state on the server, but we can't know something is temporary.)

If an offline move/copy fails because of a timeout or a command error, remove the offline operation, so that we won't keep retrying it.

Extends the fake imap server to optionally timeout imap copy commands.

Adds an xpcshell imap move/copy test that times out.

Makes pseudo-offline playback requests hold a reference to the msg window. This allows the new imap move/copy timeout xpcshell test to pass in a msg window w/o having it get garbage collected out from under us (xpconnect's wrapper object was not getting add-reffed, so it got gc'd before the timeout fired).

I need to test this some more, and actually make the test fail. This is a little tricky because fake server doesn't allow multiple connections.
Comment 23 David :Bienvenu 2010-04-27 08:58:16 PDT
Created attachment 441819 [details] [diff] [review]
proposed fix

See prev comment for description of what this patch does. Most of the patch is simply making connectionStatus an nsresult, and propagating it around.
Comment 24 David :Bienvenu 2010-05-14 13:40:01 PDT
fixed for trunk and 3.1 on 1.9.2 branch

Note You need to log in before you can comment on or make changes to this bug.