Closed Bug 403603 Opened 15 years ago Closed 12 years ago

Over One minute IMAP online copy or move retried forever and mail FLOODING!


(MailNews Core :: Networking: IMAP, defect)

Not set


(blocking-thunderbird3.1 rc1+, thunderbird3.1 rc1-fixed)

Tracking Status
blocking-thunderbird3.1 --- rc1+
thunderbird3.1 --- rc1-fixed


(Reporter: lenin, Assigned: Bienvenu)



(Keywords: qawanted, Whiteboard: [ready to land])


(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ja; rv: Gecko/20071025 Firefox/
Build Identifier: HEAD

If IMAP online Copy or Move timeouted, thunderbird retries it forever.
And mail FLOODING!
And IMAP server is saturated.

Reproducible: Always

Steps to Reproduce:
Actual Results:  
I made a patch!

Index: mailnews/imap/src/nsImapProtocol.cpp
RCS file: /cvsroot/mozilla/mailnews/imap/src/nsImapProtocol.cpp,v
retrieving revision 1.671
diff -u -8 -p -r1.671 nsImapProtocol.cpp
--- mailnews/imap/src/nsImapProtocol.cpp	9 Nov 2007 15:07:23 -0000	1.671
+++ mailnews/imap/src/nsImapProtocol.cpp	13 Nov 2007 11:04:01 -0000
@@ -311,17 +311,17 @@ static PRBool gFetchByChunks = PR_TRUE;
 static PRInt32 gMaxChunkSize = 40960;
 static PRBool gInitialized = PR_FALSE;
 static PRBool gHideUnusedNamespaces = PR_TRUE;
 static PRBool gHideOtherUsersFromList = PR_FALSE;
 static PRBool gUseEnvelopeCmd = PR_FALSE;
 static PRBool gUseLiteralPlus = PR_TRUE;
 static PRBool gExpungeAfterDelete = PR_FALSE;
 static PRBool gCheckDeletedBeforeExpunge = PR_FALSE; //bug 235004
-static PRInt32 gResponseTimeout = 60;
+static PRInt32 gResponseTimeout = 8400;
 static nsCStringArray gCustomDBHeaders;
 nsresult nsImapProtocol::GlobalInitialization()
     gInitialized = PR_TRUE;
     nsresult rv;
     nsCOMPtr<nsIPrefBranch> prefBranch(do_GetService(NS_PREFSERVICE_CONTRACTID, &rv));
     NS_ENSURE_SUCCESS(rv, rv);
@@ -1596,18 +1596,23 @@ PRBool nsImapProtocol::RetryUrl()
   nsCOMPtr <nsIImapMockChannel> saveMockChannel;
   nsresult rv;
   nsCOMPtr<nsIImapIncomingServer> imapServer  = do_QueryReferent(m_server, &rv);
   if (NS_SUCCEEDED(rv))
-  if (m_imapServerSink)
-    m_imapServerSink->RetryUrl(kungFuGripImapUrl);
+  if (m_imapServerSink){
+    nsImapAction imapaction;
+    kungFuGripImapUrl->GetImapAction(&imapaction);
+    if(imapaction != nsIImapUrl::nsImapOnlineCopy && imapaction != nsIImapUrl::nsImapOnlineMove){
+        m_imapServerSink->RetryUrl(kungFuGripImapUrl);
+    }
+  }
   return (m_imapServerSink != nsnull); // we're running a url (the same url)
 // ignoreBadAndNOResponses --> don't throw a error dialog if this command results in a NO or Bad response
 // from the other words the command is "exploratory" and we don't really care if it succeeds or fails.
 void nsImapProtocol::ParseIMAPandCheckForNewMail(const char* commandString, PRBool aIgnoreBadAndNOResponses)
     if (commandString)
Version: unspecified → Trunk
Please add the patch as an attachment (use the "Add an attachment" link). Then ask review by setting the r? flag of the patch to the email address of a suitable reviewer like David.
Assignee: nobody → bienvenu
Component: General → Networking: IMAP
Product: Thunderbird → Core
QA Contact: general → networking.imap
Attached patch OK!Splinter Review
Comment on attachment 294922 [details] [diff] [review]

We don't want to not retry online move/copies at all - we simply want to retry them once, which is what I thought the code did...
Attachment #294922 - Flags: review-
I confirm the issue, if tried to copy a 180mb folder in another and I had to kill TB when the new folder was over 1gb..... (around 750 messages copied in 4k+ messages, over and over).
We get this all the time as well. It's totally reproducible for us.

I've uploaded a server log showing the bug take effect.

It's only an extract as the same thing repeated itself over and over. It did it 20 times before I killed the server.

Description of log:
Session 0x28 started a copy at 9:48:54
At 9:49.55, Thunderbird started a new session (0x2a), got the message UIDs/flags, and started another copy at 9:49:57

At 9:50:43, session 0x28's copy finished

At 9:51:02, Thunderbird started a new session (0x2b), got the message details and started another copy at 9:51:03

At 9:52:07, Thunderbird started a new session (0x2d), got the message details and started another copy at 9:52:09

At 9:52:21, session 0x2a's copy finished

My thoughts:
- retrying a command like 'copy' is potentially not a good idea anyway. Even if you just retry it once, you end up with two copies of the messages if they both succeed. Commands like store/fetch etc are OK to retry as they have no side effects.

- retrying the command after it has received a successful response back from a previous attempt of the command is a bad idea.

- maybe the timeout should be dynamic - it will probably take longer to copy 1000 messages than it takes to copy 1 message. 

- maybe Thunderbird should break down copies into smaller 'chunks' as an option

This bug is effecting me as well - each time I try to copy several email messages from one folder to another the emails are duplicated MANY times.
Reinin, do you also see this using trunk?  And if so, is there a way forward?

(bug 391853, bug 410692 are possible duplicates)
Product: Core → MailNews Core
I agree, bug 410692 appears to be a dupe of this one, please dupe it.

Please note my comments in 410692; with current TB3 capability of storing commands on a per folder basis, we are getting into a never-ending loop with the move/copy commands that timeout because they are never removed from the execute queue.
Ever confirmed: true
Flags: blocking-thunderbird3?
bienvenu has targeted bug 410692 for 3.0b3 so not duping. Let's see if that fixes this one
Depends on: 410692
(In reply to comment #10)
> bienvenu has targeted bug 410692 for 3.0b3 so not duping. Let's see if that
> fixes this one

If anyone can still reproduce on latest trunk ( please let us know asap.

Marking as not blocking as I believe bug 410692 and others I've seen have covered this already. Please re-request blocking if that's not the case (and provide steps to repeat).
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Keywords: qawanted
Well, it is different now.  My understanding of the fix is that it increased the timeout greatly (even dynamically ?) so that copies of emails (etc.) should no longer hit the timeout, and therefore shouldn't hit failure until the operation actually finished.

The timeout does seem longer, and now one copy does not seem to create many times the number of messages in a single copy/move operation.  However, I'm still seeing the long moves as a failure, so it is like it is still hitting the 60 second timeout, but it isn't immediately retrying the failed command.  With storage of the command in the IMAP folder, if I leave the folder, and come back to it, it will retry the operation.

So although now it is not _flooding_, I'm still seeing multiple copies created from a single move/copy operation (with revisits of the source folder).

Is this behavior sufficiently different to be it's own bug and "fix" this one, or do you want to keep this here?
This bug is not dead yet. I just ran into it with 3.0b3. The worst part is that the tool that could previous clean up after these errors (Remove Duplicate Messages (alternative)) seems to trigger the bug also if the destination folder is in IMAP and the original number of duplicates large enough.

I am not sure what is the best way to handle this case, but my feeling is that to offer a list of pending IMAP commands (like for example does) and allow the user to cancel commands or let them run instead would be better. The current behavior is... Not acceptable. Reposting potentially dangerous commands are not a good idea.

Now, time to remove 1860000 messages from my mail server :(
My view is that Thunderbird should simply not retry commands (especially those such as COPY, SEARCH, APPEND). By all means, tell the user that the action is taking a long time, but an automatic silent retry is inappropriate.

The only reason a retry WOULD be appropriate is if the IMAP protocol allowed servers to ignore commands - which it doesn't. So, if Thunderbird sends a command, then either the server will act on it, report an error, or there's a bug with the server. In the first case, a retry is harmful, in the second case, Thunderbird will (or at least should) tell the user the error, and in the third case, it should just tell the user that nothing is happening, and let the user decide what to do (which may involve them contacting the server vendor for help).

I'm not sure what the rationale behind the automatic retry is, but I can't think of a good enough reason for it.
Flags: blocking-thunderbird3.1?
It's not clear to me from the comments that this is still happening on 3.0.x (though it might well be!).  If people here are still seeing it, please comment here in detail and renominate.
blocking-thunderbird3.1: --- → -
Flags: blocking-thunderbird3.1?
Just tested. This bug is certainly alive and ready to destroy your IMAP folders in 3.0.1.
Flags: blocking-thunderbird3.1?
blocking-thunderbird3.1: - → rc1+
Flags: blocking-thunderbird3.1?
Pehr, what steps are you using to reproduce the problem? Move/copy of a large message between imap folders on the same server? Or a lot of large messages? Are you getting a timeout error, or a server error?
Running 3.0.3 now, against hMailServer 5.2.1-B361.

To reproduce the bug follow the following steps: 

1: Create an IMAP folder with too many messages to copy before the timeout (10 000 is more than enough at my system).
2: Create a new IMAP folder, using the same IMAP account.
3: Choose all the messages, move them to the new folder.
4: Thunderbird will be sluggish but work.
5: Wait...

In the background thunderbird will post new copy operations regularly, creating multiple copies of the original 10 0000 messages in the destination folder, but never removing them from the original folder.
If I read the patch in Bug 409259 (which supposedly fixes the related bugs) correctly it increases the timeout to #messages/40 seconds for large amounts of messages (over 2400). It's not impossible that my IMAP server can't mannage an average 40/second as many of the mails in my archive are rather large. And once the bug triggers the first time the load on the server makes it certain the bug will trigger again.
This bug still exists in 3.0.4

When moving approx. 1000 messages from one folder to another, the copy operations are being continuously retried and the destination folder is filled with duplicates. Currently up to 19000.

Messages are of a variety of sizes 2k-1M, depending on attachment size.

This is a common operation for many users, as they may archive their incoming messages (as I do) to monthly subfolders.
I've hacked our fake imap server to simulate timeouts, so I can probably use that to work on the imap code not to do retries in this case, but I also have to worry about the offline imap playback code, to make the sure the offline playback operation is removed.

Unfortunately, our fakeserver code doesn't allow multiple connections or cleanup particular well after itself, which makes it a bit tricky to use for testing this.
Attached patch wip (obsolete) — Splinter Review
This WIP does several things:

Makes nsImapProtocol::m_connectionStatus an nsresult, and propagates it back to the url listener OnStopRunningUrl call. This allows us to detect timeouts.

Makes imap not internally retry move/copies.

Propagates NO/BAD server responses back to OnStopRunningUrl via a new error code, NS_MSG_ERROR_IMAP_COMMAND_FAILED. In general, retrying commands that failed on the server is not useful (there are exceptions, e.g., temporary bad state on the server, but we can't know something is temporary.)

If an offline move/copy fails because of a timeout or a command error, remove the offline operation, so that we won't keep retrying it.

Extends the fake imap server to optionally timeout imap copy commands.

Adds an xpcshell imap move/copy test that times out.

Makes pseudo-offline playback requests hold a reference to the msg window. This allows the new imap move/copy timeout xpcshell test to pass in a msg window w/o having it get garbage collected out from under us (xpconnect's wrapper object was not getting add-reffed, so it got gc'd before the timeout fired).

I need to test this some more, and actually make the test fail. This is a little tricky because fake server doesn't allow multiple connections.
Whiteboard: [patch in progress]
Attached patch proposed fixSplinter Review
See prev comment for description of what this patch does. Most of the patch is simply making connectionStatus an nsresult, and propagating it around.
Attachment #441633 - Attachment is obsolete: true
Attachment #441819 - Flags: superreview?(bugzilla)
Attachment #441819 - Flags: review?(bugzilla)
Whiteboard: [patch in progress] → [has patch for review]
Whiteboard: [has patch for review] → [has patch for review][eta for review Mon]
Whiteboard: [has patch for review][eta for review Mon] → [has patch; needs review Standard8]
Attachment #441819 - Flags: superreview?(bugzilla)
Attachment #441819 - Flags: superreview+
Attachment #441819 - Flags: review?(bugzilla)
Attachment #441819 - Flags: review+
Whiteboard: [has patch; needs review Standard8] → [ready to land]
fixed for trunk and 3.1 on 1.9.2 branch
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.