Closed Bug 340265 Opened 18 years ago Closed 14 years ago

can't remove/delete large number of messages at the same time, timeout error (imap)

Categories

(Thunderbird :: Folder and Message Lists, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: sergiyz, Unassigned)

Details

(Whiteboard: closeme 2010-09-25 WFM?)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4

I'm running Courier IMAP on my server, and some users have complained that they can't remove messages from some of their folders.
Apparently, their Thunderbird (1.5.0.x) was set up to move deleted messages to the Trash folder, and they had more than 6,000 messages
they needed to remove.
They were trying to remove the messages by selecting them all and pressing the Delete button.
The client would work for a while, then they would get a timeout error, reconnect, and find all the messages still in
the folder.
So they would try again, and it would fail again.
If they later checked their Trash folder, they would find several copies of all those messages sitting there, in addition to the messages still
being in the original folder ;-/

Here's what happens behind the scenes:

1. User selects all messages in TB and clicks on delete.
2. TB sends a COPY [range] command to the server.
3. Server starts copying the messages to the Trash folder.
4. Since there are a lot of messages and some of them are really large, it takes the server a while to respond, so
   TB displays a "timeout" error message to the user.
5. Server *continues to process the request* though (imapd is still running) and copies all the messages to Trash.
6. TB never finds out the messages were copied and doesn't remove any messages from the original folder, so
   we end up with all the messages we wanted to remove still in the folder, with a copy in Trash.
7. User notices the messages are not removed and repeats the actions (goto [1]).

Why someone would have 6,000+ messages in a folder that they want to remove at once is a whole other story, but it happens often
enough to be annoying, unfortunately.
The workaround is to select only 500-1000 messages at a time, or to set up TB to remove messages right away or to mark them as deleted. But users generally set up TB just once, and if their preference is to move messages to Trash, that's their choice.

What would be ideal is if TB, instead of processing all selected messages at once, split them into groups when the number
of messages exceeded a certain limit, which could be configurable in the client itself.
I think groups of 100 messages would be reasonable, as that provides enough feedback to the user (the counter keeps updating) and guarantees
a timely server response.
The same goes for removal (expunge) and such.
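
A minimal sketch of the grouping idea above, in Python for illustration only (it is not Thunderbird code). The names `chunked`, `move_in_batches`, and the `move_batch` callback are hypothetical; `move_batch` stands in for whatever the client does to copy one group to Trash and mark it deleted, and is assumed to return True on success.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(uids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `size` UIDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(uids), size):
        yield list(uids[start:start + size])


def move_in_batches(
    uids: Sequence[int],
    move_batch: Callable[[List[int]], bool],  # hypothetical per-group operation
    size: int = 100,
) -> int:
    """Move messages group by group and stop at the first failed group.

    Returns the number of messages whose group was confirmed moved, so a
    retry can start from the first unconfirmed group instead of from scratch.
    """
    done = 0
    for group in chunked(uids, size):
        if not move_batch(group):
            break
        done += len(group)
    return done
```

With 250 selected messages and groups of 100, this issues three requests (100, 100, and 50 messages). A timeout on one group then leaves at most that group in doubt rather than the whole selection, and the running count can drive a progress display.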

Reproducible: Always

Steps to Reproduce:
1. Set up Thunderbird to put deleted messages in Trash.
2. Choose a folder with a large number of messages in it (3,000 plus, depending
on the IMAP server performance).
3. Select all messages.
4. Click on delete.
5. After getting a timeout error, check the folder again and see all messages still there.
6. Check the Trash folder after several minutes and find all messages there, in addition to being in the original folder.
Actual Results:  
messages are not removed from the original folder but copied into Trash

Expected Results:  
messages should be removed from the original folder and put into Trash folder

This behavior is easy to reproduce with Maildir-type folders on the IMAP server, specifically Courier IMAP (the most recent, 0.52.3).
I'd suggest that the user increase their timeout under tools | options | advanced | general, if they're going to do that, and the server is that slow.
(In reply to comment #1)
> I'd suggest that the user increase their timeout under tools | options |
> advanced | general, if they're going to do that, and the server is that slow.
> 

Increasing timeouts is a notorious programming bodge though.  What value should the user increase the timeout to?  The user relies on the client to get it right.  Is it not possible to make the move (or delete) operation atomic?
Sergiy, what release were you running in comment 0?  Has the problem gotten better or worse with automatic updates?  did comment 1 help?  

I experienced "partial" delete results twice in the last few days.  Once when our mail server was hammered and on its knees, and once today under normal load.  But it was only ~200 messages I was deleting from trash, and less than a dozen were not deleted.
Summary: can't remove large number of messages at the same time → can't remove/delete large number of messages at the same time
correction due to faulty memory

attempted 40 messages, 2 didn't delete  (mail server under severe load)

attempted ~200 messages - a dozen not deleted   (server under normal load)
 

> Sergiy, what release were you running in comment 0?

Whatever stable TB version we had back then (1.5.0.4?)

> Has the problem gotten
> better or worse with automatic updates?  did comment 1 help?  

The behavior didn't change, so nothing has changed in that regard.
One can increase the timeout, yes, but it's not a solution, since it's low by
default and would require every user to do it.
I think processing selected messages in groups of N each (where N could be
about 100 messages) on delete and any other mass update would improve the user
experience, especially if there's some kind of progress bar or status line
(say at the bottom of the screen) showing the number of messages
selected and the number of messages processed.
Right now, even if you set the timeout high enough, most users would simply
not wait long enough, thinking the app is hosed.

// serge
So, is this going to be addressed?
The Trash folder handling seems to be terribly redundant.
You're copying all messages into Trash and then don't
even expunge messages from the original folder, but flag them
as deleted.
So essentially now you have two copies of the message which user
wants to get rid of in the first place.
Then the user has to expunge his mailbox and empty the Trash.
I have also had this issue. Using Thunderbird/IceDove, I tried deleting mail in a folder that receives mailing-list traffic, and had to have my web host delete my TMP folder because I was unable to delete it from the shell. They stated there were 36,000 messages in the Trash/tmp folder from my attempts to delete mail, and that they have seen this issue in the past.
(In reply to comment #6)
> So, it this gonna be addressed ?
> The Trash folder handling seems to be terribly redundant.
> You're copying all messages into Trash and then don't
> even expunge messages from the original folder, but flag them
> as deleted.

You can change the delete model, of course (though that doesn't address all the issues).

see also http://kb.mozillazine.org/Compacting_folders
Summary: can't remove/delete large number of messages at the same time → can't remove/delete large number of messages at the same time, timeout error (imap)
Just FYI, holding down the shift key when deleting messages causes them to 
be marked deleted in the original folder and NOT copied to the trash folder.
But you knew that already, right?  :)
I was just hit with this bug using ver 2.0.0.6 on Ubuntu. I *may* have created it using an earlier version from home. I have an IMAP account. There were 1000 or so mails in the main inbox folder (a month's worth of saved mail) and I tried to highlight and delete about 100 or so in one move.

I don't know anything about traditional email file-handling (see http://kb.mozillazine.org/Compacting_folders), though it sounds, well, archaic.

So maybe this weird sort of out-of-sync duplicative method is just what's required; I defer to experience. But.

Because it both corrupts your data and freezes you out of your account, I think this ought to be labeled critical. I'll be happy to confirm it for you too, just let me know.

thanks
John
I've experienced this bug twice. Both times required my host to manually log into the mail server and clear out the Trash TMP folder, leaving the mail account inaccessible until they got around to doing it.
Dup. of bug 296453?
(In reply to comment #12)
> Dup. of bug 296453?
> 

I don't think so, although I suppose it is possible that there are actually 2 different problems that display very similar symptoms.  I've been meaning to post on this bug for a while and haven't had a chance to, so here are my observations.

Our site has seen bug 340265 many times from the server side. Since we run a cyrus-imap installation with over 80k users, we still experience it on a regular basis.  I suspect that the timeouts users are seeing are not actually timeouts (at our site anyway).  I would also add that TB appears to retry the delete on its own after the timeout and, in most cases, it is not the user retrying.

Our cyrus configuration creates hardlinks of messages that are copied.  Since a delete for most people is actually a copy to Trash, even large numbers of messages can be "deleted" quite rapidly.  The combination of this bug and the speed at which hardlinks can be created has caused some users to end up with 500k-1M+ messages in their trash (or some other folder they were moving to), most of the messages being duplicates of other messages.  Although this doesn't cause a quota issue for our users, it causes other problems and we generally have to disable their access to that mailbox while it is cleaned up.

In trying to recreate this bug in our test environment, I found it wasn't strictly connected to the number of messages that were being deleted.  Many users trigger it when trying to delete as few as 1000 messages, but I was able to delete 4000-5000+ messages easily.  The bug appears to occur when the OK response from the IMAP server is beyond a certain length.  For example, if you delete 5000 messages and their UIDs are sequential from 1:5000, then the process goes something like:

<30 uid copy 1:5000 "Trash"
>30 OK [COPYUID 1:5000] Completed

And TB accepts that and continues.  But if the UIDs are not sequential, and the range is something like [1:10,26,57,....], which can get very long, the server takes the range, processes it, and responds in a few seconds or less with:

>30 OK [COPYUID 1:10,26,57,.....] Completed

But when the range section is very long, TB appears to ignore or discard the response, waits for the timeout, then tries the whole thing over again.  I can post more detailed debug output if there is a need for it.  You can guess how much we hate this bug when a user selects 1000 random messages to delete before going to bed at 2am and we usually get paged 1 or 2 hours later.
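
To illustrate how quickly a non-sequential selection inflates the set string, here is a small Python sketch. The function name `to_sequence_set` is hypothetical and the sketch only collapses runs of consecutive UIDs into `first:last` pairs; it is not how any particular server builds its COPYUID response.

```python
from typing import Iterable, List


def to_sequence_set(uids: Iterable[int]) -> str:
    """Collapse UIDs into an IMAP-style set string.

    Runs of consecutive UIDs become "first:last"; isolated UIDs stay as-is.
    """
    ordered = sorted(set(uids))
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next UID is exactly one higher.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(str(ordered[i]))
        else:
            parts.append(f"{ordered[i]}:{ordered[j]}")
        i = j + 1
    return ",".join(parts)
```

For UIDs 1 through 5000 this gives `1:5000`, six characters. For 5000 messages where every other UID is selected (1, 3, 5, ... 9999), nothing collapses and the string runs to well over 8K characters, even though the number of messages is the same.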

Brian
Brian, more detailed output might be helpful, or even better, access to a test account where I can recreate the problem.

Yes, we do retry operations when they fail or time out, but we should only retry them once, though there are certainly reports of multiple retries.
Is it possible that the OK [COPYUID response might approach 8K in length, in the case where you were able to reproduce the problem? I do see an 8K buffer in the code that's used to create a line and it's possible that we might just spin when presented with a line that long. I'll try to test that here...
Status: UNCONFIRMED → NEW
Ever confirmed: true
We have code to grow the line buffer dynamically, but I have a suspicion that it's not always invoked. See http://mxr.mozilla.org/seamonkey/source/mailnews/base/util/nsMsgLineBuffer.cpp#370
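
For readers unfamiliar with the idea, here is a rough Python sketch of a line buffer that grows when a single line exceeds its capacity. It is only an illustration of the technique, not a translation of nsMsgLineBuffer.cpp; the class name `GrowableLineBuffer` and its methods are hypothetical. In a fixed-size buffer without the growth step, a line longer than the capacity could never be completed, which is the suspected failure mode.

```python
from typing import List


class GrowableLineBuffer:
    """Accumulates bytes and returns complete CRLF-terminated lines."""

    def __init__(self, capacity: int = 8192) -> None:
        self.capacity = capacity      # nominal size, doubled on demand
        self._data = bytearray()      # bytes received but not yet returned

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return any complete lines (without CRLF)."""
        self._data.extend(chunk)
        # Grow instead of stalling when the pending bytes exceed the capacity.
        while len(self._data) > self.capacity:
            self.capacity *= 2
        lines: List[bytes] = []
        while True:
            end = self._data.find(b"\r\n")
            if end < 0:
                break
            lines.append(bytes(self._data[:end]))
            del self._data[:end + 2]
        return lines
```

Feeding a long response in pieces returns nothing until the terminating CRLF arrives, at which point the whole line comes back in one piece and `capacity` has grown to fit it.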
Status: NEW → ASSIGNED
An 8K response seems large, but possible.  However it's been a while since I did the testing when I first reproduced it.  I'll post full logs as soon as I can.  I'll see if I can get you access to a test account also.  
I can also try shrinking the buffer size in my local build and see if I run into problems sooner...
Assignee: mscott → bienvenu
Status: ASSIGNED → NEW
I think I was able to reproduce this by reducing the buffer size from 8K to 200, and copying a bunch of messages. I need to catch it in the debugger, though.
It now appears as though I'm unable to reproduce the bug using TB 2.0.0.16.  I'm not sure exactly which version I was using when I originally reproduced it, but it was probably 2.0.0.9.  It also seems as though TB has changed the way it sends the COPY commands.  Sometimes it appears to break the complete list of messages up into separate copy commands, and other times it just sends a range even though the UIDs are not sequential.  In the latter case the server responds with a complete list of UIDs copied, not a range, and even with a large response (>8k) TB seems to handle it ok.  I've attached some example logs.


IMAP allows us to use ranges with "non-existent" uids, in other words, 1:10 is fine, even if 2-8 don't exist. But to do that, we need to know the set of existing uids, and at some point we extended the copy/move code to know about the set of existing uids. I thought that was pre 2.0, however.
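
A sketch of that idea in Python, purely for illustration (the function name `to_gap_tolerant_set` is hypothetical and this is not the actual copy/move code). Given the UIDs known to exist in the folder, two selected UIDs can share one range as long as no existing, unselected UID lies between them. Selected UIDs that are not in the existing set are ignored here, which is an assumption of the sketch.

```python
from typing import Iterable, List, Optional


def to_gap_tolerant_set(selected: Iterable[int], existing: Iterable[int]) -> str:
    """Build a set string whose ranges may span UIDs that do not exist."""
    chosen = set(selected)
    parts: List[str] = []
    run_start: Optional[int] = None
    run_end = 0

    def flush() -> None:
        # Emit the current run as "a" or "a:b".
        if run_start is not None:
            parts.append(str(run_start) if run_start == run_end
                         else f"{run_start}:{run_end}")

    for uid in sorted(set(existing)):
        if uid in chosen:
            if run_start is None:
                run_start = uid
            run_end = uid
        else:
            # An existing but unselected UID ends the current run.
            flush()
            run_start = None
    flush()
    return ",".join(parts)
```

If the folder holds only UIDs 1, 9, and 10 and all three are selected, the result is `1:10`. If the folder holds 1, 2, 5, and 9 and only 1, 2, and 9 are selected, UID 5 must be excluded, so the result is `1:2,9`.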

If the problem was that TB was generating a too long copy command, and the server errored out, and we retried over and over again, that would explain this. But it sounded earlier like you were pretty sure that the command was succeeding, but the server was replying with a very long copyuid string.
When I was able to reproduce before, the copy command was definitely successful on the server side but TB continued to retry.  It looks like the ftp site only has 2.0.0.14 & 16 available, is there any place to download older versions for testing or would I have to check out a code branch from cvs?  
Yes, a log from 2.0.0.9 would be very useful:

ftp://ftp.mozilla.org/pub/thunderbird/releases/2.0.0.9/
IMO Thunderbird should do something like gray out the deleted messages and start deleting/moving them as a background activity.
In 3.0, we delete them immediately as an offline operation and play back the offline operation in the background.
Sergiy and others, in version 3.0 or 3.1, is the problem gone for you, or do you still see it?  Please comment.
Assignee: bienvenu → nobody
Component: General → Folder and Message Lists
QA Contact: general → folders-message-lists
Whiteboard: closeme 2010-09-25 WFM?
RESOLVED INCOMPLETE due to lack of response to previous question. If you feel this change was made in error, please respond to this bug with your reasons why.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE
I don't know if I'm experiencing the same bug as others on this ticket.
Thunderbird 52.6.0 (64-bit) on CentOS 7.4
IMAP mail account running on server with Dovecot

1.  Start with folder containing 5,000 messages
2.  Select most messages (all but 8)
3.  Push the DELETE button

Expected response:
  progress bar in bottom bar
  decrease in message count in bottom bar
  no alerts

Observed response:
  No change in bottom bar
  Dialog box:  "A script on this page may be busy, or it may have
stopped responding.  You can stop the script now, or you can
continue to see if the script will complete."
Script:  chrome://messenger/content/folderPane.js:2113
[ ] Don't ask me again
(Stop script)
(Continue)

The situation is repeatable.  When I press (Continue) the dialog
does NOT disappear, but instead sits there.  After a delay, the
dialog will disappear and immediately re-appear.  Repeat about
10 times.

When I press (Stop script) Thunderbird is in an unstable state;
I have to quit and restart in order to be able to work with
the folder pane again.