Open Bug 1096096 Opened 6 years ago Updated 3 years ago

Messages Not Moved to Folders (Intermittent, v 31 and later). Perhaps? ... copy fails and OnCopyCompleted never gets called to clear a copy request, folder thinks that a copy is in progress, and refuses to do any additional copies.

Categories

(Thunderbird :: Filters, defect)

31 Branch
x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: ceplaw, Unassigned, NeedInfo)

Details

(Whiteboard: [dupeme?])

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141105223254

Steps to reproduce:

This is an intermittent issue that normally occurs when TBird has been running for more than 30 minutes or so. I first noticed it in version 31.0.0.

Note: This has occurred on machines that have only a single e-mail account (POP) and multiple e-mail accounts (both POP and IMAP), so I do not believe it is account-management related.


Actual results:

On occasion, TBird loses the ability to move messages between folders. This occurs as any of:
Cannot manually move a message
Junk messages marked as junk in inbox, but not moved to junk folder
When sending a message, message sends, but no copy placed in "sent" folder (and error message results)

Restarting TBird regains the ability to move messages. However, it does not move messages that "should have" been moved already (e.g., junk mail from the inbox to the junk mail folder).


Expected results:

Umm, moving the messages as occurred silently prior to version 31.0.0 is what I "expected."
Does File | Offline | Work offline, then going back online also restore functionality?
Flags: needinfo?(ceplaw)
I will have to wait until it recurs and try that; it recurred this morning, but I did the shut-down cycle before I saw Mr Mery's query.

I should also note:

* that there are several questions in the Support fora suggesting that others are having this problem, and in versions prior to 31.

* that there's such a wide variety of extensions installed (or not) on the machines with which I've experienced this that interaction with extensions seems an unlikely source of the problem; the machines don't even use the same appearance extensions.
Flags: needinfo?(ceplaw)
Further update:

Just tried "work offline" cycling, does not help; still required a complete shutdown of Tbird and restart.

Side thought: As Skype also updated itself at about the same time as Tbird v.31 came out, might they not be playing nice together, particularly if there are SMS messages being exchanged on Skype?
Exactly the same 'Actual Results' here using TB 31.2.0. Filter actions are also affected.
(No problem with sent messages yet: I sent only a few of them, and they wouldn't be affected anyway while TB was still in a 'healthy' state.)

AFAICT troubles started when I added a new message *filter*: I wanted to move some kind of newly received messages to a special folder ("Local Folders / Inbox" -> "Local Folders / Suspicious", yes, some spam related obnoxious ****;-). Since almost all of these messages are automatically classified as "Junk" they would be moved to "Local Folders / Junk", but my filter moves them to a different location. Maybe this confuses TB for the rest of the session ('sick' state;-). Applying the filter to new messages before or after junk classification seems to make no difference. Running the filter manually works while TB is in a 'healthy' state. I've now disabled this filter - but I have to wait for some new input to give reliable feedback. I'll be back.
Version 31.3.0 has now INCREASED the prevalence and decreased the amount of time after restart in which this is observed. It is still not a critical error, but it is becoming more annoying.

Please change status of this bug from "unconfirmed" to "confirmed," or state what you need to make it a "confirmed" bug. Since it doesn't actually crash Tbird, using the Crash Reporter is irrelevant.
(This is a follow-up on my own comment #4)
>  [...] I've now disabled this filter
> - but I have to wait for some new input to give reliable feedback.

Here it is: I received a few messages, that would have been caught by the filter I've disabled. TB behaves completely normal now.

Quick recap: These messages are all classified as junk (and end up now in the folder "Junk"). My filter (disabled now) contains only one action, viz. move message to folder "Suspicious". My idea was to bypass the movement to the junk folder. Maybe TB cannot handle such conflicting movements (implied vs. specified) properly, and even "hurts itself" in the action.
(In reply to CEP from comment #2)
> I will have to wait until it recurs and try that; it recurred this morning,
> but I did the shut-down cycle before I saw Mr Mery's query.
> 
> I should also note:
> 
> * that there are several questions in the Support fora suggesting that
> others are having this problem, and in versions prior to 31.

please provide some URLs.
Thanks

> * that there's such a wide variety of extensions installed (or not) on the
> machines with which I've experienced this that interaction with extensions
> seems an unlikely source of the problem; the machines don't even use the
> same appearance extensions.

Does *your* problem reproduce if you have started in TB's safe mode?
Flags: needinfo?(ceplaw)
(In reply to Heribert Slama from comment #6)
> (This is a follow-up on my own comment #4)
> >  [...] I've now disabled this filter
> > - but I have to wait for some new input to give reliable feedback.
> 
> Here it is: I received a few messages, that would have been caught by the
> filter I've disabled. TB behaves completely normal now.
> 
> Quick recap: These messages are all classified as junk (and end up now in
> the folder "Junk"). My filter (disabled now) contains only one action, viz.
> move message to folder "Suspicious". My idea was to bypass the movement to
> the junk folder. Maybe TB cannot handle such conflicting movements (implied
> vs. specified) properly, and even "hurts itself" in the action.

Interesting. So in account settings you have "enable adaptive junk mail" turned on. What happens if you disable this, and enable your filter?


(noting bug 835766 for future reference)
Flags: needinfo?(slama-h)
(In reply to Wayne Mery (:wsmwk) from comment #7)
> (In reply to CEP from comment #2)
> > I will have to wait until it recurs and try that; it recurred this morning,
> > but I did the shut-down cycle before I saw Mr Mery's query.
> > 
> > I should also note:
> > 
> > * that there are several questions in the Support fora suggesting that
> > others are having this problem, and in versions prior to 31.
> 
> please provide some URLs.
> Thanks

For example, https://support.mozilla.org/en-US/questions/1018307#answer-640145 (there are others out there, just not on mozilla sites, and I won't have a chance to run them down until over the weekend)


> 
> > * that there's such a wide variety of extensions installed (or not) on the
> > machines with which I've experienced this that interaction with extensions
> > seems an unlikely source of the problem; the machines don't even use the
> > same appearance extensions.
> 
> Does *your* problem reproduce if you have started in TB's safe mode?

Yes (no change). TB's safe mode seems to have no effect on this bug... although I almost never use TB's safe mode, and I have NOT tested this after the most-recent update (this week).

As an aside, I've also noted that the spam filter has become substantially less effective since version 30; more and more spam is getting through both my manual filters and the spam filter, and it's not new spam.
Flags: needinfo?(ceplaw)
(In reply to CEP from comment #9)
> (In reply to Wayne Mery (:wsmwk) from comment #7)
> > (In reply to CEP from comment #2)
> > > I will have to wait until it recurs and try that; it recurred this morning,
> > > but I did the shut-down cycle before I saw Mr Mery's query.
> > > 
> > > I should also note:
> > > 
> > > * that there are several questions in the Support fora suggesting that
> > > others are having this problem, and in versions prior to 31.
> > 
> > please provide some URLs.
> > Thanks
> 
> For example,
> https://support.mozilla.org/en-US/questions/1018307#answer-640145 (there are
> others out there, just not on mozilla sites, and I won't have a chance to
> run them down until over the weekend)

Thanks

> As an aside, I've also noted that the spam filter has become substantially
> less effective since version 30; more and more spam is getting through both
> my manual filters and the spam filter, and it's not new spam.

I've seen an increase also. It goes in waves. In my case it's not caused by Thunderbird, and likely also in your case not caused by Thunderbird
(In reply to Wayne Mery (:wsmwk) from comment #10)
> (In reply to CEP from comment #9)
> > (In reply to Wayne Mery (:wsmwk) from comment #7)
> > > (In reply to CEP from comment #2)
> > > > I will have to wait until it recurs and try that; it recurred this morning,
> > > > but I did the shut-down cycle before I saw Mr Mery's query.
> > > > 
> > > > I should also note:
> > > > 
> > > > * that there are several questions in the Support fora suggesting that
> > > > others are having this problem, and in versions prior to 31.
> > > 
> > > please provide some URLs.
> > > Thanks
> > 
> > For example,
> > https://support.mozilla.org/en-US/questions/1018307#answer-640145 (there are
> > others out there, just not on mozilla sites, and I won't have a chance to
> > run them down until over the weekend)
> 
> Thanks

I see that I wasn't clear in my original report: I had already tried the suggested fix in that thread to no effect. Indeed, the problem cropped up on another machine that was a fresh install.

> 
> > As an aside, I've also noted that the spam filter has become substantially
> > less effective since version 30; more and more spam is getting through both
> > my manual filters and the spam filter, and it's not new spam.
> 
> I've seen an increase also. It goes in waves. In my case it's not caused by
> Thunderbird, and likely also in your case not caused by Thunderbird

Understood, but I noted that for completeness, since everything I'm experiencing/seeing indicates that there's some kind of interaction -- and failure of the spam filter to learn that, for example, any e-mail with "Dr Oz" in the send is spam may be related.

Inference: Something in a relatively recent revision of Tbird is causing choking when a message is simultaneously being processed for both a filter and adaptive learning/processing of spam -- an error condition is being exited from by the (obvious and proper) "don't change anything because I found an error!" response. The message gets marked, the error occurs, and the process(es) exit without moving. Interestingly, this also prevents later _manual_ moves after the error occurs.

Here's a thought: Might there be a way to "force adapt" the adaptive learning by manually adding filter-like spam characteristics? That would obviate the need for many of my manual spam-and-harassment filters... meaning that simultaneous processing would no longer be a problem.
(In reply to Wayne Mery (:wsmwk) from comment #8)
> [...]
> 
> Interesting. So in account settings you have "enable adaptive junk mail"
> turned on. What happens if you disable this, and enable your filter?

1) I changed settings as  you proposed. The filter works as expected (moves matching messages to the specified folder). No adverse side effects on TB.

2) Then I started a new series of tests (and could reproduce the ill side effects on TB): I created my own test messages and sent them to myself, training TB to mark them as junk. I created a new filter (PseudoSpam) matching on From and Subject and moving the message to a special folder (PseudoSpam) in Local Folders. I tested the filter for all meaningful combinations of running options (i.e. manually or on getting new mail, before or after junk classification). The results:

+ option: before junk classification:
..... new mail: moved to Junk (error#1: not to folder specified in filter, as if filter didn't match)
..... manually: moved to specified folder (PseudoSpam)
..... TB: remains fully operational (at least for a few test messages; very low traffic here)

+ option: after junk classification:
..... new mail: moved to specified folder (PseudoSpam)
..... manually: when applied on folder other than PseudoSpam, e.g. Sent: nothing moved; error#2
..... TB: can no longer move or delete any message in any folder (error#2). After restart fully operational again.

Move and delete operations are blocked no matter whether they are initiated from keyboard, menu or filter.
Flags: needinfo?(slama-h)
Please see comment 12.

Heribert, great work. Thanks for doing that.
Component: Untriaged → Filters
Flags: needinfo?(m-wada)
Flags: needinfo?(acelists)
Do we have anything in the Error console?
Flags: needinfo?(acelists)
(In reply to Heribert Slama from comment #12)
> + option: after junk classification:
> ..... new mail: moved to specified folder (PseudoSpam)

If your training is sufficient, the mail should be moved to the Junk folder by the Junk filter, shouldn't it?

Contention between "Junk move" and "Filter move" is suspected.
    When "filter after junk classification" is invoked on mail which will be moved by the Junk filter, something may go wrong.
    After that, copy/move of mail is impossible until restart.

Please note the following:
   1. "Filter Before/After Junk Classification" is meaningful only for "New message".
        i.e. "Manual filter run with Before or After" itself is nonsense.
   2. With "Filter Before Junk Classification", the Junk filter is also applied to each "message moved by filter" at the "filter move target
       folder", unless Junk Status/Score is set by the message filter (which sets the "Junk filtering is already applied" status).
       Needless to say, "setting Junk Status/Score by message filter" affects the Junk filter's learning.
   3. With "Filter After Junk Classification", the Junk filter is applied at Inbox to all new mails downloaded to Inbox,
       then the message filter is applied to the new mails in Inbox which were not moved by the Junk filter.
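
The ordering in notes 2 and 3 can be sketched as a toy model in Python. This is only an illustration of the description above, not Thunderbird code: `route_new_mail`, `is_junk` and `filter_target` are made-up names, the account is assumed to move junk to a folder called "Junk", and the "unless Junk Status/Score is set by the message filter" exception in note 2 is left out.

```python
def route_new_mail(msg, is_junk, filter_target, when):
    """Return the list of folders one new message passes through.

    is_junk(msg) -> bool               stands in for the adaptive Junk filter
    filter_target(msg) -> str or None  stands in for a single "move to folder" rule
    when                               "before" or "after" (junk classification)
    """
    path = ["Inbox"]
    if when == "before":
        # Note 2: the message filter runs first ...
        target = filter_target(msg)
        if target is not None:
            path.append(target)
        # ... and the Junk filter is still applied, at the move target folder.
        if is_junk(msg):
            path.append("Junk")
    elif when == "after":
        # Note 3: the Junk filter runs first, at Inbox ...
        if is_junk(msg):
            path.append("Junk")
        else:
            # ... and the message filter only sees mail still in Inbox.
            target = filter_target(msg)
            if target is not None:
                path.append(target)
    else:
        raise ValueError("when must be 'before' or 'after'")
    return path


# A trained-junk message that also matches the PseudoSpam rule:
def junk(msg):
    return True

def rule(msg):
    return "PseudoSpam"

print(route_new_mail("test", junk, rule, "before"))  # two moves on one message
print(route_new_mail("test", junk, rule, "after"))   # one move
```

In this model the "before" case requests two moves for the same message, which is where contention between "Junk move" and "Filter move" could arise. The "after" result reported in comment 12 (mail ends up in PseudoSpam) does not match this model, which is why the question above is asked.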

Do the steps you show consistently produce the problem?
Or is it merely that the problem occurred once when you tested in that order?

Tested with IMAP? Or POP3?
Are both "New message : Before, if a-condition, do action then move to XXX" and "New message : After, if a-condition, do action then move to XXX" defined at the same time?
If so, and if IMAP: 1. the Before filter does its action and tries to move, 2. the Junk filter tries to move at Inbox and at the move target folder, 3. the After filter does its action and tries to move.
Note: the "Before filter" and the "After filter" are executed independently.
And, if IMAP, "move == copy + store flag \Deleted", so an action on a "mail marked as \Deleted" is possible.
And, if "move from IMAP to local", "copy by filter" == "fetch to local move target folder".
If a "body condition" is involved, "fetch entire message data" is executed before filtering if IMAP.
If previewText of Biff is enabled, Biff tries to fetch the first 2048 bytes of the message body at the same time.
If auto-sync of IMAP is enabled, auto-sync tries to fetch the entire mail data.

What is written in the message filter log?  What is written in the Junk filter log?
How about the Junk icon of each mail?
Same question as aceman: is any error relevant to the problem shown in the Error Console?

Please describe in detail "what you did" and "what happened".
Message filter rules are saved in the msgFilterRules.dat file. Attaching the file to this bug is better than words about the filter rule.

By the way, I sometimes saw "copy/move of mail is impossible until restart" while checking Tb's behavior when an "outdated msf condition" exists in a local mail folder. Please be sure to rule out a problem due to an "outdated msf condition in local mail folder".
An "outdated msf condition" is explicitly cleared by "Repair Folder", or by explicitly opening the folder when it is not open.
Flags: needinfo?(m-wada)
One additional side note that may be relevant to any fix:

I do NOT use the default-on-install profile folder location for Tbird (or Firefox, or anything else); everything is pointed at a folder that is second-level off the root, with all permissions set appropriately. I say this in case there's any folder reference tree hardcoded into anything that might be used to resolve this bug. That is, any fix needs to ensure that it's using the SHORTHAND folder reference, not a full path (there was a bugfix about five years ago in which this became an issue, kludged through by a cold boot).
Further data point indicating that this is quite possibly not just about filters:

One of my work e-mail accounts is through an outside vendor that is constantly changing its certificates, particularly for the SMTP server. I've now confirmed that the loss-of-function problem occurs every time there's a send attempt that is blocked by an invalid SMTP server certificate. I can have just moved half a dozen messages to folders, written a message, and then Thunderbird throws up all over my shoes.

Side note: I am assiduous in keeping my inbox under a single screen, so I really don't think it's an indexing problem with the inbox. However, some of the folders to which filters send things are quite large (not nearly as large as Outlook would make them, which is one of the many reasons I use Thunderbird).

Second side note: This is also happening on a much-more-capable machine running Windows 8.1, so it's definitely not OS-version dependent.

PLEASE upgrade this to a "confirmed" bug.
Yet even more further data:

It's worse under Windows 8.1. In fact, it's now "repeating" in a rather disturbing manner: Even after a close-and-restart cycle, Tbird will now -- about 25% of the time on this 8.1 machine -- start up in "I can't move messages" mode.

This was a fresh installation of Thunderbird on this machine, from a fresh/direct download of the program... and fresh/direct downloads of each and every extension. The specific extension list is:
Automatic Export 0.5.2
CompactHeader 2.0.9
ImportExportTools 3.1
Lightning 3.3.2
Lightning Month Tabs 1.9
Manually sort folders 1.1
Message Pane Button 1.02
No Message Pane Sorting by Mouse 1.2
Return Receipt Toolbar Button 0.16
Saved Password Editor 2.8.1
Send Later 4.3.1
Toolbar Buttons 1.0.2
ViewSourceWith 0.9.4.2

just to rule out any bad interactions. (Note: I actually need functionality provided by all of these, particularly those related to Lightning and to message sorting.)

Of course, I wouldn't have found out about Windows 8.1 being more of a problem if the former machine hadn't bricked on me.
I encountered this behavior doing some development. My cause is unlikely to be your issue, but it does point out what could be going wrong.

If something causes a copy to fail, and OnCopyCompleted never gets called to clear a copy request, then the folder thinks that a copy is in progress, and refuses to do any additional copies. You would probably see the root failure when running in debug mode (as developers do) but many of these messages are not seen in release versions.

That's not much help to pinning this down, but it might give us some idea of where to look. One solution might be to implement a timeout in the copy methods, so that after a period of time with no activity the copy auto exits with a failure.
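
A minimal Python sketch of this failure mode, for illustration only. It is not Thunderbird code: the `Folder` class, `start_copy`, `on_copy_completed` and the `timeout_s` parameter are hypothetical names, and the timeout here is measured from the start of the copy rather than from the last activity, which is a simplification.

```python
import time


class Folder:
    """Toy model of a folder that allows only one copy request at a time."""

    def __init__(self, timeout_s=None, clock=time.monotonic):
        self.timeout_s = timeout_s     # hypothetical: None means "never time out"
        self.clock = clock
        self.copy_started_at = None    # None means no copy is in progress

    def start_copy(self):
        """Return True if a new copy may start, False if one is still pending."""
        now = self.clock()
        if self.copy_started_at is not None:
            expired = (
                self.timeout_s is not None
                and now - self.copy_started_at >= self.timeout_s
            )
            if not expired:
                return False           # folder still thinks a copy is running
            # Stale request: treat it as failed and clear it.
            self.copy_started_at = None
        self.copy_started_at = now
        return True

    def on_copy_completed(self):
        """Normal path: the completion callback clears the pending request."""
        self.copy_started_at = None


# Without a timeout, one copy that never reports completion blocks the folder.
folder = Folder()
folder.start_copy()            # copy starts, then fails without a callback
print(folder.start_copy())     # False: every later copy is refused
```

With `timeout_s` set, the same stale request would be cleared once the timeout has passed and the next `start_copy()` would succeed without a restart.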
So, if I understand Mr James's comment correctly, the "move" operation works by making a copy in the destination, then deleting the source (as opposed to the old C paradigm of merely changing a pointer's value). That may well explain why Tbird locks up after the failure... but it doesn't explain the cause of the initial failure.

Based on Mr James's comment, I have changed the way my spam/blocking-related filters are configured -- I have changed them all to run only AFTER junk classification, which would have them not actually move anything (since it would already be in the junk folder). Unfortunately, certain other filters must run BEFORE junk classification (some messages that have to be moved to avoid false positives as junk and/or to maintain subject separation), so it's going to take at least a full business week to determine whether there's a difference. I still think that a better "solution" would be to enable direct editing of the junk-mail system so that I didn't need separate custom filters in the first place... but that may not be feasible without considerable programming effort, and is perhaps more suitable for an extension.
Do we have any workaround for this? I'm on Ubuntu 14.10 with the latest Thunderbird 31.4.0, and I think it started happening after I did some update in my Ubuntu. I went back to version 24 to try, and that also shows the problem, so probably some library got updated and that is affecting this. I can provide whatever info you need to root cause this.

Kind of handicapped because Thunderbird is not working, and evolution is nowhere close to it. Appreciate any suggestions.
My guess is that there are a variety of possible initiating events that cause a copy to fail without calling OnCopyCompleted, but there is a single uniform cause of the lockup, which is that copy requests never timeout. The fix would involve both identifying the (probably multiple) root causes, fixing them, as well as implementing some sort of timeout.

I know from experience that timeouts need to be very long, as it could take many minutes to copy a large message to a slow server. Any timeout would result in an upper limit on the size of messages that could be copied to slow servers in some cases.
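
The trade-off can be put in numbers with a short Python sketch. Everything here is illustrative: the function names are made up, and the 5-minute timeout and 20 KB/s server speed are assumed figures, not measurements.

```python
def max_message_bytes(timeout_s, bytes_per_s):
    """Largest message that can finish before a fixed whole-copy timeout fires."""
    return timeout_s * bytes_per_s


def timed_out(progress_times_s, now_s, inactivity_timeout_s):
    """Inactivity timeout: fail only if no progress was seen for too long.

    progress_times_s holds the times (seconds since the copy started) at which
    some data was transferred; now_s is the current time on the same scale.
    """
    last = 0.0
    for t in progress_times_s:
        if t - last > inactivity_timeout_s:
            return True
        last = t
    return now_s - last > inactivity_timeout_s


# Fixed timeout: 5 minutes at an assumed 20 KB/s caps messages at about 6 MB.
print(max_message_bytes(300, 20_000))

# Inactivity timeout: a 15-minute copy that makes progress every 30 s survives,
# while a copy that never reports any progress is failed after 60 s of silence.
print(timed_out(list(range(30, 901, 30)), 900, 60))
print(timed_out([], 600, 60))
```

An inactivity-based timeout avoids the hard size limit, but it only helps if the copy code reports progress at all, which is an assumption about the implementation.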

As for workarounds, other than restarting, the only viable approach at the moment is to change anything that is routinely doing copies (filters or spam processing) to try to eliminate the root cause in your instance.
(In reply to CEP from comment #20)
> I have changed them all to run only AFTER junk classification, which would have them not actually move anything 
> (since it would already be in the junk folder).
>  Unfortunately, certain other filters must run BEFORE junk classification
> (some messages that have to be moved to avoid false positives as junk and/or to maintain subject separation),
> so it's going to take at least a full business week to determine whether there's a difference.

Although both "Before Junk Classification" and "After Junk Classification" can be requested at the same time, mixing them is slightly dangerous, because "Before" and "After" are invoked independently.
  The following can be defined.
     Before Junk Classification: RuleA : if conditionA is met, do action A, move to FolderA
     After Junk Classification     : RuleB : if conditionB is met, do action B, move to FolderB
  When a new mail matches both conditionA and conditionB, "RuleB of After" is not skipped because of "move to folder of RuleA of Before".
  Because "move to FolderA" is already done by "RuleA of Before", "move to FolderB by RuleB of After" usually fails even if it's requested.
  However, "do action B" may be executed, because "After Junk Classification" may be started before the actual execution of
  "uid copy nn FolderA + uid store nn +Flags(\Deleted)" has completed.
  Because further action on uid=nn will never be requested by another rule of "Before Junk Classification", the "end of move of uid=nn to FolderA"
  is when the request of "uid copy nn FolderA + uid store nn +Flags(\Deleted)" is queued.
  (If the IMAP delete model is "Just mark it as deleted", "Move to B" may be executed, because "Delete step in Move"="store +Flags(\Deleted)".)
This is perhaps a rare case, but please be careful when mixing "Before Junk Classification" and "After Junk Classification".
If possible, in any rule of "Before Junk Classification", please remove the "Move action" and add "Mark As NonJunk", to reduce unwanted/unexpected problems. If "Mark As Junk" is executed, the Junk Filter is not applied to the mail because "Mark As Junk" sets the status of "Junk checking is already done".
And please never use the "Delete" or "Stop filter execution" actions anywhere, because a known bug exists.
I have had this "bug" of intermittent failure to move messages or failure to delete messages for about a year.  I had it with windows 8.1 and it persisted after upgrade to Windows 10.  I have Thunderbird on multiple computers running Windows 7 [one machine], 8.1 [3 machines] and 10 [2 machines] as well as on 2 Linux machines, and it only occurs on one machine [previously Windows 8.1, now Windows 10].  All of these systems are 64 bit.  

It happens at least once a day when I am using this particular machine, but I do not use this machine every day.  

Once it occurs, no further delete or move message commands will succeed until I exit Thunderbird and then re-enter.  On some occasions it is not possible to view messages other than the one that was selected after the error has occurred.

Today the error was accompanied by this message in the Error Console:
Timestamp: 12/23/2015 11:52:28 PM
Error: An error occurred executing the button_delete command: [Exception... "Component returned failure code: 0x8052000e (NS_ERROR_FILE_IS_LOCKED) [nsIMsgDBView.doCommand]"  nsresult: "0x8052000e (NS_ERROR_FILE_IS_LOCKED)"  location: "JS frame :: chrome://messenger/content/folderDisplay.js :: FolderDisplayWidget_doCommand :: line 1773"  data: no]
Source File: chrome://global/content/globalOverlay.js
Line: 99
Continue to get multiple errors daily.  See above comment for details.  

Most recent error message on failure:

Timestamp: 12/28/2015 11:24:37 AM
Error: An error occurred executing the button_delete command: [Exception... "Component returned failure code: 0x8052000e (NS_ERROR_FILE_IS_LOCKED) [nsIMsgDBView.doCommand]"  nsresult: "0x8052000e (NS_ERROR_FILE_IS_LOCKED)"  location: "JS frame :: chrome://messenger/content/folderDisplay.js :: FolderDisplayWidget_doCommand :: line 1773"  data: no]
Source File: chrome://global/content/globalOverlay.js
Line: 99

Seems that we have a reproducible error.  It's amusing, if disappointing, that this bug is still listed as "unconfirmed".
Today I decided to leave forever my bloated, slow, defective Thunderbird email client, and I began using Claws Mail.  

Claws Mail is working perfectly and is much faster for both single and multiple message deletes than was Thunderbird.

It took me about 15 minutes to complete the migration from Thunderbird to Claws Mail, from start to finish.

My recommendation to all of those who suffer with continuing problems with Thunderbird is to move on to another email client.  The Thunderbird developers are clearly unable to fix the parts of Thunderbird that are broken.  Otherwise, the bugs that have remained unrepaired for more than a year in the case of this bug and for several years in the case of some other bugs would have long ago been repaired.  

There are many alternative email clients out there.  I've chosen Claws Mail, but that is just one of many possible choices.
FYI, a similar problem for me was due to maildir, and was fixed by reverting to mailbox.
(In reply to CEP from comment #20)
> ...
> Based on Mr James's comment, I have changed the way my spam/blocking-related
> filters are configured -- I have changed them all to run only AFTER junk
> classification, which would have them not actually move anything (since it
> would already be in the junk folder). 

CEP Has that helped?
And any other changes in behavior since then?
Flags: needinfo?(ceplaw)
Manish, do you still see this?
Flags: needinfo?(ceplaw) → needinfo?(mkatiyar)
Summary: Messages Not Moved to Folders (Intermittent, v 31 and later) → Messages Not Moved to Folders (Intermittent, v 31 and later). Perhaps? ... copy fails and OnCopyCompleted never gets called to clear a copy request, folder thinks that a copy is in progress, and refuses to do any additional copies.
Whiteboard: [dupeme?]