Closed Bug 754508 Opened 12 years ago Closed 12 years ago

Thunderbird 12.0.1 loses/ignores some new messages from POP server...

Categories

(MailNews Core :: Networking: POP, defect)

x86
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 754301

People

(Reporter: fred, Unassigned)

Details

(Keywords: dataloss)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19

Steps to reproduce:

Clicked "Get Mail" when there were hundreds of new messages waiting at the POP server.


Actual results:

Some of the new messages were downloaded correctly.  Others were silently lost.


Expected results:

All of the new messages should have been downloaded correctly.

More Info:

- This is a new bug in 12.0.1.  I saw it very repeatably for several  
  days with 12.0.1.  Then I reverted to 11.0, and the problem is gone.

- I'm using Mac OS X Snow Leopard

- Seems to be a timing or capacity issue of some sort.  If I set 
  Thunderbird 12.0.1 to fetch new messages every 2 minutes, I 
  rarely lose a message, but when I stop the autofetching or 
  close Thunderbird for a few hours, so that a couple hundred 
  messages accumulate at the POP server, the next fetch starts
  out OK, but then loses a large batch of consecutive messages.
  I'm not positive, but I suspect that once it starts losing 
  messages, it loses all remaining messages in that fetch from 
  the POP server.  As though an error aborted the fetch, but 
  still updated the records as though all messages had been 
  fetched successfully.  Fetching again does not fetch the missing
  messages, but does fetch any new messages that arrived at the
  POP server since the previous fetch.

- I set up a cron job on a server to send e-mail once per minute 
  so I can easily see any gaps.  That's how I know when it's 
  working correctly and when I'm experiencing the bug.

- May be related to the 128 filters I have for incoming messages, 
  moving them to various folders in my Inbox.  Perhaps it takes 
  too long to process all of the filters when that many messages
  are being pulled at once from the POP server.

- I have exactly the same Thunderbird data and config files on 
  another Mac, synchronized daily via rsync, so it has exactly the
  same profile (folders, filters, messages, popstate.dat file, etc.)
  The only difference is that the other Mac is still running 
  Thunderbird 8.0 and has never had this problem.  It successfully 
  fetches all messages from the same POP server, even those 
  messages that the problem Mac has already lost.  That's why I 
  suspected the version of Thunderbird and thought to revert from
  12.0.1 to 11.0 on the problem Mac, which caused the problem to 
   go away.

- When I reverted to Thunderbird 11.0, which fixed the problem, I 
  deleted the 12.0.1 version of the the /Applications/Thunderbird.app
  folder tree, and then installed Thunderbird 11.0 from a DMG file,
  opening the DMG file and dragging the Thunderbird icon to the
  Applications icon.  I did NOT make any changes to the  ~/Library
  tree or to my Thunderbird profile directory tree.  So, it is not a 
  data or configuration issue.  The 12.0.1 software has a bug that
  the 11.0 version does not.

- I posted this same bug report 6 days ago on 5/5/2012 at the 
  getsatisfaction site where at least one other person has 
  commented that he has the same problem.  Like me, he has lots 
  of filters on incoming e-mail message, so perhaps that is 
  relevant.  See:
     https://getsatisfaction.com/mozilla_messaging/topic/thunderbird_loses_ignores_some_new_messages_from_pop_server

- This is an ABSOLUTELY CRITICAL bug for me.  I'm leery to ever 
  upgrade from 11.0 until I hear that this bug is fixed.  Silently 
  losing mail is a BIG problem.  I could have clients assuming I'm 
  purposely ignoring their e-mail. I could have potential clients
  unable to reach me to hire me.  I've been a huge fan and advocate
  of Thunderbird (and previously Netscape Messenger) for over 15
  years.  I recommend it to all of my clients.  Please fix this ASAP.

- Any more info I can provide?  You can reach me at fred@bristle.com
 
Thanks!
--Fred Stluka
Severity: normal → blocker
(In reply to Fred Stluka from comment #0)
> This is a new bug in 12.0.1

Does it mean that you used Tb 12.0.0 and upgraded to Tb 12.0.1 and there were no problem with Tb 12.0.0?

> - When I reverted to Thunderbird 11.0, which fixed the problem, (snip)
> The 12.0.1 software has a bug that the 11.0 version does not.

Have you read "Known Issues" section of Release Note of Tb 12.0.0 or Tb 12.0.1?
> http://www.mozilla.org/en-US/thunderbird/12.0/releasenotes/
> http://www.mozilla.org/en-US/thunderbird/12.0.1/releasenotes/
> Running TB 12 with local mail folders and pop3 mail filters may produce summary
> files that aren't correctly read by previous versions of Thunderbird.
> If you decide to go back to a previous version of Thunderbird after running TB 12, 
> you should delete the .msf files for your local folders and pop3 accounts,
> or repair the folders using the folder properties dialog, to avoid potential data loss.
See also bug 730947 comment #118, bug 730947 comment #109, bug 730947 comment #111, please.

You looks to use "filter move to local mail filter".
Did you do "manual clean up of incompatibility of data in .msf" before your fall back to Tb 11?
Tb 12 can use .msf which was generated by Tb 11 and before and changes it to new format of Tb 12 or later. However, once .msf data is mixed by Tb 12 and Tb 11, result is unpredictable.
As for mail data folder file(not .msf, Inbox instead of Inbox.msf if Inbox folder), data is compatible and is not corrupted unless Compact is executed after broken .msf data is generated.
Do following, please.
(1) Keep back up of Tb's profile directory(at least Mail directory which is set at Local Directory: of Server Settings of each mail account).
(2) "Repair Folder" of all local mail foldr which is used as "move target folder of filter move".
FYI.
Corrupted mail data can be observed by next test.
(0) Disable auto-compact, or enable prompt before auto-compact. 
(1) By Tb 12.0.1, download multiple mails from POP3 server.
    Multiple mails are moved by a local mail folder by message filter.
    Phenomenon of bug 754518 is seen at "Order Received" column.
    This is an incompatibility of .msf data.
(2) By Tb 12.0.1, Shift+Delete some mails of smallest "Order Received" value.
    If prompt before auto-compact is shown, reply Cancel.
(3) By Tb 11.0.1, Compact the local mail folder.
    => Shown mail data of some mails is broken.
       Mail data in mail folder file is broken.
    If "Repair Folder" is executed at this stage,
    corruption like "loss of mails" is perhaps exposed at thread pane too.
    This is perhaps same phenomenon as bug 730947.

Do you enable auto-compact? If yes, is mail.purge.ask=true set?
When prompt before auto-compact is shown, did you reply Cancel when you falled back to Tb 11 from Tb 12?
> > This is a new bug in 12.0.1

> Does it mean that you used Tb 12.0.0 and upgraded to Tb 12.0.1 and there were no problem with Tb 12.0.0?

I don't know.  i have allowed many automatic upgrades in the past few months and 
have come to trust them, so I haven't paid much attention to when they occurred or
what version numbers they moved me to.  I first noticed the problem with 12.0.1,
but I don't know whether I ever used 12.0, or for how long, or whether it had the 
same problem.

This is an insidious bug because it is not easy to notice the absence of an e-mail that 
you don't know someone sent you.  That's why I finally set up a server to send me 
one message per minute, so that I could notice any gaps.

When I decided to go back to a previous version, I could not find 12.0 anywhere
for download, so I downloaded 11.0 from:
     http://mozilla-thunderbird.en.softonic.com/mac
and it does not have the same problem.  Where would I go to get version 12.0?

I'm still reading the rest of your comments and will reply to them all as I process
them.

Thanks!
--Fred Stluka
> Have you read "Known Issues" section of Release Note of Tb 12.0.0 or Tb 12.0.1?
>> http://www.mozilla.org/en-US/thunderbird/12.0/releasenotes/
>> http://www.mozilla.org/en-US/thunderbird/12.0.1/releasenotes/
>> Running TB 12 with local mail folders and pop3 mail filters may produce summary
>> files that aren't correctly read by previous versions of Thunderbird.
>> If you decide to go back to a previous version of Thunderbird after running TB 12, 
>> you should delete the .msf files for your local folders and pop3 accounts,
>> or repair the folders using the folder properties dialog, to avoid potential data loss.
>
> See also bug 730947 comment #118, bug 730947 comment #109, bug 730947 comment #111, please.
>
> You looks to use "filter move to local mail filter".
> Did you do "manual clean up of incompatibility of data in .msf" before your fall back to Tb 11?
> Tb 12 can use .msf which was generated by Tb 11 and before and changes it to new format of Tb 12 or later. However, > once .msf data is mixed by Tb 12 and Tb 11, result is unpredictable.


This does not seem to be related.  I had no problem with pre-12 versions.  Then I
allowed an upgrade to 12.0.1 and noticed the problem.  Then I reverted to 11.0, 
and the problem has gone away.  The above talks about how moving from 12 to 
11 might cause a problem.  For me, it was the opposite.   I could now delete my 
*.msf files and let them get recreated by 11.0, but since I'm having no problem 
with 11.0, there's no reason for me to do so.   There's nothing to fix in 11.0, and 
the above says that 12.0+ should work fine with the new *.msf  files.

Any other ideas?  

Thanks!
--Fred Stluka
(In reply to WADA from comment #2)
> As for mail data folder file(not .msf, Inbox instead of Inbox.msf if Inbox
> folder), data is compatible and is not corrupted unless Compact is executed
> after broken .msf data is generated.
> Do following, please.
> (1) Keep back up of Tb's profile directory(at least Mail directory which is
> set at Local Directory: of Server Settings of each mail account).
> (2) "Repair Folder" of all local mail foldr which is used as "move target
> folder of filter move".

Interesting point.  When I saw this problem with 12.0.1, I did not think to 
check whether the missing messages were missing only from the index 
(*.msf) file or also from the data file.  As a user of Thunderbird for many 
years, I have occasionally found it useful to delete the *.msf files and let 
them get re-created, but that did not occur to me this time.

Is there any reason for me to try this now, on version 11.0?  Or are you
suggesting that I move from 11.0 to 12.0.1, wait for the error to occur
again, and then do this?

BTW, is there any advantage to "Repair Folder" over simply deleting the 
*.msf files?

Thanks!
--Fred Stluka
(In reply to WADA from comment #3)
> FYI.
> Corrupted mail data can be observed by next test.
> (0) Disable auto-compact, or enable prompt before auto-compact. 
> (1) By Tb 12.0.1, download multiple mails from POP3 server.
>     Multiple mails are moved by a local mail folder by message filter.
>     Phenomenon of bug 754518 is seen at "Order Received" column.
>     This is an incompatibility of .msf data.
> (2) By Tb 12.0.1, Shift+Delete some mails of smallest "Order Received" value.
>     If prompt before auto-compact is shown, reply Cancel.
> (3) By Tb 11.0.1, Compact the local mail folder.
>     => Shown mail data of some mails is broken.
>        Mail data in mail folder file is broken.
>     If "Repair Folder" is executed at this stage,
>     corruption like "loss of mails" is perhaps exposed at thread pane too.
>     This is perhaps same phenomenon as bug 730947.

OK.  Thanks for the info.  I haven't tried this experiment explicitly,
but I haven't seem any evidence of this problem.


> Do you enable auto-compact? If yes, is mail.purge.ask=true set?
> When prompt before auto-compact is shown, did you reply Cancel when you
> falled back to Tb 11 from Tb 12?

Yes, I enable auto-compact, but I always have it ask me first.  I checked
just now with the Thunderbird Config Editor and yes, mail.purge.ask=true.
I almost always allow the compact when asked, unless I know that mail
is currently being pulled from the POP server.  So, no, I probably did not 
click Cancel on the first compact prompt after moving back to TB 11.

However, I do not think this is relevant because the problem occurred
in 12.0.1 before I moved back to 11, and has not occurred since.

--Fred Stluka
I think to prevent further corruption you should not allow the compact to run on your folders until this issue is investigated.

There is a difference between Repair and just nuking the .msf, e.g. in some case there are tags on messages only stored in the msf file and not the mbox file. Deleting it would lose you the tags whereas repair would preserve them (repair uses info from both the old msf and the mbox file to generate new msf).

On the other hand, in this case I think you should backup the .msfs and then remove them. Let them recreate and see if anything is missing.

Also you should see in the mbox file if they aren't corrupted. You will spot it in this way: the "From " header (of each message) must always start at the beginning of a line. If this is not the case, you have a problem and rebuilding the msfs will still not help you.
(In reply to :aceman from comment #8)
> I think to prevent further corruption you should not allow the compact to
> run on your folders until this issue is investigated.

Good idea, but too late.  I have allowed it to compact many times in the 
past week or so that this problem has been occurring.  I'll prevent further
compacts now.

> There is a difference between Repair and just nuking the .msf, e.g. in some
> case there are tags on messages only stored in the msf file and not the mbox
> file. Deleting it would lose you the tags whereas repair would preserve them
> (repair uses info from both the old msf and the mbox file to generate new
> msf).

OK.  Thanks for the info.


> On the other hand, in this case I think you should backup the .msfs and then
> remove them. Let them recreate and see if anything is missing.

Today, for the first time since I moved to 11.0, I saw the same problem 
again.  Looks like I was wrong.  It is not strictly a bug in 12.0.1.  It also
occurs in 11.0, at least in cases like mine where I reverted from 12.0.1
to 11.0.

Now that I can see the problem again, I looked in the mbox files and yes, 
the missing messages are there.

I did as you suggested and deleted all of the *.msf files.  No help.  When 
I launched TB 11.0 again, the *.msf files were immediately recreated and 
the messages are still missing.  I checked the mbox files and the missing
messages all have X-Mozilla-Status: 0009, which means they are marked
as deleted.  That's probably why they are still gone after recreating the 
*.msf files.  The *.msf files were recreated from the mbox files, right?
And the mbox files must have already had the messages marked as
deleted somehow.  I'm not sure how that happened.  These are newly
arrived messages that I've never even read.

I do NOT have any filters that delete messages.  I can send you my 
msgFilterRules.dat to confirm that, if you like.

So now I have several questions:
- What is causing the messages to be marked as deleted when they 
   get fetched from POP and filtered into my various folders?
- What changed recently to cause this to start happening?
- Why does it happen to some messages in some folders and not to
   other messages in the same folders?
- Why does it NEVER happen on my other Mac where I have an 
   exact copy of the same profile, but am running TB 8.0

 
> Also you should see in the mbox file if they aren't corrupted. You will spot
> it in this way: the "From " header (of each message) must always start at
> the beginning of a line. If this is not the case, you have a problem and
> rebuilding the msfs will still not help you.

I don't see any corruption in the *.msf files.  I did a recursive grep for 
"From - " and it always seems to occur at the beginning of a line.


Is there anything else I can safely delete to reset things?  Deleting 
the *.msf files files was a good idea, but did not seem to help.
(That is, it did not recover any lost messages, though I can't say yet
whether it will prevent future messages from being lost.)  Are there
other files I could delete or repair?  Would it help to delete any 
cache files or search index files or anything like that?


Thanks for your continued help!
--Fred Stluka
(In reply to Fred Stluka from comment #9)
> Now that I can see the problem again, I looked in the mbox files and yes, 
> the missing messages are there.
That is good, so TB is downloading them.

> I did as you suggested and deleted all of the *.msf files.  No help.  When 
> I launched TB 11.0 again, the *.msf files were immediately recreated and 
> the messages are still missing.  I checked the mbox files and the missing
> messages all have X-Mozilla-Status: 0009, which means they are marked
> as deleted.  That's probably why they are still gone after recreating the 
> *.msf files.  The *.msf files were recreated from the mbox files, right?
> And the mbox files must have already had the messages marked as
> deleted somehow.  I'm not sure how that happened.  These are newly
> arrived messages that I've never even read.
> 
> I do NOT have any filters that delete messages.  I can send you my 
> msgFilterRules.dat to confirm that, if you like.
Do you have junk filtering enabled? I think a status of 0009 is Read+Deleted. Junk classifier can produce such status (can deleted messages and mark a read).

> I don't see any corruption in the *.msf files.  I did a recursive grep for 
> "From - " and it always seems to occur at the beginning of a line.
Did you mean the mbox files, without .msf extension?
(In reply to :aceman from comment #10)
> Do you have junk filtering enabled? I think a status of 0009 is
> Read+Deleted. Junk classifier can produce such status (can deleted messages
> and mark a read).

I never intended to have Junk filtering enabled, but when I checked the
Account Settings, yes the checkbox is set for "Enable adaptive junk mail 
controls for this account.  I have now turned it off.  

I'm not sure that was relevant though because checkbox was (and still 
is) set for "Do not mark mail as junk if the sender is in Personal Address
Book" and many of the missing messages are coming from address in
that address book.  Also, none of the other checkboxes in that "Junk
Settings" preference page are set.  So, it should not be moving junk
messages to a Junk folder or anything.  Also, there is no Junk folder.


> > I don't see any corruption in the *.msf files.  I did a recursive grep for 
> > "From - " and it always seems to occur at the beginning of a line.
> Did you mean the mbox files, without .msf extension?

Yes.  Sorry.  Typo.  Meant to say "in the mbox files".


How can I tell whether junk filtering is enabled/happening?  Was I 
checking the right settings?


Also, as I said originally, this still feels like a timing or capacity issue.
If I set TB  to fetch new messages every 2 minutes, I rarely lose a 
message, but when I stop the autofetching or close TB for a few hours, 
so that a couple hundred messages accumulate at the POP server, the 
next fetch starts out OK, but then loses a large batch of consecutive 
messages.  I suspect that once it starts losing messages, it loses all 
remaining messages in that fetch from the POP server.  Somehow it
gets into the mode of marking all messages deleted as they are 
fetched perhaps?

--Fred Stluka
There are also Junk settings in Options (Preferences) -> Security -> Junk.

Not sure about the timing. You could also review the Disk space pane in the account settings.
(In reply to :aceman from comment #12)
> There are also Junk settings in Options (Preferences) -> Security -> Junk.

OK.  I looked there just now.  All checkboxes are unchecked.

> Not sure about the timing. You could also review the Disk space pane in the
> account settings.

The Disk Space settings are set to "Don't delete any messages" and no
limit on the download size.

Any more ideas?

Thanks!
--Fred Stluka
After you let the messages build up on the server and then start TB, how long does it take to download the new ones? May it be over those 2 minutes when a next check would happen?
(In reply to :aceman from comment #14)
> After you let the messages build up on the server and then start TB, how
> long does it take to download the new ones? May it be over those 2 minutes
> when a next check would happen?

Yes, possibly.  Is there a problem with TB fetching from POP concurrently
with itself?  Over the years, I've often had an "every 2 minutes" or "every
one minute" automatic setting and have manually done a fetch without
worrying about a race condition.  Also, sometimes I hit Shift-F5 to fetch 
from all POP servers to all accounts, and then hit it again if it seems like 
some or all accounts didn't start fetching.  It has always seemed to handle
it cleanly.  Either fetching or ignoring me because it is already fetching.
But I might have just not noticed that there was a problem.  Do I need to 
be more careful about race conditions?

In any case, I have the auto-fetch turned off for this account now, and 
will pay more attention to not causing races.

--Fred Stluka
I am not sure if it is a problem, if TB initiates concurrent downloads and breaks. I'd hope there is some protection.
Component: General → Networking: POP
Product: Thunderbird → MailNews Core
QA Contact: general → networking.pop
Blocks: 754301
As recommended by quantum.projects@yahoo.com in:

https://bugzilla.mozilla.org/show_bug.cgi?id=754301

I have now disabled Test Pilot, which was previously enabled.

Hard to tell yet whether it has any effect.  I've only seen the bug
once in the past several days since I moved from 12.0.1 to 11.0.

--Fred Stluka
The Test pilot disabling was for solving his column resetting problem.
It should not be relevant here. But of course, it is always best to disable extensions you do not use so that there are not unexpected factors in play (and less memory usage :)).
I have not seen this problem for over a week.  It may have been fixed by
one of the changes I've made:

- Deleted all *.msf files and let them re-create.  (BTW, they re-created
   REALLY fast, instantaneously as far as I can tell, certainly MUCH faster
   than they did a few years back.  Even with hundreds of mail folders, 
   some containing thousands of messages from the past 10 or more 
   years.  Nice!)

- Removed autofetch every 2 minutes from POP server, so it won't 
   start fetching again in the middle of a fetch.

- Disabled Test Pilot plugin

- Turned off "Enable adaptive junk mail controls for this account"

I suspect that the relevant change was deleting the *.msf files.

I plan soon to upgrade from 11.0 to 12.0.1 again, and immediately
delete the *.msf files again.  Hopefully, the problem is gone, and was
caused only by some corruption of the *.msf files.  I'll report back 
after a few days with my results.

Thanks for all the info and suggestions!
--Fred Stluka
Bad news!

I've lost another 8 hours worth of e-mail.  Here's what I did:

  - Moved back to 12.0.1 from 11.0
    - Compact all folders
    - Close TB
    - Delete all *.msf files
    - Open TB, allowing to recreate the *.msf files.
    - Upgrade via About box to 12.0.1
    - Close TB
    - Delete *.msf
    - Open TB, allowing to recreate the *.msf files.
    - Fetch from POP servers.

  - Seems fine so far.  One quirk:  I had to click on each folder (or move to it 
     via an arrow key) to get them to rebuild the counts.  Some had blank new 
     and total message counts and only the size column was updated during 
     the fetch from POP.

  - Waited till the next morning, to tell messages accumulate at the POP
     server.

  - Fetched from POP at 8:24am and lost a bunch of messages.
  - Over 1000 messages were reported as downloaded.  However, starting with
    messages at 00:47 and for all remaining messages in that fetch from POP,
    this is what happened:
    - The "Downloading message nnnn of nnnn" messages continued to show,
      counting up to the expected number of messages, and counting at the 
      same rate as though the messages were being downloaded correctly.
    - The message counts in the mail folders stopped increasing.
  - I then checked the folders and found that no messages after 00:47 were 
    present.
  - I checked the raw mbox files.  The messages are not there either. 
  - I used grep to recursively search the entire Thunderbird profile for
    a sample subject line.  It was not found in any of the files of the 
    profile (so the messages are not in a different folder or anything
    like that).
  - These messages were simply discarded as they were fetched from POP.

  - Fetched from POP via my other Mac running Thunderbird 8.0, and 
    correctly received all of the messages, including those lost above
    on the Mac running Thunderbird 12.0.1.

  - Fetched from POP via TB 12.0.1 again 8 minutes later, and correctly 
    received those 8 minutes worth of new messages, but none of the 
    missing ones.

  - As I said above, a week or two ago, I set up a cron job on a server 
    to send e-mail once per minute.  I have a TB filter to filter those
    incoming messages into a folder, which is how I know so precisely
    which messages are missing.  Today, I made a copy of the mbox and
    *.msf file for that folder, for debugging purposes.  I'll add them
    as attachments to this ticket.

Any suggestions?

Thanks!

--Fred Stluka
If you go into the config editor (tools, options, general, advanced, config editor button) and type dup_action, do any non-0 values come up?
(In reply to David :Bienvenu from comment #23)
> If you go into the config editor (tools, options, general, advanced, config
> editor button) and type dup_action, do any non-0 values come up?

No.  I see:

mail.server.default.dup_option    default    integer  0

--Fred Stluka
In the 12.01 folder you've attached, the first quarter or so of the messages are marked as deleted (the x-mozilla-status of 0008 or 0009 indicates deletion). Do you have filters or spam detection turned on?
(In reply to David :Bienvenu from comment #25)
> In the 12.01 folder you've attached, the first quarter or so of the messages
> are marked as deleted (the x-mozilla-status of 0008 or 0009 indicates
> deletion). 

Those messages were deleted manually by me.  Here's what you 
should see in the file, going by the timestamps in the subject
lines of the messages:

2012_05_22__16_04_01 through
2012_05_22__18_13_01
- Status 0009.  I confirmed that these messages had arrived and
   I manually deleted them, but the folder was not yet compacted.

2012_05_22__18_14_02
- Status 0001.  I read this, but did not delete it, leaving it as a 
   marker for which message to expect next.

2012_05_22__18_15_01 through
2012_05_22__22_44_02
- Status 0000.  As you can see from the "From - " line, these were 
   fetched at 22:44.  I had not yet read or deleted any of them.

2012_05_22__22_45_01
2012_05_23__00_46_01
- Status 0000.  As you can see from the "From - " line, these were 
   fetched the next morning (today) at 08:24.  This is the POP fetch 
   that failed.  It got this many messages (plus one more -- see below)
   and then lost the rest.

2012_05_23__00_47_02
- Status 0001.  This was the last message fetched by the failed fetch
   at 8:24.  I read it, so it is status 0001, not 0000.

2012_05_23__08_25_01 through
2012_05_23__08_32_01
- Status 0000.  These are the messages fetched 8 minutes later, at
   08:32.  They arrived fine, but the previously missing messages are 
   still missing.

Make sense?

> Do you have filters ... turned on?

I do have many filters enabled, moving incoming messages to various
folders, but these messages did not arrive in any folders.  I do NOT 
have any filters defined to delete messages.  I am attaching my 
msgFilterRules.dat file for you to see.  These filters all generally work 
reliably, except when this problem occurs.

When the problem occurs, the ongoing POP fetch stops putting 
messages properly into any folder, and seems to just discard all 
remaining messages from that fetch.  It's like a buffer overrun or 
some other timing or capacity issue that only happens when there 
are hundreds or thousands of messages being pulled from the POP 
server in one batch.  Partway through the fetch, the messages start 
getting discarded, and once that happens, all remaining messages
from that fetch are lost.

> Do you have ... spam detection turned on?

As far as I know, I do not.  How can I be absolutely sure?

Here's what I know:

- The lost messages are all at the POP server.  I can see them via 
   the Web mail UI to that POP server.  I can also fetch those same 
   messages to TB 8.0 on another Mac.  So, I know that they were 
   not filtered on their way to the POP server.  Any filtering would 
   have to be being done by TB during the POP fetch.
   
- The lost messages are from a variety of different senders, with 
   different contents, etc., and are intended to be filtered into a 
   variety of different folders in TB.  That generally works fine, except
   when this problem occurs and causes the entire tail end of a fetch 
   from POP to be lost.  It also works fine for the same messages on 
   TB 8.0 on my other Mac.  So, I'm pretty sure it's nothing that is 
   particularly spammish about these messages.  

During a POP fetch, all messages are coming in correctly, then 
something goes wrong and all further messages are silently 
discarded.  Doesn't feel like a spam filter problem.  However, if
you tell me how, I'd be glad to check for certain that TB is not 
configured to filter spam.

Have you tried to reproduce the problem?  If you set up a mail 
address somewhere, I'll be glad to automate something to send lots
of messages to that address, so you can test whether a POP fetch 
fails for you when fetching thousands of messages.

Thanks!
--Fred Stluka
I've lost more messages a couple times since this morning.  Seems to
be a pretty serious and repeatable problem with 12.0.1.

So, I switched back to 11.0 and deleted all the *.msf files again:
  - Close TBird
  - Delete Applications/TBird:
     % rm -Rfv /Applications/Thunderbird.app
  - Delete all *.msf files
     % cd ~/fred/Mozilla/TBird/Profile/mail
     % find . -name \*.msf -exec rmv "{}" ";"
  - Install version 11.0:
     % open ~/Setup/Mozilla/Thunderbird/Thunderbird11.0.dmg 
     - Dragged Thunderbird icon to Applications icon
     - Closed window
     - Ejected Thunderbird11.0.dmg 
  - Confirm version number
    - Yes.  Now 11.0
    - Messages all look intact.
    - Had to go to each folder to get the counts all recomputed.  Some of
      them were low, especially recently fetched ones like the PingIn folders.

I'm not sure what to do next.  I can't afford to keep silently losing e-mails
and it's a lot of work to keep checking from my other Mac and reconciling
things to see of anything is missing.  I now stuck on 11.0 until I hear that 
this problem is resolved.  Not good...

I'll let you know if it I notice more missing messages with 11.0.  Last time
I reverted to 11.0, I lost a batch, but after deleting the *.msf files I never 
lost any more for at least a week.

--Fred Stluka
The number of filters shouldn't matter; filters are applied synchronously to each message, i.e., we don't process the next batch of downloaded data until we finish the previous message.

When these messages are getting downloaded, are you doing anything in the UI or is the download "unattended"? Is it possible that you have any backup software or anti-virus/spam program that might be occasionally preventing Thunderbird from writing to the mail folder? I realize this is less likely on the mac...

Oh, one thing you can look at is the filter log (bring up filters ui, click on filter log, make sure logging is enabled, and then look at a log of what happens when this bug occurs, to see if there's any info about the missing messages in the filter log)
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to David :Bienvenu from comment #29)
> When these messages are getting downloaded, are you doing anything in the UI
> or is the download "unattended"?

Usually, I just sit back and watch the message counts for the folders to 
see if they stop increasing.  Hands off until the download finishes.
Sometimes I've started reading messages, but no failures have yet
occurred when I was doing so.

> Is it possible that you have any backup
> software or anti-virus/spam program that might be occasionally preventing
> Thunderbird from writing to the mail folder? I realize this is less likely
> on the mac...

No.  Backups are not running, and there is no virus scan.  There should be
no contention for the mail files.

> Oh, one thing you can look at is the filter log (bring up filters ui, click
> on filter log, make sure logging is enabled, and then look at a log of what
> happens when this bug occurs, to see if there's any info about the missing
> messages in the filter log)

OK.  I've enabled the filter logging.  I'll look there if it happens again.
I'm on version 11.0 now with newly generated *.msf files, so it may not
happen again.  Too bad the logging wasn't already on, eh?

--Fred Stluka
If you could try this with 12.0 and your chron job, it would be extremely helpful. I understand it's painful, but we haven't been able to reproduce it here...
I'm seeing this bug as well, on my Windows machine using 12.0.1.

I was away for a week, and when I opened up Thunderbird it downloaded 475 messages but only 6 appeared in my mail boxes!

I also have Thunderbird 12.0.1 on my MacBook Pro, and it is *not* having issues.  One difference is that I have many filter rules on my Windows machines, but all my messages (total 36000) on the Mac go into the Inbox folder.  Another is that I only have one account set up on the Mac, versus 6 on Windows.

I did a search for one of the missing messages' subjects in the mail folder, but didn't find it.

None of the missing messages show up in my filter log, either.  I did notice an odd thing in the filter log, however, for messages that *did* make it through -- they had a bad time and no description.  For example:

Applied filter "<filter_name>" to message from - at 12/31/1969 7:00:00 PM moved message id = to mailbox://<email_address>/<top_folder>/<sub_folder>

The popstate.dat file marks all the missing files as 'k' -- keep message on server.  If I delete the entry for a missing file, it will successfully download it later.

Andy
I'm not sure if it's relevant, but there's another difference between the Windows 7 x64 installation and the MacBook Pro:  The mail folder on Windows is 2.3 GB, and the one on Mac is only 1.03 GB.  Possibly there's an issue at 2GB?
Fred, how large is the inbox file on disk for you? Andy, from other reports, my suspicion is it's more likely to be the filters than the total mailbox size. Also, are all your messages on windows filtered into other folders?
On my windows machine, almost all messages are filtered into other folders.
Some are also marked as junk.
(In reply to Andy Hendrickson from comment #32)
> Applied filter "<filter_name>" to message from - at 12/31/1969 7:00:00 PM
> moved message id = to mailbox://<email_address>/<top_folder>/<sub_folder>

Bug 754301 comment #43 reprts similar funny filter log.
I have a snippet from a very verbose log showing retrieving of 3 messages, including the last one to succeed and the next two, which didn't.

I used

set NSPR_LOG_MODULES=all:5,timestamp:5

The one thing that stands out is that there are a bunch of reads and some writes after the email has been downloaded.  No idea why, though.

I'd be willing to mail the zipped file to someone, but it has some personal information that I'd rather not post publicly.

Andy
(In reply to Andy Hendrickson from comment #37)
> set NSPR_LOG_MODULES=all:5,timestamp:5

all:5 will produce too large log file with non-useful data, and timestamp:5 is incorrect.
First, get log with timestamp,POP3:5 for 3 mail download, with filter move and without filter move(ordinal/normal download).
Is Tb's funny activity seen in log? What is difference from normal download?
No longer blocks: 754301
Depends on: 754301
(In reply to David :Bienvenu from comment #34)
> Fred, how large is the inbox file on disk for you? Andy, from other reports,
> my suspicion is it's more likely to be the filters than the total mailbox
> size. Also, are all your messages on windows filtered into other folders?

David, my Inbox file (the mbox file, not the entire contents of the 
Inbox.sbd directory tree) on the Mac is 0 bytes.  All of my mail gets
filtered into other folders.

--Fred Stluka
The first email that fails to be processed seems to process database opens oddly.

Normally, I'm getting messages like:

2012-05-26 19:41:41.664000 UTC - 0[d0f140]: nsMsgDatabase::Open(E:\Mail\worjl7rs.slt\Mail\Mail\Purchases.sbd\Ads.msf, FALSE, 7d2ae40, TRUE)
2012-05-26 19:41:42.039000 UTC - 0[d0f140]: 10 open DB's

In the first failing email, I get:

2012-05-26 19:41:44.896000 UTC - 0[d0f140]: nsMsgDatabase::Open(E:\Mail\worjl7rs.slt\Mail\Mail\Inbox.msf, FALSE, 42b6040, FALSE)
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]:     ...returned after 1 milliseconds
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketTransport::OnSocketReady [this=ceb20c0 outFlags=1]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketInputStream::OnSocketReady [this=ceb2158 cond=0]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: STS dispatch [ced1988]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: send: fd=4124300 osfd=652 buf=5ec5d5c1 amount=1
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: send -> 1
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketInputStream::Read [this=ceb2158 count=31232]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]:   calling PR_Read [count=31232]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv: fd=d30b620 osfd=1280 buf=d3102a4 amount=5 flags=0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv -> 5, error = -5998, os error = 0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv: fd=d30b620 osfd=1280 buf=d474000 amount=1568 flags=0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv -> 1568, error = -5998, os error = 0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]:   PR_Read returned [n=1536]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketTransport::SendStatus [this=ceb20c0 status=804b0006]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketInputStream::Read [this=ceb2158 count=29696]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]:   calling PR_Read [count=29696]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv: fd=d30b620 osfd=1280 buf=d3102a4 amount=5 flags=0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv -> 5, error = -5998, os error = 0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv: fd=d30b620 osfd=1280 buf=d474000 amount=1296 flags=0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: recv -> 1296, error = -5998, os error = 0
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]:   PR_Read returned [n=1272]
2012-05-26 19:41:44.896000 UTC - 4072[d0f640]: nsSocketTransport::SendStatus [this=ceb20c0 status=804b0006]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: nsSocketInputStream::Read [this=ceb2158 count=28424]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   calling PR_Read [count=28424]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: recv: fd=d30b620 osfd=1280 buf=d3102a4 amount=5 flags=0
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: recv -> -1, error = -5998, os error = 10035
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   PR_Read returned [n=-1]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: nsSocketInputStream::AsyncWait [this=ceb2158]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: STS poll iter [1]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   active [0] { handler=ceb20c0 condition=0 pollflags=5 }
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   calling PR_Poll [active=1 idle=0]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: poll timeout: 100
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:     timeout = 100000 milliseconds
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:     ...returned after 0 milliseconds
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: recv: fd=41242a0 osfd=636 buf=449f874 amount=1024 flags=0
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: recv -> 1, error = -5998, os error = 0
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: STS poll iter [1]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   active [0] { handler=ceb20c0 condition=0 pollflags=5 }
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:   calling PR_Poll [active=1 idle=0]
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]: poll timeout: 100
2012-05-26 19:41:44.897000 UTC - 4072[d0f640]:     timeout = 100000 milliseconds
2012-05-26 19:41:44.898000 UTC - 0[d0f140]: 10 open DB's

(Oddly enough, in the one before that message, I don't have a call to Open)

Thereafter, I get messages like:

2012-05-26 19:41:45.418000 UTC - 0[d0f140]: nsMsgDatabase::Open(E:\Mail\worjl7rs.slt\Mail\Mail\Inbox.msf, FALSE, 42b6040, FALSE)
2012-05-26 19:41:45.418000 UTC - 0[d0f140]: error opening db 80550006
2012-05-26 19:41:45.418000 UTC - 0[d0f140]: 10 open DB's

although there are sometimes a few (up to 5) messages in between those statements with the "4072[d0f640]" signature.
Hi all,
I often recognized after some weeks of holidays, that it seemed, the 1st bunch of downloaded messages was not complete, as some minutes later (next timed download) unusual much new mails were downloaded additionally. I never checked there over-all completeness by comparing with other mail software or by help of the web portal, as I always trusted TB in that point. But sometimes I recognize emails that appear up to some weeks too late in mailing lists, where I'm subscribed. Since today I believed, that this was the mailing list servers fault.
Just for your note, as maybe related to this bug.
Andy, thx for the info. That error is basically that the db/.msf file does not exist, which would imply that somehow it got deleted in the middle of fetching e-mail. But even that error shouldn't prevent us from fetching e-mails and storing them in the mail folder. Under options/prefences, security, anti-virus, is the check box that says to quarantine incoming messages checked? Same question for Fred...
(In reply to David :Bienvenu from comment #42)
> Under options/prefences,
> security, anti-virus, is the check box that says to quarantine incoming
> messages checked? Same question for Fred...

It was checked.  I've now turned it off.  However, I have no anti-virus
software installed on my Mac, so it should not have mattered.

--Fred Stluka
Interesting!  On my Windows box (with the problems), I had the box checked, even though Avast was set to ignore email.  I unchecked it and told Avast to ignore the mail folder -- then I was able to download > 400 emails at once!

On my Mac, which didn't have this problem, the box was unchecked all along.

Andy
I wrote a python script to check what emails were missing that Thunderbird thought were downloaded.  It has a couple of limitations (it can't tell the difference between emails that were deleted & then compacted vs. never downloaded, and it depends on the UIDL being in a particular format that Verizon uses), but I was able to determine what emails I needed to re-download.  Using this, I was able to get all my missing emails.

I'll upload the script, in case someone else can use or modify it.

You can pass a parameter to the script which is the index of the first UIDL that you care about, so you can ignore missing files (due to compacting) that are found before that point.

At the very least, you'll need to change the mail folder path in the script.

I ran it in Windows, under Cygwin.

Andy
I used this script to generate a list of UIDLs for emails that were nominally downloaded, but were not present in Thunderbird's folders.

It does not distinguish emails that were downloaded, deleted and compacted away, versus ones that were never saved after the download.

It also depends on a specific format of the UIDLs, used by Verizon, which has a unique incrementing number followed by a dash and then another number.  The algorithm would need to be changed for other formats.

You can pass in a number, which tells the script to ignore missing emails with a UIDL first number less than the parameter.  That way you can ignore older emails (any previous to a compaction).

Once you have a list of UIDLs, you can delete the lines containing them from the popstate.dat file.  That will cause Thunderbird to re-download them.

To use it, you also need to set the path to your mail folder.  It currently runs under Cygwin on Windows, so you will need to alter the path as well for other platforms (Mac, Linux).
(In reply to Fred Stluka from comment #43)
> (In reply to David :Bienvenu from comment #42)

> It was checked.  I've now turned it off.  However, I have no anti-virus
> software installed on my Mac, so it should not have mattered.

It matters because Thunderbird handles downloading differently if you have it checked. We download to a temp file instead of directly to the inbox, which means there are several subtle differences in the code when that checkbox is checked.

However, re Andy's log info, we shouldn't have to re-open the Inbox db for every message downloaded; we shouldn't see any open's for the Inbox.msf file once the download has started. So I'm still investigating that. I'm not seeing it here.
Andy, do any of your filters filters messages to a different Inbox? I.e., is nsMsgDatabase::Open(E:\Mail\worjl7rs.slt\Mail\Mail\Inbox.msf, FALSE, 42b6040, FALSE) the inbox you're downloading to, or the target of a filter move for a different inbox?
David, I don't think any of my filters target my Inbox (I only have one).
Andy, oh, but you had Avast on. That complicates things a bit. I should say that our anti-virus quarantining checkbox isn't integrated with any anti-virus programs per se - it just tries to detect them messing with files we write, and give us a chance of them not deleting the whole inbox. I would be interesting in seeing the log file you mentioned earlier, actually. I'll keep any personal stuff confidential, of course.
David, OK, I'll send you the piece that covers the first failure, as well as one before and another after it.

I'd rather not post it here (publicly).  Where should I email it?
(In reply to Andy Hendrickson from comment #51)
> David, OK, I'll send you the piece that covers the first failure, as well as
> one before and another after it.
> 
> I'd rather not post it here (publicly).  Where should I email it?

dbienvenu@mozilla.com, thx.
Log snippet sent.
not blocking widespread development or testing, so not a blocker.
however, is it preventing you from using latest daily?


(In reply to Andy Hendrickson from comment #53)
> Log snippet sent.

bienvenu, any interesting results?
Severity: blocker → critical
Keywords: dataloss
I was able to prevent the bug from occurring by going to Tools -> Options -> Security -> Anti-Virus, and unchecking the box that allows AV clients to quarantine individual incoming messages.
I think this is a dup of bug 754301. The log showed us closing the db in the middle of download, iirc, and that's what the fix in bug 754301 corrects.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
(In reply to Wayne Mery (:wsmwk) from comment #54)
> not blocking widespread development or testing, so not a blocker.
> however, is it preventing you from using latest daily?

Yes, it has been blocking me from using the latest version of TB.
I noticed the bug in 12.0.1, and reverted to 11.0 to avoid it.  When
I later moved back to 12.0.1, the bug recurred, so I reverted again
to 11.0.

Is the proposed fix included in 13.0.1?

Thanks!
--Fred Stluka
(In reply to Andy Hendrickson from comment #55)
> I was able to prevent the bug from occurring by going to Tools -> Options ->
> Security -> Anti-Virus, and unchecking the box that allows AV clients to
> quarantine individual incoming messages.

I made this change also, but haven't yet gone back to 12.0.1 from 11.0,
so I don't know if it had any effect.

Is the bug now fixed, so that this setting doesn't matter any more?  Or 
should I still keep this box unchecked even for future versions of TB?

Thanks!
--Fred Stluka
(In reply to David :Bienvenu from comment #56)
> I think this is a dup of bug 754301. The log showed us closing the db in the
> middle of download, iirc, and that's what the fix in bug 754301 corrects.
> 
> *** This bug has been marked as a duplicate of bug 754301 ***

Sounds promising.  What version should I download to test it?

Thanks!
--Fred Stluka
13.0.1 is the shipping version with the fix.
No longer depends on: 754301
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: