Closed Bug 1742975 Opened 1 year ago Closed 8 months ago

Bug 1734847 (.msf corruption) NOT FIXED on beta. (NOT VERSION 91.x)

Categories

(MailNews Core :: Database, defect, P1)

Thunderbird 95
x86_64
Windows 8.1

Tracking

(thunderbird_esr91 unaffected, thunderbird99 wontfix, thunderbird100 affected, thunderbird102 fixed)

RESOLVED FIXED
103 Branch
Tracking Status
thunderbird_esr91 --- unaffected
thunderbird99 --- wontfix
thunderbird100 --- affected
thunderbird102 --- fixed

People

(Reporter: j.r.andresen, Assigned: benc)

References

()

Details

(Keywords: dataloss)

Attachments

(6 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0

Steps to reproduce:

Mail is corrupted. Multiple messages (now as many as 5) appear in a single entry. Things are getting worse instead of better.
TB is still crashing on a regular basis. Inbox repair has no effect on the issue(s).

Expected results:

It seems as though the SOM (start of message) and EOM (end of message) aren't being processed correctly.

Component: Untriaged → Database
OS: Unspecified → Windows 8.1
Product: Thunderbird → MailNews Core
Hardware: Unspecified → x86_64

Bug 1742049 comment 1 reports "You'll have to perform Repair Folder to get the .msf file corrected."

Is 95 still generating bad messages after doing that?

Flags: needinfo?(j.r.andresen)

95.0b4 still is.

Repair doesn't correct it.

Flags: needinfo?(j.r.andresen)

I think TCW is also seeing this

Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: dataloss

I am confirming it as well. It's not as bad as it used to be. Meaning, it's not doing it on EVERY message.

I was able to repro this way using my GMail account:

  1. Send yourself a test message with something in the Subject field and something in message body window. It should arrive in your Inbox intact and un-mangled. This will be our 1st test message
  2. Go to your Sent Mail folder, click on the sent Test message you just sent yourself and do an Edit As New Message action to it
  3. Re-Send the same test message once more to yourself so that now you have two of the same test messages in your Inbox
  4. Go to your Inbox now and view the new (2nd) test message you just re-sent. It should appear ok and un-mangled
  5. Go view the 1st test message you sent. It should appear mangled now

In my case, the original 1st test message is now mangled but the 2nd one is still ok

Severity: S2 → --
Status: NEW → UNCONFIRMED
Ever confirmed: false
Keywords: dataloss
Summary: Bug 1734847 NOT FIXED → Bug 1734847 (.msf corruption) NOT FIXED
Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: dataloss
See Also: → 1734847

J R, Aureliano,

does comment 4 match your steps to reproduce?

Flags: needinfo?(j.r.andresen)
Flags: needinfo?(euryalus.0)

Hi Wayne.
Comment 4 is not my STR; I don't have clear STR. This happens to me not in my Google accounts but in my Microsoft Outlook 365 account. I have noticed that it happens if many messages arrive at the same time in my IMAP Inbox while I am doing a search.
Anyway, "Repair Folder" does not fix the issue at all.

I have tried with a fresh new profile with TB95.0b4 and I had the same problem.
I have tried with a fresh new profile with TB91.3.2 and I didn't have any problems.

Flags: needinfo?(euryalus.0)

I can confirm I can reproduce corrupted results using the steps in comment 4.
I have 384 emails today with dozens containing the corruptions.

Flags: needinfo?(j.r.andresen)

(In reply to J R Andresen from comment #7)

I can confirm I can reproduce corrupted results using the steps in comment 4.
I have 384 emails today with dozens containing the corruptions.

Are the ones for which you see the corruption from the same sender? Or is it randomly spread out? Meaning, are you seeing any kind of pattern?

Odd as it sounds, my STR from comment 4 seemed to indicate (to me) that something to do with the same sender, same subject and same message body content might have been triggering this issue. But today I got a slew of messages with subject "[Bug 1714846] High CPU consumption when downloading multiple .mp4 files" for which I thought I would certainly see some corruption (same sender, same subject) in the descending/older batch of messages, but... nothing. Could the variable here be that the message body is different, so it doesn't trigger?

So what's the commonality here, and how were J R Andresen and I able to reproduce it so easily?

They are from many senders, with many subjects. Today's are from commercial advertisers. I can forward you a few if you like; let me know where.

(In reply to J R Andresen from comment #7)

I can confirm I can reproduce corrupted results using the steps in comment 4.
I have 384 emails today with dozens containing the corruptions.

Is this also what is causing your crashes?

Flags: needinfo?(j.r.andresen)

(In reply to J R Andresen from comment #9)

they are from many senders, many subjects. Todays are from commercial advertisers. I can forward you a few if you like. Let me know where?

No, no need to send. By chance are you seeing crashes like in bug 1742590?

What's the next possible item to focus on, or thing(s) we need more information about?

Sorry to use the word "mess", but that is pretty clearly what we have, judging by https://mzl.la/3xQU1Dl (and that is just the version 95 bug reports), even though there hasn't been much chatter lately in https://thunderbird.topicbox.com/groups/beta

  • bug 1734157 Thunderbird beta loses e-mails. Going back to regular doesn't recover lost e-mails
  • bug 1740319 Loading some messages from IMAP server cause TB 95.0b1 & 95.0b2 OOM, @ OutOfMemory crash
  • bug 1740486 Attempting a Repair Folder operation fails
  • bug 1740846 IMAP mails displaying other emails in message body
  • bug 1741004 After deleting message and compacting folder, restores duplicated mail
  • bug 1741874 Unified (virtual) Sent folder fails to search in GMail, it gets unchecked
  • bug 1742321 Crash [@ OutOfMemory | large ]
  • bug 1742590 If running a Repair Folder operation and attempting to shut down, crash @ mozilla::`anonymous namespace'::RunWatchdog / @ nsMsgAttachmentData::~nsMsgAttachmentData occurs, @ MimeInlineTextHTMLParsed_parse_line
  • bug 1742782 TB 95.0b4 64bit consumes very large amounts of RAM (up 8G and increasing)
  • bug 1742794 when compacting the INBOX of an IMAP-mailbox the nstmp-file will reach up to 550 GB
  • bug 1743388 Drag & Drop of collapsed email thread from Message List to a Folder not always transferring the entire thread of message when Keep Filter and Quick Filter are applied
  • bug 1744080 Getting daily notification of compacting folders
Flags: needinfo?(benc)
Priority: -- → P1

(In reply to Arthur K. [He/Him] from comment #11)

(In reply to J R Andresen from comment #9)

they are from many senders, many subjects. Todays are from commercial advertisers. I can forward you a few if you like. Let me know where?

No, no need to send. By chance are you seeing crashes like in bug 1742590?

My crashes occur when I simply select a message. They occur at random: at times I can repair the inbox and move on; other times it will keep crashing on the same message selection.

Flags: needinfo?(j.r.andresen)

(In reply to J R Andresen from comment #14)

(In reply to Arthur K. [He/Him] from comment #11)

(In reply to J R Andresen from comment #9)

they are from many senders, many subjects. Todays are from commercial advertisers. I can forward you a few if you like. Let me know where?

No, no need to send. By chance are you seeing crashes like in bug 1742590?

My crashes occur when I simply select a message. It's random occurrence, at times I can repair the inbox and move on. Other times it will continue to crash on the same message selection.

Hmm, that is different than mine for sure. Do you have a recent crash report from Help > More Troubleshooting Information > Crash Reports for the Last 3 Days that you can post a link to?

Attached file reports

Crash Reports for the Last 3 Days
Report ID Submitted
bp-1146f5bc-f927-483a-9222-9e6d71211204 3 hours ago
bp-a8abe09a-7606-465b-a7e0-0b24c1211203 3 hours ago
bp-a731d87b-fb29-464a-b14a-89c8d1211203 3 hours ago
bp-b8c03cea-e92a-4c8a-8e27-d44851211203 4 hours ago
bp-f86bbd72-cf84-418d-afdd-8044c1211203 12 hours ago
bp-ce0dc048-03aa-493a-85ba-8739e1211203 12 hours ago
bp-6bdc7433-4c50-4a66-b994-50dd41211203 12 hours ago
bp-c8c2ea07-584c-4ff6-931b-8f8001211203 13 hours ago
bp-c5678646-39d4-43c0-be08-a60cd1211203 23 hours ago
bp-6f1f32b9-e47d-417f-bdec-e269f1211203 23 hours ago
bp-c150786c-fb5d-4029-aac2-7269f1211203 24 hours ago
bp-5e4690b1-a6cb-4bee-aa7e-ec2231211203 24 hours ago
bp-e843a298-ce49-4414-bf0a-9de641211203 1 day ago
bp-47258a62-6d32-408d-9095-009361211202 1 day ago
bp-8fbf5f05-6a5d-4d62-89c9-db40e1211202 1 day ago
bp-2b00d1c4-8218-4966-80b3-718091211202 1 day ago
bp-76d2f650-f0c4-4c03-b0fd-947731211202 1 day ago
bp-0fce0cda-bf8b-423c-96e2-ee9eb1211202 1 day ago
bp-7f66279c-a07d-45da-a7e5-fb7141211202 1 day ago
bp-93153d71-39b3-4333-853b-20eb71211202 1 day ago
bp-957cfbc3-f4b6-46dc-85c8-6858c1211202 1 day ago
bp-090bdbc6-3657-40ef-a5f2-092041211202 1 day ago
bp-25f5686c-f839-4aeb-b21a-e7c261211202 2 days ago
bp-e23cf01b-aa79-4f39-98bc-60af51211201 2 days ago
bp-7ad7ba65-a8db-4f83-8624-21f8e1211201 2 days ago
bp-8a6595c7-e6c5-41fb-962f-66f9c1211201 2 days ago
bp-47099ee4-1bfd-48cb-9d07-339921211201 2 days ago
bp-33bd9ffc-c8d8-4be3-858c-a21e41211201 2 days ago
bp-3089301e-8b31-4e28-9cd0-083931211201 2 days ago
bp-c9b2dad8-a8e5-4bb3-a0fb-939611211201 2 days ago
bp-01ad2414-51b3-4ea1-b736-278c61211201 2 days ago
bp-d01df61e-74e6-44f5-a8d9-ab7531211201 2 days ago
bp-9f2e45d6-cf36-4f60-955c-348ba1211201 2 days ago
bp-6866a90a-d6b5-4a27-91c6-386281211201 2 days ago
bp-a91ef240-5d08-4391-81f2-33f9c1211201 2 days ago
bp-1cbdd833-13b9-4bd7-98f1-c76781211201 2 days ago
bp-200e6841-449c-4a8f-b6b7-0abd41211201 2 days ago
bp-e0c926fa-621f-466d-a0e8-59fe31211201 2 days ago
bp-bf874292-a3c6-4e79-82bf-e794c1211201 3 days ago
bp-9e03cef8-c3d8-4563-bcc7-5c1301211201 3 days ago
bp-6848512e-20f6-4c36-b5a6-3261c1211201 3 days ago
bp-03ff1ebf-d847-4c04-a97b-f22151211201 3 days ago
bp-a4ade261-a7c9-4df3-99c5-892881211201 3 days ago
bp-0006bc6d-8dc3-49a2-950d-ca1721211201 3 days ago
bp-f627a622-ab6b-48cf-a15e-4abcc1211201 3 days ago
bp-333ccbe7-58b7-4658-99b3-7f3fb1211201 3 days ago
bp-5f87d9f2-1801-4fd8-bf1b-c3b721211201 3 days ago
bp-a84ca050-7d8b-4656-984f-01a501211201 3 days ago

A random sampling of these seems to point to something different from what I am seeing, but most appear to be "OutOfMemory | large" or "OutOfMemory | small". Is this 32-bit TB on 32-bit Windows 7?

https://crash-stats.thunderbird.net/report/bp-a84ca050-7d8b-4656-984f-01a501211201 (@ nsBidiPresUtils::TraverseFrames)
https://crash-stats.thunderbird.net/report/bp-1146f5bc-f927-483a-9222-9e6d71211204 (@ OutOfMemory | large)
https://crash-stats.thunderbird.net/report/bp-5f87d9f2-1801-4fd8-bf1b-c3b721211201 (@ OutOfMemory | small)
https://crash-stats.thunderbird.net/report/bp-e0c926fa-621f-466d-a0e8-59fe31211201 (@ mozilla::ArenaAllocator<1024,4>::Allocate)
https://crash-stats.thunderbird.net/report/bp-090bdbc6-3657-40ef-a5f2-092041211202 (@ mozilla::dom::FontFaceSet::UpdateRules)
https://crash-stats.thunderbird.net/report/bp-93153d71-39b3-4333-853b-20eb71211202 (@ mozilla::ArenaAllocator<8192,8>::Allocate)

64-bit Windows 8.1, 12 GB memory
32-bit TB

Additional observation:

When TB is started in Windows, it's a 120 MB process as viewed in Task Manager.
I select an email message to view (one that is only a couple of KB on the server) and the TB process rapidly grows to over 2.5 GB, then crashes.
I can repeat this using the same message over and over.

(In reply to J R Andresen from comment #19)

64bit OS Windows 8.1 12GBmemory
32bit TB

Yeah, it's going to be OOM crash city with 32-bit TB, I would surmise. Any reason you haven't switched to 64-bit TB to make use of your system resources? It would probably help with the OOM crashes, but not with the actual bug, I'm afraid.

I select an email message to view(that is only a couple Kb on the server) and the TB process size rapidly grows to over 2.5Gb then crashes.

I usually see this behavior only when I click on an unread message and attempt to exit TB. I'm on 64-bit TB, but my process will grow past 10 GB and then crash.

JR,

I don't know if it's of any help to the devs here but do you know how to capture a perf profile?

Try this:

  1. CTRL-SHIFT-i to open Developer Tools. Accept the incoming Connection message when prompted
  2. in the Developer Tools window, press F1
  3. in the Default Developer Tools window, click the Performance check-box at top left. It will create a Performance tab at top middle of the screen
  4. Click on the Performance tab and you'll see a Start Recording Performance button. Don't click it yet
  5. Switch back to TB and click on the email that will start growing mem usage and cause TB to crash and then QUICKLY switch back and click the Start Recording Performance
  6. Record about 10 seconds and then stop the recording
  7. At top left, there will be a blue "Recording #1" box with a Save option next to it, click Save to save the recording to a JSON file
  8. Upload the JSON file to this bug report. You may have to .ZIP it as it might be large uncompressed

Note, even though I asked about the crashes, they will probably not be of great interest because they are just a symptom of the cause, which is (presumably) corrupted msf. I think the same could be said of performance info.

We need Ben or Magnus, or some other developer, to weigh in.

(In reply to Wayne Mery (:wsmwk) from comment #23)

Note, even though I asked about the crashes, they will probably not be of great interest because they are just a symptom of the cause, which is (presumably) corrupted msf. I think the same could be said of performance info.

We need Ben or Magnus, or some other developer, to weigh in.

I'm sure you're on point here. It's more just for validating what we probably already assume is the primary cause. More data probably isn't a bad thing, though; you never know when some other unrelated bug(s) might be uncovered as a result.

For what it's worth, I first saw this on October 27th. I keep trying to pull up the email where I first saw it happen to figure out what build I was running that day but it's futile. I'll keep trying.

Yes, it's entirely possible there is more than one bug here. I'm just suggesting we don't need to kill ourselves collecting data just yet.

I wasn't able to reproduce it using the steps in Comment 4 :-(

A few questions for Arthur:

  1. Did those steps work reliably, or did you have to perform them a bunch of times to get it to happen? (I tried a few times, but no dice).
  2. Just to confirm: I'm assuming it was via IMAP, right?
  3. Would you be able to send me the tail end of the mbox file for your inbox? Whatever you feel comfortable sending, but at least the last bit containing those two messages (i.e. including the one that appears corrupted)?

The mbox file will be somewhere like <your profile dir>/ImapMail/imap.gmail.com/INBOX. It's probably easiest if the corrupted messages were composed as text rather than html (easier to parse manually!), but anything is fine!

My current working theory is that there is some oddness with the "From " lines separating messages in the mbox file, so it'd be nice to confirm this and come up with a nice simple test case to fix.
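To make the "From " separator theory concrete, here is a minimal Python sketch (not Thunderbird code) that splits raw mbox bytes into (offset, size) spans, roughly the per-message bookkeeping an .msf index needs. It assumes the classic mbox convention: every message begins with a line starting with "From " (e.g. "From - Mon Jan 1 00:00:00 2021"), and any body line that would begin with "From " has been escaped as ">From ". The name `split_mbox` is illustrative only.

```python
import re
from typing import List, Tuple

# A message boundary is a line that starts with "From " (at the start of the
# data or right after a newline). Escaped body lines (">From ") do not match.
FROM_LINE = re.compile(rb"^From ", re.MULTILINE)


def split_mbox(data: bytes) -> List[Tuple[int, int]]:
    """Return an (offset, size) pair for each message in raw mbox bytes."""
    starts = [m.start() for m in FROM_LINE.finditer(data)]
    spans = []
    for i, start in enumerate(starts):
        # Each message runs until the next separator line, or to end of data.
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        spans.append((start, end - start))
    return spans


sample = (
    b"From - Mon Jan 1 00:00:00 2021\n"
    b"Subject: a\n"
    b"\n"
    b"body a\n"
    b"From - Mon Jan 1 00:00:01 2021\n"
    b"Subject: b\n"
    b"\n"
    b">From the body, escaped\n"
)
print(split_mbox(sample))  # [(0, 50), (50, 67)]
```

A missing or mangled separator would make two messages collapse into one span, and an unescaped "From " line in a body would split one message into two; either could plausibly show up as the kind of mixed-up entries described in this bug.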

Flags: needinfo?(benc) → needinfo?(thee.chicago.wolf)

Hello, I sent you 2 examples at this address (benc-at-thunderbird.net).
Regards
Nicolas

(In reply to Ben Campbell from comment #28)

I wasn't able to reproduce it using the steps in Comment 4 :-(

A few questions for Arthur:

  1. Did those steps work reliably, or did you have to perform them a bunch of times to get it to happen? (I tried a few times, but no dice).

For me they worked reliably.

  2. Just to confirm: I'm assuming it was via IMAP, right?

Yes, IMAP.

  3. Would you be able to send me the tail end of the mbox file for your inbox? Whatever you feel comfortable sending, but at least the last bit containing those two messages (i.e. including the one that appears corrupted)?

I just bumped up to the test build of 96.0b1. I'll see if it still repros there. If not, I'll revert back to 95.0b5 and try to repro.

The mbox file will be somewhere like <your profile dir>/ImapMail/imap.gmail.com/INBOX. It's probably easiest if the corrupted messages were composed as text rather than html (easier to parse manually!), but anything is fine!

My current working theory is that there is some oddness with the "From " lines separating messages in the mbox file, so it'd be nice to confirm this and come up with a nice simple test case to fix.

I'll try to see if I can sequester the repro email files to a dedicated mbox and then try and send that to you. My Inbox is north of 1.4GB and there's data in there I cannot send outside my org's walls.

Flags: needinfo?(thee.chicago.wolf)

(In reply to Ben Campbell from comment #28)

I wasn't able to reproduce it using the steps in Comment 4 :-(

And now I am not able to repro either with 95.0b5 or 96.0b1. That's frustrating. I assume sending you a mangled email won't do much good?

Hello,
I should mention that my data is on drive D:.
My mailbox gets corrupted every day. There seem to be memory leaks and/or heavy disk activity.
Drive D:\ does not appear to be faulty.
I can send my corrupted INBOX (with good examples) and my new INBOX directly to a developer (10 MB), not on the forum.
Nicolas

Sure, you can send it to me and Ben. (Please refer to bug 1742975 - this bug)

(In reply to Arthur K. [He/Him] from comment #31)

And now I am not able to repro either with 95.0b5 or 96.0b1. That's frustrating. I assume sending you a mangled email won't do much good?

Hmm... the symptoms in the original description match up so well with those in Bug 1734847. I thought the fix from that was included in 95.0b4, but comment 2 suggests it was still happening there :-(
I'm wondering if it was fixed in 95.0b4, but the effect was being masked by problems in folder repair? (Bug 1740486)

(In reply to Paour from comment #32)

I can send my destroyed INBOX (whith good examples) and my new INBOX directly to a developer (10Mo) (not on the forum)

Thanks Nicolas - I received your example files.

Which version of Thunderbird are you running? The mixed-up emails you're seeing do match Bug 1734847 symptoms.
I suspect that a repair folder would sort things out for you. You mentioned you weren't sure how to do that. Try this:

  • Right-click on Inbox and choose "Properties"
  • Click "Repair Folder"

(please excuse the English-centric instructions ;- )

(In reply to Ben Campbell from comment #34)

(In reply to Arthur K. [He/Him] from comment #31)

And now I am not able to repro either with 95.0b5 or 96.0b1. That's frustrating. I assume sending you a mangled email won't do much good?

Hmm... the symptoms in the original description match up so well with those in Bug 1734847. I thought the fix from that was included in 95.0b4, but comment 2 suggests it was still happening there :-(
I'm wondering if it was fixed in 95.0b4, but the effect was being masked by problems in folder repair? (Bug 1740486)

Both possibilities you mention could be true, but ruling out bug 1734847 depends on others who have updated to 95.0b4/b5 and then successfully run a folder repair. I would LOVE to hear other users' experiences with bug 1740486. I'm going to be a bit shocked if I am the only one having that problem.

I'm presently running a folder repair (that I'd intended to run last night) using 96.0b1 which I started at 9:11AM today. The repair/download operation is running far faster than with 95.0b5 and presently has about 12k of 49k+ left to process. After it's done, I hope to see a couple things: 1) no more mangled messages since bug 1734847 should be fixed and 2) no more CPU use after my currently running repair finishes.

If after finishing a repair operation the CPU use keeps going, I would also surmise that a perf profile against 96.0b1 won't do any more good than the ones I already submitted in bug 1740486?

Hello, I have a lot of "mixed" emails in my Gmail IMAP inbox every day; it started with TB 94b1.
I have tried to rebuild the inbox many times, but it didn't solve the issue.
Feel free to ask me for more information to help with this.

(In reply to Ben Campbell from comment #35)

Which version of Thunderbird are you running?
96.0b1 (the bug started with the update from 95.0b3 to 95.0b4; see my duplicate bug https://bugzilla.mozilla.org/show_bug.cgi?id=1743856)

You mentioned you weren't sure how to do that. Try this:
No, I did this many times without effect (see https://bugzilla.mozilla.org/show_bug.cgi?id=1743856#c2)

As my connection is IMAP, I also completely deleted INBOX and INBOX.msf.
The rebuild is OK at first, but after 10 minutes (while the subfolders update), INBOX becomes corrupted.
I should also mention that I have 5 mailboxes in Thunderbird and another one of them is corrupted.

Don't hesitate to ask for tests, I'm available until Christmas

Nicolas

I'm presently running a folder repair (that I'd intended to run last night) using 96.0b1 which I started at 9:11AM today. The repair/download operation is running far faster than with 95.0b5 and presently has about 12k of 49k+ left to process. After it's done, I hope to see a couple things: 1) no more mangled messages since bug 1734847 should be fixed and 2) no more CPU use after my currently running repair finishes.

Well, that was a bust. CPU is still churning away. Anything you'd like me to try?

(In reply to Fernando Hartmann from comment #37)

Hello, I have a lot of "mixed" emails in my Gmail IMAP inbox every day, it started with TB 94b1.
I tried to reconstruct inbox many times, but it don't solved the issue.
Feel free to ask me more informations to help on this.

I forgot to mention that I'm now running TB 96b1.
And starting with TB 95, I'm experiencing a lot of OOM crashes, mainly when accessing emails in these corrupted mailboxes. Some samples:

(In reply to Fernando Hartmann from comment #40)

(In reply to Fernando Hartmann from comment #37)

Hello, I have a lot of "mixed" emails in my Gmail IMAP inbox every day, it started with TB 94b1.
I tried to reconstruct inbox many times, but it don't solved the issue.
Feel free to ask me more informations to help on this.

I forgot to mention that I'm now running TB 96b1
And starting in TB 95, I'm experiencing a lot o OOM crashes mainly during accessing emails in this corrupted mail boxes, some samples:

I've seen this on my machine as well; it depends on the corrupted email, though. Once I saved one of them and it turned into a 3 GB .eml file, so I imagine that even a well-equipped, modern machine could still run out of memory trying to display an even more corrupted one.

Hello,
I suggest this workaround for end users who use an IMAP server:
1- Close Thunderbird.
2- Open the ImapMail folder in your profile:
  • remove all .msf files (e.g. imap.free.fr.msf, imap1.free.fr.msf)
  • for each subfolder, remove all files (i.e. the imap.free.fr and imap1.free.fr folders should be empty)
3- Start Thunderbird.
===========================================================
In my case, I suppose one of my mails in a subfolder (Sent, or any other folder) was corrupted.
By deleting the whole mailbox, I resolved the problem!
If the workaround doesn't work:
1- uninstall Thunderbird
2- export your calendar -> ics
3- save abook.sqlite (address book) from your profile
4- remove your profile
5- install Thunderbird
6- set up your profile and import your ICS calendar
7- close Thunderbird
8- replace abook.sqlite
That's all...
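Before deleting anything, it may help to see exactly which files the first part of this workaround touches. The Python sketch below is a dry run only: it lists, but never deletes, the top-level .msf files in ImapMail and every file inside the per-server subfolders, assuming the layout described above (e.g. ImapMail/imap.free.fr.msf next to an ImapMail/imap.free.fr/ folder). The function name `files_to_remove` is hypothetical, and the profile path has to be supplied by the user.

```python
from pathlib import Path
from typing import List


def files_to_remove(imapmail_dir: Path) -> List[Path]:
    """Dry run: list what the workaround would delete, without deleting anything."""
    targets: List[Path] = []
    for entry in sorted(imapmail_dir.iterdir()):
        if entry.is_file() and entry.suffix == ".msf":
            # Top-level index file such as imap.free.fr.msf.
            targets.append(entry)
        elif entry.is_dir():
            # Every file inside a per-server folder such as imap.free.fr/, at any depth.
            targets.extend(sorted(p for p in entry.rglob("*") if p.is_file()))
    return targets
```

For example, `files_to_remove(Path("<your profile dir>/ImapMail"))` returns the candidate paths for review; any other top-level files in ImapMail are left out of the list.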

I tried today with TB 97.0a1 (2021-12-22) (64-bit) and I still encounter the same problem reported in the related bug 1734847 (which is closed as verified...): mails are messed up because one mail contains the unrelated body of a different mail. Neither Repair Folder nor deleting the .msf files (as described in the previous comment) solves the issue.

Is anyone addressing the issue? it's been 4 months.

It is acknowledged as a bad problem, so we were just discussing this at today's community meeting. The challenge is that a) we don't know which code changed the behavior and b) a developer has not been able to reproduce the issue, both of which would obviously help lead to a solution.

Those who can reproduce can help:

If that route doesn't get progress, then perhaps Ben can provide a special build.

(In reply to Wayne Mery (:wsmwk) from comment #45)

It is acknowledged as a bad problem, so we were just discussing this at today's community meeting. The challenge is that a) we don't know which code changed the behavior ...

Is that so? Looking at BMO references, this bug is a continuation of bug 1734847 which was regressed by bug 1728924. In fact, a backout of the latter was attempted, see bug 1734847 comment #33. I have the impression that code that was removed in bug 1728924 (https://hg.mozilla.org/comm-central/rev/2c8857af0eb3) was in fact needed. What's wrong in this line of argument?

The emphasis here has been on MSF corruption, but if repair does not fix the issue, I wonder whether the problem lies in the actual storage. Initial MSF corruption should be fixed, since the repair generates a new MSF. I think someone needs to examine the actual mbox store behind the .msf to see whether corruption or duplication of the message leader ("From ") information is the issue.

Generally in support it is standard procedure to check if the issue can be replicated without antivirus scanning in the profile folder, or in the operating system's safe mode with networking. Corruption issues usually have an external cause. But that does not appear to have been investigated at all here.

Further checks include confirming that the storage location is a local internal drive and that the profile is in the default location. The local location must also not be subject to streaming backups, cloud synchronisation or network storage. As all of those things have led to issues in the past, perhaps we can get these things clarified here.

Additionally, has the account's local directory been changed from the default? There is no point having the profile stored locally if the directory points to the Documents folder (more common than it should be; the usual excuse is to facilitate backups) or to some cloud-synchronised location.

(In reply to newsfan from comment #46)

(In reply to Wayne Mery (:wsmwk) from comment #45)

It is acknowledged as a bad problem, so we were just discussing this at today's community meeting. The challenge is that a) we don't know which code changed the behavior ...

Is that so? Looking at BMO references, this bug is a continuation of bug 1734847 which was regressed by bug 1728924. In fact, a backout of the latter was attempted, see bug 1734847 comment #33. I have the impression that code that was removed in bug 1728924 (https://hg.mozilla.org/comm-central/rev/2c8857af0eb3) was in fact needed. What's wrong in this line of argument?

I'm not arguing against your point, but if you are correct I have these questions:

  • Which line(s) of that patch might be at fault?
  • AFAICT the problems didn't start until beta 95, but the patch shipped in beta 93. Why the gap in time between landing and reporting?

Given that it cannot be reproduced by a developer we need to be more creative and try something, i.e. anything, because this cannot continue. What is next to try, a try build with a backout to confirm that in fact this code block helps those who CAN reproduce the problem? If not, then what?

p.s. and once the problem is identified there is clearly a need for an automated test

Flags: needinfo?(benc)

AFAICT the problems didn't start until beta 95, but the patch shipped in beta 93.

The issue was first reported for TB 93 in bug 1730676 which was made a duplicate of bug 1734847. There is no gap in reporting. Strangely enough a developer reproduced the issue in bug 1734847 and the same person said: "I think I found a test case today" referring to this bug here:
https://thunderbird.topicbox.com/groups/developers/Tb67ca24581814a31-Mb72a225403851eda401d6e6d

Disclaimer: This is just assembling the published information, no own testing done. Maybe there are multiple issues. That mbox repair allegedly doesn't work any more is additionally worrying.

Hello,

  • AFAICT the problems didn't start until beta 95, but the patch shipped in beta 93. Why the gap in time between landing and reporting?

As I posted in comment 40, I started having these problems in 94b1.

Of course, as a non-developer my opinion may be naive, and I can imagine how difficult it is to narrow down what causes the problem, but I really can't understand why "Repair Folder" doesn't work!
At least in my case, even right after using Repair Folder, the messages are downloaded but the mix-up is still there, on the same messages that were mixed before!

Thanks for your time!

Generally in support it is standard procedure to check if the issue can be replicated without antivirus scanning in the profile folder, or in the operating system's safe mode with networking. Corruption issues usually have an external cause. But that does not appear to have been investigated at all here.

For my part, I can certify that I had no virus (or other external interference).
On the other hand, I have many subfolders and six e-mail accounts, and after the update in question (I applied all of them), I happened to stop the Thunderbird process because it was taking too long (I'm a bad, bad user!).
This is probably the cause of the first corruption: I think I then corrupted an email or a file, and only my reset procedure (https://bugzilla.mozilla.org/show_bug.cgi?id=1742975#c42) solved the problem.
I haven't had any problems since (and I no longer stop the Thunderbird process).
In my outside opinion, there are two approaches:
1- find the initial cause of the problem
2- eradicate the problem, even with a stable version, if an .msf file in a subfolder remains corrupted. Maybe offer an option to repair ALL .msf files.

So, I don't know if it's of any help but I just got a spam message today and have been noticing within that the header is not being split out of the message.

Months ago, when this issue first manifested, I feel like it was just pulling in the subject from the succeeding message in the Inbox. Today, I think I observed something different than before: I clicked on the spam message and saw that it is pulling in the header and subject from the >PRECEDING< message, which hadn't even been read yet.

That was a first. I used to have to read the unread messages from oldest to newest to see this. I am on 96.0b4 x64. I've attached the message here in case it's useful for analysis. It seems like it doesn't know where one message ends and the next one begins, and just rolls them all into one.

OK, there's lots of different things all going on at once, so it's time for a bit of a recap.

  1. The patch in Bug 1728924 landed. This made some changes to the code that copies messages from elsewhere to a local mail store (ie to a local folder). There's always been an assumption that all local messages are in mbox files, and all the code assumes it has direct access to the raw file (which is problematic for all kinds of reasons). This patch removed some file seeks to loosen this assumption.
    Unfortunately, it turns out that the message parser the copy code uses is reused without being reinitialised when multiple messages are being copied. The end result is that the .msf database records the wrong message offsets/sizes for subsequent messages.
    So, if multiple messages come in at once (from IMAP, say) and are moved to a local folder by a filter rule, the first one will be fine, but the others will appear screwed up (because of the borked offset/size in the .msf). The backing local mbox file should be OK, though. If those screwed-up messages are then copied to another local folder, the borked offset/size is used and the resultant mbox will contain screwed-up messages :-(

  2. The regression was tracked down and fixed in Bug 1734847. However, before this happened, a bunch of changes were made that rely on loosening the "everything is an mbox file" assumption that we're working toward. Changes like moving the protocol-independent message quarantining out of POP3 code (quarantining means that just single messages get embargoed by antivirus software, rather than the entire folder).
    These changes are what caused the attempted backout of the Bug 1728924 patch to fail.

  3. Because the originally-bad patch of Bug 1728924 was out in the tree for a while without the Bug 1734847 fix, a bunch of people ended up with scrambled messages. This should be fixable by "repair folder", but it looks like there are some issues there too (Bug 1740486). I'm not sure that is related - it doesn't happen to everyone, so it might just be that there was already a folder-repair bug for some messages, and the sudden rash of people doing folder-repair has brought more cases to light...
    Worth noting that folder-repair is a completely different operation for local folders than for IMAP. For local folders it just rebuilds the .msf file from the mbox file. For IMAP it re-downloads the messages.
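To make item 1 concrete, here is a toy Python model (not Thunderbird code) of a parser that is reused across a batch of copied messages without being reinitialised. The class and function names are hypothetical, and the exact way the recorded values go wrong in the real code is an assumption; the point is only that the first message is recorded correctly while later ones inherit stale parser state, even though the mbox bytes themselves are written correctly.

```python
class MessageParser:
    """Hypothetical toy stand-in for the message parser used by the copy code."""

    def __init__(self):
        self.reset(0)

    def reset(self, offset):
        # Where the current message starts in the mbox, and its size so far.
        self.offset = offset
        self.size = 0

    def feed(self, data):
        self.size += len(data)


def copy_batch(messages, reinit):
    """Append messages to an empty mbox and return the (offset, size) pairs
    that would be recorded in the .msf for each message."""
    parser = MessageParser()
    records = []
    pos = 0  # true write position in the mbox file
    for i, msg in enumerate(messages):
        if reinit or i == 0:
            # Correct behaviour: start fresh for every message.
            parser.reset(pos)
        parser.feed(msg)
        records.append((parser.offset, parser.size))
        pos += len(msg)  # the mbox itself still receives the right bytes
    return records
```

With reinit=False, three messages of 10, 20 and 5 bytes are recorded as (0, 10), (0, 30), (0, 35) instead of (0, 10), (10, 20), (30, 5): the mbox is fine, but only the first .msf entry is right.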

Phew.

Next steps:
I'm pretty confident that Bug 1734847 fixes the Bug 1728924 regression. My suspicion is that most of the problems people are having now are due to data that was borked before the fix went in, combined with folder repair not working as it should (very hypothetical example: maybe a badly-formatted message on an IMAP server throwing the folder repair into an endless loop).
So for now, unless we can nail down a reproducible case of new corruption in non-borked folders, I'm going to focus on Bug 1740486, and make sure folder repair is working properly.

[UPDATE: updated links to Bug 1734847, with the regression fix. They originally linked to this bug by mistake]

Flags: needinfo?(benc)

Ben, please edit the previous comment and use the correct bug numbers. Bug 1742975 is this very bug there.

I have a question about the .msf corruption. Does it matter where the mail store file and .msf file are located? I seem to have no problem if the files are in Local Folders, but I have the problem if the files are in my pop.att.yahoo.com directory (e.g., Inbox, Sent, Drafts). Thanks.

(In reply to Arthur K. [He/Him] from comment #21)

(In reply to J R Andresen from comment #19)

64bit OS Windows 8.1 12GBmemory
32bit TB

Yeah, it's going to be OOM crash city with 32-bit TB, I would surmise. Any reason you haven't switched to 64-bit TB to make use of your system resources? That would probably help with the OOM crashes, but not with the actual bug, I'm afraid.

This might be a good reason to look into finally moving Thunderbird to 64 bit when possible, see:
Bug 1556748

(In reply to Wayne Mery (:wsmwk) from comment #48)
...

Given that it cannot be reproduced by a developer we need to be more creative and try something, i.e. anything, because this cannot continue. What is next to try, a try build with a backout to confirm that in fact this code block helps those who CAN reproduce the problem? If not, then what?

See above.

This is how one of my emails looks in Thunderbird. There is nothing in the top part (sender, etc.):

left:10px; padding-right:10px">
=20
=20
<!--[if !((mso)|(IE))]><!-- -->
<div class=3D"hse-column-container" style=3D"min-width:280px; max-wid=
th:600px; width:100%; Margin-left:auto; Margin-right:auto; border-collapse:=
collapse; border-spacing:0; background-color:#FFFFFF; padding-top:15px" bgc=
olor=3D"#FFFFFF">
<!--<![endif]-->
=20
<!--[if (mso)|(IE)]>
<div class=3D"hse-column-container" style=3D"min-width:280px;max-widt=
h:600px;width:100%;Margin-left:auto;Margin-right:auto;border-collapse:colla=
pse;border-spacing:0;">
<table align=3D"center" style=3D"border-collapse:collapse;mso-table-l=
space:0pt;mso-table-rspace:0pt;width:600px;" cellpadding=3D"0" cellspacing=
=3D"0" role=3D"presentation" width=3D"600" bgcolor=3D"#FFFFFF">
<tr style=3D"background-color:#FFFFFF;">
<![endif]-->
<!--[if (mso)|(IE)]>
<td valign=3D"top" style=3D"width:600px;padding-top:15px;">
<![endif]-->
<!--[if gte mso 9]>
<table role=3D"presentation" width=3D"600" cellpadding=3D"0" cellspacing=
=3D"0" style=3D"border-collapse:collapse;mso-table-lspace:0pt;mso-table-rsp=
ace:0pt;width:600px">
<![endif]-->
<div id=3D"column_1592509568105_0" class=3D"hse-column hse-size-12">
<table role=3D"presentation" cellpadding=3D"0" cellspacing=3D"0" width=3D=
"100%" style=3D"border-spacing:0 !important; border-collapse:collapse; mso-=
table-lspace:0pt; mso-table-rspace:0pt"><tbody><tr><td class=3D"hs_padded" =
style=3D"border-collapse:collapse; mso-line-height-rule:exactly; font-famil=
y:Arial, sans-serif; font-size:14px; color:#635951; word-break:break-word; =
padding:10px 20px 15px"><div id=3D"hs_cos_wrapper_module_15925095220262" cl=
ass=3D"hs_cos_wrapper hs_cos_wrapper_widget hs_cos_wrapper_type_module" sty=
le=3D"color: inherit; font-size: inherit; line-height: inherit;" data-hs-co=
s-general-type=3D"widget" data-hs-cos-type=3D"module"><div id=3D"hs_cos_wra=
pper_module_15925095220262_" class=3D"hs_cos_wrapper hs_cos_wrapper_widget =
hs_cos_wrapper_type_rich_text" style=3D"color: inherit; font-size: inherit;=
line-height: inherit;" data-hs-cos-general-type=3D"widget" data-hs-cos-typ=
e=3D"rich_text"><p style=3D"mso-line-height-rule:exactly; font-size:14px; l=
ine-height:175%; font-weight:bold"><span style=3D"color: #000000;">

It also has a pop-up message. Will attach.

Summary: Bug 1734847 (.msf corruption) NOT FIXED → Bug 1734847 (.msf corruption) NOT FIXED on beta. (NOT VERSION 91.x)
Blocks: 1740319
See Also: → 1734157

I managed to circumvent this bug by unticking "Select this folder for offline use" for all my connected accounts' inboxes and then repairing the .msf index for all of them.

(In reply to thepcmaniaccc from comment #62)

I managed to circumvent this bug by unticking "Select this folder for offline use" for all my connected accounts inboxes and then repairing the msf index for all of them.

The same steps worked for me

See Also: → 1760931, 1759902, 1761549

Can confirm this is still happening in Thunderbird 100.0b1

Not sure if related, but I did a File:Compact Folders on my Inbox. It turned up one message (possibly more than one) that came in last week but was now showing as today at 2:03, which was the time I ran the compact. I was actually looking for that particular message earlier today, and it either wasn't showing at all or was buried somewhere. Other messages are still coming out scrambled. Thunderbird 91.8.1 (64-bit), updated earlier today.

Hello, I am also getting the multi-load emails. I also wanted to point this out, as I believe it is related. I have always moved emails off my Exchange server to local folders. In the past the emails would remain the same size as they were on the Exchange server. Lately, the emails have increased dramatically in size. For example, on the Exchange server an email might be between 10 KB and 100 KB, but when moved to a local folder it grows to over 500 MB. This can happen to any email, even one with only a few words in it.

Happy to provide any other information that may help.

Thank you very much.

(In reply to MRGSER from comment #66)

Hello, I am also getting the multi load emails. I did also want to point this out as I believe it is related. I have always moved emails off my exchange server to local folders. In the past the emails would remain the same size as they were on the exchange server. As of late the emails have increased dramatically in size. Example being on the exchange server the email might be between 10 KB and 100 KB. Then when moved to a local folder the email will increase above 500 MB. This can be any email that may only have a few words in it.

Happy to provide any other information that may help.

Thank you very much.

And you're using TB 91.8.1? 32-bit? 64-bit?

(In reply to Arthur K. [He/Him] from comment #67)

And you're using TB 91.8.1? 32-bit? 64-bit?

Hey Arthur, my apologies here as I should have stated the version. I have used the beta version of TB for many years now. Everything had always been perfect up to the last few months. My current version is 100.0b2 (64-bit). I do see b3 is available so I am going to change to that now.

Let me know if any other info would help.

Thank you once again.

(In reply to Worcester12345 from comment #65)

Not sure if related, but I did a File:Compact Folders on my Inbox. It came up with one message (could have been more than one) that came in last week, but now was showing today at 2:03, which was the time I ran compact. I was actually looking for that particular message earlier today, and it wasn't showing at all, or was buried somewhere. Other messages are still coming out scrambled.
Thunderbird 91.8.1 (64-bit), updated earlier today.

I thought the scrambled or joined-together emails were a post-91-beta-specific issue. I don't know your version history, but if you were running a post-91 beta and then "downgraded" back to 91.8.1, you may need to repair the Inbox or other problem folders or, if that doesn't work, remove the mbox and mbox.msf files for the problem folder(s) and let TB re-download and rebuild them.

(In reply to gene smith from comment #69)

I thought the scrambled or multiple emails joined together was a post-91-beta specific issue. I don't know your version usage history, but If you were running a post-91 beta and then "downgraded" back to 91.8.1 you may need to repair the inbox or other problem folders or, if that doesn't work, remove the mbox and mbox.msf files for the problem folder(s) and let tb re-download and rebuild them.

Hey Gene, my apologies; I should have mentioned the version I am on. Currently I am using 100.0b3 (64-bit). I have not downgraded yet, as I keep thinking one of the updates will correct the problem. However, I do not want it to get too out of hand, and sadly may be forced to downgrade.

(In reply to MRGSER from comment #70)

Hey Gene, my apologies as I should have mentioned the version I am on. Currently I am using 100.b3 (64-bit). I have not downgraded yet as I keep thinking one of the updates will correct the problem. However I do not want it to get to out of hand and sadly may be forced to downgrade.

Please see bug 1740486 comment 19

It's the only thing that fixed this for me. I too am on 100.0 b3 but only after starting over circa 98.0 b2. Been running like a boss ever since and the issue has not returned. Up to you if you want to rip off the bandage or not.

Same here I guess. I am on a "post-91" version of Thunderbird, and also a 91 version, on two different computers. I may have gotten them mixed up. Thanks for pointing this out.

See Also: → 1746632

It still happens in TB 102.0b2 (64-bit, Windows 11), but repairing the folder fixes the issue now.
It worries me that a novice user might think TB is unusable by updating from 91. It should probably be treated as a UX-level blocking bug, considering most users probably don't even know they would have to repair the folder to restore the correct behavior.

I've had this happen to me in 102b3 on Windows 10. I've isolated two broken mails and their .msf file. As far as I can tell, removing all the unaffected mails from the folder and compacting it does not resolve the bug.

Some bits of info:

I'm not sure this is .msf file corruption. There now appear to be two copies of the TARGET mail in the mbox file. In Thunderbird they each show a different subject line, correspondents column, date, etc., but both load the same data, which is duplicated in the mbox. The data from the original mail seems to be gone (locally; this bug does not affect the server).

I can provide the .msf file and mbox file of the folder to someone working on this bug, which have been compacted down to only the two mails affected. But I don't want to upload them to Bugzilla as there's still some PII in there.

I don't know what is triggering this. I repaired my folder for the 2nd time and will see if it happens again.

On further investigation, while the message is duplicated in the mbox file, it's only partially duplicated and the 2nd message seems to start in the middle of the previous one's data with a busted boundary. The original message content does seem to be gone, but that could be a coincidence.

This is a folder that I (manually, without server side filters) move mails into more regularly than any other, so it's possible that something is breaking during the message move process.

Quick recap of my understanding of this issue:

I introduced a bug which caused the wrong message sizes to be recorded in the .msf when you copied multiple messages to a local folder. Copying single messages always worked fine.
It's possible that offline stores for offline-backed IMAP folders were also affected, but I'm not sure (that uses totally different code paths from local folders).

This bug made it through to a beta version and the result was messages being displayed wrong (multiple messages mashed together etc), as per Bug 1734847. That bug landed a fix for the issue.

But it left people with corrupted folders - the mbox file was OK, but the .msf was screwed up.

After the fix was landed, repairing the folder should sort things out - the repair ignores the .msf file and just reparses the mbox (or re-downloads the messages for an offline-backed IMAP folder). The new .msf file should be all correct.
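A rough sketch of what "just reparses the mbox" means for a local folder: scan for lines that begin with "From " and treat each one as the start of a message, ignoring the .msf entirely. This is a simplified Python illustration with a hypothetical function name, not the actual folder-repair code; it assumes body lines beginning with "From " have already been escaped (e.g. as ">From "), as mbox writers normally do.

```python
def reparse_mbox(data: bytes):
    """Rebuild (offset, size) records from raw mbox bytes.

    A message starts at any line beginning with b"From "; a "From " in the
    middle of a line does not count as a separator.
    """
    starts = []
    pos = 0
    while pos < len(data):
        if data.startswith(b"From ", pos):
            starts.append(pos)
        nl = data.find(b"\n", pos)
        if nl == -1:
            break  # last line has no trailing newline
        pos = nl + 1
    records = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        records.append((start, end - start))
    return records
```

Because the scan never looks at the old .msf, a repair like this produces correct records as long as the mbox itself is intact.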

BUT.
While the .msf file was wrong, compacting the folder would likely screw up the mbox file too, as it would be running from the (corrupt) message boundaries in the .msf. And that's bad - actual data loss.

So, if the mbox file itself is borked (i.e. it was compacted while the .msf was wrong), then the corrupted messages in there are likely unrecoverable. Folder repair should pick up the uncorrupted messages OK.
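The compaction problem can be illustrated with a small sketch (hypothetical, not the real compaction code): compaction copies each kept message using the offset/size recorded in the .msf, so if those records are wrong, the wrong bytes are written into the new mbox and the original data is gone from that file.

```python
def compact(data: bytes, records, keep):
    """Write a new mbox containing only the kept messages.

    records: list of (offset, size) pairs as stored in the .msf.
    keep: set of record indices to retain.
    Nothing here re-checks the bytes, so wrong records are baked in.
    """
    out = bytearray()
    new_records = []
    for i, (off, size) in enumerate(records):
        if i in keep:
            new_records.append((len(out), size))
            out += data[off:off + size]
    return bytes(out), new_records
```

With correct records the output equals the input; with a stale record such as (0, 30) for the second message, the compacted file holds the first message plus a stray "From " and the second message's body is lost.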

My theory at the moment is that the problems people are still having are a result of having run the bad beta version and ending up with a corrupt mbox file (i.e. due to compact being run while the .msf was wrong). For local folders there's not much we can do to remedy it :-( For IMAP folders, a folder repair should work - essentially throwing away the mbox file and re-downloading all the messages.

So, after my summary in Comment 78, what I want to find out is:

Are there still cases of messages being munged together with known-good .msf and mbox files?

Andrei: your comment 76 and comment 77 kind of hint that there might be a problem still somewhere. Seems like the best bet I've seen so far for isolating a case...
I'd love to take a look at the mbox/.msf files in which you isolated the offending messages - please drop me an email.
But I don't think I'll be able to tell how they got screwed up, only that they are screwed up.
I think a proper diagnosis will require going back to the original data served up by the IMAP server and building a repeatable case locally (and a unit test to go with it!)...

Assignee: nobody → benc

(In reply to Ben Campbell from comment #78)

Quick recap of my understanding of this issue:
My theory at the moment is that the problems people are still having are a result of having run the bad beta version and ending up with a corrupt mbox file (i.e. due to compact being run while the .msf was wrong). For local folders there's not much we can do to remedy it :-( For IMAP folders, a folder repair should work - essentially throwing away the mbox file and re-downloading all the messages.

The profile that experienced this problem never ran Beta 94, or any beta until 102. I had only used it with 91 (and 78 etc. previously).

(In reply to Ben Campbell from comment #79)

I'd love to take a look at the mbox/.msf files in which you isolated the offending messages - please drop me an email.
But I don't think I'll be able to tell how they got screwed up, only that they are screwed up.
I think a proper diagnosis will require going back to the original data served up by the IMAP server and built a repeatable case locally (and a unit test to go with it!)...

I'll do that. I'm not sure if building such a test case is currently possible because I certainly can't reproduce the problem on demand.

In my scenario I have created a new fresh profile for TB 102.0b2 (64-bit, Windows 11), and I have experienced the same issue.

MRGSER and others who have not yet commented about version 102, can you produce your problem when using a new profile, like comment 81?

Flags: needinfo?(MRGSER)

Ben, do you think it would be possible to instrument a build with any sort of debug output that would help with tracking it down, assuming we can provide that build to people who are experiencing the issue?

It seems pretty clear to me that this is an active issue and not merely bad data from before, so we need to treat it as such.

Flags: needinfo?(benc)

(In reply to Andrei Hajdukewycz [:sancus] from comment #83)

Ben, do you think it would be possible to instrument a build with any sort of debug output that would help with tracking it down, assuming we can provide that build to people who are experiencing the issue?

It seems pretty clear to me that this is an active issue and not merely bad data from before, so we need to treat it as such.

At this late stage in the game, you (devs) should probably consider asking some poor soul if they'd be up for a Windows Remote Assistance / Easy Connect session and see if you can work on a truly affected machine in real-time otherwise you'll be spinning your wheels forever on what the cause of this issue is.

And again, please see bug 1740486 comment 19 for my "fix" as it has not come back to bite me since I did all that is mentioned therein.

Can confirm, using a new profile and re-creating the accounts fixes the problem - Offline enabled folders are no longer getting corrupted.

(In reply to Arthur K. [He/Him] from comment #84)

And again, please see bug 1740486 comment 19 for my "fix" as it has not come back to bite me since I did all that is mentioned therein.

The big concern is whether or not corruption is happening on profiles that were not exposed to the previously broken version 94. If that workaround works for you, great! But that doesn't help us ensure that this won't happen to 100s of thousands of new users who never used beta.

We have at least two reports of it happening on new profiles or profiles that were only on 91 previously.

(In reply to Andrei Hajdukewycz [:sancus] from comment #86)

The big concern is whether or not corruption is happening on profiles that were not exposed to the previously broken version 94. If that workaround works for you, great! But that doesn't help us ensure that this won't happen to 100s of thousands of new users who never used beta.

We have at least two reports of it happening on new profiles or profiles that were only on 91 previously.

Wayne and I share the same concern. I, as a user and not a Dev, know how to fix it myself but I get it, it's not remotely a panacea. I too anticipate this will bite some users once they flip the switch to bump from 91.x to 102. In their wisdom, I know Mozilla will still issue a few more 91.x releases once they have a clearer picture of what happens to those few who either manually update from 91 to 102 or get auto-updated to 102. For 8 or more months we've been going in circles. Could it be from that 94 beta? Most likely but good luck figuring out exactly which change caused it. A few have tried and come up empty including me. That a couple folks using 91 hit the issue doesn't mean they didn't try a beta and then ran back to ESR with an --allow-downgrade. Who knows.

Since no one can reliably repro the thing that causes this mail corruption, it's anybody's guess what'll happen. Expect the worst and hope for the best. I am confident that since going with the nuclear option I haven't seen it come back. I am cautiously optimistic that whatever it was in 94 got smoothed out and most 91.x users will have a good transition. We'll be here to help out those who get caught in the mire.

(In reply to Andrei Hajdukewycz [:sancus] from comment #80)

(In reply to Ben Campbell from comment #79)

I'd love to take a look at the mbox/.msf files in which you isolated the offending messages - please drop me an email.
I'll do that. I'm not sure if building such a test case is currently possible because I certainly can't reproduce the problem on demand.

Thanks - I've had a poke about at them. A few clues, but no smoking gun:

  • The mbox file is corrupted - it's the same message twice, with the first copy truncated.
  • The second message "From " separator appears midway through a line, which is invisible to any mbox parser ("From " has to start at the beginning of a line). So there will appear to only be one message in the mbox (with badly-formed MIME parts because of the corruption!).
  • the .msf file contains both messages.
  • The per-message messageOffset and offlineMessageSize values in the .msf correctly match where the messages start and end in the corrupt mbox file.
  • However, for the first message, the .messageSize value is larger than the .offlineMessageSize value, which should never happen (for IMAP folders the first is the size of the raw message on the server, the second is the size of the local copy, which might have extra X-Mozilla-... headers added).
  • It's an IMAP offline folder, right? So that's where we should focus, rather than local folders.
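The second and fourth points suggest a simple consistency check that could be run over a folder: every message offset recorded in the .msf should land on a "From " separator at the very start of a line. A minimal Python sketch, with a hypothetical function name, that flags offsets like the mid-line one in this case:

```python
def separator_problems(data: bytes, offsets):
    """Return the recorded offsets that do not point at an mbox "From "
    separator sitting at the beginning of a line."""
    bad = []
    for off in offsets:
        at_from = data.startswith(b"From ", off)
        # Offset 0 is a line start; otherwise the previous byte must be "\n".
        at_line_start = off == 0 or data[off - 1:off] == b"\n"
        if not (at_from and at_line_start):
            bad.append(off)
    return bad
```

For a truncated first copy followed by a full second copy, the second offset does point at "From ", but not at a line start, so it is reported.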

Pure speculation: it's as if it tried to write the first message to the mbox, but failed partway through and just continued as if nothing had happened, writing the truncated .offlineMessageSize into the database.
I've been doing a lot of work to refactor mbox reading/writing code, and there is some very shonky error handling in there (which I'm trying to tighten up as I go).

(In reply to Andrei Hajdukewycz [:sancus] from comment #83)

Ben, do you think it would be possible to instrument a build with any sort of debug output that would help with tracking it down, assuming we can provide that build to people who are experiencing the issue?

Going by the case you sent me, I can probably add a check in to warn if .offlineMessageSize is ever smaller than .messageSize when copying an IMAP message. That'll throw up a flag that something bad has happened. Doesn't directly identify what's actually causing it... but maybe it'll throw out some more leads.
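A minimal sketch of the kind of check described above, in Python for illustration only. The field names mirror the .msf attributes .messageSize and .offlineMessageSize, but the dataclass and function are hypothetical; a real check would live in the C++ copy code and log a warning rather than return strings.

```python
from dataclasses import dataclass


@dataclass
class HdrSizes:
    # Hypothetical stand-in for two attributes stored in the .msf.
    message_size: int          # size of the raw message on the IMAP server
    offline_message_size: int  # size of the local copy in the offline store


def check_offline_copy(hdr: HdrSizes) -> list:
    """Return warnings for a copied message; an empty list means it looks sane.

    The local copy may gain X-Mozilla-... headers, so it can be larger than
    the server message, but it should never be smaller.
    """
    warnings = []
    if hdr.offline_message_size < hdr.message_size:
        warnings.append(
            f"offline copy truncated? offlineMessageSize="
            f"{hdr.offline_message_size} < messageSize={hdr.message_size}"
        )
    return warnings
```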

It seems pretty clear to me that this is an active issue and not merely bad data from before, so we need to treat it as such.

It does look that way :-(

(In reply to Ben Campbell from comment #88)

  • The per-message messageOffset and offlineMessageSize values in the .msf correctly match where the messages start and end in the corrupt mbox file.

Probably a stupid question, but were either of these messageOffset and offlineMessageSize functions / things messed with in the 93/94 betas?

(In reply to Arthur K. [He/Him] from comment #89)

Probably a stupid question, but were either of these messageOffset and offlineMessageSize functions / things messed with in the 93/94 betas?

Not specifically, but I have been painstakingly trying to refactor the local/offline message reading/writing to make it more robust and tractable (it's currently insanely complex, brittle and error-prone). So it's certainly possible that something has changed which has unintended consequences.
Out of interest, .messageOffset and .offlineMessageSize are attributes on the message header, as held in the folder database (.msf file). Their meaning was always a little ill-defined, but I did some archeology in Bug 1764857 and wrote up my findings there.

(In reply to Ben Campbell from comment #88)

  • It's an IMAP offline folder, right? So that's where we should focus, rather than local folders.

Correct, all my folders are IMAP offline folders. That's also the default for IMAP folders, I believe.

Going by the case you sent me, I can probably add a check in to warn if .offlineMessageSize is ever smaller than .messageSize when copying an IMAP message. That'll throw up a flag that something bad has happened. Doesn't directly identify what's actually causing it... but maybe it'll throw out some more leads.

The reason I think this might be valuable is if we have a build that can even detect the issue at all, we can have someone, or many someones, move messages around folders in every way they can think of in every context they can think of until we turn up something that triggers it. Without a distinct error condition we rely on people visually seeing the corruption and reporting it, which is obviously much more error prone and time consuming, especially if large folders are needed to trigger this. It's entirely possible there are people with corruption they don't even realize is there.

I also discovered Bug 1773605 in the process of testing for this bug. It may be totally unrelated, but a data moving process that is extremely slow when it wasn't in 91 makes me pretty nervous in terms of added failure points.

From the two examples I've seen, it looks like x bytes of message X should be written followed by y bytes of message Y, but actually x bytes of Y and y bytes of Y were written. That suggests to me that two things are happening at once and getting confused. Could we somehow have one nsImapMailCopyState doing two jobs?

Sneaky edit: what I'm trying to say is the size of the first message appears to be correct, but the contents of it are the contents of the second message.
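A toy model of that hypothesis, purely illustrative: CopyState is a hypothetical stand-in for something playing the role of nsImapMailCopyState, not the real class. If two copies share one state object and the second copy starts before the first is written out, the first write uses the right size but the second message's bytes, which matches "x bytes of Y and y bytes of Y".

```python
class CopyState:
    """Hypothetical per-copy state: the source bytes currently being copied."""

    def __init__(self):
        self.current = b""


def run_copies(msg_x, msg_y, shared_state):
    """Copy X then Y into an mbox, with the two copies overlapping in time."""
    state_x = CopyState()
    state_y = state_x if shared_state else CopyState()
    # Both copies are started before either one has been written out.
    state_x.current = msg_x
    state_y.current = msg_y  # with a shared state, this clobbers X's bytes
    out = bytearray()
    # Each job writes its own (correct) size, but reads bytes from its state.
    out += state_x.current[:len(msg_x)]
    out += state_y.current[:len(msg_y)]
    return bytes(out)
```

With separate states the output is X followed by Y; with a shared state it is a truncated copy of Y followed by a full copy of Y, similar to the corrupt mbox described earlier.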

See Also: → 1773605

For those who can get it to eventually reproduce, does setting mailnews.downloadToTempFile true make a difference?

See Also: → 1741517

(In reply to Magnus Melin [:mkmelin] from comment #93)

For those who can get it to eventually reproduce, does setting mailnews.downloadToTempFile true make a difference?

Nope. I have had the "Allows antivirus to quarantine single incoming message" setting enabled for years, and it was already on when I started experiencing this issue with TB 94+.
After encountering the same problem for the umpteenth time with TB 102 and a completely new profile, I fixed the error by repairing the folder, and I also disabled synchronization of the folder itself (as I read in this ticket). It has now been 3 days without the problem. However, since I need to search message bodies, and without synchronization this search does not work for my IMAP account, I will re-enable synchronization on the folder and report any future problems. In my experience, the problem occurs almost systematically in the following way (at least before TB 102):

  • I create a new profile;
  • I define the IMAP account of interest (with about 140k mails on the server);
  • I set the synchronization of the Inbox folder to true;
  • while TB is downloading the emails and indexing them, I search to find an email from 2+ years ago; at this point I click on the email found and TB takes a long time to view it: when it appears, I find the email messed up.

As noted in comment 45, for anyone who can reproduce relatively quickly, a regression range would be great.
https://mozilla.github.io/mozregression/

I tried a similar test to [:Aureliano Buendía]'s in comment 94, but I only have about 80k messages in the folder.

The gVim editor crashed X when I opened the almost 1 GB mbox file while the 80K messages were downloading (about 25% done). After rebooting and restarting TB, I see a bunch of these:

[Parent 2638, Main Thread] ###!!! ASSERTION: morkBool_kFalse: 'Error', file /home/gene/mozilla/comm/mailnews/db/mork/morkConfig.cpp:20

followed by a bunch of these:

[Parent 2638, Main Thread] WARNING: Missing .messageOffset (key=28116, storeToken='12345678'): file /home/gene/mozilla/comm/mailnews/db/msgdb/src/nsMsgHdr.cpp:438

Otherwise, after restart and continuing the download, I see no problems after searching for older messages.

I'm wondering if [:Aureliano Buendía] is seeing the same corruption as shown in first attachment here: bug 1759902. The header for an adjacent message appears after the expected message. If so, I also wonder if something in the displayed header is actually findable via search, like "X-Mozilla" or "From -" or "Message-id". I don't find via search or see any of these while downloading or after it finishes.

Info: after finishing the download, the mbox file is 5.1G and the msf file is 31.1M.

This patch adds some logging to nsImapMailFolder::CopyMessagesOffline():

$ export MOZ_LOG="BORK:5"

This code path is triggered when you manually move or copy messages to
offline-backed IMAP folders (I was just testing it with a subfolder of
INBOX on the same IMAP server).

I thought this was the most likely place to find the error.
However, frustratingly this code path doesn't seem to be used if the
copy/move is due to a filter (or even for the initial save-an-offline-copy
syncing). Sighs.
So I don't think this will catch the problem. My reading of the bug reports
suggests the bug probably occurs for filter-initiated move/copy (and maybe
even the initial offline sync), so I need to track down when that happens in
the code and instrument that too.
But still, this one is worth trying, just in case a manual move does trigger it!

Flags: needinfo?(benc)

Aureliano,

while TB is downloading the emails and indexing them, I search to find an email from 2+ years ago; at this point I click on the email found and TB takes a long time to view it: when it appears, I find the email messed up.

Not sure it matters, but exactly what type of "search" are you doing? Are you searching for "subject" or something in "body"?
How soon after starting the download do you do the search? I've seen that if the message is not yet downloaded to the synchronized folder, the search will turn up empty.

However, since I need to search the body of the email and without synchronization this search does not work for my imap account, I will reactivate the synchronization on the folder and report any future problems.

Note: I can search and find plain text items OK in the message body for not-synchronized folders. It uses the imap SEARCH command which I think is supported by all RFC 3501 compliant servers. Exactly what type of imap server are you using?

Anyhow, I'm still unable to duplicate the problem by starting a new profile and letting it fully download all 80K messages. Tried it several times. I'm doing this on my local dovecot server.

(In reply to [:Aureliano Buendía] from comment #6)

Hi Wayne.
Comment 4 is not my STRs. I don't have a clear STRs. This happens to me not in my google accounts but on my microsoft.outlook365. I have noticed that it happens if, while I am doing a search, many messages arrive at the same time in my IMAP Inbox.

OK, after re-reading the comment above, I see that it might be server-specific. I'll test again tomorrow with Ben's new BORK patch on outlook365. (I think Outlook has historically returned wrong lengths when fetching messages.)

(In reply to gene smith from comment #99)

Ok, I see after re-reading comment above that it might be server specific. I'll test again tomorrow with Ben new BORK patch on outlook365. (I think outlook has historically returned wrong lengths when fetching messages.)
Yep, I use microsoft.outlook365 and search for Sender, Recipients and Subject.

I also filed a bug, that seems to be a duplicate of the original bug in this topic. I am not sure if anyone was able to reproduce this consistently so I thought I'd post my reply to my own bug (same) here as well:

https://bugzilla.mozilla.org/show_bug.cgi?id=1773647

If someone still needs a mail to be able to reproduce this issue: let me know.

Hi, I can confirm that after setting the option to synchronize my IMAP folder to true, I again have some messed-up emails in my Inbox.

@Wayne:
I haven't used the Nightly version for a long time. I first ran into this problem with TB 94 b3 or TB 94 b4: in particular, right after the first update, I noticed that the search (sender-recipients-subject) was very slow. Only later did I realize that the slowness was due to malformed emails.
If you tell me which Daily version corresponds to TB 94 b1 and which corresponds to TB 94 beta 5, I could try to come up with a regression range in my spare time a couple of weeks from now.

PS: I fixed the issue again by turning off synchronization for the folder (Inbox) and running Repair Folder.

I unfortunately have only partial understanding of all the technical details of the above comments, but I've noticed some possibly interesting stuff during my own problems with this bug, which I at least don't think have been brought up yet:
• When pressing Ctrl+U to open the source code of Gmail E-mails, the window title looks like Source code for: imap://imreeil42%40gmail%2Ecom@imap.gmail.com:993/fetch%3EUID%3E/INBOX%2E28496 and always shows multiple E-mails in the same source code, even for those E-mails that are displayed correctly in the beta channel.
• A long shot, but it's plausible that third-party profile/cache cleaners that support Thunderbird, for instance SpeedyFox, may have deleted some profile data that they shouldn't have.

(In reply to mark from comment #101)

I also filed a bug, that seems to be a duplicate of the original bug in this topic. I am not sure if anyone was able to reproduce this consistently so I thought I'd post my reply to my own bug (same) here as well:

https://bugzilla.mozilla.org/show_bug.cgi?id=1773647

Haven't been able to reproduce. Will try again with outlook365 imap server.
What is your imap server? How many messages in the problem folder? Have you done any searches, moves or filters that might have triggered the problem?

If someone still needs a mail to be able to reproduce this issue: let me know.

Thanks! You can send to me anything you think might be useful.

Are you sure this is an IMAP-only issue? We had cases where we moved messages from a local folder to a local folder, and upon visiting the local folder immediately afterwards using the "Location" widget on the toolbar (in fact, the enhanced version via the add-on Quick Folder Move), we found the folder corrupt, that is, the thread pane was showing lots of messages without From/To/Subject, etc. Gremlins have entered the MSF backend, also causing bug 1774072. It wouldn't come as a surprise if all the weirdness being experienced had the same root cause, leading to different effects depending on which MSF got corrupted. Maybe it's worth doing a stress test of moving many messages, preferably not manually, but instead via a filter or whatever QFM uses under the covers. Ben, the reporter of bug 1773647 claims that he can reproduce the issue almost at will, so it's best to pass him a debug version.

(In reply to [:Aureliano Buendía] from comment #102)

Hi I can confirm that after set to true the option to synchronized my imap folder, I have again some messed up email in my Inbox.

@Wayne:
I haven't used the Nigthly version for a long time. I remember that the first experiences with this problem, I met them with TB 94 b3 or TB 94 b4: in particular at the first update, I noticed that the search (sender-recipients-subject) was very slow. Only later did I realize that the slowness was due to malformed emails.
If you tell me which version of Daily corresponds to the TB 94 b1 and which version of Daily corresponds TB 94 beta 5, a couple of weeks from now, I could try to come up with a regression range in my spare time.

We don't have a couple weeks to get a solution - but perhaps this list of bug queries will help:

  • 94.0b2 build 2 built October 8, 2021
  • 94.0b3 built October 15, 2021
  • 94.0b4 built October 20, 2021

I have a case where, for some unknown reason, mail in all of my folders from 03/13/2020 until 08/01/2021 disappeared. This included my POP3 Inbox, Sent, and local folders. I was running the daily version due to bug 1577548. Once that bug was resolved and made it into a production release, I went from Daily 91.0a1 21-11-30 to Production 91.3.2. This was 11/30/2021.

When I was running the Daily build (which I first installed 04/21/2021), I tried to fix the missing mail problems by

 1) restoring the files from a backup 08/09/2021 to \temp
 2) copying the restored files from \temp to the Local Folders or pop3.att.yahoo.com directory with a rename to xxx-restored
 3) restarting TB so that it saw the restored files and auto-built the xxx-restored.msf files
 4) selecting the missing mail messages from the xxx-restored file and copying them into the Local Folder file (where the messages were missing)

When I did this, the first mail message was copied properly, but all of the subsequent mail messages were corrupted. I opened bug 1725750.

To get around this problem I did the following:

 1) using grep -n, head, and tail, I took the xxx-restored file and deleted the messages at the front and back that were not missing
 2) shut down TB
 3) appended the xxx-restored file piece to the base file
 4) removed the xxx.msf file
 5) restarted TB.
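The head/tail trimming in steps 1 and 3 can be sketched in C++. This is only an illustration under stated assumptions, not a tool anyone in this bug used: the function names are hypothetical, message numbering is 0-based, and it assumes no message body contains a line that itself begins with "From ". A separator is recognized only where "From " starts a line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Offsets of every "From " separator that begins a line.
// Simplification: assumes no message body has a line starting with "From ".
std::vector<std::size_t> MessageStarts(const std::string& mbox) {
  std::vector<std::size_t> starts;
  for (std::size_t pos = mbox.find("From "); pos != std::string::npos;
       pos = mbox.find("From ", pos + 1)) {
    if (pos == 0 || mbox[pos - 1] == '\n') {
      starts.push_back(pos);
    }
  }
  return starts;
}

// Returns messages [first, last) (0-based), dropping the unwanted messages
// before and after, much like trimming the restored file with head/tail.
std::string KeepMessages(const std::string& mbox, std::size_t first,
                         std::size_t last) {
  const std::vector<std::size_t> starts = MessageStarts(mbox);
  if (first >= last || first >= starts.size()) {
    return "";
  }
  const std::size_t begin = starts[first];
  const std::size_t end = last < starts.size() ? starts[last] : mbox.size();
  return mbox.substr(begin, end - begin);
}
```

The trimmed piece can then be appended to the base mbox while TB is shut down, as in step 3.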

Even though the mail files in the repaired folder are not in date order, the resulting rebuilt .msf file put the index in order.
I was able to do this and repair my Sent folder and the affected Local Folders folders. I have not yet repaired my Inbox file.

Now, Wayne Mery, as part of this bug (1742975), wants me to test this on the Daily build. I have not been following this bug in detail (I do get e-mail updates), and I am wondering what I should do. This bug talks about .msf corruption, but when I first tried to copy messages from the xxx-restored folder to the real xxx folder, TB auto-built the xxx-restored.msf file. Was it that file that was corrupted? I had no trouble opening the xxx-restored file and looking at any message; there was no corruption. It was only when I tried to copy (or move) mail messages to the real folder that the copied (or moved) messages were corrupted. Will my testing on my Inbox file be useful? If so, how should I proceed to run an adequate test? Thanks.

OK, update to my logging patch to add a bit of info for the code path used by general IMAP offline-message-syncing.
There's a try build running here:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=e2c2fd682bd14e10fe4f12297f0c49c2b462c12c

Is that the best way to get builds with the patch into the hands of people to try out?

export MOZ_LOG="BORK:5"
export MOZ_LOG_FILE=/tmp/borklog.txt

Then we can look at the borklog and see if we can spot oddness.

No luck in seeing anything bad with the bork patch. Set up a new o365 profile and filtered (copied) messages to another folder too. All I see is normal I/BORK lines in the log showing good message IDs and consistent offsets like:

:
I/BORK nsMsgDBFolder::StartNewOfflineMessage() msgid='0a58b582-4a4f-3645-eb54-9b7d7d3fc9a1@mgh.harvard.edu'
I/BORK nsMsgDBFolder::EndNewOfflineMessage: done. offset=848916 messageSize=17613 offlineSize=17696
I/BORK nsMsgDBFolder::StartNewOfflineMessage() msgid='056E3D2B-A2D9-4C0B-855D-56B801712232@stanford.edu'
I/BORK nsMsgDBFolder::EndNewOfflineMessage: done. offset=866612 messageSize=22951 offlineSize=23034
:

I compared the log's offset= to the offsets in the mbox file and they are correct. They correspond to the 'F' in the "From - " delimiter separating messages.
Also, FWIW, it appears that
next offset = previous offset + offlineSize
and messageSize is not used in the calculation, since messageSize probably excludes the "From - " and "X-MOZ*" items, as previously pointed out by Ben.
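A minimal C++ sketch of that consistency check, assuming the offset and offlineSize values have already been pulled out of the I/BORK log lines (the struct and function names are hypothetical, not Thunderbird code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per offline message, as printed by the I/BORK log lines above.
// Hypothetical struct; only the field meanings come from the log.
struct OfflineEntry {
  std::uint64_t offset;       // position of the 'F' in the "From - " separator
  std::uint64_t offlineSize;  // bytes written to the mbox for this message
};

// True if every message starts exactly where the previous one ended:
// next.offset == prev.offset + prev.offlineSize.
bool OffsetsAreContiguous(const std::vector<OfflineEntry>& entries) {
  for (std::size_t i = 1; i < entries.size(); ++i) {
    const OfflineEntry& prev = entries[i - 1];
    if (entries[i].offset != prev.offset + prev.offlineSize) {
      return false;
    }
  }
  return true;
}
```

With the two log lines above, 848916 + 17696 = 866612, so the check passes. Chaining with messageSize instead (848916 + 17613 = 866529) would land 83 bytes short of the next separator.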

Re: comment 88:

The second message "From " separator appears midway through a line, which is invisible to any mbox parser ("From " has to start at the beginning of a line). So there will appear to only be one message in the mbox (with badly-formed MIME parts because of the corruption!).
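A minimal C++ sketch of that parsing rule, assuming a plain mbox held in memory (the function name is hypothetical, and it ignores any escaping of body lines that begin with "From "): only lines that start with "From " count as separators, so a separator glued onto the end of the previous line disappears and two messages are counted as one.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts messages by counting separator lines in an in-memory mbox.
std::size_t CountMboxMessages(const std::string& mbox) {
  std::istringstream in(mbox);
  std::string line;
  std::size_t count = 0;
  while (std::getline(in, line)) {
    // A separator only counts if "From " is at the very start of the line.
    if (line.rfind("From ", 0) == 0) {
      ++count;
    }
  }
  return count;
}
```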

Ben,
Are you looking at a full size mbox file when you saw the "From " in the middle of a line? I would be curious to see the file myself if that's possible. I think you probably got it from [:Aureliano Buendía] so I'll NI him too.

Anyhow, tried some more to duplicate and still only see good stuff logged with BORK patch and no obviously corrupted messages.

Flags: needinfo?(euryalus.0)

(In reply to gene smith from comment #110)

Re: comment 88:
Are you looking at a full size mbox file when you saw the "From " in the middle of a line? I would be curious to see the file myself if that's possible. I think you probably got it from [:Aureliano Buendía] so I'll NI him too.

No, it was from sancus. I ran it past him and then forwarded you his email.

Couple of extra thoughts:
It's a result of moving out the non-corrupted messages and compacting the folder.
So it's not necessarily a good reflection of what was in there originally. The mbox might have been fine, but then been screwed up by the compaction...

Anyhow, tried some more to duplicate and still only see good stuff logged with BORK patch and no obviously corrupted messages.

It's a frustrating one all right :-(

Flags: needinfo?(euryalus.0)

Thanks Ben. It looks like part of the ending of the 1st duplicate message is cut off, including the CRLFs, so the "From " for the second dupe message is at the end of the line and not the middle (line 1404 of the mbox file). Also, I don't see any X-Mozilla* headers like I see on all of my messages in mbox files. Where did they go? I see that "compact" removes the X-Moz*: lines as well as the dash and date from the "From " delimiter.
Anyhow, with my test I tried deleting messages and compacting and the mbox file seems to go down in size and I saw no corruption on message above or below the deleted message.
The only way I've been able to see the next message bleed into the current message is to remove some lines in the mbox from the end of the current message and then I see the From and other headers at the end when current message is displayed.
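The "bleed" is what you get when a reader trusts the recorded offset and size but the stored copy is shorter than the index claims. A hypothetical C++ sketch of that blind slice (not Thunderbird's actual reader):

```cpp
#include <cstddef>
#include <string>

// Returns what a reader would display for one message: it trusts the
// (offset, size) pair recorded in the index and slices the mbox blindly.
std::string ReadStoredMessage(const std::string& mbox, std::size_t offset,
                              std::size_t recordedSize) {
  if (offset >= mbox.size()) {
    return "";
  }
  return mbox.substr(offset, recordedSize);  // substr clamps at end of buffer
}
```

For example, if the index records 27 bytes for a message but only its first 20 bytes were stored (the trailing " world\n" was lost), reading 27 bytes from offset 0 returns the truncated message followed by "From - ", the start of the next message's separator, now sitting mid-line.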

If for some reason there's a problem reading the mbox, the message might be fetched and stored again in the file (e.g., don't find expected headers at the file offset). A growing mbox with the same message stored over and over was the problem hopefully solved in bug 1702692.
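The thread doesn't show the actual check Thunderbird performs, so the following is only a guess at the kind of test involved: a hypothetical C++ helper that accepts a recorded offset only if it points at a "From " that begins a line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sanity check: does the recorded offset land on the start of an
// mbox "From " separator line? If not, the stored copy can't be trusted.
bool OffsetLooksValid(const std::string& mbox, std::size_t offset) {
  if (offset >= mbox.size()) {
    return false;
  }
  if (offset != 0 && mbox[offset - 1] != '\n') {
    return false;  // a separator must begin a line
  }
  return mbox.compare(offset, 5, "From ") == 0;
}
```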

Should have done this this morning, but here's a list of installers from the try build with the borklog enabled, for anyone who can replicate the bug:

Linux x64:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/I0h_dM1JSKmX6rkPkY-NUw/runs/1/artifacts/public/build/target.tar.bz2

OSX:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CpatBGwRRauuqKt47ntS3Q/runs/0/artifacts/public/build/target.dmg

Windows (32bit):
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WhUpu4hzRg6ta7l0GVTJsA/runs/0/artifacts/public/build/install/sea/target.installer.exe

Windows (64bit):
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/JyA2_fwpRLuLdAwBg3CTmQ/runs/0/artifacts/public/build/install/sea/target.installer.exe

It dumps out some more diagnostics for writing offline messages to disk, with a few warnings to check if things have obviously gone wrong.

You need to set environment variables to enable the logging and specify an output file.
e.g., on Linux:

export MOZ_LOG="BORK:5"
export MOZ_LOG_FILE="/tmp/borklog"

So, run one of these builds, enable the bork log, replicate the issue and send me the logs!

(In reply to Ben Campbell from comment #111)

It's a result of moving out the non-corrupted messages and compacting the folder.
So it's not necessarily a good reflection of what was in there originally. The mbox might have been fine, but then been screwed up by the compaction...

I checked this and the borked message boundary is the same.

See Also: → 1774498

Still unable to see the problem.
I found this bug from 10 years ago, bug 764662, which describes almost the same issue we are seeing now. Commenter WADA has some theories on what is causing the bug, but I still can't trigger it. He thinks it comes from the interaction of what happens when new messages arrive: a small part of the message (up to 2048 bytes) is fetched for the preview popup, the headers are fetched, and then the full message is fetched when clicked, or autosync fetches the full message if it isn't clicked. I think he is saying that only the short 2048-byte fetch is stored offline, not the full message, while the lengths in the msf DB are right, causing the next message in the mbox to "bleed" into the message being displayed.

I caused 3 or 4 big messages to appear in the inbox using another client, and then in TB quickly clicked on the received messages while the preview notification was occurring, but never saw corruption. The header, 2048-byte preview and full message fetches look OK in the IMAP:5 log. Did this several times with no corruption.

I don't think we have any evidence that only 2048 bytes are getting stored for the problem messages.

See Also: → 764662

Just to rule out another possible culprit:
TB seems to cope fine with the IMAP server serving up messages containing nul ('\0') chars in the middle of the body. It just replaces them with space (0x20, ' ').
The IMAP thread passes data out to the IMAP folder offline-message writing (on the main thread) as a C-style char* string, and strlen() is used to figure out the byte count. But it seems that the nul has already been replaced by a space by then. Ah well. Was a nice theory.
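To illustrate the theory being ruled out: if a buffer containing a nul were measured with strlen(), the count would stop at the nul, while replacing nuls with spaces first keeps the count correct. A minimal C++ sketch with hypothetical function names; it does not reproduce where the replacement actually happens in Thunderbird.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Byte count as strlen() would report it: stops at the first '\0'.
std::size_t CountWithStrlen(const std::string& chunk) {
  return std::strlen(chunk.c_str());
}

// Replaces every embedded nul with a space (0x20), as observed above.
std::string ReplaceNuls(std::string chunk) {
  for (char& c : chunk) {
    if (c == '\0') {
      c = ' ';
    }
  }
  return chunk;
}
```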

OK, qualified "Eureka!".
I can replicate it and I know what's going wrong. There are potentially other causes, but I think this one will likely account for the bulk of the issues people are seeing.

Steps to replicate:

  1. set up a profile with the message quarantining workaround turned on (it downloads messages into a temp file first before adding them to the mbox file, so the OS doesn't embargo all your messages). There is a UI option for this somewhere I think... but I don't know where to find it. In your prefs file, you want "mailnews.downloadToTempFile" set to true.
  2. receive some new messages into your IMAP INBOX (to force the IMAP code to write out a local offline copy of each message)
  3. if you've got the BORK log running, you'll see some errors: E/BORK nsMsgDBFolder::EndNewOfflineMessage: outstream not seekable
  4. copy a few of those new messages into a subfolder
  5. observe that the copies in the subfolder are screwed up

quick explanation:

Basically, the IMAP save-to-offline code assumes the nsIOutputStream used to write the offline copy is seekable, and after writing it, it uses Tell() to figure out the message size, to set .offlineMessageSize in the msg header.
But the quarantining code gives you a non-seekable nsIOutputStream, so the message size is not correctly calculated (but of course, the code doesn't check that or clean up or anything like that ;-) )
It just leaves a message without any .offlineMessageSize. When copying, the IMAP folder code will use the .messageSize (the size of the message as sent by the server, which can be a little different to the size of the local copy) as a fallback, but that'll be wrong, so everything will get screwed up.
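The failure can be modelled in a few lines of C++. Everything here is a hypothetical simplification (no real nsIOutputStream or nsIMsgDBHdr): the offline size is derived from Tell() before and after the write, only a seekable stream supports that, and the copy code falls back to messageSize when the offline size is unset. The 17613/17696 sizes are borrowed from the BORK log lines quoted earlier purely as sample values.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the two size fields on a message header.
struct MsgHdr {
  std::uint64_t messageSize = 0;                    // size reported by the server
  std::optional<std::uint64_t> offlineMessageSize;  // size of the local mbox copy
};

// Models the pre-fix flow described above and returns the size the copy
// code would end up using for this message.
std::uint64_t SizeUsedForCopy(bool streamSeekable, std::uint64_t serverSize,
                              std::uint64_t bytesActuallyWritten) {
  MsgHdr hdr;
  hdr.messageSize = serverSize;
  if (streamSeekable) {
    const std::uint64_t start = 0;                           // Tell() before write
    const std::uint64_t end = start + bytesActuallyWritten;  // Tell() after write
    hdr.offlineMessageSize = end - start;
  }
  // Non-seekable: nothing is recorded and nothing reports the failure.
  // The copy falls back to the server-reported size when the offline size is unset.
  return hdr.offlineMessageSize.value_or(hdr.messageSize);
}
```

With a non-seekable stream, the copy uses 17613 bytes for a message that occupies 17696 bytes in the mbox, so it reads the wrong span of the file.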

Working on a fix now.

(In reply to Ben Campbell from comment #117)

There is a UI option for this somewhere I think... but I don't know where to find it. In your prefs file, you want "mailnews.downloadToTempFile" set to true.

It's labelled "Allow antivirus clients to quarantine individual incoming messages" near the bottom of the Privacy & Security tab in the settings.

A little ugly, but hopefully not too intrusive.
Fixes the case for IMAP/News folders where offline message writes were
leaving .offlineMessageSize unset when destination stream is non-seekable.
This is already the case when quarantining is enabled (i.e. when
mailnews.downloadToTempFile is true), and will eventually be the case for
most nsIMsgPluggableStream-provided output streams.
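The patch itself is not reproduced in this thread. Judging only from the m_tempMessageStreamBytesWritten member named in the Coverity report, one plausible approach is to count bytes as they are written instead of asking the stream for its position. A minimal sketch of that idea, with hypothetical class and function names:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical writer that tracks the size itself while writing, so the
// result no longer depends on whether the stream supports Tell().
class OfflineMessageWriter {
 public:
  void Write(const std::string& chunk) {
    // A real implementation would forward chunk to the output stream here.
    mBytesWritten += chunk.size();
  }
  std::uint64_t BytesWritten() const { return mBytesWritten; }

 private:
  std::uint64_t mBytesWritten = 0;  // explicitly initialized
};

// Writes all chunks and returns the byte count to store as offlineMessageSize.
std::uint64_t WriteOfflineMessage(const std::vector<std::string>& chunks) {
  OfflineMessageWriter writer;
  for (const std::string& chunk : chunks) {
    writer.Write(chunk);
  }
  return writer.BytesWritten();
}
```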

Wasn't setting mailnews.downloadToTempFile to false the original workaround for this, identified months ago?
Well, that was a different bug: bug 1734843 comment 53.
I thought I was testing with it set to true, but I must have forgotten to set it on the new profile I was using with o365.
Anyhow, with it true I now see the error "outstream not seekable" on all fetches into the Inbox, and 2 of the 3 messages I just received into the Inbox are corrupted there.
But when I copy them to a subfolder of Inbox, they are OK again and not bad in the subfolder.

Merged your comment 119 patch with my local build still having the bork patch and don't see a problem on newly received messages into inbox with quarantine enabled (i.e., mailnews.downloadToTempFile = true). Still see the "outstream not seekable" on the fetches which I guess is OK.
So I suppose the hope is that almost everyone who has reported the problem has quarantining enabled (which is false by default).

(In reply to Ben Campbell from comment #120)

Try build running here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=e27e66699310d11d6b5c460c9e7e27ac50b47133

Can you give me a Windows installer so I can try? Thanks!

I'll get the patch landed.
It could be good to also add some more logging for tracking down any additional issues that could be involved here. (Similar to the BORK logging patch above).

Status: NEW → ASSIGNED
Target Milestone: --- → 103 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/8c748cbf6e65
Fix nsMsgDBFolder::EndNewOfflineMessage() for quarantined message writes. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED

After 6 hours with the latest try build installed and TB running, in the same scenario as the other times (fresh new profile, "Allow antivirus..." = true, synchronize Inbox folder = true), I still haven't encountered any problems. The borklog is empty. No mail is messed up, and the search for sender-recipients-subject is as fast as in TB 91: it seems the bug is fixed.
I'll continue testing until tomorrow evening and will report any errors here.
Thanks!

(In reply to [:Aureliano Buendía] from comment #128)

After 6 hours ...
... the search for sender-recipients-subject is fast as in the TB 91 version: it seems the bug is fixed.
Should this also get the "Perf" keyword?

(In reply to Ben Campbell from comment #117)

OK, qualified "Eureka!".
I can replicate it and I know what's going wrong. There are potentially other causes, but I think this one will likely account for the bulk of the issues people are seeing.
Working on a fix now.

Tell us where to send the beer!

(In reply to Worcester12345 from comment #130)

Should this also get the "Perf" key word?

+1 for me.

Thanks for the explanation in comment #117, Ben. Have you identified where the regression was introduced after TB 91?

(In reply to newsfan from comment #133)

Thanks for the explanation in comment #117, Ben. Have you identified where the regression was introduced after TB 91?

It would have been the quarantine rewrite I did in late September 2021:
revision: 35e064ada8debd021ce507fd0e8b20be363a9046
Bug 1717147: Move message-quarantining from nsPop3sink into mbox implementation.

Previously, the quarantining was done by the POP3 code, so I think that POP3 got all the scrutiny. But IMAP made the nsISeekable assumption that POP3 didn't, and so the problem shows up with IMAP.

Just to leave some breadcrumbs for anyone with affected data:

Any affected IMAP folders can be fixed by just deleting the corrupted mbox and .msf files. Then when running with a fixed TB, it'll redownload the messages and all should be fine.
Unfortunately, there's not much that can be done to fix up corrupted messages in local folders, as there's nowhere to recover them from (other than any backups you might have). The unaffected messages in the folder should be just fine, though.

Ben, it would be insanely great to have someone create a test for this issue.
(Then I will feel comfortable doing some low-level I/O stuff and will be confident that I won't break the code with the test ensuring no regression is introduced with regard to this issue.)

Sorry, I have not been able to familiarize myself with mozmill and mochitest so far. :-(

Actually, I am concerned about a strange warning I see in the local xpcshell test log (with --sequential --verbose):

00:20.08 pid:50659 [Parent 50659, Main Thread] WARNING: Missing .messageOffset (key=1, storeToken=''): file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/db/msgdb/src/nsMsgHdr.cpp:438

Note the "Missing .messageOffset" string.
The warning lines appear somewhere between 100 and 200 times during the xpcshell test.
I forgot to mention: this is an xpcshell test run of a FULL DEBUG build under Debian GNU/Linux.

I was not sure if these were serious or not, but now that I see this bugzilla, maybe I should report this in a separate bugzilla. I intend to
investigate it a bit more locally before posting a bugzilla about this.
(I have yet to figure out how to run xpcshell test with "--sequential and --verbose" on try-comm-central.)

This message began appearing sometime between late November 2021 and early January 2022.
(And I think this is related to some not so intermittent error(s) I observed.)
But then I realized that we began seeing this because of the following patch, and I suppose the underlying symptom had been there all along and is still there today.

changeset: 34359:3b3ba6e833b0
user: Ben Campbell <benc@thunderbird.net>
date: Mon Nov 22 12:16:10 2021 +0200
summary: Bug 1720047 - Show error in debug if unset nsMsgHdr.messageOffset is read. r=mkmelin
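That changeset made debug builds complain when an unset messageOffset is read. The real nsMsgHdr code isn't shown here; the following C++ sketch only illustrates the idea, assuming the offset is modelled as std::optional and that an unset value reads as 0 (both assumptions), with a hypothetical Hdr struct:

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>

// Hypothetical header: the offset stays unset until the store assigns one.
struct Hdr {
  std::uint32_t key = 0;
  std::optional<std::uint64_t> messageOffset;
};

// Reading an unset offset still returns 0 here (an assumption), but it also
// prints a warning shaped like the one quoted above, so the bad read shows up.
std::uint64_t GetMessageOffset(const Hdr& hdr) {
  if (!hdr.messageOffset) {
    std::fprintf(stderr, "WARNING: Missing .messageOffset (key=%u)\n",
                 static_cast<unsigned>(hdr.key));
    return 0;
  }
  return *hdr.messageOffset;
}
```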

Comment on attachment 9281552 [details]
Bug 1742975 - Fix nsMsgDBFolder::EndNewOfflineMessage() for quarantined message writes. r=mkmelin

[Approval Request Comment]
We want this for beta.

Attachment #9281552 - Flags: approval-comm-beta?

FYI, this could be a false positive, but this morning I received an automatic Coverity scan report that mentions
|m_tempMessageStreamBytesWritten| is not initialized.

The second issue in the coverity message is it.
There may be a path where it is not initialized?
(The first one about timer is hard to figure out. Maybe coverity got confused due to some macros?)

Hi,

Please find the latest report on new defect(s) introduced to Thunderbird found with Coverity Scan.

2 new defect(s) introduced to Thunderbird found with Coverity Scan.


New defect(s) Reported-by: Coverity Scan
Showing 2 of 2 defect(s)


** CID 1506301:  Incorrect expression  (NO_EFFECT)
/comm/mailnews/imap/src/nsImapMailFolder.cpp: 6907 in nsImapMailFolder::CopyMessages(nsIMsgFolder *, const nsTArray<RefPtr<nsIMsgDBHdr>> &, bool, nsIMsgWindow *, nsIMsgCopyServiceListener *, bool, bool)()


________________________________________________________________________________________________________
*** CID 1506301:  Incorrect expression  (NO_EFFECT)
/comm/mailnews/imap/src/nsImapMailFolder.cpp: 6907 in nsImapMailFolder::CopyMessages(nsIMsgFolder *, const nsTArray<RefPtr<nsIMsgDBHdr>> &, bool, nsIMsgWindow *, nsIMsgCopyServiceListener *, bool, bool)()
6901         }
6902     
6903         // Create and start a new playback one-shot timer. Callback will delete it.
6904         NS_ASSERTION(!srcImapFolder->m_playbackTimer, "expected null");
6905         rv = NS_NewTimerWithFuncCallback(
6906             getter_AddRefs(srcImapFolder->m_playbackTimer),
>>>     CID 1506301:  Incorrect expression  (NO_EFFECT)
>>>     Part "srcImapFolder" of statement "srcImapFolder , (&PlaybackTimerCallback)" has no effect due to the comma.
6907             srcImapFolder->PlaybackTimerCallback,
6908             (void*)srcImapFolder->m_pendingPlaybackReq,
6909             PLAYBACK_TIMER_INTERVAL_IN_MS, nsITimer::TYPE_ONE_SHOT,
6910             "nsImapMailFolder::PlaybackTimerCallback", nullptr);
6911         if (NS_FAILED(rv)) {
6912           NS_WARNING("Could not start m_playbackTimer timer");

** CID 1137630:  Uninitialized members  (UNINIT_CTOR)
/comm/mailnews/base/src/nsMsgDBFolder.cpp: 279 in nsMsgDBFolder::nsMsgDBFolder()()


________________________________________________________________________________________________________
*** CID 1137630:  Uninitialized members  (UNINIT_CTOR)
/comm/mailnews/base/src/nsMsgDBFolder.cpp: 279 in nsMsgDBFolder::nsMsgDBFolder()()
273       mProcessingFlag[2].bit = nsMsgProcessingFlags::TraitsDone;
274       mProcessingFlag[3].bit = nsMsgProcessingFlags::FiltersDone;
275       mProcessingFlag[4].bit = nsMsgProcessingFlags::FilterToMove;
276       mProcessingFlag[5].bit = nsMsgProcessingFlags::NotReportedClassified;
277       for (uint32_t i = 0; i < nsMsgProcessingFlags::NumberOfFlags; i++)
278         mProcessingFlag[i].keys = nsMsgKeySetU::Create();
>>>     CID 1137630:  Uninitialized members  (UNINIT_CTOR)
>>>     Non-static class member "m_tempMessageStreamBytesWritten" is not initialized in this constructor nor in any functions that it calls.
279     }
280     
281     nsMsgDBFolder::~nsMsgDBFolder(void) {
282       for (uint32_t i = 0; i < nsMsgProcessingFlags::NumberOfFlags; i++)
283         delete mProcessingFlag[i].keys;
284     


________________________________________________________________________________________________________
To view the defects in Coverity Scan visit, https://u15810271.ct.sendgrid.net/ls/click?upn=HRESupC-2F2Czv4BOaCWWCy7my0P0qcxCbhZ31OYv50yrakv2rGx9VCLK-2FXa3W6lt1eEBJD74Kk49VArp-2FVObLHpD1nelGFEX4HutYCQIOyHc-3DkjCa_TrCR6VUpVlYLcItNTuaQzeuhod48Eiyf-2F5-2FoN672LFngsKnw-2BY2MWtETXH0xTELoGAVHhD6cztiQQtdoKL57meCvRi316V8o6m494sA5MSvPZ50tjjL3H1d3ZycWdx6u5gEGF7DrqL7beg7zeuMARtEl3lWACiissQN-2Bbej2xJrnyCTmXxYXt-2B2DrUw7fjVR7sdnGB-2F6bGAkVlAF2NG1Ww-3D-3D

Was just looking at that. I think not this bug but bug 1773605

Ah sorry, the first one is for this bug.

(In reply to ISHIKAWA, Chiaki from comment #137)

Note the "Missing .messageOffset" string.
The warning lines appear somewhere between 100-200 times during xpcshell test.
I forgot to mention.: this is an xpcshell test run of FULL DEBUG version under Debian GNU/Linux.

I checked the treeherder logs, and it's not there. So either some local patch, or something else. Anyway, best to keep it in a separate bug - and see if you can note down which test(s) trigger it.

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/c2750abf4d3c
Ensure m_tempMessageStreamBytesWritten is initialized. r=freaktechnik

Comment on attachment 9281552 [details]
Bug 1742975 - Fix nsMsgDBFolder::EndNewOfflineMessage() for quarantined message writes. r=mkmelin

[Triage Comment]
Approved for beta

Attachment #9281552 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9281761 [details]
Bug 1742975 - Ensure m_tempMessageStreamBytesWritten is initialized. r=benc

[Triage Comment]
Approved for beta

Attachment #9281761 - Flags: approval-comm-beta+

Just tried beta7:
I can still reproduce the issue with existing mails in my inbox.
When I drag a mail that gets corrupted to an empty folder, it is fine. When I add some of the mails that I know corrupt it, it still gets corrupted. Doing a repair on the Inbox or that folder does not fix it. I did not expect it to still fail when moving a mail to a different folder and testing it from there.

However, after removing my .msf files it works as expected! I am no longer able to reproduce the bug! And what's more: my Thunderbird got a BIG performance boost, especially when moving mails (any) from one folder to another!

Great work! :D

(In reply to Magnus Melin [:mkmelin] from comment #143)

(In reply to ISHIKAWA, Chiaki from comment #137)

Note the "Missing .messageOffset" string.
The warning lines appear somewhere between 100-200 times during xpcshell test.
I forgot to mention.: this is an xpcshell test run of FULL DEBUG version under Debian GNU/Linux.

I checked the treeherder logs, and it's not there. So either some local patch, or something else. Anyway, best to keep it in a separate bug - and see if you can note down which test(s) trigger it.

Thank you for the comment.

Will file a separate bug for this issue.

(Oops. It is --sequential and not --serialize. I was writing from memory. I am fixing the typo.)
As for the visibility of the warning lines, you have to run the xpcshell tests with "--sequential --verbose" to see the warnings.
Right now, I am seeing them locally.
Warnings are not printed in an ordinary test run if the test as a whole is deemed to have succeeded.
I have not been able to figure out how to run the xpcshell tests with "--sequential --verbose" on treeherder.
I am beginning to feel maybe the JSON data that seems to configure the job on try-comm-central and others can be tweaked to contain these flags somewhere and then the test runs with these flags.

I filed bug 1774952, "Many warnings WARNING: Missing .messageOffset during xpcshell test (--verbose --sequential)", for the "Missing .messageOffset" issue.

Can we have a HOWTO with steps for repairing, for everyone affected by this bug?

Is there an existing comment that has it? If not, it might be worth writing a separate comment focusing on those steps, and referencing it from the whiteboard on this bug.

In my environment, the problem is solved in TB 102b7.
Thank you guys !

In response to my posting 14 days ago, I got no response. So this is what I did using the production TB, not the Daily.
1) I restored the Inbox to \temp from a backup.
2) I removed the top and bottom pieces that I did not need; I was left with the missing e-mails (and a few more).
3) Exit TB.
4) Move the Inbox-piece file to the TB pop.yahoo.att.com directory.
5) Restarted TB.
6) Select all the messages in Inbox-piece and copy them to Inbox.
7) I checked some of the messages, and they appeared to be copied intact.
8) Delete duplicate messages in Inbox; I had left a few in inbox-piece.
9) Exit TB
10) Remove the Inbox-piece and Inbox-piece.msf files.

In conclusion, I have no idea why these steps did not work when I first tried them.
Whatever bug there was (and I am not running IMAP, so it could not have been this bug) has been fixed.

See Also: → 1777454
Flags: needinfo?(MRGSER)