Closed Bug 1872849 Opened 9 months ago Closed 5 months ago

imap folder corruption for thunderbird 122+

Categories

(MailNews Core :: Networking: IMAP, defect, P1)

Thunderbird 122
Unspecified
All

Tracking

(thunderbird_esr115 unaffected, thunderbird122+ wontfix, thunderbird124 wontfix)

RESOLVED FIXED
125 Branch
Tracking Status
thunderbird_esr115 --- unaffected
thunderbird122 + wontfix
thunderbird124 --- wontfix

People

(Reporter: mkmelin, Assigned: benc)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: dataloss, leave-open, regression)

Attachments

(4 files)

Filing this as a tracker for folder corruption occurring with 122 and above, to at least track feedback on whether it got worse. xref bug 1719121.

I had two instances in the last month. I can't recall any cases in a long time before that.

What I've seen recently is different from what I've seen previously, notably in that the corruption didn't lead to content being displayed that didn't belong to the selected message. Instead it seemed like the start of it was missing or similar, so no headers were displayed and the actual content was only part of the full body.

Mine is similar. I actually have a copy of the source (at the time) for one of them. It looks like most headers are missing. The start of that source - to the extent it can be trusted - was

X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
 =?utf-8?B?ZkplWGt6NU9YYmlsWjcraU96dzlMMzBzdkhhdisxUWJsTjluS2Qwdngzc3Fi?=
 =?utf-8?B?aTVYdGxBY2drV1lIbms4aFN2MXZ1Q2IraHZHbzdnQ25SMHZwM3ZhOWZoVjZi?=
 =?utf-8?B?TzY1UmJjREFSaXF0b3FISHE3VlZKbFRxcVZMYkZnczV1THRRQ2xrK1N2VWNB?=
 =?utf-8?B?ckJWbklmKys0K1pXcDJxVUd3TFNhdFV3cjgrTzh3TUE3Y05LSmtjRlV0eEJm?=
 =?utf-8?B?dE1JdklpQjdmcElUaWpZakgzcXl3QTRBdFBUWkcwOWZILzRJSGFtM3V4aDZt?=
 =?utf-8?B?Z1V4Wm90azBWODlWRUp5U2RZakY5Znl1VEFmOWRWMkdhbTRiTkQ0V0FsSkhQ?=
 =?utf-8?B?S2pKbytLTHEzY2dYZ21tTW4xTkpsUk5KVHhyT3NIT0RmN25xMVB5TmxtMUxJ?=
 =?utf-8?B?QkpZQzJkMmZRakgrVFgxMXdmSURmQ3I2NjFPdTJtNFQ4VThrRmp0RG5aU0Yz?=
 =?utf-8?B?WTVzbVVTbXJ5RTlpeE9ES0lTWWxoNW9DcTBXdSsyTzJYSkNtMHdTRnk5bXZq?=
 =?utf-8?B?WnZpWWZEZVNSZzNtVVhSWklmU1E2ZkNPYjBJeDZTYmx4Y0RBb3NRa2RlUU9v?=
 =?utf-8?B?bUpQU3JFcXFPNldHRUx3bVlNbWM3U2NnaUozbGlCS25LZzQ4VmNEeDNhTFBm?=
 =?utf-8?B?TG4zOTNac1phTVVwbEdVZWJTdWtIZno4dlE0RGIwYWZybDB1ekwzb2xIUWxv?=
 =?utf-8?B?VHI5WDFrME1LeWVOSG5yRENNcm5Ib2R3blcxdE1NMjI0YzhrOVBJalI3YzB1?=
 =?utf-8?B?RzhieHBjQ0JRTnRJMnVLcTkrVnN3RDFxTVl4UWEzQ3lQUEdvZzZpVlQ5VE1R?=
 =?utf-8?B?aCtML1JpN1oxejhJMHdLNE9jSkxnSGM3MVltaTZSY3ovTXIxeUtscUVsNEor?=
 =?utf-8?B?OFFGRFdreXpFMUljUDhMRzBFZHRwenpGU3VhbEROMGNhTnV5VFpMTXFEeUM1?=
 =?utf-8?B?c3lkdVY4bU5YbDNlSU9BYVRzc1BUeFhFNld6VnZacUJiL3kxR2MyV1VZUnk3?=
 =?utf-8?B?dGxIeXA4SzFFTTl2Q1YvL0pQd0JYUFF6aE1YK3YyWXEwNUVRaUFWNGRLMXV6?=
 =?utf-8?B?UzZJd1AvRFRNWVhvQmtOSjl2TzRkbWdMdXVlQTV6TVEvY0dsL2tqNEFRb1Qz?=
 =?utf-8?B?Y25wYzFCYzV1eGhBUU5TK2UyQWkwZEVTMW0zVnhwKy9tVG9idmFEWG43SEtK?=
 =?utf-8?B?NHIyR1RGNEJZSGtRN29XeWFmVFFoNUdwWm1MaVViVmlTSFUrSWRuY1Z5bWhX?=
 =?utf-8?B?MkkvNHhKODZydzl2bUdWN045RFRrdVdvWG1tcjZlcmJidnN0akUxNEVEYy9j?=
 =?utf-8?B?eFBaa3NwcWN1SytLcWJ3aUphNm0xbUtBQk5ZNVY1RXQwNlIrdWtWQWMwZ0My?=
 =?utf-8?B?UFc4aDEyVnFVTzJsTFJjTnFkWEp0NUdzU1luUStObXhiazhaV0tCUlRaSU9o?=
 =?utf-8?B?YlBiWGZ4bVExT2RqMnZmajZPUWFvNU1TRll5akxKTlVlNjhtaWt4VmtKZlBB?=
 =?utf-8?Q?pzYiXsiAYtdAFL58Q679v1EXGoA4=3D?=
MIME-Version: 1.0

--===============2315777135858755778==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Obviously, one of the missing headers should have said it's multipart/something, with boundary 2315777135858755778. So the message is rendered all wrong.
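
For anyone wanting to see why the lost header matters, here is a minimal sketch using Python's standard `email` parser. Only the boundary string is taken from the fragment above; the body text and the `multipart/mixed` subtype are assumptions made up for illustration. It is not how Thunderbird parses mail, just a demonstration that without the top-level `Content-Type` the boundary lines are treated as plain body text.

```python
from email import message_from_string

# Boundary string as it appears in the fragment above.
BOUNDARY = "=" * 15 + "2315777135858755778" + "=="

# Hypothetical one-part body; the real message content is unknown.
BODY = (
    f"--{BOUNDARY}\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "hello\n"
    f"--{BOUNDARY}--\n"
)


def parse(headers: str):
    """Parse the given header block, a blank line, then the shared body."""
    return message_from_string(headers + "\n" + BODY)


# Intact message: the top-level header names the boundary.
# "multipart/mixed" is an assumption; the original subtype was lost.
intact = parse(
    "MIME-Version: 1.0\n"
    f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"\n'
)

# Damaged message: the Content-Type header is missing, as in the corrupt copy.
damaged = parse("MIME-Version: 1.0\n")

print("intact is multipart: ", intact.is_multipart())
print("damaged is multipart:", damaged.is_multipart())
print("damaged content type:", damaged.get_content_type())
```

In the damaged case the parser falls back to its default content type and the boundary markers end up inside the displayed body, which matches the "rendered all wrong" symptom.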

See Also: → 1872677
Severity: -- → S2
Keywords: regression
Priority: -- → P1
Version: unspecified → Thunderbird 122

Just had it again, now my self-bcc of the tb planning metrics 2023-12 mail :(

https://support.mozilla.org/en-US/questions/1435280 is another likely report.

For such a serious issue, we might assume the most likely regressor is 1719121, until proved otherwise.
We should consider the possibility of stopping beta updates.

Flags: needinfo?(benc)
Flags: needinfo?(alessandro)
Keywords: dataloss
OS: Unspecified → All
Regressed by: 1719121

Is this just on local and/or POP3 folders? If so, I'm not sure repair would help. FWIW, if the problem is on IMAP folders with an offline store (mbox or maildir), the emails should still be OK on the server. You may need to disable the IMAP offline store for the bad IMAP folder(s) and repair the folder(s). Any emails opened after that will be saved to the disk cache and will later open about as fast as from the offline store (at least until the 1G cache fills up).

My cases were all IMAP (offline store).

Screenshot from the sumo case is what it looks like yes - seems the same issue.

I think the assumption should be that Bug 1719121 is causing this :-(

I'm still traveling, so it'll be tricky getting time to focus on this. But I think the hardest (and most important) part is to nail down a good replication case.
Ideally, either:

  1. An mbox which a folder reparse screws up.
  2. A "good" mbox which can be put into a new profile, with simple steps to make it go bad.
  3. An email (or emails) which can be served up from an IMAP server to cause problems. I'm using dovecot locally, so if I can directly insert bad emails and see the problem occur, that's a huge help. (Bug 1872677 sounds promising here!)

We need the replication steps to catch the problem as it happens - it's almost impossible to go from a borked mbox and work out what screwed it up.

There is an "mbox" log (gMboxLog if you want to grep for it in the C++), which might be useful. At its most verbose, it'll dump out every mbox read and write.
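
A small sketch of how that log could be switched on when launching from a script. `MOZ_LOG` and `MOZ_LOG_FILE` are the standard Gecko logging environment variables and level 5 is the most verbose; the module name `mbox` is assumed from the description above, and the helper name, log path and binary name are hypothetical.

```python
import os


def mbox_log_env(log_file: str, level: int = 5) -> dict:
    """Return a copy of the environment with mbox logging enabled.

    Assumes the log module is registered under the name "mbox".
    """
    env = dict(os.environ)
    env["MOZ_LOG"] = f"mbox:{level},timestamp"
    env["MOZ_LOG_FILE"] = log_file
    return env


if __name__ == "__main__":
    env = mbox_log_env("/tmp/tb-mbox.log")  # hypothetical path
    print(env["MOZ_LOG"], env["MOZ_LOG_FILE"])
    # To actually launch (binary name depends on the install):
    #   import subprocess
    #   subprocess.run(["thunderbird"], env=env)
```

Expect the log to grow quickly at this level, so it is probably best kept to a throwaway profile or a short reproduction run.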

Flags: needinfo?(benc)

Let's focus on trying to reach a consistently reproducible state, and keep track of any similar reports.
The reports seem sporadic and rare, so I wouldn't stop beta as we need to further investigate this and we need more reports.
If it's mbox related, we can safely assume the corruption is not permanent and no actual message gets permanently mangled.

Flags: needinfo?(alessandro)

(In reply to Alessandro Castellani [:aleca] from comment #11)

> If it's mbox related, we can safely assume the corruption is not permanent and no actual message gets permanently mangled.

I'm not sure we can assume that. The message fragment Magnus posted in Comment 2 looks pretty mutilated...

I experienced this once as well in the past month but I repaired the folder and it fixed itself.
Also I could still see the message correctly directly on the server.
Magnus, were you able to repair/restore the message?

Repair fixed it for at least the latest occurrence. Probably for the others as well, but I wasn't paying attention (they were unimportant mails), so I'm not sure. I would assume we don't get real dataloss for IMAP, but if it happens for POP there could be. We don't yet know if POP is affected.

See Also: → 1873282

One thing that would be nice to confirm is that the corrupting "From " line is in fact coming from the new mbox outputstream class.

https://searchfox.org/comm-central/source/mailnews/base/src/MboxMsgOutputStream.cpp#111

If you change the line:

rv = Emit("From \r\n"_ns);

to something identifiable like:

rv = Emit("From XYZZY\r\n"_ns);

The data after the "From " is ignored.
So if anyone can run with a build patched like this, and manages to get corruption like Magnus showed in Comment 5, that'd be very useful.
(my thinking is that there might be some obscure code somewhere I missed, which is still trying to write directly to the mbox file, and this would tell us if that's the case. I'd have posted a proper patch here, but I'm still on the road)
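
For checking an mbox written by such a patched build, a rough sketch along these lines might help. It assumes the `XYZZY` marker from the suggestion above; the function name and the sample data are made up for illustration. It reports the byte offset of every line starting with "From " and whether that line carries the marker.

```python
def classify_from_lines(data: bytes, marker: bytes = b"XYZZY"):
    """Return (marked, unmarked) byte offsets of lines starting with b"From ".

    A line counts as marked when the text after "From " is exactly the marker.
    """
    marked, unmarked = [], []
    offset = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            target = marked if line[5:].strip() == marker else unmarked
            target.append(offset)
        offset += len(line)
    return marked, unmarked


if __name__ == "__main__":
    # Hypothetical mbox content: one marked separator, one unmarked one,
    # and an escaped ">From " body line that must not be counted.
    sample = (
        b"From XYZZY\r\nSubject: a\r\n\r\nbody\r\n"
        b"From \r\nSubject: b\r\n\r\n>From here\r\n"
    )
    marked, unmarked = classify_from_lines(sample)
    print("marked separators at:  ", marked)
    print("unmarked separators at:", unmarked)
```

Any unmarked "From " line in a folder written entirely by the patched build would suggest another code path is still writing to the mbox directly.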

Had this bug today again.

See also bug 1857450 comment 5.

See Also: → 1857450
Duplicate of this bug: 1877107

Another local email folder has just been corrupted in the same way. Repair of the folder has put it right but it seems to be becoming more frequent.

(In reply to Magnus Melin [:mkmelin] from comment #14)

> Repair fixed it for at least the latest occurrence....
> So if anyone can run with a build patched like this, and manages to get corruption like Magnus showed in Comment 5, that'd be very useful.

Try build ?

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(benc)
See Also: → 1878669

(In reply to Wayne Mery (:wsmwk) from comment #19)

I'm currently thinking it's likely a similar issue as the NNTP issue in Bug 1857450, where multiple connections are trying to deliver messages to the same folder simultaneously (which the folder code just can't handle). The news code is simpler than IMAP, so I'll run that one down first and then check back on this one.

Flags: needinfo?(benc)

My current thinking is that Ben's (WIP) patch in bug 1857450 could prevent this issue, though error handling would likely be a bit rough.
The underlying bug there seems NNTP specific.
For this bug, there must be something else as the underlying cause.

I'll let Ben create a suitable try build if he thinks that would be useful. I'm not sure such a build is for general consumption as it's putting "junk" data into your data, which is ok for a throwaway profile but not for real usage.

Flags: needinfo?(mkmelin+mozilla)
Duplicate of this bug: 1878669
See Also: 1878669
Duplicate of this bug: 1878810

Any update on this?

TB is very slow and taking more than 3 minutes to start and be responsive. Might be related to this.

Tried 123.0b4 today but no improvement.

My drive is reading at 9940 MB/s (Crystal Disk Mark).

Attached image 123.0b3 blank header-1.jpg

Could the instance I've been seeing be part of the same issue?

I'm repeatedly having a download issue with one particular email, which has the Subject: [Bugzilla] Thunderbird beta bug list for 24 hour period
No From, To or Subject is shown.
The message content shows part of the full headers plus the HTML content.
See the attached image.
This has been going on for 6 months.

I have to edit the HTML to remove blank lines from the headers section to get it to display properly.

Attached image 123.0b3 blank header-2.jpg

This image shows the actual headers, or rather only a fraction of them.
I've opened 3 windows in an attempt to show the scale of these headers.
Notice the bottom scrollbar: it indicates there is a load of text off to the right which cannot be displayed in a window of that size. You are only seeing about one tenth of the horizontal text.
Also notice the vertical scrollbar: it has to be two thirds of the way down before you get to the end of the header section and the point where the actual email starts. So you are seeing barely half of the headers in that direction.

I hope this offers some idea of the large amount of header data in this email.
In the middle there is an arrow, which shows the position of the blank line.
Notice that the content displayed in the actual email (see previous image) starts after that blank line.

I only get this issue with this one email. All other emails received from topicbox are OK.

But the blank headers and incorrect content sound like the same issue.

My profile has grown from

11GB Jan 26

15GB Feb 8

58GB today!!!

Obviously I haven't received this many emails.

What is going on?

(In reply to dquiros from comment #27)

And this profile is on your local C: drive, right?

No.

(In reply to dquiros from comment #29)

It's not on a network share right?

Like I said above, drive is fast.

checking the files

\ImapMail[SERVER]\nstmp-2 28GB
\ImapMail[SERVER]\nstmp-1 16GB

Deleted them but kept running out of space. Had to free about 100GB for TB to finish making my inbox 40GB!

Also had a lot of mozmsgs folders. As I understand it, this is a Windows Search thing; I don't get why it would create folders instead of using its index location.

Did a Repair Folder (which should be called "delete and download everything again"). Went down to 1.1GB.

No idea what caused it, but I was running out of space because of this, which probably made the problem worse. Running out of space shouldn't be so chaotic.

Even after Repair, the problem with emails continues; some are not shown. No boundary line now, but blank tabs. Of course, I don't know which emails gave the boundary error, as there is no info about them when shown.
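
For others wanting to check whether their profile has the same symptom, a minimal sketch that lists `nstmp*` files under a directory, largest first. The file-name pattern comes from the listing above; the function name and the example path are hypothetical. It only reports sizes and deletes nothing.

```python
from pathlib import Path


def find_nstmp(root: str, min_bytes: int = 0):
    """Return (size, path) pairs for files named nstmp* under root, largest first."""
    hits = []
    for path in Path(root).rglob("nstmp*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical location; point this at the ImapMail directory of the profile.
    for size, path in find_nstmp("ImapMail", min_bytes=1024 * 1024):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```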

(In reply to dquiros from comment #32)

It isn't the point that the drive is fast, it is not standard to have your profile on a network drive even if it seems to work ok. I'm speaking as a user here when I say it is not a supported method of use.

Data access across a network is not the same as data access on a physical disk. Every time I hear about problems like this, it's always because someone is using it on a network share (or via Onedrive, Google Drive, etc.). It is never a good idea to go outside the defaults the program is meant to use. Are we surprised it's doing something not normal?

Why did you assume it's a network drive?

(In reply to dquiros from comment #34)

Over the years that I've been helping users, I've seen more than a few support tickets where the user has odd issues and then reveals that their profile is not on a physical disk but on some manner of networked drive. As soon as they move their profile to a physical disk or use something like Thunderbird Portable, it all seems to clear up. Would you be able to try with TB Portable and see if your issues persist? Is having your profile on a local disk not an option for you due to drive size constraints?

Just a thought. I slide emails concerning a particular project into a sub folder. (all on local physical disk).
It is these emails that are being affected, but not all of them. Which emails are affected seems to be fairly random.

These crashes mention nstmp or tmp

Keywords: meta
See Also: → 1879897, 1878541
Summary: folder corruption tracker for thunderbird 122+ → [meta] folder corruption tracker for thunderbird 122+

As a datapoint, I had this again today in my inbox. There are filters running on the inbox, but none applicable for this message.

I have no filters.

Blocks: 1880867
Blocks: 1872677
Assignee: nobody → benc
Status: NEW → ASSIGNED

Pushed by ikey@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/12ffdc0549a4
Add protection against interleaved message writes to nsImapMailFolder. r=mkmelin

I had this again with 2024-02-21 daily, so bug 1857450 didn't help for that case at least.
It seems to happen more frequently if you have the computer under severe strain, like doing a full compile or getting mails just after starting up.

(In reply to Magnus Melin [:mkmelin] from comment #42)

Thanks for the report!
Bug 1857450 added some protection to the mbox store and improved rollback of aborted messages.
But the imap folder ParseAdoptedMsgLine()/NormalEndMsgWriteStream()/AbortEndMsgWriteStream() mechanism will happily interleave messages and the mbox store will never know about it :-(

https://hg.mozilla.org/comm-central/rev/12ffdc0549a4 is supposed to deal with this (it uses the UID to distinguish interleaved writes and fail them).
(fingers crossed ;-)
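
To illustrate the idea of using the UID to detect and fail interleaved writes, here is a toy sketch. It is not the nsImapMailFolder code; the class and method names are hypothetical and only model the begin/write/end sequence described above.

```python
class InterleavedWriteError(Exception):
    """Raised when a write does not belong to the message in progress."""


class FolderWriter:
    """Toy folder that accepts one message at a time, keyed by UID."""

    def __init__(self):
        self.current_uid = None  # UID of the message being written, if any
        self.buffer = []         # lines received so far for that message
        self.stored = {}         # completed messages by UID

    def begin(self, uid):
        # Refuse to start a new message while another is still open.
        if self.current_uid is not None:
            raise InterleavedWriteError(
                f"message {uid} started while {self.current_uid} is in progress"
            )
        self.current_uid = uid
        self.buffer = []

    def write_line(self, uid, line):
        self._check(uid)
        self.buffer.append(line)

    def end(self, uid):
        self._check(uid)
        self.stored[uid] = "".join(self.buffer)
        self.current_uid = None

    def _check(self, uid):
        # Every call must carry the UID of the message in progress.
        if uid != self.current_uid:
            raise InterleavedWriteError(
                f"write for {uid} but current message is {self.current_uid}"
            )


if __name__ == "__main__":
    writer = FolderWriter()
    writer.begin(101)
    writer.write_line(101, "Subject: first\r\n")
    try:
        writer.begin(202)  # a second connection tries to start a message
    except InterleavedWriteError as err:
        print("rejected:", err)
    writer.end(101)
    print(sorted(writer.stored))
```

The rejected writer would have to retry later, which is the kind of rough error handling the thread mentions; a proper arbitration layer is a separate design question.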

Sorry to interrupt from sideways.

(In reply to Ben Campbell from comment #15)

> One thing that would be nice to confirm is that the corrupting "From " line is in fact coming from the new mbox outputstream class.
>
> https://searchfox.org/comm-central/source/mailnews/base/src/MboxMsgOutputStream.cpp#111
>
> If you change the line:
>
> rv = Emit("From \r\n"_ns);
>
> ... rest omitted ...

Ben, was there a reason not to insert "-" in the From line?
That is, why not Emit("From - \r\n") ?

TB mail folder files used to have "From - \r\n" as message separators and some third party shell/perl/awk/whatever scripts may rely on this "-" to recognize the separator. Mine did and so I had to fix it in my local test suites for pop3 e-mail message exchange.

I suspect this omission of "- " is tied to the way your "From" handling code was organized.
But if it is not, it may be worthwhile to insert "-" in the From line to be compatible with old folder files.
Just my two cents worth.

As far as I can tell the important thing is "From " (with the space!), and out in the real world you can't rely on the format of anything after it.
There are conventions (e.g. RFC 4155), but in practice... who knows.
No reason we couldn't change back, but really, anything relying on "From -" should be changed instead. Such scripts would break with most "proper" mbox files, which will have an email address and timestamp there anyway.
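
For script authors hit by this, a minimal sketch of a separator check that keys only on "From " plus the space and ignores whatever follows. The function names and sample lines are made up for illustration, and it assumes body lines starting with "From " have been escaped as ">From " by whatever wrote the file.

```python
def is_mbox_separator(line: bytes) -> bool:
    """True for any line starting with b"From " regardless of what follows."""
    return line.startswith(b"From ")


def count_messages(data: bytes) -> int:
    """Count separator lines in raw mbox data."""
    return sum(1 for line in data.splitlines() if is_mbox_separator(line))


def count_messages_strict(data: bytes) -> int:
    """The fragile variant: only recognises the old "From - " style."""
    return sum(1 for line in data.splitlines() if line.startswith(b"From - "))


if __name__ == "__main__":
    # Three separator styles: old Thunderbird, new Thunderbird, address + date.
    sample = (
        b"From - Mon Jan 01 00:00:00 2024\r\nSubject: one\r\n\r\nbody\r\n"
        b"From \r\nSubject: two\r\n\r\n>From the escaped body line\r\n"
        b"From someone@example.com Mon Jan  1 00:00:00 2024\r\nSubject: three\r\n\r\n"
    )
    print("tolerant count:", count_messages(sample))
    print("strict count:  ", count_messages_strict(sample))
```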

I don't like to see divergence from the standard MBOX format, which is FROM and a space. I recently looked at Thunderbird data files and noticed this had changed. This is about the time I started seeing sporadic issues with headers.

See Also: → 1873134

Is anything blocking uplift to beta?
And will this resolve storage size issues? bug 1878541, bug 1879897, and friends

Status: ASSIGNED → NEW
Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #47)

> Is anything blocking uplift to beta?

Nothing I'm aware of. I think it'd be worth uplifting.

> And will this resolve storage size issues? bug 1878541, bug 1879897, and friends

I don't think so, but you never know :-)

Flags: needinfo?(benc)

Comment on attachment 9380941 [details]
Bug 1872849 - Add protection against interleaved message writes to nsImapMailFolder. r=#thunderbird-reviewers

[Triage Comment]
Approved for beta

Attachment #9380941 - Flags: approval-comm-beta+

This missed getting uplifted to beta, perhaps because it has not been closed FIXED. But we're still OK.

I must have jinxed it when I wrote it's not a problem anymore this morning. Got another case now with daily 2024-03-14 :(

See Also: → 1888790

Magnus: Are there any other factors you can think of?
e.g.

  • does it only happen with folders that receive messages via filter rules?
  • does reducing the max number of IMAP connections down to 1 help?
  • could it be related to folder compaction?
    • do you see it in folders where you never delete anything?
    • Does it happen if you don't have autocompaction enabled?

Yes, clutching at straws a little bit here :-)

I've been rewriting the folder compaction code, and it does seem like folder locking isn't quite working as intended. I could imagine that if a message came in while a compaction was in progress, bad things might happen...

I haven't seen it since I last wrote above. The only factor I have suspicions about is processing incoming during heavy load - so something timing related.

> • does it only happen with folders that receive messages via filter rules?

No, had it for the Inbox as well.

> • does reducing the max number of IMAP connections down to 1 help?

I haven't tried.

> • could it be related to folder compaction?

In my case, unlikely.

> • do you see it in folders where you never delete anything?

I only have a few such folders. I did see it earlier in a list archive where I normally do not delete anything.

> • Does it happen if you don't have autocompaction enabled?

I don't use autocompaction. It is set to ask, but I didn't get any prompt close to the cases I saw.

Comment on attachment 9380941 [details]
Bug 1872849 - Add protection against interleaved message writes to nsImapMailFolder. r=#thunderbird-reviewers

[Triage Comment]
Per matrix we decided this patch won't be uplifted to 115

Attachment #9380941 - Flags: approval-comm-esr115-

Comment on attachment 9380941 [details]
Bug 1872849 - Add protection against interleaved message writes to nsImapMailFolder. r=#thunderbird-reviewers

[Triage Comment]
This is on beta 125 via train, it was never uplifted

Attachment #9380941 - Flags: approval-comm-beta+ → approval-comm-beta-

Comment on attachment 9380941 [details]
Bug 1872849 - Add protection against interleaved message writes to nsImapMailFolder. r=#thunderbird-reviewers

Clearing 115 minus (wrong bug)

Attachment #9380941 - Flags: approval-comm-esr115-
Attachment #9380941 - Flags: approval-comm-beta-

This bug has a patch and we want to be able to close it. So I'm going to transfer the meta aspects of this to a new meta bug.
You will find yourself CC on both bugs.

See Also: 1888790

I'm still seeing this running 125 beta 2:

nstemp-1 1.3Tb 04.04.23 13:29

Blocks: 1890230

This is no longer a meta bug

Status: NEW → RESOLVED
Closed: 5 months ago
Component: General → Networking: IMAP
Keywords: meta
Product: Thunderbird → MailNews Core
Resolution: --- → FIXED
Summary: [meta] folder corruption tracker for thunderbird 122+ → imap folder corruption for thunderbird 122+
Target Milestone: --- → 125 Branch

Does this warning have any significance? I saw it using a debug build:

WARNING: Interleaved messages?: '(uidOfMessage == m_curMsgUid)', file /builds/worker/checkouts/gecko/comm/mailnews/imap/src/nsImapMailFolder.cpp:4329

Yes, that warning was added to flag this issue when it happens.

So if the warning comes out, that's an unexpected case of corruption? I just started the debug version on a profile that hadn't been used for a while and saw the warning while it was downloading messages. Do you not see it at all?

It's not good, but I don't think it'll cause any serious problems.
It's the folder detecting that the IMAP protocol code is finishing a message write, but for a different message than the one currently being written.
It's just a warning message here, but there are more robust checks earlier in the process (when starting to write a message), which do bail out.

There's also protection against interleaved writes in the local storage code to prevent corruption.

The real issue is architectural. There's just not a proper interface to arbitrate multiple things wanting to write messages into a folder at once.
I want to come up with a "Proper" solution, but that's part of a more comprehensive overhaul of folder/protocol interfaces and I want that to be done properly and cover more than just IMAP.
So for now we're stuck with slightly bodged anti-interleaving protection, in the wrong layers.
