Open Bug 1878541 Opened 5 months ago Updated 5 days ago

nstmp file filling up the whole hard drive. (caused by compact) - need to clean up nstmp that already filled up the disk

Categories

(MailNews Core :: Database, defect, P1)

Thunderbird 123

Tracking

(thunderbird_esr115 unaffected, thunderbird126 affected, thunderbird127 affected, thunderbird128+ affected)

People

(Reporter: omry, Assigned: benc)

References

(Blocks 1 open bug)

Details

Attachments

(6 files)

Attached image Screenshot_9.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0

Steps to reproduce:

I am not sure how to trigger it.
I get system warnings about the disk being full and then I see a nstmp file using 1TB.
I deleted it, and hours later it happened again.
This has happened a few times for me in the past few weeks, and it seems to be happening more and more often.

Additional context:
I am using imap. The entire mailbox on the server is 9.9GB.

Thunderbird 123.0b2 (64-bit)

OS:
Edition Windows 11 Pro
Version 22H2
Installed on 22/01/2023
OS build 22621.3007
Experience Windows Feature Experience Pack 1000.22681.1000.0

That's only used for compacting folders. https://searchfox.org/comm-central/rev/1452d8f1e1582cc44529f638e1eba1c7b3804fe4/mailnews/base/src/nsMsgFolderCompactor.cpp#393
The sent folder is also quite large, 50GB.

Is auto compact enabled, or do you do compaction only when it asks/or manually?

Is auto compact enabled, or do you do compaction only when it asks/or manually?

I didn't change the setting, so I think it's probably enabled by default.
I dug through the settings now and was not able to locate it.

The sent folder is also quite large, 50GB.
I noticed that too. I deleted it, and the reconstructed file is 16GB.

That's only used for compacting folders. https://searchfox.org/comm-central/rev/1452d8f1e1582cc44529f638e1eba1c7b3804fe4/mailnews/base/src/nsMsgFolderCompactor.cpp#393

It looks like this function is not always cleaning up after itself.
For example, if an exception is thrown here, CleanupTempFilesAfterError is never called:
https://searchfox.org/comm-central/rev/1452d8f1e1582cc44529f638e1eba1c7b3804fe4/mailnews/base/src/nsMsgFolderCompactor.cpp#411

Why not wrap it with a big "try finally" to always delete the file if it exists when the function returns?
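
A minimal Python sketch of the "try finally" idea, for illustration only. The real code is C++ and this is not how nsMsgFolderCompactor.cpp is written; the function name compact_with_cleanup, its parameters, and the write_messages callback are assumptions made up for the example.

```python
import os
import tempfile


def compact_with_cleanup(mbox_path, write_messages):
    """Write a compacted copy next to mbox_path, then swap it in.

    write_messages is a hypothetical callback that writes the kept
    messages to the open temp file and may raise at any point.
    """
    folder_dir = os.path.dirname(os.path.abspath(mbox_path))
    fd, tmp_path = tempfile.mkstemp(prefix="nstmp", dir=folder_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_messages(tmp)
        # On success the temp file is renamed over the mbox, so it no longer exists.
        os.replace(tmp_path, mbox_path)
    finally:
        # Runs on success and on every error path.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With this shape, no error path can skip the cleanup, because the finally clause runs however the function exits (short of the process being killed).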

There is also the underlying question of why on earth the function is creating a file that is 1TB in size. It sounds like there is a quadratic usage of disk space in the compaction algorithm.

There is an attempt to clean up the file if it's still there in the destructor:
https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#171-177

However, it only kicks in if NS_FAILED returns true.

It sounds like the Sent-1 file is too big to compact. Possible causes: Thunderbird was interrupted during compacting (e.g. by a shutdown, a crash, or mail arriving in the folder being compacted); another program was scanning the file; there was not enough free disk space or memory to compact such a large folder; or the file is corrupted in some way. 50GB is very large for a single text file, and it looks like compacting has been failing for a while, so the file is full of old deleted mail.

Does it say anything in the Error console ?
Open the Error Console (Ctrl + Shift + J, or 'Tools' > 'Developer Tools' > 'Error Console')
Clear Error console - top left bin icon

right click on 'Sent' folder and select 'compact'
If compacting does not work, Sent-1 is still 50GB and nstmp appears.
What do you see in Error Console - please post an image.

Either way you need a solution to overcome the current issue.

Suggestion:

  • Create a new folder called 'Sent Store'.
  • Move all the wanted sent emails into 'Sent Store' as a precaution to make sure you do not lose them.

So now Sent appears empty.

First try to fix Sent msf file:

  • Right click on the Sent folder and select 'Properties', then click on 'Repair Folder' and click OK.

then try to compact Sent folder:

  • Go into 'Offline' mode to ensure nothing gets received into 'Sent' and then right click on 'Sent' to see if it manages to compact the Sent folder.

If it fails to compact and produces a new nstmp:

  • Exit Thunderbird
  • Access profile
  • Delete the nstmp file and also the Sent-1 mbox and Sent-1.msf files.
  • Empty the computer Recycle Bin.

Restart Thunderbird and the Sent files should get created.

It might be useful to keep the 'Sent Store' folder for all the older sent mail or Archive it or if you really want to you can move the emails back into the 'Sent' folder.

Please report on whether this fixed the problem.

auto compacting - I dug through the settings now and was not able to locate it.

  • Settings > General
  • scroll down to 'Network & Disk Space'
  • Under 'Disk Space'
  • Is this checkbox selected 'Compact all folders when it will save over' ?
  • What is the number of MB set ?
  • Is this checkbox selected 'Ask every time before compacting' ?
Flags: needinfo?(omry)

Thanks for the detailed answer.

It sounds like the Sent-1 file is too big to compact. Possible causes: Thunderbird was interrupted during compacting (e.g. by a shutdown, a crash, or mail arriving in the folder being compacted); another program was scanning the file; there was not enough free disk space or memory to compact such a large folder; or the file is corrupted in some way. 50GB is very large for a single text file, and it looks like compacting has been failing for a while, so the file is full of old deleted mail.

I have 1 TB of free storage so it's not out of disk space (except when Thunderbird fills it all up with nstmp).
Normally, I never delete things from sent items (I have messages as old as 2005 there).
I already deleted the Sent-1 file, thinking Thunderbird will rebuild it from the server but it looks like it's satisfied with Sent-1.msf and did not recreate Sent-1.
I did not run into the problem since I opened the bug report.

Does it say anything in the Error console?

Manually compacting Sent works now and there is nothing printed to the Error Console.

Is this checkbox selected 'Compact all folders when it will save over' ?
Yes.

What is the number of MB set ?
20 MB

Is this checkbox selected 'Ask every time before compacting'?
No.

Flags: needinfo?(omry)

I have 1 TB of free storage so it's not out of disk space

It's not about the actual size of total free disk space, but maybe due to not being able to open such a large file in the current folder OR RAM not being able to handle such a large file.

I already deleted the Sent-1 file, thinking Thunderbird will rebuild it from the server but it looks like it's satisfied with Sent-1.msf and did not recreate Sent-1.

That sounds like you are not downloading full copies of emails into that particular folder, only headers are being downloaded, hence you only see a .msf file. That Sent.msf file may be much smaller as well. If you were intending to create a backup of profile then you would need to ensure you are downloading full copies of emails first.

  • 'Account Settings' > 'Synchronisation & Storage'
  • click on 'Advanced' button and select checkboxes for all folders which you want to download emails.

Is this checkbox selected 'Ask every time before compacting'?
No.

I would suggest you enable that option by selecting the checkbox. Why? Because then you will be aware that compacting is occurring, which means you know not to exit Thunderbird or, for example, send email while it is compacting the Sent folder.

Attached image Screenshot_1.png
Attached image Screenshot_2.png

It's not about the actual size of total free disk space, but maybe due to not being able to open such a large file in the current folder OR RAM not being able to handle such a large file.

I understand that there could be many reasons for a failure, but regardless of the reason, the temp file should be deleted afterwards.
In fact, it should probably be cleared on startup as well if it's there.

I ran into a problem where the indexing process kept getting stuck on message 25 of many thousands in some folder.
I backed up the profile and cleared the imap mail directory and allowed Thunderbird to download everything again.
Manual compacting finishes for Sent items without an error (I was never able to reproduce it manually).
However, indexing still gets stuck on the same folder. The Error Console does not show anything that is obviously related (see attachments).

Thanks for adding more details.
I see nothing interesting in the error console.

Can't stress enough that you should attempt to trim your sent folder size. Suggest target to reduce by half.

  1. Add the message size column, sort on it, and delete as many of the larger messages as possible. Messages which contain attachments, for example images, are twice as big as their attachment sizes and so very inefficient - if their contents have been saved elsewhere you should delete the message or delete the attachments from the message.
  2. Sort on message date and delete as many of the older messages as possible.
  3. Archive remaining messages by year.

Some additional thought is needed to resolve the compact issue. One possibility is antivirus interfering.

Can't stress enough that you should attempt to trim your sent folder size. Suggest target to reduce by half.

I think you are still reacting to the inexplicable 50GB from the first screenshot.
My current size (after fully downloading the Sent folder) is 3GB.

Also, to clarify: I did not reproduce the compact issue when it creates a massive file since I deleted my entire IMapMail dir and reconstructed it.
At this point I think there was some corruption.
In general, I think in this day and age it would be better to use sqlite to store the messages instead of creating a custom flat file implementation. That would be much more robust and reliable, if a bit slower.

Re virus scanner:
I disabled the realtime windows virus protection and it doesn't make a difference for the indexing.
I am also running a virus scanner on the entire mail directory in the mail server now to be sure there are no mines there that can trigger the realtime virus protection.

When I disable the indexing for the 2009 folder, Thunderbird is able to download and index everything.
At this point I can either leave it excluded, or try to hunt down the problematic message.
Is there a way to enable verbose logging for the indexer?
If not, I can probably do a binary search by moving messages around in that folder until I find the offending message.

Thanks for setting me straight.

In general, I think in this day and age it would be better to use sqlite to store the messages instead of creating a custom flat file implementation. That would be much more robust and reliable, if a bit slower.

It is headed in this direction - work is in progress.

I disabled the realtime windows virus protection and it doesn't make a difference for the indexing.

Good to know.

When I disable the indexing for the 2009 folder, Thunderbird is able to download and index everything.
At this point I can either leave it excluded, or try to hunt down the problematic message.
Is there a way to enable verbose logging for the indexer?

I don't foresee us attempting to twiddle old code that is soon to be replaced - cost/benefit - especially considering AV is somehow implicated. Unless there is a clear smoking gun or the code change out gets delayed.

It is headed in this direction - work is in progress.

Good to hear.

I don't foresee us attempting to twiddle old code that is soon to be replaced - cost/benefit - especially considering AV is somehow implicated. Unless there is a clear smoking gun or the code change out gets delayed.

Is the indexer being rewritten as well?
It feels like it hits a bad message and either gets stuck processing it or maybe an error is causing endless retries.
I was hoping to be able to pinpoint the offending msg via verbose logging of the indexer (that hopefully logs details about each msg as it's indexing it) and provide the offending msg to a new bug in the indexer (if the problem is of relevance to other people).

I am able to reproduce by manually compacting my Sent items.
The Sent file grew back to 47GB, even though the sent items folder on the server is only 1.5GB.
When compacting, I can see nstmp growing beyond the size of the Sent file which I believe should never happen while "compacting".
As a side note, exiting TB with alt-f4 while it's compacting does not delete nstmp.

A few additional observations:
I deleted Sent and Sent.msf and redownloaded the folder fully.
As mentioned, the size on the server is around 1.5GB.
The local size on disk after downloading is matching that.
One strange thing, toward the end of the download, Thunderbird got stuck with the last few messages.
After restarting it, it wouldn't download any new messages and things seemed ok.

Manually compacting does not do anything at this stage (it finishes immediately and reports 0KB saved).

I should still be able to reproduce by unpacking the 50GB Sent and corresponding msf.

As a side note, exiting TB with alt-f4 while it's compacting does not delete nstmp.

If the compacting process is terminated, e.g. by exiting Thunderbird, then the nstmp file is left behind rather than deleted. That is why you should not interrupt the compacting process.
You see nstmp because the compacting process was interrupted or could not complete for some reason. Maybe the Sent file was corrupted or contained a corrupted email.

In the topicbox beta forum there has been a report of a large nstmp file. The compacting process was not able to complete, hence an nstmp file of enormous size.
Users reporting the issue did have some emails that were missing headers and garbled text.
After deleting those 'bad' messages, the compacting process worked ok.

Thanks Anje,

If the compacting process is terminated, e.g. by exiting Thunderbird, then the nstmp file is left behind rather than deleted. That is why you should not interrupt the compacting process.

I would call this a bug.

  1. The tmp file should not be left there on orderly exit. (e.g. not kill -9 or a power outage).
  2. The tmp file should probably be cleaned up on startup.

You see nstmp because the compacting process was interrupted or could not complete for some reason. Maybe the Sent file was corrupted or contained a corrupted email.

Everything is possible of course.
As a user I find it hard to debug though. I have 26k mails in my Sent items.
Whatever corruption causes it to grow from 1.5GB to 50GB, it happened at least twice.

In the topicbox beta forum there has been a report of a large nstmp file. The compacting process was not able to complete, hence an nstmp file of enormous size.

Can you point to that message?
I did not read the code of the compacting process, but I would guess it's simply iterating over the messages, writing each non-deleted message to nstmp, and then swapping nstmp with the flat file.
If this is the case, the only situation where the compacting process would get stuck is if the iteration on the mails in the folder is infinite.
It should be possible to detect such a case (simply by observing that we iterated more emails than exist in the folder) and automatically repair the folder by redownloading it from the server.
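
The guess above, plus the proposed guard, could look roughly like the following Python sketch. It is an illustration of the idea only, not the Thunderbird algorithm; compact_messages, CompactionAborted, and the (deleted, raw_bytes) message representation are all hypothetical.

```python
class CompactionAborted(Exception):
    """Raised when iteration yields more messages than the folder holds."""


def compact_messages(messages, expected_count, out):
    """Copy non-deleted messages to out and return how many were kept.

    messages yields (deleted, raw_bytes) pairs; expected_count is the
    number of messages the folder index claims to contain.
    """
    seen = 0
    kept = 0
    for deleted, raw in messages:
        seen += 1
        if seen > expected_count:
            # The iteration is producing more messages than exist:
            # stop instead of letting the temp file grow without bound.
            raise CompactionAborted(
                f"iterated {seen} messages, expected at most {expected_count}"
            )
        if not deleted:
            out.write(raw)
            kept += 1
    return kept
```

A caller could treat CompactionAborted as a signal to discard the temp file and repair the folder.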

Users reporting the issue did have some emails that were missing headers and garbled text.
After deleting those 'bad' messages, the compacting process worked ok.

Any help in detecting such 'bad' messages would be appreciated.

A few people have experienced this, so we can confirm it.
Ben, can you please investigate this and give it top priority?

Severity: -- → S1
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(benc)
Priority: -- → P1
See Also: → 1879897
Component: Untriaged → Database
Product: Thunderbird → MailNews Core
See Also: → 1872849

Sorry, that was meant for bug 1872849

I have this exact issue, dozens of useless nstmp files wasting 150 GB for an inbox weighing only 15 GB... This is a serious bug. Why not always clean up orphan files upon startup? What to do in the meantime?

(In reply to Erwin from comment #23)

I have this exact issue, dozens of useless nstmp files wasting 150 GB for an inbox weighing only 15 GB... This is a serious bug. Why not always clean up orphan files upon startup? What to do in the meantime?

Compact overhaul is nearing completion. Then we can give better advice.

Do you have automatic compact disabled?

Flags: needinfo?(traderwin)

(In reply to Wayne Mery (:wsmwk) from comment #24)

(In reply to Erwin from comment #23)

I have this exact issue, dozens of useless nstmp files wasting 150 GB for an inbox weighing only 15 GB... This is a serious bug. Why not always clean up orphan files upon startup? What to do in the meantime?

Compact overhaul is nearing completion. Then we can give better advice.

Do you have automatic compact disabled?

Automatic compact is enabled (provided it'll save at least 200 MB).

Flags: needinfo?(traderwin)

Automatic compact is enabled (provided it'll save at least 200 MB).

For now, suggest you disable automatic compact.

omry, Are you still seeing this problem with 125.0b1 or newer?

The patch for bug 1872849 is in 125.0b1 (available 2024-03-20/21) and newer.

Flags: needinfo?(omry)

(In reply to Wayne Mery (:wsmwk) from comment #27)

omry, Are you still seeing this problem with 125.0b1 or newer?

The patch for bug 1872849 is in 125.0b1 (available 2024-03-20/21) and newer.

I was just on a 125.0bx version (earlier than beta 4, but on 125) and had this happen. I reported it yesterday as bug 1889755.

Duplicate of this bug: 1889755

I did not see it for a while, but I did have my auto-compaction disabled per troubleshooting done here.
I re-enabled it and will report here if I see it again.
I just restarted my thunderbird for an update, my current version is 125.0b4.

Using 125b4 - still see nstmp fill the whole disk.

Clearing needinfo under the assumption that it still exists. I will still report if I run into it myself with 125b4 or newer.

Flags: needinfo?(omry)

I think this might be the same issue as Bug 1890135.
If so, the problem occurs when you compact folders which have had all their messages deleted.
Can anyone confirm this?

If it's not the case, then I have actually re-written all the message compaction code, and will be aiming to get that reviewed to land in the next week or so. It's much simpler and more robust than the old compaction code, and much better at cleaning up after errors!

Flags: needinfo?(benc)
See Also: → 1890135

No, the problem here is not related to compacting a folder after deleting all the messages.
Good to hear you rewrote that code. Can you point to your diff?

A few points/ideas that would make things more robust (some mentioned above):

  1. Diligent error handling while writing the tmp file.
  2. Making sure that multiple processes/threads would not write to the same tmp file concurrently.
  3. Always cleaning up compaction tmp files on error.
  4. Clean up compaction tmp files on startup in case of an unclean shutdown during compaction.
  5. Ensure that the number of written messages while compacting does not exceed the number of messages in the folder being compacted.
See Also: → 1890230
See Also: → 1890448

(In reply to omry from comment #34)

No, the problem here is not related to compacting a folder after deleting all the messages.
Good to hear you rewrote that code. Can you point to your diff?

See Bug 1890448. There's a patch in phabricator attached there.

A few points/ideas that would make things more robust (some mentioned above):

  1. Diligent error handling while writing the tmp file.

Indeed.

  2. Making sure that multiple processes/threads would not write to the same tmp file concurrently.

There is some folder locking, but nothing at the FS level. All folder stuff is on the main thread anyway (so the JS frontend can use it).

  3. Always cleaning up compaction tmp files on error.

Yup. Using an existing file class which deletes itself if an explicit commit is not issued.

  4. Clean up compaction tmp files on startup in case of an unclean shutdown during compaction.

Nope... but that's mostly covered by 3. A hard power-down or extreme crash would probably leave something behind, I think...
Potentially could add a startup check, but there are some corner-case issues there too.

  5. Ensure that the number of written messages while compacting does not exceed the number of messages in the folder being compacted.

Not a problem with the new code. The store iterates through the existing messages as it writes out the new mbox, and never revisits ones it's already written or skipped.

See Bug 1890448. There's a patch in phabricator attached there.
Thanks.

  2. Making sure that multiple processes/threads would not write to the same tmp file concurrently.

There is some folder locking, but nothing at the FS level. All folder stuff is on the main thread anyway (so the JS frontend can use it).
Is it? There is a progress bar on compaction and the UI is not locked as far as I remember.

As for multiple processes:
It looks like attempting to open TB a second time focuses the first window instead of opening a new instance, so this is probably safe to ignore and is unlikely to be related anyway.

  4. Clean up compaction tmp files on startup in case of an unclean shutdown during compaction.

Nope... but that's mostly covered by 3. A hard power-down or extreme crash would probably leave something behind, I think...
Potentially could add a startup check, but there are some corner-case issues there too.

What I have done before in cases like this was to drop a pid file in the folder, and check it on startup.
If the PID inside the pid file is different than my process pid and if the file was not touched recently (per file system metadata) I assume it's a stale file and clean it up (along with the old pid file).
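
A rough Python sketch of that pid-file check, under stated assumptions: the function name is_stale_pid_file, the max_age_seconds threshold, and the choice to treat a missing or unreadable pid file as "not stale" are all illustrative and not part of any Thunderbird code.

```python
import os
import time


def is_stale_pid_file(pid_file, my_pid, max_age_seconds):
    """Return True if pid_file belongs to another process and is old.

    "Old" means the file's modification time (from file system
    metadata) is more than max_age_seconds in the past.
    """
    try:
        with open(pid_file) as f:
            owner_pid = int(f.read().strip())
        age = time.time() - os.path.getmtime(pid_file)
    except (OSError, ValueError):
        # Missing or unreadable pid file: be conservative and report not stale.
        return False
    return owner_pid != my_pid and age > max_age_seconds
```

On startup, a caller would delete the temp file and the pid file only when this returns True. The running compaction would need to touch the pid file periodically so that a long compaction is not mistaken for a stale one.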

  5. Ensure that the number of written messages while compacting does not exceed the number of messages in the folder being compacted.

Not a problem with the new code. The store iterates through the existing messages as it writes out the new mbox, and never revisits ones it's already written or skipped.

I suggested it as a simple fallback protection mechanism. If the file grows indefinitely, the problem is likely either an infinite stream of messages coming in while iterating or very large messages being written due to some bug.
We can punt on this for now, as a total rewrite like yours has a good chance of working out the underlying bug anyway.

Duplicate of this bug: 1890290
Blocks: 1879897
See Also: 1879897
Duplicate of this bug: 1892566
Duplicate of this bug: 1896640
Duplicate of this bug: 1879897
Duplicate of this bug: 1883001
Blocks: 1890230
See Also: 1890230
Summary: nstmp file filling up the whole hard drive → nstmp file filling up the whole hard drive. (caused by compact)

FWIW, Bug 1798181 (when compact runs out of disk, partial nstmp is left behind, disk remaining full) was fixed in version 102; it was a listener issue.

See Also: → 1798181

Duplicate of this bug: 1899696

With bug 1890448 now fixed, I guess this bug is for cleaning up cases where nstmp files already got left behind.

Summary: nstmp file filling up the whole hard drive. (caused by compact) → nstmp file filling up the whole hard drive. (caused by compact) - need to clean up nstmp that already filled up the disk
Duplicate of this bug: 1901557
See Also: → 1901716

Ben, can you tackle this?

Assignee: nobody → benc

(In reply to Alessandro Castellani [:aleca] from comment #48)

Ben, can you tackle this?

Just to be clear, this means looking for and deleting nstmp files from failed compaction attempts before the compaction rewrite, right?
I'm a little nervous about just deleting any file matching nstmp(-[0-9]+)?... you just know there's someone out there with folders called "nstmp".
It'd have to be matched up against the folder structure to make sure you weren't deleting a legit mbox called "nstmp-53".

Would it make more sense to do this in javascript as part of a startup migration-check pass?

Flags: needinfo?(alessandro)

Yes indeed, I was thinking a startup migration check should be the way to go.
Do we have a way to safely detect only those files that have been generated by folder compaction? To make sure we exclude any user generated content.
NI Geoff and Magnus to get some validation or suggestions here.

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(geoff)
Flags: needinfo?(alessandro)

That's why it would probably be a good idea to write those temp files to a dedicated directory instead of the same location as the actual mailboxes.

Best I've got is to check a candidate file doesn't match the filePath or summaryFile of any nsIMsgFolder. If we're only looking for things starting with nstmp speed shouldn't be any more of an issue compared to iterating through the files.

This wouldn't prevent accidentally deleting files in accounts that have been removed from Thunderbird but not from the file system. I guess we could look only in the files of active accounts.
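
The check described above can be sketched in Python as follows. This is only an illustration of the matching logic; the real work would use nsIMsgFolder data, and the function find_orphan_nstmp and its known_folder_paths argument (standing in for every folder's filePath and summaryFile) are assumptions made for the example. The pattern is the nstmp(-[0-9]+)? one mentioned earlier in this bug.

```python
import os
import re

# File names that look like compaction temp files.
NSTMP_RE = re.compile(r"nstmp(-[0-9]+)?")


def find_orphan_nstmp(mail_dir, known_folder_paths):
    """List nstmp-like files under mail_dir that no known folder owns."""
    known = {
        os.path.normcase(os.path.abspath(p)) for p in known_folder_paths
    }
    orphans = []
    for root, _dirs, files in os.walk(mail_dir):
        for name in files:
            # fullmatch, so "nstmp.msf" or "my-nstmp" are never candidates.
            if not NSTMP_RE.fullmatch(name):
                continue
            path = os.path.normcase(os.path.abspath(os.path.join(root, name)))
            if path not in known:
                orphans.append(path)
    return sorted(orphans)
```

A real folder called "nstmp-53" is skipped as long as its path is in known_folder_paths. Restricting mail_dir to the directories of active accounts would address the removed-account concern.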

Flags: needinfo?(geoff)

(In reply to Frank Winkler from comment #51)

That's why it would probably be a good idea to write those temp files to a dedicated directory instead of the same location as the actual mailboxes.

You need to write it to the same filesystem, so you can be sure of an atomic(ish) replacement.
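
As a small illustration of the same-filesystem constraint, here is a Python sketch that compares device IDs before relying on a rename. The helper name same_filesystem is hypothetical; a rename is generally only expected to be atomic when source and destination live on the same filesystem.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A temp location would be acceptable for compaction only if same_filesystem(temp_dir, folder_dir) holds. Writing the temp file into the folder's own directory satisfies this trivially.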

(In reply to Geoff Lankow (:darktrojan) from comment #52)

Best I've got is to check a candidate file doesn't match the filePath or summaryFile of any nsIMsgFolder. If we're only looking for things starting with nstmp speed shouldn't be any more of an issue compared to iterating through the files.

This wouldn't prevent accidentally deleting files in accounts that have been removed from Thunderbird but not from the file system. I guess we could look only in the files of active accounts.

What Geoff said :-)
But if it's a startup/migration-check routine, do we have the accounts and folder structure up and running by that time?

It could be done at folder-compaction time (in c++, ick ;-), but it does mean looking at folders outside the one you're actually compacting which just feels a little odd (but I'm sure I'll get over it!)

Also bear in mind these files can be in a non-standard location, not in the profile directory proper. People do all kinds of weird, wacky stuff in the name of efficiency, performance or whatever comes to mind.

The accounts etc. are available for migration tasks yes.
I don't have much to add. Both migration and doing it at compact time have their own merits.

Flags: needinfo?(mkmelin+mozilla)

(In reply to Ben Campbell from comment #53)

(In reply to Frank Winkler from comment #51)

That's why it would probably be a good idea to write those temp files to a dedicated directory instead of the same location as the actual mailboxes.

You need to write it to the same filesystem, so you can be sure of an atomic(ish) replacement.

A temporary subdirectory with a unique name that cannot be otherwise created would solve this?

(In reply to everx80 from comment #56)

A temporary subdirectory with a unique name that cannot be otherwise created would solve this?

Yeah, that'd probably do it, but I'm not planning to make further changes to that just now. My compaction rewrite a few weeks back uses an existing file class from m-c (via NS_NewSafeLocalFileOutputStream()) which creates a unique file and automatically deletes it if an explicit commit is not performed. The old code required explicit cleanup handling, and there were lots of complicated error conditions that could mean it'd get skipped.
So the new code is waaaaaay more robust and better at cleaning up.

Yes, it's probably possible to screw it up by abruptly killing (rather than shutting down) the app at a critical point, or powering off... but I'm not so worried about that. That way lies fsync() madness :-) The benefits of using the battle-tested NS_NewSafeLocalFileOutputStream() rather than rolling my own seem like a good trade-off for now.
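
To illustrate the "deletes itself unless committed" behaviour described above, here is a Python analogue. It is a hypothetical stand-in written for this example, not the Mozilla API: the class name SafeOutputFile and its commit() method are assumptions, and nothing here reflects how NS_NewSafeLocalFileOutputStream() is implemented.

```python
import os
import tempfile


class SafeOutputFile:
    """Temp file that replaces target_path only if commit() is called."""

    def __init__(self, target_path):
        self.target_path = target_path
        folder_dir = os.path.dirname(os.path.abspath(target_path))
        # Unique temp file in the same directory as the target.
        fd, self.tmp_path = tempfile.mkstemp(prefix="nstmp", dir=folder_dir)
        self.file = os.fdopen(fd, "wb")
        self._committed = False

    def commit(self):
        """Close the temp file and move it over the target."""
        self.file.close()
        os.replace(self.tmp_path, self.target_path)
        self._committed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if not self._committed:
            # No explicit commit: discard the temp file, keep the target.
            self.file.close()
            if os.path.exists(self.tmp_path):
                os.remove(self.tmp_path)
        return False  # never swallow exceptions
```

Usage would be `with SafeOutputFile(path) as out: out.file.write(data); out.commit()`. Leaving the block without calling commit(), whether by an early return or an exception, removes the temp file and leaves the original untouched.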
