Closed Bug 1833665 Opened 10 months ago Closed 8 months ago

TB115 Large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes)

Categories

(Thunderbird :: Folder and Message Lists, defect, P1)

Thunderbird 115
Unspecified
All

Tracking

(thunderbird_esr102 unaffected, thunderbird_esr115+ fixed, thunderbird115 wontfix, thunderbird116+ fixed)

RESOLVED FIXED
117 Branch
Tracking Status
thunderbird_esr102 --- unaffected
thunderbird_esr115 + fixed
thunderbird115 --- wontfix
thunderbird116 + fixed

People

(Reporter: markhfm, Assigned: darktrojan)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, regression, Whiteboard: [supernova3p])

Attachments

(2 files, 1 obsolete file)

Attached file Thunderbird Daily Memory Usage.docx (obsolete) —

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0

Steps to reproduce:

Deleted a large number of emails from trash folder.

Actual results:

The deletion took a long time (over 4 minutes) and Thunderbird Inbox for the accounts consumed over 13 Gig of memory.

Expected results:

The mail deletion should have taken less time AND Thunderbird should not consume that amount of memory.

Deleted a large number of emails from trash folder.

Please be more clear on the number of messages.
And, did it it not fail for you in the same way in previous versions of Thunderbird?
Thanks.

Flags: needinfo?(markhfm)
Version: 15 Branch → Thunderbird 115

The deleted messages were in the hundreds; I am not sure of the exact number.
The deletion did eventually complete successfully. (Note that while this was executing, Task Manager listed the inbox as not responding.)
I have not had this issue with the Beta version of Thunderbird, though I cannot say I have tried to delete the exact number of "trash folder" messages from both versions.

In general deleting emails from the trash folder does seem to take longer in Daily than in Beta. I do not have any timed, equal number of emails comparisons however.

Flags: needinfo?(markhfm)
Keywords: perf
Summary: Thunderbird Daily Consumes Large Amounts of Memory → Thunderbird Daily consumes large amounts of memory

Reporter's screenshot extracted from comment 0, attachment 9334633 [details].

Attachment #9334633 - Attachment is obsolete: true

Bug 1828372 is remotely similar territory.

  • Btw, Mark, bug seen on POP or IMAP?
  • Can you reproduce with ≡ > Help > Troubleshoot Mode…?

CC'ed Sean and Gene.

See Also: → 1828372
Summary: Thunderbird Daily consumes large amounts of memory → Thunderbird Daily consumes large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes)

I tested with >600 messages on Mac with 115.0a1 (2023-05-18) (64-bit). Results were virtually instantaneous, and no significant increase in memory.
gmail, imap.

Can you reproduce with ≡ > Help > Troubleshoot Mode…?

I suggest test with Windows started in safe mode

Flags: needinfo?(markhfm)
OS: Unspecified → Windows 10

Meanwhile, bug 1834056 reports pretty much the same (leaving open for separate analysis until we're sure it's a duplicate - feel free!)

See Also: 18283721834056

(In reply to Wayne Mery (:wsmwk) from comment #5)

I tested with >600 messages on Mac with 115.0a1 (2023-05-18) (64-bit). Results were virtually instantaneous, and no significant increase in memory.
gmail, imap.

  • What's the size of that folder before the deletion?
  • Has it been a Trash folder?
  • What's the size of those > 600 deleted messages?
Flags: needinfo?(vseerror)

(In reply to Thomas D. (:thomas8) from comment #6)

Meanwhile, bug 1834056 reports pretty much the same (leaving open for separate analysis until we're sure it's a duplicate - feel free!)

I'm seeing something very similar and possibly worse over at bug 1834056 comment 5

Something's not clear (to me) here. Is the reporter deleting messages from Inbox to Trash or emptying Trash folder? The summary description says the problem occurs when "...deleting hundreds of messages from Trash folder..." Should this actually say "...deleting hundreds of messages to the Trash folder..."?

Anyhow, assuming the reporter means "deleting to trash folder", what I describe at bug 1834056 comment 5 is exactly what the reporter sees. The more messages you are trying to move to trash, the longer it takes and the more memory TB consumes in preparing the single imap "move" command that does the transfer. If you select 1000's of emails to delete into trash, all your physical memory can be consumed and TB crashes for lack of memory. (TB mozilla framework doesn't seem to use any "swap" memory and there is plenty of that left, at least on my system.)

I don't know when this high memory consumption started but with a 110 daily I still have available to run (pre-SN), it doesn't happen and it doesn't happen with 102 release. However, even with those version, I see the deletion taking longer depending on the number of messages being deleted. If I try to delete a lot of messages like 30,000, it also takes a while to prepare the imap command and CPU usage is actually higher than with the current daily, but memory usage (per linux "top") remains constant and about 7% and never goes up and the UI doesn't respond. Once the imap command is built and sent, the UI responds again and the time to complete the move to Trash is mostly due to the imap server doing the moves to Trash internally and after maybe a few minutes reports back an OK response to the imap move command.

Status: UNCONFIRMED → NEW
Ever confirmed: true

My issue occurred when deleting emails from the Trash folder. Moving emails to the trash folder does not cause the issue for me. However, I have never moved hundreds of emails to the trash folder.

Also, my issue occurred on Windows 11 (Build 25357).

Flags: needinfo?(markhfm)

markhfm,
You are saying you are not emptying a Trash folder with a lot of message. You are selecting a large number of message that have gradually accumulated in Trash and doing delete on them. If you do this instead of "empty" trash, what happens is the selected messages just get marked with "deleted" imap flag. If you don't "compact" the trash folder after deleting the selected messages, the messages will still remain in TB database and on the server (but not really related to the issue).

Anyhow, doing this with 6000 msgs in Trash, I see the same problem: log prints "entering", memory usages ramps up slowly, UI locked up and I get tired of waiting and/or tb crashes due to lack of memory. When I restart tb, the messages seem to be deleted but it's not true. Looking at "exists" response it still shows the 6000 messages. Have to do "repair" of Trash to show the messages again which takes a while because all headers have to be downloaded.

Another observation: If I remain selected on a folder (like Trash in this example) and do a repair, the download of headers is very slow. If I move off of the folder doing the repair, the download speeds up and quickly finishes. I think the constant message pane updates slows down the repair process. This is also observed when a lot of new messages arrive in a folder and have to be downloaded, so not directly caused by the repair.

Don't see these problems with 110 and earlier. Attaching daily with gdb while "entering" in progress and memory use ramping up, I don't see explicit TB code running and hogging the CPU. Just javascript, garbage collection, jit processing and eventually see something oddly called "SnowWhiteKiller::Visit()", but this might be right before memory is completely gone.

Note: while writing this comment and again deleting about 6000 msgs from trash, it actually finished OK but took an extreme long time. I think memory usage reached over 50% but I wasn't looking at "top" when it finished. I definitely saw a crash when trying to move about 30k msgs to trash previously.

Tried to debug this more with gdb. It appears that TB is quickly forming the URL to do the delete but something non-IMAP related is taking over the CPU and and blocking the imap activities.

Gene,
Thanks for your detailed response. I typically sort the items in the trash folder by sender and then delete them by selecting 50 to 70 at a time. I do this this as a check to ensure I did not inadvertently send an email to the trash folder that I actually need. I encountered the issue I initially referred to when I selected (Ctrl + A) the "remaining" several hundred (number is an estimate) emails and pressed the Del key.

Selecting all messages in the Trash folder and then using Empty Trash worked quickly and without memory spiking, although I had significantly fewer emails to delete.

Also, to answer Thomas D.'s question, this is on POP.

(In reply to markhfm from comment #12)

Also, to answer Thomas D.'s question, this is on POP.

I had assumed this was just an IMAP issue and I've only tested with that and saw it there.
I'll try similar activity on a local mailbox trash folder and see what happens.

Ok, I tried a moving 14000 message from a folder to trash and from trash back to the original folder, both within Local Folders (which would be equivalent to a POP3 folder for the most part). I see the same thing as with IMAP. It takes a long time and the UI freezes and memory usages slowly ramps up to about 50% (of 16G ram) before the move completes.
Doing this with 102 release, the memory usage remains constant at about 5% and the transfer of 14000 messages within Local Folders is much faster. However, even with 102 the UI locks up during the transfer which takes about 30 seconds vs. maybe 5 minutes for daily.
I didn't try deleting the messages from Trash since I wanted to keep the messages around (just test messages). But I expected I would see the same thing.

(In reply to Thomas D. (:thomas8) from comment #7)

(In reply to Wayne Mery (:wsmwk) from comment #5)

I tested with >600 messages on Mac with 115.0a1 (2023-05-18) (64-bit). Results were virtually instantaneous, and no significant increase in memory.
gmail, imap.

  • What's the size of that folder before the deletion?
  • Has it been a Trash folder?
  • What's the size of those > 600 deleted messages?

The messages involved were in a trash folder IIRC.
I don't know the size, but the size should not matter. For imap the limiting performance factor tends to be number of messages, not size.

Flags: needinfo?(vseerror)
Severity: -- → S3
Summary: Thunderbird Daily consumes large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes) → Large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes)

Let's assume this is supernova related.
We'd need to identify the source of the message MEMORY increase.

Blocks: sn-msglist
Whiteboard: [supernova3p]

WIndows 11, 64 bit, SolidState drive. TB 115.0b4
As an experiment, I copied over 6000 emails from a trash folder to trash on a second account, verified that they were there, and then emptied trash on the second account (the copies). Emptying was instantaneous. I didn't even really have time to start my stopwatch.
Copying the emails from the original trash to the other account took maybe 90 seconds at most.
If someone can tell me how to modify the experiment, I will try again.

I repeat the test I described in comment 14 with 14000 messages. It's working fine now. This is with a self-built 116.0a1 with debug enabled so it's not going to be optimally fast (but seems OK). So I would say this can be closed as INVALID or WORKSFORME since somehow it has become "fixed" as a side-effect of some other fix.

can be closed as INVALID or WORKSFORME

Maybe I need to do more tests. When I compacted out about 6000 messages from a IMAP trash folder, a major hang occurred. I even attached gdb to see where the hang was. It was the main task mentioning a lot of "Gecko" stuff on stack which I know nothing about. It didn't show any problems in TB specific code that I could tell. Anyhow, after I detached from gdb, the UI eventually freed up again and seems OK now.

Pinging Geoff and Ben to see if they can quickly investigate this on Daily with the latest improvements on message deletion.
If nothing comes up, I'm okay with tentatively marking this as WORKSFORME

Status: NEW → RESOLVED
Closed: 8 months ago
Flags: needinfo?(geoff)
Flags: needinfo?(benc)
Resolution: --- → WORKSFORME

Ouch, sorry, clicked the wrong flag.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

I think I'm seeing the same problem... but big caveats.
Self-built debug build, but haven't pulled new code since last week (It's also got all my mbox patches in there, so it could be something I've done wrong in there).
In any case, I'll do a full sync up tonight and try again with a fresh, unadulterated, up-to-date debug build tomorrow.

My test steps:

  1. Craft an mbox with 14000 dummy messages in it.
  2. copy the mbox file into my "Local Folders" as "fook".
  3. run TB and see there's a new local folder "fook", with 14000 messages in it.
  4. CTRL-a to select all the messages in "fook"
  5. Drag them onto the "Trash" folder

Observations:

  • the status bar message says it's "moved X messages of 14000 to Trash", counts up to about 13900... then UI stops responding for about 10 mins.
  • didn't see any runaway memory use. Usage went up by about 0.2GB while UI was locked up.
  • once it came back, it all seems to have worked fine.
  • copying back from Trash to another local folder produces the same UI lockup. Looking at the filesystem, all the messages seem to copy over into the destination folder mbox file nice and quick. All the messages seem there while the UI is locked.

Anyway. I'll do a new build and some proper debugging (printf-ing!) tomorrow.

I see the about the same thing when I move 14000 msgs from a maildir based local folder to local folder Trash (maildir, not mbox). UI locks up for maybe 5 minutes while the transfer occurs with top showing 100% cpu in TB process. The memory use goes up by about 2% from normal peak 6% to about 8% constant during the xfer and then goes back to normal after the xfer.

Tried a similar test with imap. Copied about 30k msgs to Trash which went OK. Then I tried to delete all the messages from trash and it did what I saw before (similar to comment 11): cpu went to 100% and memory use ramped up to around 50% and eventually TB crashed after about 40 minutes. Not sure what is causing this. Will look closer.

Easier steps to reproduce: Select all the messages in a local folder, and press SHIFT+DELETE (delete without copy to trash), and click yes on the confirmation dialogbox.

The slowdown is occurring while it's deleting the message headers from the msgDB
https://searchfox.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#1034

          for (auto hdr : msgHeaders) {
            rv = msgDB->DeleteHeader(hdr, nullptr, false, true);
          }

This loop runs at about 10 messages/second on my machine in this case, which means that it takes a loooong time.
The msgDB DeleteHeader() notifies DB listeners about the delete.

For each header delete, I hit an assertion in nsMsgDBView::GetIndicesForSelection():

ASSERTION: selection count is wrong: 'indices.Length() == uint32_t(count)'

Basically, indices.Length() counts down from 14000 (or whatever), but count is always out by 1, eg:
indices.Length()=3727 count=3728

The C++ call stack at the time of assertion is:
nsMsgDBView::GetIndicesForSelection()
nsMsgDBView::SelectionChangedXPCOM()
... javascriptland...

And the JS call stack is:

(gdb) js
0 _fireSelectionChanged() ["chrome://messenger/content/tree-selection.mjs":691:26]
    this = [object Object]
1 set selectEventsSuppressed(suppress = "false") ["chrome://messenger/content/tree-selection.mjs":674:11]
    this = [object Object]
2 adjustSelection(index = "0", count = "-1") ["chrome://messenger/content/tree-selection.mjs":656:4]
    this = [object Object]
3 rowCountChanged(index = "0", delta = "-1") ["chrome://messenger/content/tree-view.mjs":1052:20]
    this = [object HTMLElement]
4 rowCountChanged(index = "0", count = "-1") ["chrome://messenger/content/about3Pane.js":4566:19]
    this = [object Object]
5 cmd_shiftDeleteMessage() ["chrome://messenger/content/mailCommon.js":279:26]
    this = [object Object]
6 doCommand(command = ""cmd_shiftDeleteMessage"") ["chrome://messenger/content/mailCommon.js":674:37]
    this = [object Object]
7 anonymous() ["chrome://messenger/content/about3Pane.js":6117:22]
8 doCommand(command = ""cmd_shiftDelete"") ["chrome://messenger/content/mailCommon.js":674:37]
    this = [object Object]
9 goDoCommand(command = ""cmd_shiftDelete"") ["chrome://messenger/content/globalOverlay.js":99:17]
    this = [object Window]
10 oncommand(event = "[object XULCommandEvent]") ["chrome://messenger/content/messenger.xhtml":1:11]
    this = [object XULElement]

(I think depth 5 is where the delete operation as a whole was kicked off, and my guess is that 0-4 are per-message, called via notification from the msgDB DeleteHeader() call).

(In reply to gene smith from comment #23)

Tried a similar test with imap. Copied about 30k msgs to Trash which went OK. Then I tried to delete all the messages from trash and it did what I saw before (similar to comment 11): cpu went to 100% and memory use ramped up to around 50% and eventually TB crashed after about 40 minutes. Not sure what is causing this. Will look closer....

That was with a not quite up-to-date build. When I pull in the current tip I see on imap what Ben sees. When I select all my 30k messages in Trash and delete them (with move to trash selected) I see this (replacing the assert with a printf):

gds:length=30190, count=30191
gds:length=30189, count=30190
gds:length=30188, count=30189
gds:length=30187, count=30188
gds:length=30186, count=30187
gds:length=30185, count=30186
:

While this is printing (kind of slow, maybe 10 per sec?) I see memory usage ramping up and it eventually crashes before it finishes.
I think the assert may be because the display of selected messages is always changing while deleting.

But if I set "just mark as deleted" it works OK deleting 30k messages except for a very short UI lockup.
Maybe because the display is not changing except for a strike-through being drawn.

I suspect if you wrap part or all of nsMsgDBView::DeleteMessages with mJSTree->BeginUpdateBatch/EndUpdateBatch calls the problem might go away. Can you try that Gene?

Flags: needinfo?(geoff)

I think the ASSERTION: selection count is wrong: 'indices.Length() == uint32_t(count)' is a red herring. It's still an issue (which I'll write up as a separate bug), but just a minor cosmetic one.

I think the real problem is that it's constantly shrinking the selection, one message at a time, and causing loads of UI churn...
So, if you select 15000 messages and SHIFT-DELETE them, TreeSelection.adjustSelection() gets called 15000 times, each time just shaving off the first message in the selection. And each time it does that, it ends up calling TreeView.#doInvalidateRow() a whole bunch of times (I think once for each row on screen?). A lot of row invalidations. This seems wrong. There are some bits in the code which hint at disabling such events when doing bulk operations, but I really don't know it well enough to know how it should be done...
I could be wrong here - I'm not at all familiar with how the GUI decides when to draw (or not). So I think I've taken it about as far as I can. But NI me again if there's anything you want me to poke at further!

I guess the XUL treeview interface was all based around single-row operations. If we're not bound by that any more (we might be, I'm really not sure!), maybe it's worth considering swapping out methods for ones that operate on multiple rows?

(In reply to Geoff Lankow (:darktrojan) from comment #26)

I suspect if you wrap part or all of nsMsgDBView::DeleteMessages with mJSTree->BeginUpdateBatch/EndUpdateBatch calls the problem might go away. Can you try that Gene?

Note to self: Refresh browser before adding a comment!
I just tried slapping a BeginUpdateBatch() and EndUpdateBatch() pair at the start/end of nsMsgDBView::DeleteMessages(), and it seems to have done the trick!

(In reply to Ben Campbell from comment #27)

...There are some bits in the code which hint at disabling such events when doing bulk operations, but I really don't know it well enough to know how it should be done...

That'd be BeginUpdateBatch() and EndUpdateBatch() then :-)

I don't have time right now to do a proper patch, and I'll likely be away most of tomorrow, but this looks like the solution.
ScopeExit looks like it'd be a good way to ensure EndUpdateBatch() runs even if DeleteMessages() exits early...

Flags: needinfo?(benc)

It looks like if I hadn't screwed up and cleared the flag m_deletingRows in bug 1837041, this would be working properly now. I'm not sure what the problem was before that patch landed, but everything seems okay after I fix my mistake.

Assignee: nobody → geoff
Status: REOPENED → ASSIGNED

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/e5e02e731940
Stop clearing flag in nsMsgDBView::OnHdrDeleted. r=kaie

Status: ASSIGNED → RESOLVED
Closed: 8 months ago8 months ago
Resolution: --- → FIXED

The comment 30 patch fixes the main problem: deleting a large number of messages from trash now completes fairly quickly and doesn't use excess memory, at least for imap. I haven't tested it but should also work for Local Folders. Looks good on Local Folders too.

However, there are still problems that should maybe be put in another bug:

  • There is still a short period while deleting where the UI is not responding. However, 102 has the same issue when deleting lots of messages so probably OK.
  • After copying ~30K messages to Trash (or probably any other folder) often I see continuous attempts to imap SELECT trash folder if I try to open a messages in trash while the headers are downloading. The attempt to run the select URL always fails (maybe since trash is already selected) and tries again and never stops retrying and holds the CPU at 100+%. Not sure if memory use increases. The only way to stop this is to restart.
    Edit: need to try some more, but I haven't been able to duplicate this.
  • When downloading headers for a folder that just got the ~30K messages copied into, the header download is much slower (almost sluggish) when the folder is selected. If you move off of that folder the header download goes much faster.

The messages all get marked as read before being moved to the trash. This could be causing some of the not-responding time. And it happens before the user is prompted about deleting, which is bad. I plan to look at it today in a new bug.

As an aside, I've written up the assertion issue as Bug 1842083

See Also: → 793353

Nice work. Ugh, just missed this for beta.

Keywords: regression
OS: Windows 10 → All
Priority: -- → P1
Regressed by: 1837041
Summary: Large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes) → TB115 Large amounts of memory (13 GB) when deleting hundreds of messages from Trash folder, and takes long (4 minutes)
Target Milestone: --- → 117 Branch

(In reply to gene smith from comment #32)

The comment 30 patch fixes the main problem: deleting a large number of messages from trash now completes fairly quickly and doesn't use excess memory, at least for imap. I haven't tested it but should also work for Local Folders. Looks good on Local Folders too.

However, there are still problems that should maybe be put in another bug:

  • There is still a short period while deleting where the UI is not responding. However, 102 has the same issue when deleting lots of messages so probably OK.
  • After copying ~30K messages to Trash (or probably any other folder) often I see continuous attempts to imap SELECT trash folder if I try to open a messages in trash while the headers are downloading. The attempt to run the select URL always fails (maybe since trash is already selected) and tries again and never stops retrying and holds the CPU at 100+%. Not sure if memory use increases. The only way to stop this is to restart.
    Edit: need to try some more, but I haven't been able to duplicate this.
  • When downloading headers for a folder that just got the ~30K messages copied into, the header download is much slower (almost sluggish) when the folder is selected. If you move off of that folder the header download goes much faster.

No lack of bug reports, starting with the most recent :

  • Bug 1805356 - Thunderbird "not responding", for up to 30 minutes, while moving the messages to trash
  • Bug 1808614 - gui freeze while moving large amounts of email messages (mbox, Local Folders, offline)
  • more generally https://mzl.la/3OgJLOz, which includes
  • Bug 398684 - Virtual memory size of Thunderbird increases 40MB after each "Shift+Delete of all 40,000 mails and Compact folder" (large selection)

FYI bulk message operations is our number one cause of OOM crashes (according to user comments), and OOM is our #1 crash signature https://crash-stats.mozilla.org/topcrashers/?product=Thunderbird&version=102.13.0

Comment on attachment 9342544 [details]
Bug 1833665 - Stop clearing flag in nsMsgDBView::OnHdrDeleted. r=#thunderbird-reviewers

[Triage Comment]
Approved for beta

Attachment #9342544 - Flags: approval-comm-beta+

Comment on attachment 9342544 [details]
Bug 1833665 - Stop clearing flag in nsMsgDBView::OnHdrDeleted. r=#thunderbird-reviewers

[Triage Comment]
approved for esr115

Attachment #9342544 - Flags: approval-comm-esr115+
You need to log in before you can comment on or make changes to this bug.