Closed Bug 1988329 Opened 3 months ago Closed 16 days ago

Send unsent messages action hangs when outbox has many messages queued in ver 142

Categories

(MailNews Core :: Networking: SMTP, defect)

Thunderbird 142
defect

Tracking

(thunderbird_esr140 unaffected, thunderbird146 affected)

RESOLVED FIXED
147 Branch
Tracking Status
thunderbird_esr140 --- unaffected
thunderbird146 --- affected

People

(Reporter: lloyd, Assigned: gds)

References

(Regression)

Details

(Keywords: regression)

Attachments

(4 files)

Steps to reproduce:

Used Mail Merge add-on https://addons.thunderbird.net/en-us/thunderbird/addon/mail-merge/ to compose several hundred personalized emails to students at the start of term (as we have done in the past). The add on stored the queued messages in the Outbox for release after inspection. Selected "Send Unsent Messages" after the content was reviewed

Actual results:

Ten messages were sent then sending stalled. Activity Manager showed the stalled message. Restarting Thunderbird did not clear the backlog. Tried several times with 2 smtp servers.

Expected results:

All the messages in the Outbox should have been sent. Installed Thunderbird version 128 and repeated and all the messages sent.

On Windows 11 and Windows 10 machines.

Does the 'Error Console' mention anything about the stalling ?
if yes, it would be helpful to see what it says - upload an image file showing the error console.
Tools > Developer Tools > Error Console

Did you get any pop up error in sending etc type of message ? It would be useful to see an image of this.

You did not mention what server you were using.
Did you by any chance reach some sending limit set by the server which may have time lapsed by the time you had gone through the performance of downgrading.?
Could the stalling be coincidental as perhaps others were also sending a load of emails on same network at same time and it caused a temp network overload due to lots of traffic? I ask this as I note you have .edu email address, so it's possible you were on a campus network?

You mention trying two smtp outgoing servers, but email address would need to be ok to send using different server or is it same server just different email address - in which case is it same network?

If you upgrade Thunderbird to current version is the issue repeatable?

Flags: needinfo?(lloyd)

(In reply to Anje from comment #2)

Does the 'Error Console' mention anything about the stalling ?

Failed to send "RCPT TO: <address>" because socket state is closed

if yes, it would be helpful to see what it says - upload an image file showing the error console.
Tools > Developer Tools > Error Console

Did you get any pop up error in sending etc type of message ? It would be useful to see an image of this.

No. The number of messages counts down by 10 then stops. If I bring up the Activity Manager the blue line is bouncing back and forth while the text reads "Sending Message". It can stay in this state overnight.

You did not mention what server you were using.

Our internal smtp server and smtp.office365.com both failed.

Did you by any chance reach some sending limit set by the server which may have time lapsed by the time you had gone through the performance of downgrading.?

Could the stalling be coincidental as perhaps others were also sending a load of emails on same network at same time and it caused a temp network overload due to lots of traffic? I ask this as I note you have .edu email address, so it's possible you were on a campus network?

You mention trying two smtp outgoing servers, but email address would need to be ok to send using different server or is it same server just different email address - in which case is it same network?

With the Windows 64-bit Thunderbird 142.0 (Release date Aug 19 2025) on 3 computers and 4 separate user profiles I've found that the "Send Unsent Messages" option on the Outbox successfully sends a multiple of 10 of the queued messages before stalling on the next message and staying hung for many hours. Restarting Thunderbird might release another batch of 10, but not reliably, and will again stall as seen in the Activity Manager.

Happens even in a profile without any third party add-ons installed where email messages are written manually and queued to the Outbox using File => Send Later (Ctrl-Shift-Enter)

When using Ver 142 and using the add-on with Delivery Mode "Send Now" ( https://addons.thunderbird.net/user-media/previews/full/206/206934.png ) where the messages are not queued up in the Outbox for inspection, but instead passed to the smtp server immediately after the add-on composes them, equivalent to File => Send Now (Ctrl-Enter), all the messages are sent.

The addresses used for testing are all internal and the logs on our server indicate 10 messages sent then the client apparently closes the connection or stalls. The same addresses work fine from older versions of Thunderbird (128.0esr, 140.0esr on Window, 128esr on Linux) Download and upgrade to 142.0 Release Channel from https://www.thunderbird.net/en-US/thunderbird/all/ and the failure to send problem returns with the same test conditions.

If you upgrade Thunderbird to current version is the issue repeatable?

The un-install / downgrade / upgrade testing has all been done on the same Windows 11 machine. 142.0 is the current version from the link above.

Flags: needinfo?(lloyd)
Attached image image.png

Screen shot showing

  1. Error Console message socket state is closed
  2. Activity Manager stalled sending message

You can try setting mail.smtpserver.default.max_messages_per_connection to something other than 10

Lloyd, can you try comment 5?

(In reply to Magnus Melin [:mkmelin] from comment #5)

You can try setting mail.smtpserver.default.max_messages_per_connection to something other than 10

It sends up to the new limit then hangs on sending the next message

(In reply to Lloyd Smith from comment #7)

(In reply to Magnus Melin [:mkmelin] from comment #5)

You can try setting mail.smtpserver.default.max_messages_per_connection to something other than 10

It sends up to the new limit then hangs on sending the next message

Thanks for feedback.
Maybe you can check to see what sending limits apply to the relevant smtp server. Some have daily sending limits, others there may be an hourly sending limit.
Then you can set the preference to something below that limit to ensure you are able to send as much as possible.

If I install an older release of Thunderbird (eg. 140.3.0esr ), run a mail merge of say 200 messages set to accumulate in the Outbox then "Send unsent messages" all the messages are sent and the Outbox clears. If I then install the latest Thunderbird 143.0 from the monthly release channel and repeat the test, only 10 messages send and the next message stalls. Thunderbird is still usable, the Outbox just never clears. No changes in what smtp server is used and mail.smtpserver.default.max_messages_per_connection is 10 for 140.3.0esr same as 143.0.

We can use the current version if we bypass the Thunderbird Outbox; which is what staff are now doing. I've had them modify their workflow to use the Mail Merge add-on set for delivery mode "Send Now".

Previously, (and with 140.3.0esr), they were able to inspect the generated emails for correctness, (spelling, addressing, attachments, etc.), then release the messages from the Outbox. Now with Tbird 142 and as of last week also ver 143 once they are content with the message appearance they just discard the messages in Outbox and re-run the merge sending the messages directly.

FWIW, our smtp server is Postfix. We see the same issue with smtp.office365.com, but I did one test with smtp.gmail.com and all 200 test messages sent, so maybe there is a server component.

Most of our Linux systems have an ESR version of Thunderbird, which still works. I did try v142.0 on one Linux machine and it failed to clear the Outbox also, so I don't think it's just the Windows version.

I am the developer of the add-on Mail Merge.

A user of my add-on recently reported a problem that Thunderbird hangs after a few messages when using "File -> Send Unsent Messages" - starting with Thunderbird 142. (Thunderbird 140 is not affected!) While I was investigating the issue, I found this bug report, which seems to be the same problem.

I was able to reproduce the problem with my day-to-day profile for multiple identities and multiple email providers: After 10 messages Thunderbird hangs and does not send any further messages from the "Outbox"; afterwards the menu item "File -> Send Unsent Messages" is deactivated.

It is also no longer possible to send new messages via the toolbar button "Send" or the menu item "Send Now" in the compose window for the affected identity. (Or via Mail Merge with "Send Now" as "Deliver Mode".)

I have run mozregression-gui, but - because the problem is not 100% reproducible for me - I wasn't able get a meaningful result. However, I am pretty confident that 2025-06-29 is the last known good and 2025-07-01 is the first known bad version - I have confirmed this through multiple manual test runs. This is also in line with Bug 1965690, which I suspect to be the culprit and which eventually landed in Thunderbird 142.

The exact number of successful delivered messages depends on the setting mail.smtpserver.default.max_messages_per_connection = 10 and it seems to also depend on a bit of luck or misfortune. I suspect there to be some kind of timing problem involved.

Here is the relevant SMTP log:

mailnews.smtp: Reusing a connection SmtpClient.sys.mjs:136:19
...
mailnews.smtp: Total RCPTs during this connection: 10 SmtpClient.sys.mjs:1297:19
...
mailnews.smtp: Message sent successfully. SmtpClient.sys.mjs:1359:21
mailnews.smtp: C: QUIT SmtpClient.sys.mjs:683:19
mailnews.smtp: Reusing a connection SmtpClient.sys.mjs:136:19    <--- ?????
mailnews.smtp: C: MAIL FROM:<xxx> BODY=8BITMIME SIZE=1512 SmtpClient.sys.mjs:683:19
mailnews.smtp: S: 221 2.0.0 Bye SmtpClient.sys.mjs:452:17
mailnews.smtp: Socket closed. SmtpClient.sys.mjs:550:17
mailnews.smtp: MAIL FROM successful, proceeding with 1 recipients SmtpClient.sys.mjs:1230:17

This last "Reusing a connection" is disturbing, because I don't think it should happen at all. Thunderbird already has sent 10 messages over the same connection and sent QUIT to the server. Nevertheless it apparently tries to send yet another message over the still open connection, which is going to be closed imminently.

When I set mail.smtpserver.default.max_messages_per_connection = 1, which should effectively disable reusing a connection, then Thunderbird apparently still tries to reuse a connection:

mailnews.smtp: Message sent successfully. SmtpClient.sys.mjs:1359:21
mailnews.smtp: C: QUIT SmtpClient.sys.mjs:683:19
mailnews.smtp: Reusing a connection SmtpClient.sys.mjs:136:19
mailnews.smtp: S: 221 2.0.0 Bye SmtpClient.sys.mjs:452:17
mailnews.smtp: Socket closed. SmtpClient.sys.mjs:550:17

Unfortunately I was not yet able to reproduce the problem in a new profile.

Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: SMTP
Ever confirmed: true
Keywords: regression
Product: Thunderbird → MailNews Core
See Also: → 1965690

My add-on Mail Merge is not directly involved wrt the problem:

The problem occurs even if add-ons are disabled or not installed and even in "Troubleshoot Mode". Interestingly delivering the individual messages directly via Mail Merge with "Send Now" as "Deliver Mode" does work without problems and doesn't trigger the problem. (As long as the problem hasn't occured before.)

The only indirect effect of Mail Merge wrt the problem: My users tend to create a lot of messages in the "Outbox", which are then delivered by Thunderbird itself via "File -> Send Unsent Messages".

I have been in close contact with the user, who originally reported the problem to me.

He is actually able to reproduce the problem in a new profile - reliably!

He just tested Daily 2025-06-29 and Daily 2025-07-01 - the two versions I suspected as last known good and first known bad in my comment #10 - but both versions didn't work for him and show the same problem. This means that Bug 1965690 may not be the culprit after all. (As I am still not able to reproduce the problem reliably I don't trust my analysis regarding the affected versions in my comment #10 anymore.)

He also tested the latest Daily 2025-11-14, which also didn't work for him. Then he tested Thunderbird 142 - with problem! Finally, he tested Thunderbird 141 - with success! (So once again everything points to a regression range between 141 and 142.)

The SMTP logs show the same pattern as mentioned before:

mailnews.smtp: Message sent successfully. SmtpClient.sys.mjs:1359:21
mailnews.smtp: C: QUIT SmtpClient.sys.mjs:683:19
mailnews.smtp: Reusing a connection SmtpClient.sys.mjs:136:19    <--- ?????
mailnews.smtp: C: MAIL FROM:<xxx> BODY=8BITMIME SIZE=456 SmtpClient.sys.mjs:683:19
mailnews.smtp: S: 221 OK SmtpClient.sys.mjs:452:17
mailnews.smtp: Socket closed. SmtpClient.sys.mjs:550:17

The last message "Socket closed" is sometimes missing and there may be other additional messages like "MAIL FROM" and even "RCPT TO" before the connection gets lost finally.

I will now ask him to run mozregression-gui. Maybe we can identify the problematic patch.

Alexander, Thanks for info. I'll try to see what's going on.

Alexander,
I'm not seeing it with mail.smtpserver.default.quit_delay_ms at 5000 (5 seconds, default) and with mail.smtpserver.default.max_messages_per_connection set to 2 (10 is default).
I then queue up 3 message to send into L.F./Outbox using "send later". Then go to Outbox and "send now". 2 messages go out on the initial connection and QUIT is immediately sent to close the connection. TB then open a new connection and sends the 3rd message, waits 5 seconds and sends QUIT to close the connection.
This is with self-built Nov 11 daily (147.0a1).

Anyhow, pretty sure it's something wrong (again) with my patch for Bug 1965690 (and, I think, confirmed in the bisect in the email you had sent to me).
I think I know how to fix it but the servers I've checked don't delay closing the connection after not-delayed-QUIT is sent as apparently you and the reporters are seeing. Is there a server you know that frequently shows this bug that I might get a temporary account on for testing?

Ok, I can see the problem by having two emails ready to send (2 compose windows side-by-side). I then quickly send one and then the other. This is with 1 message per connection. I didn't see it using outbox since, apparently, there is almost a second delay between triggering the send of the next messages using that method (based on ctrl-j log timestamps).

Assignee: nobody → gds

Alexander, Here's try build based on daily that basically fixes the problem: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=0e3a956d80792b6c9164aa635d5d663c81ac2c78
I see the bug with max messages per conn set to 1 and I have 2 or more messages ready to send side-by-side on the screen and quickly click the send buttons. What happened was the 1st one sends OK and QUIT was sent with no delay. The 2nd one got enabled to send and started sending before the QUIT caused the server to close the connection and started sending, but then the server sees the QUIT and closed the connection causing the 2nd send to fail and any more don't send.
The change is to not enable the next send (call to "_free()") until the connection is actually closed.
Not sure how I missed this during the last fix back in June!
Anyhow, I need to look a bit closer and see if I can find an improved fix or a way to test it better. You may notice I made a change to the tcp timeout to work around a possible problem in that "_free" is never called to enable the next send unless we get a complete response to QUIT, so a shorter TCP timeout after sending QUIT is added and might help with that. But not sure how to test it since I always get a quick 221 QUIT response and the server always closes the connection.

Over the weekend the user was able to run mozregression-gui and came up with this pushlog:
https://hg-edge.mozilla.org/comm-central/pushloghtml?fromchange=75a8a3d50581ca2f79aa9aa369e35306842d126c&tochange=3ef458305fa4b63ec25c59bec85dbc3a2044221a

Once again with last known good 2025-06-30 and first known bad 2025-07-01 this points to Bug 1965690.

I have asked the user to test the try build in comment 16 and I will report the result.

(I have forwarded Gene the SMTP logs in private and we are both in direct contact with the user.)

Alexander, I replied to your email and ask for more details on how this was tested. I hadn't looked in detail the the many logs you attached to the email and the info you and your user provided on testing. Apparently you have to queue to outbox maybe 10 emails or more to see the problem. I only queued up a few in Outbox and said "yeah, it works". So I think with more emails queued, you have less delay between send attempts. This causes QUIT to be sent and the next message is then sent before the connection goes down; then the connection is dropped by the server during the next send.
Your user also reported via email that the try build in comment 16 allows sending to work OK. So my test of sending side-by-side emails quickly is a good test to see the bug.
I need to look some more at the code ...

Update:
Alexander and a user of his mail merge addon have both provided me an account for testing the sending of LOTS of messages via SMTP without the concern of getting "locked out" of my personal accounts for "abuse" (as actually happened with one of my personal accounts while testing tb SMTP JS code). Alexander also provided two special addons just for testing. One that composes and sends immediately 10 messages and another that composes 10 messages and places them in Outbox so they can be "Sent later" manually. This made testing a lot better with less worry.

As noted before, this bug is regressed by my previous change in Bug 1965690. That bug fixed the case where the next message is sent after the delayed QUIT is sent but before the connection is closed. This caused the connection to go down while attempting to send an SMTP command as seen in Bug 1965690 comment 0. However, the change to fix the case for the delayed QUIT (delayed after last-sent message when connection can be reused) broke the case for when not-delayed QUIT is sent after the max messages per connection is reached and the connection is not to be reused for the next message. I didn't see the problem with my testing using side-by-side messages quickly sent (I probably only tested 2 and it often takes 3 or more to see the problem!). Also, didn't see the problem with messages in Outbox using the servers I tested with at the time. Also, based on info from Alexander in Bug 1965690 comment 25, it appeared that the patch was OK and, at that time, was not causing any regressions in release 143. (The patch was never put into 140esr so it doesn't have the problem reported here in bug 1988329.)

The proposed patch, in connect() function, now handles the case where a message is sent right after any QUIT is sent (delayed or otherwise) and when the current connection has not yet been closed. Before, only the delayed QUIT was handled in _sendCommand() and the changes I made there are reverted.
The variable _reuseConnection now (again) has file scope since it's needed in connect(). Also, the variable _isDelayedQuitSent is removed since sending QUIT (delayed or not) always causes _reuseConnection to be false since a new connection will always be needed.

Alexander wrote in comment 10:

This last "Reusing a connection" is disturbing, because I don't think it should happen at all. Thunderbird already has sent 10 messages over the same connection and sent QUIT to the server. Nevertheless it apparently tries to send yet another message over the still open connection, which is going to be closed imminently.

With the proposed patch, this will not occur. Before, the "Reusing a connection" line was logged when a new message was sent (in connect()) and the socket was seen as open. Now, using the _reuseConnection variable, the connection is not reused unless _reuseConnection is true AND the socket is open. If _reuseConnection is false OR the socket is closed, a new connection is made; but if the socket is open, before the new connection is made, the old connection is closed "immediately".

Here's the try build with the proposed patch:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=3e435f783c67f9005719c54d1ff5448c2675b9ea
Edit: This works but running the try build linked in the next comment 20 is now the the latest.

I decided to make another change to the patch.
In connect(), where immediate close(true); is called, it needs to prevent onFree from being called from within _onClose. If onFree is called in _onClose, it causes another connection to occur when a message or messages are awaiting to be sent. This causes Alexander's server to flag errors about too many connections when using his "send now" addon and several composed message remain unsent. I fixed this in the comment 19 "try" patch by keeping _freed true until it is set false in _onOpen where it was added. This worked OK but seemed like it might cause unforeseen problems and I had to change how and where _freed is set in connect().
In my next patch, I now have added a parameter to _onClose called doFree (defaults to true). Now instead of setting _freed true before close(true) is called in connect(), the onclose socket handler is changed to _onClose(false) which skips the call to _free() when the the socket closes and _onClose is called. This also doesn't set this._freed = false in _onOpen since it is not needed (_freed still only set false in connect()).

Here's the "try" build with the current patch: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=71a668fe379327eeb3dcec88e8accb12cd49e1a2
Try builds to test patches are supposed to be applied to comm-central (daily). However, there seemed to be a broken state at comm-central regarding "rust" and it failed my build on 3 attempts. So the above is applied to release repository for 145.0.
When you run this it will report as "daily" in Help|About but the version will appear as 145.0.

I'll go ahead and submit the patch for formal review.

Please see comment 19 and comment 20 for details about the changes.

Regressed by: 1965690
See Also: 19656901864924

This is one of the two test add-ons I created and Gene mentioned in Comment #19. The test add-on sendlater@example.net opens 10 compose windows, then uses "Send Later" in parallel for these compose windows, i.e. 10 individual messages are created in the "Local Folder -> Outbox".

You can use "Tools -> Developer Tools -> Debug Add-ons" to install the add-on and then use the button "Reload" to repeat the process of creating the individual messages.

(I have attached both add-ons in the spirit they may come in handy when reproducing the bug as well as testing and reviewing the patch.)

This is one of the two test add-ons I created and Gene mentioned in Comment #19. The test add-on sendnow@example.net opens 10 compose windows, then uses "Send Now" in parallel for these compose windows, i.e. 10 individual messages are sent directly.

You can use "Tools -> Developer Tools -> Debug Add-ons" to install the add-on and then use the button "Reload" to repeat the process of creating the individual messages.

(I have attached both add-ons in the spirit they may come in handy when reproducing the bug as well as testing and reviewing the patch.)

Alexander and an active user of his mail-merge addon have done some additional tests to verify the fix with good results. So going ahead and requested check-in.

Pushed by sking@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/bc313910ea8e
Fix SMTP sending hangs when multiple messages queued for sending. r=mkmelin

Status: NEW → RESOLVED
Closed: 16 days ago
Resolution: --- → FIXED
Target Milestone: --- → 147 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: