Closed Bug 1816540 Opened 3 years ago Closed 2 years ago

Canceling SMTP send before progress reaches 100% doesn't stop the transfer. Mail is still sent.

Categories

(MailNews Core :: Networking: SMTP, defect)

Thunderbird 108
defect

Tracking

(thunderbird_esr115 fixed)

RESOLVED FIXED
122 Branch
Tracking Status
thunderbird_esr115 --- fixed

People

(Reporter: gds, Assigned: gds)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file, 4 obsolete files)

I noticed this while working on bug 1745130 and made a note about it here near the bottom of the comment: https://phabricator.services.mozilla.com/D169263#inline-936046

When testing https://searchfox.org/comm-central/rev/a3c3161bb7688e40e6f61dfe0d9af852f5439ce7/mailnews/compose/src/SmtpService.jsm#161, I sent a very large message, about 16M. The progress in percent shows as expected but if I click "Cancel" during the send it does not stop the send. The message is still sent even if I cancel the transfer immediately or well before reaching 100%.

After clicking cancel, the progress dialog goes away and after a while the messages appears saying the message was sent (it really was) but not saved to sent folder (and it's not saved). Clicking retry or save to local at this point doesn't work either. The message remains on screen and save to drafts, template or file does work so there is no real data loss.

In the SMTP sending code linked to above there is a comment that says: "In practice, chunks are buffered by TCPSocket, progress reaches 100% almost immediately". It looks like the code transfers in 64K chunks so I doubt that it can buffer 16M immediately and then send it. But I could be wrong.

See Also: → 258232

(guessing this alis also fails on older versions)

Version: unspecified → Thunderbird 91

Don't know about 91. But I have 68.10.0 on an old laptop I seldom use and it does cancel the SMTP send so it's not received at the destination.
It does cancel the transfer but the response message is a bit inaccurate:

The message could not be sent because the connection to Outgoing server (SMTP) 
<server name>  was lost in the middle of the transaction. Try again. 
[OK]

I guess hitting cancel causes the connection break so technically it's accurate.

There's a possibility this bug is linux specific. I've received reports that it does cancel the send properly on windows. I haven't tested it myself on windows yet.

OS: Unspecified → Linux

Tried it with today's daily 113a on windows and see the same thing. Cancel while sending in progress (e.g., 50%) still does the SMTP send and 16M message received at yahoo account. So, for now, setting Platform/OS to All.
Tried it with 102.9.0 on windows and it cancelled the send correctly so this appears to be a regression caused by a post-102 change.

Keywords: regression
OS: Linux → All
Version: Thunderbird 91 → Thunderbird 113

I remembered I actually saw this first at nightly 2023-01-18 so after downloading several nightlies and bisecting I found that the problem first occurs on 2022-11-04. The comm-central pushlog for that date show a couple of changes related to SMTP:
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2022-11-04&enddate=2022-11-05
So maybe https://hg.mozilla.org/comm-central/rev/880b61ab6557243f3a0a1e90fb17510f69d0c02a or
https://hg.mozilla.org/comm-central/rev/b68d61082ca1287e5dc9edc1309a545762d10fa5

Hardware: Unspecified → All
Version: Thunderbird 113 → Thunderbird 108
See Also: → 1824401

Bug 136871 made sendMailMessage() async by calling async withClient() here:
https://hg.mozilla.org/comm-central/rev/880b61ab6557#l4.27
During debugging we noticed that this affects the out-parameter outRequest which pretends to be a nsIRequest when in fact only the cancel() method is populated: https://hg.mozilla.org/comm-central/rev/880b61ab6557#l4.213. This out-parameter is now returned null in the caller in MessageSend.jsm, and that damages cancelling later here:
https://searchfox.org/comm-central/rev/f8c1a9e5db5ea9efeb0ef6c1a996c25d039ae3d3/mailnews/compose/src/MessageSend.jsm#313-314

IOW, before bug 136871 with a sync sendMailMessage() passing the out-parameter worked, it was seen in the caller as

this._smtpRequest Object { value: XPCWrappedNative_NoHelper }

After bug 136871 the out-parameter comes back as Object { value: null }.

We implemented a solution avoiding XPCOM:
https://github.com/Betterbird/thunderbird-patches/blob/main/115/bugs/1816540-make-cancel-send-work-again-no-XPCOM.patch

Since we removed the unused URL parameter, various tests need to be adjusted.

See Also: → 1860430

Allows user to potentially cancel a message send while transfer to server
is in progress. If message is very short, it probably won't be possible.
This restores functionality that was present with ESR 102 and earlier.
Based on https://github.com/Betterbird/thunderbird-patches/blob/main/115/bugs/1816540-make-cancel-send-work-again-no-XPCOM.patch
Note: This required updating tests that call sendMailMessage() to
remove two deleted paremeters. However, there are two test that are
failing that I've been unable to find a fix for (due to "server" undefined
in SmtpService.jsm):
mailnews/compose/test/unit/test_bcc.js
mailnews/extensions/mdn/test/unit/test_askuser.js

Edit 12-8-23: With the current version of this patch, all the test are now passing.

Assignee: nobody → gds
Status: NEW → ASSIGNED

(From comment #7)

However, there are two test that are
failing that I've been unable to find a fix for (due to "server" undefined
in SmtpService.jsm):
mailnews/compose/test/unit/test_bcc.js
mailnews/extensions/mdn/test/unit/test_askuser.js

Here's the try build that show the still failing unit tests:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=3fa33f016c421d154edd34666c373bba734643c2&selectedTaskRun=Ecc7WE4YTyCXOanNHef5og.0

Attachment #9366163 - Attachment is obsolete: true

Allows user to potentially cancel a message send while transfer to server
is in progress. If message is very short, it probably won't be possible.
This restores functionality that was present with ESR 102 and earlier.
Based on https://github.com/Betterbird/thunderbird-patches/blob/main/115/bugs/1816540-make-cancel-send-work-again-no-XPCOM.patch

https://phabricator.services.mozilla.com/D195331 isn't based on our work any more. It doesn't appear to be correct.

withClient() is called twice, once to get the "cancel function" and second to actually connect. The in the first call it calls _getNextClient() which adds the client to the busy queue, then it is removed and put it back into the free queue and then the next call to withClient() hopefully uses it again, but all this is async and it's not guranteed that the client whose cancel function was obtained is the one that will do the sending.
The client running behind these two lines:

    await this._getCancel();
    this._deliverAsMail();

can be different. So if the user sends two messages at the same time, they might cancel the wrong one.

The issue at hand is to make sendMailMessage() pass the client's cancel function back, and that doesn't appear to work if the call is executed via XPCOM.

Attachment #9366636 - Attachment is obsolete: true

(In reply to gene smith from comment #7)

Note: This required updating tests that call sendMailMessage() to
remove two deleted parameters. However, there are two test that are
failing that I've been unable to find a fix for (due to "server" undefined
in SmtpService.jsm):
mailnews/compose/test/unit/test_bcc.js
mailnews/extensions/mdn/test/unit/test_askuser.js

These tests appear to be faulty since they set up identities without corresponding SMTP servers. Using
https://phabricator.services.mozilla.com/D195030
and adding this hunk demonstrates the issue:

     this._logger.debug(`Sending message ${messageId}`);
     const server = this.getServerByIdentity(userIdentity);
+    if (!server) {
+      // Huh, what's happening here? No server for the identity?
+      // This is likely running in some broken test that already failed without
+      // ever being noticed in `_getRunningUri()`.
+      // Run test_askuser.js without the patch an you'll see:
+      // JavaScript Error: "TypeError: can't access property "serverURI", server is null" {file: "resource:///modules/SmtpService.jsm" line: 335}]
+      // Known broken tests are:
+      // mailnews/extensions/mdn/test/unit/test_askuser.js
+      // mailnews/compose/test/unit/test_bcc.js
+      // mailnews/compose/test/unit/test_sendMailAddressIDN.js.
+      console.log(`No server found for identity with email ${userIdentity.email} and smtpServerKey ${userIdentity.smtpServerKey}`);
+      return;
+    }
     if (password) {
       server.password = password;
     }
     const runningUrl = this._getRunningUri(server);

With the hunk, test_askuser.js passes immediately, the other two time out but still pass.

Allows user to cancel an SMTP message send while transfer to server
is in progress. This restores functionality that was present with
ESR 102 and earlier. All tests pass with no time outs.
Attribution, if needed, will be added later.

Attached patch cancel-send.patch (obsolete) — — Splinter Review

There really is no point to code around broken tests. The tests should be fixed.

This is https://phabricator.services.mozilla.com/D195030 with the fixes for test test_askuser.js and test_bcc.js.

test_bcc.js has the issue that it creates one (fake) SMTP server and lots of new identities that reference further outgoing servers. The for the last test the server isn't stopped so gServer.playTransaction() returns everything done by the server in all the subtests and that doesn't match just the last bit.

test_askuser.js ideally should be fixed by setting up a server.

We can look at test_sendMailAddressIDN.js later.

This fixes test_askuser.js, test_bcc.js and test_sendMailAddressIDN.js.

Attachment #9367393 - Attachment is obsolete: true
Attachment #9367549 - Flags: feedback?(gds)
Attachment #9366163 - Attachment is obsolete: false
Attachment #9366163 - Attachment description: Bug 1816540 - Make SMTP cancel of send work again. r=mkmelin → Bug 1816540 - Clean up async issues created in bug 136871 to make cancel sending work again. r=mkmelin
Attachment #9366636 - Attachment description: Bug 1816540 - Make SMTP cancel of send work again. r=mkmelin → Bug 1816540 - Make SMTP cancel of send work again using new _getCancel(). r=mkmelin
Attachment #9366636 - Attachment is obsolete: false

I've re-opened the diffs to provide 3 options so Magnus can decide which patch, if any, to go with.

Attachment #9367549 - Flags: feedback?(gds) → feedback+

Comment on attachment 9367549 [details] [diff] [review]
1816540-make-cancel-send-work-again-no-XPCOM.patch

Moved to moz-phab

Attachment #9367549 - Attachment is obsolete: true

https://phabricator.services.mozilla.com/D195729 codes around failures of two incorrect and broken tests that worked by accident in the past.
https://phabricator.services.mozilla.com/D195331 is formally incorrect since it can't be guaranteed that the correct client is cancelled. It unnecessarily manipulates the "busy" queue.
We don't see any point in considering those two approaches. The patch we provided is the one first proposed for review. All that was missing was to correct some tests. This has now been added.

I've ended up here, because of bug 1868160: "copy to sent folder" is highly unreliable for me since 2 months, and fails about 70% of the time, resulting in a hanging email send and lots of lost mail. I don't know whether it's the same bug. But obviously, an unreliable send process and/or lost mail makes an email app fairly unusable.

Regarding the latest patch from gene that Magnus reviewed: Changing to an async function is likely the right solution here. However, the patch is incomplete: If you change a function to async, you need to change all callers to have an await. Otherwise, errors end up as "uncaught exceptions in promise" and won't be handled. This is serious - e.g. node.js terminates the entire app, if that happens. At the very least, calling code won't know that something failed, and we cannot inform the user about errors, i.e. sending mail becomes unrealiable.

r-

Attachment #9367378 - Flags: review-

Gene, can you please not use a + for the string concatenation here:

`No server found for identity with email ${userIdentity.email} and ` +
          `smtpServerKey ${userIdentity.smtpServerKey}`

You can perfectly make a longer string with with a JS template literal (backtick). Linting will take care of it.

Ben, please note that sendMailMessage() has always been async, it's just declared async now so await can be used inside the function. It has always called async withClient() here:
https://searchfox.org/comm-central/rev/70000cfe660de10633c10e75c5bd670e6b521bab/mailnews/compose/src/SmtpService.jsm#94
https://searchfox.org/comm-central/rev/70000cfe660de10633c10e75c5bd670e6b521bab/mailnews/compose/src/SmtpServer.jsm#492

Attachment #9366636 - Attachment is obsolete: true
Attachment #9366163 - Attachment is obsolete: true

From comment #19:

Gene, can you please not use a + for the string concatenation here:

`No server found for identity with email ${userIdentity.email} and ` +
          `smtpServerKey ${userIdentity.smtpServerKey}`

You can perfectly make a longer string with with a JS template literal (backtick). Linting will take care of it.

eslint --fix didn't seem to change it at all, just left the long line in place and didn't break or wrap the string at 80 columns.
So I left it as is and since it works OK per my tests and MM didn't complain in the review.
(Not sure why eslint allows lines more the 80 but clang-format for c/c++ changes the code to 80 columns. Also, eslint does some strange formatting to my eyes used to the way C is usually formatted.)

Regressed by: 136871

If lint leaves it as this

    if (!server) {
      // Occurs for at least one unit test, but test does not fail if return
      // here. This check for "server" can be removed if tests are fixed.
      console.log(
        `No server found for identity with email ${userIdentity.email} and smtpServerKey ${userIdentity.smtpServerKey}`
      );
      return;
    }

then it should be landed like this. They whole idea about template strings is that the are evaluated once. Either you "compose" the string with + or you use one template string. So please remove the +.

Target Milestone: --- → 122 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/1c361d2ba2e1
Make SMTP cancel of send work again. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Hi Magnus, why did you land the patch despite my negative review in comment 18 and in Phabicator? You explicitly ignored it. Why? gene's comments did not resolve the issue I pointed out.

Review:

If you change a function to async, you need to change all callers to have an await. Otherwise, errors end up as "uncaught exceptions in promise" and won't be handled. This is serious - e.g. node.js terminates the entire app, if that happens. At the very least, calling code won't know that something failed, and we cannot inform the user about errors, i.e. sending mail becomes unreliable.

There are at least 2 callers of deliverAsMail(). The patch adds a await on line 891, that's good. The function _deliveryExitProcessing() also needs and await or a .catch(ex => ...) with a way to pass the error up to the caller, like calling this.notifyListenerOnStopSending() or some other function with the error message (only an error code is not sufficient, because the user needs to see a clear error message why sending failed).

As for sendMailMessage(), there are 2 real callers, you changed one of them, so that's OK, and another is MdnGenerator. That is actually XPCOM. Have you checked how that behaves now in error situations, after your change? Are errors passed on to the caller?

There are many more callers in test code. Have you looked at those callers? Picking one at random, https://searchfox.org/comm-central/source/mailnews/compose/test/unit/test_smtpPassword.js#62 . The function is already async and already uses await for other functions, so it would be trivial to add await here. It would seem obvious to do that. The test puts this call in a try/catch block, which now no longer works as before. As mentioned above and in my review, if sendMailMessage() now throws, the exception will not be caught by the caller's try/catch, but be an "unhandled exception in Promise". I picked this test at random, but there are many other tests which likely need to be adapted.

[EDIT: Corrected some parts of this comment shortly after posting it]

Why it matters:

I am currently personally facing a bug where Thunderbird is hanging during mail send copy, and the reason may be exactly something like this: Not catching errors in async functions. If you don't have an await, any failures will just go into the nowhere (that is, error console), but the caller won't know it failed. Which means it cannot handle the error, repeat, save the mail etc. It also means the user cannot see the error message. Which makes any underlying errors far more serious. Errors must be handled, not ignored.

Basically, this patch is killing all error handling.

You cannot just assume that everything will work as expected. Esp. during actions like sending email.

I was merely trying to give you a hint of where something is wrong.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED

As explained on phab, the path is not changing to to be async: It was async to begin with, just not marked as such. There's an url listener that you need to use to handle progress and/or errors. sendMailMessage() does not provide sending errors back to the caller except for through the url listener.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

You say "the path", as in 1 path. I showed at least 4 paths (of which 2 are OK, and 2 are not), with specific code citations. Plus incorrect test code.

If the code was doing async without await or catch() before, then that doesn't mean it's correct, but then it was also wrong before.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

It's a JavaScript language feature: If you have an async function, and it throws, and the caller didn't use await or catch(), then you have an error that goes to lala land. That is a serious bug.

This bug is closed. If you have some specifics to add, please file a new one. But I suggest re-reading comment 25 first.

We have a ton of old code that works async work, and relies on listeners to handle errors.

(In reply to Ben Bucksch (:BenB) from comment #27)

If you have an async function, and it throws, and the caller didn't use await or catch(), then you have an error that goes to lala land.

Sure but that has nothing to do with the practical error handling here afaict.

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

Magnus, I re-read your comment 25. I see that you re-phrased it. It's now much clearer, thank you. However, I still don't know how that relates to what I wrote in comment 24. Specifically, _deliveryExitProcessing() calls deliverAsMail(). I hope we can agree that sending mail can fail, so deliverAsMail() can throw. If deliverAsMail() was throwing before this patch, it would throw into _deliveryExitProcessing(), which goes to sendDeliveryCallback(), and then eventually ends up in OnStopRunningURL(). Now, that's not good, I agree. But that's a bug that needs to be fixed, that's my point. And this patch is making it worse. Now, if async deliverAsMail() (as this patch makes it async) is throwing, then the error goes to nowhere right there in _deliveryExitProcessing(). It will not be propagated from there.

This is a bug, and as-is, it will be extremely hard to find, because it's simply a missing await. This is almost impossible to find when somebody later debugs this.

If you use await deliverAsMail() (and the same for the call stack up), or deliverAsMail().catch(ex => {...}), and pass that error on the caller, then it's fixed. At the very least, please make a code comment. But don't leave a hidden bomb like that in the code, please.

I agree the error propagation could be improved, but I don't see how anything is now worse, since deliverAsMail was already acting asynchronously (clearly it would, it involves network operations). Anything from sendDeliveryCallback isn't acted on anyway.

Sounds like a new bug would be useful to focus on these issues, if there isn't one already. Or perhaps moot if not a major problem in the wild and imap is to be rewritten.

If this bug has in some way potentially made behavior worse, we should make landing dependent on fixing that new bug. But if not worse then it could proceed up the chain.

The user-observable problem is that errors during email send would not be reported back to the end user, nor handled at all. The sending progress would hang. In the other bug 1868160, the send copy process is hanging, so these kind of issues are real.
This patch is making the problem worse, at least code-wise, instead of fixing it. This is exactly what reviews are supposed to catch.
(FWIW, this is the new SMTP code. There are no plans to rewrite it, as far as I know.)

Component: Networking: IMAP → Networking: SMTP
Whiteboard: [TM 115.7.0]

Good for 115 uplift?

I found no beta or daily regression reports

Flags: needinfo?(gds)

Not sure how many users actually try to cancel their SMTP send after clicking "Send". So don't know how much real world use this has seen. But I did test it a lot while working on this and it works as I expect. So I'd say OK to uplift.

Flags: needinfo?(gds)

Comment on attachment 9367378 [details]
Bug 1816540 - Make SMTP cancel of send work again. r=mkmelin

[Triage Comment]
approved for esr115

Attachment #9367378 - Flags: approval-comm-esr115+
Whiteboard: [TM 115.7.0]
See Also: → 1745130
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: