Closed Bug 1665652 Opened 2 months ago Closed 2 months ago

Intermittent comm/mailnews/compose/test/unit/test_sendMailAddressIDN.js | onStopSending - [onStopSending : 63] 0 == 2153066735

Categories

(MailNews Core :: Networking: SMTP, defect, P5)

defect

Tracking

(thunderbird_esr78 unaffected)

RESOLVED FIXED
82 Branch
Tracking Status
thunderbird_esr78 --- unaffected

People

(Reporter: intermittent-bug-filer, Assigned: gds)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file, 1 obsolete file)

(In reply to Ping Chen (:rnons) from Bug 1563891 comment #54)

Don't know why test_sendMailAddressIDN.js becomes unstable on my local, if I run it ten times, about five times will fail. It seems there are two places that can trigger onStopSending:

  1. https://searchfox.org/comm-central/source/mailnews/compose/src/nsSmtpProtocol.cpp#2113
  2. https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgProtocol.cpp#393

When nsMsgProtocol::OnStopRequest is called first, the aStatus will be 0, and the test will fail.

Is this only failing on OSX?

On linux I see (1.) occurring first with the correct error code so it doesn't fail with many runs. (2.) also occurs but with error 0x80004004 (NS_ERROR_ABORT). I never see this with a 0 status code. (I think the NS_ERROR_ABORT occurs because I call do_test_finish().)

It appears that the failure is always with the last test based on the referenced logs. Is that always true? If so, you should see this as the last passing test in the log: Passed test børken.to@invalid.foo.invalid. But if the last test passes you will see this last: Passed test børken.to. (The last test tests the case where there is no @ in the address.) Only these 2 tests cause a check at the failing line 63.

Note: There is one error in my comments added to nsSmtpProtocol.cpp but it doesn't affect the tests or how the code works AFAICT . When the last test attempts to send a message to an address without @, the address actually received by the protocol code is empty. This is based on observations with printfs added to the protocol code running on linux. I assume this is also true for Windows and OSX but I can't test with those locally.

I can reproduce on two of my Linux machines. Both the last two cases can fail, see the latest build https://treeherder.mozilla.org/#/jobs?repo=comm-central&selectedTaskRun=ZwA9AhHvTaG1piRN0CKQ0g.0.

(In reply to Ping Chen (:rnons) from comment #2)

I can reproduce on two of my Linux machines. Both the last two cases can fail, see the latest build https://treeherder.mozilla.org/#/jobs?repo=comm-central&selectedTaskRun=ZwA9AhHvTaG1piRN0CKQ0g.0.

I'm not real familiar with reading the test results. Is the failure always in the ones labeled "X2"? If so, I think I only see failure for optimized OSX.
I have only tested locally with linux debug build. Are you testing locally with optimized or debug on linux?

I have run the test like this many time with my debug build and never see an error:

gene@hplt:~/mozilla$ ./mach xpcshell-test comm/mailnews/compose/test/unit/test_sendMailAddressIDN.js
:
:
xpcshell
~~~~~~~~
Ran 27 checks (26 subtests, 1 tests)
Expected results: 27
Unexpected results: 0
OK
 0:01.75 INFO Node moz-http2 server shutting down ...
 0:01.75 INFO http3Server server shutting down ...

I rebuilt with optimization and without debug on another linux machine. Now I see the failure. Not sure if the failure is due to optimization or because this machine is a somewhat faster desktop rather than the slower laptop that always passes running the debug build.

Because it's an xpcshell test, it will always be in the X group. There is some script that divides all xpcshell tests to X1-X4.

I didn't enable optimization or debug. I only have ac_add_options --enable-application=comm/mail in my mozconfig. Also, I learnt about --verify option yesterday to run a test for 10 times, so you can use
./mach test --verify comm/mailnews/compose/test/unit/test_sendMailAddressIDN.js.

Some modernization of the test, but it's still unreliable. Dunno why. I think it's like Ping reported...

I think I may have a fix. The problem is because the return of NS_ERROR_BUT_DONT_SHOW_ALERT here
https://searchfox.org/comm-central/rev/27cb9951c097a9b15f8952a79617921fdcc568da/mailnews/compose/src/nsSmtpProtocol.cpp#706
is causing SendQuit() to be called here
https://searchfox.org/comm-central/rev/27cb9951c097a9b15f8952a79617921fdcc568da/mailnews/compose/src/nsSmtpProtocol.cpp#2140
sooner than the SMTP_ERROR_DONE state, where the error state NS_ERROR_BUT_DONT_SHOW_ALERT is set here
https://searchfox.org/comm-central/rev/27cb9951c097a9b15f8952a79617921fdcc568da/mailnews/compose/src/nsSmtpProtocol.cpp#2113
After the QUIT and the server disconnects OnStopRequest() is triggered with status 0. As Ping pointed out, if this occurs before SMTP_ERROR_DONE is acted upon, the 0 status is returned to the test and it fails. But if the SMTP_ERROR_DONE state is reached and acts first, the test passes because NS_ERROR_BUT_DONT_SHOW_ALERT is returned to the test as expected.

So the change is to return NS_OK here instead of NS_ERROR_BUT_DONT_SHOW_ALERT:
https://searchfox.org/comm-central/rev/27cb9951c097a9b15f8952a79617921fdcc568da/mailnews/compose/src/nsSmtpProtocol.cpp#706
This will ensure SendQuit() occurs last.

This seems to fix the problem on the machine that fails.

I don't think this affects anything in the real world but I'll test some more to be sure. Another easy fix is just remove the offending check from the test but that just ignores the race issue.

P/S: I haven't yet tried Magnus' new test code.

Here's a patch that fixes the problem by changing the mentioned return value to NS_OK. I also updated a few comments that were not quite right. These are both in nsSmtpProtocol.cpp

I also tested with your modernized version of test_sendMailAddressIDN.js and included those changes in the patch.

Assignee: nobody → gds
Attachment #9176517 - Attachment is obsolete: true
Attachment #9176714 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9176714 [details] [diff] [review]
1665652-fixes-for-unit-test.patch

Review of attachment 9176714 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/test/unit/test_sendMailAddressIDN.js
@@ +123,5 @@
>      "nsIMsgCopyServiceListener",
>    ]),
>  };
>  
>  function DoSendTest(aRecipient, aRecipientExpected, aExceptionExpected) {

I noticed that with this change to the test the number of tests went down from 27 to 23 and I didn't know why. I now see that it is because aExceptionExpected is never checked for the 4 sub-tests. The final error is now checked in the Assert() of the catch clause below and is expected to be zero so it is not really ignored.  Also aExceptionExpected is never passed to the function as 0 like before when called so should it be removed?
Attachment #9176714 - Flags: feedback+
Comment on attachment 9176714 [details] [diff] [review]
1665652-fixes-for-unit-test.patch

Review of attachment 9176714 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, r=mkmelin
I'll remove the param an land tomorrow

::: mailnews/compose/test/unit/test_sendMailAddressIDN.js
@@ +123,5 @@
>      "nsIMsgCopyServiceListener",
>    ]),
>  };
>  
>  function DoSendTest(aRecipient, aRecipientExpected, aExceptionExpected) {

Yes, the aExceptionExpected param should be removed.
Attachment #9176714 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/5be6a8a9d8b1
Fix to avoid failure in and updates to test_sendMailAddressIDN.js r=mkmelin

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
You need to log in before you can comment on or make changes to this bug.