Intermittent TEST-UNEXPECTED-TIMEOUT | comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js | Test timed out
Categories
(MailNews Core :: Networking: IMAP, defect, P5)
Tracking
(thunderbird_esr102+ fixed)
People
(Reporter: intermittent-bug-filer, Assigned: mkmelin)
References
Details
(Keywords: intermittent-failure)
Attachments
(5 files)
|
8.74 KB,
text/plain
|
Details | |
|
11.99 KB,
text/plain
|
Details | |
|
5.45 KB,
patch
|
gds
:
review+
|
Details | Diff | Splinter Review |
|
11.11 KB,
text/plain
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
wsmwk
:
approval-comm-esr102+
|
Details | Review |
Filed by: mkmelin [at] iki.fi
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=365731718&repo=comm-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IWV6dKGaQ6WK0KrEyeTRsQ/runs/0/artifacts/public/logs/live_backing.log
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 41•3 years ago
|
||
For me it fails consistently when I manually run unit test mailnews/imap/test/unit/test_imapStatusCloseDBs.js. It hangs due to ensure success:
0:00.65 pid:736381 [Parent 736381, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF (NS_ERROR_UNEXPECTED): file /home/gene/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:809
I haven't proved it but I think this is probably a regression of bug 163964. When I record the imap log while running the test it appears there is an attempt to start a new connection slightly before the original imap connection goes away. With real servers this isn't a problem but with the "fakeserver" used by unit test, it can apparently only handle a single connection at at time so it can't handle this transition.
I tried setting pref max_cached_connection to 2 in test_imapStatusCloseDBs.js but it seemed to totally stop any imap activity as observed in the log, but the test "passed" anyhow!
I'll try backing out the changes from bug 163964 from daily and see if that fixes it.
I was also wondering if there's a way to run a unit test on an older archived version without rebuilding it? That is, how could I run the unit test test_imapStatusCloseDBs on, say, the last 91.x version to see how it behaved before the changes for bug 163964? I tried rebuilding 91 from source but got tons of compile error using the current daily toolchain.
Comment 42•3 years ago
|
||
I'll try backing out the changes from bug 163964 from daily and see if that fixes it.
Backed it out plus several later and related changes and it still fails.
Comment 43•3 years ago
•
|
||
I tried setting pref max_cached_connection to 2 in test_imapStatusCloseDBs.js but it seemed to totally stop any imap activity as observed in the log, but the test "passed" anyhow!
I think I am getting fooled by the imap log. Adding Services.prefs.setIntPref("mail.server.server1.max_cached_connections", 2); to the test file actually does let it pass the test. When I look with wireshark there actually is imap stuff happening even though the log file is empty after I ran the test manually. What's happening is the imap log is getting cleared somehow after the tests run. If I add the MOZ_LOG parameter append to the env var, I can see imap activitiy in the log and it's not blank. I.e., I run it like this:
$ MOZ_LOG=IMAP:5,timestamp,sync,append MOZ_LOG_FILE=~/tblog0 ./mach xpcshell-test comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
But a couple things I still don't understand is that the log file seems to indicate some authentication errors returned by the fakeserver. But I don't see the same corresponding errors in the wireshark dump. Edit: Looking closer, it appears the the "authentication failures" in the log correspond with fakeserver FIN'ing/dropping the connection but on retry the connection is re-established and continues on and the tests pass.
Also it appears that the bottom half of the imap log file is truncated when compared with wireshark, even with the added append parameter. So maybe trying to run an imap log while running xpcshell-test for imap will give misleading results.
Comment 44•3 years ago
|
||
Mossop, do you know what is going wrong with the logging in comment 43? If not, who might?
Gene, Emilio suggests you might use https://rr-project.org/ to debug this.
| Comment hidden (Intermittent Failures Robot) |
Comment 46•3 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #44)
Mossop, do you know what is going wrong with the logging in comment 43? If not, who might?
padenot maybe
Comment 47•3 years ago
|
||
Can we try to not use MOZ_LOG_FILE and use shell redirection in append mode (e.g. >>) to get all logs (or just have a terminal with infinite scroll-back I suppose) ? With the sync mode it should work.
Comment 48•3 years ago
|
||
Gene, Emilio suggests you might use https://rr-project.org/ to debug this.
I think that using rr was suggested before while debugging another MOZ_LOG problem. Don't think it helped.
Can we try to not use MOZ_LOG_FILE and use shell redirection in append mode (e.g. >>) to get all logs (or just have a terminal with infinite scroll-back I suppose) ? With the sync mode it should work.
Thanks, good idea. That works but I haven't checked the comparison of the log to wireshark results yet so not sure it tells me anything more. Also, don't know why this unit test doesn't fail consistently when tests are run when the official builds are done every day (or maybe it does?).
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 54•3 years ago
|
||
I see the issue, and for me the timeout is problematic.
And I suspect this time out is also an issue for treeherder compilation/test server farm since the test jobneeds to stay long than necessary.
Below is an excerpt from the summary created locally from a local xpcshell test run.
I simply compare the timestamp at the beginning of the local log line and print if no activity was recorded more than 20 seconds.
========================================
TIMEOUT
========================================
00:15.88 TEST_END: Test PASS. Subtests passed 43/43. Unexpected 0
00:15.88 TEST_START: xpcshell-imap.ini:comm/mail/components/extensions/test/xpcshell/test_ext_messages.js
LONG PAUSE FOUND: 25.950000 seconds
00:49.82 TEST_END: Test FAIL. Subtests passed 999/1003. Unexpected 4
00:00.39 TEST_START: comm/mail/components/shell/test/unit/test_shellService.js
--
00:48.20 TEST_END: Test PASS. Subtests passed 38/38. Unexpected 0
00:48.20 TEST_START: xpcshell-cpp.ini:comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
LONG PAUSE FOUND: 299.710000 seconds
05:48.20 TEST_END: Test TIMEOUT. Subtests passed 5/5. Unexpected 0
00:00.34 TEST_START: comm/mailnews/import/test/unit/test_AddrBookFileImporter.js
I regularly run the whole xpcshell test sequentially with --verbose flag on my local linux PC to capture and analyze warnings and failures that may be introduced by my local patches.
The whole xpcshell test run takes (including the time out above) slightly longer than 1100 seconds.
* * * * * * * * * * * * * *
It took 1148 seconds.
* * * * * * * * * * * * * *
If the almost 5 minutes timeout (299.7 ~ 300 seconds) is not there, the run will be shortened to 848 seconds. That will be a significant decrease.
On the treeherder, the xpcshell test is run concurrently, but still the elapsed time would decrease if the timeout is avoided.
So how should we debug this?
Is the fakeserver not capable enough?
From the attached log, the test itself seems to think it has finished cleanly?
But somehow after the errror at mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:808 is recorded, nothing happens for five minutes and times out.
The line number may be off a bit due to local patch to dump variables, etc.
00:48.49 INFO (xpcshell/head.js) | test run_next_test 3 finished (1)
00:48.49 INFO exiting test
00:48.49 pid:257556 [Parent 257556, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF (NS_ERROR_UNEXPECTED): file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:808
05:48.20 TEST_END: Test TIMEOUT. Subtests passed 5/5. Unexpected 0
05:48.20 INFO xpcshell return code: None
05:48.20 INFO xpcshell-cpp.ini:comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js | Process still running after test!
05:48.21 INFO xpcshell-cpp.ini:comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js failed or timed out, will retry.
05:48.21 ERROR TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
05:48.21 INFO INFO | Result summary:
The line 808 is as follows.
803
804 nsCOMPtr<nsIMsgMailNewsUrl> mailnewsUrl = do_QueryInterface(m_runningUrl);
805 nsCOMPtr<nsIMsgIncomingServer> server = do_QueryReferent(m_server);
806 if (!server) {
807 rv = mailnewsUrl->GetServer(getter_AddRefs(server));
* 808 NS_ENSURE_SUCCESS(rv, rv);
809 m_server = do_GetWeakReference(server);
810 }
Looking at the log, I noticed another early failure.: As Gene noted, there may be an authentication screwup (?).
00:48.37 INFO xpcshell-cpp.ini:comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js | Starting setupTest
00:48.37 INFO (xpcshell/head.js) | test setupTest pending (2)
00:48.39 pid:257556 [Parent 257556, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF (NS_ERROR_UNEXPECTED): file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgAccountManager.cpp:2065
00:48.42 pid:257556 AUTH PLAIN line -AHVzZXIAcGFzc3dvcmQ=-
00:48.42 pid:257556 authorize-id: --, username: -user-, password: -password-
00:48.43 pid:257556 NS_NewBufferedOutputStream: outputStream (= std::move(aOutputputStream)) =0x7ff4bc0119e8
mozilla/comm/mailnews/base/src/nsMsgAccountManager.cpp:2065 is this.
Oh, I hate the coding style. I won't be surprised if there are bugs in there.
Does it have to be nested-if instead of the if-statements at the same level with proper if-expression?
// try ("nobody","Local Folders","none"), and work down to any "none" server.
2052 rv = findServerInternal("nobody"_ns, "Local Folders"_ns, "none"_ns, 0,
2053 aServer);
2054 if (NS_FAILED(rv) || !*aServer) {
2055 rv = findServerInternal("nobody"_ns, EmptyCString(), "none"_ns, 0, aServer);
2056 if (NS_FAILED(rv) || !*aServer) {
2057 rv = findServerInternal(EmptyCString(), "Local Folders"_ns, "none"_ns, 0,
2058 aServer);
2059 if (NS_FAILED(rv) || !*aServer)
2060 rv = findServerInternal(EmptyCString(), EmptyCString(), "none"_ns, 0,
2061 aServer);
2062 }
2063 }
2064
2065 NS_ENSURE_SUCCESS(rv, rv); <---
2066
Hmm. I will dump some values to see what is going on.
Comment 55•3 years ago
|
||
OK, I inserted some dump statements to see what the values of some variables involved.
But it did not give me much insight.
However, while I was adding dump statements, I realized one issue.
|m_runningUrlLatest| was replaced with the value of |m_runningUrl| even if the call to |do_QueryInterface(aURL, &rv)| to set it returned failure code in |rv|.
That seems to be wrong.
So replaced the value of |m_runningUrlLatest| only after the |do_QueryInterface| call succeeded.
Voila. No time out.
I don't know the code at all. I am attaching the debug patch and let someone in the know evaluate the issue. The attached log is created with the patch in the following attachment.
Comment 56•3 years ago
|
||
This is the patch that solved the timeout issue.
I would ask Gene who has been working on this for some time to see if this looks right.
The essential part is this:
m_runningUrl = do_QueryInterface(aURL, &rv);
// m_runningUrlLatest = m_runningUrl;
// We should not be changing m_runningUrlLatest if this is bogus?
NS_WARNING(nsPrintfCString(
"(debug) "
"do_QueryInterface(aURL, &rv) = "
"rv=0x%" PRIx32 " m_runningUrl =%s\n",
static_cast<uint32_t>(rv), m_runningUrl? "true" : "false").get());
if (NS_FAILED(rv)) return rv;
m_runningUrlLatest = m_runningUrl;
I moved the | m_runningUrlLatest = m_runningUrl;| after the error return.
This seems to solve the issue. But I defer the judgement if this is correct or not to Gene and others.
Comment 57•3 years ago
|
||
Oh, my comment was premature.
If I executed the test as a single target, it did not time out with my WIP patch. Great.
However, if I run the test as part of the WHOLE xpcshell test, it still times out (!?).
This seems to happen on local PC.
I am quoting the excerpt from my sequential
So it is quite likely the problem is with the fakeserver side (?) although I think "DO NOT update m_runningUrlLastest if m_runningUrl is bogus (=0)" is probably true.
the following part (marked with *) is only in the failed log (from the whole xpecshell execution) and is not found in the single target execution. Strange.
00:53.00 pid:292470 : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgAccountManager.cpp:2088
00:53.00 pid:292470 [Parent 292470, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF (NS_ERROR_UNEXPECTED): file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgAccountManager.cpp:2096
00:53.00 pid:292470 [Parent 292470, Main Thread] WARNING: (debug) findServerInternal("nobody"_ns, "Local Folders"_ns, "none"_ns, 0, aServer)=rv=0x0 aServer=7ffdbe1f95e8
00:53.00 pid:292470 : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/base/src/nsMsgAccountManager.cpp:2058
* 00:53.02 pid:292470 [Parent 292470, Main Thread] WARNING: (debug) do_QueryInterface(aURL, &rv) = rv=0x0 m_runningUrl =true
* 00:53.02 pid:292470 : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:803
* 00:53.02 pid:292470 [Parent 292470, Main Thread] WARNING: (debug) After do_QueryInterface(m_server): m_server=0, server = 0, m_runningUrl=true
* 00:53.02 pid:292470 : file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:813
00:53.05 pid:292470 AUTH PLAIN line -AHVzZXIAcGFzc3dvcmQ=-
00:53.05 pid:292470 authorize-id: --, username: -user-, password: -password-
More investigation seems to be necessary.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 60•3 years ago
|
||
Hi Chiaki, Sorry for the long delay in responding to this. I'm not sure if you still need a review on this but I keep getting emails reminding me that it's overdue.
I did make the "essential" change in comment 56 (moving the line after the return) which does seem like a potential bug. However, it didn't prevent the timeout when I run:
./mach xpcshell-test ./comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
If I patch the test like this it does pass (with or without the proposed change to nsImapProtocol.cpp):
diff --git a/mailnews/imap/test/unit/test_imapStatusCloseDBs.js b/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
--- a/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
+++ b/mailnews/imap/test/unit/test_imapStatusCloseDBs.js
@@ -7,18 +7,19 @@
var { PromiseTestUtils } = ChromeUtils.import(
"resource://testing-common/mailnews/PromiseTestUtils.jsm"
);
var gFolder1, gFolder2;
add_task(function setupTest() {
Services.prefs.setBoolPref("mail.check_all_imap_folders_for_new", true);
+ Services.prefs.setIntPref("mail.server.server1.max_cached_connections", 2);
setupIMAPPump();
IMAPPump.daemon.createMailbox("folder 1", { subscribed: true });
IMAPPump.daemon.createMailbox("folder 2", { subscribed: true });
IMAPPump.server.performTest("SUBSCRIBE");
let rootFolder = IMAPPump.incomingServer.rootFolder;
Also, FWIW, I can see the imap log at t.log when I run this:
MOZ_LOG=IMAP:5,sync,timestamp ./mach xpcshell-test ./comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js | grep IMAP > t.log
Comment 61•3 years ago
|
||
Comment on attachment 9308385 [details] [diff] [review]
WIP: dump some values and DO NOT update m_runningUrlLatest if m_runningUrl is bogus (=0?).
The essential change to nsImapProtocol.cpp looks like it's needed.
Comment 62•3 years ago
|
||
Chiaki, One more thing (actually two).
Do you know if there is a way to run a unit test on an old archived TB version like ESR 78? I thought maybe this could be used to tell where the test failure of this bug started. I couldn't find a way to do that.
The other question which is totally off-topic that I've been meaning to ask you: Do you ever use gdb debugger with TB? It seems that recently something has changed and I can't use it like I used to. See bug 1805773. I don't use it a lot but when I need it I would like to still be able to put breakpoints on a c++ line number.
Comment 63•3 years ago
|
||
(In reply to gene smith from comment #61)
Comment on attachment 9308385 [details] [diff] [review]
WIP: dump some values and DO NOT update m_runningUrlLatest if m_runningUrl is bogus (=0?).The essential change to nsImapProtocol.cpp looks like it's needed.
So what should I do?
Should I clean up the comment (extra comment) and set r+=gds?
Comment 64•3 years ago
|
||
(In reply to gene smith from comment #62)
Chiaki, One more thing (actually two).
Do you know if there is a way to run a unit test on an old archived TB version like ESR 78? I thought maybe this could be used to tell where the test failure of this bug started. I couldn't find a way to do that.The other question which is totally off-topic that I've been meaning to ask you: Do you ever use gdb debugger with TB? It seems that recently something has changed and I can't use it like I used to. See bug 1805773. I don't use it a lot but when I need it I would like to still be able to put breakpoints on a c++ line number.
The answer to the first question: Well, I have never thought about it.
It seems if you could find the original source file tree that corresponds to the particular ESR version, then it should be doable in principle.
Right, in principle. but I have never tried that.
Question No. 2: It rings a bell. I was quite tied up with a technology exhibition in November, December, but I distinctively remember some trouble using gdb with locally created DEBUG version of TB, and funny thing was that I found the very slow built-in GDB of valgrind somehow seemed to work. I mean, at least valgrind could print out the stacktrace and such, so I figured maybe the built-in gdb functionality of valgrind might work. But then I stopped short of verifying that due the busy schedule of technology exhibition.
So I think the gdb problem is real.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 74•3 years ago
|
||
Updated•3 years ago
|
| Assignee | ||
Comment 75•3 years ago
|
||
Thanks for the investigations ISHIKAWA and Gene!
| Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 76•3 years ago
|
||
Pushed by elizabeth@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/798fcc52e90e
Fix comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js failures. r=gds
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 78•3 years ago
|
||
Comment on attachment 9322019 [details]
Bug 1752323 - Fix comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js failures. r=gds
[Approval Request Comment]
Regression caused by (bug #): not a regression
User impact if declined: this is primarily a test fix, but it could potentially clear up some unknown bad behaviour in imap handling
Testing completed (on c-c, etc.): c-c and beta
Risk to taking this patch (and alternatives if risky): safe
| Assignee | ||
Updated•3 years ago
|
Comment 79•3 years ago
|
||
Comment on attachment 9322019 [details]
Bug 1752323 - Fix comm/mailnews/imap/test/unit/test_imapStatusCloseDBs.js failures. r=gds
[Triage Comment]
Approved for esr102
Comment 80•3 years ago
|
||
| bugherder uplift | ||
Thunderbird 102.9.1:
https://hg.mozilla.org/releases/comm-esr102/rev/a798b427a174
Comment 81•3 years ago
|
||
This seems to still be a problem on comm-esr102 even after the uplift.
Description
•