Closed Bug 1422622 Opened 7 years ago Closed 4 years ago

convert MockFactory users in c-c to MockRegistrar

Categories

(MailNews Core :: Testing Infrastructure, enhancement)

enhancement
Not set
normal

Tracking

(thunderbird_esr78 wontfix, thunderbird82 wontfix)

RESOLVED FIXED
83 Branch
Tracking Status
thunderbird_esr78 --- wontfix
thunderbird82 --- wontfix

People

(Reporter: aceman, Assigned: aceman)

Details

Attachments

(2 files, 1 obsolete file)

In bug 1401528 comment 115, Jorg has found out c-c has a MockFactory.jsm module for tests that does almost the same as MockRegistrar.jsm code in m-c.
In this bug we can determine if we can switch over to the m-c version as it seems more modern code and we do not need to duplicate the module if it does the same.
Product: Thunderbird → MailNews Core
Attached patch 1422622-mockRegistrar.patch WIP (obsolete) — Splinter Review
This is a preliminary patch that works on Linux and OS X, but fails on Windows, see https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=2f1fa92f3f64792c3850ffb4408cb467914b311a .
So there is som slight difference between the MockRegistrar version of c-c and m-c. Notice there is a MockRegistry.jsm in m-c and mock_windows_reg_factory.js in c-c which may do the same thing (fake Windows registry for tests) but in quite different ways. I do not aim to merge those in this bug.
The difference was that argument passing into the object constructor needs to be in an array.

Successful try run: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=777efadba7f33e7bb6485590fdc7c1496cff3811
Assignee: nobody → acelists
Attachment #8934013 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8934255 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8934255 [details] [diff] [review]
1422622-mockRegistrar.patch

Can you please get a Windows debug try run.
mozilla/mach xpcshell-test mailnews/base/test/unit/test_compactFailure.js
still fails locally as mentioned in bug 1401528 comment #135. It wouldn't be any good to land this and then back it out straight away.

The errors are:
"CONSOLE_MESSAGE: (error) [JavaScript Error: "uncaught exception: 2152857614" {file: "c:/mozilla-source/comm-central/obj-i686-pc-mingw32/_tests/xpcshell/mailnews/base/test/unit/test_compactFailure.js" line: 126}]"
and
"CONSOLE_MESSAGE: (error) [JavaScript Error: "uncaught exception: 2152857606" {file: "c:/mozilla-source/comm-central/obj-i686-pc-mingw32/_tests/xpcshell/mailnews/base/test/unit/test_compactFailure.js" line: 126}]"

2152857614 = 0x8052000E = NS_ERROR_FILE_IS_LOCKED
2152857606 = 0x80520006 = NS_ERROR_FILE_TARGET_DOES_NOT_EXIST

:-(
Those exceptions are actually expected and provoked by the test. The question is why they now cause the test to fail. Maybe they are just unexpectedly printed.
OK, looking a little closer at the debug console, I see:

 0:02.71 TEST_END: Thread-1 Harness FAIL, expected PASS. Subtests passed 2/2. Unexpected 1
xpcshell return code: 1
PROCESS-CRASH | mailnews/base/test/unit/test_compactFailure.js | application crashed [unknown top frame]
Crash dump filename: c:\mozilla-source\comm-central\obj-i686-pc-mingw32\temp\xpc-other-xz8c84\a719e9df-a7cf-48a2-9c62-7ce4ec702a1a.dmp
MINIDUMP_STACKWALK not set, can't process dump.
 0:02.73 LOG: MainThread INFO INFO | Result summary:
 0:02.73 LOG: MainThread INFO INFO | Passed: 0
 0:02.73 LOG: MainThread INFO INFO | Failed: 1
 0:02.73 LOG: MainThread INFO INFO | Todo: 0
 0:02.73 LOG: MainThread INFO INFO | Retried: 0
 0:02.73 SUITE_END: MainThread
Summary
=======

Ran 3 tests (1 parents, 2 subtests)
Expected results: 2
Unexpected results: 1 (FAIL: 1)

Unexpected Results
==================

mailnews/base/test/unit/test_compactFailure.js
----------------------------------------------
FAIL [Parent]

===

So looks like the tests passed, but the test harness failed. Sadly my try was busted by the latest M-C merge of 260 changesets in one hit. I triggered a build on C-C, so we'll see what's busted.
I can see the failure locally on Linux, but I do not see the crash there.

 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "2017-12-04 22:35:04<----->test.test<----->INFO<-->[Context: test.test:4 state: started] Starting test: test_compact_without_crash"
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "JavaScript error: tbird-bin/_tests/xpcshell/mailnews/base/test/unit/test_compactFailure.js, line 126: uncaught exception: 2152857614"
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "[9829, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8052000E: file mailnews/base/src/nsMsgFolderCompactor.cpp, line 294"
 0:01.10 TEST_STATUS: Thread-1 compact_with_exception PASS [compact_with_exception : 129] 2152857614 == 2152857614
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "2017-12-04 22:35:04<----->test.test<----->INFO<-->[Context: test.test:4 state: finished] Finished test: test_compact_without_crash"
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "2017-12-04 22:35:04<----->test.test<----->INFO<-->[Context: test.test:5 state: started] Starting test: teardown_output_stream_stub"
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "2017-12-04 22:35:04<----->test.test<----->INFO<-->[Context: test.test:5 state: finished] Finished test: teardown_output_stream_stub"
 0:01.29 PROCESS_OUTPUT: Thread-1 (pid:9829) "2017-12-04 22:35:04<----->test.test<----->INFO<-->[Context: test.test:6 state: started] Starting test: setup_open_failure_folder"
 0:01.11 LOG: Thread-1 INFO (xpcshell/head.js) | test _async_driver pending (3)
 0:01.11 LOG: Thread-1 INFO (xpcshell/head.js) | test _async_driver finished (3)
 0:01.11 LOG: Thread-1 INFO "Ignoring XPCOM error: NS_ERROR_FILE_IS_LOCKED"
 0:01.11 LOG: Thread-1 INFO "CONSOLE_MESSAGE: (error) [JavaScript Error: "uncaught exception: 2152857614" {file: "tbird-bin/_tests/xpcshell/mailnews/base/test/unit/test_compactFailure.js" line: 126}]"

There is even a mechanism to say this error is OK (the 'Ignoring XPCOM error'), but it something is still triggering the test to fail.
I was going to run the xpcshell test in the debugger with --interactive, but something I did now caused a full rebuild, so I can't do it today any more. Windows builds are busted, see the C-C treeherder and bug 1363667 comment #160.
For me the previous c-c push (https://hg.mozilla.org/comm-central/rev/f0026407a38866fcea2d5331dcdc2ad98eea45c5) caused a rebuild (I didn't update m-c). Fortunately ccache saved the day and it was really fast.
OK, I got that M-C bug fixed, so here's another try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=83b85511bbc83ff1294894b8cfb778e605de3101

I'll check out the crash in my local build when that finishes.
From my local output it actually looks like the main tests have passed, but some "parent" is triggering the failure.
Maybe the "run_test" or some other infrastructure is hanging around after the tests have run.

Ran 3 tests (1 parents, 2 subtests)
Expected results: 2
Unexpected results: 1 (FAIL: 1)

Unexpected Results
==================

mailnews/base/test/unit/test_compactFailure.js
----------------------------------------------
FAIL [Parent]

The crash on windows try server is interesting:
17:42:01     INFO -  PID 12188 | [12188, Main Thread] WARNING: some msg dbs left open: '!m_dbCache.Length()', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 80
17:42:01     INFO -  PID 12188 | db left open c:\\users\\cltbld~1.000\\appdata\\local\\temp\\xpc-profile-8bbvk6\\Mail\\Local Folders\\Trash.msf
17:42:01     INFO -  PID 12188 | db left open (null)
17:42:01     INFO -  PID 12188 | db left open
17:42:01     INFO -  <<<<<<<
17:42:01     INFO -  mozcrash Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld~1.000\appdata\local\temp\xpc-other-aldcdr\3d96387e-1259-4802-9577-b22e11b941fc.dmp C:\slave\test\build\symbols

Notice the open databases without name. I can see the Trash.msf open locally, but not the others. But adding more debugging to the nsMsgDBService::~nsMsgDBService() there are 6 DBs left open, but only one is "real" (Trash) and prints.
(In reply to :aceman from comment #12)
> From my local output it actually looks like the main tests have passed, but
> some "parent" is triggering the failure.
Umm, thanks for confirming. You're basically saying the same thing as I did in comment #7.

The try run from comment #11 has:
TEST-UNEXPECTED-FAIL | mailnews/base/test/unit/test_compactFailure.js | xpcshell return code: 1
PROCESS-CRASH | mailnews/base/test/unit/test_compactFailure.js | application crashed [@ strnlen + 0xb7]

I'll run the xpcshell test in the debugger locally with --interactive during the day. Note that the crash looks like those almost-impossible-to-catch crashes in bug 1377692 with [@ strnlen + 0xb7] :-( - Or maybe we're lucky now and we will catch it.
OK, there is something fishy going on here. We're running into the shutdown crash from bug 1377692. This is what I did:

$ mozilla/mach xpcshell-test --interactive mailnews/base/test/unit/test_compactFailure.js
Started the test with |_execute_test();|
The test ran through successfully, so I typed |quit();|, and bang, the shutdown crash from 1377692 which is, ta dah!!, also the crash I've lamented a few times from bug 1396655 comment #15. And here we have it finally:

ucrtbase.dll!_strnlen()	Unknown
ucrtbase.dll!__crt_stdio_output::output_processor<char,class __crt_stdio_output::console_output_adapter<char>,class __crt_stdio_output::standard_base<char,class __crt_stdio_output::console_output_adapter<char> > >::type_case_s(void)	Unknown
ucrtbase.dll!__crt_stdio_output::output_processor<char,class __crt_stdio_output::stream_output_adapter<char>,class __crt_stdio_output::standard_base<char,class __crt_stdio_output::stream_output_adapter<char> > >::state_case_type(void)	Unknown
ucrtbase.dll!__crt_stdio_output::output_processor<char,class __crt_stdio_output::stream_output_adapter<char>,class __crt_stdio_output::standard_base<char,class __crt_stdio_output::stream_output_adapter<char> > >::process(void)	Unknown
ucrtbase.dll!__crt_stdio_output::output_processor<char,class __crt_stdio_output::string_output_adapter<char>,class __crt_stdio_output::positional_parameter_base<char,class __crt_stdio_output::string_output_adapter<char> > >::output_processor<char,class __crt_stdio_output::string_output_adapter<char>,class __crt_stdio_output::positional_parameter_base<char,class __crt_stdio_output::string_output_adapter<char> > >(class __crt_stdio_output::string_output_adapter<char> const &,unsigned __int64,char const * const,struct __crt_locale_pointers * const,char * const)	Unknown
ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<class <lambda_db08b09ef7aa9d4f8620ce68402612bc>,class <lambda_4f2c1eaeead2a5fc776db5b62ea0fb9b> &,class <lambda_3ae262bc35d4ba6a3825c4f6bcaaf95c> >(class <lambda_db08b09ef7aa9d4f8620ce68402612bc> &&,class <lambda_4f2c1eaeead2a5fc776db5b62ea0fb9b> &,class <lambda_3ae262bc35d4ba6a3825c4f6bcaaf95c> &&)	Unknown
ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<class <lambda_db08b09ef7aa9d4f8620ce68402612bc>,class <lambda_4f2c1eaeead2a5fc776db5b62ea0fb9b> &,class <lambda_3ae262bc35d4ba6a3825c4f6bcaaf95c> >(class <lambda_db08b09ef7aa9d4f8620ce68402612bc> &&,class <lambda_4f2c1eaeead2a5fc776db5b62ea0fb9b> &,class <lambda_3ae262bc35d4ba6a3825c4f6bcaaf95c> &&)	Unknown
ucrtbase.dll!___stdio_common_vfprintf()	Unknown
xul.dll!_vfprintf_l(_iobuf * const _Stream, const char * const _Format, __crt_locale_pointers * const _Locale, char * _ArgList) Line 638	C++
xul.dll!printf(const char * const _Format, ...) Line 953	C++
xul.dll!nsMsgDBService::~nsMsgDBService() Line 85	C++
xul.dll!nsMsgDBService::Release() Line 68	C++
xul.dll!mozilla::services::`anonymous namespace'::ShutdownObserver::ShutdownServices() Line 15	C++
xul.dll!mozilla::services::`anonymous namespace'::ShutdownObserver::Observe(nsISupports * aSubject, const char * aTopic, const char16_t * aData) Line 78	C++
xul.dll!nsObserverList::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * someData) Line 112	C++
xul.dll!nsObserverService::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * aSomeData) Line 300	C++
xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager * aServMgr) Line 886	C++
xul.dll!XRE_XPCShellMain(int argc, char * * argv, char * * envp, const XREShellData * aShellData) Line 1389	C++
xul.dll!mozilla::BootstrapImpl::XRE_XPCShellMain(int argc, char * * argv, char * * envp, const XREShellData * aShellData) Line 57	C++
xpcshell.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 74	C++
xpcshell.exe!wmain(int argc, wchar_t * * argv) Line 114	C++

So it dies inside
  printf("db left open %s\n", (const char *) pMessageDB->m_dbName.get());
in nsMsgDBService::~nsMsgDBService() Line 85.

Backing out 2c194c1a63a9 from bug 1396655 does NOT fix the problem.
Comment on attachment 8934255 [details] [diff] [review]
1422622-mockRegistrar.patch

This is fine, but interestingly it triggers the crash that we've been trying to track down for a while now. So we can't just land this right now.
Attachment #8934255 - Flags: review+
(In reply to Jorg K (GMT+1) from comment #14)
> So it dies inside
>   printf("db left open %s\n", (const char *) pMessageDB->m_dbName.get());
> in nsMsgDBService::~nsMsgDBService() Line 85.

I think so too. I'm adding debugging to this nsMsgDBService::~nsMsgDBService() so see what the state it.

I've noticed that nsMsgDBService::~nsMsgDBService() is entered twice at the end of this test. Once open DBs is 0, second time it is 6 (but only one (Trash) passes the !pMessageDB check). The other 5 may be some dangling remnants which may cause the problems here.

I also fine the output "db left open (null)" suspicious. Is pMessageDB->m_dbName.get() null here? But we always initialize the name to "" so it shouldn't ever be a nullptr (unless "" is implemented as nullptr internally).
Disabling the printf() line in the DEBUG block fixes the crash (hidden on my machine) and passes the test...
Sure, but why does it crash sometimes and not others? What's the story about pMessageDB->m_dbName()?
I think I am missing some output, as if some printf() messages and the crash indication is missing from my log. I've pushed a new patch to try that should help, can you try it locally on Windows debug?
Hmm, your try run doesn't include the calendar patch and setting jsloader.shareGlobal back to true.

I applied it locally and I get:
"[6904, Main Thread] WARNING: leaking headers: 'false', file c:/mozilla-source/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 630"
"leaking 20 headers in c:\\mozilla-source\\comm-central\\obj-i686-pc-mingw32\\temp\\xpc-profile-wvhja0\\Mail\\Local Folders\\Trash.msf"
"[6904, Main Thread] WARNING: leaking headers: 'false', file c:/mozilla-source/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 630"
"leaking 10 headers in c:\\mozilla-source\\comm-central\\obj-i686-pc-mingw32\\temp\\xpc-profile-wvhja0\\Mail\\Local Folders\\Target.msf"
"[6904, Main Thread] WARNING: leaking headers: 'false', file c:/mozilla-source/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 630"
"leaking 10 headers in c:\\mozilla-source\\comm-central\\obj-i686-pc-mingw32\\temp\\xpc-profile-wvhja0\\Mail\\Local Folders\\ShouldFail.msf"
"dbs open 0"
"[6904, Main Thread] WARNING: some msg dbs left open: '!m_dbCache.Length()', file c:/mozilla-source/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 80"
"dbs open 6"
"DB open1: 0"
"DB open2: 0"
"db name"
"db name len1:103"
"db name len2:103"
"db left open c:\\mozilla-source\\comm-central\\obj-i686-pc-mingw32\\temp\\xpc-profile-wvhja0\\Mail\\Local Folders\\Trash.msf"
"DB open1: 1"
"DB open2: 1"
"db name"

compactFailure.js | application crashed [unknown top frame]

So you crash on accessing the name of the second open DB. Nice-ö :-(
Oops, ignore the comment about jsloader.shareGlobal, I got confused with the remaining failure in bug 1401528 :-(
Most likely the try run will yield the same result. No need for the opt run, BTW, that won't fail.
(In reply to Jorg K (GMT+1) from comment #20)
> Hmm, your try run doesn't include the calendar patch and setting
> jsloader.shareGlobal back to true.

Yes, the failure here is independent of the shareGlobal patches.
 

> "dbs open 0"
> "[6904, Main Thread] WARNING: some msg dbs left open: '!m_dbCache.Length()',
> file c:/mozilla-source/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp,
> line 80"
> "dbs open 6"
> "DB open1: 0"
> "DB open2: 0"
> "db name"
> "db name len1:103"
> "db name len2:103"
> "db left open
> c:\\mozilla-source\\comm-central\\obj-i686-pc-mingw32\\temp\\xpc-profile-
> wvhja0\\Mail\\Local Folders\\Trash.msf"
> "DB open1: 1"
> "DB open2: 1"
> "db name"

OK, then this is exactly what I see. So we crash at fetching (pMessageDB->m_dbName).get(), not attempting to printf("%s") on it. That's interesting as the last try on server produced 2 more "db left open" lines with "null" names.
I've created a patch to better visualize what is going on with caching the msg databases in the msgDBService. You can run with this patch on the test_compactFailure2.js

My analysis of the problem is this:
In the destructor of DBService where we show the names of the databases still in cache, we are really tripping on dead objects in the list, that is why printing/accessing the name crashes.
In the test some folders get registered (cached) in the normal DBService, but then we create the mock/fake DBservice. Some of the folders close while this mock DBService (cache) is running and try to unregister from it, which silently just does nothing. They never unregister from the original real DBService thus it holds pointers to now dead objects (whose destructor has run).
And when we try to access those, it crashes

When running this with the c-c MockFactory, this does not happen for some reason. The original DbService is restored after the test and the folders unregister from it properly, thus there are no dead pointers. I need to find out why MockFactory and MockRegistrar differ in this behaviour.

I also enabled some more debugging output, which now points out some leak of headers in some databases, which may cause these to not be destructed properly and still hang around when DBService is destructing, which is why it asserts and prints them out. The databases then destruct after the DBService is dead, which looks like incorrect order.

You can try to use this patch on the other bug which reports "some msg dbs left open", maybe you get some clues.
Comment on attachment 8934255 [details] [diff] [review]
1422622-mockRegistrar.patch

Review of attachment 8934255 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thx! r=mkmelin
Attachment #8934255 - Flags: review?(mkmelin+mozilla) → review+

I stumbled upon this, unbitrotted it and going to land it soon.
Successful try here: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=2cb1b05379958d73761958c7ae18063946c5e4a1

Target Milestone: --- → 83 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/d2efc80282cd
replace MockFactory with MockRegistrar in xpcshell tests. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Strange, I thought this isn't finished yet, as it didn't fully work.
Has some code changed that it now works properly?
Also, the new patch does not remove MockFactory.jsm, is it still used elsewhere?

So many things changed in 3 yrs, so no idea what fixed it. Tests do seem happy so should be alright.
Indeed seems I missed removing MockFactory.jsm when unbitrotting. Landing a follow-up soon.

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/320be10cad85
remove now unused MockFactory.jsm. rs=me
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: