Leaks detected by xpcshell tests failing on the atom leak assertion (NNTP)

RESOLVED FIXED in Thunderbird 54.0

Status

MailNews Core
Backend
--
critical
RESOLVED FIXED
5 months ago
2 months ago

People

(Reporter: aleth, Assigned: Jorg K (GMT+2))

Tracking

Trunk
Thunderbird 54.0
Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird_esr5253+ fixed, thunderbird53 fixed, thunderbird54 fixed)

Details

(Whiteboard: [Thunderbird-testfailure: X all debug])

Attachments

(4 attachments, 8 obsolete attachments)

5.67 KB, patch
aceman
: review+
Details | Diff | Splinter Review
5.27 KB, patch
Details | Diff | Splinter Review
4.04 KB, patch
aceman
: review+
Jorg K (GMT+2)
: approval-comm-aurora+
Jorg K (GMT+2)
: approval-comm-esr52+
Details | Diff | Splinter Review
1007 bytes, patch
aceman
: review+
Details | Diff | Splinter Review
(Reporter)

Description

5 months ago
PROCESS-CRASH | mailnews/imap/test/unit/test_imapFilterActions.js
PROCESS-CRASH | mailnews/imap/test/unit/test_imapFilterActionsPostplugin.js
PROCESS-CRASH | mailnews/db/gloda/test/unit/test_index_junk_imap_online.js
PROCESS-CRASH | mailnews/db/gloda/test/unit/test_index_junk_imap_offline.js
PROCESS-CRASH | mailnews/news/test/unit/test_server.js
PROCESS-CRASH | mailnews/news/test/unit/test_internalUris.js
(Assignee)

Updated

4 months ago
Whiteboard: [Thunderbird-testfailure]
(Assignee)

Updated

4 months ago
Whiteboard: [Thunderbird-testfailure] → [Thunderbird-testfailure: X all debug]
(Assignee)

Updated

4 months ago
Severity: normal → critical

Comment 1

4 months ago
Created attachment 8838956 [details]
Some short excerpts from valgrind/memcheck log regarding unreleased memory.

I am not sure where (which bugzilla) to comment.

This bug is about data structure not being released peroperly, correct?

I know this bug was mentioned for test failures in xpcshell-test.

I have been running |make mozmill| with full DEBUG build of C-C TB
and occasionally run this under valgrind/memcheck.
valgrind/memcheck no longer runs on latest Debian GNU/Linux kernel for unknown reason (I know it is a SIGSEGV but noone knows exactly why and where it is caused. I have nailed it down to a region of code where a fork() is called inside
about the time a helper program is called with the called stack. (this is the forked side, i.e. children
before spawn/clone. 
#0  0x0000000005bae19a in __libc_fork () at ../sysdeps/nptl/fork.c:145
#1  0x0000000013fe7347 in fire_glxtest_process() ()
    at /NREF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/glxtest.cpp:313
#2  0x0000000013fddfbe in XREMain::XRE_mainInit(bool*) (this=<optimized out>, aExitFlag=<optimized out>)
    at /NREF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:3118
#3  0x0000000013fe223d in XREMain::XRE_main(int, char**, (anonymous namespace)::BootstrapConfig const&) (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aConfig=...)
    at /NREF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4628
The caller/parent experiences a SIGSEGV after the fork was invoked.
This problem is kernel specific and dependent on some features such as heap/stack randomization and MMAP features, and phase of the moon when the kernel was compiled (so many kernel options are different from the kernel where valgrind can run C-C TB successfully and the kernel where it fails. It is almost impossible to tell what is going on. My current theory is that after fork() is called and some twiddling of getenv/setenv is done BEFORE the signal handler is modified/reset for handling the signal propagation 
betwen the child and parent correctly, a sigsegv condtion arises and since it is not trapped very well at the time, there is no clear explanation of what went wrong and simply valgrind segfaults. Current educated theory. But I digress.)

When valgrind/C-C TB combination runs well under locally built 3.19.5 kernel, and finishes more or less cleanly, I realize that I can capture the UNRELEASED memory (i.e., leaked memory) very clearly.
Mind you, this records ALL the allocated memory structure during run-time which the developer may not care aoubt: for example, if a top-level global variable contains the copy (strdup'ed) option strings at the time of program termination, probably noone cares.
Or for that matter if a whole JavaString environment is kept around because of bad release practice,
we are overwhelmed with unreleased data structure that were created and attached to that JS envrionment.

However, valgrind/memcheck records all such allocations done during execution and still
left unallocated at the time of termination.

So the data structure for JS environment (presumably the root of such big data structure) is reported
in the valgrind/memcheck log.
At least, someone in the know should be able to identify such data structure (valbrind/memcheck does not say exactly what the memory blob is. But it reports the SIZE of such memory blob and WHERE
it is allocated in a nice stack trace.
Unfortunately, the allocation in JS code will be reported by way of JS interpreter and so may not be
very revealing to what JS source file and line caused the allocation to happen.

If someone is interested in looking at the log, I can make the log available somewhere.
It is HUGE. (256MB and 16+MB compressed).

Some interesting short excerpts are attached.

Comment 2

4 months ago
If we can restrict the |make mozmill| run to a test that shows similar errors as noted by xpcshell-test,
then the turn around to see the fix works is rather short.
Hoever, full |make mozmill| run takes more than 24 hours on my PC currently (due to the
debug/Full DEBUG) version. I need to co-relate the messages I dump explicitly with some memory issues reported by valgrind/memcheck.
(Assignee)

Comment 3

4 months ago
Note that test_nsMsgDBView.js and test_junkingWhenDisabled.js also fail again due to a backout in bug 1334874.
(Assignee)

Comment 4

4 months ago
Documenting the process of trying to find leaks.

First replace the NS_ASSERT() at nsAtomTable.cpp:411 with a print so application doesn't crash before logs are closed.

According to
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Refcount_tracing_and_balancing
ran:

XPCOM_MEM_LOG_CLASSES=nsMsgNewsFolder XPCOM_MEM_REFCNT_LOG=$HOME/refcounts.log mozilla/mach xpcshell-test mailnews/news/test/unit/test_internalUris.js

python mozilla/tools/rb/find_leakers.py $HOME/refcounts.log

Output:
000000000C4B6C00 (1) @ <nsMsgNewsFolder>

According to Aleth' instructions from bug 1334874 comment #35 repeated with
XPCOM_MEM_LOG_CLASSES=nsMsgNewsFolder XPCOM_MEM_LOG_OBJECTS=1 XPCOM_MEM_REFCNT_LOG=$HOME/refcounts.log mozilla/mach xpcshell-test mailnews/news/test/unit/test_internalUris.js

python mozilla/tools/rb/find_leakers.py $HOME/refcounts.log

Output:
000000000C5D6C00 (1) @ <nsMsgNewsFolder>

Then
perl -w mozilla/tools/rb/make-tree.pl --ignore-balanced --object 000000000C5D6C00 < $HOME/refcounts.log > $HOME/xx.log

Now the art is to interpret the resulting file which is huge.

Also a quick run through for test_server.js:
XPCOM_MEM_LOG_CLASSES=nsMsgNewsFolder XPCOM_MEM_REFCNT_LOG=$HOME/refcounts1.log mozilla/mach xpcshell-test mailnews/news/test/unit/test_server.js
python mozilla/tools/rb/find_leakers.py $HOME/refcounts1.log
000000000C84D400 (1) @ <nsMsgNewsFolder>
XPCOM_MEM_LOG_CLASSES=nsMsgNewsFolder XPCOM_MEM_LOG_OBJECTS=1 XPCOM_MEM_REFCNT_LOG=$HOME/refcounts1.log mozilla/mach xpcshell-test mailnews/news/test/unit/test_server.js
python mozilla/tools/rb/find_leakers.py $HOME/refcounts1.log
000000000C34F400 (1) @ <nsMsgNewsFolder>
perl -w mozilla/tools/rb/make-tree.pl --ignore-balanced --object 000000000C34F400 < $HOME/refcounts1.log > $HOME/xx1.log

With luck, test_internalUris.js and test_server.js detect the same leak somewhere in news code.
(Assignee)

Comment 5

4 months ago
I inspect the code and found a few leaking objects:
https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNewsFolder.cpp#1624
https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNewsFolder.cpp#1652

https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNntpService.cpp#1729
This one could be repaired with a RefPtr. Leaking the object misses running the destructor:
https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNewsDownloader.cpp#343

Also, these constructors seem funny:
https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNewsDownloader.cpp#306
https://dxr.mozilla.org/comm-central/rev/b3cdf98b16ef062856ad86526c8635e5eff27ac6/mailnews/news/src/nsNewsDownloader.cpp#320
since they call the base class constructors bug also assign |m_newsDB = newsDB;| which the base class constructor already does.
(Assignee)

Comment 6

4 months ago
Created attachment 8840404 [details] [diff] [review]
1337865-unrelated-news-leaks.patch

I found these leaks, but they don't fix
PROCESS-CRASH | mailnews/news/test/unit/test_server.js
PROCESS-CRASH | mailnews/news/test/unit/test_internalUris.js
Too bad.
(Assignee)

Comment 7

4 months ago
Created attachment 8840421 [details] [diff] [review]
1337865-unrelated-news-leaks.patch (v2).
Attachment #8840404 - Attachment is obsolete: true
(Assignee)

Comment 8

4 months ago
Created attachment 8840423 [details] [diff] [review]
1337865-unrelated-news-leaks.patch (v2b).
Attachment #8840421 - Attachment is obsolete: true

Comment 9

4 months ago
Created attachment 8840473 [details]
Log of running xpcshell-test mailnews/imap/test/unit/test_imapFilterAction under valgrind/memcheck.js

I am uploading log from a run of the following command:
$ mach xpcshell-test --verbose --debugger valgrind --debugger-args="--malloc-fill=0xA5 --free-fill=0xC3 --leak-check=full --show-mismatched-frees=no" mailnews/imap/test/unit/test_imapFilterActions.js

This means that I ran the test for test_imapFilterActions.js, and let valgrind/memcheck to report memory leaks at the time of TB binary terminating.

I have not patched my local source tree with the suggestions of comment 4, and so when unfreed atoms are detected, the abort is taken and so maybe objects that would have been released if abort is not taken are present as leaks.

But you get the idea what valgrind/memcheck can report.

I wonder if this may be helpful in addition to the high-level leak detector mentioned previously (as in comment 4).

If this looks promising, I can tweak the local source and run the tests with similar valgrind/memcheck setting. 

BTW, the intense slow down by memcheck (since it emulates each instruction to check for memory access issues), some valid operation inside C-C TB may timeout occasionally and the test may report something which is not usually seen.
(However, |mach| seems to understand the timeout issue rather well by seemingly extending the timeout internally to avoid such errors. Unfortunately we may be seeing one such error here.)

Suggestions welcome.

TIA
(Assignee)

Comment 10

4 months ago
Looking at test_internalUris.js again and commenting out the subtests, I found the test that causes the leak, it's test_grouplist() and in there this statement:
  subserver.subscribeListener = subscribeListener;
Changing that to
  subserver.subscribeListener = null;
still leaks. I did some debugging in nsSubscribableServer::SetSubscribeListener(): mSubscribeListener which held a null pointer gets assigned a null pointer. So that can possibly not leak.

Reducing the test to
function* test_grouplist() {
  let subserver = localserver.QueryInterface(Ci.nsISubscribableServer);
  subserver.subscribeListener = null;
leaks, and strangely enough it leaks 22 atoms, the ones from nsMsgDBFolderAtomList.h and "subscribe" and "nntp" which are allocated here:
https://dxr.mozilla.org/comm-central/rev/819fc5b85f129d9ab1c4ab222c68ce4aba65ae7a/mailnews/news/src/nsNntpIncomingServer.cpp#109

So by the looks of if, the whole nsNntpIncomingServer object doesn't get destroyed, which I've just confirmed, the constructor runs but not the destructor.
(Assignee)

Comment 11

4 months ago
OK, some more debugging. The nsNntpIncomingServer is the server that's created in the test in the run_test function like so: |localserver = setupLocalServer(server.port);|.

It gets created in the nsMsgAccountManager::createKeyedServer() like so:
  nsCOMPtr<nsIMsgIncomingServer> server =
           do_CreateInstance(serverContractID.get(), &rv);

As soon as in the test I have:
  let subserver = localserver.QueryInterface(Ci.nsISubscribableServer);
  subserver.subscribeListener = null;
or
  localserver instanceof Ci.nsISubscribableServer;
  localserver.subscribeListener = null;

the object held in localserver leaks, that is, its destructor isn't run. When removing the line with the assignment, the destructor is run. I find that very weird, Kent, can you shed some light on this?
Flags: needinfo?(rkent)
(Assignee)

Comment 12

4 months ago
Also leaking:
function* test_grouplist() {
  let subserver = localserver.QueryInterface(Ci.nsISubscribableServer);
  let x = subserver.subscribeListener;

Or variations of that:
  subserver.delimiter = ",";
  let x = subserver.delimiter;

Not leaking:
function* test_grouplist() {
  let subserver = localserver.QueryInterface(Ci.nsISubscribableServer);
  if (subserver) {
    dump("huhu\n");
  }

So basically any access to member variables of the "cast" object will cause it to leak.
(Assignee)

Comment 13

4 months ago
I did a bit more poking around. I checked that the nsMsgAccountManager object is destroyed. Good. I checked that nsMsgAccountManager::UnloadAccounts() is run, good, since that empties out internal references to incoming servers which are helps in some sort of list (m_incomingServers).

Next I tried doing the cast in C++. In nsMsgAccountManager::createKeyedServer() I added:

  nsCOMPtr<nsIMsgIncomingServer> server =
           do_CreateInstance(serverContractID.get(), &rv);
  NS_ENSURE_SUCCESS(rv, NS_ERROR_NOT_AVAILABLE);
  
  nsCOMPtr<nsISubscribableServer> ss = do_QueryInterface(server);
  printf("=== SS=%p\n", (void*)ss);

That doesn't leak.

  nsCOMPtr<nsIMsgIncomingServer> server =
           do_CreateInstance(serverContractID.get(), &rv);
  NS_ENSURE_SUCCESS(rv, NS_ERROR_NOT_AVAILABLE);
  
  nsCOMPtr<nsISubscribableServer> ss = do_QueryInterface(server);
  printf("=== SS=%p\n", (void*)ss);
  
  if (ss) {
    char d;
    ss->GetDelimiter(&d);
  }

That leaks. So as we've seen in the JS example, any access to the "cast" nsISubscribableServer object causes a leak.

Finally, I found the problem. nsNntpIncomingServer::GetDelimiter() calls EnsureInner() and that creates a new incoming server, d'oh!!

So no XPCOM magic gone wrong, just bad code!!
Flags: needinfo?(rkent)
(Assignee)

Comment 14

4 months ago
Comment on attachment 8840423 [details] [diff] [review]
1337865-unrelated-news-leaks.patch (v2b).

Miscellaneous leaks found by code inspection and addref/release removals. Try run to come.
Attachment #8840423 - Flags: review?(acelists)
(Assignee)

Comment 15

4 months ago
Created attachment 8841270 [details] [diff] [review]
1337865-subscribeCleanup.patch fixes test test_internalUris.js

As discussed with Kent on IRC, we should remove mIncomingServer from nsSubscribableServer.cpp, since that back reference will dead-lock object destruction. But that's for another bug.
Attachment #8841270 - Flags: review?(acelists)

Comment 16

4 months ago
(In reply to Jorg K (GMT+1) from comment #15)
> As discussed with Kent on IRC, we should remove mIncomingServer from
> nsSubscribableServer.cpp, since that back reference will dead-lock object
> destruction.

This idea is experimentally implemented at 
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=906c3e9295c4251a246488c2a989b32026129553

It seems to only fix mailnews/news/test/unit/test_server.js but not the others.
(Assignee)

Comment 17

4 months ago
Created attachment 8841302 [details] [diff] [review]
leak-search.patch

And you waited for the result until 3 AM. That's dedication ;-)

While I was hoping for more tests to get fixed, it's good that fixing a problem we recognised at least fixes one test.

Let's return to test_internalUris.js. As we diagnosed, the nsNntpIncomingServer holds a reference to the nsSubscribableServer object and vice versa. With your change, we lose the back reference.

This *DOES* fix one of the problems I detected. With your patch applied, the reduced test passes and the nsNntpIncomingServer object is destroyed. Run with this debug patch to see it.

Remember that the leaked object I found was a nsMsgNewsFolder (comment #4) and not a nsNntpIncomingServer, for which I didn't look. So there are more leaks to be found here.
(Assignee)

Comment 18

4 months ago
Created attachment 8841303 [details] [diff] [review]
1337865-break-reference-loop.patch

I think this is the way to go. It fixes one test and part of another.
Attachment #8841303 - Flags: review+
(Assignee)

Comment 19

4 months ago
Created attachment 8841305 [details] [diff] [review]
1337865-release-reference.patch - fixes test test_internalUris.js

This together with attachment 8841303 [details] [diff] [review] fixes test_internalUris.js.
Attachment #8841270 - Attachment is obsolete: true
Attachment #8841270 - Flags: review?(acelists)
Attachment #8841305 - Flags: review?(acelists)
(Assignee)

Comment 20

4 months ago
Comment on attachment 8840473 [details]
Log of running xpcshell-test mailnews/imap/test/unit/test_imapFilterAction under valgrind/memcheck.js

With all due respect, I don't think this helps.
Attachment #8840473 - Attachment is obsolete: true
(Assignee)

Updated

4 months ago
Attachment #8838956 - Attachment is obsolete: true
(Assignee)

Comment 21

4 months ago
Let's do the IMAP and Gloda parts in bug 1342745.
Summary: Leaks detected by xpcshell tests failing on the atom leak assertion → Leaks detected by xpcshell tests failing on the atom leak assertion (NNTP)
(Assignee)

Comment 22

4 months ago
Try run with the three patches here plus the "view cycle leaks" one from bug 1334874 (and one from bug 1339811) for a greener result:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=223953da26deb389039f473922a951bc2d029f90
I'm not convinced that Aleth "view cycle leaks" patch from bug 1334874 wasn't the way to go. It has some opposition from Kent and Aceman, but it's not correct that it only fixed tests: It did address issues in code.

This this, we should be down to four failures, IMAP and Gloda from bug 1342745.

Comment 23

4 months ago
Created attachment 8841312 [details] [diff] [review]
1337865-break-reference-loop.patch v1.1

Let's add some comment on why we do not store exactly what is passed to us. Carrying over r=jorgk.
Attachment #8841303 - Attachment is obsolete: true
Attachment #8841312 - Flags: review+

Comment 24

4 months ago
Comment on attachment 8840423 [details] [diff] [review]
1337865-unrelated-news-leaks.patch (v2b).

Review of attachment 8840423 [details] [diff] [review]:
-----------------------------------------------------------------

I assume these leaks do not cause test failures. But the fixed look useful.
Attachment #8840423 - Flags: review?(acelists) → review+
(Assignee)

Comment 25

4 months ago
(In reply to :aceman from comment #23)
> Let's add some comment on why we do not store exactly what is passed to us.
> Carrying over r=jorgk.
Great!

(In reply to :aceman from comment #24)
> I assume these leaks do not cause test failures. But the fixes look useful.
They do, because they avoid running up a high memory bill. Besides, that addref/release stuff is extremely error-prone and ugly. But let's check that those changes don't do damage. That's why I started that try run (comment #22) with all the accumulated goodies ;-)

Comment 26

4 months ago
Comment on attachment 8841305 [details] [diff] [review]
1337865-release-reference.patch - fixes test test_internalUris.js

Review of attachment 8841305 [details] [diff] [review]:
-----------------------------------------------------------------

It's interesting that this is needed. I would expect the reference to be killed when the let-declared variable goes out of scope (at the end of test_grouplist). Maybe because it is a generator we never run it to the end of the function because we stop execution on the 'yield true', so it keeps handing around waiting if we call it again. Maybe we should call .return() on the generator when it yields true in https://hg.mozilla.org/comm-central/file/tip/mailnews/test/resources/asyncTestUtils.js
Attachment #8841305 - Flags: feedback?(rkent)
Attachment #8841305 - Flags: feedback?(arai.unmht)
(Assignee)

Comment 27

4 months ago
(In reply to :aceman from comment #26)
> ... so it keeps handing around waiting if we call it again.
That's what I thought. Alternate fixes welcome, especially those, which will fix more tests ;-)
(Assignee)

Comment 28

4 months ago
(In reply to Jorg K (GMT+1) from comment #22)
> Try run with the three patches here plus the "view cycle leaks" one from bug
> 1334874 (and one from bug 1339811) for a greener result:
> https://treeherder.mozilla.org/#/jobs?repo=try-comm-
> central&revision=223953da26deb389039f473922a951bc2d029f90
> With this, we should be down to four failures, IMAP and Gloda from bug
> 1342745.
Right, four failures left, the ones I moved to bug 1342745.

Aceman, how much more time would you like to invest into understanding JS GC when I'm pretty sure that we fixed the TB code issues here ;-)
Comment on attachment 8841305 [details] [diff] [review]
1337865-release-reference.patch - fixes test test_internalUris.js

Review of attachment 8841305 [details] [diff] [review]:
-----------------------------------------------------------------

at least the lexical scope itself doesn't hold the reference at the end of test.
I don't see the scope alive in cc log.

maybe null-ing the somehow triggers cc for underlying XPCOM things, I guess.
as said in IRC, doing Cu.forceCC() at the end of test may give us some hint.
Attachment #8841305 - Flags: feedback?(arai.unmht)
(Assignee)

Comment 30

4 months ago
Created attachment 8841334 [details] [diff] [review]
1337865-release-reference.patch (v2) - fixes test test_internalUris.js

Alternate solution only nulling the listener.
Attachment #8841305 - Attachment is obsolete: true
Attachment #8841305 - Flags: review?(acelists)
Attachment #8841305 - Flags: feedback?(rkent)
Attachment #8841334 - Flags: review?(acelists)
(Assignee)

Comment 31

4 months ago
Created attachment 8841335 [details] [diff] [review]
1337865-release-reference.patch (v2b) - fixes test test_internalUris.js

Sorry about the noise. Fixed comment and commit message.

The sad thing is that the test still leaks two databases:
db left open c:\\users\\jorgk\\appdata\\local\\temp\\firefox\\xpcshellprofile\\News\\localhost\\test.filter.msf"
db left open c:\\users\\jorgk\\appdata\\local\\temp\\firefox\\xpcshellprofile\\News\\localhost\\test.malformed&-name.msf"

I suggest that once we plugged the leaking atoms, we invest some time into stopping those leaked databases.
Attachment #8841334 - Attachment is obsolete: true
Attachment #8841334 - Flags: review?(acelists)
Attachment #8841335 - Flags: review?(acelists)

Updated

4 months ago
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
OS: Unspecified → All
Hardware: Unspecified → All
Version: unspecified → Trunk

Comment 32

4 months ago
Comment on attachment 8841335 [details] [diff] [review]
1337865-release-reference.patch (v2b) - fixes test test_internalUris.js

Review of attachment 8841335 [details] [diff] [review]:
-----------------------------------------------------------------

I like this version as it is a balanced operation. We set the listener then remove it.
Attachment #8841335 - Flags: review?(acelists) → review+
(Assignee)

Comment 33

4 months ago
https://hg.mozilla.org/comm-central/rev/d4e70f7093cf4a40c7d17c6c80bdf7b39ff8db37
https://hg.mozilla.org/comm-central/rev/5f72602f1ff089df3a141c104facc656efdc6cd8
https://hg.mozilla.org/comm-central/rev/ef7420896e68b4a9ffd30c7ad50ac0cb9c25b990
https://hg.mozilla.org/comm-central/rev/39ee657ffbfd508614ff577604ab750b18cf5061
For bonus points I cleaned up the white-space in nsNntpIncomingServer.cpp, and yes, I did a 'diff' ignoring white-space to ensure I didn't mess it up ;-)
Status: ASSIGNED → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 54.0
(Assignee)

Comment 34

4 months ago
Comment on attachment 8841312 [details] [diff] [review]
1337865-break-reference-loop.patch v1.1

We should uplift this in due course to fix leaks and tests on the branches.
Attachment #8841312 - Flags: approval-comm-beta?
Attachment #8841312 - Flags: approval-comm-aurora?

Comment 35

4 months ago
(In reply to Jorg K (GMT+1) from comment #31)
> Created attachment 8841335 [details] [diff] [review]
> 1337865-release-reference.patch (v2b) - fixes test test_internalUris.js
> 
> Sorry about the noise. Fixed comment and commit message.
> 
> The sad thing is that the test still leaks two databases:
> db left open
> c:
> \\users\\jorgk\\appdata\\local\\temp\\firefox\\xpcshellprofile\\News\\localho
> st\\test.filter.msf"
> db left open
> c:
> \\users\\jorgk\\appdata\\local\\temp\\firefox\\xpcshellprofile\\News\\localho
> st\\test.malformed&-name.msf"
> 
> I suggest that once we plugged the leaking atoms, we invest some time into
> stopping those leaked databases.

This would be great, and I think would have a significant impact on memory in the wild. The test doesn't set msgDatabase = null for either folder, but I suspect even if it did, and even if that were followed up with an explicit GC, there would be a 'leak'. Meaning that eventually the msgDatabase would be released, just not when expected/measured.
(Assignee)

Comment 36

4 months ago
(In reply to alta88 from comment #35)
> Meaning that eventually the msgDatabase would be released, just not when expected/measured.
Yes, that's an interesting aspect which we'll investigate in bug 1342858. Help welcome!

Comment 37

4 months ago
(In reply to Jorg K (GMT+1) from comment #12)
bableServer);
>   if (subserver) {
>     dump("huhu\n");
>   }
> 
> So basically any access to member variables of the "cast" object will cause
> it to leak.

Great detective work.
I think the setting of the attribute (even if the value is null) makes the object to have a permanent value and thus non-eligible for automatic destruction.
(Assignee)

Comment 38

4 months ago
Let's keep to the facts here and not create any myth: The problem was caused by a "reference loop", A points to B and B points to A, so unless you manually break the loop, these objects won't be refcounted down and die. We removed the back reference, one test got fixed and a second one needed a little helping along by triggering JS GC.

> So basically any access to member variables of the "cast" object will cause
> it to leak.
That was *only* true in this case, since the access internally created a new object ("composition": https://en.wikipedia.org/wiki/Composition_over_inheritance) which had the back reference.

Comment 39

4 months ago
(In reply to Jorg K (GMT+1) from comment #38)
> Let's keep to the facts here and not create any myth: The problem was caused
> by a "reference loop", A points to B and B points to A, so unless you
> manually break the loop, these objects won't be refcounted down and die. We
> removed the back reference, one test got fixed and a second one needed a
> little helping along by triggering JS GC.
> 
> > So basically any access to member variables of the "cast" object will cause
> > it to leak.
> That was *only* true in this case, since the access internally created a new
> object ("composition":
> https://en.wikipedia.org/wiki/Composition_over_inheritance) which had the
> back reference.

Thanks for the clarification. I am not familiar with the internal implementation of JS symbol tables, and such.

So, OK, in this case, the reference to delimiter in JS code a la
let x = subserver.delimiter;

causes the corresponding Getter function via XPCOM and
the call of GetDelimiter() automatically created the loop-reference. Tough.

I am afraid that there may be tons of such yet-to-be-discovered/dormant issues in resolving the
unclosed database reference. :-(

Well, but as you say, we finally get proper attention on this issue. That *IS* great.

TIA
(Assignee)

Comment 40

4 months ago
Aurora (TB 53):
https://hg.mozilla.org/releases/comm-aurora/rev/c2477d8ef1dddbb12f6b02511e2d829965759eb0
https://hg.mozilla.org/releases/comm-aurora/rev/c405fd514d27ced105c407a1e5034e2aceede4ad
https://hg.mozilla.org/releases/comm-aurora/rev/55c06474b49149a7684e31ee49f10f4f601f0c39
https://hg.mozilla.org/releases/comm-aurora/rev/9ef94779740488b7a12f015a9d1084093adbd499
status-thunderbird52: --- → affected
status-thunderbird53: --- → fixed
status-thunderbird54: --- → fixed
status-thunderbird_esr52: --- → affected
tracking-thunderbird_esr52: --- → +
(Assignee)

Updated

4 months ago
Attachment #8841312 - Flags: approval-comm-aurora? → approval-comm-aurora+
(Assignee)

Updated

4 months ago
Attachment #8841312 - Flags: approval-comm-esr52?

Updated

4 months ago
Depends on: 1346338
(Assignee)

Updated

4 months ago
Attachment #8841312 - Flags: approval-comm-beta?
(Assignee)

Comment 41

2 months ago
TB 52 ESR:
https://hg.mozilla.org/releases/comm-esr52/rev/1972ef3b71edd283f2c25f8be667ada4fdbcf221
https://hg.mozilla.org/releases/comm-esr52/rev/d1d78fd2816cd234cc062b1652d52bfd3bf1a8d0
https://hg.mozilla.org/releases/comm-esr52/rev/1abefe79835a7801968435c1914351851123f084
https://hg.mozilla.org/releases/comm-esr52/rev/cc329ac550c6bd64a7cfb7587c300fd8c78e07c8
status-thunderbird52: affected → ---
status-thunderbird_esr52: affected → fixed
tracking-thunderbird_esr52: + → 53+
(Assignee)

Updated

2 months ago
Attachment #8841312 - Flags: approval-comm-esr52? → approval-comm-esr52+
You need to log in before you can comment on or make changes to this bug.