Open Bug 1596649 Opened 2 years ago Updated 3 months ago

Leaks memory due to reference count cycle via nsMsgQuote

Categories

(MailNews Core :: Composition, defect, P2)

Unspecified
All
defect

Tracking

(thunderbird_esr78+ affected, thunderbird83 affected)

ASSIGNED
84 Branch
Tracking Status
thunderbird_esr78 + affected
thunderbird83 --- affected

People

(Reporter: jcranmer, Assigned: benc)

References

Details

(Keywords: leave-open, memory-leak)

Attachments

(2 files, 1 obsolete file)

:darktrojan pointed me to an impressively major leak log involving composition. Some selected lines:
[task 2019-11-15T01:49:40.497Z] 01:49:40 INFO - 754 |QuotingOutputStreamListener | 168 1008| 6 6|
[task 2019-11-15T01:49:40.589Z] 01:49:40 INFO - 1641 |nsMsgQuote | 72 432| 6 6|
[task 2019-11-15T01:49:40.589Z] 01:49:40 INFO - 1642 |nsMsgQuoteListener | 32 192| 6 6|

All of these instances are being created but never freed. There appears to be a reference count cycle going on here:
https://searchfox.org/comm-central/source/mailnews/compose/src/nsMsgCompose.cpp#2958 creates a QuotingOutputStreamListener, who helpfully stores it in a strong reference: https://searchfox.org/comm-central/source/mailnews/compose/src/nsMsgCompose.cpp#2958.
Then, nsMsgCompose passes it to nsMsgQuote::QuoteMessage (https://searchfox.org/comm-central/source/mailnews/compose/src/nsMsgQuote.cpp#87), who happily stores the listener parameter into the strong reference on the class itself.

One of those references should be a weak reference. Possibly nsMsgQuote's listener, actually, since it's only actually held in nsMsgQuote to pass to nsStreamConverter via a hidden channel, and I think AsyncConvertData is actually called synchronously, so a strong-ref across that call is all that is needed.

Keywords: memory-leak
Priority: -- → P2
OS: Unspecified → All
Version: unspecified → Trunk
Assignee: nobody → benc

Not sure this'll plug the leak, but here's an easy change that removes at least one hard reference.
If this doesn't fix it I'll dig deeper. I did start mapping it all out, but the ownership of the various objects gets very murky very quickly...

How do I generate the leak log :darktrojan reported? Is it a case of fiddling with XPCOM_MEM_ environment variables? (I think Magnus said that try builds might generate them - there's a try run here, but I didn't spot any obvious leak logging)

Attachment #9155856 - Flags: review?(Pidgeot18)

Has Joshua become more active lately (I don't follow Matrix chat) or should you find a different reviewer?

He's around matrix, dunno about bugzilla.

You need to look at a debug build, there's at least the "LEAK AND BLOAT STATISTICS" section, like https://firefoxci.taskcluster-artifacts.net/Qqx7byQMTdOHnD7-cMz5fw/0/public/logs/live_backing.log

See https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Mochitest#Diagnosing_and_fixing_leakcheck_failures

(In reply to Ben Campbell from comment #1)

How do I generate the leak log :darktrojan reported? Is it a case of fiddling with XPCOM_MEM_ environment variables? (I think Magnus said that try builds might generate them - there's a try run here, but I didn't spot any obvious leak logging)

XPCOM_MEM_LEAK_LOG generates essentially the view you find in the logs. For local failures, you can run XPCOM_MEM_REFCNT_LOG and XPCOM_MEM_COMPTR_LOG to figure out which reference isn't being freed (with a healthy dose of mozilla-central/tools/rb/find_leakers.py).

Comment on attachment 9155856 [details] [diff] [review]
1596649-remove-nsMsgCompose-mQuoteStreamListener-1.patch

Review of attachment 9155856 [details] [diff] [review]:
-----------------------------------------------------------------

This misses the actual refcount cycle. It's QuotingOutputStreamListener's strong ref of nsMsgQuote, as well as nsMsgQuote's strong ref of the listener that's the issue.
Attachment #9155856 - Flags: review?(Pidgeot18) → review-

Take two :-)

I went with making nsMsgQuote's reference to the QuotingOutputStreamListener a weak one, mostly because it felt like nsMsgQuote should 'contain' the streamlistener. But I never quite worked out the proper relationship between the various actors here - it feels more like a soup of objects.

I wasn't able to replicate the leak locally - my build is a debug one and exiting is pretty flaky. It tends to crash and not display the leak summary. I'll do release build with --enable-logrefcnt and try again, but I think this patch to break the co-dependent refs is worthwhile, even if the leak turns out to be something else.

Attachment #9157856 - Flags: review?(Pidgeot18)

Oh, I did do a try build, but there was some breakage at the time so I'm not sure it's useful:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=48ad26ec5eb952baacdc7ef247cc6d7f0ff11893

It looks useful to me. There's a big drop in the number of bytes leaked† when the composition tests are run, and the three lines mentioned in comment 0 have gone.

60783032 vs 70245921 on Linux64 although it may not be an exact comparison. (Find the string "bytes leaked".)

I was hoping this would clean up far more leaks than it actually did. While the apparent improvement in leaked code is ~10MB (!), I doubt we fixed more than a couple of KB with this leak. The main source of the leaks is the entire compose window being leaked (indicated by the presence of nsMsgComposeParams in the leaks; it's not used outside of initializing a compose window), and leaking a full HTML/XUL window causes entire DOMs to be leaked, and that's where the MB of leaks come from.

Still, that's no reason not to accept this patch.

Comment on attachment 9157856 [details] [diff] [review]
1596649-break-cycle-in-nsMsgQuote-1.patch

Review of attachment 9157856 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/src/nsMsgCompose.h
@@ +146,5 @@
>  // THIS IS THE CLASS THAT IS THE STREAM Listener OF THE HTML OUTPUT
>  // FROM LIBMIME. THIS IS FOR QUOTING
>  ////////////////////////////////////////////////////////////////////////////////////
> +class QuotingOutputStreamListener : public nsIMsgQuotingOutputStreamListener,
> +                                    nsSupportsWeakReference {

This should be "public nsSupportsWeakReference", no?
Attachment #9157856 - Flags: review?(Pidgeot18)
Summary: Leaks due to reference count cycle via nsMsgQuote → Leaks memory due to reference count cycle via nsMsgQuote

Let's get this finished up and landed. We can improve more in other bugs.

Status: NEW → ASSIGNED

Just adds the missing 'public' as per Comment 10.
More to do on this bug but yes: some improvement is better than none.

Attachment #9157856 - Attachment is obsolete: true
Attachment #9184442 - Flags: review?(mkmelin+mozilla)
Keywords: leave-open

Unless you have very concrete plans for fixing further cases atm, I'd suggest to close this after landing and clone another bug to track down further cases to fix.

Target Milestone: --- → 84 Branch
Comment on attachment 9184442 [details] [diff] [review]
1596649-break-cycle-in-nsMsgQuote-2.patch

Review of attachment 9184442 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, r=mkmelin
Attachment #9184442 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/628f8f5c15e0
Break leaking cyclic reference in nsMsgQuote. r=mkmelin DONTBUILD

(In reply to Magnus Melin [:mkmelin] from comment #13)

Unless you have very concrete plans for fixing further cases atm, I'd suggest to close this after landing and clone another bug to track down further cases to fix.

I didn't want to lose track - I've got other stuff to work on first but I'd like to come back to this one.
My plan is to sit down and just manually map out all the code - which classes derive/interact with each other. Lots of awful UML scribblings :-)
Once the overall shape is sketched out, a lot of these kinds of ownership/reference issues become clear. (caveat: the compose code has a bit of a "Here be dragons" reputation)

Time to close this perhaps?

You need to log in before you can comment on or make changes to this bug.