Closed Bug 844148 Opened 11 years ago Closed 10 years ago

Leaks MBs in a few minutes in nsImapFlagAndUidState::ClearCustomFlags

Categories

(MailNews Core :: Networking: IMAP, defect)

x86_64
Linux
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 29.0

People

(Reporter: chrisccoulson, Assigned: chrisccoulson)

Details

(Keywords: memory-leak, valgrind, Whiteboard: [external-api-bustage])

Attachments

(2 files, 2 obsolete files)

I see lots of these types of leaks with valgrind when running Thunderbird (this is just from running for a couple of minutes):

==14864== 917,504 bytes in 3 blocks are possibly lost in loss record 32,834 of 32,837
==14864==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14864==    by 0x9215F5C: ChangeTable(PLDHashTable*, int) (pldhash.cpp:527)
==14864==    by 0x9216438: PL_DHashTableOperate (pldhash.cpp:612)
==14864==    by 0x8F6DC04: nsImapFlagAndUidState::AddUidCustomFlagPair(unsigned int, char const*) (nsTHashtable.h:184)
==14864==    by 0x8F60078: nsImapServerResponseParser::flags() (nsImapServerResponseParser.cpp:1768)
==14864==    by 0x8F5EFEE: nsImapServerResponseParser::msg_fetch() (nsImapServerResponseParser.cpp:1060)
==14864==    by 0x8F5E6B3: nsImapServerResponseParser::response_data() (nsImapServerResponseParser.cpp:709)
==14864==    by 0x8F5D15A: nsImapServerResponseParser::ParseIMAPServerResponse(char const*, bool, char*) (nsImapServerResponseParser.cpp:210)
==14864==    by 0x8F4B757: nsImapProtocol::FetchMessage(nsCString const&, nsIMAPeFetchFields, char const*, unsigned int, unsigned int, char*) (nsImapProtocol.cpp:3575)
==14864==    by 0x8F5770C: nsImapProtocol::ProcessMailboxUpdate(bool) (nsImapProtocol.cpp:4085)
==14864==    by 0x8F592EB: nsImapProtocol::ProcessSelectedStateURL() (nsImapProtocol.cpp:2960)
==14864==    by 0x8F59924: nsImapProtocol::ProcessCurrentURL() (nsImapProtocol.cpp:1726)
==14864==    by 0x8F5608F: nsImapProtocol::ImapThreadMainLoop() (nsImapProtocol.cpp:1347)
==14864==    by 0x8F56260: nsImapProtocol::Run() (nsImapProtocol.cpp:1025)
==14864==    by 0x924042C: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:627)
==14864==    by 0x92156F0: NS_ProcessNextEvent_P(nsIThread*, bool) (nsThreadUtils.cpp:238)
==14864==    by 0x9240C7D: nsThread::ThreadFunc(void*) (nsThread.cpp:265)
==14864==    by 0x6264532: _pt_root (ptthread.c:156)
==14864==    by 0x5647F8D: start_thread (pthread_create.c:311)
==14864==    by 0x5956E1C: clone (clone.S:113)

==14864== 227,457 bytes in 28,547 blocks are possibly lost in loss record 32,824 of 32,837
==14864==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14864==    by 0x7AE0F3E: moz_xmalloc (mozalloc.cpp:54)
==14864==    by 0x92132D7: NS_strdup(char const*) (nsCRTGlue.cpp:117)
==14864==    by 0x8F6DBE5: nsImapFlagAndUidState::AddUidCustomFlagPair(unsigned int, char const*) (nsImapFlagAndUidState.cpp:277)
==14864==    by 0x8F60078: nsImapServerResponseParser::flags() (nsImapServerResponseParser.cpp:1768)
==14864==    by 0x8F5EFEE: nsImapServerResponseParser::msg_fetch() (nsImapServerResponseParser.cpp:1060)
==14864==    by 0x8F5E6B3: nsImapServerResponseParser::response_data() (nsImapServerResponseParser.cpp:709)
==14864==    by 0x8F5D15A: nsImapServerResponseParser::ParseIMAPServerResponse(char const*, bool, char*) (nsImapServerResponseParser.cpp:210)
==14864==    by 0x8F4B757: nsImapProtocol::FetchMessage(nsCString const&, nsIMAPeFetchFields, char const*, unsigned int, unsigned int, char*) (nsImapProtocol.cpp:3575)
==14864==    by 0x8F5770C: nsImapProtocol::ProcessMailboxUpdate(bool) (nsImapProtocol.cpp:4085)
==14864==    by 0x8F592EB: nsImapProtocol::ProcessSelectedStateURL() (nsImapProtocol.cpp:2960)
==14864==    by 0x8F59924: nsImapProtocol::ProcessCurrentURL() (nsImapProtocol.cpp:1726)
==14864==    by 0x8F5608F: nsImapProtocol::ImapThreadMainLoop() (nsImapProtocol.cpp:1347)
==14864==    by 0x8F56260: nsImapProtocol::Run() (nsImapProtocol.cpp:1025)
==14864==    by 0x924042C: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:627)
==14864==    by 0x92156F0: NS_ProcessNextEvent_P(nsIThread*, bool) (nsThreadUtils.cpp:238)
==14864==    by 0x9240C7D: nsThread::ThreadFunc(void*) (nsThread.cpp:265)
==14864==    by 0x6264532: _pt_root (ptthread.c:156)
==14864==    by 0x5647F8D: start_thread (pthread_create.c:311)
==14864==    by 0x5956E1C: clone (clone.S:113)

==14864== 95,658 bytes in 11,953 blocks are definitely lost in loss record 32,796 of 32,837
==14864==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14864==    by 0x7AE0F3E: moz_xmalloc (mozalloc.cpp:54)
==14864==    by 0x92132D7: NS_strdup(char const*) (nsCRTGlue.cpp:117)
==14864==    by 0x8F6DBE5: nsImapFlagAndUidState::AddUidCustomFlagPair(unsigned int, char const*) (nsImapFlagAndUidState.cpp:277)
==14864==    by 0x8F60078: nsImapServerResponseParser::flags() (nsImapServerResponseParser.cpp:1768)
==14864==    by 0x8F5EFEE: nsImapServerResponseParser::msg_fetch() (nsImapServerResponseParser.cpp:1060)
==14864==    by 0x8F5E6B3: nsImapServerResponseParser::response_data() (nsImapServerResponseParser.cpp:709)
==14864==    by 0x8F5D15A: nsImapServerResponseParser::ParseIMAPServerResponse(char const*, bool, char*) (nsImapServerResponseParser.cpp:210)
==14864==    by 0x8F4B757: nsImapProtocol::FetchMessage(nsCString const&, nsIMAPeFetchFields, char const*, unsigned int, unsigned int, char*) (nsImapProtocol.cpp:3575)
==14864==    by 0x8F57664: nsImapProtocol::ProcessMailboxUpdate(bool) (nsImapProtocol.cpp:4071)
==14864==    by 0x8F592EB: nsImapProtocol::ProcessSelectedStateURL() (nsImapProtocol.cpp:2960)
==14864==    by 0x8F59924: nsImapProtocol::ProcessCurrentURL() (nsImapProtocol.cpp:1726)
==14864==    by 0x8F5608F: nsImapProtocol::ImapThreadMainLoop() (nsImapProtocol.cpp:1347)
==14864==    by 0x8F56260: nsImapProtocol::Run() (nsImapProtocol.cpp:1025)
==14864==    by 0x924042C: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:627)
==14864==    by 0x92156F0: NS_ProcessNextEvent_P(nsIThread*, bool) (nsThreadUtils.cpp:238)
==14864==    by 0x9240C7D: nsThread::ThreadFunc(void*) (nsThread.cpp:265)
==14864==    by 0x6264532: _pt_root (ptthread.c:156)
==14864==    by 0x5647F8D: start_thread (pthread_create.c:311)
==14864==    by 0x5956E1C: clone (clone.S:113)

The attached patch seems to do the trick
Attachment #717162 - Flags: review?(irving)
Severity: normal → major
Keywords: mlk
Assignee: nobody → chrisccoulson
Status: NEW → ASSIGNED
Comment on attachment 717162 [details] [diff] [review]
Don't leak in nsImapFlagAndUidState::ClearCustomFlags

Review of attachment 717162 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for looking for leaks; we don't run valgrind on TB anywhere near often enough.

This change catches some of the ways we'll leak the char*, but I don't think it gets all of them. The underlying hash table implementation in pldhash.h takes a function pointer to reliably clear entries when they're destroyed; nsTHashTable<whatever> provides an implementation of that at http://dxr.mozilla.org/mozilla-central/xpcom/glue/nsTHashtable.h#l460 that calls the destructor on the entry type for the hash table.

My preferred solution would be to change the hash table to have nsCString elements at http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapFlagAndUidState.h#44 (like the definition of mCustomAttributesHash in the next line)

Failing that, another approach would be to figure out how to define a element type for that hash table that frees the char* in the clear-entry callback.

Also, as far as I can tell the first leak in your list:

==14864== 917,504 bytes in 3 blocks are possibly lost in loss record 32,834 of 32,837
==14864==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14864==    by 0x9215F5C: ChangeTable(PLDHashTable*, int) (pldhash.cpp:527)

is the main body of the hash table not being released, which means we're probably not destroying the entire m_CustomFlagsHash properly in some cases.
Attachment #717162 - Flags: review?(irving) → review-
Chris, what time span was required to realize the ~1MB memory loss?
Flags: needinfo?(chrisccoulson)
Whiteboard: [patchlove]
(In reply to Wayne Mery (:wsmwk) from comment #2)
> Chris, what time span was required to realize the ~1MB memory loss?

"(this is just from running for a couple of minutes)" from comment 0?

I could look at what irving is proposing but I can not test if the leak is gone. Has Chris abandoned the patch?
Hi,

Sorry, this got a bit sidelined with other changes at work recently. I do intend to look at it again, but I have to do this in my spare time now
Flags: needinfo?(chrisccoulson)
We should ignore "possibly lost" bugs in Gecko code, by using the "--show-possibly-lost=no" flag (in addition to "--smc-check=all-non-file" when running in Valgrind. Thus, the only interesting leak is the one that is "definitely lost" in comment 0.
Keywords: valgrind
I run thunderbird under valgrind with "--smc-check=all-non-file" occasionally.
I changed my development environment from 32 bits linux to 64 bits linux.
As a result, the suppression pattern we use to ignore valgrind/memcheck warnings
that are uninteresting (at the moment) need to be changed and new suprresions
need to be added to match the new toolchain and libraries.

I ran |make mozmill| test running thunderbird under valgrind and got a whopping 
~50MB session log under the new 64 bits environment. 
The leaks printed at the end of major test occupy that many bytes.

After struggling to create a list of suppression, eventually creating 
an awk script to collect and generate a list of suppression automatically,
I found about 500-600 suppression patterns (definitely lost kind
need to be created and used for valgrind to suppress leak information completely.
(I am interested in uninitialized memory use and invalid address accesses for the moment.
Thus, I want to shut up leak report for now. Also, *IF NEW DEFINITE LEAK APPEARS*, then
I can spot it during cursory browsing and can report it if I notice it.
A new code change that introduces new leak is likely to be fixed very quickly.)

If someone is interested, I can upload the ~50MB session log and the list
of  suppressions somewhere. The log compresses very well and it takes only about
1.6MiB to store it (something we can send by e-mails these days).

TIA

(Maybe I should file a bug so that running TB under valgrind for |make mozmill|
becomes easy and many can try it. 
Firefox's test suite seems to have a switch that allows invocation of valgrind
during test. Thunderbird's scripts do not have such switches (especially mozmill) and
if it does, it is hidden very well.
> If someone is interested, I can upload the ~50MB session log and the list
> of  suppressions somewhere. The log compresses very well and it takes only
> about
> 1.6MiB to store it (something we can send by e-mails these days).

Please file a new bug for this.

> 
> (Maybe I should file a bug so that running TB under valgrind for |make
> mozmill|
> becomes easy and many can try it. 

Yes, this should be very useful too.
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #7)
> > (Maybe I should file a bug so that running TB under valgrind for |make
> > mozmill|
> > becomes easy and many can try it. 
> 
> Yes, this should be very useful too.

Is this Bug 916505?
Or do we still need a new bug?


(In reply to :aceman from comment #3)
> (In reply to Wayne Mery (:wsmwk) from comment #2)
> > Chris, what time span was required to realize the ~1MB memory loss?
> 
> "(this is just from running for a couple of minutes)" from comment 0?
> 
> I could look at what irving is proposing but I can not test if the leak is
> gone. Has Chris abandoned the patch?

~1MB in a few minutes is quite a lot.

Is this likely to continue to increase for the average user over more time?
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk) from comment #8)
> (In reply to Gary Kwong [:gkw] [:nth10sd] from comment #7)
> > > (Maybe I should file a bug so that running TB under valgrind for |make
> > > mozmill|
> > > becomes easy and many can try it. 
> > 
> > Yes, this should be very useful too.
> 
> Is this Bug 916505?
> Or do we still need a new bug?

My answer to the first half of the question (the second half is for Chris, aceman)


> Is this Bug 916505?

I am afraid the number is incorrect here. Did you mean a different bugzilla
index.

TIA
Flags: needinfo?(ishikawa)
Attached patch patch v2 (obsolete) — Splinter Review
Is this something you had in mind?
This does compile (and seems to pass xpcshell tests) but I have not a proper setup to test the cases as reported. Can anybody try the patch? Warning: dataloss quite possible.
Attachment #8355651 - Flags: feedback?(irving)
Comment on attachment 8355651 [details] [diff] [review]
patch v2

Review of attachment 8355651 [details] [diff] [review]:
-----------------------------------------------------------------

This looks OK for the second and third leaks in the report. It's still worth a careful code review to make sure we always correctly destroy m_flagState for the first (potential) leak.

While I was reviewing this, I formed a suspicion that this code doesn't correctly handle custom flags that overlap on the *tail* of the first flag - that is, in the code starting around https://hg.mozilla.org/comm-central/annotate/81acbc96e463/mailnews/imap/src/nsImapFlagAndUidState.cpp#l254, if we have a custom flag 'BOGUS' and we add a new flag 'BOG', we'll notice that the string match isn't at the *end* of the existing word, so we'll add the new flag to the list. However, if we have 'BOGUS' and we add 'US', the PL_strstr() will find US, we'll detect that it's at the end of the existing string, and we won't add the new flag.

I'm not sure this is an issue in the real world; this would only happen if an IMAP server actually returns two different message flags where one matches the tail of another. We might get away with this because most IMAP flags (as returned by the server) appear to start with '\'.

I can't get a trunk TB build to run reliably, so I haven't been able to test this.
Attachment #8355651 - Flags: feedback?(irving) → feedback+
Attached patch patch v3Splinter Review
OK, I tried to fix the substring flag problem. For the leak of m_flagState I would only suggest to add 'delete m_flagState' into nsImapProtocol::~nsImapProtocol() however I don't know if that plays well with the NS_ADDREF and similar refcounting done on m_flagState. Somebody should do a followup patch to rework that variable.
Attachment #8355651 - Attachment is obsolete: true
Attachment #8357364 - Flags: feedback?(irving)
Comment on attachment 8357364 [details] [diff] [review]
patch v3

Review of attachment 8357364 [details] [diff] [review]:
-----------------------------------------------------------------

It looks like we're doing the manual reference counting for nsImapProtocol::m_flagState correctly at https://hg.mozilla.org/comm-central/diff/4ccd97dc684c/mailnews/imap/src/nsImapProtocol.cpp and https://hg.mozilla.org/comm-central/diff/e4f4569d451a/mailnews/imap/src/nsImapProtocol.cpp; all the other places that store references use nsCOMPtr<>, so while I'd be happy to see that field switched to nsCOMPtr I don't think it's leaking, so I don't think we need to file a new bug on it.

The rest of the patch looks landable to me, so I'm giving an r+ instead of f+.
Attachment #8357364 - Flags: feedback?(irving) → review+
Comment on attachment 717162 [details] [diff] [review]
Don't leak in nsImapFlagAndUidState::ClearCustomFlags

Obsoleted by another approach in the new patches.
Attachment #717162 - Attachment is obsolete: true
How do we decide if there is anything else to do here?
Chris, are you able to run TB with valgrind to verify the patch solves some leaks?
Flags: needinfo?(chrisccoulson)
Keywords: checkin-needed
Whiteboard: [patchlove]
https://hg.mozilla.org/comm-central/rev/b7cb6717287c
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 29.0
Comment on attachment 8357364 [details] [diff] [review]
patch v3

>+    *customFlags = NS_strdup(value.get());
We also have ToNewCString, but these methods should probably be changed to use string parameter types.
* Include nsMsgUtils.h for kNotFound
* Use .Find in the easy case
* Use MsgFind in the hard case

The nsDependentCString change isn't really necessary but at one point I thought I could get rid of the nsCString and when I changed it back I automatically used the most efficient string type for the use case.
Attachment #8358711 - Flags: review?(irving)
Whiteboard: [external-api-bustage]
Comment on attachment 8358711 [details] [diff] [review]
External API fixes

Review of attachment 8358711 [details] [diff] [review]:
-----------------------------------------------------------------

I was tempted to go farther and ask that the external APIs of all these methods be changed to String classes, but that would lead to IDL changes and probably be more pain than it's worth.

Thanks for pointing out nsDependentCString, I'll try to remember that in future.

::: mailnews/imap/src/nsImapFlagAndUidState.cpp
@@ +258,5 @@
>             ((existingCustomFlagPos == 0) ||
>              (oldValue.CharAt(existingCustomFlagPos - 1) == ' ')))
>          return NS_OK;
>        // else, advance to next flag
> +      existingCustomFlagPos = MsgFind(oldValue, customFlagString, false, existingCustomFlagPos + customFlagLen);

Why not leave this as oldValue.Find() as in line 248?  That way you don't need to include nsMsgUtils.h.
Attachment #8358711 - Flags: review?(irving) → review+
(In reply to Irving Reid from comment #19)
> Why not leave this as oldValue.Find() as in line 248?  That way you don't
> need to include nsMsgUtils.h.

The arguments are different in the external API. If you're not using them that's OK, because the defaults work out the same either way, just that if you are using them, then you need to use MsgFind to support the external API (it's just a define for Find under the internal API).
Flags: needinfo?(chrisccoulson)
Summary: Leak in nsImapFlagAndUidState::ClearCustomFlags → Leaks MBs in a few minutes in nsImapFlagAndUidState::ClearCustomFlags
You need to log in before you can comment on or make changes to this bug.