Closed Bug 706517 Opened 13 years ago Closed 13 years ago

Intermittent test leak of about 1-3KB (1 CondVar, 2 Mutex, some nsDNSAsyncRequest, 1 nsHTMLDNSPrefetch::nsListener, 1 nsHostRecord, ...)

Categories

(Core :: Networking, defect)

11 Branch
defect
Not set
minor

RESOLVED DUPLICATE of bug 463724

People

(Reporter: mbrubeck, Assigned: sworkman)

Details

(Keywords: intermittent-failure, memory-leak)

https://tbpl.mozilla.org/php/getParsedLog.php?id=7644429&tree=Mozilla-Inbound
Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitests-1/5 on 2011-11-29 15:50:45 PST for push 39346d506e54

TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3240 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of CondVar with size 32 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of Mutex with size 24 bytes each (48 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 29 instances of nsDNSAsyncRequest with size 88 bytes each (2552 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsHTMLDNSPrefetch::nsListener with size 24 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsHostRecord with size 128 bytes
TEST-INFO | automationutils.processLeakLog() | leaked 1 instance of nsHostResolver with size 232 bytes
TEST-INFO | automationutils.processLeakLog() | leaked 27 instances of nsStringBuffer with size 8 bytes each (216 bytes total)
TEST-INFO | automationutils.processLeakLog() | leaked 1 instance of nsTArray_base with size 8 bytes

and

https://tbpl.mozilla.org/php/getParsedLog.php?id=7644414&tree=Mozilla-Inbound
Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitests-5/5 on 2011-11-29 15:59:43 PST for push 4b085d906272

TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 760 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of CondVar with size 32 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of Mutex with size 24 bytes each (48 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3 instances of nsDNSAsyncRequest with size 88 bytes each (264 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsHTMLDNSPrefetch::nsListener with size 24 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsHostRecord with size 128 bytes
TEST-INFO | automationutils.processLeakLog() | leaked 1 instance of nsHostResolver with size 232 bytes
TEST-INFO | automationutils.processLeakLog() | leaked 3 instances of nsStringBuffer with size 8 bytes each (24 bytes total)
TEST-INFO | automationutils.processLeakLog() | leaked 1 instance of nsTArray_base with size 8 bytes
Moving this to necko as it seems the most likely culprit.  

We're detecting this leak on buildfarms--not sure how best to repro.

Heard on IRC:

mbrubeck: I think those leaks are your bug 706517, and I think both they and it are "we leak when dns in the buildfarm is kinda busted" since in the midst of them was a talos graphserver name resolution failure

<catlee-away> philor: why does dns in buildfarm cause leaks?

<philor> catlee-away: because we don't know how to give up? I'm no necko hacker, just a phenomenologist

<catlee-away> that smells of tests trying to access something remote

<catlee-away> which we all know is BAD BAD BAD

<philor> I don't doubt for a second that we have them again, but they could also be prefetching dns without accessing anything, I don't think we ban dns

<jduell> philor catlee-away: do we have a necko bug open for the dns leak?

<philor> jduell: you can have bug 706517 if you want it, I'm pretty sure nobody else will
Component: General → Networking
QA Contact: general → networking
Those two being:

https://tbpl.mozilla.org/php/getParsedLog.php?id=7895893&tree=Firefox
Rev4 MacOSX Snow Leopard 10.6 mozilla-central debug test reftest on 2011-12-12 17:43:50 PST for push 3f0c8604e2c1

https://tbpl.mozilla.org/php/getParsedLog.php?id=7895970&tree=Firefox
Rev3 Fedora 12 mozilla-central debug test mochitests-3/5 on 2011-12-12 17:54:31 PST for push 351fcbc12030

so not specific to an OS, or a test or two, or even a broad family of tests, with that reftest stuck in there.
OS: Mac OS X → All
Hardware: x86_64 → All
Summary: Intermittent OS X64 mochitest leak of about 1-3KB (1 CondVar, 2 Mutex, some nsDNSAsyncRequest, 1 nsHTMLDNSPrefetch::nsListener, 1 nsHostRecord, ...) → Intermittent test leak of about 1-3KB (1 CondVar, 2 Mutex, some nsDNSAsyncRequest, 1 nsHTMLDNSPrefetch::nsListener, 1 nsHostRecord, ...)
This pair are from the midst of the downtime which is supposed to be fixing the buildfarm's dns troubles, so they could wind up being your last two.

https://tbpl.mozilla.org/php/getParsedLog.php?id=7921363&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=7921369&tree=Mozilla-Inbound
Still during the downtime, snagged these off try because I like how they both were in mochitest-a11y, yet another suite heard from.

https://tbpl.mozilla.org/php/getParsedLog.php?id=7921510&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=7921405&tree=Try
Did some analysis and discussed with mcmanus offline.  I agree with his first thoughts on this (bug 707930 comment 2), based on looking at the code and verifying using XPCOM_MEM_LEAK_LOG.

Looks like the following is happening between the DNS requests being made and shutdown:

1. A thread is created using nsHostResolver::ThreadFunc.
-- It is given a strong ref to nsHostResolver.
-- It does a lookup, resulting in an nsHostRecord being dequeued from the High, Med or Low queue.

2. nsHostResolver::Shutdown is called.
-- All nsHostRecords on the pending queues have OnLookupComplete called on them, which ends with them and their callbacks (nsDNSAsyncRequest or nsDNSSyncRequest objects) being released.
-- The hash table is cleared.  This should only be the nsHostDBEnt objects which have a simple ptr to the nsHostRecord.

3. The thread is killed (this is my assumption)
-- Since it has an nsHostRecord that was dequeued but not released, the ptr's refcount is > 0.  Same with all the nsDNSAsync | SyncRequests attached to the record.
-- Since it also has a strong ref to nsHostResolver, the resolver's refcount is also > 0.
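
A rough sketch of the three steps above, to make the refcount reasoning concrete. This is not the necko code: Resolver, HostRecord, Request, WorkerState and LiveObjectsAtLeakCheck are all made-up stand-ins, std::shared_ptr stands in for XPCOM refcounting, and the "killed thread" is modelled as a state object whose refs are simply still held when the leak check runs.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <vector>

// Live-object counter, a stand-in for what a leak log would report.
static int gLive = 0;

struct Counted {
    Counted() { ++gLive; }
    ~Counted() { --gLive; }
    Counted(const Counted&) = delete;
    Counted& operator=(const Counted&) = delete;
};

struct Request : Counted {};   // hypothetical stand-in for nsDNSAsyncRequest
struct HostRecord : Counted {  // hypothetical stand-in for nsHostRecord
    std::vector<std::shared_ptr<Request>> callbacks;
};
struct Resolver : Counted {    // hypothetical stand-in for nsHostResolver
    std::deque<std::shared_ptr<HostRecord>> pending;
    // Step 2: everything still queued is completed and released.
    void Shutdown() { pending.clear(); }
};

// What a lookup thread holds while it is in the middle of a lookup.
struct WorkerState {
    std::shared_ptr<Resolver> resolver;  // strong ref given at thread creation
    std::shared_ptr<HostRecord> record;  // dequeued, not yet released
};

// Returns how many objects are still alive at the point a leak check would run.
int LiveObjectsAtLeakCheck(bool workerKilledMidLookup, int requestsPerRecord) {
    const int baseline = gLive;
    auto resolver = std::make_shared<Resolver>();
    for (int i = 0; i < 2; ++i) {
        auto rec = std::make_shared<HostRecord>();
        for (int j = 0; j < requestsPerRecord; ++j)
            rec->callbacks.push_back(std::make_shared<Request>());
        resolver->pending.push_back(rec);
    }

    // Step 1: the worker takes a strong ref to the resolver and dequeues a record.
    std::unique_ptr<WorkerState> worker(
        new WorkerState{resolver, resolver->pending.front()});
    resolver->pending.pop_front();

    // Step 2: shutdown releases only what is still on the pending queue.
    resolver->Shutdown();

    // Step 3: a worker that finishes normally drops its refs; a killed one never does.
    if (!workerKilledMidLookup) worker.reset();
    resolver.reset();  // the owner's own ref goes away at shutdown

    const int live = gLive - baseline;
    // The worker's refs are dropped on return only so this sketch itself
    // does not leak; in the killed-thread case they would never be dropped.
    return live;
}
```

With these stand-ins, LiveObjectsAtLeakCheck(true, 3) counts the resolver, the one dequeued record and its 3 requests, while LiveObjectsAtLeakCheck(false, 3) counts nothing. That matches the shape of the logs: one resolver, one host record, and a varying number of requests.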


I have also checked this with mem leak testing, and a page with 99999 links on it, resulting in the max number of link prefetches (512).

1. Load up said page.
2. Cut network access once page is loaded and prefetching has started.
3. Wait for 10 mins or so to allow all prefetching failures to time out.
4. Check mem leak logs.

I didn't see any mem leaks for these classes using this method.  So, it looks like the leak is not progressive when access to the DNS server is cut.  Instead, as Patrick said, it's a timing issue at shutdown.
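
For anyone wanting to try the same steps, here is a minimal sketch of one way to generate such a page. It is not the page I actually used: MakeLinkPage is a made-up helper, and the assumption that every link should point at a distinct hostname (so each is a separate prefetch candidate) and the "hostN.<suffix>" naming are both mine.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds an HTML page with `count` links, each to a distinct hostname.
// The "host<i>.<domainSuffix>" pattern is a placeholder, not a real set of hosts.
std::string MakeLinkPage(int count, const std::string& domainSuffix) {
    std::ostringstream html;
    html << "<!DOCTYPE html>\n"
         << "<html><head><title>DNS prefetch stress page</title></head><body>\n";
    for (int i = 0; i < count; ++i) {
        html << "<a href=\"http://host" << i << "." << domainSuffix
             << "/\">link " << i << "</a>\n";
    }
    html << "</body></html>\n";
    return html.str();
}
```

Writing the result of MakeLinkPage(99999, "example.test") to a file and loading it gives a page of the size described above; the .test suffix is reserved, so swap in whatever domain suits the test setup.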

Since it's low priority and doesn't seem to be progressive, I'm going to change importance to minor.
Assignee: nobody → sworkman
Severity: normal → minor
Just discovered bug 463724, which has a patch to avoid the leaks, except for an intentional mutex leak.  Duping this one to that since it's the same root issue.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Whiteboard: [orange]