Closed
Bug 1182515
Opened 9 years ago
Closed 9 years ago
Intermittent leakcheck | default process: 11322 bytes leaked (CondVar, Dashboard, DataStorage, EventTokenBucket, HttpBaseChannel, ...)(BackstagePass, CondVar, Dashboard, DataStorage, EventTokenBucket, ...)
Categories
(Core :: Networking: WebSockets, defect)
Tracking
()
RESOLVED
FIXED
mozilla43
Tracking | Status | |
---|---|---|
firefox41 | --- | fixed |
firefox42 | --- | fixed |
firefox43 | --- | fixed |
firefox-esr38 | --- | unaffected |
People
(Reporter: RyanVM, Assigned: michal)
References
Details
(Keywords: intermittent-failure, memory-leak, Whiteboard: [Analysis in comment 60])
Attachments
(2 files, 1 obsolete file)
1.15 KB,
patch
|
Details | Diff | Splinter Review | |
1.48 KB,
patch
|
mcmanus
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
05:20:06 INFO - Leaked URLs: 05:20:06 INFO - ws://localhost/ 05:20:06 INFO - https://localhost/ 05:20:06 INFO - nsStringStats 05:20:06 INFO - => mAllocCount: 559074 05:20:06 INFO - => mReallocCount: 36939 05:20:06 INFO - => mFreeCount: 559053 -- LEAKED 21 !!! 05:20:06 INFO - => mShareCount: 1225579 05:20:06 INFO - => mAdoptCount: 26685 05:20:06 INFO - => mAdoptFreeCount: 26683 -- LEAKED 2 !!! 05:20:06 INFO - => Process ID: 1307, Thread ID: 140735090425024 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 CaptivePortalService (120 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 CondVar (40 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 Dashboard (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 DataStorage (328 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 2 EventTokenBucket (352 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 HttpBaseChannel (1056 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 LoadInfo (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 10 Mutex (320 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 4 ReentrantMonitor (160 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 Service (144 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 2 StringAdopt (2 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 ThirdPartyUtil (32 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 WebSocketChannel (1992 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsAuthURLParser (24 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsCookiePermission (64 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsCookieService (152 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsDNSService (168 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 4 nsDeque (384 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsEffectiveTLDService (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 2 nsHttpAuthCache::AppDataClearObserver (64 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsHttpChannel (1416 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsHttpConnectionMgr (312 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsHttpHandler (704 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsHttpRequestHead (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsIDNService (120 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsIOService (288 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsJSPrincipals (16 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsLocalFile (192 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 4 nsMainThreadPtrHolder<T> (96 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsNetworkLinkService (72 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsObserverService (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsPermissionManager (184 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsPrefBranch (128 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsRandomGenerator (40 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsSiteSecurityService (64 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsSocketTransportService (272 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 2 nsStandardURL (496 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 21 nsStringBuffer (168 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 11 nsTArray_base (88 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 2 nsThread (480 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsTimerImpl (96 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 1 nsUnicodeNormalizer (24 bytes) 05:20:06 INFO - TEST-INFO | leakcheck | default process: leaked 7 nsWeakReference (224 bytes) 05:20:06 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: 11322 bytes leaked (CaptivePortalService, CondVar, Dashboard, DataStorage, EventTokenBucket, ...)
Flags: needinfo?(valentin.gosu)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 3•9 years ago
|
||
I don't think this is a new leak, just that the captive portal happens to be first in the list. Clearing the ref when observing the shutdown notification should remove the captive portal service from the list.
Attachment #8632780 -
Flags: review?(mcmanus)
Updated•9 years ago
|
Assignee: nobody → valentin.gosu
Status: NEW → ASSIGNED
Updated•9 years ago
|
Flags: needinfo?(valentin.gosu)
Comment 4•9 years ago
|
||
Comment on attachment 8632780 [details] [diff] [review] clear reference to captive portal service when observing xpcom-shutdown Review of attachment 8632780 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/base/nsIOService.cpp @@ +1403,5 @@ > > if (mCaptivePortalService) { > static_cast<CaptivePortalService*>(mCaptivePortalService.get())->Stop(); > } > + mCaptivePortalService = nullptr; might as well move the null assignment into the if (!null) block as long as we have already done the check.
Attachment #8632780 -
Flags: review?(mcmanus) → review+
Comment 5•9 years ago
|
||
Updated•9 years ago
|
Attachment #8632780 -
Attachment is obsolete: true
Updated•9 years ago
|
Keywords: checkin-needed
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 9•9 years ago
|
||
Looks like the patch just altered the leak signature. Also, bug 1128662 looks possibly-related (similar signature and platform). Though this is obviously way more frequent. I'm also trying to narrow this down on inbound to get a better regression range.
See Also: → 1128662
Summary: Intermittent leakcheck | default process: 11322 bytes leaked (CaptivePortalService, CondVar, Dashboard, DataStorage, EventTokenBucket, ...) → Intermittent leakcheck | default process: 11322 bytes leaked (CondVar, Dashboard, DataStorage, EventTokenBucket, HttpBaseChannel, ...)(BackstagePass, CondVar, Dashboard, DataStorage, EventTokenBucket, ...)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 32•9 years ago
|
||
Ehsan, the pile o' retriggers below is pointing at bug 1180448 as the regressor here. Can you please take a look? https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=os%20x%2010.6%20debug%20mochitest-browser-chrome&tochange=272a2cde6bd1&fromchange=fa77180112f7
Flags: needinfo?(ehsan)
Keywords: leave-open
Comment 34•9 years ago
|
||
Yes, will do!
Comment 35•9 years ago
|
||
The most interesting observation here is that the thing that is leaking is the observer service. All of the other things are either nsIObservers, or are held alive by another one of these objects that is an nsIObserver. This kind of suggests that my uninit() method may not be running, for some reason. I'm testing some hypotheses on the try server, but in the mean time, Mike, have you ever seen something like this?
Flags: needinfo?(mconley)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 41•9 years ago
|
||
Progress report: I have three patches on the try server trying various different things: https://treeherder.mozilla.org/#/jobs?repo=try&revision=307fd412343e https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d63a733f0f2 https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a9aed771458 So far, out of 500+ runs of mochitest-bc1 on OSX 10.6 (that number is _not_ an exaggeration!!) I have managed to reproduce this exactly _zero_ times. That is not encouraging. But furthermore, it seems like this happens in other mochitest-bc chunks too, and on other platforms too. I have re-triggered bc2/3 runs on all of those try pushes but I am losing faith in ever being able to reproduce this on the try server. Looking at the objects that have been leaking more closely, nsCookieService and nsPermissionManager also hold the observer service alive, which makes it difficult to guess which object is the root of the leaked subgraph. I have posted fixes for those in the dependencies of this bug, and once they land, we should see how the signature of these leaks change. If one of these objects is the root of the leaked graph, the observer service and many of the existing leaked observers should disappear from the list once those patches land!
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 44•9 years ago
|
||
Note to self: there is an interesting leaked URL: "ws://localhost/" in <https://treeherder.mozilla.org/logviewer.html#?job_id=11776247&repo=mozilla-inbound> which seems to be used by only one test in the tree: http://mxr.mozilla.org/mozilla-central/source/browser/components/loop/test/mochitest/browser_loop_fxa_server.js Not sure how relevant this is yet...
Comment 45•9 years ago
|
||
This run <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-9d63a733f0f2/try-macosx64-debug/try_snowleopard-debug_test-mochitest-browser-chrome-2-bm108-tests1-macosx-build2197.txt.gz> from <https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d63a733f0f2> shows that we call my init() and uninit() function an equal number of times successfully when the leak happens, which almost definitely clears the name of bug 1180448 here, but this bug is now bothering me, so I would like to dig in some more. Joel, you have some magical tool that can retrigger test runs on inbound/etc in order to find what introduced an orange. Can you please take that tool and run it against this bug? Note that you probably want to trigger all mochitest-bc runs, not just bc1, as I was mistakenly doing before... Thanks!
Flags: needinfo?(jmaher)
Comment 46•9 years ago
|
||
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #45) > This run > <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla. > com-9d63a733f0f2/try-macosx64-debug/try_snowleopard-debug_test-mochitest- > browser-chrome-2-bm108-tests1-macosx-build2197.txt.gz> from > <https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d63a733f0f2> shows > that we call my init() and uninit() function an equal number of times > successfully when the leak happens, which almost definitely clears the name > of bug 1180448 here, but this bug is now bothering me, so I would like to > dig in some more. > Can confirm.
Flags: needinfo?(mconley)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 48•9 years ago
|
||
enjoy the retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=fb0c6275d5c2&tochange=5ff1786019f0&filter-searchStr=rev4%20macosx%20snow%20leopard%2010.6%20mozilla-inbound%20debug%20test%20mochitest-browser-chrome might take a few more hours before these get out of the pending state, but we will get there.
Flags: needinfo?(jmaher)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 60•9 years ago
|
||
OK, I finally figured out what is happening here. I noticed that the test log includes only two leaked URLs, one of which is ws://localhost. Some quick investigation showed that the one of the few browser-chrome tests in the tree that is using this URL is browser/components/uitour/test/browser_UITour_loop.js. Looking at the log of one of these intermittent failures, you may notice the following: 16:33:38 INFO - console.info: Loop: PushHandler: attempt to open websocket to PushServer: ws://localhost/ 16:33:38 INFO - [1172] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/slave/try-m64-d-00000000000000000000/build/src/netwerk/protocol/websocket/WebSocketChannel.cpp, line 2545 16:33:38 INFO - JavaScript error: resource:///modules/loop/MozLoopPushHandler.jsm, line 73: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIWebSocketChannel.asyncOpen] And, of the leaked objects is WebSocketChannel. The code in question here is <https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#2542>. There, the call to GenerateRandomBytes is failing which comes from here <https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/nsRandomGenerator.cpp#25> indicating that the object has been shut down. To test my theory, I pushed a patch to the try server to make this intermittent failure of GenerateRandomBytes a persistent one, and as you can see here <https://treeherder.mozilla.org/#/jobs?repo=try&revision=a88ce3d5d8f3>, that turns this intermittent failure into a nice 100% reproducible one on OSX bc1. I would expect further debugging as to why we end up leaking the WebSocket channel should be straightforward for someone who knows that code. As you can notice in the try link above, bc2 and bc3 test failures did not reproduce with this injected fault. I expect that we end up leaking the WebSocket channel object in a variety of runtime error conditions, and my guess is that we're hitting other instances of runtime failures on bc2 and bc3, but I have not investigated further. Patrick, do you mind forwarding this to the right people in the Necko team, please? Thanks!
Assignee: valentin.gosu → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(ehsan) → needinfo?(mcmanus)
Updated•9 years ago
|
Component: Networking → Networking: WebSockets
Updated•9 years ago
|
Whiteboard: [Analysis in comment 60]
Comment 61•9 years ago
|
||
michal can you work this out? Ehsan has put a lot of work into chasing this leak...
Flags: needinfo?(mcmanus) → needinfo?(michal.novotny)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 74•9 years ago
|
||
There is a reference cycle between WebSocketChannel and nsHttpChannel. It is broken in WebSocketChannel::StopSession or WebSocketChannel::OnStopRequest, but none of them is called when WebSocketChannel::AsyncOpen fails. https://treeherder.mozilla.org/#/jobs?repo=try&revision=091a6e8c7cb6
Attachment #8649460 -
Flags: review?(mcmanus)
Comment 75•9 years ago
|
||
Comment on attachment 8649460 [details] [diff] [review] fix Review of attachment 8649460 [details] [diff] [review]: ----------------------------------------------------------------- thanks! ::: netwerk/protocol/websocket/WebSocketChannel.cpp @@ +3279,5 @@ > + mWebSocketChannel->mHttpChannel = nullptr; > + } > + } > + > + WebSocketChannel* mWebSocketChannel; still using type *var around here..
Attachment #8649460 -
Flags: review?(mcmanus) → review+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 77•9 years ago
|
||
Can we please get this landed?
status-firefox41:
--- → affected
status-firefox42:
--- → affected
status-firefox43:
--- → affected
status-firefox-esr38:
--- → unaffected
Flags: needinfo?(michal.novotny)
Keywords: leave-open
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(michal.novotny)
Comment 79•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8f626f6be974
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Reporter | ||
Comment 80•9 years ago
|
||
Thanks for getting to the bottom of this! Can you please nominate this patch for Aurora/Beta approval too? :)
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 81•9 years ago
|
||
Comment on attachment 8649460 [details] [diff] [review] fix Approval Request Comment [Feature/regressing bug #]: unknown [User impact if declined]: memory leak [Describe test coverage new/current, TreeHerder]: should be covered by existing mochitests [Risks and why]: low, this is a small change that just cleans up member variables on failure [String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8649460 -
Flags: approval-mozilla-beta?
Attachment #8649460 -
Flags: approval-mozilla-aurora?
Comment on attachment 8649460 [details] [diff] [review] fix This is an intermittent failure and a mem leak fix. The patch seems safe enough to uplift to Aurora42 and Beta41.
Attachment #8649460 -
Flags: approval-mozilla-beta?
Attachment #8649460 -
Flags: approval-mozilla-beta+
Attachment #8649460 -
Flags: approval-mozilla-aurora?
Attachment #8649460 -
Flags: approval-mozilla-aurora+
Reporter | ||
Comment 83•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/cff5853462a0
Reporter | ||
Comment 84•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/cc5472880f4f
You need to log in
before you can comment on or make changes to this bug.
Description
•