Closed Bug 1074416 Opened 5 years ago Closed 4 years ago
Nav History Result does a QI during cycle collection traversal, causing an ns XPCWrapped JS to leak
Every run of Mochitest bc1 and bc2 has a few small indirect leaks: Indirect leak of 720 byte(s) in 6 object(s) allocated from: #0 0x7f15e9a134f1 in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74 #1 0x7f15e23cbcbd in moz_xmalloc memory/mozalloc/mozalloc.cpp:52 #2 0x7f15d0a108cd in operator new obj-firefox/js/xpconnect/src/../../../dist/include/mozilla/mozalloc.h:208 #3 0x7f15d0a108cd in nsXPCWrappedJS::GetNewOrUsed(JS::Handle<JSObject*>, nsID const&, nsXPCWrappedJS**) js/xpconnect/src/XPCWrappedJS.cpp:373 #4 0x7f15d099643a in XPCConvert::JSObject2NativeInterface(void**, JS::Handle<JSObject*>, nsID const*, nsISupports*, tag_nsresult*) js/xpconnect/src/XPCConvert.cpp:939 #5 0x7f15d099417b in XPCConvert::JSData2Native(void*, JS::Handle<JS::Value>, nsXPTType const&, nsID const*, tag_nsresult*) js/xpconnect/src/XPCConvert.cpp:713 Indirect leak of 144 byte(s) in 6 object(s) allocated from: #0 0x7f15e9a134f1 in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74 #1 0x7f15e23cbcbd in moz_xmalloc memory/mozalloc/mozalloc.cpp:52 #2 0x7f15cf99deea in operator new obj-firefox/xpcom/reflect/xptcall/../../../dist/include/mozilla/mozalloc.h:208 #3 0x7f15cf99deea in NS_GetXPTCallStub xpcom/reflect/xptcall/xptcall.cpp:64 #4 0x7f15d0a11db4 in GetClass obj-firefox/js/xpconnect/src/../../../dist/include/nsXPTCUtils.h:28 #5 0x7f15d0a11db4 in nsXPCWrappedJS::nsXPCWrappedJS(JSContext*, JSObject*, nsXPCWrappedJSClass*, nsXPCWrappedJS*, tag_nsresult*) js/xpconnect/src/XPCWrappedJS.cpp:391 #6 0x7f15d0a10913 in nsXPCWrappedJS::GetNewOrUsed(JS::Handle<JSObject*>, nsID const&, nsXPCWrappedJS**) js/xpconnect/src/XPCWrappedJS.cpp:373 I have no idea what is holding this alive. It would be nice to fix so we could report all indirect leaks.
It looks like we're leaking nsXPTCStubBase and nsXPCWrappedJS. From local investigation, they are pointing to each other, but nothing else refers to them. I have no idea what the matter is. I guess I can try to narrow down what the tests are that trigger this. It could be an instance of bug 947049. One submystery is why the nsXPCWrappedJS is not being reported as a leak with the normal leak system. nsXPTCStubBase just needs MOZ_CTOR/DTOR added to it.
I can reproduce this leak in a non-ASan, debug build with DMD, so it isn't some opt issue.
One test this seems to be leaking in is browser/components/places/tests/browser/browser_475045.js
Oops, there's actually a number more tests in that chunk.
If I add MOZ_COUNT_CTOR/DTOR to nsXPCWrappedJS, the regular leak logging stuff shows that we leak one. The problem is what are we doing wrong with refcounting that we don't notice the leak. I tried to investigate with refcount logging, but that says that 6 separate objects are leaking. Why does that not agree with anything?
One way to reproduce the leak is to go to browser/components/places/tests/browser/browser.ini and comment out every test except browser_drag_bookmarks_on_toolbar.js (some other tests also leak a WFS), then run ./mach mochitest browser/components/places/tests/browser/ This leaks a single nsXPCWrappedJS, which you can see if you do the MOZ_COUNT_CTOR/DTOR logging. nsXPTCStubBase contains a pointer to the WJS, but that is weak. The refcount of the leaking WJS goes to zero, but we don't free it for some reason. I have no idea how that could happen. We don't seem to addref or release the WJS after its refcount drops to zero, based on some assertions I added. It doesn't look like any XPCWJS are getting booted out by the SnowWhiteKiller either. There's a bunch of weird stuff involving delegated refcounts and weak references going on here, so I wouldn't be too surprised if there's a bug somewhere but I haven't been able to figure that out. And I have no idea how any of that could somehow cause us to not delete the object while leaving the refcount at 0.
I think this is an actual leak, albeit a small uncommon one.
Component: XPConnect → Places
Product: Core → Toolkit
Summary: Fix or suppress the persistent leaks under nsXPCWrappedJS::GetNewOrUsed() → nsNavHistoryResult does a QI during cycle collection traversal, causing an nsXPCWrappedJS to leak
I finally figured out what is happening here. The Traversal for nsNavHistoryResult iterates over a nsMaybeWeakPtrArray and reports the elements to the cycle collector. However, as part of doing that, it does a QI: #01: nsXPTCStubBase::QueryInterface(nsID const&, void**) (xptcall.cpp:21, in XUL) #02: nsQueryInterfaceWithError::operator()(nsID const&, void**) const (nsCOMPtr.cpp:40, in XUL) #03: nsMaybeWeakPtr<nsINavHistoryResultObserver>::GetValue() const (nsCOMPtr.h:1058, in XUL) #04: nsNavHistoryResult::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) (AlreadyAddRefed.h:116, in XUL) #05: CCGraphBuilder::BuildGraph(js::SliceBudget&) (nsCycleCollector.cpp:2239, in XUL) This does a suspect, but because we're in the middle of graph traversal, it doesn't add it to the purple buffer. Somehow this must be the last reference to the cycle collector, so we never end up actually deleting it and it leaks. What seems to fix this is to make a new method that does a QI to the special nsCycleCollectionISupports interface that does no AddRef or Release. We should also see if we can assert if somebody does a suspect during CC, or maybe even crash, as this is going to be a source of difficult-to-fix bugs.
Doing an AddRef or Release of a cycle collected object during CC traversal can cause leaks. traverseResultObservers does this in two places. First, it accidentally copies the array that is passed in. Second, the GetValue() method that is being implicitly called in the line with |aObservers.ElementAt(i);| does a QI. This patch fixes the former problem by passing a reference. It fixes the latter problem by returning the raw underlying pointer held by the nsMaybeWeakPtrArray, which is what the CC wants anyways, because it does not care about weak referents. try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=992761a5cdc3
Attachment #8640881 - Flags: review?(bugs)
Comment on attachment 8640881 [details] [diff] [review] Don't AddRef or Release nav history result observers during CC traversal. What a horrible old code.
Attachment #8640881 - Flags: review?(bugs) → review+
Comment on attachment 8640881 [details] [diff] [review] Don't AddRef or Release nav history result observers during CC traversal. Approval Request Comment [Feature/regressing bug #]: very old [User impact if declined]: possible crashes, also minor leaks [Describe test coverage new/current, TreeHerder]: this code is tested on TreeHerder [Risks and why]: low, this patch removes some weird old code [String/UUID change made/needed]: none
Attachment #8640881 - Flags: approval-mozilla-aurora?
Comment on attachment 8640881 [details] [diff] [review] Don't AddRef or Release nav history result observers during CC traversal. Approved. Has been in m-c for a few days, seems safe to uplift to m-a.
Attachment #8640881 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Andrew, is there a way to get this fix tested and verified? Thanks!
(In reply to Ritu Kothari (:ritu) from comment #16) > Andrew, is there a way to get this fix tested and verified? Thanks! The easiest thing would just be to land bug 1189122 on Aurora, too. Then existing tests will fail unless this works. I'll mark that for approval.
You need to log in before you can comment on or make changes to this bug.