Closed Bug 1074416 Opened 5 years ago Closed 4 years ago

nsNavHistoryResult does a QI during cycle collection traversal, causing an nsXPCWrappedJS to leak

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox41 --- fixed
firefox42 --- fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink])

Attachments

(1 file)

Every run of Mochitest bc1 and bc2 has a few small indirect leaks:

Indirect leak of 720 byte(s) in 6 object(s) allocated from:
    #0 0x7f15e9a134f1 in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74
    #1 0x7f15e23cbcbd in moz_xmalloc memory/mozalloc/mozalloc.cpp:52
    #2 0x7f15d0a108cd in operator new obj-firefox/js/xpconnect/src/../../../dist/include/mozilla/mozalloc.h:208
    #3 0x7f15d0a108cd in nsXPCWrappedJS::GetNewOrUsed(JS::Handle<JSObject*>, nsID const&, nsXPCWrappedJS**) js/xpconnect/src/XPCWrappedJS.cpp:373
    #4 0x7f15d099643a in XPCConvert::JSObject2NativeInterface(void**, JS::Handle<JSObject*>, nsID const*, nsISupports*, tag_nsresult*) js/xpconnect/src/XPCConvert.cpp:939
    #5 0x7f15d099417b in XPCConvert::JSData2Native(void*, JS::Handle<JS::Value>, nsXPTType const&, nsID const*, tag_nsresult*) js/xpconnect/src/XPCConvert.cpp:713

Indirect leak of 144 byte(s) in 6 object(s) allocated from:
    #0 0x7f15e9a134f1 in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74
    #1 0x7f15e23cbcbd in moz_xmalloc memory/mozalloc/mozalloc.cpp:52
    #2 0x7f15cf99deea in operator new obj-firefox/xpcom/reflect/xptcall/../../../dist/include/mozilla/mozalloc.h:208
    #3 0x7f15cf99deea in NS_GetXPTCallStub xpcom/reflect/xptcall/xptcall.cpp:64
    #4 0x7f15d0a11db4 in GetClass obj-firefox/js/xpconnect/src/../../../dist/include/nsXPTCUtils.h:28
    #5 0x7f15d0a11db4 in nsXPCWrappedJS::nsXPCWrappedJS(JSContext*, JSObject*, nsXPCWrappedJSClass*, nsXPCWrappedJS*, tag_nsresult*) js/xpconnect/src/XPCWrappedJS.cpp:391
    #6 0x7f15d0a10913 in nsXPCWrappedJS::GetNewOrUsed(JS::Handle<JSObject*>, nsID const&, nsXPCWrappedJS**) js/xpconnect/src/XPCWrappedJS.cpp:373

I have no idea what is holding this alive.  It would be nice to fix so we could report all indirect leaks.
It looks like we're leaking nsXPTCStubBase and nsXPCWrappedJS.  From local investigation, they are pointing to each other, but nothing else refers to them.  I have no idea what the matter is.  I guess I can try to narrow down what the tests are that trigger this.  It could be an instance of bug 947049.

One submystery is why the nsXPCWrappedJS is not being reported as a leak with the normal leak system.  nsXPTCStubBase just needs MOZ_CTOR/DTOR added to it.
I can reproduce this leak in a non-ASan, debug build with DMD, so it isn't some opt issue.
One test this seems to be leaking in is browser/components/places/tests/browser/browser_475045.js
Oops, there's actually a number more tests in that chunk.
If I add MOZ_COUNT_CTOR/DTOR to nsXPCWrappedJS, the regular leak logging stuff shows that we leak one.  The problem is what are we doing wrong with refcounting that we don't notice the leak.  I tried to investigate with refcount logging, but that says that 6 separate objects are leaking.  Why does that not agree with anything?
One way to reproduce the leak is to go to browser/components/places/tests/browser/browser.ini and comment out every test except browser_drag_bookmarks_on_toolbar.js (some other tests also leak a WFS), then run
  ./mach mochitest browser/components/places/tests/browser/

This leaks a single nsXPCWrappedJS, which you can see if you do the MOZ_COUNT_CTOR/DTOR logging.  nsXPTCStubBase contains a pointer to the WJS, but that is weak. The refcount of the leaking WJS goes to zero, but we don't free it for some reason. I have no idea how that could happen. We don't seem to addref or release the WJS after its refcount drops to zero, based on some assertions I added. It doesn't look like any XPCWJS are getting booted out by the SnowWhiteKiller either.

There's a bunch of weird stuff involving delegated refcounts and weak references going on here, so I wouldn't be too surprised if there's a bug somewhere but I haven't been able to figure that out. And I have no idea how any of that could somehow cause us to not delete the object while leaving the refcount at 0.
I think this is an actual leak, albeit a small uncommon one.
Whiteboard: [MemShrink]
Blocks: LSan
Component: XPConnect → Places
Product: Core → Toolkit
Summary: Fix or suppress the persistent leaks under nsXPCWrappedJS::GetNewOrUsed() → nsNavHistoryResult does a QI during cycle collection traversal, causing an nsXPCWrappedJS to leak
Assignee: nobody → continuation
I finally figured out what is happening here. The Traversal for nsNavHistoryResult iterates over a nsMaybeWeakPtrArray and reports the elements to the cycle collector. However, as part of doing that, it does a QI:

#01: nsXPTCStubBase::QueryInterface(nsID const&, void**) (xptcall.cpp:21, in XUL)
#02: nsQueryInterfaceWithError::operator()(nsID const&, void**) const (nsCOMPtr.cpp:40, in XUL)
#03: nsMaybeWeakPtr<nsINavHistoryResultObserver>::GetValue() const (nsCOMPtr.h:1058, in XUL)
#04: nsNavHistoryResult::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) (AlreadyAddRefed.h:116, in XUL)
#05: CCGraphBuilder::BuildGraph(js::SliceBudget&) (nsCycleCollector.cpp:2239, in XUL)

This does a suspect, but because we're in the middle of graph traversal, it doesn't add it to the purple buffer. Somehow this must be the last reference to the cycle collector, so we never end up actually deleting it and it leaks.

What seems to fix this is to make a new method that does a QI to the special nsCycleCollectionISupports interface that does no AddRef or Release. We should also see if we can assert if somebody does a suspect during CC, or maybe even crash, as this is going to be a source of difficult-to-fix bugs.
Blocks: 1189122
Doing an AddRef or Release of a cycle collected object during CC
traversal can cause leaks. traverseResultObservers does this in two
places. First, it accidentally copies the array that is passed
in. Second, the GetValue() method that is being implicitly called in
the line with |aObservers.ElementAt(i);| does a QI.

This patch fixes the former problem by passing a reference. It fixes
the latter problem by returning the raw underlying pointer held by the
nsMaybeWeakPtrArray, which is what the CC wants anyways, because it
does not care about weak referents.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=992761a5cdc3
Attachment #8640881 - Flags: review?(bugs)
Comment on attachment 8640881 [details] [diff] [review]
Don't AddRef or Release nav history result observers during CC traversal.

What a horrible old code.
Attachment #8640881 - Flags: review?(bugs) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7e990e527c4a
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Comment on attachment 8640881 [details] [diff] [review]
Don't AddRef or Release nav history result observers during CC traversal.

Approval Request Comment
[Feature/regressing bug #]: very old
[User impact if declined]: possible crashes, also minor leaks
[Describe test coverage new/current, TreeHerder]: this code is tested on TreeHerder
[Risks and why]: low, this patch removes some weird old code
[String/UUID change made/needed]: none
Attachment #8640881 - Flags: approval-mozilla-aurora?
Comment on attachment 8640881 [details] [diff] [review]
Don't AddRef or Release nav history result observers during CC traversal.

Approved. Has been in m-c for a few days, seems safe to uplift to m-a.
Attachment #8640881 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Andrew, is there a way to get this fix tested and verified? Thanks!
Flags: needinfo?(continuation)
(In reply to Ritu Kothari (:ritu) from comment #16)
> Andrew, is there a way to get this fix tested and verified? Thanks!

The easiest thing would just be to land bug 1189122 on Aurora, too. Then existing tests will fail unless this works. I'll mark that for approval.
Flags: needinfo?(continuation)
You need to log in before you can comment on or make changes to this bug.