Closed Bug 1065536 Opened 5 years ago Closed 5 years ago

CompositorChild leaks in e10s content process

Categories

(Core :: Graphics: Layers, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
e10s + ---

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(4 files, 2 obsolete files)

Every e10s Mochitest leaks a few kb of what looks like Compositor related stuff:
1 AsyncTransactionTrackersHolder (40 bytes)
2 CompositorChild (904 bytes)
2 CondVar (48 bytes)
3 Mutex (60 bytes)
1 PCompositorChild (400 bytes)
1 PImageBridgeChild (404 bytes)
1 ReentrantMonitor (24 bytes)
2 RefCountedMonitor (96 bytes)
4 RefCountedTask (32 bytes)
2 WeakReference<MessageListener> (16 bytes)
2 ipc::MessageChannel (568 bytes)
6 nsTArray_base (24 bytes)
1 nsThread (140 bytes)

If this is guaranteed to be one per content process than the leak isn't too big of a deal, but for the purple of leak checking it would be nice to at least tear it down in debug builds.
tracking-e10s: --- → ?
With this and bug 1065117 fixed, it looks like the leak threshold for content processes on desktop can be reduced to 0.  This can be done by removing the entry for "tab" in options.leakThresholds in testing/mochitest/mochitest_options.py
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
The AsyncTransactionTrackersHolder seems to be an ImageBridgeChild.
In some light local testing, the ImageBridgeChild leak can be fixed simply by calling layers::ImageBridgeChild::ShutDown() unconditionally in gfxPlatform::ShutdownLayersIPC(), rather than only when we're in a default process.
Assignee: nobody → continuation
This patch does a few things.

- Remove the unneeded MOZ_COUNT_CTOR/DTOR for CompositorChild. The refcounting will take care of the leak checking for this class.

- The code in CompositorChild::Create and ::ActorDestroy seems to think there are two AddRefs being done on the CompositorChild when it is created, but this is not true. I removed a comment about a second AddRef that never happens, and I removed a second Release of sCompositor. With the rest of the patch here, we actually run this code and end up with a refcount underflow in the leak checker.

- For some reason, ActorDestroy isn't being called for one of the CompositorChildren. As far as I can see, there's no corresponding CompositorParent, which is weird. So I create a new shutdown method that destroys sCompositor if it is still around in the child process.
This patch adds a call to layers::ImageBridgeChild::ShutDown() in the child process. We're still calling it in the parent process, because if you don't, then you end up with a leak.  Why does the parent process have an ImageBridgeChild?  I have no idea.

Anyways, with these two patches applied, then when you just open and close the browser the only thing leaking in the child process is 48 bytes of StoreRef, which does not look particularly layers-related, so I filed a new bug, bug 1116261, to track those leaks.
Summary: CompositorChild leak in e10s content process → CompositorChild and ImageBridgeChild leaks in e10s content process
I split this up into separate parts.

try push: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2637c9739481

I also did a push on top of my other patches to get content process leak checking working and it didn't seem to make anything worse.
Attachment #8542277 - Attachment is obsolete: true
Attachment #8542279 - Attachment is obsolete: true
Attachment #8542713 - Flags: review?(nical.bugzilla)
This was added in
  http://hg.mozilla.org/mozilla-central/rev/7a05dad0a4a2

but I can't find any justification for it in the current code, and didn't try to figure out how this worked before bjacob's huge series of shutdown patches.
Attachment #8542715 - Flags: review?(nical.bugzilla)
It seems slightly concerning that we're not calling ActorDestroy otherwise (apparently?) but oh well.
Attachment #8542716 - Flags: review?(nical.bugzilla)
If we can assume that the process is either Default or Content then this could be hoisted up and always run, but it seemed safer to explicitly opt in.

If we fail to call this when we should, we'll just leak, not crash.

Also note that this code only runs in debug builds because of the quick exit in ContentChild.
Attachment #8542719 - Flags: review?(nical.bugzilla)
Attachment #8542713 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8542715 [details] [diff] [review]
part 2 - Remove extra Release of sCompositor.

Review of attachment 8542715 [details] [diff] [review]:
-----------------------------------------------------------------

The shutdown mess of gfx protocols doesn't fit in my head so there is a little bit of faith in this review. r+ if it works. One thing to keep in mind when dealing with touching top-level protocols is that they must be created and destroyed on the main thread of each process (for some reason related to nuwa).
Attachment #8542715 - Flags: review?(nical.bugzilla) → review+
Attachment #8542719 - Flags: review?(nical.bugzilla) → review+
Attachment #8542716 - Flags: review?(nical.bugzilla) → review+
Looks like Windows debug is very angry so I'll have to figure out what is going on there:
  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0cf6e7ee8d57
Surprisingly to me, it looks like that is caused by the ImageBridge change, so I'll just land the first three patches and split off ImageBridge to its own bug, when the tree opens again.
first 3: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=fc06770bdf50
just the last: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=43a1611643d8
Blocks: 1117203
Summary: CompositorChild and ImageBridgeChild leaks in e10s content process → CompositorChild leaks in e10s content process
Attachment #8542719 - Flags: checkin-
Given how many fits Compositor shutdown has given us in the past, CCing the sheriffs to be on the lookout for any new oranges that look related to this landing.
Bug 845982 is the reason that we never call ActorDestroy on CompositorChild. Maybe we should try to land some version of the patch in that bug.
Over in bug 1120331 comment 13, Ben said that invoking ActorDestroy() directly like this is a bad idea, so I backed out part 3.
  https://hg.mozilla.org/integration/mozilla-inbound/rev/1db2020eae48

I guess this should get backed out from Aurora, too.
No longer blocks: 1117203
I'll get part 3 backed out from Aurora assuming it sticks on inbound.
Flags: needinfo?(continuation)
Well, I guess I'll just leave it alone on Aurora, given that it should only affect debug builds and the tree seems stable enough...
Flags: needinfo?(continuation)
Backed out part 2 for being in the general vicinity of bug 1120331.

https://hg.mozilla.org/integration/mozilla-inbound/rev/989ff55c0404
Depends on: 1151711
You need to log in before you can comment on or make changes to this bug.