Closed Bug 1215265 Opened 4 years ago Closed 3 years ago

CompositorChild leaks in e10s content process

Categories

(Core :: Graphics: Layers, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---
firefox46 --- wontfix
firefox48 --- fixed

People

(Reporter: mccr8, Assigned: nical)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [gfx-noted])

Attachments

(5 files, 7 obsolete files)

40 bytes, text/x-review-board-request
nical
: review+
Details
11.67 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
26.74 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
4.55 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
11.78 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
The previous bug for this (bug 1065536) ended up as kind of a mess, so I'm filing a new bug for this instead. CompositorChild does not properly shutdown in the content process, which results in leaks. There is only one per process, and it is expected to live as long as the process itself, so this is not an issue from a memory usage perspective, but it does make it harder to find other leaks.
This class also leaks a Transport, because it does not have the DeleteTask boilerplate in its dtor. (As in bug 1213407.)
I'm looking into this with rr. The leaked reference is in CompositorChild::sCompositor. CompositorChild::ActorDestroy is not called before the process shuts down.
For chrome processes, nsBaseWidget::Shutdown calls nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content processes, the PuppetWidget doesn't have a mCompositorChild. It's a global singleton; I don't know what's supposed to be responsible for cleaning it up on Shutdown. nical, do you?
Flags: needinfo?(nical.bugzilla)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> For chrome processes, nsBaseWidget::Shutdown calls
> nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content
> processes, the PuppetWidget doesn't have a mCompositorChild. It's a global
> singleton; I don't know what's supposed to be responsible for cleaning it up
> on Shutdown. nical, do you?

Hm, I assumed the PuppetWidget would handle this since it inherits nsBaseWidget's code that does the work, but mCompositorChild is indeed null.
It's not quite clear to me whether CompositorChild was originally intended to be a singleton or if it was hacked into being one.
If we never want to have several CompositorChilds in the same process, I'd be game for making it REALLY a singleton, and take its destruction out of nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the other protocols. Otherwise we could make sure PuppetWidget has its mCompopositorChild member properly set like other widgets but then let's remove the singleton in the process!
Considering the current state of the code, I have a slight preference for the first approach.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #4)
> Hm, I assumed the PuppetWidget would handle this since it inherits
> nsBaseWidget's code that does the work, but mCompositorChild is indeed null.
> It's not quite clear to me whether CompositorChild was originally intended
> to be a singleton or if it was hacked into being one.
> If we never want to have several CompositorChilds in the same process, I'd
> be game for making it REALLY a singleton, and take its destruction out of
> nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the
> other protocols. Otherwise we could make sure PuppetWidget has its
> mCompopositorChild member properly set like other widgets but then let's
> remove the singleton in the process!
> Considering the current state of the code, I have a slight preference for
> the first approach.

The current situation is that in chrome processes there is one CompositorChild per window, but in content processes there is only one CompositorChild. Assuming we keep that, destroying the CompositorChild gfxPlatform::ShutdownLayersIPC looks like a good option.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> The current situation is that in chrome processes there is one
> CompositorChild per window,

I think you meant CompositorParent? My understanding is that the chrome process also has 1 and only CompositorChild but I may be wrong.

> but in content processes there is only one
> CompositorChild. Assuming we keep that, destroying the CompositorChild
> gfxPlatform::ShutdownLayersIPC looks like a good option.

Ok
(In reply to Nicolas Silva [:nical] from comment #6)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> > The current situation is that in chrome processes there is one
> > CompositorChild per window,
> 
> I think you meant CompositorParent? My understanding is that the chrome
> process also has 1 and only CompositorChild but I may be wrong.

I think that's incorrect. I think in the chrome process there's one CompositorParent and one CompositorChild per window, and one CrossProcessCompositorParent per content process. Each content process gets one CompositorChild corresponding to its CrossProcessCompositorParent.
Bug 1215265. Don't leak CompositorChild in content processes. r=nical

This makes the shutdown sequence of CrossProcessCompositorParent
work very much like the shutdown sequence of CompositorParent:
the child does SendStop(); the parent's RecvStop does some cleanup
and then queues a DeferredDestroy task; DeferredDestroy releases
the parent, destroying it; IPDL sends a message to destroy the child.
Attachment #8688935 - Flags: review?(nical.bugzilla)
Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical
Attachment #8688936 - Flags: review?(nical.bugzilla)
Please also remove the appendExpectedLeakCounts() chunk for bug 1215265 from leaklog.py:
  http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozleak/mozleak/leaklog.py#49
(I recently added object-count-based leak thresholds for content process leak checking, because byte-based thresholds let too much through.)
Also, thanks for fixing this leak! It makes it harder to notice real leaks.
Attachment #8688935 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

https://reviewboard.mozilla.org/r/25451/#review22977

::: gfx/thebes/gfxPlatform.cpp:751
(Diff revision 1)
> +        CompositorChild::ShutdownLayersIPC();

It would be great if all CompositorChilds (in parent and child process) had the same shutdown logic, but we can look into that as a followup.
Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

https://reviewboard.mozilla.org/r/25453/#review22975

::: gfx/layers/ipc/ImageBridgeParent.cpp:62
(Diff revision 1)
> +  MOZ_COUNT_CTOR(ImageBridgeParent);

Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all PImageBridgeParents are ImageBridgeParents leaks will be reported twice.
Attachment #8688936 - Flags: review?(nical.bugzilla) → review+
(In reply to Nicolas Silva [:nical] from comment #12)
> It would be great if all CompositorChilds (in parent and child process) had
> the same shutdown logic, but we can look into that as a followup.

That would require either enforcing one PuppetWidget per process (which we definitely don't want to do) or using one CompositorChild per PuppetWidget (which would require significant changes, e.g. CompositorParent relies on being able to find a LayerTransactionParent given a child process ID).
(In reply to Nicolas Silva [:nical] from comment #13)
> Comment on attachment 8688936 [details]
> MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs.
> r=nical
> 
> https://reviewboard.mozilla.org/r/25453/#review22975
> 
> ::: gfx/layers/ipc/ImageBridgeParent.cpp:62
> (Diff revision 1)
> > +  MOZ_COUNT_CTOR(ImageBridgeParent);
> 
> Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all
> PImageBridgeParents are ImageBridgeParents leaks will be reported twice.

True, although we already do that in many cases. But I'll drop the ImageBridge changes.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> That would require either enforcing one PuppetWidget per process (which we
> definitely don't want to do) or using one CompositorChild per PuppetWidget
> (which would require significant changes, e.g. CompositorParent relies on
> being able to find a LayerTransactionParent given a child process ID).

So IIUC, we currently have several (puppet) widgets which use the same CompositorChild (of which we have exactly one instance per process, child or parent alike). What I have in mind is a static CompositorChild::ShutDown method (or whatever its name) which cleans up the singleton CompsitorChild, or at least all of its IPC components, maybe leaving a refcounted CompositorChild in a state where it can't do anything but won't crash either if some widgets are still alive past that point (which would surprise me because it'd be super late in in the shutdown sequence). But I may be missing something.
(In reply to Nicolas Silva [:nical] from comment #16)
> What I have in mind is a static CompositorChild::ShutDown
> method (or whatever its name) which cleans up the singleton CompsitorChild,
> or at least all of its IPC components, maybe leaving a refcounted
> CompositorChild in a state where it can't do anything but won't crash either
> if some widgets are still alive past that point (which would surprise me
> because it'd be super late in in the shutdown sequence). But I may be
> missing something.

How would that be different from my patch here?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17)
> How would that be different from my patch here?

Not much. In your patch, CompositorChild::ShutdownLayersIPC() is called only in child processes, and it would be nice if instead of having the widget destroy the CompsoitorChild in the parent process, we'd just unconditionally shut down the CompositorChild singleton in both the child and parent processes the same way from XPCOM's shutdown. The work should be the same so i'd rather not have to remember 2 paths to clean CompositorChild up.
I don't mean to block landing your patches, I say this out loud so I'll remember and maybe put a patch up next time I have some time to spare on small cleanups.
The problem is that CompositorChild is not a singleton in the parent process, there's one per widget (because there's one CompositorParent were widget). Maybe we should fix that but it's quite a lot of work I think.
Backed out for being the likely cause of mass bustage on inbound. If it wasn't these, it was one or more of the other bugs patched in that push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=162ded2e49c8



remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/564f5491ccdd
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/acbac8b86619
Flags: needinfo?(roc)
Blocks: 1227347
Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/1-2/
Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/1-2/
Bug 1215265. Shut down ImageBridgeChild properly. r=nical
Attachment #8691204 - Flags: review?(nical.bugzilla)
Comment on attachment 8691204 [details]
MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical

https://reviewboard.mozilla.org/r/26107/#review23467

nice
Attachment #8691204 - Flags: review?(nical.bugzilla) → review+
Duplicate of this bug: 1117203
Duplicate of this bug: 1228064
Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical
Attachment #8692657 - Flags: review?(nical.bugzilla)
Without this latest patch, the first three patches cause shutdown hangs because a CompositorThreadHolder leaks.
Flags: needinfo?(roc)
Two problems showed up in the latest try push. We're leaking CompositorThreadHolder, on Windows only. I plan to add some logging to try to track that down.

On B2G, content processes are crashing in CompositorChild::ActorDestroy:
  // Due to poor lifetime management of gralloc (and possibly shmems) we will
  // crash at some point in the future when we get destroyed due to abnormal
  // shutdown. Its better just to crash here. On desktop though, we have a chance
  // of recovering.
  if (aWhy == AbnormalShutdown) {
    NS_RUNTIMEABORT("ActorDestroy by IPC channel failure at CompositorChild");
  }
We reach ActorDestroy(AbnormalShutdown) on desktop too.

The basic problem seems to be that CompositorChild::sCompositor (used only in content processes) is assigned in CompositorChild::Create by forget()ing the reference we created. So there's an implicit 1 refcount held by sCompositor. But the only balancing Release is in CompositorChild::ActorDestroy. So during content process shutdown, we call SendStop to trigger destruction of the CompositorParent and then do CompositorChild::DeferredDestroy, but DeferredDestroy does not cause the CompositorChild to be deleted (since it adds as many refs as it releases). So CompositorChild stays alive until IPC notices that the CompositorParent's channel has been destroyed, at which point we get a channel error and CompositorChild::ActorDestroy with an abnormal reason.

I think that DeferredDestroy should release the reference that ActorDestroy releases.
Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/2-3/
Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/2-3/
Comment on attachment 8691204 [details]
MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26107/diff/1-2/
Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26297/diff/1-2/
Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical
Attachment #8692833 - Flags: review?(nical.bugzilla)
Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical

With the new setup, RecvStop gets called and then it's possible for the
CompositorChild to go away, causing ActorDestroy to get called. We need
to make sure it's safe to call Destroy() twice.
Attachment #8692834 - Flags: review?(nical.bugzilla)
Attachment #8692657 - Flags: review?(nical.bugzilla)
Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

https://reviewboard.mozilla.org/r/26297/#review23777

The patch title says something about mCompositorThreadHolder and mSelfRef but I don't see references to them in the actual patch, is this the complete patch?
Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

https://reviewboard.mozilla.org/r/26297/#review23785
Attachment #8692657 - Flags: review+
Comment on attachment 8692833 [details]
MozReview Request: Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical

https://reviewboard.mozilla.org/r/26335/#review23781
Attachment #8692833 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8692834 [details]
MozReview Request: Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical

https://reviewboard.mozilla.org/r/26337/#review23783
Attachment #8692834 - Flags: review?(nical.bugzilla) → review+
https://reviewboard.mozilla.org/r/26297/#review23777

CrossProcessCompositorParent::ActorDestroy calls Destroy, which we're changing to trigger DeferredDestroy.
Still got bugs on Windows and B2G:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06d486170e47&group_state=expanded
I can't afford to spend any more time on this for the forseeable future.
Attached patch patch (obsolete) — Splinter Review
That try push included this patch for ImageBridgeChild. (As well as some logging code.)
Talked to billm and mccr8 about this. Bill thinks the core problem is that we need to call CompositorParent::Close, perhaps from a runnable dispatched by RecvStop.
See Also: → 1232549
Robert, bug 1232549 caused the mozilla-aurora tree to be closed. Do you have an eta to land that?
Flags: needinfo?(roc)
The work in this bug is harder than I thought and needs to be redone. After redoing it, which will take a while to get right, there will still be a risk of shutdown crashes and other things, so this won't be a good fit for Aurora.
Flags: needinfo?(roc)
I tried to take the PImageBridge part of this patch queue and apply it separately but things break because of media resources being still alive when shutting gfx (and the ImageBridge) down (see bug 1207220). We won't be able to do anything with ImageBridge's shutdown as long as all media resources aren't properly destroyed when gfx is shut down, so marking dependent although it's a bit of a stretch since this bug fixes the shutdown of both the CompositorChild and ImageBridgeChild protocols, and the former isn't blocked on media issues.
Depends on: 1207220
Feel free to deduplicate bug 1117203 and separate out fixing the two leaks. Any forward progress on this would be great. :)
I'll give this bug another shot as soon as bug 1207220 is resolved which should be soon.
Whiteboard: [gfx-noted]
Blocks: 1243369
Blocks: 1239504
In bug 1239504, I'm landing a hack to work around an LSan report that arises
because the ImageBridgeChild thread is not shut down properly.
When this thread shutdown is fixed, it would be good to back out this hack
at the same time.
Andrew, do you want to pick this up from roc?
Flags: needinfo?(continuation)
Priority: -- → P3
This really needs somebody who understands layers to work on it. Maybe Nicolas could take a look per comment 52?
Flags: needinfo?(continuation) → needinfo?(nical.bugzilla)
Blocks: 1252263
I'll look into this (I already did since comment 52 but unfortunately bug 1207220's resolution wasn't enough to get the patch queue to work).
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
I have been looking into this, lately. In order to shut CompositorChild down properly I first want to shut the ImageBridgeChild down properly since the later depends on the former. We currently don't shut Imagebridge down for content processes doing it mostly works except on windows where a lot of tests end up timing out on shutdown. For most of them, the problem is that a CompositorThreadHolder is leaked, and CompositorParent::ShutDown spins the event loop until CompositorThreadHolder is destroyed. few things actually hold references to CompsoitorThreadHolders, and ImageBridgeParent is one of them.

On most platforms, content processes' ImageBridge shutdown always ends with the ActorDestroy hook called with AbnormalShutdown as the reason, except on Windows. The fact that windows differs there is a bit worrying. We expect ImageBridge to run the ActorDestroy hook for every possible shutdown case at some point. Ideally with "Deletion" rather than "AbnormalShutdown".
ImageBridgeParent holds a self reference that is cleared only if ActroDestroy is called, hence the leak, the the deadlock.

Ironically, not shutting ImageBridge down means it is always still alive when the channel is destroyed, which causes the abnormal shutdown. AbnornmalShutdown sounds sad but at least it does get ActorDestroy called. I can't remember why we don't call __delete__ on top level protocols, I guess I'll figure that out quickly enough, but that should ensure ActorDestroy is called.
So, the reason we didn't Close() the channel (you can't __delete__ a top-level protocol, you close its channel instead) was because of some other bug that got resolved after a lot of the shutdown code was written, and closing the channel properly wasn't revisited afterwards.

This patch replaces the PImageBridge::Stop message by closing the ImageBridge's channel. What was in ImageBridgeParent::RecvStop got moved into ActorDestroy and actor destroy is now properly called in any situation (be it a normal or an abnormal shutdown). We have the guarantee that CompositorParent::DeferredDestroy is called, mSelfRef never leaked and all is well... or so I hope and this try push will tell: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c18425ba56e
Attachment #8691204 - Attachment is obsolete: true
Attachment #8734750 - Flags: review?(sotaro.ikeda.g)
The try push doesn't look very good, unfortunately. A lot of the failures I've looked at are media stuff that outlived the gfx shutdown, and later released by the cycle collector. Hopefully this can be worked around throwing some cycle collections here and there.
Comment on attachment 8734750 [details] [diff] [review]
Shut ImageBridge down properly

The code seems ok, though it caused test failures.
Attachment #8734750 - Flags: review?(sotaro.ikeda.g) → review+
:nical, do you have an idea why the following crash happned? Crash of OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be deleted during the mutex is locked.

  https://treeherder.mozilla.org/logviewer.html#?job_id=18592065&repo=try
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #61)
> :nical, do you have an idea why the following crash happned? Crash of
> OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be
> deleted during the mutex is locked.

Yes, in this case the HTMLMediaElement is destroyed too late in the shutdown sequence (during a cycle collection happening after gfx shutdown), so it calls into stuff that's already destroyed.
We'll have to do another pass of finding resources that outlive the gfx shutdown and clean them up synchronously. This problem is not caused by this patch per say, it's just that not shutting down the ipdl protocols and threads for child processes was hiding the issue. It does block my patches from landing, though, so we'll have to do something about it.

I'll try to land the shutdown patches preff'ed off so that we can more easily make progress in this area.
Flags: needinfo?(nical.bugzilla)
This patch should trigger a cycle collection before shutting down gfx. I am not actually sure that it's the proper way to do this.
Attachment #8735902 - Flags: review?(continuation)
Doesn't this mean we get crashes if something leaks media elements? That doesn't sound great. I think media people have had to deal with this issue for stream graph stuff. Maybe jesup or somebody has some ideas?
Comment on attachment 8735902 [details] [diff] [review]
do a cycle collection before shutting down gfx

Review of attachment 8735902 [details] [diff] [review]:
-----------------------------------------------------------------

Doesn't this mean we get use-after-free crashes if something leaks media elements? That s
Attachment #8735902 - Flags: review?(continuation) → review-
We also want to avoid shutdown GCs in non-DEBUG builds, though I think we never run ShutdownXPCOM in non-DEBUG builds, so that should be okay.
(In reply to Andrew McCreight [:mccr8] from comment #65)
> Doesn't this mean we get use-after-free crashes if something leaks media
> elements? That s

Ignore that, I just meant the same block of text as I put in comment 64.
See Also: → 1254829
(In reply to Andrew McCreight [:mccr8] from comment #64)
> Doesn't this mean we get crashes if something leaks media elements? That
> doesn't sound great.

I don't understand how the crashes you mention relate to the extra CC.
Currently we get crashes if a gfx resource is used after the gfx module is shut down (destroying it too late counts as used).

Is the r- because we don't want to spend time on extra CC during shutdown, or is there something else? I think that ShutdownXPCOM is run in non-debug builds.
That extra CC during shutdown seems like a hack and doesn't work if someone happens to leak object.
Also, it doesn't guarantee everything is released after it. For example some code might on purpose keep objects alive until some other notifications happening after xpcom-shutdown.
Or ( this is a leak sure ) the common case where non-weak ObserverService observers are kept alive until the ObserverService is shutdown and those listeners may keep other objects alive.

In general code shouldn't rely on CC for any cleans up happening before/during shutdown.

Sounds like media stuff needs to explicitly release some resources when xpcom-shutdown notification is fired.
Blocks: 1245574
Yes, what Olli said. Thanks for explaining it better than me. As I said, the media people have had to deal with similar issues for a while, so I'd talk to jesup about what they do. I don't know the details.
Blocks: 1242442
Similar to shutting ImageBridge down. This patch includes the stuff Roc wrote that I had reviewed and add code to close the channel is properly and make sure we don't leak CompositorThreadHolders.
the calling CleanupRemoteResources was moved from the compositor to the widget because the Compositor some times outlives the widget and adding a strong ref to to the widget would create a cycle.

looks good on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=692f220c11a1b17dabd7571afa99c990c9177439
Attachment #8737880 - Flags: review?(sotaro.ikeda.g)
There's still a few things to figure out on windows specifically. The try push in the previous comment was with this patch applied.
Attachment #8688935 - Attachment is obsolete: true
Attachment #8688936 - Attachment is obsolete: true
Attachment #8692833 - Attachment is obsolete: true
Attachment #8692834 - Attachment is obsolete: true
Attachment #8693359 - Attachment is obsolete: true
Attachment #8735902 - Attachment is obsolete: true
Attachment #8737882 - Flags: review?(sotaro.ikeda.g)
(In reply to Nicolas Silva [:nical] from comment #72)
> There's still a few things to figure out on windows specifically. The try
> push in the previous comment was with this patch applied.

Is it going to be addressed in this bug?
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #73)
> Is it going to be addressed in this bug?

Yes. Depending on the amount of work I might take a break from this and come back to the remaining windows bits later if no one else wants to have fun with shutdown issues.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8737880 [details] [diff] [review]
Shut ShadowLayers down properly

Review of attachment 8737880 [details] [diff] [review]:
-----------------------------------------------------------------

I have one question about calling thread of CleanupRemoteDrawing(). The question is the following.

::: gfx/layers/ipc/CompositorBridgeChild.cpp
@@ +33,5 @@
>  
>  namespace mozilla {
>  namespace layers {
>  
> +static StaticRefPtr<CompositorBridgeChild> sCompositor;

//nit. It might better change "sCompositor" to like "sCompositorBridge".

@@ +185,5 @@
>  }
>  
> +// static
> +bool
> +CompositorBridgeChild::ChildProcessHasCompositor()

//nit. It might better to change "ChildProcessHasCompositor()" to like "ChildProcessHasCompositorBridge()"

::: widget/nsBaseWidget.cpp
@@ +269,5 @@
>  }
>  
>  void nsBaseWidget::DestroyCompositor()
>  {
> +  CleanupRemoteDrawing();

// Is it ok to call it here? It is on main thread. It seems to be expected that CleanupRemoteDrawing() is called on Compositor thread.
(In reply to Sotaro Ikeda [:sotaro] from comment #75)
> 
> // Is it ok to call it here? It is on main thread. It seems to be expected
> that CleanupRemoteDrawing() is called on Compositor thread.

All another remote drawing functions are called on compositor thread.
Attachment #8737882 - Flags: review?(sotaro.ikeda.g) → review+
Attachment #8737880 - Flags: review?(sotaro.ikeda.g)
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #76)
> All another remote drawing functions are called on compositor thread.

You are right, code seems to expect CleanupRemoteResources to happen on the compositor thread. I'm digging some more and I think we should be good if we call it when receiving the WillClose message, I'm doing some more testing to be sure.
Flags: needinfo?(nical.bugzilla)
Modifications to the patch "Shut Shadowlayers down properly" I figured it would be easier to review this in isolation than the whole thing again. calling CleanupRemoteDrawing from RecvWillClose fixed the use-after-free issue without having to move the call off the compositor thread.
Attachment #8738514 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8737880 [details] [diff] [review]
Shut ShadowLayers down properly

Review of attachment 8737880 [details] [diff] [review]:
-----------------------------------------------------------------

Re-setting the review flag since there is a followup fix addressing the review comments.
Attachment #8737880 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8738514 [details] [diff] [review]
Shut dhadowlayers down properly (adendum)

Review of attachment 8738514 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8738514 - Flags: review?(sotaro.ikeda.g) → review+
Attachment #8737880 - Flags: review?(sotaro.ikeda.g) → review+
Blocks: 1262886
Blocks: 1262898
Depends on: 1246529
See Also: 1254829
Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on file and allowed me to re-enable a few entire *directories* of tests! Is there any chance we can safely uplift this to Aurora as well so it's included in the next e10s experiments on Beta47?
Flags: needinfo?(nical.bugzilla)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #83)
> Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on
> file and allowed me to re-enable a few entire *directories* of tests! Is
> there any chance we can safely uplift this to Aurora as well so it's
> included in the next e10s experiments on Beta47?

\o/

I think that it's a bit risky to uplift, though. We could try and be ready to back it out in case of issue, I'll see how cleanly it applies to aurora but I suspect it will require some fiddling.
Flags: needinfo?(nical.bugzilla)
Depends on: 1263515
This seems to have broken hardware accelerated rendering on Linux.
Blocks: 1264293
Duplicate of this bug: 1122045
Uplifting this to 47 would also make uplifting bug 1264161 easier. In particular this patch changed some stuff in CompositorParent::ActorDestroy and without that change I'm not entirely clear on what impact bug 1264161 will have. Now that this has had some more bake time, do you feel more comfortable uplifting it?
Flags: needinfo?(nical.bugzilla)
I am a bit worried that uplifting this will mean uplifting a lot of other shutdown stuff we've been doing in the last few months and I don't have a clear idea of what depends on what. I don't know how hard it would be to have a fix for bug 1264161 in 47 but it's most likely still less risky.
Flags: needinfo?(nical.bugzilla)
Depends on: 1334655
You need to log in before you can comment on or make changes to this bug.