Closed
Bug 1215265
Opened 9 years ago
Closed 9 years ago
CompositorChild leaks in e10s content process
Categories
(Core :: Graphics: Layers, defect, P3)
Core
Graphics: Layers
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: mccr8, Assigned: nical)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [gfx-noted])
Attachments
(5 files, 7 obsolete files)
40 bytes,
text/x-review-board-request
|
nical
:
review+
|
Details |
11.67 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
26.74 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
4.55 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
11.78 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
The previous bug for this (bug 1065536) ended up as kind of a mess, so I'm filing a new bug for this instead. CompositorChild does not properly shutdown in the content process, which results in leaks. There is only one per process, and it is expected to live as long as the process itself, so this is not an issue from a memory usage perspective, but it does make it harder to find other leaks.
Reporter | ||
Comment 1•9 years ago
|
||
This class also leaks a Transport, because it does not have the DeleteTask boilerplate in its dtor. (As in bug 1213407.)
Updated•9 years ago
|
tracking-e10s:
--- → +
I'm looking into this with rr. The leaked reference is in CompositorChild::sCompositor. CompositorChild::ActorDestroy is not called before the process shuts down.
For chrome processes, nsBaseWidget::Shutdown calls nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content processes, the PuppetWidget doesn't have a mCompositorChild. It's a global singleton; I don't know what's supposed to be responsible for cleaning it up on Shutdown. nical, do you?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 4•9 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3) > For chrome processes, nsBaseWidget::Shutdown calls > nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content > processes, the PuppetWidget doesn't have a mCompositorChild. It's a global > singleton; I don't know what's supposed to be responsible for cleaning it up > on Shutdown. nical, do you? Hm, I assumed the PuppetWidget would handle this since it inherits nsBaseWidget's code that does the work, but mCompositorChild is indeed null. It's not quite clear to me whether CompositorChild was originally intended to be a singleton or if it was hacked into being one. If we never want to have several CompositorChilds in the same process, I'd be game for making it REALLY a singleton, and take its destruction out of nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the other protocols. Otherwise we could make sure PuppetWidget has its mCompopositorChild member properly set like other widgets but then let's remove the singleton in the process! Considering the current state of the code, I have a slight preference for the first approach.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #4) > Hm, I assumed the PuppetWidget would handle this since it inherits > nsBaseWidget's code that does the work, but mCompositorChild is indeed null. > It's not quite clear to me whether CompositorChild was originally intended > to be a singleton or if it was hacked into being one. > If we never want to have several CompositorChilds in the same process, I'd > be game for making it REALLY a singleton, and take its destruction out of > nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the > other protocols. Otherwise we could make sure PuppetWidget has its > mCompopositorChild member properly set like other widgets but then let's > remove the singleton in the process! > Considering the current state of the code, I have a slight preference for > the first approach. The current situation is that in chrome processes there is one CompositorChild per window, but in content processes there is only one CompositorChild. Assuming we keep that, destroying the CompositorChild gfxPlatform::ShutdownLayersIPC looks like a good option.
Assignee | ||
Comment 6•9 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5) > The current situation is that in chrome processes there is one > CompositorChild per window, I think you meant CompositorParent? My understanding is that the chrome process also has 1 and only CompositorChild but I may be wrong. > but in content processes there is only one > CompositorChild. Assuming we keep that, destroying the CompositorChild > gfxPlatform::ShutdownLayersIPC looks like a good option. Ok
(In reply to Nicolas Silva [:nical] from comment #6) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5) > > The current situation is that in chrome processes there is one > > CompositorChild per window, > > I think you meant CompositorParent? My understanding is that the chrome > process also has 1 and only CompositorChild but I may be wrong. I think that's incorrect. I think in the chrome process there's one CompositorParent and one CompositorChild per window, and one CrossProcessCompositorParent per content process. Each content process gets one CompositorChild corresponding to its CrossProcessCompositorParent.
Bug 1215265. Don't leak CompositorChild in content processes. r=nical This makes the shutdown sequence of CrossProcessCompositorParent work very much like the shutdown sequence of CompositorParent: the child does SendStop(); the parent's RecvStop does some cleanup and then queues a DeferredDestroy task; DeferredDestroy releases the parent, destroying it; IPDL sends a message to destroy the child.
Attachment #8688935 -
Flags: review?(nical.bugzilla)
Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical
Attachment #8688936 -
Flags: review?(nical.bugzilla)
Reporter | ||
Comment 10•9 years ago
|
||
Please also remove the appendExpectedLeakCounts() chunk for bug 1215265 from leaklog.py: http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozleak/mozleak/leaklog.py#49 (I recently added object-count-based leak thresholds for content process leak checking, because byte-based thresholds let too much through.)
Reporter | ||
Comment 11•9 years ago
|
||
Also, thanks for fixing this leak! It makes it harder to notice real leaks.
Assignee | ||
Updated•9 years ago
|
Attachment #8688935 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 12•9 years ago
|
||
Comment on attachment 8688935 [details] MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical https://reviewboard.mozilla.org/r/25451/#review22977 ::: gfx/thebes/gfxPlatform.cpp:751 (Diff revision 1) > + CompositorChild::ShutdownLayersIPC(); It would be great if all CompositorChilds (in parent and child process) had the same shutdown logic, but we can look into that as a followup.
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8688936 [details] MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical https://reviewboard.mozilla.org/r/25453/#review22975 ::: gfx/layers/ipc/ImageBridgeParent.cpp:62 (Diff revision 1) > + MOZ_COUNT_CTOR(ImageBridgeParent); Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all PImageBridgeParents are ImageBridgeParents leaks will be reported twice.
Attachment #8688936 -
Flags: review?(nical.bugzilla) → review+
(In reply to Nicolas Silva [:nical] from comment #12) > It would be great if all CompositorChilds (in parent and child process) had > the same shutdown logic, but we can look into that as a followup. That would require either enforcing one PuppetWidget per process (which we definitely don't want to do) or using one CompositorChild per PuppetWidget (which would require significant changes, e.g. CompositorParent relies on being able to find a LayerTransactionParent given a child process ID).
(In reply to Nicolas Silva [:nical] from comment #13) > Comment on attachment 8688936 [details] > MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. > r=nical > > https://reviewboard.mozilla.org/r/25453/#review22975 > > ::: gfx/layers/ipc/ImageBridgeParent.cpp:62 > (Diff revision 1) > > + MOZ_COUNT_CTOR(ImageBridgeParent); > > Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all > PImageBridgeParents are ImageBridgeParents leaks will be reported twice. True, although we already do that in many cases. But I'll drop the ImageBridge changes.
Assignee | ||
Comment 16•9 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14) > That would require either enforcing one PuppetWidget per process (which we > definitely don't want to do) or using one CompositorChild per PuppetWidget > (which would require significant changes, e.g. CompositorParent relies on > being able to find a LayerTransactionParent given a child process ID). So IIUC, we currently have several (puppet) widgets which use the same CompositorChild (of which we have exactly one instance per process, child or parent alike). What I have in mind is a static CompositorChild::ShutDown method (or whatever its name) which cleans up the singleton CompsitorChild, or at least all of its IPC components, maybe leaving a refcounted CompositorChild in a state where it can't do anything but won't crash either if some widgets are still alive past that point (which would surprise me because it'd be super late in in the shutdown sequence). But I may be missing something.
(In reply to Nicolas Silva [:nical] from comment #16) > What I have in mind is a static CompositorChild::ShutDown > method (or whatever its name) which cleans up the singleton CompsitorChild, > or at least all of its IPC components, maybe leaving a refcounted > CompositorChild in a state where it can't do anything but won't crash either > if some widgets are still alive past that point (which would surprise me > because it'd be super late in in the shutdown sequence). But I may be > missing something. How would that be different from my patch here?
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17) > How would that be different from my patch here? Not much. In your patch, CompositorChild::ShutdownLayersIPC() is called only in child processes, and it would be nice if instead of having the widget destroy the CompsoitorChild in the parent process, we'd just unconditionally shut down the CompositorChild singleton in both the child and parent processes the same way from XPCOM's shutdown. The work should be the same so i'd rather not have to remember 2 paths to clean CompositorChild up. I don't mean to block landing your patches, I say this out loud so I'll remember and maybe put a patch up next time I have some time to spare on small cleanups.
The problem is that CompositorChild is not a singleton in the parent process, there's one per widget (because there's one CompositorParent were widget). Maybe we should fix that but it's quite a lot of work I think.
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b113a6167e94e97645286e7431b98d17bc2610c Bug 1215265. Don't leak CompositorChild in content processes. r=nical https://hg.mozilla.org/integration/mozilla-inbound/rev/0cbe4b80d443dd21899a600b7a9e0e8449ccdc4c Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical
Backed out for being the likely cause of mass bustage on inbound. If it wasn't these, it was one or more of the other bugs patched in that push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=162ded2e49c8 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/564f5491ccdd remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/acbac8b86619
Flags: needinfo?(roc)
Comment on attachment 8688935 [details] MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/1-2/
Comment on attachment 8688936 [details] MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/1-2/
Bug 1215265. Shut down ImageBridgeChild properly. r=nical
Attachment #8691204 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 25•9 years ago
|
||
Comment on attachment 8691204 [details] MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical https://reviewboard.mozilla.org/r/26107/#review23467 nice
Attachment #8691204 -
Flags: review?(nical.bugzilla) → review+
Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical
Attachment #8692657 -
Flags: review?(nical.bugzilla)
Without this latest patch, the first three patches cause shutdown hangs because a CompositorThreadHolder leaks.
Flags: needinfo?(roc)
Two problems showed up in the latest try push. We're leaking CompositorThreadHolder, on Windows only. I plan to add some logging to try to track that down. On B2G, content processes are crashing in CompositorChild::ActorDestroy: // Due to poor lifetime management of gralloc (and possibly shmems) we will // crash at some point in the future when we get destroyed due to abnormal // shutdown. Its better just to crash here. On desktop though, we have a chance // of recovering. if (aWhy == AbnormalShutdown) { NS_RUNTIMEABORT("ActorDestroy by IPC channel failure at CompositorChild"); }
We reach ActorDestroy(AbnormalShutdown) on desktop too. The basic problem seems to be that CompositorChild::sCompositor (used only in content processes) is assigned in CompositorChild::Create by forget()ing the reference we created. So there's an implicit 1 refcount held by sCompositor. But the only balancing Release is in CompositorChild::ActorDestroy. So during content process shutdown, we call SendStop to trigger destruction of the CompositorParent and then do CompositorChild::DeferredDestroy, but DeferredDestroy does not cause the CompositorChild to be deleted (since it adds as many refs as it releases). So CompositorChild stays alive until IPC notices that the CompositorParent's channel has been destroyed, at which point we get a channel error and CompositorChild::ActorDestroy with an abnormal reason. I think that DeferredDestroy should release the reference that ActorDestroy releases.
Comment on attachment 8688935 [details] MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/2-3/
Comment on attachment 8688936 [details] MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/2-3/
Comment on attachment 8691204 [details] MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26107/diff/1-2/
Comment on attachment 8692657 [details] MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26297/diff/1-2/
Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical
Attachment #8692833 -
Flags: review?(nical.bugzilla)
Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical With the new setup, RecvStop gets called and then it's possible for the CompositorChild to go away, causing ActorDestroy to get called. We need to make sure it's safe to call Destroy() twice.
Attachment #8692834 -
Flags: review?(nical.bugzilla)
Assignee | ||
Updated•9 years ago
|
Attachment #8692657 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8692657 [details] MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical https://reviewboard.mozilla.org/r/26297/#review23777 The patch title says something about mCompositorThreadHolder and mSelfRef but I don't see references to them in the actual patch, is this the complete patch?
Assignee | ||
Comment 41•9 years ago
|
||
Comment on attachment 8692657 [details] MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical https://reviewboard.mozilla.org/r/26297/#review23785
Attachment #8692657 -
Flags: review+
Assignee | ||
Comment 42•9 years ago
|
||
Comment on attachment 8692833 [details] MozReview Request: Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical https://reviewboard.mozilla.org/r/26335/#review23781
Attachment #8692833 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 43•9 years ago
|
||
Comment on attachment 8692834 [details] MozReview Request: Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical https://reviewboard.mozilla.org/r/26337/#review23783
Attachment #8692834 -
Flags: review?(nical.bugzilla) → review+
https://reviewboard.mozilla.org/r/26297/#review23777 CrossProcessCompositorParent::ActorDestroy calls Destroy, which we're changing to trigger DeferredDestroy.
Still got bugs on Windows and B2G: https://treeherder.mozilla.org/#/jobs?repo=try&revision=06d486170e47&group_state=expanded I can't afford to spend any more time on this for the forseeable future.
That try push included this patch for ImageBridgeChild. (As well as some logging code.)
Talked to billm and mccr8 about this. Bill thinks the core problem is that we need to call CompositorParent::Close, perhaps from a runnable dispatched by RecvStop.
Comment 48•9 years ago
|
||
Robert, bug 1232549 caused the mozilla-aurora tree to be closed. Do you have an eta to land that?
Flags: needinfo?(roc)
The work in this bug is harder than I thought and needs to be redone. After redoing it, which will take a while to get right, there will still be a risk of shutdown crashes and other things, so this won't be a good fit for Aurora.
Flags: needinfo?(roc)
Assignee | ||
Comment 50•9 years ago
|
||
I tried to take the PImageBridge part of this patch queue and apply it separately but things break because of media resources being still alive when shutting gfx (and the ImageBridge) down (see bug 1207220). We won't be able to do anything with ImageBridge's shutdown as long as all media resources aren't properly destroyed when gfx is shut down, so marking dependent although it's a bit of a stretch since this bug fixes the shutdown of both the CompositorChild and ImageBridgeChild protocols, and the former isn't blocked on media issues.
Depends on: 1207220
Reporter | ||
Comment 51•9 years ago
|
||
Feel free to deduplicate bug 1117203 and separate out fixing the two leaks. Any forward progress on this would be great. :)
Assignee | ||
Comment 52•9 years ago
|
||
I'll give this bug another shot as soon as bug 1207220 is resolved which should be soon.
Updated•9 years ago
|
Whiteboard: [gfx-noted]
Comment 53•9 years ago
|
||
In bug 1239504, I'm landing a hack to work around an LSan report that arises because the ImageBridgeChild thread is not shut down properly. When this thread shutdown is fixed, it would be good to back out this hack at the same time.
Comment 54•9 years ago
|
||
Andrew, do you want to pick this up from roc?
Flags: needinfo?(continuation)
Priority: -- → P3
Reporter | ||
Comment 55•9 years ago
|
||
This really needs somebody who understands layers to work on it. Maybe Nicolas could take a look per comment 52?
Flags: needinfo?(continuation) → needinfo?(nical.bugzilla)
Assignee | ||
Comment 56•9 years ago
|
||
I'll look into this (I already did since comment 52 but unfortunately bug 1207220's resolution wasn't enough to get the patch queue to work).
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 57•9 years ago
|
||
I have been looking into this, lately. In order to shut CompositorChild down properly I first want to shut the ImageBridgeChild down properly since the later depends on the former. We currently don't shut Imagebridge down for content processes doing it mostly works except on windows where a lot of tests end up timing out on shutdown. For most of them, the problem is that a CompositorThreadHolder is leaked, and CompositorParent::ShutDown spins the event loop until CompositorThreadHolder is destroyed. few things actually hold references to CompsoitorThreadHolders, and ImageBridgeParent is one of them. On most platforms, content processes' ImageBridge shutdown always ends with the ActorDestroy hook called with AbnormalShutdown as the reason, except on Windows. The fact that windows differs there is a bit worrying. We expect ImageBridge to run the ActorDestroy hook for every possible shutdown case at some point. Ideally with "Deletion" rather than "AbnormalShutdown". ImageBridgeParent holds a self reference that is cleared only if ActroDestroy is called, hence the leak, the the deadlock. Ironically, not shutting ImageBridge down means it is always still alive when the channel is destroyed, which causes the abnormal shutdown. AbnornmalShutdown sounds sad but at least it does get ActorDestroy called. I can't remember why we don't call __delete__ on top level protocols, I guess I'll figure that out quickly enough, but that should ensure ActorDestroy is called.
Assignee | ||
Comment 58•9 years ago
|
||
So, the reason we didn't Close() the channel (you can't __delete__ a top-level protocol, you close its channel instead) was because of some other bug that got resolved after a lot of the shutdown code was written, and closing the channel properly wasn't revisited afterwards. This patch replaces the PImageBridge::Stop message by closing the ImageBridge's channel. What was in ImageBridgeParent::RecvStop got moved into ActorDestroy and actor destroy is now properly called in any situation (be it a normal or an abnormal shutdown). We have the guarantee that CompositorParent::DeferredDestroy is called, mSelfRef never leaked and all is well... or so I hope and this try push will tell: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c18425ba56e
Attachment #8691204 -
Attachment is obsolete: true
Attachment #8734750 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 59•9 years ago
|
||
The try push doesn't look very good, unfortunately. A lot of the failures I've looked at are media stuff that outlived the gfx shutdown, and later released by the cycle collector. Hopefully this can be worked around throwing some cycle collections here and there.
Comment 60•9 years ago
|
||
Comment on attachment 8734750 [details] [diff] [review] Shut ImageBridge down properly The code seems ok, though it caused test failures.
Attachment #8734750 -
Flags: review?(sotaro.ikeda.g) → review+
Comment 61•9 years ago
|
||
:nical, do you have an idea why the following crash happned? Crash of OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be deleted during the mutex is locked. https://treeherder.mozilla.org/logviewer.html#?job_id=18592065&repo=try
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 62•9 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #61) > :nical, do you have an idea why the following crash happned? Crash of > OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be > deleted during the mutex is locked. Yes, in this case the HTMLMediaElement is destroyed too late in the shutdown sequence (during a cycle collection happening after gfx shutdown), so it calls into stuff that's already destroyed. We'll have to do another pass of finding resources that outlive the gfx shutdown and clean them up synchronously. This problem is not caused by this patch per say, it's just that not shutting down the ipdl protocols and threads for child processes was hiding the issue. It does block my patches from landing, though, so we'll have to do something about it. I'll try to land the shutdown patches preff'ed off so that we can more easily make progress in this area.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 63•9 years ago
|
||
This patch should trigger a cycle collection before shutting down gfx. I am not actually sure that it's the proper way to do this.
Attachment #8735902 -
Flags: review?(continuation)
Reporter | ||
Comment 64•9 years ago
|
||
Doesn't this mean we get crashes if something leaks media elements? That doesn't sound great. I think media people have had to deal with this issue for stream graph stuff. Maybe jesup or somebody has some ideas?
Reporter | ||
Comment 65•9 years ago
|
||
Comment on attachment 8735902 [details] [diff] [review] do a cycle collection before shutting down gfx Review of attachment 8735902 [details] [diff] [review]: ----------------------------------------------------------------- Doesn't this mean we get use-after-free crashes if something leaks media elements? That s
Attachment #8735902 -
Flags: review?(continuation) → review-
Reporter | ||
Comment 66•9 years ago
|
||
We also want to avoid shutdown GCs in non-DEBUG builds, though I think we never run ShutdownXPCOM in non-DEBUG builds, so that should be okay.
Reporter | ||
Comment 67•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #65) > Doesn't this mean we get use-after-free crashes if something leaks media > elements? That s Ignore that, I just meant the same block of text as I put in comment 64.
Assignee | ||
Comment 68•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #64) > Doesn't this mean we get crashes if something leaks media elements? That > doesn't sound great. I don't understand how the crashes you mention relate to the extra CC. Currently we get crashes if a gfx resource is used after the gfx module is shut down (destroying it too late counts as used). Is the r- because we don't want to spend time on extra CC during shutdown, or is there something else? I think that ShutdownXPCOM is run in non-debug builds.
Comment 69•9 years ago
|
||
That extra CC during shutdown seems like a hack and doesn't work if someone happens to leak object. Also, it doesn't guarantee everything is released after it. For example some code might on purpose keep objects alive until some other notifications happening after xpcom-shutdown. Or ( this is a leak sure ) the common case where non-weak ObserverService observers are kept alive until the ObserverService is shutdown and those listeners may keep other objects alive. In general code shouldn't rely on CC for any cleans up happening before/during shutdown. Sounds like media stuff needs to explicitly release some resources when xpcom-shutdown notification is fired.
Reporter | ||
Comment 70•9 years ago
|
||
Yes, what Olli said. Thanks for explaining it better than me. As I said, the media people have had to deal with similar issues for a while, so I'd talk to jesup about what they do. I don't know the details.
Assignee | ||
Comment 71•9 years ago
|
||
Similar to shutting ImageBridge down. This patch includes the stuff Roc wrote that I had reviewed and add code to close the channel is properly and make sure we don't leak CompositorThreadHolders. the calling CleanupRemoteResources was moved from the compositor to the widget because the Compositor some times outlives the widget and adding a strong ref to to the widget would create a cycle. looks good on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=692f220c11a1b17dabd7571afa99c990c9177439
Attachment #8737880 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 72•9 years ago
|
||
There's still a few things to figure out on windows specifically. The try push in the previous comment was with this patch applied.
Attachment #8688935 -
Attachment is obsolete: true
Attachment #8688936 -
Attachment is obsolete: true
Attachment #8692833 -
Attachment is obsolete: true
Attachment #8692834 -
Attachment is obsolete: true
Attachment #8693359 -
Attachment is obsolete: true
Attachment #8735902 -
Attachment is obsolete: true
Attachment #8737882 -
Flags: review?(sotaro.ikeda.g)
Comment 73•9 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #72) > There's still a few things to figure out on windows specifically. The try > push in the previous comment was with this patch applied. Is it going to be addressed in this bug?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 74•9 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #73) > Is it going to be addressed in this bug? Yes. Depending on the amount of work I might take a break from this and come back to the remaining windows bits later if no one else wants to have fun with shutdown issues.
Flags: needinfo?(nical.bugzilla)
Comment 75•9 years ago
|
||
Comment on attachment 8737880 [details] [diff] [review] Shut ShadowLayers down properly Review of attachment 8737880 [details] [diff] [review]: ----------------------------------------------------------------- I have one question about calling thread of CleanupRemoteDrawing(). The question is the following. ::: gfx/layers/ipc/CompositorBridgeChild.cpp @@ +33,5 @@ > > namespace mozilla { > namespace layers { > > +static StaticRefPtr<CompositorBridgeChild> sCompositor; //nit. It might better change "sCompositor" to like "sCompositorBridge". @@ +185,5 @@ > } > > +// static > +bool > +CompositorBridgeChild::ChildProcessHasCompositor() //nit. It might better to change "ChildProcessHasCompositor()" to like "ChildProcessHasCompositorBridge()" ::: widget/nsBaseWidget.cpp @@ +269,5 @@ > } > > void nsBaseWidget::DestroyCompositor() > { > + CleanupRemoteDrawing(); // Is it ok to call it here? It is on main thread. It seems to be expected that CleanupRemoteDrawing() is called on Compositor thread.
Comment 76•9 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #75) > > // Is it ok to call it here? It is on main thread. It seems to be expected > that CleanupRemoteDrawing() is called on Compositor thread. All another remote drawing functions are called on compositor thread.
Updated•9 years ago
|
Attachment #8737882 -
Flags: review?(sotaro.ikeda.g) → review+
Updated•9 years ago
|
Attachment #8737880 -
Flags: review?(sotaro.ikeda.g)
Updated•9 years ago
|
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 77•9 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #76) > All another remote drawing functions are called on compositor thread. You are right, code seems to expect CleanupRemoteResources to happen on the compositor thread. I'm digging some more and I think we should be good if we call it when receiving the WillClose message, I'm doing some more testing to be sure.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 78•9 years ago
|
||
Modifications to the patch "Shut Shadowlayers down properly" I figured it would be easier to review this in isolation than the whole thing again. calling CleanupRemoteDrawing from RecvWillClose fixed the use-after-free issue without having to move the call off the compositor thread.
Attachment #8738514 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 79•9 years ago
|
||
Comment on attachment 8737880 [details] [diff] [review] Shut ShadowLayers down properly Review of attachment 8737880 [details] [diff] [review]: ----------------------------------------------------------------- Re-setting the review flag since there is a followup fix addressing the review comments.
Attachment #8737880 -
Flags: review?(sotaro.ikeda.g)
Comment 80•9 years ago
|
||
Comment on attachment 8738514 [details] [diff] [review] Shut dhadowlayers down properly (adendum) Review of attachment 8738514 [details] [diff] [review]: ----------------------------------------------------------------- Looks good!
Attachment #8738514 -
Flags: review?(sotaro.ikeda.g) → review+
Updated•9 years ago
|
Attachment #8737880 -
Flags: review?(sotaro.ikeda.g) → review+
Comment 81•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f07fce7daf2 https://hg.mozilla.org/integration/mozilla-inbound/rev/577472ad5c38 https://hg.mozilla.org/integration/mozilla-inbound/rev/125f5b4294f9
Comment 82•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1f07fce7daf2 https://hg.mozilla.org/mozilla-central/rev/577472ad5c38 https://hg.mozilla.org/mozilla-central/rev/125f5b4294f9
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment 83•9 years ago
|
||
Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on file and allowed me to re-enable a few entire *directories* of tests! Is there any chance we can safely uplift this to Aurora as well so it's included in the next e10s experiments on Beta47?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 84•9 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #83) > Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on > file and allowed me to re-enable a few entire *directories* of tests! Is > there any chance we can safely uplift this to Aurora as well so it's > included in the next e10s experiments on Beta47? \o/ I think that it's a bit risky to uplift, though. We could try and be ready to back it out in case of issue, I'll see how cleanly it applies to aurora but I suspect it will require some fiddling.
Flags: needinfo?(nical.bugzilla)
Comment 85•9 years ago
|
||
This seems to have broken hardware accelerated rendering on Linux.
Reporter | ||
Updated•9 years ago
|
status-firefox44:
affected → ---
status-firefox46:
--- → wontfix
Comment 87•9 years ago
|
||
Uplifting this to 47 would also make uplifting bug 1264161 easier. In particular this patch changed some stuff in CompositorParent::ActorDestroy and without that change I'm not entirely clear on what impact bug 1264161 will have. Now that this has had some more bake time, do you feel more comfortable uplifting it?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 88•9 years ago
|
||
I am a bit worried that uplifting this will mean uplifting a lot of other shutdown stuff we've been doing in the last few months and I don't have a clear idea of what depends on what. I don't know how hard it would be to have a fix for bug 1264161 in 47 but it's most likely still less risky.
Flags: needinfo?(nical.bugzilla)
Comment 89•9 years ago
|
||
Ok, thanks.
No longer depends on: 1334655
You need to log in
before you can comment on or make changes to this bug.
Description
•