Closed Bug 1432708 Opened 3 years ago Closed 3 years ago

DidComposite is not notified to a Pipeline that was removed

Categories

(Core :: Graphics: WebRender, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: sotaro, Assigned: nical)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(2 files, 4 obsolete files)

AsyncImagePipelineManager::Update() expects to receive DidComposite of the pipeline. It is necessary for managing pipeline resources lifetimes, since gecko needs to hold the resources when they are used for external image by WebRender. If it is not done correctly, LockExternalImage() could fail.

https://dxr.mozilla.org/mozilla-central/source/gfx/webrender_bindings/RendererOGL.cpp#21

When AsyncImagePipelineManager::Update() was implemented, DidComposite notification worked as expected. But from some point, DidComposite becomes not to be notified for the DidComposite that the pipeline was removed.
Then AsyncImagePipelineManager leaks AsyncImagePipelineManager::PipelineTexturesHolder now. It is easily confirmed by the following STR.

[1] Enable "Bookmarks toolbar"
[2] Add Youtube video to "Bookmarks toolbar".
[3] Load the video from "Bookmarks toolbar".

When I repeated [3], the leak of the PipelineTexturesHolder was confirmed easily. AsyncImagePipelineManager expected to receive DidComposite with the epoc that the pipeline was destroyed. But WebRender does not notify the DidComposite with the epoc.
Blocks: 1431862
WebRender seems to push only alive pipeline ids for DidComposite.
  https://github.com/servo/webrender/blob/master/webrender/src/frame.rs#L1086
With current gecko and webrender, even when there is an update to epoch to a pipeline, if the pipeline were removed before generating frame, the epoch was not notified to gecko as rendered_epochs.
There could be several ways to address the problem like the followings.

[1] Change WebRender as to notify removed pipelines as rendered_epochs.
[2] Wait one generating frame's completion before removing PipelineTexturesHolder.

[2] assumes that pipeline removal happens synchronously. It is correct with current WebRender. And I think that it is a reasonable assumption even with asynchronous scene building of future webrender.
This implements [2] in Comment 4 and confirmed that the problem was addressed with the patch.
Assignee: nobody → sotaro.ikeda.g
Comment on attachment 8945029 [details] [diff] [review]
patch - Wait one generating frame's completion before removing PipelineTexturesHolder

:nical, can you feedback to the patch?
Attachment #8945029 - Flags: review?(nical.bugzilla)
Attachment #8945029 - Flags: review?(nical.bugzilla) → feedback?(nical.bugzilla)
Comment on attachment 8945029 [details] [diff] [review]
patch - Wait one generating frame's completion before removing PipelineTexturesHolder

Review of attachment 8945029 [details] [diff] [review]:
-----------------------------------------------------------------

I am very sorry for the late feedback. I think that this solution should work, although it's quite complicated. It took me a little while wrap my head around it so if you want to go down this road please add a big comment that explains why we end up with two systems when deciding whether a texture is in use (one based on epochs and a fallback to counting frames if the pipeline was just removed).

I think that solution [1] of making sure webrender notifies the epoch even if the pipeline was removed would be simpler, and having pipeline removal eat the epoch notifications is error prone (this bug being a proof of it), so fixing that in webrender would be beneficial to all webrender users.

Maybe let me first see if I can make that work in webrender and if that turns out to be complicated for whatever reason we can always land this fix in gecko.

::: gfx/layers/wr/AsyncImagePipelineManager.h
@@ +181,2 @@
>    nsClassHashtable<nsUint64HashKey, AsyncImagePipeline> mAsyncImagePipelines;
> +  std::queue<RefPtr<PipelineTexturesHolder> > mDestroyingPipeline;

mDestroyedPipelines (plural) would make some other parts of the code easier to understand.
Attachment #8945029 - Flags: feedback?(nical.bugzilla)
No problem, if we could receive notifications of removing pipeline, it is better fix.
Assign to :nical, since he is working to address the problem.
Assignee: sotaro.ikeda.g → nical.bugzilla
Attachment #8945029 - Attachment is obsolete: true
Blocks: 1394806
Here is the gecko side of the implementation on top of https://github.com/servo/webrender/pull/2347

I haven't properly tested this yet as there were other breaking changes in webrender, I'll try again after the next webrender update.
Attachment #8947160 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8947160 [details] [diff] [review]
Remove async image pipelines when the renderer notifies a pipeline removed.

Looks good!
Attachment #8947160 - Flags: review?(sotaro.ikeda.g) → review+
For some reason the patch had the commit message of another patch. This is the same thing with the right commit message.
Attachment #8947160 - Attachment is obsolete: true
^
Flags: needinfo?(nical.bugzilla)
What seems to be happening in these failures is that we destroyed a pipeline id and created a new one immediately with the same id before getting the notification. As a result the new pipeline's texture holder is removed when receiving notice of the old pipeline's removal.
The fix is straightforward: since we store the epoch of the pipeline's removal and set it to Nothing() if the id is reused, just checked that the epoch is not equal to Nothing() before removing the holder.
Flags: needinfo?(nical.bugzilla)
Attachment #8949772 - Flags: review?(sotaro.ikeda.g)
Same patch with windows build fix.
Attachment #8949070 - Attachment is obsolete: true
We're getting a fair amount of reports with 10's of gigabytes of memory leaked. I know this isn't enabled by default, but there are plenty of enthusiasts enabling it.
Whiteboard: [MemShrink:P1]
Attachment #8949772 - Flags: review?(sotaro.ikeda.g) → review+
Oh! There's two patches now. My bad, I only applied one. Will try again.
Yeah it's looking better with both patches applied. Sorry for the noise!
I've added these two patches to the patchset on bug 1436058 since they need to land together with the WR update. They'll land as part of that bug and I'll close this one once it's in.
Landed with bug 1436058.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
This being fixed with a PR - is there a Gecko side reftest that we can add that catches this problem?
Regressions: 1569724
You need to log in before you can comment on or make changes to this bug.