Closed Bug 1339625 Opened 7 years ago Closed 7 years ago

All the WR tests are orange on m-c

Categories

(Core :: Graphics: WebRender, defect, P3)

Other Branch
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: kats, Assigned: mattwoodrow)

References

Details

(Whiteboard: [gfx-noted][stockwell fixed])

Attachments

(3 files, 4 obsolete files)

Luckily we should be able to bisect this on inbound by using the magical "Add jobs" and adding some QR jobs to inbound pushes. I'm doing that now.
Hm, the add jobs might not be working :(
I kicked off another try push using cset bcda57aa0c3b just before matt's patches to confirm [1]. But it's almost certainly a result of Matt's patches. Matt, do you have time to investigate? To build locally you need to add `ac_add_options --enable-webrender` to your mozconfig. It should run with webrender enabled by default, and you can flip it off using gfx.webrender.enabled=false (startup pref).

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=07325befcd427474f01b4e0ae5e8e3294cb18842
Flags: needinfo?(matt.woodrow)
Attached file Backtrace
I can repro locally on Linux. Seems to be some sort of deadlock, I think. I have to head out but here's a stacktrace where the code is stuck.
WebRenderPaintedLayer was calling Updated on the content client while it was still between Begin/EndPaint, so the textures were already locked when we tried to take the lock on behalf of the compositor.

ClientPaintedLayer had the same issue (fixed in [1]), so I guess WebRenderPaintedLayer inherited the bug when it was copied over.


[1] https://hg.mozilla.org/integration/mozilla-inbound/rev/8255c2bb705a
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Attachment #8837376 - Flags: review?(bugmail)
Comment on attachment 8837376 [details] [diff] [review]
unlock-textures-before-calling-updated

Review of attachment 8837376 [details] [diff] [review]:
-----------------------------------------------------------------

Seems to fix it for me locally. Painting seems *super* slow though, not sure if that's from your patches or something else.

::: gfx/layers/wr/WebRenderPaintedLayer.h
@@ +51,5 @@
>    }
>  
>    Layer* GetLayer() override { return this; }
>    void RenderLayer() override;
> +  void PaintThebes(nsTArray<ReadbackProcessor::Update>* aReadbackUpdates)

needs trailing semicolon
Attachment #8837376 - Flags: review?(bugmail) → review+
Attached patch dont-lose-end-frame (obsolete) — Splinter Review
Compositor::EndFrame is where we release locks for textures, overriding it and not calling the base class impl probably means we never get to re-use textures (and will waste 500ms trying).

Untested, but I think this should fix it.
Attachment #8837383 - Flags: review?(bugmail)
Comment on attachment 8837383 [details] [diff] [review]
dont-lose-end-frame

Review of attachment 8837383 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you think this is right, but I applied it locally and it didn't seem to fix the slowness.
Attachment #8837383 - Flags: review?(bugmail) → review+
The first patch by itself also didn't fix all the failures. I assume that on try it's also running slow and getting killed due to timeout.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=85ed34a3a079894fd625e25937d139e37e64b3f6
I guess we're not calling EndFrame on our fake Compositor instance?

We probably need to do that after WR has finished compositing, but I don't really know where the right place is.
I don't think we are even using that class anymore. Bug 1339323 is removing it.
I also reproduce the problem. I am going to look into read lock problem.
With attachment 8837376 [details] [diff] [review], the problem was addressed for me.
Attachment #8837439 - Flags: review?(matt.woodrow)
Comment on attachment 8837439 [details] [diff] [review]
patch - Enable double buffering on LAYERS_WR

Review of attachment 8837439 [details] [diff] [review]:
-----------------------------------------------------------------

Can you debug why it isn't working with single buffering?

We don't have a way to render to textures directly with WR afaik, so we must be uploading to another texture. That would give us 3 (or 4?) buffers if we're using ContentClientDoubleBuffered, which isn't ideal.
Attachment #8837439 - Flags: review?(matt.woodrow) → review-
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> 
> Can you debug why it isn't working with single buffering?

If webrender is enabled, rendering to webrender is not optimised. CompositableHost::GetAsSurface() is called to get rendering surface. Composite() is not called and Compositor is not used.
  > https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#293

When LayerManagerComposite with hw acceleration is used, BufferTextureHost::MaybeUpload() calls ReadUnlock() and unlock the TextureHost. MaybeUpload() is triggered by AutoLockCompositableHost. But that code path is not used if webrender is enabled. It needs Compositor, but Compositor is not used if webrender is enabled. 

There is another route to call ReadUnlock(). TextureHost::ReleaseCompositableRef() also triggers ReadUnlock(). attachment 8837439 [details] [diff] [review] uses it. Second buffer drops ref of first buffer in ContentHost.
attachment 8837486 [details] [diff] [review] also address the problem. It also uses TextureHost::ReleaseCompositableRef() to trigger ReadUnlock().
attachment 8837486 [details] [diff] [review] caused crash on local debug build. But failed to attach gdb...
Comment on attachment 8837486 [details] [diff] [review]
patch - Force clear Texture

Review of attachment 8837486 [details] [diff] [review]:
-----------------------------------------------------------------

So, I think what happens is:

Our TextureHosts have the UPLOAD_IMMEDIATE flag, so we try upload them to a texture during the transaction. We don't have a compositor though, so MaybeUpload() fails (does it print a warning?), and doesn't hit the ReadUnlock path.

Normally, once we've uploaded the changes into the appropriate TextureSource, we know we won't read from the TextureHost again, so it's safe to ReadUnlock the contents.

I assume the same probably holds true here, once we've processed TOpDPPushExternalImageId, we shouldn't read from the textures within the CompositableHost again. We're sort out bypassing the design of TextureHost/TextureSource/CompositableHost though.

Given that, I think this (or just calling ReadUnlock directly) should be safe, but it's a bit scary.

Lets see what nical thinks.
Attachment #8837486 - Flags: feedback?(nical.bugzilla)
Can't we just use a fake Compositor (or pull the functionality out into a shared abstract class TextureSourceProvider?) to allocate TextureSources and do the upload using the normal gecko way?

Then we can just give WR the texture ID, rather than the memory pointer?
Attached patch patch - Force clear Texture (obsolete) — Splinter Review
Fixed crash on debug build.
Attachment #8837486 - Attachment is obsolete: true
Attachment #8837486 - Flags: feedback?(nical.bugzilla)
Attachment #8837950 - Flags: review?(nical.bugzilla)
Attachment #8837950 - Flags: review?(matt.woodrow)
Comment on attachment 8837383 [details] [diff] [review]
dont-lose-end-frame

Obsoleting this patch as it doesn't help and the file has been removed in the graphics branch.

With the two other patches (Matt's first patch and Sotaro's follow-up) things seem fixed to me. I'd like to get these landed ASAP as this bug is blocking doing merges between graphics and m-c.
Attachment #8837383 - Attachment is obsolete: true
Try push is green.
Attachment #8837950 - Flags: review?(matt.woodrow)
Cleaner patch, it uses WebRenderImageHost for WebRenderPaintedLayer, instead of ContentHostSingleBuffered. We use ContentHostSingleBuffered same way as WebRenderImageHost when wr is enabled.

WebRenderImageHost is wr specific CompositableHost implementation. It is cleaner to add wr specific workaround.
Attachment #8837950 - Attachment is obsolete: true
Attachment #8837950 - Flags: review?(nical.bugzilla)
Attachment #8838361 - Flags: review?(nical.bugzilla)
Attachment #8838361 - Flags: review?(nical.bugzilla) → review+
Attachment #8837439 - Attachment is obsolete: true
(In reply to Pulsebot from comment #32)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/264c1c5ac2c5
> Force clear Texture r=kats

This should be r=nical
https://hg.mozilla.org/mozilla-central/rev/c5761d8f5d23
https://hg.mozilla.org/mozilla-central/rev/264c1c5ac2c5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
(In reply to Wes Kocher (:KWierso) from comment #34)
> https://hg.mozilla.org/mozilla-central/rev/c5761d8f5d23
> https://hg.mozilla.org/mozilla-central/rev/264c1c5ac2c5

It seems to break chrome rendering on graphic branch but not on m-c.
Depends on: 1340950
Whiteboard: [gfx-noted] → [gfx-noted][stockwell fixed]
You need to log in before you can comment on or make changes to this bug.