Closed
Bug 1339625
Opened 8 years ago
Closed 8 years ago
All the WR tests are orange on m-c
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: kats, Assigned: mattwoodrow)
References
Details
(Whiteboard: [gfx-noted][stockwell fixed])
Attachments
(3 files, 4 obsolete files)
8.07 KB,
text/plain
|
Details | |
3.26 KB,
patch
|
kats
:
review+
|
Details | Diff | Splinter Review |
4.01 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
Something in [1] caused all the QR tests to go orange on m-c. See [2] for the before/after. We should hold off on doing m-c -> graphics merges until we figure this out.
[1] https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=1060668405a9399774c205430de4a7001d3f27ac
[2] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=quantum&fromchange=c5a25b056d1e66fcaf67d680abea5a80d5944092&tochange=1060668405a9399774c205430de4a7001d3f27ac
Reporter | ||
Comment 2•8 years ago
|
||
Luckily we should be able to bisect this on inbound by using the magical "Add jobs" and adding some QR jobs to inbound pushes. I'm doing that now.
Reporter | ||
Comment 3•8 years ago
|
||
Hm, the add jobs might not be working :(
Reporter | ||
Comment 4•8 years ago
|
||
Try push on mattwoodrow's push shows the same orange: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a593780730b22d733deab2bdbcfcda6e688756e
So that narrows it down to https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e244431cfea1&tochange=a19ac5da6082
Reporter | ||
Comment 5•8 years ago
|
||
I kicked off another try push using cset bcda57aa0c3b just before matt's patches to confirm [1]. But it's almost certainly a result of Matt's patches. Matt, do you have time to investigate? To build locally you need to add `ac_add_options --enable-webrender` to your mozconfig. It should run with webrender enabled by default, and you can flip it off using gfx.webrender.enabled=false (startup pref).
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=07325befcd427474f01b4e0ae5e8e3294cb18842
Flags: needinfo?(matt.woodrow)
Reporter | ||
Comment 6•8 years ago
|
||
I can repro locally on Linux. Seems to be some sort of deadlock, I think. I have to head out but here's a stacktrace where the code is stuck.
Assignee | ||
Comment 7•8 years ago
|
||
WebRenderPaintedLayer was calling Updated on the content client while it was still between Begin/EndPaint, so the textures were already locked when we tried to take the lock on behalf of the compositor.
ClientPaintedLayer had the same issue (fixed in [1]), so I guess WebRenderPaintedLayer inherited the bug when it was copied over.
[1] https://hg.mozilla.org/integration/mozilla-inbound/rev/8255c2bb705a
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Attachment #8837376 -
Flags: review?(bugmail)
Reporter | ||
Comment 8•8 years ago
|
||
Comment on attachment 8837376 [details] [diff] [review]
unlock-textures-before-calling-updated
Review of attachment 8837376 [details] [diff] [review]:
-----------------------------------------------------------------
Seems to fix it for me locally. Painting seems *super* slow though, not sure if that's from your patches or something else.
::: gfx/layers/wr/WebRenderPaintedLayer.h
@@ +51,5 @@
> }
>
> Layer* GetLayer() override { return this; }
> void RenderLayer() override;
> + void PaintThebes(nsTArray<ReadbackProcessor::Update>* aReadbackUpdates)
needs trailing semicolon
Attachment #8837376 -
Flags: review?(bugmail) → review+
Assignee | ||
Comment 9•8 years ago
|
||
Compositor::EndFrame is where we release locks for textures, overriding it and not calling the base class impl probably means we never get to re-use textures (and will waste 500ms trying).
Untested, but I think this should fix it.
Attachment #8837383 -
Flags: review?(bugmail)
Reporter | ||
Comment 10•8 years ago
|
||
Comment on attachment 8837383 [details] [diff] [review]
dont-lose-end-frame
Review of attachment 8837383 [details] [diff] [review]:
-----------------------------------------------------------------
r+ if you think this is right, but I applied it locally and it didn't seem to fix the slowness.
Attachment #8837383 -
Flags: review?(bugmail) → review+
Reporter | ||
Comment 11•8 years ago
|
||
The first patch by itself also didn't fix all the failures. I assume that on try it's also running slow and getting killed due to timeout.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85ed34a3a079894fd625e25937d139e37e64b3f6
Reporter | ||
Comment 12•8 years ago
|
||
I started another try push with both patches applied, just in case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=182aee71381a41963886f21aafbec46ef9b78fae
Assignee | ||
Comment 13•8 years ago
|
||
I guess we're not calling EndFrame on our fake Compositor instance?
We probably need to do that after WR has finished compositing, but I don't really know where the right place is.
Reporter | ||
Comment 14•8 years ago
|
||
I don't think we are even using that class anymore. Bug 1339323 is removing it.
Comment 15•8 years ago
|
||
I also reproduce the problem. I am going to look into read lock problem.
Comment 16•8 years ago
|
||
With attachment 8837376 [details] [diff] [review], the problem was addressed for me.
Updated•8 years ago
|
Attachment #8837439 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 17•8 years ago
|
||
Comment on attachment 8837439 [details] [diff] [review]
patch - Enable double buffering on LAYERS_WR
Review of attachment 8837439 [details] [diff] [review]:
-----------------------------------------------------------------
Can you debug why it isn't working with single buffering?
We don't have a way to render to textures directly with WR afaik, so we must be uploading to another texture. That would give us 3 (or 4?) buffers if we're using ContentClientDoubleBuffered, which isn't ideal.
Attachment #8837439 -
Flags: review?(matt.woodrow) → review-
Comment 18•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
>
> Can you debug why it isn't working with single buffering?
If webrender is enabled, rendering to webrender is not optimised. CompositableHost::GetAsSurface() is called to get rendering surface. Composite() is not called and Compositor is not used.
> https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#293
When LayerManagerComposite with hw acceleration is used, BufferTextureHost::MaybeUpload() calls ReadUnlock() and unlock the TextureHost. MaybeUpload() is triggered by AutoLockCompositableHost. But that code path is not used if webrender is enabled. It needs Compositor, but Compositor is not used if webrender is enabled.
There is another route to call ReadUnlock(). TextureHost::ReleaseCompositableRef() also triggers ReadUnlock(). attachment 8837439 [details] [diff] [review] uses it. Second buffer drops ref of first buffer in ContentHost.
Comment 19•8 years ago
|
||
Comment 20•8 years ago
|
||
attachment 8837486 [details] [diff] [review] also address the problem. It also uses TextureHost::ReleaseCompositableRef() to trigger ReadUnlock().
Comment 21•8 years ago
|
||
attachment 8837486 [details] [diff] [review] caused crash on local debug build. But failed to attach gdb...
Assignee | ||
Comment 22•8 years ago
|
||
Comment on attachment 8837486 [details] [diff] [review]
patch - Force clear Texture
Review of attachment 8837486 [details] [diff] [review]:
-----------------------------------------------------------------
So, I think what happens is:
Our TextureHosts have the UPLOAD_IMMEDIATE flag, so we try upload them to a texture during the transaction. We don't have a compositor though, so MaybeUpload() fails (does it print a warning?), and doesn't hit the ReadUnlock path.
Normally, once we've uploaded the changes into the appropriate TextureSource, we know we won't read from the TextureHost again, so it's safe to ReadUnlock the contents.
I assume the same probably holds true here, once we've processed TOpDPPushExternalImageId, we shouldn't read from the textures within the CompositableHost again. We're sort out bypassing the design of TextureHost/TextureSource/CompositableHost though.
Given that, I think this (or just calling ReadUnlock directly) should be safe, but it's a bit scary.
Lets see what nical thinks.
Attachment #8837486 -
Flags: feedback?(nical.bugzilla)
Assignee | ||
Comment 23•8 years ago
|
||
Can't we just use a fake Compositor (or pull the functionality out into a shared abstract class TextureSourceProvider?) to allocate TextureSources and do the upload using the normal gecko way?
Then we can just give WR the texture ID, rather than the memory pointer?
Comment hidden (Intermittent Failures Robot) |
Comment 25•8 years ago
|
||
Fixed crash on debug build.
Attachment #8837486 -
Attachment is obsolete: true
Attachment #8837486 -
Flags: feedback?(nical.bugzilla)
Updated•8 years ago
|
Attachment #8837950 -
Flags: review?(nical.bugzilla)
Attachment #8837950 -
Flags: review?(matt.woodrow)
Reporter | ||
Comment 26•8 years ago
|
||
Comment on attachment 8837383 [details] [diff] [review]
dont-lose-end-frame
Obsoleting this patch as it doesn't help and the file has been removed in the graphics branch.
With the two other patches (Matt's first patch and Sotaro's follow-up) things seem fixed to me. I'd like to get these landed ASAP as this bug is blocking doing merges between graphics and m-c.
Attachment #8837383 -
Attachment is obsolete: true
Reporter | ||
Comment 27•8 years ago
|
||
Reporter | ||
Comment 28•8 years ago
|
||
Try push is green.
Assignee | ||
Updated•8 years ago
|
Attachment #8837950 -
Flags: review?(matt.woodrow)
Comment hidden (Intermittent Failures Robot) |
Comment 30•8 years ago
|
||
Cleaner patch, it uses WebRenderImageHost for WebRenderPaintedLayer, instead of ContentHostSingleBuffered. We use ContentHostSingleBuffered same way as WebRenderImageHost when wr is enabled.
WebRenderImageHost is wr specific CompositableHost implementation. It is cleaner to add wr specific workaround.
Attachment #8837950 -
Attachment is obsolete: true
Attachment #8837950 -
Flags: review?(nical.bugzilla)
Updated•8 years ago
|
Attachment #8838361 -
Flags: review?(nical.bugzilla)
Comment 31•8 years ago
|
||
Updated•8 years ago
|
Attachment #8838361 -
Flags: review?(nical.bugzilla) → review+
Updated•8 years ago
|
Attachment #8837439 -
Attachment is obsolete: true
Comment 32•8 years ago
|
||
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5761d8f5d23
Unlock textures before calling updated r=kats
https://hg.mozilla.org/integration/mozilla-inbound/rev/264c1c5ac2c5
Force clear Texture r=kats
Reporter | ||
Comment 33•8 years ago
|
||
(In reply to Pulsebot from comment #32)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/264c1c5ac2c5
> Force clear Texture r=kats
This should be r=nical
Comment 34•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c5761d8f5d23
https://hg.mozilla.org/mozilla-central/rev/264c1c5ac2c5
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 37•8 years ago
|
||
(In reply to Wes Kocher (:KWierso) from comment #34)
> https://hg.mozilla.org/mozilla-central/rev/c5761d8f5d23
> https://hg.mozilla.org/mozilla-central/rev/264c1c5ac2c5
It seems to break chrome rendering on graphic branch but not on m-c.
Updated•8 years ago
|
Whiteboard: [gfx-noted] → [gfx-noted][stockwell fixed]
You need to log in
before you can comment on or make changes to this bug.
Description
•