Closed Bug 1364626 Opened 3 years ago Closed 3 years ago

Reftesting with WebRender happens out of order

Categories

(Core :: Graphics: WebRender, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jrmuizel, Assigned: jrmuizel)

References

Details

Attachments

(2 files, 1 obsolete file)

I have an RR recording of this happening.

Here's the basic timeline:
Child:SynchronizeForSnapshot()
 - UpdateLayerTree at 79222
 - Record at 79299-79442
 - SendAddBlobImage 79442
 - SendDPEnd 79442
Child:sendSyncMessage("reftest:InitCanvasWithSnapshot") at 79442-81679

Chrome:RecvInitCanvasWithSnapshot
        - SendDPSyncEnd at 79504
Chrome:HandleDPEnd at 79538
Chrome:set_display_list(4) at 79546/55504007
Chrome:DrawWindowish 79567
Chrome:Readback at 79603 - 81266

There doesn't seem to be any mechanism to ensure that the messages sent at 79442 (SendAddBlobImage and SendDPEnd) are received by WebRender thread before the SendDPSyncEnd that's sent from Chrome.
Blocks: 1362115
It's also not clear to me how the synchronization is supposed to work with Client/Composite layers. Perhaps Matt knows?
Flags: needinfo?(matt.woodrow)
Don't we have guaranteed ordering already?

We send a message from client main thread -> compositor thread (on some process), followed by a message client main thread -> chrome main thread -> compositor thread.

It seems like the first message should be in the compositor threads queue (or the IO threads queue) by the time we send the second message to chrome, and then it must be delivered first.
Flags: needinfo?(matt.woodrow)
Do you know what mechanism is used to provide that guarantee? Here's the picture of things as I understand it:

Child Process IO thread:
  Pipe to Compositor
  Pipe to Chrome Main thread:

Chrome Process IO thread:
  Pipe for Chrome Main thread
  Pipe for Compositor

There are writes to both "Pipe to Compositor" and "Pipe to Chrome Main thread" and then a poll() in the Chrome Process and then a dispatch to appropriate thread. I don't see why writes on one pipe should ordered with respect to writes on another pipe.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(dvander)
There's no guaranteed ordering between channels. I don't think Client/Composite layers relies on an ordering like this. Each PLayerTransaction can submit updates as needed and if they arrive in between a vsync we may composite either content or chrome one frame off from the other.
Flags: needinfo?(dvander)
There are also rare cases where we do need such an ordering, for example, the PLayerTransactionConstructor message must arrive before the NotifyChildCreated message. In this case we make the NotifyChildCreated message synchronous.
IPDL doesn't provide any ordering guarantees between different top-level protocols. In some restricted cases (such as cross-thread communication) it happens that messages are ordered as you might expect. But for cross-process communication, I think it's actually non-deterministic.
Flags: needinfo?(wmccloskey)
So does anyone have an idea of what mechanism is used to ensure the PLayerTransaction from the child arrives before the sync paint/readback from the parent?
Maybe there isn't one. You can try adding a sleep on the one code path and run a non-WR reftest to see what happens.
I actually have no idea then!

The comments for updateLayerTree claim that it is sync, but I can't see anything that actually implements that.

If this is racy, why would WebRender be hitting it when we manage to reftest (on all platforms) with LayerManagerComposite?
I was able to induce reftest failures in regular Client/CompositeLayers on Linux with the attached, somewhat convoluted patch. So it seems like we're mostly getting lucky.

Normally, the messages will be somewhat ordered by submission to the IO thread. However, they can end up reordered if the sendmsg would block and we postpone sending them. BlobRendering currently has large messages so I suspect they are more likely to get postponed and the sync message to the parent gets ahead.

I suspect a decent solution to this is to actually make updateLayerTree synchronous.
Assignee: nobody → jmuizelaar
This makes UpdateLayerTree synchronous enough to ensure that the layer transaction from the child reaches the compositor. Given the comment in http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/dom/interfaces/base/nsIDOMWindowUtils.idl#106 this seems to be the original intent of this function anyways. Without this, we can have a race between the child talking to the compositor and the child talking to the parent talking to the compositor.
Attachment #8867858 - Flags: review?(dvander)
Fixes the previous patch when we didn't have a CompositorBridge on the widget.
Attachment #8867858 - Attachment is obsolete: true
Attachment #8867858 - Flags: review?(dvander)
Attachment #8868206 - Flags: review?(dvander)
Attachment #8868206 - Flags: review?(dvander) → review+
Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/projects/graphics/rev/33adaa3e8031
Ensure our transaction arrives at the compositor before asking the parent to paint. r=dvander
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.