Closed Bug 1247935 Opened 4 years ago Closed 4 years ago

Tearing during composition in latest Nightly

Categories

(Core :: Graphics, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: marco, Assigned: lsalzman)

References

Details

(Keywords: perf, regression, Whiteboard: gfx-noted)

Attachments

(5 files, 1 obsolete file)

The scrolling experience is extremely painful. It's very noticeable, on any page.
mozregression points me to bug 1241832.
Blocks: 1241832
Indeed, setting `gfx.xrender.enabled` to true fixes this bug.
I think bug 753228 is related.
See Also: → 753228
Fallout from disabling xrender.
Flags: needinfo?(lsalzman)
Keywords: perf, regression
Whiteboard: gfx-noted
Flags: needinfo?(lsalzman)
This bug report seems somewhat subjective. I don't see anything that I wouldn't expect to see in profile.

Could you explain more about what the problem is and how to reproduce it?
Flags: needinfo?(mcastelluccio)
Lee, I guess what I'm seeing is similar to what BenWa was describing here: http://www.hackermusings.com/2012/05/firefoxs-graphics-performance-on-x11/#comment-30.

I don't know how to describe it. Firefox fails to keep up with the scrolling, it draws with a small delay, and this causes very noticeable glitches.
It happens on almost any page when I scroll.

Here's the Graphics section from my about:support page:
Adapter Description: Intel Open Source Technology Center -- Mesa DRI Intel(R) Haswell Mobile
Asynchronous Pan/Zoom: wheel input enabled
Device ID: Mesa DRI Intel(R) Haswell Mobile
Driver Version: 3.0 Mesa 11.0.2
GPU Accelerated Windows: 0/2 Basic (OMTC)
Supports Hardware H264 Decoding: No
Vendor ID: Intel Open Source Technology Center
WebGL Renderer: Intel Open Source Technology Center -- Mesa DRI Intel(R) Haswell Mobile
windowLayerManagerRemote: true
AzureCanvasBackend: cairo
AzureContentBackend: cairo
AzureFallbackCanvasBackend: none
AzureSkiaAccelerated: 0
CairoUseXRender: 0

I have taken a couple of screencasts, but I don't know if it's easy to see the problem from them. I'll attach them to the bug.
Flags: needinfo?(mcastelluccio)
The only real thing I noticed in the profile on a second look is that about 13% of the compositor time was spent doing redundant copies that XShm should have been taking care of already. I just wrote up a fix for that in bug 1019856. Though shaving 13% off our compositing time for almost free is nothing to be ashamed of. :)
See Also: → 1019856
Attached video VID_20160219_120323.mp4
Still seeing the problem with the build from the other bug.
This video is clearer, can you see the blur when I scroll up on second 2?

My machine should be pretty common, if you are in a Mozilla office. It's a ThinkPad W541.
About 25% of non-idle compositor time is spent waiting in XSync() from nsShmImage::Put().  That offers rooms for improvement, but I don't know whether it is the cause of the issue here.
This doesn't appear to be related to checkerboarding or APZ. From the videos it looks like a tearing issue during composition.
Summary: Checkerboarding regression in latest Nightly → Tearing during composition in latest Nightly
Attached image Screenshot from video
Here is a screenshot from the video that shows how part of the screen is updated and other part hasn't yet updated.
Here's a better one taken from the screencast, so hopefully no camera artifacts. There are two instances of tearing - one in the comment with the needinfo near the top of the screen, one just above the "Vendor ID" text in the bottom field.
Talking with #dri-devel people, it seems that XShmPutImage is not exactly atomic with respect to window scan-out. So scan-out can actually happen during this operation, and thus lead to the tearing that is evident in the screenshot above.

This will take some reworking of how we're doing that to fix.
This double buffers our usage of nsShmImage to get rid of the XSync. The clip rectangle usage and LastKnownRequestProcessed handling was based on some suggestions from #dri-devel and looking into how recent versions of Cairo deal with XShm synchronization.

Put now just does an XFlush instead of an XSync, and we do the actual waiting in CreateDrawTarget.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8724156 - Flags: review?(jmuizelaar)
This makes nsShmImage use shared pixmaps when possible so that eventually we can work towards using the Present extension, which requires pixmaps.

There is some new ickyness to deal with initialized the extension and detecting pixmap supports. Also we have to manually inject a ClientMessage into the event queue to make sure our polling terminates in a timely fashion for the XCopyArea path that pixmaps use.

Some other cleanups suggestions from #dri-devel too, mainly to get rid of our Xlibint.h usage, so overall we should be handling this a bit more cleanly than Cairo does.
Attachment #8724157 - Flags: review?(jmuizelaar)
Also of note, with this double-buffering in place, it eliminates the sync overhead from the profile. So we spend more of our time in the compositor profile just compositing now, with the actual pixmap -> window copy being entirely in parallel.
Comment on attachment 8724156 [details] [diff] [review]
part 1 - double-buffer nsShmImage

Review of attachment 8724156 [details] [diff] [review]:
-----------------------------------------------------------------

This looks really good except for my worry about hanging while waiting for our request.

::: widget/nsShmImage.cpp
@@ +212,5 @@
> +  if (!mRequest) {
> +    return;
> +  }
> +
> +  while (long(LastKnownRequestProcessed(mDisplay) - mRequest) < 0) {

It seems like we could be in a situation where we wrap before waiting for a request().

e.g.
- request = Present() (= 5)
- the browser is idle for a long time
- we call WaitForRequest() and LastKnownRequestProcessed() returns 1<<31
- we end up hanging for another 1<<31 events.

If you call LastKnownRequestProcessed before calling XNextRequest you can then have a waiting condition like:

while (long(LastKnownRequestProcessed(mDisplay) - mRequest) < 0 &&
       long(LastKnownRequestProcessed(mDisplay) - mOriginalLastKnown) > 0)
Attachment #8724156 - Flags: review?(jmuizelaar) → review+
Attachment #8724157 - Attachment is patch: true
Comment on attachment 8724157 [details] [diff] [review]
part 2 - use shared pixmaps with XShm for nsShmImage

Review of attachment 8724157 [details] [diff] [review]:
-----------------------------------------------------------------

It would be good to add some comments about the differences between using pixmaps and images for XShm

::: widget/nsShmImage.cpp
@@ +304,5 @@
>    }
>  
> +  if (long(LastKnownRequestProcessed(mDisplay) - mRequest) < 0) {
> +    XEvent event;
> +    XPeekIfEvent(mDisplay, &event, FindEvent, (XPointer)this);

What's the advantage of this XPeekIfEvent vs. the previous idiom?
Attachment #8724157 - Flags: review?(jmuizelaar) → review+
Marco, can you try running this try build here and see if the situation is improved at all?
http://archive.mozilla.org/pub/firefox/try-builds/lsalzman@mozilla.com-1d6f9675984270b56c6af4f3afa63eff69204ffd/try-linux64/

It has my shared pixmaps changes in it, and I'm wondering if they alone are sufficient to fix your apparent regression.
Flags: needinfo?(mcastelluccio)
Yes, as far as I can tell it fixes the problem for me (I don't have very good eyes, so it might still be happening from time to time in a much less noticeable way :D).
Flags: needinfo?(mcastelluccio)
Addressed the issue with LastKnownRequestProcessed possibly wrapping around.

Since the reporter's issue appears to be resolved with this usage of pixmaps and XCopyArea here, I am going to skip incorporation of the Present extension since that is a more onerous change to make and probably shoots past what this bug really called for.

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9538b4b0f33e
Attachment #8724157 - Attachment is obsolete: true
Attachment #8724509 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/f1ada5d7a092
https://hg.mozilla.org/mozilla-central/rev/f37626eba5c6
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
here are the talos differences from this change:
https://treeherder.allizom.org/perf.html#/alerts?id=310

we have good wins on tresize/tscroll/tp5o_scroll, but a small loss on tp5 xres.

:lsalzman, any concerns with the xres increase?  I am happy with the wins we have, just want to make sure we didn't do anything silly to increase xres during tp5o.
Flags: needinfo?(lsalzman)
(In reply to Joel Maher (:jmaher) from comment #26)
> here are the talos differences from this change:
> https://treeherder.allizom.org/perf.html#/alerts?id=310
> 
> we have good wins on tresize/tscroll/tp5o_scroll, but a small loss on tp5
> xres.
> 
> :lsalzman, any concerns with the xres increase?  I am happy with the wins we
> have, just want to make sure we didn't do anything silly to increase xres
> during tp5o.

Not concerned. Since this patch both fixes a visual regression and reduces jank in many cases.
Flags: needinfo?(lsalzman)
I'm still seeing tearing from time to time, but much less noticeable than before.
(In reply to Marco Castelluccio [:marco] from comment #28)
> I'm still seeing tearing from time to time, but much less noticeable than
> before.

I talked with the various people in #dri-devel and there will always be some unavoidable tearing. The only way around this is basically to use accelerated layers right now, as that's the only place we implement proper vsync. None of the APIs we were using before for basic X11 can really take care of it, and until DRI3 becomes a thing, the Present API is still wait-and-see.
It's happening really often now, could it be a regression from some other change?

Or maybe it depends on the contents of the page? Scrolling on http://www.ilfattoquotidiano.it/2016/03/06/primarie-usa-cruz-e-lanti-trump-la-louisiana-a-clinton-per-sanders-kansas-e-nebraska/2522644/ is particularly painful.
No longer blocks: 1263222
Depends on: 1321129
You need to log in before you can comment on or make changes to this bug.