Closed Bug 1128934 Opened 11 years ago Closed 10 years ago

Serious tsvgr_opacity regression with OMTC enabled on Linux

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla39

Tracking Flags:

Tracking

Status

firefox39

---

fixed

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(2 files)

Don't call nsWindow::GetClientOffset from the basic compositor on Linux 10 years ago Nicolas Silva [:nical] 5.94 KB, patch	karlt : review+ roc : superreview+	Details \| Diff \| Splinter Review
xflush at the end of each composition to avoid an accumulation of commands which can cause janks 10 years ago Nicolas Silva [:nical] 1.33 KB, patch	karlt : review+	Details \| Diff \| Splinter Review

Nicolas Silva [:nical]

Assignee

Description

•

11 years ago

http://perf.snarkfest.net/compare-talos/?oldRevs=028c6dfcff1d&newRev=13e16e590b1a&submit=true

Nicolas Silva [:nical]

Assignee

Comment 1

•

11 years ago

We are spending a massive amount of time in BasicCompositor::BeginFrame. Investigating.

Nicolas Silva [:nical]

Assignee

Comment 2

•

11 years ago

All of the time is spent in gdk_property_get which is called by nsWindow::GetClientOffset. gdk_property_get is apparently synchronously waiting for a response from the x server.

Depends on: 1097114

Nicolas Silva [:nical]

Assignee

Comment 3

•

10 years ago

Attached patch Don't call nsWindow::GetClientOffset from the basic compositor on Linux — Details — Splinter Review

As explained earlier, in tsvgr_opacity we spend 80% of the time waiting for the x server in GetClientOffset, but we don't even use the offset (we only care about the size).

Attachment #8561489 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Comment 4

•

10 years ago

Comment on attachment 8561489 [details] [diff] [review] Don't call nsWindow::GetClientOffset from the basic compositor on Linux sr request for the nsIWidget change including the gfx size type. >+#include "mozilla/gfx/Point.h" >+ virtual mozilla::gfx::IntSize GetClientSize() { nsIntSize would be more consistent with the rest of nsIWidget. > // The result of GetClientBounds is shifted over by the size of the window > // manager styling. We want to ignore that. > intRect.MoveTo(0, 0);

Attachment #8561489 - Flags: superreview?(roc)

Attachment #8561489 - Flags: review?(karlt)

Attachment #8561489 - Flags: review+

Karl Tomlinson (:karlt)

Comment 5

•

10 years ago

> // The result of GetClientBounds is shifted over by the size of the window > // manager styling. We want to ignore that. > intRect.MoveTo(0, 0); This can be removed now.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

10 years ago

Attachment #8561489 - Flags: superreview?(roc) → superreview+

Nicolas Silva [:nical]

Assignee

Comment 6

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d81339c7ac52

Carsten Book [:Tomcat]

Comment 7

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/d81339c7ac52

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox38: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla38

Carsten Book [:Tomcat]

Comment 8

•

10 years ago

backed this out for causing bug 1133484

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Carsten Book [:Tomcat]

Updated

•

10 years ago

status-firefox38: fixed → ---

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 9

•

10 years ago

This also caused a pretty serious scrolling issue for me with e10s in Linux. The STR is kind of as follows: 1. Visit a page where you can scroll a lot. Example: https://developer.mozilla.org/en-US/Firefox_OS/Installing_on_a_mobile_device 2. Scroll using the arrow keys. Expected: Scrolling should start immediately when the key is pressed and stop when the key is released. Actual: Scrolling starts a little bit after the key is pressed. When the key is released, scrolling stops, but then the page jumps a little bit afterwards. It seems like maybe we're drawing old frames or something?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 10

•

10 years ago

(Found this bug using mozregression.)

Nicolas Silva [:nical]

Assignee

Comment 11

•

10 years ago

(In reply to Carsten Book [:Tomcat] from comment #8) > backed this out for causing bug 1133484 This should not have caused bug 1133484, although the other bug that was backed out in the process may have. A try push to be on the safe side https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e96e5181b55 (In reply to Bill McCloskey (:billm) from comment #9) > This also caused a pretty serious scrolling issue for me with e10s in Linux. > Expected: Scrolling should start immediately when the key is pressed and > stop when the key is released. > Actual: Scrolling starts a little bit after the key is pressed. When the key > is released, scrolling stops, but then the page jumps a little bit > afterwards. Without this patch, there is a synchronous wait on the x server every time we composite. With the patch, the synchronous wait is removed, but there may be other places where we do a synchronous wait on the x server, but not every frames, which could introduce long frames (when the wait occurs) which makes the it look like the scrolling pauses and then jump. The two options would be either keep the sync wait every frame to have a terrible but even scrolling (this means accepting a ~55% regression on tsvr_opacity when enabling omtc/e10s), or remove all of the sync waits that may happen while scrolling (I don't know where these waits are and how feasible that is) On a debug build I can't see the difference between with and without the patch when following the str from comment 9, but it could be that the compositor thread is just slow enough in debug builds that it doesn't have to wait for too long on the x server's thread, or that it's driver-specific (I hope not).

Nicolas Silva [:nical]

Assignee

Comment 12

•

10 years ago

I just tested with a release build and I can see the janks from comment 9. Installing the gecko profiler makes the jank go away entirely, and it comes back when I remove the profiler addon...

Nicolas Silva [:nical]

Assignee

Comment 13

•

10 years ago

Running with the perf profiler also makes the jank go away >_<

Nicolas Silva [:nical]

Assignee

Comment 14

•

10 years ago

Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is enabled without e10s)

Nicolas Silva [:nical]

Assignee

Comment 15

•

10 years ago

Karl, do you know about other outstanding synchronous gdk/x11 round trips that we might be using (maybe related to input events)? I haven't found any documentation about what is synchronous and what is not in x11.

Flags: needinfo?(karlt)

Karl Tomlinson (:karlt)

Comment 16

•

10 years ago

(In reply to Nicolas Silva [:nical] from comment #15) > Karl, do you know about other outstanding synchronous gdk/x11 round trips > that we might be using (maybe related to input events)? I haven't found any > documentation about what is synchronous and what is not in x11. The precise answer is anything that calls _XReply, which is a function internal to Xlib. If you can work out how to make a gdb breakpoint conditional on the thread, then instances on the compositor thread can be detected. Functions with names like XGet* are candidates, but some, but mostly unchanging, data is cached in Xlib, and so there is no round trip. From a quick look at _XReply, it calls UnlockDisplay() and so seems to allow other threads to continue to send to the server while waiting in the calling thread, which implies that only the compositor thread would be a problem here. However ... (In reply to Bill McCloskey (:billm) from comment #9) > It seems like maybe we're drawing old frames or something? (In reply to Nicolas Silva [:nical] from comment #14) > Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is > enabled without e10s) ... I wonder whether there is a synchronization failure between the content process and the compositor thread, which would use different connections to the X server. When a pixmap is shared between content and browser processes, we need to know that the server has completed the drawing requested by the content process before the compositor makes the server use the Pixmap in the compositing. The way that was achieved was with an XSync() round trip in the content process before sending a message to the compositor. (There are other options, but that is the simple method.) Perhaps the compositor was slow enough before this change that the problem wasn't noticed. Failure here would only generate these symptoms I think if buffer rotation were done on the content process. Is buffer rotation still a thing? Where does that happen? We also need to know that the server has completed the compositor's requests before the content process reuses the Pixmap (if that happens). Perhaps removing the round trip in the browser process here changed that, but I would have thought the consequences would have been quite different to what is described here.

Flags: needinfo?(karlt)

Karl Tomlinson (:karlt)

Comment 17

•

10 years ago

Perhaps a more likely explanation of the symptoms: A round trip forces the compositor to flush its drawing operations output queue, sending to the server. If there are no round trips (because one has been removed), then the buffer is flushed when full or when XPending is called. Events are handled on the main thread, so XPending is called on the main thread. If there are user input events being processed, then XPending is likely to be called multiple times, each time flushing the output queue, which includes requests from the compositor process. If there are no pending events then XPending flushes once and waits. Subsequent requests will not be automatically flushed. I think we need a call to XFlush() when the compositor completes each update (on the compositor thread should be fine). However, I don't know why that would only be necessary with e10s. Perhaps because there is no drawing happening in the browser main thread. This should be easy to try out, at least.

Nicolas Silva [:nical]

Assignee

Comment 18

•

10 years ago

(In reply to Karl Tomlinson (:karlt) from comment #17) > I think we need a call to XFlush() when the compositor completes each > update (on the compositor thread should be fine). However, I don't know why > that would only be necessary with e10s. Perhaps because there is no drawing > happening in the browser main thread. > > This should be easy to try out, at least. Awesome, it worked. Thanks!

Nicolas Silva [:nical]

Assignee

Comment 19

•

10 years ago

Attached patch xflush at the end of each composition to avoid an accumulation of commands which can cause janks — Details — Splinter Review

Attachment #8574016 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Updated

•

10 years ago

Attachment #8574016 - Flags: review?(karlt) → review+

Nicolas Silva [:nical]

Assignee

Comment 20

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/395e21a559e2 https://hg.mozilla.org/integration/mozilla-inbound/rev/ec217b5c0645

Daniel Holbert [:dholbert]

Comment 21

•

10 years ago

Comment on attachment 8561489 [details] [diff] [review] Don't call nsWindow::GetClientOffset from the basic compositor on Linux >+++ b/widget/gtk/nsWindow.h > NS_IMETHOD GetClientBounds(nsIntRect &aRect) MOZ_OVERRIDE; >+ virtual gfx::IntSize GetClientSize() MOZ_OVERRIDE; This needed a "mozilla::" prefix (and caused build bustage on my system because of the lack of such a prefix). Fixed w/ rs=nical granted over IRC: https://hg.mozilla.org/integration/mozilla-inbound/rev/6f9a0364c884

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 22

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/395e21a559e2 https://hg.mozilla.org/mozilla-central/rev/ec217b5c0645 https://hg.mozilla.org/mozilla-central/rev/6f9a0364c884

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

status-firefox39: --- → fixed

Resolution: --- → FIXED

Target Milestone: mozilla38 → mozilla39

Karl Tomlinson (:karlt)

Updated

•

10 years ago

Depends on: 1133675

You need to log in before you can comment on or make changes to this bug.