Closed Bug 1128934 Opened 9 years ago Closed 9 years ago

Serious tsvgr_opacity regression with OMTC enabled on Linux

Categories

(Core :: Graphics: Layers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(2 files)

We are spending a massive amount of time in BasicCompositor::BeginFrame. Investigating.
All of the time is spent in gdk_property_get which is called by nsWindow::GetClientOffset. gdk_property_get is apparently synchronously waiting for a response from the x server.
Depends on: 1097114
As explained earlier, in tsvgr_opacity we spend 80% of the time waiting for the x server in GetClientOffset, but we don't even use the offset (we only care about the size).
Attachment #8561489 - Flags: review?(karlt)
Comment on attachment 8561489 [details] [diff] [review]
Don't call nsWindow::GetClientOffset from the basic compositor on Linux

sr request for the nsIWidget change including the gfx size type.

>+#include "mozilla/gfx/Point.h"

>+    virtual mozilla::gfx::IntSize GetClientSize() {

nsIntSize would be more consistent with the rest of nsIWidget.

>   // The result of GetClientBounds is shifted over by the size of the window
>   // manager styling. We want to ignore that.
>   intRect.MoveTo(0, 0);
Attachment #8561489 - Flags: superreview?(roc)
Attachment #8561489 - Flags: review?(karlt)
Attachment #8561489 - Flags: review+
>   // The result of GetClientBounds is shifted over by the size of the window
>   // manager styling. We want to ignore that.
>   intRect.MoveTo(0, 0);

This can be removed now.
Attachment #8561489 - Flags: superreview?(roc) → superreview+
https://hg.mozilla.org/mozilla-central/rev/d81339c7ac52
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
backed this out for causing bug 1133484
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This also caused a pretty serious scrolling issue for me with e10s in Linux. The STR is kind of as follows:
1. Visit a page where you can scroll a lot. Example: https://developer.mozilla.org/en-US/Firefox_OS/Installing_on_a_mobile_device
2. Scroll using the arrow keys.

Expected: Scrolling should start immediately when the key is pressed and stop when the key is released.
Actual: Scrolling starts a little bit after the key is pressed. When the key is released, scrolling stops, but then the page jumps a little bit afterwards.

It seems like maybe we're drawing old frames or something?
(Found this bug using mozregression.)
(In reply to Carsten Book [:Tomcat] from comment #8)
> backed this out for causing bug 1133484

This should not have caused bug 1133484, although the other bug that was backed out in the process may have. A try push to be on the safe side https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e96e5181b55

(In reply to Bill McCloskey (:billm) from comment #9)
> This also caused a pretty serious scrolling issue for me with e10s in Linux.
> Expected: Scrolling should start immediately when the key is pressed and
> stop when the key is released.
> Actual: Scrolling starts a little bit after the key is pressed. When the key
> is released, scrolling stops, but then the page jumps a little bit
> afterwards.

Without this patch, there is a synchronous wait on the x server every time we composite. With the patch, the synchronous wait is removed, but there may be other places where we do a synchronous wait on the x server, but not every frames, which could introduce long frames (when the wait occurs) which makes the it look like the scrolling pauses and then jump.

The two options would be either keep the sync wait every frame to have a terrible but even scrolling (this means accepting a ~55% regression on tsvr_opacity when enabling omtc/e10s), or remove all of the sync waits that may happen while scrolling (I don't know where these waits are and how feasible that is)

On a debug build I can't see the difference between with and without the patch when following the str from comment 9, but it could be that the compositor thread is just slow enough in debug builds that it doesn't have to wait for too long on the x server's thread, or that it's driver-specific (I hope not).
I just tested with a release build and I can see the janks from comment 9. Installing the gecko profiler makes the jank go away entirely, and it comes back when I remove the profiler addon...
Running with the perf profiler also makes the jank go away >_<
Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is enabled without e10s)
Karl, do you know about other outstanding synchronous gdk/x11 round trips that we might be using (maybe related to input events)? I haven't found any documentation about what is synchronous and what is not in x11.
Flags: needinfo?(karlt)
(In reply to Nicolas Silva [:nical] from comment #15)
> Karl, do you know about other outstanding synchronous gdk/x11 round trips
> that we might be using (maybe related to input events)? I haven't found any
> documentation about what is synchronous and what is not in x11.

The precise answer is anything that calls _XReply, which is a function internal to Xlib.  If you can work out how to make a gdb breakpoint conditional on the thread, then instances on the compositor thread can be detected.

Functions with names like XGet* are candidates, but some, but mostly unchanging, data is cached in Xlib, and so there is no round trip.

From a quick look at _XReply, it calls UnlockDisplay() and so seems to allow other threads to continue to send to the server while waiting in the calling thread, which implies that only the compositor thread would be a problem here.

However ...

(In reply to Bill McCloskey (:billm) from comment #9)
> It seems like maybe we're drawing old frames or something?

(In reply to Nicolas Silva [:nical] from comment #14)
> Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is
> enabled without e10s)

... I wonder whether there is a synchronization failure between the content
process and the compositor thread, which would use different connections to
the X server.

When a pixmap is shared between content and browser processes, we need to know
that the server has completed the drawing requested by the content process
before the compositor makes the server use the Pixmap in the compositing.
The way that was achieved was with an XSync() round trip in the content
process before sending a message to the compositor.  (There are other options,
but that is the simple method.)  Perhaps the compositor was slow enough before
this change that the problem wasn't noticed.  Failure here would only generate
these symptoms I think if buffer rotation were done on the content process.
Is buffer rotation still a thing?  Where does that happen?

We also need to know that the server has completed the compositor's requests
before the content process reuses the Pixmap (if that happens).  Perhaps
removing the round trip in the browser process here changed that, but I would
have thought the consequences would have been quite different to what is
described here.
Flags: needinfo?(karlt)
Perhaps a more likely explanation of the symptoms:

A round trip forces the compositor to flush its drawing operations output
queue, sending to the server.

If there are no round trips (because one has been removed), then the buffer is
flushed when full or when XPending is called.

Events are handled on the main thread, so XPending is called on the main
thread.  If there are user input events being processed, then XPending is
likely to be called multiple times, each time flushing the output queue, which
includes requests from the compositor process.  If there are no pending events
then XPending flushes once and waits.  Subsequent requests will not be
automatically flushed.

I think we need a call to XFlush() when the compositor completes each
update (on the compositor thread should be fine).  However, I don't know why that would only be necessary with e10s.  Perhaps because there is no drawing happening in the browser main thread.

This should be easy to try out, at least.
(In reply to Karl Tomlinson (:karlt) from comment #17)
> I think we need a call to XFlush() when the compositor completes each
> update (on the compositor thread should be fine).  However, I don't know why
> that would only be necessary with e10s.  Perhaps because there is no drawing
> happening in the browser main thread.
> 
> This should be easy to try out, at least.

Awesome, it worked. Thanks!
Attachment #8574016 - Flags: review?(karlt) → review+
Comment on attachment 8561489 [details] [diff] [review]
Don't call nsWindow::GetClientOffset from the basic compositor on Linux

>+++ b/widget/gtk/nsWindow.h
>     NS_IMETHOD         GetClientBounds(nsIntRect &aRect) MOZ_OVERRIDE;
>+    virtual gfx::IntSize GetClientSize() MOZ_OVERRIDE;

This needed a "mozilla::" prefix (and caused build bustage on my system because of the lack of such a prefix).

Fixed w/ rs=nical granted over IRC:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/6f9a0364c884
Depends on: 1133675
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: