Closed
Bug 1128934
Opened 9 years ago
Closed 9 years ago
Serious tsvgr_opacity regression with OMTC enabled on Linux
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: nical, Assigned: nical)
References
Details
Attachments
(2 files)
5.94 KB,
patch
|
karlt
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
1.33 KB,
patch
|
karlt
:
review+
|
Details | Diff | Splinter Review |
Assignee | ||
Comment 1•9 years ago
|
||
We are spending a massive amount of time in BasicCompositor::BeginFrame. Investigating.
Assignee | ||
Comment 2•9 years ago
|
||
All of the time is spent in gdk_property_get which is called by nsWindow::GetClientOffset. gdk_property_get is apparently synchronously waiting for a response from the x server.
Depends on: 1097114
Assignee | ||
Comment 3•9 years ago
|
||
As explained earlier, in tsvgr_opacity we spend 80% of the time waiting for the x server in GetClientOffset, but we don't even use the offset (we only care about the size).
Attachment #8561489 -
Flags: review?(karlt)
Comment 4•9 years ago
|
||
Comment on attachment 8561489 [details] [diff] [review] Don't call nsWindow::GetClientOffset from the basic compositor on Linux sr request for the nsIWidget change including the gfx size type. >+#include "mozilla/gfx/Point.h" >+ virtual mozilla::gfx::IntSize GetClientSize() { nsIntSize would be more consistent with the rest of nsIWidget. > // The result of GetClientBounds is shifted over by the size of the window > // manager styling. We want to ignore that. > intRect.MoveTo(0, 0);
Attachment #8561489 -
Flags: superreview?(roc)
Attachment #8561489 -
Flags: review?(karlt)
Attachment #8561489 -
Flags: review+
Comment 5•9 years ago
|
||
> // The result of GetClientBounds is shifted over by the size of the window
> // manager styling. We want to ignore that.
> intRect.MoveTo(0, 0);
This can be removed now.
Attachment #8561489 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 6•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d81339c7ac52
Comment 7•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d81339c7ac52
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox38:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment 8•9 years ago
|
||
backed this out for causing bug 1133484
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•9 years ago
|
status-firefox38:
fixed → ---
This also caused a pretty serious scrolling issue for me with e10s in Linux. The STR is kind of as follows: 1. Visit a page where you can scroll a lot. Example: https://developer.mozilla.org/en-US/Firefox_OS/Installing_on_a_mobile_device 2. Scroll using the arrow keys. Expected: Scrolling should start immediately when the key is pressed and stop when the key is released. Actual: Scrolling starts a little bit after the key is pressed. When the key is released, scrolling stops, but then the page jumps a little bit afterwards. It seems like maybe we're drawing old frames or something?
(Found this bug using mozregression.)
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #8) > backed this out for causing bug 1133484 This should not have caused bug 1133484, although the other bug that was backed out in the process may have. A try push to be on the safe side https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e96e5181b55 (In reply to Bill McCloskey (:billm) from comment #9) > This also caused a pretty serious scrolling issue for me with e10s in Linux. > Expected: Scrolling should start immediately when the key is pressed and > stop when the key is released. > Actual: Scrolling starts a little bit after the key is pressed. When the key > is released, scrolling stops, but then the page jumps a little bit > afterwards. Without this patch, there is a synchronous wait on the x server every time we composite. With the patch, the synchronous wait is removed, but there may be other places where we do a synchronous wait on the x server, but not every frames, which could introduce long frames (when the wait occurs) which makes the it look like the scrolling pauses and then jump. The two options would be either keep the sync wait every frame to have a terrible but even scrolling (this means accepting a ~55% regression on tsvr_opacity when enabling omtc/e10s), or remove all of the sync waits that may happen while scrolling (I don't know where these waits are and how feasible that is) On a debug build I can't see the difference between with and without the patch when following the str from comment 9, but it could be that the compositor thread is just slow enough in debug builds that it doesn't have to wait for too long on the x server's thread, or that it's driver-specific (I hope not).
Assignee | ||
Comment 12•9 years ago
|
||
I just tested with a release build and I can see the janks from comment 9. Installing the gecko profiler makes the jank go away entirely, and it comes back when I remove the profiler addon...
Assignee | ||
Comment 13•9 years ago
|
||
Running with the perf profiler also makes the jank go away >_<
Assignee | ||
Comment 14•9 years ago
|
||
Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is enabled without e10s)
Assignee | ||
Comment 15•9 years ago
|
||
Karl, do you know about other outstanding synchronous gdk/x11 round trips that we might be using (maybe related to input events)? I haven't found any documentation about what is synchronous and what is not in x11.
Flags: needinfo?(karlt)
Comment 16•9 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #15) > Karl, do you know about other outstanding synchronous gdk/x11 round trips > that we might be using (maybe related to input events)? I haven't found any > documentation about what is synchronous and what is not in x11. The precise answer is anything that calls _XReply, which is a function internal to Xlib. If you can work out how to make a gdb breakpoint conditional on the thread, then instances on the compositor thread can be detected. Functions with names like XGet* are candidates, but some, but mostly unchanging, data is cached in Xlib, and so there is no round trip. From a quick look at _XReply, it calls UnlockDisplay() and so seems to allow other threads to continue to send to the server while waiting in the calling thread, which implies that only the compositor thread would be a problem here. However ... (In reply to Bill McCloskey (:billm) from comment #9) > It seems like maybe we're drawing old frames or something? (In reply to Nicolas Silva [:nical] from comment #14) > Interestingly, this jank is specific to e10s (doesn't reproduce when OMTC is > enabled without e10s) ... I wonder whether there is a synchronization failure between the content process and the compositor thread, which would use different connections to the X server. When a pixmap is shared between content and browser processes, we need to know that the server has completed the drawing requested by the content process before the compositor makes the server use the Pixmap in the compositing. The way that was achieved was with an XSync() round trip in the content process before sending a message to the compositor. (There are other options, but that is the simple method.) Perhaps the compositor was slow enough before this change that the problem wasn't noticed. Failure here would only generate these symptoms I think if buffer rotation were done on the content process. Is buffer rotation still a thing? Where does that happen? We also need to know that the server has completed the compositor's requests before the content process reuses the Pixmap (if that happens). Perhaps removing the round trip in the browser process here changed that, but I would have thought the consequences would have been quite different to what is described here.
Flags: needinfo?(karlt)
Comment 17•9 years ago
|
||
Perhaps a more likely explanation of the symptoms: A round trip forces the compositor to flush its drawing operations output queue, sending to the server. If there are no round trips (because one has been removed), then the buffer is flushed when full or when XPending is called. Events are handled on the main thread, so XPending is called on the main thread. If there are user input events being processed, then XPending is likely to be called multiple times, each time flushing the output queue, which includes requests from the compositor process. If there are no pending events then XPending flushes once and waits. Subsequent requests will not be automatically flushed. I think we need a call to XFlush() when the compositor completes each update (on the compositor thread should be fine). However, I don't know why that would only be necessary with e10s. Perhaps because there is no drawing happening in the browser main thread. This should be easy to try out, at least.
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #17) > I think we need a call to XFlush() when the compositor completes each > update (on the compositor thread should be fine). However, I don't know why > that would only be necessary with e10s. Perhaps because there is no drawing > happening in the browser main thread. > > This should be easy to try out, at least. Awesome, it worked. Thanks!
Assignee | ||
Comment 19•9 years ago
|
||
Attachment #8574016 -
Flags: review?(karlt)
Updated•9 years ago
|
Attachment #8574016 -
Flags: review?(karlt) → review+
Assignee | ||
Comment 20•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/395e21a559e2 https://hg.mozilla.org/integration/mozilla-inbound/rev/ec217b5c0645
Comment 21•9 years ago
|
||
Comment on attachment 8561489 [details] [diff] [review] Don't call nsWindow::GetClientOffset from the basic compositor on Linux >+++ b/widget/gtk/nsWindow.h > NS_IMETHOD GetClientBounds(nsIntRect &aRect) MOZ_OVERRIDE; >+ virtual gfx::IntSize GetClientSize() MOZ_OVERRIDE; This needed a "mozilla::" prefix (and caused build bustage on my system because of the lack of such a prefix). Fixed w/ rs=nical granted over IRC: https://hg.mozilla.org/integration/mozilla-inbound/rev/6f9a0364c884
https://hg.mozilla.org/mozilla-central/rev/395e21a559e2 https://hg.mozilla.org/mozilla-central/rev/ec217b5c0645 https://hg.mozilla.org/mozilla-central/rev/6f9a0364c884
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: mozilla38 → mozilla39
You need to log in
before you can comment on or make changes to this bug.
Description
•