Closed Bug 994541 Opened 10 years ago Closed 9 years ago

Enable BasicCompositor OMTC on linux

Categories

(Core :: Graphics: Layers, defect)

29 Branch
x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 + fixed
relnote-firefox --- 40+

People

(Reporter: mattwoodrow, Assigned: nical)

References

Details

Attachments

(1 file, 2 obsolete files)

      No description provided.
Depends on: 993475, 994088
Depends on: 994546
Assignee: nobody → chrislord.net
Depends on: 994548
Depends on: 994554
Depends on: 994556
Depends on: 995216
Reassigning this to nical while I'm on PTO.
Assignee: chrislord.net → nical.bugzilla
Attached patch Enable BasicCompositor OMTC on Linux (obsolete) — — Splinter Review
Here's the patch to enable BasicCompositor on Linux... I'm not a huge fan of the way force-basic works, but there you go.
Attachment #8416454 - Flags: review?(bas)
So the big blocker for this now is really performance regressions. I imagine most are caused by a combination of extra copying and lack of XRender, but we need profiles to be sure.

Without OMTC: https://tbpl.mozilla.org/?tree=Try&rev=a639e58638b9
With OMTC: https://tbpl.mozilla.org/?tree=Try&rev=702930a03c51
Why would we not have XRender? We should be using BasicTextureHostX11.
I would absolutely take a temporary regression on Linux if Linux is the only thing stopping us from ripping out the MT composition path. Right now Windows is still being worked on, so we have a little bit time left.
Comment on attachment 8416454 [details] [diff] [review]
Enable BasicCompositor OMTC on Linux

Review of attachment 8416454 [details] [diff] [review]:
-----------------------------------------------------------------

Let's remove the force-basic stuff altogether. It serves no function at this point as far as I can tell.
Attachment #8416454 - Flags: review?(bas) → review-
Assignee: nical.bugzilla → chrislord.net
Depends on: 1011572
Attached patch Enable BasicCompositor OMTC on Linux (obsolete) — — Splinter Review
This gets rid of force-basic, and given that Windows is the only platform that doesn't have omtc/async-video turned on (for now :)), simplifies the prefs js file too.
Attachment #8416454 - Attachment is obsolete: true
Attachment #8423978 - Flags: review?(bas)
Blocks: 722012
OS: Mac OS X → Linux
Comment on attachment 8423978 [details] [diff] [review]
Enable BasicCompositor OMTC on Linux

>+// Off-main-thread compositing and asynchronous video compositing are enabled
>+// for all platforms except Windows.

Hasn't this just been turned on for Windows now?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #8)
> Comment on attachment 8423978 [details] [diff] [review]
> Enable BasicCompositor OMTC on Linux
> 
> >+// Off-main-thread compositing and asynchronous video compositing are enabled
> >+// for all platforms except Windows.
> 
> Hasn't this just been turned on for Windows now?

It's in the process of being turned on - I'll remove this comment (and the #ifndef) if necessary when I rebase this patch.
Status: NEW → ASSIGNED
Attachment #8423978 - Flags: review?(bas) → review+
Blocks: 759252
Depends on: 1015211
Once BasicOMTC is enabled for stable, make sure to relnote.
relnote-firefox: --- → ?
Keywords: feature
Is bug #1015262 a blocker?
(In reply to kxra from comment #11)
> Is bug #1015262 a blocker?

No, this is blocked on bug 1012639, and then dealing with the new test failures on mochitest 2 that have come up since this work was done.
Depends on: 1022852
Blocks: 1027073
relnote-firefox: is enough for the release notes
Keywords: feature
If try is green and a local build looks good, I'll push this today.

https://tbpl.mozilla.org/?tree=Try&rev=dfaaa30619b7
Attachment #8423978 - Attachment is obsolete: true
Attachment #8459444 - Flags: review+
(In reply to Chris Lord [:cwiiis] from comment #16)
> Pushed to inbound. Fingers crossed!
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=994556

The url there should have been https://hg.mozilla.org/integration/mozilla-inbound/rev/ea2341b06e6f but it doesn't really matter now.
https://hg.mozilla.org/mozilla-central/rev/ea2341b06e6f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Reopening, since this was backed out in comment 17.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
a heads up on talos alerts related to this:
+---------------------+-------------------------------+---------+--------------+---------+--------+
| platform            | test                          | percent | keyrevision  | status  | bug    |
+---------------------+-------------------------------+---------+--------------+---------+--------+
| Ubuntu HW 12.04 x64 | Tp5 Optimized                 | -7.74%  | e2e07dba5965 | Backout | 994541 |
| Ubuntu HW 12.04     | CanvasMark                    | -19.2%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | Customization Animation Tests | -8.36%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | Tab Animation Test            | -70%    | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | SVG-ASAP                      | -5.84%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | Tp5 Optimized (Private Bytes) | -2.37%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | Tp5 Optimized                 | -9.73%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04 x64 | CanvasMark                    | -17.4%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04 x64 | SVG-ASAP                      | -8.22%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04 x64 | Tab Animation Test            | -62.7%  | ea2341b06e6f | Backout | 994541 |
| Ubuntu HW 12.04     | Paint                         | -9.66%  | 3b5e46a568f1 | Backout | 994541 |
| Ubuntu HW 12.04     | TP5 Scroll                    | -6.38%  | 3b5e46a568f1 | Backout | 994541 |
| Ubuntu HW 12.04 x64 | TP5 Scroll                    | -7.19%  | 3b5e46a568f1 | Backout | 994541 |
| Ubuntu HW 12.04     | tscroll-ASAP                  | -17.4%  | 3b5e46a568f1 | Backout | 994541 |
| Ubuntu HW 12.04 x64 | tscroll-ASAP                  | -16.6%  | 3b5e46a568f1 | Backout | 994541 |
+---------------------+-------------------------------+---------+--------------+---------+--------+

revision 'e2e07dba5965' is when it landed
revision '3b5e46a568f1' is when it was backed out (note, these regressions are improvements we lost due to the backout)
WebGL-Terrain also regressed about 10% due to this, though bjacob dismissed it on dev.tree-management mailing list:

> This particular regression can be ignored: linux uses basic layers,
> which is slow anyway for compositing WebGL frames, and we dont care
> if OMTC makes it a bit slower still.
Reassigning this to nical - I don't think I have the time to dedicate to this and Gaia simultaneously and I don't want to hold this back.

It seems the reftest timeout is the only thing holding this back now.
Assignee: chrislord.net → nical.bugzilla
Target Milestone: mozilla33 → ---
Following up on this bug because it was marked as relnote?. This bug looks to have stopped midstep. Is work ongoing?
(In reply to Lawrence Mandel [:lmandel] from comment #25)
> Following up on this bug because it was marked as relnote?. This bug looks
> to have stopped midstep. Is work ongoing?

Yes.
The code that prevents OMTC to work with the basic compositor is constantly getting in my way and don't make much sense anymore so I landed the part of the patch that removes it. Note that OMTC is still off on linux, we still need to figure out the reftest timeout.

https://hg.mozilla.org/integration/mozilla-inbound/rev/ab4ef3fa85bd
Keywords: leave-open
Depends on: 1061209
This bug is flagged as relnote-firefox but seems like it may be too low level for inclusion in our release notes. nical, do you think this change warrants inclusion (looks like it was fixed in 34) and, if so, can you suggest the wording for the note?
Flags: needinfo?(nical.bugzilla)
Once we actually close this bug, it will be worth mentioning.  Right now, it is not on by default, so not much to say.  Once it's done, this is a big deal - not only are you starting to open up possibilities for async pan zoom, off main thread animation, async video, but it also shows we're not ignoring Linux.
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #30)
> Once we actually close this bug, it will be worth mentioning.  Right now, it
> is not on by default, so not much to say.  Once it's done, this is a big
> deal - not only are you starting to open up possibilities for async pan
> zoom, off main thread animation, async video, but it also shows we're not
> ignoring Linux.

I didn't realize that this was the OMTC on Linux bug. I agree that is worth noting. To be consistent with our Windows OMTC note, we can use:

"Linux: OMTC enabled by default"

As Milan said, need to wait for this bug to be resolved first.
Depends on: 1087080
No longer depends on: 1087080
Depends on: 1128543
Depends on: 1128934
Backed out for making various browser-chrome tests significantly more orange-prone (nearly permafail for the most part).
https://hg.mozilla.org/integration/mozilla-inbound/rev/43eb116a884d
I suspect it also made Wr orange much of the time too.
I'm concerned about the performance regressions and the plan to enable
this by default and let it ride the trains.  I understand that there
are other features in the pipeline that will regain some performance
but it's not clear to me how much and when.

Why can't we start by enabling this for non-release channels only,
like we usually do for features that aren't quite ready for release?
(i.e. #ifndef RELEASE_BUILD)
(In reply to Mats Palmgren (:mats) from comment #35)
> Why can't we start by enabling this for non-release channels only,
> like we usually do for features that aren't quite ready for release?
> (i.e. #ifndef RELEASE_BUILD)
We can't do that because that would mean that we would not be testing OMTC=off during the whole aurora/beta cycle.
(In reply to Mats Palmgren (:mats) from comment #35)
> I'm concerned about the performance regressions and the plan to enable
> this by default and let it ride the trains.  I understand that there
> are other features in the pipeline that will regain some performance
> but it's not clear to me how much and when.
> 

OMTC alone is a massive project, which is quite hard to land in the first place. The other projects that will help regaining the performance regressions are also pretty big and they all depend on OMTC so they can't land first. It is just way too risky to enable them all at once.

> Why can't we start by enabling this for non-release channels only,
> like we usually do for features that aren't quite ready for release?
> (i.e. #ifndef RELEASE_BUILD)

Note that all nightly users that did not opt out of e10s have been using OMTC since e10s got enabled.

Also, while there is a big regression in tp5o_scroll, there is a massive improvement with video playback, and some other improvement here and there. We just don't have a talos test for that. We are willing to ship OMTC on linux with its current improvements and regressions (except the wp and bc intermittents).
It's worth noting that, last I checked (which was a long time ago, to be fair), a lot of the performance regressions were because our tests don't take into account that X is async and they don't actually measure the time it takes to do the work. I expect if the tests were run with synchronous X calls, you may find the numbers align a lot better.

Not to say that forcing X to sync so much with OMTC on isn't a bug, however, we should certainly try to fix that, but there may be cases where it's not possible, especially with e10s, to not require an XSync or two here and there.
(In reply to Chris Lord [:cwiiis] from comment #38)
> It's worth noting that, last I checked (which was a long time ago, to be
> fair), a lot of the performance regressions were because our tests don't
> take into account that X is async and they don't actually measure the time
> it takes to do the work. I expect if the tests were run with synchronous X
> calls, you may find the numbers align a lot better.

I'm not familiar enough with the subject, but if you think you know how to improve the reliability of effectiveness of the test we currently have (or suggest new ones which would have more relevant coverage under Linux or in general), please let me know and we can assess how to approach it.

Thanks.
(In reply to Avi Halachmi (:avih) from comment #39)
> I'm not familiar enough with the subject, but if you think you know how to
> improve the reliability of effectiveness of the test we currently have (or
> suggest new ones which would have more relevant coverage under Linux or in
> general), please let me know and we can assess how to approach it.
> 
> Thanks.

I'm not really familiar enough with exactly how the tests work to suggest something better, but if we're measuring the time for discrete whole frames when drawing, simply calling XSync (or XFlush, or whatever the function happens to be, I forget) before doing the measurement might be a reasonable thing to do (and going on some of the test numbers, likely ends up matching testing behaviour on Windows and Mac).

Note that if we did that though, we might actually want a test that doesn't do that so that we can catch regressions of breaking asynchronicity :) (yes, I'd like to have my cake and eat it please)
Most animations tests don't fiddle much with Firefox internals. They just use requestAnimationFrame to draw each new frame, and measure how fast it can iterate - while in ASAP mode (vsync timing off == iterate as fast as possible without blocking on swapBuffers or equivalent).

So it might depend on the implementation of ASAP under linux. We do know it's reliable on Windows and OS X.

Let's talk it over IRC.
(In reply to Sylvestre Ledru [:sylvestre] from comment #36)
> > (i.e. #ifndef RELEASE_BUILD)
> We can't do that because that would mean that we would not be testing
> OMTC=off during the whole aurora/beta cycle.

Perhaps I'm mistaken, but #ifndef RELEASE_BUILD would enable it for
Nightly and Aurora, but not Beta and Release channels.
Please correct me if I'm wrong.

(perhaps you were thinking of NIGHTLY_BUILD?)
(In reply to Nicolas Silva [:nical] from comment #37)
> OMTC alone is a massive project, which is quite hard to land in the first
> place. The other projects that will help regaining the performance
> regressions are also pretty big and they all depend on OMTC so they can't
> land first. It is just way too risky to enable them all at once.

I appreciate that, but I'm not asking you to enable them all at once.
I'm asking that we enable OMTC only for Nightly and Aurora, and later
when other features are ready, we enable those for Nightly/Aurora too.
That gives you ample opportunity to address issues as they occur,
feature by feature.  Once we're happy with the overall performance on
Aurora we can let them ride the trains.  If issues comes up during Beta
we just disable all the features for another cycle.

> Also, while there is a big regression in tp5o_scroll, there is a massive
> improvement with video playback, and some other improvement here and there.

I'm afraid that's not how users reason at all.  I'm sorry, but the
average user will not notice your video improvements because everything
that just works without issues goes unnoticed.  What they will notice
however is if anything runs slower than normal and that's what will form
their opinion of the new version.

(I'm not sure why you mention tp5o_scroll as a regression.  IIUC,
the note in comment 21 says it's an improvement for that test.
It's the regressions for rev ea2341b06e6f that concerns me.)

> We are willing to ship OMTC on
> linux with its current improvements and regressions (except the wp and bc
> intermittents).

Linux is a Tier-1 platform.  I don't think the performance regressions
in comment 21 are acceptable to ship to users on a Tier-1 platform
(in a release build).
(In reply to Chris Lord [:cwiiis] from comment #38)
> It's worth noting that, last I checked (which was a long time ago, to be
> fair), a lot of the performance regressions were because our tests don't
> take into account that X is async and they don't actually measure the time
> it takes to do the work. I expect if the tests were run with synchronous X
> calls, you may find the numbers align a lot better.

That's a good point.  Some of the work is probably not noticed by our
tests due to that.  I don't think that implies though that forcing sync X
for tests would measure what the user perceives when running async.
I suspect that that would actually be more misleading than any missing
work due to async X.

In this discussion, I'm assuming that the numbers in comment 21 are
accurate and reflects what a user would perceive.
(In reply to Mats Palmgren (:mats) from comment #42)
> Perhaps I'm mistaken, but #ifndef RELEASE_BUILD would enable it for
> Nightly and Aurora, but not Beta and Release channels.
> Please correct me if I'm wrong.

That's exactly what that flag does, yes.
Avih and I spoke on IRC, to clarify what I said before re X and asynchronicity, the only test that I know that is *definitely* affected by this is tsvgr_opacity, where currently Linux returns much better results than all other platforms, but if you insert an XSync before a layer transaction gets sent over with OMTC, you'll get a much worse score that is more in-line with the performance on other platforms. tsvgr_opacity just records its value once, on mozafterpaint, which I suppose must be sent before all the work is actually flushed to the screen.

From what Avih said, this is unlikely to affect all tests, certainly not the ones that time several iterations (though perhaps the last iteration will record as ending sooner than it should), but certainly affects tsvgr_opacity, and may affect a couple of other tests that don't run multiple iterations.

I don't know whether it's worth fixing the affected tests or not (I don't have spare cycles atm, and I can't say I'm particularly invested either), but as long as people are aware.

When I last looked at the numbers, the tests that I have more confidence in reflecting some value that benefits the user were either unchanged or improved with OMTC on (so scroll, canvas and video performance, for example).
There was a very long equivalent discussion when we enabled OMTC on Windows. 
https://groups.google.com/d/msg/mozilla.dev.platform/7RyyPmvN6ds/9HSjg966NaYJ
Among the points that were made:
A regression with a talos tests that run in ASAP is not equivalent to the regression perceived by the users. OMTC is a big enough architectural change that some talos tests don't actually measure the same thing. So the talos numbers have to be taken with a grain of salt.
It is much more relevant to compare two revisions before the switch to OMTC or two revisions after the switch, than one revision with and one without OMTC.

I am not saying that there are no actual regressions, but the choice of whether to enable OMTC (which is not mine by the way), depends on more than whether talos numbers are above or below our usual threshold. I haven't seen reports of linux being slower with e10s enabled, even though OMTC with e10s is slower than OMTC without e10s.

If we enable OMTC on nightly and aurora but don't let it ride the trains (which we did for OMTC on Windows), the non-OMTC will quickly break (unless we run all linux tests with and without OMTC, but that's very expensive), and we'll end up with bit problems on beta. I am not inclined to doing this again.

On a side-note, users did notice that video playback was smoother when we enabled OMTC on windows.

I would prefer if we have this debate on the dev-platform list rather than bugzilla.
The above link seems to indicate there is a scroll regression after all:
tscroll-ASAP	Ubuntu HW 12.04		-6.89%
tscroll-ASAP	Ubuntu HW 12.04 x64	-3.15%**
TP5 Scroll	Ubuntu HW 12.04 x64	-13.2%
TP5 Scroll	Ubuntu HW 12.04		-15.6%
on the flip side here are the alerts when we backed out:
http://alertmanager.allizom.org:8080/alerts.html?rev=43eb116a884d&table=1  this does give weight to what is valid and not.  All regressions in this link were improvements during the landing
Blocks: 1111396
(In reply to Mats Palmgren (:mats) from comment #49)
> The above link seems to indicate there is a scroll regression after all:
> tscroll-ASAP	Ubuntu HW 12.04		-6.89%
> tscroll-ASAP	Ubuntu HW 12.04 x64	-3.15%**
> TP5 Scroll	Ubuntu HW 12.04 x64	-13.2%
> TP5 Scroll	Ubuntu HW 12.04		-15.6%

Just to make sure, Linux OMTC gets the higher numbers, right? since on these tests (and most others), lower numbers are better.
No. If you load the URL in comment 48 and then click "Toggle View" you'll see
that these are regressions.  I'm assuming red means bad and green means good
like they normally do in Western culture.  The "Hide Improvement" link in
that view seems to support that.
(In reply to Avi Halachmi (:avih) from comment #51)
> Just to make sure, Linux OMTC gets the higher numbers, right?

(In reply to Mats Palmgren (:mats) from comment #52)
> No. ... [shows that it's indeed an OMTC regression]

It's confusing since the talos test results for these tests are lower-is-better, but the alerts manager page shows negative percentages for regressions.

Overall, it's confirmed that with OMTC the test result numbers indeed worse. Thanks.
Target Milestone: --- → mozilla40
never mind, looks like this landed?
tracking-e10s: ? → ---
(In reply to George Wright (:gw280) from comment #54)
> never mind, looks like this landed?

many times, and backed out the same amount.

Currently blocked on a test timeout in linux asan builds, and also we are waiting for the next train.
I asked :nical to hold this landing this until train 40, even if ready.  Let me know if that gets into anybody's way of doing something they've planned.
Blocks: e10s-gfx
No longer blocks: 1111396
Depends on: 1155092
Depends on: 1155252
Disable 789933-1.html on Linux debug for frequent crashes. Re-enabling tracked in bug 1155252.
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1cd5d4643b4
Correct me if I'm wrong but now that it has stuck on central this bug should be resolved, right?
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Keywords: leave-open
Resolution: --- → FIXED
Blocks: 1155649
Release Note Request (optional, but appreciated)
[Why is this notable]:  Impact on linux users
[Suggested wording]: Linux: OMTC enabled by default, which can improve scrolling, graphics, and video playback performance
[Links (documentation, blog post, etc)]: 

Milan or nical, is there a blog post or documentation for this that we can link to for release notes?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
Setting firefox-relnote flag to 40+ as Liz just added this to the release notes on nucleus.
For this switch for Linux?  I don't think there is a post.
Flags: needinfo?(milan)
(In reply to Liz Henry (:lizzard) from comment #62)
> Milan or nical, is there a blog post or documentation for this that we can
> link to for release notes?

(In reply to Milan Sreckovic [:milan] from comment #64)
> For this switch for Linux?  I don't think there is a post.

I just wrote one and published it at: https://mozillagfx.wordpress.com/2015/05/19/off-main-thread-compositing-on-linux/
Flags: needinfo?(nical.bugzilla)
No longer blocks: 1155649
Depends on: 1155649
Depends on: 1181006
Depends on: 1193570
Depends on: 1195152
Depends on: 1194397
Depends on: 1196494
Depends on: 1193520
Depends on: 1217926
Depends on: 1273701
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: