Closed Bug 1155649 Opened 9 years ago Closed 9 years ago

4-15% Linux* tcanvasmark/cart/tpaint/tsvgx/tp5o_scroll/tresize/tart/tp5o/ts_paint/glterrain/tscrollx regression on Mozilla-Inbound-Non-PGO (v. 40) on April 16, 2015 from push 662e8039bc4c

Categories

(Testing :: Talos, defect)

x86
Linux
defect
Not set
normal

Tracking

(firefox40 fixed)

RESOLVED WONTFIX
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: vaibhav1994, Assigned: nical)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Talos has detected a Firefox performance regression from your commit 662e8039bc4c in bug 994541.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=662e8039bc4c&showAll=1

On the page above you can see a Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux -u none -t chromez  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tcanvasmark

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Chris, can you look at these regressions and find out the cause?
Flags: needinfo?(chrislord.net)
Passing this on to nical.
Flags: needinfo?(chrislord.net) → needinfo?(nical.bugzilla)
We expect some regressions from enabling OMTC. I'll investigate this and see if there are any obvious issues or low-hanging fruit, but chances are we'll have to live with this until some of the other OMTC-related projects arrive on Linux.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Bug 1144906 will help make canvas transactions asynchronous (the current sync transactions hurt performance).
Depends on: 1144906
This feature has made tsvg opacity on Linux 32 and 64 a useless test:
http://graphs.mozilla.org/graph.html#tests=[[225,131,33],[225,131,35]]&sel=1429089183957,1429693983957&displayrange=7&datatype=geo

I would like to either:
* get a fix for this ASAP
* back it out
* or coordinate with all interested parties and turn the test off for linux

Nical/Chris, please ensure momentum continues on one of the 3 above options or I will look to disable/backout this feature by this upcoming weekend.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(chrislord.net)
(In reply to Joel Maher (:jmaher) from comment #5)
> this feature has made tsvg opacity on linux 32 add 64 a useless test:

Just to make it clear, by "useless" Joel means that the noise level of this test has increased dramatically, from a stddev of ~0.8% to a stddev of ~11%, which makes this test less useful. The lower results within the noise level are similar to before OMTC was enabled, and the noise is towards higher values.

> I would like to either:
> * get a fix for this ASAP

This is preferable, but I don't think there's a big urgency to fix it. We should do it correctly, by first understanding what made the noise level go up so much.

nical/cwiiis, your analysis would be appreciated here.


> * back it out

Preferably we don't do this. OMTC has already been the default for a long time on Windows and OS X, and has been delayed long enough on Linux. We should really keep it on and handle whatever needs handling.


> * or coordinate with all interested parties and turn the test off for linux

Probably not a good option. If you (Joel) want to ignore this test for now because it's noisy, that's fine; however, the alert system typically takes the noise level into account and won't generate alerts for changes below it, so I don't think this will generate false alerts. But if it does, I think it's reasonable for you to ignore those if you think they're not conclusive.
I don't understand everything that is going on with x11/xrender, but tsvg gives xrender a ton of drawing operations to do on the x server. With OMTC on Linux we are submitting work for xrender both from the content thread and the compositor thread, and we have to wait for the x server in certain places. This makes things very timing-dependent. It's possible that things could get a bit more reliable with some carefully placed XFlush calls to give the x server hints about when it's a good time to stop queueing drawing commands.

Long term we'll get rid of xrender and this will go away along with a whole bunch of other issues (unfortunately there are lots of dependencies on it).
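
For illustration only (this is not Gecko code, and the window/GC setup is just a stand-in), a minimal Xlib sketch of the kind of end-of-frame hint described above: after issuing a frame's worth of drawing requests, XFlush pushes them to the server instead of letting them sit in Xlib's output buffer until it fills up or a blocking call forces a flush.

#include <X11/Xlib.h>
#include <unistd.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 300,
                                   0, 0, WhitePixel(dpy, DefaultScreen(dpy)));
  XMapWindow(dpy, win);
  GC gc = XCreateGC(dpy, win, 0, nullptr);

  for (int frame = 0; frame < 60; ++frame) {
    // A frame's worth of drawing requests, queued in Xlib's output buffer.
    for (int i = 0; i < 100; ++i) {
      XFillRectangle(dpy, win, gc, (i * 7) % 280, (i * 11) % 280, 20, 20);
    }
    // End-of-frame hint: flush the queued requests so the server can start
    // processing them now rather than whenever the buffer happens to fill.
    XFlush(dpy);
    usleep(16000);  // crude ~60 fps pacing
  }

  XCloseDisplay(dpy);
  return 0;
}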
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(chrislord.net)
I'll leave this to nical, I haven't done any gfx work in a long time now.

FWIW, I'd guess that this noise is because of the asynchronicity of X. We don't run tests with synchronous X calls (which is probably good because that wouldn't reflect real-world performance), but we also don't take into account that X is asynchronous when collecting results (for some tests, at least). If some other part of Gecko was calling XSync and it just happened to line up before or after recording timings, this could cause wildly differing results. Even regardless of calls to XSync, I think it's still possible for similar async-timing-related things to happen.

Also FWIW, this guess may be entirely inaccurate and this may be indicative of real issues in our code - I'm not in a good position to know either way right now :)
(In reply to Avi Halachmi (:avih) from comment #6)
> (In reply to Joel Maher (:jmaher) from comment #5)
> > this feature has made tsvg opacity on linux 32 add 64 a useless test:
> 
> Just to make it clear, by "useless" Joel means that the noise level of this
> test has increased dramatically, from stddev of ~0.8% to stddev of ~11%,
> which makes this test less useful. The lower results within the noise level
> are similar to before OMTC was enabled, and the the noise is towards higher
> values.

For reference, this test has a stddev of ~1% with OMTC enabled on Windows and OS X 10.8, and ~7% on OS X 10.10.

http://graphs.mozilla.org/graph.html#tests=[[225,131,35],[225,131,33],[225,131,25],[225,131,31],[225,131,37],[225,63,55],[225,63,24]]&sel=1427103023995,1429695023995&displayrange=30&datatype=geo

(In reply to Nicolas Silva [:nical] from comment #7)
> ...
> Long term we'll get rid of xrender and this will go away along with a whole
> bunch of other issues (lots of dependencies to that unfortunately).

How long term is that? Weeks? Months? Do you expect it to make things more consistent timing-wise?

Do you think that the current noise level with OMTC is an indication of an actual issue? or is it just an unfortunate side effect of implementation details?

Also, AFAIK, this test is supposed to test/reflect some optimizations in rendering, could it be that with OMTC on linux it ends up "unoptimized" (whatever that means), and that's the real issue?


(In reply to Chris Lord [:cwiiis] from comment #8)
> ... If some other part of Gecko was calling XSync and it just happened
> to line up before or after recording timings, this could cause wildly
> differing results ...

The only timing here is supposedly that of the MOZAFTERPAINT event after the load event.
Flags: needinfo?(nical.bugzilla)
(In reply to Avi Halachmi (:avih) from comment #9)
> (In reply to Avi Halachmi (:avih) from comment #6)
> > (In reply to Joel Maher (:jmaher) from comment #5)
> > > this feature has made tsvg opacity on linux 32 add 64 a useless test:
> > 
> > Just to make it clear, by "useless" Joel means that the noise level of this
> > test has increased dramatically, from stddev of ~0.8% to stddev of ~11%,
> > which makes this test less useful. The lower results within the noise level
> > are similar to before OMTC was enabled, and the the noise is towards higher
> > values.
> 
> For reference, this test has stddev of ~1% with OMTC enabled on Windows and
> OSX 10.8,, and ~7% on OS X 10.10.
> 
> http://graphs.mozilla.org/graph.html#tests=[[225,131,35],[225,131,33],[225,
> 131,25],[225,131,31],[225,131,37],[225,63,55],[225,63,24]]&sel=1427103023995,
> 1429695023995&displayrange=30&datatype=geo
> 
> (In reply to Nicolas Silva [:nical] from comment #7)
> > ...
> > Long term we'll get rid of xrender and this will go away along with a whole
> > bunch of other issues (lots of dependencies to that unfortunately).
> 
> How long term is that? weeks? months?

I wish I knew the answer to this question. It's hard to get Linux stuff prioritized enough, and this is the kind of project where we'll probably see roadblocks arise as we go. Definitely not something to count in weeks.

> Do you expect it to make things more
> consistent timing-wise?

Yes because painting/compositing will be performed from gecko's threads with code that we control, rather than asynchronously from the x server in code that we have no control over.

> 
> Do you think that the current noise level with OMTC is an indication of an
> actual issue? or is it just an unfortunate side effect of implementation
> details?

Tricky question. I consider that we'd be better off without xrender, so does that make our current usage of xrender an implementation issue? There are certainly ways to improve things even with our current dependencies on xrender, but it's hard to tell, since you will get different behaviors with different versions of x11 and graphics drivers. For instance, I couldn't reproduce locally any of the linux OMTC issues that I had to deal with during the last quarter, but some of them were pretty consistent on try. We haven't used APIs like xrender from multiple threads before and few people understand the innards of this, so we are also learning things as we go.

> 
> Also, AFAIK, this test is supposed to test/reflect some optimizations in
> rendering, could it be that with OMTC on linux it ends up "unoptimized"
> (whatever that means), and that's the real issue?
> 
> (In reply to Chris Lord [:cwiiis] from comment #8)
> > ... If some other part of Gecko was calling XSync and it just happened
> > to line up before or after recording timings, this could cause wildly
> > differing results ...
> 
> The only timing here is supposedly that of the MOZAFTERPAINT event after the
> load event.

So I looked briefly at what triggers NS_AFTERPAINT, and this happens after the compositor is done compositing and has sent a message back to the main thread to notify that composition happened. There might be other paths to trigger the event, but in the OMTC world, this means that the content thread needs to paint, send the new layers to the compositor *, the compositor performs composition, the compositor sends a notification to the content thread that it has painted *, and the event is fired.
Each time there is a "*" in the sentence above, the message sits in an event loop until the receiving thread is done doing its thing, so the time you spend there is completely dependent on whether the receiving thread is already busy, whether it already has a lot of items in its event loop, etc.
Without OMTC you don't have these two event loop steps, so you don't end up measuring whatever other things the threads might be doing.
Sorry for the long (and perhaps confusing) answer, but in short: We don't measure the same thing with and without OMTC, and with OMTC, many more things happen before we fire AFTERPAINT, and depending on timing, things that we don't want to measure can sneak in.

So when it comes to getting "unoptimized", I will look into whether there are improvements to be made in helping the x server schedule drawing commands in a more reliable way, but the time we spend painting should not change that much. What changes is that we now measure something that involves more than painting, plus lots of timing-sensitive things that we aren't very interested in.

If rendering encompasses painting+compositing, then we have to live with the fact that we don't have good ways to measure only the thing that we care about. If by rendering we mean painting, then we can look into other ways to measure it, but it's tricky with asynchronous painting backends such as xrender or D2D, because we'd have to introduce artificial flushes and sync waits to be sure that we measure only paintings, which will give us a totally different scheduling than how we want things to happen in the wild.
Flags: needinfo?(nical.bugzilla)
Thanks for the detailed info.

The tests can only provide us _some_ indication when things change, and "good" test results are not a goal on their own, except when we're relatively confident that the results represent a thing which users would care about.

If underlying things change "too" much, then it's possible that the test is no longer good or that we could improve it to better suit the new underlying code. If that's the case, it's something we need to know, and if possible, with some recommendation on what kind of change to the test would better reflect what we want to test.

So I'd think the very first question we'd like answered is: does the change in the pattern of this test's results indicate a real issue which users would/could notice?

If it does, then we'd need to consider how bad it is (regardless of any test results) and what we should do about it.

User experience is the ultimate goal here, and the test results/noise is only the tool we use to help us with the former, so it's not a bad idea to also try and keep this tool useful.

So which is it?
(In reply to Avi Halachmi (:avih) from comment #11)
> Thanks for the detailed info.
> 
> The tests can only provide us _some_ indication when things change, and
> "good" test results are not a goal on their own, except when we're
> relatively confident that the results represent a thing which users would
> care about.
> 
> If underlying things change "too" much, then it's possible that the test is
> no longer good or that we could improve it to better suit to the new
> underlying code. If that's the case, it's something we need to know, and if
> possible, with some recommendation on what kind of change to the test would
> reflect better what we want to test.
> 
> So I'd think the very first question we'd like answered is: Does the change
> in pattern at the results of this test indicate a real issues which users
> would/could notice?
> 
> If it does, then we'd need to consider how bad is it (regardless of any test
> results) and what should we do about it.
> 
> User experience is the ultimate goal here, and the test results/noise is
> only the tool we use to help us with the former, so it's not a bad idea to
> also try and keep this tool useful.

I will hopefully have good answers to this after I have spent more time investigating. It depends on how busy the main thread is (i.e. how much things like sitting in the main thread's event loop after we have composited but before we send AFTERPAINT contribute to the noise, and the cost of the 2 thread context switches that come with each IPDL message).
Hopefully the XFlush tweaks I am currently experimenting with will help reduce some of the noise. Profiling svgr on try showed that most of the time is spent waiting for the x server (to which we gave loads and loads of drawing commands). As far as I know, before OMTC we didn't wait for the x server to finish doing its work when firing AFTERPAINT, and we have to with OMTC, so the noise, while it may be the symptom of something bad for the user, doesn't mean it is actually a regression from the user's perspective.
Beyond the direct relation between the current tsvgr results and user experience, I completely agree that noisy tests are less useful and that useful tests are important.

> 
> So which is it?

Sorry, I have no satisfactory answer other than that I am working on it.
(In reply to Nicolas Silva [:nical] from comment #12)
> ... while it may be the symptom of something bad for the
> user, doesn't mean it is actually a regression from the user's perspective.

Indeed. That was my question - if it is or isn't something users would care about.

> > So which is it?
> 
> Sorry, I have no satisfactory answer other than that I am working on it.

This is actually quite satisfying ;)

Please post thoughts/conclusions to this thread as you make progress. Thanks.
I ran svgr_opacity on linux 64 in the following configurations:

A) calling XFlush after painting each layer (in the hope that the xserver will schedule paints asap):
stdev: 12.21 min: 271.84 max: 319.15 avg: 284.63
stdev/avg*100: 4.29%

B) disabling xrender whenever possible:
stdev: 6.95 min: 497.82 max: 517.39 avg: 510.14
stdev/avg*100: 1.36%

C) using x in sync mode [1]
stdev: 39.39 min: 1224.93 max: 1344.23 avg: 1293.97
stdev/avg*100: 3.04%


These numbers confirm that the noise is coming from our interaction with xrender.

Disabling xrender is what we want to do in the long run, but what this configuration currently does is use xrender for native theme drawing, then read back, and draw the rest without xrender. The read-back is pretty expensive, so we can't ship this without first getting rid of our xrender dependency.

XFlush regresses the average score a little, improves the worst case, and makes the best case worse. Note that these measurements are on a test which has a lot more drawing operations than the average web page, so these scores don't necessarily reflect the impact on user experience.
All of these configurations improve the standard deviation.

[1]: http://tronche.com/gui/x/xlib/event-handling/protocol-errors/synchronization.html It's a debug option; we certainly don't want our users to run with it and it doesn't make sense to measure performance with it, but I was curious.
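
For reference, configuration C above corresponds to Xlib's synchronous (debug) mode; a hedged sketch of how it is switched on (illustrative only, not part of the Talos harness):

#include <X11/Xlib.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  // Configuration C: every request now waits for a server round trip before
  // returning, which is why the averages above are several times slower.
  // This is a debugging aid only, never a shipping configuration.
  XSynchronize(dpy, True);

  // ... run the drawing workload here ...

  XCloseDisplay(dpy);
  return 0;
}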
This reduces the noise and seems to marginally affect the average talos score. That's also the least risky solution I can think of right now.
Attachment #8598111 - Flags: review?(jmuizelaar)
Attached patch better patch.Splinter Review
The previous patch would not build on all platforms.
Attachment #8598111 - Attachment is obsolete: true
Attachment #8598111 - Flags: review?(jmuizelaar)
Attachment #8598116 - Flags: review?(jmuizelaar)
(In reply to Nicolas Silva [:nical] from comment #16)
> Created attachment 8598116 [details] [diff] [review]
> better patch.
> 
> The previous patch would not build on all platforms.

Could we limit things to do this only when testing? If it's not necessary, we shouldn't be flushing the X command queue.
(In reply to Chris Lord [:cwiiis] from comment #17)
> Could we limit things to do this only when testing? If it's not necessary,
> we shouldn't be flushing the X command queue.

I am not extremely fond of testing/measuring something that is different from what we ship to users. I haven't found a lot of documentation on the web about XFlush so I may be missing some of the implications, but it seems like a good thing to make sure the X server starts processing the drawing commands as soon as the transaction ends on the producer side, and to avoid accumulating commands from several frames. If you have more insights I am interested.
(In reply to Nicolas Silva [:nical] from comment #18)
> (In reply to Chris Lord [:cwiiis] from comment #17)
> > Could we limit things to do this only when testing? If it's not necessary,
> > we shouldn't be flushing the X command queue.
> 
> I am not extremely found of testing/measuring something that is different
> from what we ship to users. I haven't found a lot of documentation on the
> web about XFlush so I may miss some of the implications, but it seems like a
> good thing to make sure the X server starts is processing the drawing
> commands as soon as the transaction ends on the producer side and avoid
> accumulating commands from several frames. If you have more insights I am
> interested.

When I did more Linux-y stuff, it was just generally not recommended. It stops possible optimisation that happens at the X-server level, and you can't know what the server is doing wrt all the other clients running, etc. Calling flush/sync can have bad/unexpected performance implications basically, and you shouldn't call it unless you have very specific reasons (e.g. multi-process synchronisation).

We're already testing something different to what the user sees because we aren't running in a desktop production environment, we probably have a very different browser work-load, etc. What we're concerned with, I think, with our tests is getting consistent results, within which changes are meaningful. Calling XFlush will help us with this in an environment where there are no/few other clients, but may adversely affect real-world performance in real terms.

We're basically adding a work-around that harms real-world performance because we want to have more consistent test numbers in a test environment. This seems like a bad idea to me.
(In reply to Chris Lord [:cwiiis] from comment #19)
> 
> We're basically adding a work-around that harms real-world performance
> because we want to have more consistent test numbers in a test environment.
> This seems like a bad idea to me.

I don't know for sure that this is true, though.
I mean, if you are certain that this applies to multi-threaded rendering, then I am with you, but the excessive-flushing story usually applies to single-threaded apps.

In a typical application you generate drawing commands and present/composite on the same thread, so the x server implicitly has enough information about what needs to be drawn for this frame when you present, because you present before you start submitting commands for the next frame. With OMTC you generate drawing commands from one thread and present on another thread, so if you don't give the x server hints about which commands belong to which frame, it's hard for it to be as smart as it is in the single-threaded setup. When you present your frame to the window manager, it may have accumulated drawing commands from your last frame plus part of the next frame, and you end up splitting the next frame in two, possibly waiting for commands that belong to the next frame, waiting longer on the compositor thread, and getting into a vicious circle where the compositor can't keep up because it is processing several frames at a time and the content thread over-produces. We had a similar story with WebGL on b2g, for instance. The rule of thumb about excessive flushing is usually the same on GL drivers as with xrender as far as I can tell (not super far), especially on mobile, but multi-threaded producer-consumer setups can put the driver (and I suspect the x server in this case) in a situation where its heuristics, which have been designed for single-threaded rendering, don't play very well.

I would prefer taking the same code paths on talos tests and release (and eventually back the release part out if we find out that the real-world performance actually regresses) unless we know for sure that flushing at the end of frames makes things worse in a multi-threaded setup.
I don't think we know for sure at the moment, so I'd rather try and see, than add the mental overhead of testing against something different than what we ship without being certain that we have to.
You make a good point, and I realise I actually misread the patch and thought it was flushing more than it is - You've convinced me :)
Comment on attachment 8598116 [details] [diff] [review]
better patch.

Review of attachment 8598116 [details] [diff] [review]:
-----------------------------------------------------------------

It would be great if you could add your justification for this as a comment in ShadowLayers.cpp.
Attachment #8598116 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #22)
> It would be great if you could add your justification for this as commen in
> ShadowLayers.cpp

Added:

// With some platforms, telling the drawing backend that there will be no more
// drawing for this frame helps with preventing command queues from spanning
// across multiple frames.
https://hg.mozilla.org/mozilla-central/rev/096da3d7f850
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
The patch gave us a pretty neat improvement on tscroll and tp5 scroll.

(Improvement) Mozilla-Inbound-Non-PGO - TP5 Scroll - Ubuntu HW 12.04 x64 - 16.4%
(Improvement) Mozilla-Inbound-Non-PGO - TP5 Scroll - Ubuntu HW 12.04 - 19.8%
(Improvement) Mozilla-Inbound-Non-PGO - tscroll-ASAP - Ubuntu HW 12.04 x64 - 4.49%
(Improvement) Mozilla-Inbound-Non-PGO - tscroll-ASAP - Ubuntu HW 12.04 - 7.61%

tsvgr opacity looks like it got worse on average but is much less noisy (I haven't received any notification email for it).

With these results I think it makes sense to keep the XFlush at the end of frames.

Canvasmark (which this bug was originally about) doesn't seem to have been affected, so reopening.

There are improvements that can be done with canvas+OMTC to avoid some of the overhead that was added with OMTC but they are pretty far down the priority list at the moment. I'll file bugs blocking this one and the work will happen there whenever it happens.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1161636
:nical, your patch has shown improvement on most of the tests, but is causing 65.4% regression on SVG, Opacity Row Major test: http://alertmanager.allizom.org:8080/alerts.html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0
Flags: needinfo?(nical.bugzilla)
(In reply to Vaibhav (:vaibhav1994) from comment #27)
> :nical, your patch has shown improvement on most of the tests, but is
> causing 65.4% regression on SVG, Opacity Row Major test:
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0

Just in case it needs reminding (:nical), the goal is to end up with the best user experience and not with the "best" test numbers.

If you think the change in this test's numbers should be acceptable (either because the change in numbers doesn't represent something meaningful enough, or because that's the best we can do, or because improvements in other areas outweigh this regression, etc.), just put the reasons on record so that interested parties can comment on them, and if there's a consensus they're valid, that would be fine.
SVG Opacity was definitely broken before - assuming the same things are happening as when I tested similar changes a while ago, the Linux numbers are now probably more in line with the Windows and Mac numbers. SVG Opacity, if I remember correctly from talking to Avi about it, is a test that only gets run once, and previously the timing would have been taken before X actually got to executing the commands (which explains the huge decrease).

To be clear, this decrease is (likely) due to a false result now being replaced by a real one.

I'll leave nical to clear this up though, for all I know I could be talking rubbish :)
good point ;cwiiis!

Looking at a 90 day window:
http://graphs.mozilla.org/graph.html#tests=[[225,132,35],[225,63,35],[225,131,35]]&sel=none&displayrange=90&datatype=geo

we are a bit higher, but the noise has been reduced.  I would gladly take reduced noise any day.  SVG Opacity is one of those tests where we need to be aware of the regressions, as sometimes random things (like networking) can affect it.  Usually we can just verify we are doing the right thing in our patch and document it.
(In reply to Joel Maher (:jmaher) from comment #30)
> SVG Opacity is one of those tests that we need to be
> aware of the regressions as sometimes random things (like networking) could
> affect it.

They're not random; they're very concrete and well documented*, and regardless, this isn't one of those cases. This one is actually exactly what this test wants to test: SVG layers optimizations.

Whether it succeeds or not (to test that thing) is another question, and :nical seems to be on top of it.

* https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg-opacity
(In reply to Vaibhav (:vaibhav1994) from comment #27)
> :nical, your patch has shown improvement on most of the tests, but is
> causing 65.4% regression on SVG, Opacity Row Major test:
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0

I'll look into whether there are obvious issues showing up in profiles, but as it is, I think we should keep this configuration even with the rather impressive regression on tsvgr opacity, for the reasons Chris explained.
Flags: needinfo?(nical.bugzilla)
I appreciate all the dialog and context here!
this is now on Aurora, so far I see a linux32 svg-asap regression of 5.93%
http://alertmanager.allizom.org:8080/alerts.html?rev=e2ce1aac996e&showAll=1&testIndex=0&platIndex=0

I will update this bug if I see others.
(In reply to Joel Maher (:jmaher) from comment #34)
> this is now on Aurora, so far I see a linux32 svg-asap regression of 5.93%
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=e2ce1aac996e&showAll=1&testIndex=0&platIndex=0
> 
It is also on linux32 and linux64 svg-opacity, svg-asap, tab animation, custom animation, canvasmark as can be seen in the link.
(In reply to Chris Lord [:cwiiis] from comment #29)
> SVG Opacity was definitely broken before - assuming the same things are
> happening as when I tested similar changes a while ago, the Linux numbers
> are now probably more in-line with the Windows and Mac numbers. SVG Opacity,
> if I remember correctly when talking to Avi about it, is a test that only
> gets run once, and previously, the timing would have been taken before X
> actually got to executing the commands (which explains the huge decrease).

For tests that do a lot of drawing, it would be helpful to trigger an artificial
wait for the server so that the test measures the time to the completion of
drawing on the server.  This would be most indicative of user experience.

See attachment 460193 [details] [diff] [review] for how this was done for tscroll.
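
A sketch of that kind of artificial wait (illustrative only, a standalone Xlib program rather than the tscroll harness from attachment 460193): XSync blocks until the server has processed all outstanding requests, so the timestamp taken afterwards includes server-side drawing time.

#include <X11/Xlib.h>
#include <stdio.h>
#include <time.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 300,
                                   0, 0, WhitePixel(dpy, DefaultScreen(dpy)));
  XMapWindow(dpy, win);
  GC gc = XCreateGC(dpy, win, 0, nullptr);

  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < 10000; ++i) {
    XFillRectangle(dpy, win, gc, (i * 7) % 280, (i * 11) % 280, 20, 20);
  }
  // The artificial wait: block until the server has finished executing every
  // request issued so far, so the measurement covers completion of drawing.
  XSync(dpy, False);
  clock_gettime(CLOCK_MONOTONIC, &end);

  double ms = (end.tv_sec - start.tv_sec) * 1e3 +
              (end.tv_nsec - start.tv_nsec) / 1e6;
  printf("drawing + server completion: %.2f ms\n", ms);

  XCloseDisplay(dpy);
  return 0;
}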
Blocks: 994541
No longer depends on: 994541
Keywords: regression
Depends on: 1181006
(In reply to Nicolas Silva [:nical] from comment #12)
> Profiling svgr on try showed that most of the
> time is spent waiting for the x server (to which we gave loads and loads of
> drawing commands). As far as I know before OMTC we didn't wait for the x
> server to finish doing its work when firing AFTERPAINT and we have to with
> OMTC so the noise, while it may be the symptom of something bad for the
> user, doesn't mean it is actually a regression from the user's perspective.

I'm not sure what waiting for the X server is happening here, but having two
threads should not make this necessary.

Anything that causes us to be waiting in _XReply for the server means that the
advantages of the async pipeline have been lost, and would affect user experience.
Let's close this now. We'll keep investigating performance and look into removing sync round trips to the x server, but that'll be for the sake of improving things rather than to fix this regression, which we have been shipping with for a while now.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX