Closed Bug 1155649 Opened 9 years ago Closed 9 years ago

4-15% Linux* tcanvasmark/cart/tpaint/tsvgx/tp5o_scroll/tresize/tart/tp5o/ts_paint/glterrain/tscrollx regression on Mozilla-Inbound-Non-PGO (v. 40) on April 16, 2015 from push 662e8039bc4c

Categories

(Testing :: Talos, defect)

x86
Linux
defect
Not set
normal

Tracking

(firefox40 fixed)

RESOLVED WONTFIX
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: vaibhav1994, Assigned: nical)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Talos has detected a Firefox performance regression from your commit 662e8039bc4c in bug 994541.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=662e8039bc4c&showAll=1

On the page above you can see a Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux -u none -t chromez  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tcanvasmark

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Chris, can you look at these regressions and find out the cause?
Flags: needinfo?(chrislord.net)
Passing this on to nical.
Flags: needinfo?(chrislord.net) → needinfo?(nical.bugzilla)
We expect some regressions from enabling OMTC. I'll investigate this and see if there are any obvious issues or low-hanging fruit, but chances are we'll have to live with this until some of the other OMTC-related projects arrive on Linux.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Bug 1144906 will help make canvas transactions asynchronous (the current sync transactions hurt performance).
Depends on: 1144906
This feature has made tsvg opacity on Linux 32 and 64 a useless test:
http://graphs.mozilla.org/graph.html#tests=[[225,131,33],[225,131,35]]&sel=1429089183957,1429693983957&displayrange=7&datatype=geo

I would like to either:
* get a fix for this ASAP
* back it out
* or coordinate with all interested parties and turn the test off for linux

Nical/Chris, please ensure momentum continues on one of the 3 above options or I will look to disable/backout this feature by this upcoming weekend.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(chrislord.net)
(In reply to Joel Maher (:jmaher) from comment #5)
> this feature has made tsvg opacity on linux 32 add 64 a useless test:

Just to make it clear, by "useless" Joel means that the noise level of this test has increased dramatically, from a stddev of ~0.8% to a stddev of ~11%, which makes this test less useful. The lower results within the noise level are similar to before OMTC was enabled, and the noise is towards higher values.

> I would like to either:
> * get a fix for this ASAP

This is preferable, but I don't think there's a big urgency to fix it. We should do it correctly, by first understanding what made the noise level go up so much.

nical/cwiiis, your analysis would be appreciated here.


> * back it out

Preferably we don't do this. OMTC has already been the default for a long time on Windows and OS X, and has been delayed long enough on Linux. We should really keep it on and handle whatever needs handling.


> * or coordinate with all interested parties and turn the test off for linux

Probably not a good option. If you (Joel) want to ignore this test for now because it's noisy, that's fine; however, the alert system typically takes the noise level into account and won't generate alerts for changes below it, so I don't think this will generate false alerts. But if it does, I think it's reasonable for you to ignore those if you think they're not conclusive.
I don't understand everything that is going on with x11/xrender, but tsvg gives xrender a ton of drawing operations to do on the x server. With OMTC on Linux we are submitting work for xrender both from the content thread and the compositor thread, and we have to wait for the x server in certain places. This makes things very timing-dependent. It's possible that things could get a bit more reliable with some carefully placed XFlush calls to give the x server hints about when it's a good time to stop queueing drawing commands.

Long term we'll get rid of xrender and this will go away along with a whole bunch of other issues (unfortunately there are lots of dependencies on it).
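
For illustration only (this is not Gecko code, and the window/GC setup is just a stand-in), a minimal Xlib sketch of the kind of end-of-frame hint described above: after issuing a frame's worth of drawing requests, XFlush pushes them to the server instead of letting them sit in Xlib's output buffer until it fills up or a blocking call forces a flush.

#include <X11/Xlib.h>
#include <unistd.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 300,
                                   0, 0, WhitePixel(dpy, DefaultScreen(dpy)));
  XMapWindow(dpy, win);
  GC gc = XCreateGC(dpy, win, 0, nullptr);

  for (int frame = 0; frame < 60; ++frame) {
    // A frame's worth of drawing requests, queued in Xlib's output buffer.
    for (int i = 0; i < 100; ++i) {
      XFillRectangle(dpy, win, gc, (i * 7) % 280, (i * 11) % 280, 20, 20);
    }
    // End-of-frame hint: flush the queued requests so the server can start
    // processing them now rather than whenever the buffer happens to fill.
    XFlush(dpy);
    usleep(16000);  // crude ~60 fps pacing
  }

  XCloseDisplay(dpy);
  return 0;
}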
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(chrislord.net)
I'll leave this to nical, I haven't done any gfx work in a long time now.

FWIW, I'd guess that this noise is because of the asynchronicity of X. We don't run tests with synchronous X calls (which is probably good because that wouldn't reflect real-world performance), but we also don't take into account that X is asynchronous when collecting results (for some tests, at least). If some other part of Gecko was calling XSync and it just happened to line up before or after recording timings, this could cause wildly differing results. Even regardless of calls to XSync, I think it's still possible for similar async-timing-related things to happen.

Also FWIW, this guess may be entirely inaccurate and this may be indicative of real issues in our code - I'm not in a good position to know either way right now :)
(In reply to Avi Halachmi (:avih) from comment #6)
> (In reply to Joel Maher (:jmaher) from comment #5)
> > this feature has made tsvg opacity on linux 32 add 64 a useless test:
> 
> Just to make it clear, by "useless" Joel means that the noise level of this
> test has increased dramatically, from stddev of ~0.8% to stddev of ~11%,
> which makes this test less useful. The lower results within the noise level
> are similar to before OMTC was enabled, and the the noise is towards higher
> values.

For reference, this test has a stddev of ~1% with OMTC enabled on Windows and OS X 10.8, and ~7% on OS X 10.10.

http://graphs.mozilla.org/graph.html#tests=[[225,131,35],[225,131,33],[225,131,25],[225,131,31],[225,131,37],[225,63,55],[225,63,24]]&sel=1427103023995,1429695023995&displayrange=30&datatype=geo

(In reply to Nicolas Silva [:nical] from comment #7)
> ...
> Long term we'll get rid of xrender and this will go away along with a whole
> bunch of other issues (lots of dependencies to that unfortunately).

How long term is that? Weeks? Months? Do you expect it to make things more consistent timing-wise?

Do you think that the current noise level with OMTC is an indication of an actual issue? or is it just an unfortunate side effect of implementation details?

Also, AFAIK, this test is supposed to test/reflect some optimizations in rendering, could it be that with OMTC on linux it ends up "unoptimized" (whatever that means), and that's the real issue?


(In reply to Chris Lord [:cwiiis] from comment #8)
> ... If some other part of Gecko was calling XSync and it just happened
> to line up before or after recording timings, this could cause wildly
> differing results ...

The only timing here is supposedly that of the MOZAFTERPAINT event after the load event.
Flags: needinfo?(nical.bugzilla)
(In reply to Avi Halachmi (:avih) from comment #9)
> (In reply to Avi Halachmi (:avih) from comment #6)
> > (In reply to Joel Maher (:jmaher) from comment #5)
> > > this feature has made tsvg opacity on linux 32 add 64 a useless test:
> > 
> > Just to make it clear, by "useless" Joel means that the noise level of this
> > test has increased dramatically, from stddev of ~0.8% to stddev of ~11%,
> > which makes this test less useful. The lower results within the noise level
> > are similar to before OMTC was enabled, and the the noise is towards higher
> > values.
> 
> For reference, this test has stddev of ~1% with OMTC enabled on Windows and
> OSX 10.8,, and ~7% on OS X 10.10.
> 
> http://graphs.mozilla.org/graph.html#tests=[[225,131,35],[225,131,33],[225,
> 131,25],[225,131,31],[225,131,37],[225,63,55],[225,63,24]]&sel=1427103023995,
> 1429695023995&displayrange=30&datatype=geo
> 
> (In reply to Nicolas Silva [:nical] from comment #7)
> > ...
> > Long term we'll get rid of xrender and this will go away along with a whole
> > bunch of other issues (lots of dependencies to that unfortunately).
> 
> How long term is that? weeks? months?

I wish I knew the answer to this question. It's hard to get Linux stuff prioritized enough, and this is the kind of project where we'll probably see roadblocks arise as we go. Definitely not something to count in weeks.

> Do you expect it to make things more
> consistent timing-wise?

Yes because painting/compositing will be performed from gecko's threads with code that we control, rather than asynchronously from the x server in code that we have no control over.

> 
> Do you think that the current noise level with OMTC is an indication of an
> actual issue? or is it just an unfortunate side effect of implementation
> details?

Tricky question. I consider that we'd be better off without xrender, so does that make our current usage of xrender an implementation issue? There are certainly ways to improve things even with our current dependencies on xrender, but it's hard to tell, since you will get different behaviors with different versions of x11 and graphics drivers. For instance, I couldn't reproduce locally any of the linux OMTC issues that I had to deal with during the last quarter, but some of them were pretty consistent on try. We haven't used APIs like xrender from multiple threads before and few people understand the innards of this, so we are also learning things as we go.

> 
> Also, AFAIK, this test is supposed to test/reflect some optimizations in
> rendering, could it be that with OMTC on linux it ends up "unoptimized"
> (whatever that means), and that's the real issue?
> 
> (In reply to Chris Lord [:cwiiis] from comment #8)
> > ... If some other part of Gecko was calling XSync and it just happened
> > to line up before or after recording timings, this could cause wildly
> > differing results ...
> 
> The only timing here is supposedly that of the MOZAFTERPAINT event after the
> load event.

So I looked briefly at what triggers NS_AFTERPAINT, and this happens after the compositor is done compositing and has sent a message back to the main thread to notify that composition happened. There might be other paths to trigger the event, but in the OMTC world, this means that the content thread needs to paint, send the new layers to the compositor *, the compositor performs composition, the compositor sends a notification to the content thread that it has painted *, and the event is fired.
Each time there is a "*" in the sentence above, the message sits in an event loop until the receiving thread is done doing its thing, so the time you spend there is completely dependent on whether the receiving thread is already busy, whether it already has a lot of items in its event loop, etc.
Without OMTC you don't have these two event loop steps, so you don't end up measuring whatever other things the threads might be doing.
Sorry for the long (and perhaps confusing) answer, but in short: We don't measure the same thing with and without OMTC, and with OMTC, many more things happen before we fire AFTERPAINT, and depending on timing, things that we don't want to measure can sneak in.

So when it comes to getting "unoptimized", I will look into whether there are improvements to be made in helping the x server schedule drawing commands in a more reliable way, but the time we spend painting should not change that much. What changes is that we now measure something that involves more than painting, plus lots of timing-sensitive things that we aren't very interested in.

If rendering encompasses painting+compositing, then we have to live with the fact that we don't have good ways to measure only the thing that we care about. If by rendering we mean painting, then we can look into other ways to measure it, but it's tricky with asynchronous painting backends such as xrender or D2D, because we'd have to introduce artificial flushes and sync waits to be sure that we measure only paintings, which will give us a totally different scheduling than how we want things to happen in the wild.
Flags: needinfo?(nical.bugzilla)
Thanks for the detailed info.

The tests can only provide us _some_ indication when things change, and "good" test results are not a goal on their own, except when we're relatively confident that the results represent a thing which users would care about.

If underlying things change "too" much, then it's possible that the test is no longer good or that we could improve it to better suit the new underlying code. If that's the case, it's something we need to know, and if possible, with some recommendation on what kind of change to the test would better reflect what we want to test.

So I'd think the very first question we'd like answered is: does the change in the pattern of this test's results indicate a real issue which users would/could notice?

If it does, then we'd need to consider how bad it is (regardless of any test results) and what we should do about it.

User experience is the ultimate goal here, and the test results/noise is only the tool we use to help us with the former, so it's not a bad idea to also try and keep this tool useful.

So which is it?
(In reply to Avi Halachmi (:avih) from comment #11)
> Thanks for the detailed info.
> 
> The tests can only provide us _some_ indication when things change, and
> "good" test results are not a goal on their own, except when we're
> relatively confident that the results represent a thing which users would
> care about.
> 
> If underlying things change "too" much, then it's possible that the test is
> no longer good or that we could improve it to better suit to the new
> underlying code. If that's the case, it's something we need to know, and if
> possible, with some recommendation on what kind of change to the test would
> reflect better what we want to test.
> 
> So I'd think the very first question we'd like answered is: Does the change
> in pattern at the results of this test indicate a real issues which users
> would/could notice?
> 
> If it does, then we'd need to consider how bad is it (regardless of any test
> results) and what should we do about it.
> 
> User experience is the ultimate goal here, and the test results/noise is
> only the tool we use to help us with the former, so it's not a bad idea to
> also try and keep this tool useful.

I will hopefully have good answers to this after I have spent more time investigating. It depends on how busy the main thread is (i.e. how much things like sitting in the main thread's event loop after we have composited but before we send AFTERPAINT contribute to the noise, and the cost of the 2 thread context switches that come with each IPDL message).
Hopefully the XFlush tweaks I am currently experimenting with will help reduce some of the noise. Profiling svgr on try showed that most of the time is spent waiting for the x server (to which we gave loads and loads of drawing commands). As far as I know, before OMTC we didn't wait for the x server to finish doing its work when firing AFTERPAINT, and we have to with OMTC, so the noise, while it may be the symptom of something bad for the user, doesn't mean it is actually a regression from the user's perspective.
Beyond the direct relation between the current tsvgr results and user experience, I completely agree that noisy tests are less useful and that useful tests are important.

> 
> So which is it?

Sorry, I have no satisfactory answer other than that I am working on it.
(In reply to Nicolas Silva [:nical] from comment #12)
> ... while it may be the symptom of something bad for the
> user, doesn't mean it is actually a regression from the user's perspective.

Indeed. That was my question - if it is or isn't something users would care about.

> > So which is it?
> 
> Sorry, I have no satisfactory answer other than that I am working on it.

This is actually quite satisfying ;)

Please post thoughts/conclusions to this thread as you make progress. Thanks.
I ran svgr_opacity on linux 64 in the following configurations:

A) calling XFlush after painting each layer (in the hope that the xserver will schedule paints asap):
stdev: 12.21 min: 271.84 max: 319.15 avg: 284.63
stdev/avg*100: 4.29%

B) disabling xrender whenever possible:
stdev: 6.95 min: 497.82 max: 517.39 avg: 510.14
stdev/avg*100: 1.36%

C) using x in sync mode [1]
stdev: 39.39 min: 1224.93 max: 1344.23 avg: 1293.97
stdev/avg*100: 3.04%


These numbers confirm that the noise is coming from our interaction with xrender.

Disabling xrender is what we want to do in the long run, but what this configuration currently does is use xrender for native theme drawing, then read back, and draw the rest without xrender. The read-back is pretty expensive, so we can't ship this without first getting rid of our xrender dependency.

XFlush regresses the average score a little, improves the worst case, and makes the best case worse. Note that these measurements are on a test which has a lot more drawing operations than the average web page, so these scores don't necessarily reflect the impact on user experience.
All of these configurations improve the standard deviation.

[1]: http://tronche.com/gui/x/xlib/event-handling/protocol-errors/synchronization.html It's a debug option; we certainly don't want our users to run with it and it doesn't make sense to measure performance with it, but I was curious.
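
For reference, configuration C above corresponds to Xlib's synchronous (debug) mode; a hedged sketch of how it is switched on (illustrative only, not part of the Talos harness):

#include <X11/Xlib.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  // Configuration C: every request now waits for a server round trip before
  // returning, which is why the averages above are several times slower.
  // This is a debugging aid only, never a shipping configuration.
  XSynchronize(dpy, True);

  // ... run the drawing workload here ...

  XCloseDisplay(dpy);
  return 0;
}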
This reduces the noise and seems to marginally affect the average talos score. That's also the least risky solution I can think of right now.
Attachment #8598111 - Flags: review?(jmuizelaar)
Attached patch better patch.Splinter Review
The previous patch would not build on all platforms.
Attachment #8598111 - Attachment is obsolete: true
Attachment #8598111 - Flags: review?(jmuizelaar)
Attachment #8598116 - Flags: review?(jmuizelaar)
(In reply to Nicolas Silva [:nical] from comment #16)
> Created attachment 8598116 [details] [diff] [review]
> better patch.
> 
> The previous patch would not build on all platforms.

Could we limit things to do this only when testing? If it's not necessary, we shouldn't be flushing the X command queue.
(In reply to Chris Lord [:cwiiis] from comment #17)
> Could we limit things to do this only when testing? If it's not necessary,
> we shouldn't be flushing the X command queue.

I am not extremely fond of testing/measuring something that is different from what we ship to users. I haven't found a lot of documentation on the web about XFlush so I may be missing some of the implications, but it seems like a good thing to make sure the X server starts processing the drawing commands as soon as the transaction ends on the producer side, and to avoid accumulating commands from several frames. If you have more insights I am interested.
(In reply to Nicolas Silva [:nical] from comment #18)
> (In reply to Chris Lord [:cwiiis] from comment #17)
> > Could we limit things to do this only when testing? If it's not necessary,
> > we shouldn't be flushing the X command queue.
> 
> I am not extremely found of testing/measuring something that is different
> from what we ship to users. I haven't found a lot of documentation on the
> web about XFlush so I may miss some of the implications, but it seems like a
> good thing to make sure the X server starts is processing the drawing
> commands as soon as the transaction ends on the producer side and avoid
> accumulating commands from several frames. If you have more insights I am
> interested.

When I did more Linux-y stuff, it was just generally not recommended. It stops possible optimisation that happens at the X-server level, and you can't know what the server is doing wrt all the other clients running, etc. Calling flush/sync can have bad/unexpected performance implications basically, and you shouldn't call it unless you have very specific reasons (e.g. multi-process synchronisation).

We're already testing something different to what the user sees because we aren't running in a desktop production environment, we probably have a very different browser work-load, etc. What we're concerned with, I think, with our tests is getting consistent results, within which changes are meaningful. Calling XFlush will help us with this in an environment where there are no/few other clients, but may adversely affect real-world performance in real terms.

We're basically adding a work-around that harms real-world performance because we want to have more consistent test numbers in a test environment. This seems like a bad idea to me.
(In reply to Chris Lord [:cwiiis] from comment #19)
> 
> We're basically adding a work-around that harms real-world performance
> because we want to have more consistent test numbers in a test environment.
> This seems like a bad idea to me.

I don't know for sure that this is true, though.
I mean, if you are certain that this applies to multi-threaded rendering, then I am with you, but the excessive-flushing story usually applies to single-threaded apps.

In a typical application you generate drawing commands and present/composite on the same thread, so the x server implicitly has enough information about what needs to be drawn for this frame when you present, because you present before you start submitting commands for the next frame. With OMTC you generate drawing commands from one thread and present on another thread, so if you don't give the x server hints about which commands belong to which frame, it's hard for it to be as smart as it is in the single-threaded setup. When you present your frame to the window manager, it may have accumulated drawing commands from your last frame plus part of the next frame, and you end up splitting the next frame in two, possibly waiting for commands that belong to the next frame, waiting longer on the compositor thread, and getting into a vicious circle where the compositor can't keep up because it is processing several frames at a time and the content thread over-produces. We had a similar story with WebGL on b2g, for instance. The rule of thumb about excessive flushing is usually the same on GL drivers as with xrender as far as I can tell (not super far), especially on mobile, but multi-threaded producer-consumer setups can put the driver (and I suspect the x server in this case) in a situation where its heuristics, which have been designed for single-threaded rendering, don't play very well.

I would prefer taking the same code paths on talos tests and release (and eventually back the release part out if we find out that the real-world performance actually regresses) unless we know for sure that flushing at the end of frames makes things worse in a multi-threaded setup.
I don't think we know for sure at the moment, so I'd rather try and see, than add the mental overhead of testing against something different than what we ship without being certain that we have to.
You make a good point, and I realise I actually misread the patch and thought it was flushing more than it is - You've convinced me :)
Comment on attachment 8598116 [details] [diff] [review]
better patch.

Review of attachment 8598116 [details] [diff] [review]:
-----------------------------------------------------------------

It would be great if you could add your justification for this as a comment in ShadowLayers.cpp.
Attachment #8598116 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #22)
> It would be great if you could add your justification for this as commen in
> ShadowLayers.cpp

Added:

// With some platforms, telling the drawing backend that there will be no more
// drawing for this frame helps with preventing command queues from spanning
// across multiple frames.
https://hg.mozilla.org/mozilla-central/rev/096da3d7f850
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
The patch gave us a pretty neat improvement on tscroll and tp5 scroll.

(Improvement) Mozilla-Inbound-Non-PGO - TP5 Scroll - Ubuntu HW 12.04 x64 - 16.4%
(Improvement) Mozilla-Inbound-Non-PGO - TP5 Scroll - Ubuntu HW 12.04 - 19.8%
(Improvement) Mozilla-Inbound-Non-PGO - tscroll-ASAP - Ubuntu HW 12.04 x64 - 4.49%
(Improvement) Mozilla-Inbound-Non-PGO - tscroll-ASAP - Ubuntu HW 12.04 - 7.61%

tsvgr opacity looks like it got worse on average but is much less noisy (I haven't received any notification email for it).

With these results I think it makes sense to keep the XFlush at the end of frames.

Canvasmark (which this bug was originally about) doesn't seem to have been affected, so reopening.

There are improvements that can be done with canvas+OMTC to avoid some of the overhead that was added with OMTC but they are pretty far down the priority list at the moment. I'll file bugs blocking this one and the work will happen there whenever it happens.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1161636
:nical, your patch has shown improvement on most of the tests, but is causing 65.4% regression on SVG, Opacity Row Major test: http://alertmanager.allizom.org:8080/alerts.html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0
Flags: needinfo?(nical.bugzilla)
(In reply to Vaibhav (:vaibhav1994) from comment #27)
> :nical, your patch has shown improvement on most of the tests, but is
> causing 65.4% regression on SVG, Opacity Row Major test:
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0

Just in case it needs reminding (:nical), the goal is to end up with the best user experience and not with the "best" test numbers.

If you think the change in this test's numbers should be acceptable (either because the change in numbers doesn't represent something meaningful enough, or because that's the best we can do, or because improvements in other areas outweigh this regression, etc.), just put the reasons on record so that interested parties can comment on them, and if there's a consensus they're valid, that would be fine.
SVG Opacity was definitely broken before - assuming the same things are happening as when I tested similar changes a while ago, the Linux numbers are now probably more in line with the Windows and Mac numbers. SVG Opacity, if I remember correctly from talking to Avi about it, is a test that only gets run once, and previously the timing would have been taken before X actually got to executing the commands (which explains the huge decrease).

To be clear, this decrease is (likely) due to a false result now being replaced by a real one.

I'll leave nical to clear this up though, for all I know I could be talking rubbish :)
good point ;cwiiis!

Looking at a 90 day window:
http://graphs.mozilla.org/graph.html#tests=[[225,132,35],[225,63,35],[225,131,35]]&sel=none&displayrange=90&datatype=geo

we are a bit higher, but the noise has been reduced.  I would gladly take reduced noise any day.  SVG Opacity is one of those tests where we need to be aware of the regressions, as sometimes random things (like networking) can affect it.  Usually we can just verify we are doing the right thing in our patch and document it.
(In reply to Joel Maher (:jmaher) from comment #30)
> SVG Opacity is one of those tests that we need to be
> aware of the regressions as sometimes random things (like networking) could
> affect it.

They're not random; they're very concrete and well documented*, and regardless, this isn't one of those cases. This one is actually exactly what this test wants to test: SVG layers optimizations.

Whether it succeeds or not (to test that thing) is another question, and :nical seems to be on top of it.

* https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg-opacity
(In reply to Vaibhav (:vaibhav1994) from comment #27)
> :nical, your patch has shown improvement on most of the tests, but is
> causing 65.4% regression on SVG, Opacity Row Major test:
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=4b23f2a48e84&showAll=1&testIndex=0&platIndex=0

I'll look into whether there are obvious issues showing up in profiles, but as it is, I think we should keep this configuration even with the rather impressive regression on tsvgr opacity, for the reasons Chris explained.
Flags: needinfo?(nical.bugzilla)
I appreciate all the dialog and context here!
this is now on Aurora, so far I see a linux32 svg-asap regression of 5.93%
http://alertmanager.allizom.org:8080/alerts.html?rev=e2ce1aac996e&showAll=1&testIndex=0&platIndex=0

I will update this bug if I see others.
(In reply to Joel Maher (:jmaher) from comment #34)
> this is now on Aurora, so far I see a linux32 svg-asap regression of 5.93%
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=e2ce1aac996e&showAll=1&testIndex=0&platIndex=0
> 
It is also on linux32 and linux64 svg-opacity, svg-asap, tab animation, custom animation, canvasmark as can be seen in the link.
(In reply to Chris Lord [:cwiiis] from comment #29)
> SVG Opacity was definitely broken before - assuming the same things are
> happening as when I tested similar changes a while ago, the Linux numbers
> are now probably more in-line with the Windows and Mac numbers. SVG Opacity,
> if I remember correctly when talking to Avi about it, is a test that only
> gets run once, and previously, the timing would have been taken before X
> actually got to executing the commands (which explains the huge decrease).

For tests that do a lot of drawing, it would be helpful to trigger an artificial
wait for the server so that the test measures the time to the completion of
drawing on the server.  This would be most indicative of user experience.

See attachment 460193 [details] [diff] [review] for how this was done for tscroll.
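
A sketch of that kind of artificial wait (illustrative only, a standalone Xlib program rather than the tscroll harness from attachment 460193): XSync blocks until the server has processed all outstanding requests, so the timestamp taken afterwards includes server-side drawing time.

#include <X11/Xlib.h>
#include <stdio.h>
#include <time.h>

int main() {
  Display* dpy = XOpenDisplay(nullptr);
  if (!dpy) return 1;
  Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 300,
                                   0, 0, WhitePixel(dpy, DefaultScreen(dpy)));
  XMapWindow(dpy, win);
  GC gc = XCreateGC(dpy, win, 0, nullptr);

  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < 10000; ++i) {
    XFillRectangle(dpy, win, gc, (i * 7) % 280, (i * 11) % 280, 20, 20);
  }
  // The artificial wait: block until the server has finished executing every
  // request issued so far, so the measurement covers completion of drawing.
  XSync(dpy, False);
  clock_gettime(CLOCK_MONOTONIC, &end);

  double ms = (end.tv_sec - start.tv_sec) * 1e3 +
              (end.tv_nsec - start.tv_nsec) / 1e6;
  printf("drawing + server completion: %.2f ms\n", ms);

  XCloseDisplay(dpy);
  return 0;
}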
Blocks: 994541
No longer depends on: 994541
Keywords: regression
Depends on: 1181006
(In reply to Nicolas Silva [:nical] from comment #12)
> Profiling svgr on try showed that most of the
> time is spent waiting for the x server (to which we gave loads and loads of
> drawing commands). As far as I know before OMTC we didn't wait for the x
> server to finish doing its work when firing AFTERPAINT and we have to with
> OMTC so the noise, while it may be the symptom of something bad for the
> user, doesn't mean it is actually a regression from the user's perspective.

I'm not sure what waiting for the X server is happening here, but having two
threads should not make this necessary.

Anything that causes us to be waiting in _XReply for the server means that the
advantages of the async pipeline have been lost, and would affect user experience.
Let's close this now. We'll keep investigating performance and look into removing sync round trips to the x server, but that'll be for the sake of improving things rather than to fix this regression, which we have been shipping with for a while now.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX