Closed Bug 1138502 Opened 9 years ago Closed 9 years ago

6% Talos Regression on tresize PGO on OSX w/ Silk Enabled

Categories: (Core :: Graphics, defect)
Version: 37 Branch
Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED FIXED
Target Milestone: mozilla39
Tracking Status: firefox39 --- fixed

People: (Reporter: mchang, Assigned: mchang)

Details: (Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments: (1 file)

From https://bugzilla.mozilla.org/show_bug.cgi?id=1128690#c8

There is a 6% talos regression on the tresize talos test. This occurs with just the vsync compositor (http://compare-talos.mattn.ca/dev/?oldBranch=Try&oldRevs=742011f28b13&newBranch=Try&newRev=55d3493940ba&submit=true). Find out why and fix.
With vsync compositor - https://treeherder.mozilla.org/#/jobs?repo=try&revision=83cb4a12cda7

Base - https://treeherder.mozilla.org/#/jobs?repo=try&revision=36c6ba738947

One profile w/ vsync compositor - http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/5fda6478a6adaa4c7a3cc7517834c11b04288a5abc70dd5ea82485c7d196beba107ac92b0543e15055107f33a3d553b7d2dad4d03fb651e38add8c67fcc87516&pathInZip=profile_tresize/tresize/cycle_17.sps#report=56cf2a2213b084bfc777edf078f834c07dbbf0bc

From the profiles, the tresize test does not use ASAP mode. In addition, the test uses forced sync composition. Essentially, the vsync callback on OS X takes up a small amount of CPU on every resize, which adds up over the course of the test. I'm not sure there is much we can do about this, so I think we'll have to accept the regression. Closing as WONTFIX.
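
For reference, the OS X vsync callback in question is typically driven by a CVDisplayLink. Here is a minimal, illustrative sketch (not the actual Gecko vsync source) of why it costs a little CPU on every refresh:

#include <CoreVideo/CoreVideo.h>

// Illustrative sketch only -- not the actual Gecko vsync source. The callback
// fires once per display refresh (~16.7 ms at 60 Hz), so even a cheap callback
// body accumulates CPU time over a long, resize-heavy test like tresize.
static CVReturn OnVsync(CVDisplayLinkRef aLink,
                        const CVTimeStamp* aNow,
                        const CVTimeStamp* aOutputTime,
                        CVOptionFlags aFlagsIn,
                        CVOptionFlags* aFlagsOut,
                        void* aContext)
{
  // Notify observers (e.g. the compositor) that a vsync occurred.
  return kCVReturnSuccess;
}

static void StartVsyncSketch()
{
  CVDisplayLinkRef link = nullptr;
  CVDisplayLinkCreateWithActiveCGDisplays(&link);
  CVDisplayLinkSetOutputCallback(link, &OnVsync, nullptr);
  CVDisplayLinkStart(link);
}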
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
I am not sure I see this regression:
http://graphs.mozilla.org/graph.html#tests=[[254,63,24],[254,63,21]]&sel=1424025209626.498,1425394930978.5178&displayrange=30&datatype=geo

is this a different platform maybe?  I wanted to ensure we annotated any alerts or other bugs to help us keep track of this.
(In reply to Joel Maher (:jmaher) from comment #2)
> I am not sure I see this regression:
> http://graphs.mozilla.org/graph.html#tests=[[254,63,24],[254,63,
> 21]]&sel=1424025209626.498,1425394930978.5178&displayrange=30&datatype=geo
> 
> is this a different platform maybe?  I wanted to ensure we annotated any
> alerts or other bugs to help us keep track of this.

It's not on inbound yet; this is still on Try. I wanted to test for regressions before actually landing :)
Markus found something interesting in the profile: we keep enabling and disabling vsync, which causes the regression. We should be able to fix this.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reset the unobserve count in the CompositorVsyncObserver on force composites, which forces vsync to stay enabled. I'll wait on the try results, then ask for review.
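
To illustrate the idea, a minimal sketch assuming a skip-counter design; the class name, member names, and threshold value are placeholders, not the actual code in CompositorParent.cpp:

// Illustrative sketch only -- names and the threshold are assumptions,
// not the actual CompositorVsyncObserver code.
class VsyncObserverSketch {
public:
  // Called on every vsync by the vsync source.
  void NotifyVsync() {
    if (!mNeedsComposite) {
      // Nothing to draw: after enough idle vsyncs, stop listening to save CPU.
      if (++mUnobserveCount >= kUnobserveThreshold) {
        UnobserveVsync();
      }
      return;
    }
    mUnobserveCount = 0;
    Composite();
  }

  // Called for a forced, synchronous composite (e.g. during window resizing).
  void ForceComposite() {
    // bug 1138502: without this reset, a long run of forced composites keeps
    // pushing the counter over the threshold, so we repeatedly unobserve and
    // re-observe vsync, which is what showed up in the profile.
    mUnobserveCount = 0;
    Composite();
  }

private:
  static const int kUnobserveThreshold = 4;  // illustrative value
  int mUnobserveCount = 0;
  bool mNeedsComposite = false;

  void UnobserveVsync() { /* detach from the vsync source */ }
  void Composite()      { /* draw the current layer tree */ }
};

The key point is only the reset in ForceComposite; everything else stands in for the existing unobserve logic.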
With this patch, we go down to a 3% regression

http://compare-talos.mattn.ca/dev/?oldBranch=Try&oldRevs=b77ed0031ed6&newBranch=Try&newRev=c2ee333a0503&submit=true

Still trying to get a SPS profile, lots of crashes when running with the profiling option on talos.
Mason, thanks for looking into this! This is great progress :)
(In reply to Mason Chang [:mchang] from comment #6)
> Still trying to get a SPS profile, lots of crashes when running with the
> profiling option on talos.

This might be the crash I reported in bug 1139506. If so, it should be fixed in the next nightly
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #8)
> (In reply to Mason Chang [:mchang] from comment #6)
> > Still trying to get a SPS profile, lots of crashes when running with the
> > profiling option on talos.
> 
> This might be the crash I reported in bug 1139506. If so, it should be fixed
> in the next nightly

Yup, that fixed it, thanks! https://treeherder.mozilla.org/#/jobs?repo=try&revision=1793fc9aebef
Here are profiles from two runs:
https://people.mozilla.org/~bgirard/cleopatra/#report=46b785500be083729b169243a0021586c4b22e5f
https://people.mozilla.org/~bgirard/cleopatra/#report=d7cca21db9a1fb706411ba7264662340506f60e9

Sometimes the vsync occurs right after the layer transaction but before the compositor thread gets to the sync RecvFlushRendering, so we composite based on the vsync and the RecvFlushRendering is delayed because the compositor thread is busy. This is fine, but it still adds a little extra overhead, since the RecvFlushRendering should just return true.

We also have the other case where the RecvFlushRendering occurs first, right at vsync. In these cases, we have to grab a lock and cancel the currently scheduled composite task. On top of that, we still have the overhead of the vsync callback itself. I can't really see these cases in the profiles, but that's where we do extra work now compared to pre-silk.
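
A minimal sketch of that second case, assuming a task handle protected by a lock; the names and structure here are placeholders, not the actual compositor code:

#include <functional>
#include <mutex>

// Illustrative sketch only -- not the actual compositor code.
struct CompositorFlushSketch {
  std::mutex mTaskLock;
  std::function<void()> mScheduledVsyncComposite;  // queued by the vsync callback

  void CompositeNow() { /* draw the current layer tree */ }

  // Called for the sync RecvFlushRendering.
  void FlushRendering() {
    {
      // Extra work relative to pre-silk: take the lock and drop the
      // vsync-scheduled composite so we don't composite the same frame twice.
      std::lock_guard<std::mutex> lock(mTaskLock);
      mScheduledVsyncComposite = nullptr;
    }
    CompositeNow();  // composite synchronously for the flush
  }
};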
Comment on attachment 8572168 [details] [diff] [review]
Reset unobserve counter on ForceComposites

Review of attachment 8572168 [details] [diff] [review]:
-----------------------------------------------------------------

From irc: We shouldn't hold up silk because of the 3% regression, which is 21.66 +- 0.7 ms to 22.33 +- 0.5 ms. Markus noticed a gap in the profiles between the layer transaction and the composite that doesn't occur without vsync.

I will follow up with a local check to see what happens between a layer transaction and the RecvFlushComposite, but this patch keeps vsync enabled, which was the major source of the regression.
Attachment #8572168 - Flags: review?(mstange)
Comment on attachment 8572168 [details] [diff] [review]
Reset unobserve counter on ForceComposites

Review of attachment 8572168 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorParent.cpp
@@ +348,5 @@
> +{
> +  /**
> +   * bug 1138502 - If we receive a ForceComposite, we no longer
> +   * need to composite on vsyncs. However, if we receive enough ForceComposites
> +   * in a row like in talos, we will keep unobserving from vsync then

This comment describes how things were before this patch, and might be confusing to readers who don't know the history of this code. Can you reformulate it so that it expresses that the bad behavior would only happen if this method didn't exist? Or express it as a behavior that we don't want, maybe?

I also suggest changing "like in talos" to ", e.g. during long-running window resizing" or something like that.
Attachment #8572168 - Flags: review?(mstange) → review+
https://hg.mozilla.org/mozilla-central/rev/ecfaa4538984
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Keywords: perf, regression
Whiteboard: [talos_regression]
Just a heads up, bug 1137905 is in mozilla-inbound, so we expect a few small decreases / improvements.
So I figured out the last 0.5 ms. It's not really a regression in silk; the previous timings were just lucky. Without silk, while running the test, we schedule the next composite right after we finish compositing. When the layer transaction occurs, we now need to composite, but the next thing on the message loop is the already scheduled composite task. This task then executes on the compositor thread BEFORE the sync RecvFlushRendering call occurs. Because of this, in the profiles, it looks like there is no gap between the LayerTransaction and the Composite. Even without vsync, there is ~0.5 ms of latency between the LayerTransaction and the RecvFlushRendering.

With Silk, we don't schedule anything on the compositor loop, so we have to wait for the sync RecvFlushRendering to composite, which creates the gap in the profiles.

If you compare profiles on master w/o silk (https://people.mozilla.org/~bgirard/cleopatra/#report=efc3c742ef127d4573fda59d4dccb5d9a8b9d767&selection=0,1095) with profiles with silk (https://people.mozilla.org/~bgirard/cleopatra/#report=49009d8f2a1f7450b6c3ce8a1b0482e4382e189d), the number of composites coming from RecvFlushComposites is much higher with silk.

We also sometimes get lucky with silk, where the vsync composite occurs before the RecvFlushRendering, in which case we get the same advantage. Overall, though, the ~0.5 ms of latency between a LayerTransaction and a RecvFlushRendering is the hit we're seeing here.
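
To make the scheduling difference concrete, a simplified sketch; the function names are placeholders, not the actual CompositorParent logic:

// Illustrative contrast of the two scheduling models; the function names are
// placeholders, not the actual CompositorParent code.
static void DrawFrame()         { /* composite the current layer tree */ }
static void PostCompositeTask() { /* queue the next composite on the compositor loop */ }

// Pre-silk: each composite immediately queues the next one, so when the layer
// transaction lands, a composite task is often already first in line on the
// compositor thread and runs before the sync RecvFlushRendering arrives.
static void PreSilkComposite() {
  DrawFrame();
  PostCompositeTask();
}

// Silk: nothing is pre-queued after a composite; the next one waits for a vsync
// notification or the sync RecvFlushRendering, which is where the ~0.5 ms gap
// after the LayerTransaction shows up in the tresize profiles.
static void SilkComposite() {
  DrawFrame();
  // next composite is triggered by NotifyVsync() or RecvFlushRendering()
}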