Closed Bug 1138502 Opened 9 years ago Closed 9 years ago

6% Talos Regression on tresize PGO on OSX w/ Silk Enabled

Categories: (Core :: Graphics, defect)
Version: 37 Branch
Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED FIXED
Target Milestone: mozilla39
Tracking Status: firefox39 --- fixed

People: (Reporter: mchang, Assigned: mchang)

Details: (Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments: (1 file)

From https://bugzilla.mozilla.org/show_bug.cgi?id=1128690#c8

There is a 6% talos regression on the tresize talos test. This occurs with just the vsync compositor (http://compare-talos.mattn.ca/dev/?oldBranch=Try&oldRevs=742011f28b13&newBranch=Try&newRev=55d3493940ba&submit=true). Find out why and fix.
With vsync compositor - https://treeherder.mozilla.org/#/jobs?repo=try&revision=83cb4a12cda7

Base - https://treeherder.mozilla.org/#/jobs?repo=try&revision=36c6ba738947

One profile w/ vsync compositor - http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/5fda6478a6adaa4c7a3cc7517834c11b04288a5abc70dd5ea82485c7d196beba107ac92b0543e15055107f33a3d553b7d2dad4d03fb651e38add8c67fcc87516&pathInZip=profile_tresize/tresize/cycle_17.sps#report=56cf2a2213b084bfc777edf078f834c07dbbf0bc

From the profiles, the tresize test does not use ASAP mode. In addition, the test uses forced sync composition. Essentially, the vsync callback on OS X takes up a small amount of CPU on every resize, which adds up over the course of the test. I'm not sure there is much we can do about this, so I think we'll have to accept the regression. Closing as WONTFIX.
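
For reference, the OS X vsync callback in question is typically driven by a CVDisplayLink. Here is a minimal, illustrative sketch (not the actual Gecko vsync source) of why it costs a little CPU on every refresh:

#include <CoreVideo/CoreVideo.h>

// Illustrative sketch only -- not the actual Gecko vsync source. The callback
// fires once per display refresh (~16.7 ms at 60 Hz), so even a cheap callback
// body accumulates CPU time over a long, resize-heavy test like tresize.
static CVReturn OnVsync(CVDisplayLinkRef aLink,
                        const CVTimeStamp* aNow,
                        const CVTimeStamp* aOutputTime,
                        CVOptionFlags aFlagsIn,
                        CVOptionFlags* aFlagsOut,
                        void* aContext)
{
  // Notify observers (e.g. the compositor) that a vsync occurred.
  return kCVReturnSuccess;
}

static void StartVsyncSketch()
{
  CVDisplayLinkRef link = nullptr;
  CVDisplayLinkCreateWithActiveCGDisplays(&link);
  CVDisplayLinkSetOutputCallback(link, &OnVsync, nullptr);
  CVDisplayLinkStart(link);
}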
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
I am not sure I see this regression:
http://graphs.mozilla.org/graph.html#tests=[[254,63,24],[254,63,21]]&sel=1424025209626.498,1425394930978.5178&displayrange=30&datatype=geo

is this a different platform maybe?  I wanted to ensure we annotated any alerts or other bugs to help us keep track of this.
(In reply to Joel Maher (:jmaher) from comment #2)
> I am not sure I see this regression:
> http://graphs.mozilla.org/graph.html#tests=[[254,63,24],[254,63,
> 21]]&sel=1424025209626.498,1425394930978.5178&displayrange=30&datatype=geo
> 
> is this a different platform maybe?  I wanted to ensure we annotated any
> alerts or other bugs to help us keep track of this.

It's not on inbound yet; this is still on Try. I wanted to test for regressions before actually landing :)
Markus found something interesting in the profile: we keep enabling and disabling vsync, which causes the regression. We should be able to fix this.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reset the unobserve count in the CompositorVsyncObserver on force composites, which forces vsync to stay enabled. I'll wait on the try results, then ask for review.
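
To illustrate the idea, a minimal sketch assuming a skip-counter design; the class name, member names, and threshold value are placeholders, not the actual code in CompositorParent.cpp:

// Illustrative sketch only -- names and the threshold are assumptions,
// not the actual CompositorVsyncObserver code.
class VsyncObserverSketch {
public:
  // Called on every vsync by the vsync source.
  void NotifyVsync() {
    if (!mNeedsComposite) {
      // Nothing to draw: after enough idle vsyncs, stop listening to save CPU.
      if (++mUnobserveCount >= kUnobserveThreshold) {
        UnobserveVsync();
      }
      return;
    }
    mUnobserveCount = 0;
    Composite();
  }

  // Called for a forced, synchronous composite (e.g. during window resizing).
  void ForceComposite() {
    // bug 1138502: without this reset, a long run of forced composites keeps
    // pushing the counter over the threshold, so we repeatedly unobserve and
    // re-observe vsync, which is what showed up in the profile.
    mUnobserveCount = 0;
    Composite();
  }

private:
  static const int kUnobserveThreshold = 4;  // illustrative value
  int mUnobserveCount = 0;
  bool mNeedsComposite = false;

  void UnobserveVsync() { /* detach from the vsync source */ }
  void Composite()      { /* draw the current layer tree */ }
};

The key point is only the reset in ForceComposite; everything else stands in for the existing unobserve logic.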
With this patch, we go down to a 3% regression

http://compare-talos.mattn.ca/dev/?oldBranch=Try&oldRevs=b77ed0031ed6&newBranch=Try&newRev=c2ee333a0503&submit=true

Still trying to get a SPS profile, lots of crashes when running with the profiling option on talos.
Mason, thanks for looking into this! This is great progress :)
(In reply to Mason Chang [:mchang] from comment #6)
> Still trying to get a SPS profile, lots of crashes when running with the
> profiling option on talos.

This might be the crash I reported in bug 1139506. If so, it should be fixed in the next nightly
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #8)
> (In reply to Mason Chang [:mchang] from comment #6)
> > Still trying to get a SPS profile, lots of crashes when running with the
> > profiling option on talos.
> 
> This might be the crash I reported in bug 1139506. If so, it should be fixed
> in the next nightly

Yup, that fixed it, thanks! https://treeherder.mozilla.org/#/jobs?repo=try&revision=1793fc9aebef
Here are profiles from two runs:
https://people.mozilla.org/~bgirard/cleopatra/#report=46b785500be083729b169243a0021586c4b22e5f
https://people.mozilla.org/~bgirard/cleopatra/#report=d7cca21db9a1fb706411ba7264662340506f60e9

Sometimes the vsync occurs right after the layer transaction but before the compositor thread gets to the sync RecvFlushRendering, so we composite based on the vsync and the RecvFlushRendering is delayed because the compositor thread is busy. This is fine, but it still adds a little extra overhead, since the RecvFlushRendering should just return true.

We also have the other case where the RecvFlushRendering occurs first, right at vsync. In these cases, we have to grab a lock and cancel the currently scheduled composite task. On top of that, we still have the overhead of the vsync callback itself. I can't really see these cases in the profiles, but that's where we do extra work now compared to pre-silk.
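
A minimal sketch of that second case, assuming a task handle protected by a lock; the names and structure here are placeholders, not the actual compositor code:

#include <functional>
#include <mutex>

// Illustrative sketch only -- not the actual compositor code.
struct CompositorFlushSketch {
  std::mutex mTaskLock;
  std::function<void()> mScheduledVsyncComposite;  // queued by the vsync callback

  void CompositeNow() { /* draw the current layer tree */ }

  // Called for the sync RecvFlushRendering.
  void FlushRendering() {
    {
      // Extra work relative to pre-silk: take the lock and drop the
      // vsync-scheduled composite so we don't composite the same frame twice.
      std::lock_guard<std::mutex> lock(mTaskLock);
      mScheduledVsyncComposite = nullptr;
    }
    CompositeNow();  // composite synchronously for the flush
  }
};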
Comment on attachment 8572168 [details] [diff] [review]
Reset unobserve counter on ForceComposites

Review of attachment 8572168 [details] [diff] [review]:
-----------------------------------------------------------------

From irc: We shouldn't hold up silk because of the 3% regression, which is 21.66 +- 0.7 ms to 22.33 +- 0.5 ms. Markus noticed a gap in the profiles between the layer transaction and the composite that doesn't occur without vsync.

I will follow up with a local check to see what happens between a layer transaction and the RecvFlushComposite, but this patch keeps vsync enabled, which was the major source of the regression.
Attachment #8572168 - Flags: review?(mstange)
Comment on attachment 8572168 [details] [diff] [review]
Reset unobserve counter on ForceComposites

Review of attachment 8572168 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorParent.cpp
@@ +348,5 @@
> +{
> +  /**
> +   * bug 1138502 - If we receive a ForceComposite, we no longer
> +   * need to composite on vsyncs. However, if we receive enough ForceComposites
> +   * in a row like in talos, we will keep unobserving from vsync then

This comment describes how things were before this patch, and might be confusing to readers who don't know the history of this code. Can you reformulate it so that it expresses that the bad behavior would only happen if this method didn't exist? Or express it as a behavior that we don't want, maybe?

I also suggest changing "like in talos" to ", e.g. during long-running window resizing" or something like that.
Attachment #8572168 - Flags: review?(mstange) → review+
https://hg.mozilla.org/mozilla-central/rev/ecfaa4538984
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Keywords: perf, regression
Whiteboard: [talos_regression]
Just a heads up, bug 1137905 is in mozilla-inbound, so we expect a few small decreases / improvements.
So I figured out the last 0.5 ms. It's not really a regression in silk; the previous timings were just lucky. Without silk, while running the test, we schedule the next composite right after we finish compositing. When the layer transaction occurs, we now need to composite, but the next thing on the message loop is the already scheduled composite task. This task then executes on the compositor thread BEFORE the sync RecvFlushRendering call occurs. Because of this, in the profiles, it looks like there is no gap between the LayerTransaction and the Composite. Even without vsync, there is ~0.5 ms of latency between the LayerTransaction and the RecvFlushRendering.

With Silk, we don't schedule anything on the compositor loop, so we have to wait for the sync RecvFlushRendering to composite, which creates the gap in the profiles.

If you compare profiles on master w/o silk (https://people.mozilla.org/~bgirard/cleopatra/#report=efc3c742ef127d4573fda59d4dccb5d9a8b9d767&selection=0,1095) with profiles with silk (https://people.mozilla.org/~bgirard/cleopatra/#report=49009d8f2a1f7450b6c3ce8a1b0482e4382e189d), the number of composites coming from RecvFlushComposites is much higher with silk.

We also sometimes get lucky with silk, where the vsync composite occurs before the RecvFlushRendering, in which case we get the same advantage. Overall, though, the ~0.5 ms of latency between a LayerTransaction and a RecvFlushRendering is the hit we're seeing here.
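
To make the scheduling difference concrete, a simplified sketch; the function names are placeholders, not the actual CompositorParent logic:

// Illustrative contrast of the two scheduling models; the function names are
// placeholders, not the actual CompositorParent code.
static void DrawFrame()         { /* composite the current layer tree */ }
static void PostCompositeTask() { /* queue the next composite on the compositor loop */ }

// Pre-silk: each composite immediately queues the next one, so when the layer
// transaction lands, a composite task is often already first in line on the
// compositor thread and runs before the sync RecvFlushRendering arrives.
static void PreSilkComposite() {
  DrawFrame();
  PostCompositeTask();
}

// Silk: nothing is pre-queued after a composite; the next one waits for a vsync
// notification or the sync RecvFlushRendering, which is where the ~0.5 ms gap
// after the LayerTransaction shows up in the tresize profiles.
static void SilkComposite() {
  DrawFrame();
  // next composite is triggered by NotifyVsync() or RecvFlushRendering()
}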