Closed Bug 1378966 Opened 7 years ago Closed 7 years ago

fennec does not run rAF or rIC callbacks at 60fps unless screen is touched

Categories

(Firefox for Android Graveyard :: General, defect)

defect
Not set
normal

Tracking

(firefox56 fixed)

RESOLVED FIXED
Firefox 56
Tracking Status
firefox56 --- fixed

People

(Reporter: bkelly, Assigned: snorp)

References

Details

Attachments

(2 files, 1 obsolete file)

STR:

1. Open this site in fennec https://timer-flood.glitch.me/
2. Do nothing and just wait for the numbers to stabilize
3. Observe that they fluctate around 40 to 50 (on my nexus 5x)
4. Tap the screen somewhere
5. Observe that the rAF and rIC numbers increase to ~60 fps for a short time after tapping

I have no explanation for why tapping the screen would allow us to run requestAnimationFrame faster.

I don't observe this behavior on desktop, so filing under fennec for now.
I have a way to profile this, so I'll take a look.
Assignee: nobody → snorp
I get the 40-50 rAF rate, but it does not increase after tapping.
Here's the profile: https://perfht.ml/2tVhIez

It looks like both gecko and compositor threads spend most of their time waiting, so we're just doing something wrong with the scheduling -- or the test is flawed.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #3)
> It looks like both gecko and compositor threads spend most of their time
> waiting, so we're just doing something wrong with the scheduling -- or the
> test is flawed.

The test is pretty simple:

https://glitch.com/edit/#!/timer-flood

Is there any kind of weird prioritization of apps in android?  Like you get run slower if the user isn't interacting?
Sometimes we have gaps of multiple frames worth of time beyween RefreshDriver ticks.
I guess it is taking us 100ms to paint the screen on each refresh tick.  That could account for a lot here, no?
Oh, sorry.  That was 100ms over a range... each paint is 16ms to 25ms.  That's still kind of slow and would prevent us from hitting 60fps.
I did some analysis of the nsRefreshDriver. These numbers are from my Pixel. I counted how often the refresh driver is ticked per second. I also counted how often rAF is called every second and then counted how often IsWaitingForPaint returns true every second which prevents the rAF from being called in a tick.

ticks: 75 rAF: 41 IsWaitingForPaint: 34
ticks: 76 rAF: 43 IsWaitingForPaint: 33
ticks: 80 rAF: 44 IsWaitingForPaint: 36
ticks: 76 rAF: 47 IsWaitingForPaint: 29
ticks: 76 rAF: 42 IsWaitingForPaint: 34
ticks: 79 rAF: 51 IsWaitingForPaint: 28
ticks: 73 rAF: 39 IsWaitingForPaint: 34
ticks: 77 rAF: 39 IsWaitingForPaint: 38
ticks: 77 rAF: 43 IsWaitingForPaint: 33
ticks: 78 rAF: 41 IsWaitingForPaint: 37
ticks: 78 rAF: 42 IsWaitingForPaint: 36
ticks: 80 rAF: 47 IsWaitingForPaint: 32
ticks: 76 rAF: 50 IsWaitingForPaint: 26
ticks: 77 rAF: 43 IsWaitingForPaint: 33
ticks: 76 rAF: 43 IsWaitingForPaint: 33
ticks: 80 rAF: 49 IsWaitingForPaint: 31
ticks: 81 rAF: 46 IsWaitingForPaint: 35
ticks: 77 rAF: 48 IsWaitingForPaint: 29
ticks: 74 rAF: 41 IsWaitingForPaint: 33
ticks: 79 rAF: 54 IsWaitingForPaint: 25

First, ticks = rAF + IsWaitingForPaint

So the lost rAF are because we are waiting on the compositor.

Since the compositor is mostly idle why is the refresh driver waiting on the compositor so much?

Results while tapping the screen:

ticks: 78 rAF: 58 IsWaitingForPaint: 20
ticks: 80 rAF: 60 IsWaitingForPaint: 20
ticks: 78 rAF: 58 IsWaitingForPaint: 20
ticks: 80 rAF: 60 IsWaitingForPaint: 20
ticks: 80 rAF: 60 IsWaitingForPaint: 20
ticks: 80 rAF: 60 IsWaitingForPaint: 20
ticks: 80 rAF: 60 IsWaitingForPaint: 20
Removing the animated GIF from the page bump rAF and rIC rate to 57-59Hz so something about the animated GIF seems to be causing issues in the compositor. Tapping the screen will still get the rate to 60Hz for both. So there seem to be two issues, 1) The animated GIF is killing compositor performance, 2) The refresh driver on android is still blocking the compositor even with out the animated GIF from just updating the text on screen once a second. I see average of two dropped frames where the refresh driver is blocked waiting on the compositor.
kats, any idea what is going on here. It seems the compositor thread is mostly idle so it would seem something may be preventing the finished transaction from getting back to the main thread in a timely manner?
Flags: needinfo?(bugmail)
I investigated this a little bit. Haven't gotten to the bottom of it yet, but things are definitely fishy. Our actual composition rate is ~25 fps (you can see this if you turn on the layers.acceleration.draw-fps pref, and I verified with additional logging). The transaction rate seems to roughly match the reported rAF rate, but we're basically lumping together two transactions into every composite. It almost seems like that's just timing or something because on the compositor I see SetNeedsComposite get called twice in one vsync interval, followed by zero calls in the next couple of vsync intervals. But it's far too consistent to ignore, I think. Anyway I'll keep looking at it tomorrow.
Ok, so the animated gif is set to mostly have a inter-frame delay of 50ms [1] which means at full throttle we'll be painting 20 frames a second. That is indeed what seems to be happening - we're generating about 20 transactions per second from the main thread. Except those 20 transactions all set the "repeat transaction" flag, which ends up sending 40 transactions per second IPDL over to the compositor. It also means that each paint increments the refresh driver's mPendingTransaction counter twice, and therefore immediately causes the mWaitingForTransaction flag to be set to true [2]. Until the compositor composites the transaction and returns a DidComposite message, the main thread aborts refresh driver ticks early [3], before rAF is triggered.

As far as I can tell vsync is ticking every at 60fps consistently. In addition to that, every time the compositor sends back the DidComposite message, because mWaitingForTransaction is true on the content side, we kick off a refresh as well. That happens at roughly 20fps. Therefore the refresh driver is actually getting ticked at 60+20 = 80fps. The mWaitingForTransaction flag is set to true around one-third of the total time, because of the 20fps transaction rate (generally each "real" transaction will leave mWaitingForTransaction as true for one vsync tick). Therefore of the 80 times/second the refresh driver is getting ticked, only two-thirds of those times does it actually run rAF, which gives a rAF frequency of 80 * 2/3 = 53.3/second.

I haven't yet figured out why touching the screen increases the rAF frequency, but I wrote a patch that avoids setting mWaitingForTransaction to true for repeat transactions, and that fixes the above behaviour so that rAF hits 60 consistently, at least for me.

Also for the record on my device the rIC rate is being reported at around 20-30 but I didn't investigate that at all.

[1] via ImageMagick's `identify -format "%T" <gif>` - each tick is 1/100 of a second, so 2 of the 35 frames have 100ms delays, the rest have 50ms.
[2] http://searchfox.org/mozilla-central/rev/b7e531410ebc1c7d74a78e86a894e69608db318c/layout/base/nsRefreshDriver.cpp#2171
This is the patch I wrote that fixed the rAF for me. Matt, any thoughts on this patch? See analysis in the previous comment.
Flags: needinfo?(bugmail)
Attachment #8885279 - Flags: feedback?(matt.woodrow)
Attached patch Logging patchSplinter Review
Stashing my logging patch here as well in case anybody needs it later
Just to provide a bit more context for comment 12: the numbers I posted there are all "averages" in that they are normalized into "per second" values. However the actual distribution of the ticks and messages is also relevant. For example I said that mWaitingForTransaction is true around one-third of the time, so one-third of the refresh driver ticks will get blocked. But in reality it might be that more than one-third of the refresh driver ticks get blocked, depending on their exact distribution over the relevant time interval. This might account for the 40-50 rAF rate that :bkelly reported previously. Touch events might very well disturb this distribution by triggering a higher paint rate (instead of the low 20fps) and that alone might be enough to cause the observed behaviour.

 Also all the numbers are rounded for simplicity but tend to vary +/- quite a bit.
Can you elaborate on what the repeat flag is doing here? It looks like we use it for progressive paint and/or low-res tiles when there is still more stuff to draw. Surely we aren't updating the low-res tiles that are covered by the viewport and we only need the single high-res update for the GIF?

Presumably we've had this bug ever since the compositor-based throttling was introduced, which is...a little worrying. I wonder if some kind of telemetry around throttling would've helped, but probably not.
Flags: needinfo?(bugmail)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #16)
> Can you elaborate on what the repeat flag is doing here?

I didn't look into why it was getting set. It does seem worth investigating, although I'm not sure if I have the time to do it today. ni me if you still want me to look into it - I can get to it later this week.
Flags: needinfo?(bugmail)
> Also for the record on my device the rIC rate is being reported at around 20-30 but I didn't investigate that at all.

If we are firing refresh ticks at 80HZ, the idle detection could should indeed find less idle time.  I guess my only question about this would be if there should be a better way to handle this animated GIF without triggering refresh ticks faster than 60HZ.
Comment on attachment 8885279 [details] [diff] [review]
Don't throttle refresh driver on repeat transactions

Review of attachment 8885279 [details] [diff] [review]:
-----------------------------------------------------------------

This seems pretty reasonable.

The other alternative would be to only allocate an id for the last transaction in a series of repeats.
Attachment #8885279 - Flags: feedback?(matt.woodrow) → feedback+
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> The other alternative would be to only allocate an id for the last
> transaction in a series of repeats.

That seems like it would be more complex to implement because we only know if the transaction triggered a repeat after we allocate the id. And I already have this patch :) I'll add some comments and put it up for proper review.
Comment on attachment 8885699 [details]
Bug 1378966 - Don't cause immediate throttling of the refresh driver on repeat transactions.

https://reviewboard.mozilla.org/r/156530/#review161822
Attachment #8885699 - Flags: review?(matt.woodrow) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6ddf3b3e56d3
Don't cause immediate throttling of the refresh driver on repeat transactions. r=mattwoodrow
https://hg.mozilla.org/mozilla-central/rev/6ddf3b3e56d3
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
Ben, can you verify you get rAF at 60 Hz on a build with this patch?
Flags: needinfo?(bkelly)
I tested on a Pixel. The rAF is improved but still fluctuates between 57-60Hz while rIC is unchanged at around 40Hz. Of course tapping on the screen brings both rAF and rIC to 60Hz sustained with no fluctuations.
I guess we should clone this bug and continue investigating then.
I get similar results to comment 8 on my nexus 5x as well.
Flags: needinfo?(bkelly)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: