1529486 - Dropped frames on youtube.com/tv using GeckoView on FireTV

Reporter

Description

•

5 years ago

Here are some profiles of loading a youtube video on the Fire TV Stick 4K. I noticed that we get a dropped frame every 3-7s on GeckoView but not on Webview.

Profile of selecting a youtube video and play starting:
https://perfht.ml/2TW7BR6
Note auto play didn't work on GV so I had to manually press play

Profiles during playback:
https://perfht.ml/2TZMxJx
https://perfht.ml/2TZMoWq

Codec: vp9
Hardware Accelerated: YES
Resolution 1280x720
Frame Rate: 29.825

Vicky Chin [:vchin]

Reporter

Comment 1

•

5 years ago

YouTube Video Link: https://www.youtube.com/tv#/watch/video/control?v=WifRMy2oOaw&resume

Vicky Chin [:vchin]

Reporter

Updated

•

5 years ago

Flags: needinfo?(matt.woodrow)

Flags: needinfo?(jyavenard)

Flags: needinfo?(cpearce)

Jean-Yves Avenard [:jya]

Updated

•

5 years ago

Flags: needinfo?(jyavenard) → needinfo?(jolin)

Jean-Yves Avenard [:jya]

Comment 2

•

5 years ago

it would be pretty hard to notice a single frames dropped when watching content at 30fps.

That's .4%, YT doesn't bother with frames dropped until it's over 5%.

Anyhow, can you install extension with GV? installing the media devtools would be great and list the amount of frames dropped on the decoding side.

Would allow to determine if the issue is in the compositor (likely if it's a single frame dropped) or in the decoder not being fast enough.

AMD mentioned that they noticed our refresh rate was sometimes slightly over 60fps, this caused their drivers to drop one of the frame we were trying to paint.

Could be a similar issue on android.

Jeff Muizelaar [:jrmuizel]

Comment 3

•

5 years ago

We're not getting proper C++ symbols on those profiles.

Chris Peterson [:cpeterson]

Updated

•

5 years ago

Group: mozilla-employee-confidential

Whiteboard: [gvtv]

Vicky Chin [:vchin]

Reporter

Comment 4

•

5 years ago

Sorry I had a custom GVE build. Here are some new profiles with latest nightly GVE.

Here are profiles with symbols:
Selecting youtube video and play starting:
http://bit.ly/2TWeYI6

During Playback:
http://bit.ly/2U0XPx3
http://bit.ly/2U29hIH

Matt Woodrow (:mattwoodrow)

Comment 5

•

5 years ago

The compositor side profile looks like it's running really smoothly, we spend a bit of time waiting on something when we get a new frame (uploading to the gpu?), but doesn't look like it's causing us to miss frames?

How are you determining that frames are dropped? Using the youtube stats for nerds overlay?

It would be really nice if we could get more profiler markers, especially if the media devtools addon doesn't work for mobile.

I see PImageBridge::Msg_Update called when we get a new frame, having a marker for that would make it easier to spot when there are multiple in a row without a composite.

We could also add an explicit marker when the compositor thread records that it dropped a video frame.

Jean-Yves, Can we also have explicit markers for the other causes of reported frame drops? I see 'Media Playback #1' in the thread list, could we identify when we increment the dropped frame count on that thread? (Or another thread that we can start including in our profiles).

Flags: needinfo?(matt.woodrow) → needinfo?(jyavenard)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 6

•

5 years ago

I've filed bug 1529738 to add markers for when we drop frames in the decode pipeline. If we could get markers for frames dropped in the compositor that would be handy.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 7

•

5 years ago

The other thing to keep in mind here is that with Gecko the actual decode happens in a Java child process, which the profiler won't be sampling. I don't know whether WebView also does this. If they don't, the overhead of running the Java process may contribute to our higher resource usage?

John Lin [:jhlin][:jolin]

Assignee

Comment 8

•

5 years ago

(In reply to Chris Pearce [:cpearce (GMT+13)] from comment #7)

The other thing to keep in mind here is that with Gecko the actual decode happens in a Java child process, which the profiler won't be sampling. I don't know whether WebView also does this. If they don't, the overhead of running the Java process may contribute to our higher resource usage?

Yes, GV does some additional works (mainly IPC transactions and memory copying) compare to WV because of remote decoding. If the CPUs are extremely busy like we saw on Oculus Go in bug 1463904, it could cause dropped frames. Although I doubt that WV can play smoothly in that situation.

Flags: needinfo?(jolin)

Matt Woodrow (:mattwoodrow)

Comment 9

•

5 years ago

I've filed bug 1529795 for the compositor side markers that I asked for.

Vicky Chin [:vchin]

Reporter

Comment 10

•

5 years ago

Profiles from the Cube using the same APK and video

Selecting video and play starting
http://bit.ly/2TWTkU6

During playback:
http://bit.ly/2U2RXTN

X-ray and Advanced options show much higher CPU usage and dropped frames.

John Lin [:jhlin][:jolin]

Assignee

Comment 11

•

5 years ago

Just received the Stick 4K and Cube, installed a local built Firefox on both, and check the video in comment 1. Here are some initial observations:

The 'Frames Dropped' number seems to be calculated from log messages with tag MediaCodecLogger, which are not present in Android and looks like Fire OS customized code. Although the "Frames Dropped"[1] number keeps increasing when playing on Stick 4K, there is no frame dropped and reported to YT's 'stats for nerds' by Gecko media playback pipeline. I also didn't notice dropped frames when watching the video.
On the other hand, Gecko does drop frames when playing the video on Cube and often triggers the catching up seeking behavior. Checking CPU usage with top reveals that Alexa constantly consumes 20% or more CPU in the background (even when the mic is turned off) and leaves much less CPU time for Gecko to play video.

I will keep investigating this issue and see if there are some ways to improve.

[1] https://developer.amazon.com/docs/fire-tv/system-xray-developer-tools.html#video

Assignee: nobody → jolin

Status: NEW → ASSIGNED

Priority: -- → P1

John Lin [:jhlin][:jolin]

Assignee

Comment 12

•

5 years ago

I also noticed some WV threads have better priority and nice values, which could give it some advantage over GV on CPU scheduling.

Chris Peterson [:cpeterson]

Comment 13

•

5 years ago

John, can you tell if WV and GV are using the devices' hardware decoder? Amazon's hardware specs say both the Stick 4K and Cube have H.264 and VP9 hardware decoders.

status-firefox67: --- → affected

Flags: needinfo?(jolin)

John Lin [:jhlin][:jolin]

Assignee

Comment 14

•

5 years ago

(In reply to Chris Peterson [:cpeterson] from comment #13)

John, can you tell if WV and GV are using the devices' hardware decoder? Amazon's hardware specs say both the Stick 4K and Cube have H.264 and VP9 hardware decoders.

Yes, both WV and GV use HW(OMX.xxx) decoders for H.264 and VP9.

Flags: needinfo?(jolin)

John Lin [:jhlin][:jolin]

Assignee

Comment 15

•

5 years ago

It looks like GV frequently performs GC on both Cube and Stick 4K (adb shell top shows ~10% CPU is used by the HeapTaskDaemon thread). I think that's something we could improve by either creating a pool to recycle Java objects used by binder IPC between the content process and remote decoder service or (if feasible) replacing the service with the new RDD process introduced in bug 1471535. I will check the RDD code to see if it can be done.

Vicky Chin [:vchin]

Reporter

Updated

•

5 years ago

Comment 16

•

5 years ago

Can we get someone from the GC team to take a look?

Flags: needinfo?(sdetar)

John Lin [:jhlin][:jolin]

Assignee

Comment 17

•

5 years ago

To be more specific, the GC I referred to in comment 15 is Android JVM GC[1].

[1] https://developer.android.com/topic/performance/vitals/render#object_allocation_and_garbage_collection

Andrew Overholt [:overholt]

Comment 18

•

5 years ago

Thanks for clarifying it was JVM GC and not SpiderMonkey GC, John.

Flags: needinfo?(sdetar)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 19

•

5 years ago

I looked at the MediaPlayback threads in the above profile, a couple of things stood out:

Often we end up spending between 60-100ms calling android::AudioSystem::getOutputLatency(). I think this is doing JNI over to Java audio system? If this blocks the MDSM, this could potentially delay us pushing enough video frames to the compositor. Once my decode-pipeline frame drop markers in Bug 1529738 merge, we can retest and see if they coincide with the getOutputLatency() to test this theory. I don't understand why we have to keep polling the output latency. Maybe we can cache the value, or if we can get away with being a little sloppy with this value we could periodically have whatever is on the other side of this IPC push the value over to the content processes? Or pull it periodically on a background thread?
There are a few times that we end up spending 6ms or so in mozilla::MediaRawData::MediaRawData, inside mozilla::WebMDemuxer::GetNextPacket. Perhaps we should be using an arena here for the allocation of the AlignedBuffer, or recycle buffers, to help the allocation and re-allocation.
Also in mozilla::WebMDemuxer::GetNextPacket we do a lot of copying bytes from the SourceBufferResource's data to the MediaRawData. One option here would be for the SourceBufferResource to store a set of refcounted buffers, and instead of copying the data into the MediaRawData, we could pass a slice and a refcount on the source buffer to the MediaRawData. This would mean we wouldn't need to fix issue 2 (or at least, we'd need to fix it in SourceBufferResource's allocations, but we'd at least get rid of one copy).

jya: What do you think of the above?

Flags: needinfo?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 20

•

5 years ago

Nils pointed out that getOuputLatency() is probably being called inside this function:
https://searchfox.org/mozilla-central/rev/00f3836a87b844b5e4bc82f698c559b9966e4be2/media/libcubeb/src/cubeb_opensl.c#1474

If someone with a working build can write a patch to cache the results of that function in a static on its first call, that might alleviate repeated frame drops.

Paul Adenot (:padenot)

Comment 21

•

5 years ago

(In reply to Chris Pearce [:cpearce (GMT+13)] from comment #19)

I looked at the MediaPlayback threads in the above profile, a couple of things stood out:

Often we end up spending between 60-100ms calling android::AudioSystem::getOutputLatency(). I think this is doing JNI over to Java audio system? If this blocks the MDSM, this could potentially delay us pushing enough video frames to the compositor. Once my decode-pipeline frame drop markers in Bug 1529738 merge, we can retest and see if they coincide with the getOutputLatency() to test this theory. I don't understand why we have to keep polling the output latency. Maybe we can cache the value, or if we can get away with being a little sloppy with this value we could periodically have whatever is on the other side of this IPC push the value over to the content processes? Or pull it periodically on a background thread?

This takes a global lock in the audio system. A way to do this better is to register an listener to know when the audio output device changes, and re-query the audio output latency. The fast way to fix this is to only query this very rarely, and to cache it.

This value can change if the default audio output device changes, for example if you output the sound of the TV via bluetooth.

Paul Adenot (:padenot)

Comment 22

•

5 years ago

This is probably because Alexa is running on the cube, and this lock is contended a bit more than on other devices. Not a hard fix I think.

Alex, do you think you could write a patch that make us call this very rarely or never ?

Flags: needinfo?(achronop)

John Lin [:jhlin][:jolin]

Assignee

Comment 23

•

5 years ago

(In reply to Chris Pearce [:cpearce (GMT+13)] from comment #20)

Nils pointed out that getOuputLatency() is probably being called inside this function:
https://searchfox.org/mozilla-central/rev/00f3836a87b844b5e4bc82f698c559b9966e4be2/media/libcubeb/src/cubeb_opensl.c#1474

If someone with a working build can write a patch to cache the results of that function in a static on its first call, that might alleviate repeated frame drops.

Just tried that in my local GVE build. Still saw frame drops on Cube. :(

Alex Chronopoulos [:achronop]

Comment 24

•

5 years ago

•

Edited

Listener for device changes in android does not exist. We could cache the value. It will not be very elegant but it's the fastest solution right now. That means that on a device change A/V will be out of sync until the next update.

Flags: needinfo?(achronop)

John Lin [:jhlin][:jolin]

Assignee

Comment 25

•

5 years ago

Attached patch Quick hack to elminate per-frame Java method call. — Details — Splinter Review

The patch used for comment 23.

John Lin [:jhlin][:jolin]

Assignee

Comment 26

•

5 years ago

Two profiles during playback with patch in comment 25 applied:

https://perfht.ml/2XCgEsv
https://perfht.ml/2XDjtcT

John Lin [:jhlin][:jolin]

Assignee

Comment 27

•

5 years ago

Forgot to mention: the profiles were recorded on Cube.

John Lin [:jhlin][:jolin]

Assignee

Comment 28

•

5 years ago

Forgot to include media threads in previous profiles. These are the correct ones:

https://perfht.ml/2XzzZL7
https://perfht.ml/2XF3Gdp

John Lin [:jhlin][:jolin]

Assignee

Comment 29

•

5 years ago

Built fxtv locally and compared the frame drop rates when playing the 720p video in comment 1: ~4%-7% with per-frame latency value updates, and ~2%-4% with cached value using attachment 9048252 [details] [diff] [review] applied. It seems worthwhile to me, but still needs to find a way to observe output device changes.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 30

•

5 years ago

I just tested, and Silk browser also drops about 5% frames on the same Mysterious Universe NOVA documentary video on my FireCube.

John Lin [:jhlin][:jolin]

Assignee

Updated

•

5 years ago

Depends on: 1308405

Chris Peterson [:cpeterson]

Updated

•

5 years ago

status-firefox65: --- → wontfix

Whiteboard: [gvtv] → [gvtv:p1]

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

status-firefox66: --- → ?

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 31

•

5 years ago

The fix for getOuputLatency() is included in the patches in bug 1531833.

Depends on: 1531833

Adam Stevenson [:adamopenweb]

Updated

•

5 years ago

Whiteboard: [gvtv:p1] → [gvtv:p1] [media-q2]

Chris Peterson [:cpeterson]

Updated

•

5 years ago

Updated

•

5 years ago

Blocks: gvtv-perf-mvp

Chris Peterson [:cpeterson]

Comment 32

•

5 years ago

John says he's looking at the priority of compositor threads or raising the thread priorities of the media playback or decoder thread pool.

cpearce says he sometimes sees dropped frames that are not counted by Gecko's dropped frame accounting. John says the Android decoders never drop frames internaly, so perhaps these drops are related to surface buffering?

status-firefox66: ? → wontfix

status-firefox67: affected → fix-optional

status-firefox68: --- → affected

John Lin [:jhlin][:jolin]

Assignee

Comment 33

•

5 years ago

(In reply to Chris Peterson [:cpeterson] from comment #32)

John says he's looking at the priority of compositor threads or raising the thread priorities of the media playback or decoder thread pool.

Looks like there have been discussions about raising compositor thread priority and the conclusion is against it.

John Lin [:jhlin][:jolin]

Assignee

Comment 34

•

5 years ago

^ in bug 1366353.

Bobby Holley (:bholley)

Comment 35

•

5 years ago

(In reply to John Lin [:jhlin][:jolin] from comment #33)

(In reply to Chris Peterson [:cpeterson] from comment #32)

John says he's looking at the priority of compositor threads or raising the thread priorities of the media playback or decoder thread pool.

Looks like there have been discussions about raising compositor thread priority and the conclusion is against it.

Do we have any other paths forward to further reduce the frame drops? Do you think the current drop rate should block the GVTV MVP?

Flags: needinfo?(jolin)

John Lin [:jhlin][:jolin]

Assignee

Comment 36

•

5 years ago

(In reply to Bobby Holley (:bholley) from comment #35)

(In reply to John Lin [:jhlin][:jolin] from comment #33)

Do we have any other paths forward to further reduce the frame drops? Do you think the current drop rate should block the GVTV MVP?

For Stick 4K, I think bug 1543625 would be a blocker if playing 60fps video is a must-have for GVTV MVP. The 14-15ms updateTexImage() time in compositor thread makes it nearly impossible to complete a frame in 16ms.

Flags: needinfo?(jolin)

Jean-Yves Avenard [:jya]

Updated

•

5 years ago

Flags: needinfo?(jyavenard)

Adam Stevenson [:adamopenweb]

Comment 37

•

5 years ago

Cpeterson - is this bug still important to us or should we close it for now?

Flags: needinfo?(cpeterson)

Chris Peterson [:cpeterson]

Comment 38

•

5 years ago

(In reply to Adam Stevenson [:adamopenweb] from comment #37)

Cpeterson - is this bug still important to us or should we close it for now?

GeckoView video performance on Fire TV is not a priority right now. I'd rather not close this bug unless we EOL FFTV because I assume the dropped frames are still a problem. Can we instead lower the bug priority to P3 or P5?

status-firefox67: fix-optional → wontfix

status-firefox68: affected → wontfix

Flags: needinfo?(cpeterson)

Summary: Dropped frames on youtube.com/tv using geckoview on FireTV → Dropped frames on youtube.com/tv using GeckoView on FireTV

John Lin [:jhlin][:jolin]

Assignee

Comment 39

•

4 years ago

Demote it to P5.

Priority: P1 → P5

u597032

Comment 40

•

2 years ago

https://support.mozilla.org/en-US/kb/end-support-firefox-amazon-devices

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

Resolution: --- → INCOMPLETE