Closed Bug 643454 Opened 13 years ago Closed 6 years ago

Video is very choppy on Maemo/Android

Categories

(Core :: Audio/Video: Playback, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED INACTIVE

People

(Reporter: romaxa, Unassigned)

References


Details

Attachments

(7 files, 3 obsolete files)

I was playing with the latest upstream fennec with a patched pixman and found that video and sound stutter all the time.

I've checked oprofile and found that ~45% of the CPU is free, but video and sound are still choppy.

IIRC, ~2 months ago it was working fine.
It looks like the problem is not in the YUV conversion or slow rendering; the problem is somewhere in the decoder...

Did we change anything recently?
Yes, we've made changes over the past two months, `hg log -l 50 content/media` will give you a list of changes to the decoder engine. Can you find a regression range?
Yes, I will try to do a quick bisect.
OK, as a starting point for the bisect: m-c revision 59432:b9dcbc836bb3 plays fine even with the old pixman.
Will continue bisecting.
BAD - changeset:   63232:635bb4ffe6ad
user:        Matthew Gregan <kinetik@flim.org>
date:        Wed Mar 02 14:40:44 2011 +1300
summary:     Bug 636894 - Revert bug 634787's change to AUDIO_DURATION_MS to work around a regression in MozAudioAvailable event delivery.  r=roc a=roc

GOOD - changeset:   62891:d30bc9781cfd
user:        Matthew Gregan <kinetik@flim.org>
date:        Mon Feb 21 16:38:29 2011 +1300
summary:     Bug 546700 - Recover gracefully from servers that send Accept-Ranges but don't.  r=roc a=roc


OK, I found that the regression is roughly in this range.
Will try to test 635bb4ffe6ad with the bug 636894 changes reverted.
OK, I found
http://hg.mozilla.org/mozilla-central/rev/23cf0cedfd4a
and without this commit the video plays smoothly; with this commit, audio and video playback are choppy.
Hmm... something is wrong here...
When I use revision http://hg.mozilla.org/mozilla-central/rev/b2d9d4028d67 (63258), it plays video smoothly.
When I use 63259 with the patch http://hg.mozilla.org/mozilla-central/rev/23cf0cedfd4a reverted, it is still choppy.
Hmm... I'm stuck... sometimes it plays video smoothly, and sometimes it is choppy.
The CPU is free all the time, but the audio stream keeps getting interrupted...
OK, something is wrong with the sound, and it seems related to the PulseAudio interaction/writes...
I've tested this with sintel_trailer_800x480 from http://people.xiph.org/~tterribe/tmp/

The no-sound version plays smoothly and fast, with lots of CPU free, etc.,
but the sound version is choppy (both sound and video)...
When I sent kill -STOP to the pulseaudio process, the whole video and the fennec content process got stuck.
Not sure how it is supposed to behave, but it looks like our write to PulseAudio is synchronous, blocking decoding, and something else is going on...

I've tested Flash playback in the same environment, and it is smooth/fast (30 FPS), and audio works fine.
I've tested with the Flash plugin, and when I STOP pulseaudio, Flash continues rendering video for some time and stops decoding only after 6-7 seconds.
Commenting out the nsAudioStreamLocal::Write function makes the video play at 21 FPS (6 FPS before)...
Are there any tricks to the audio write path? Write chunk size, or non-blocking writes?
We do an extra copy of the audio data due to the Audio API. You could try applying the most recent (but obsolete) patch from bug 604682, which eliminates this copy. That patch needs to be reworked, but I'd be curious to see if it makes an impact.
Wait, scratch that, the Audio API gets called outside of nsAudioStreamLocal::Write(), so that won't help this particular problem.
Is our final write happening on a non-main thread? It looks like the sound writes de-sync the video playback.
All of the audio writes happen on a dedicated audio thread, so blocking writes are expected and shouldn't hold up video playback in general.

Would you mind trying a build with the line at

http://mxr.mozilla.org/mozilla-central/source/media/libsydneyaudio/src/sydney_audio_alsa.c#166

changed from 500000 to 1000000?  This behaviour sounds like bug 607200, which I thought we had worked around.
Tried that and it does not help... Removing nsAudioStreamLocal::Write makes the video smooth...
Also tried changing the min_write value; that has no effect either.
I tried the GStreamer backend from bug 422540, and that plays video smoothly with sound.
This is weird; we can get 25 FPS for HTML5 video with CPU to spare (I'm using HW-accelerated fennec), but audio somehow blocks us for nothing...
Has anyone experienced the same problem on Android with HW acceleration enabled? Could it be some platform/audio process scheduling problem?
ajuma and I both noticed it. We thought it was a regression with OGL layers, but we noticed the same problem without OGL layers.
I've also heard that on Maemo we should write with a specific buffer size (4096*2); if a write is not exactly that size, smaller or bigger, it will cause perf problems.
Is there some place or pref I can use to specify the preferred write buffer size?
I'm pretty sure this is going to turn out to be a(nother) bad interaction with PulseAudio.  I'll post a debug logging patch a bit later today.

(In reply to Oleg Romashin (:romaxa) from comment #20)
> I've also heard that on Maemo we should write with a specific buffer size
> (4096*2); if a write is not exactly that size, smaller or bigger, it will
> cause perf problems.
> Is there some place or pref I can use to specify the preferred write buffer
> size?

Oh, sorry, I was missing a bit of information during the IRC discussion.  You could try modifying the logic inside the else branch:

http://mxr.mozilla.org/mozilla-central/source/media/libsydneyaudio/src/sydney_audio_alsa.c#259

...so that avail is whatever your magic buffer size is.  Note that avail's units are frames (not bytes), so if your 4096*2 magic size is in bytes you'll need to convert it to frames with snd_pcm_bytes_to_frames first.
Checked it, and now we have Write calls with aCount = 1024, but I need 4096. How do I ask the decoder to decode bigger chunks?
Attached patch debug patch v0 (obsolete) — Splinter Review
Please try applying this patch and reproducing the bug, then attach a copy of the output to the bug.

There's some additional test code disabled via #if, where the original code is inside the #if 1 block and the test code is in the else block. Please also test those. The first is below; change it to #if 0 to enable the test code:

+// Disable this to test for bug 669556.
+#if 1


The second is in two parts, one in nsBuiltinDecoderStateMachine:

+// Disable to use 8k write batching path.
+#if 1

and one in sydney_audio_alsa.c:

+/* Disable to use 8k write batching path. */
+#if 1
Attached file Output with patch
Attached file Enabled 8kb buffer size (obsolete) —
Thanks.

It looks like you didn't enable the second part of the 8k buffer code for the second run; otherwise it should be logging |write(8192)->1| rather than higher numbers after the ->.

If the low frame rate happens from the start of playback, it'd be useful to have complete logs from the first 5 seconds or so.  There's a bunch of debug info printed at the start of playback that would be useful to see, too.

Given the lack of |write xrun| messages (and assuming they're not happening frequently in the parts of the log not included), that excludes bug 607200.
Attached file Enabled 8kb buffer size (obsolete) —
Attachment #559045 - Attachment is obsolete: true
Attachment #559049 - Attachment is obsolete: true
Attached file pactl list
Also found that the WebM video is choppy with the cubeb backend
http://clips.vorwaerts-gmbh.de/big_buck_bunny.webm
and even worse than with SA.
But the .ogv version:
http://clips.vorwaerts-gmbh.de/big_buck_bunny.ogv
works fine with both the SA and cubeb backends, and produces 24 FPS.

So it seems the problem is somewhere in the WebM codec / audio/video sync.
Plus, on the N9 with HW-accelerated OGV playback we have 25% CPU free at 25 FPS;
for the same WebM video with sa_write disabled, I get 20 FPS and 20% CPU free.
Attached file Profile data
OK, retested once again on another device and found different results.
I disabled skipToNextKeyframe = PR_TRUE; in order to avoid decoding interruptions and get full profile data.
I found that in both the .ogv and WebM cases we use almost all of the CPU and frame dropping kicks in; it is more visible in the WebM case because WebM is more expensive to decode.
In practice, if we disable frame dropping we get smoother video (almost no problems) while using the full CPU to good effect.
When frame dropping triggers, we seem to just break video/audio sync, stop decoding most frames, and free up CPU; instead of dropping some frames we drop almost all of them (60%).

One way to fix this problem is to free up CPU by optimizing the rendering pipeline and give the decoder more headroom (or get a more powerful device).
Another way is to make the frame-dropping mechanism more effective and skip frames without busting the whole playback...
(In reply to Oleg Romashin (:romaxa) from comment #33)
> One way to fix this problem is to free up CPU by optimizing the rendering
> pipeline and give the decoder more headroom (or get a more powerful device)

That will fail to work as soon as someone makes a larger video.

> Another way is to make the frame-dropping mechanism more effective and skip
> frames without busting the whole playback...

There isn't really a way to just skip some frames in Theora, and for VP8 you could only do it if the file was encoded specifically to allow it, but I don't think libvpx has the API to support it (basically to skip frame n in VP8 you'd need to check that a) it is not a new golden or alt-ref frame (easy) and b) frame n+1 does not use the previous frame as a predictor... for almost every file in existence b) is unlikely to happen for non-keyframes, and requires a significant amount of decoding to check).

What we _should_ do is make it harder to go into keyframe skipping mode. Because it's a decision we can't undo until we get to the next keyframe, we shouldn't activate it when there are "almost no problems".
Yes, we probably should tweak the frame-dropping conditions... but it is still bad that one trigger of skipToNextKeyframe = PR_TRUE breaks video and audio playback for 1 second.
There are many hacks that could be used to speed up video decoding at the cost of introducing visual artefacts. One more trick is to play the video a bit slower than normal and correct the audio pitch.

But the most useful solution, the one I myself would like to see as a user, would be video transcoding support: the browser detects that the video can't be played back in realtime on the available hardware and suggests that the user wait a bit until it gets re-encoded to a lower resolution. If the user has a charger and battery life is not an issue, that may be a viable solution. I actually found myself in such a situation at least once (in a hotel room with just a phone and no laptop) and regretted not being able to watch some video on the web.
Another way is to find a DSP decoding implementation for WebM, and possibly for Theora, and just use them on mobile where possible... IIUC, right now we have only x264 DSP-optimized codecs, which are accessible via GStreamer...
We have a Theora implementation for TI C64x+. http://code.entropywave.com/leonora/
(In reply to Oleg Romashin (:romaxa) from comment #37)
> Another way is to find a DSP decoding implementation for WebM, and possibly
> for Theora, and just use them on mobile where possible... IIUC, right now we
> have only x264 DSP-optimized codecs, which are accessible via GStreamer...

Right, we commissioned a C64x port of Theora called Leonora (and in fact I worked on it some myself). There were a number of issues with making it production-ready:

a) it failed on small frame sizes due to some cache flushing bug (probably could be worked around by just decoding those in software),
b) it has all the DSP resource limit and robustness problems (it'll work fine for one video in one tab, but more than that you'll have problems, many of which result in device reboots, which is a suboptimal thing to allow web page content to produce),
c) actually getting the data to the screen in RGB required either significant CPU, or custom TI kernel modules not shipped with the device that often failed to work (e.g., the first attempt to play video always failed for me) and also sometimes panicked the kernel. But see http://blog.mjg.im/2010/04/16/theora-on-n900.html for more details on that part.

On the N900 it was only a _little_ slower than the pure-software version Robin Watts and I did later with ARM asm (for an A8 chip running at 600 MHz vs. the DSP at 430 MHz). WebM would be even worse. Getting better performance would probably require explicitly managing the cache, which requires some major re-architecting of the decoder (Leonora used a mostly-unmodified libtheora with some accelerator functions written with TI intrinsics), or figuring out how to use the programmable hardware for motion compensation and loop filtering, etc., that the H.264 decoders use (at least theoretically programmable... the only docs I was ever able to find were "stick this blob of hex values into this address to enable RV9, this other blob for MPEG4, this other blob for H.264, etc.").

In other words, producing something like this is not an easy undertaking.
OK, sounds tricky, but on the other hand I think I know how to optimize the rendering pipeline in order to free up CPU for decoding... at least it is doable on Maemo:
1) Create IPC channel from video decoding thread to Chrome main-thread (could help also for android pipeline so we can avoid sync with main thread and related planes copy)
2) Make texture swapping from decoding thread to Chrome (no upload)
3) Try different ways of uploading texture:
  a) Decode yuv directly into locked EGL texture
  b) upload planes to normal texture and use YUV shader

2) and 3b) could be used on Android if we find a way to share textures between processes.

That should give us CPU headroom for decoding.
Checked the skipKeyFrame conditions in more detail:
First we lack GetDecodedAudioDuration;
if we disable that check, then later we run short of data in the video queue.

But with skipToNextKeyframe disabled, we get more essential frame dropping and the whole video becomes watchable... so I'm wondering, can we just drop it? Because with skipToNextKeyframe enabled we break the whole video experience completely...


Another idea: when decoding slowness happens, do not set skipToNextKeyframe = true (which breaks audio/video sync and needs about 1-2 seconds to recover); instead, just stop sending updates to Layout, so we temporarily give the decoder more CPU, build up more decoded audio and video data, and then resume layout rendering...
(In reply to Oleg Romashin (:romaxa) from comment #40)
> 2) Make texture swapping from decoding thread to Chrome (no upload)

Right, this would help a lot, and was planned, but hasn't happened yet. See bug 656185 comment 15. libtheora and libvpx would also benefit from modifications to allow them to decode into a user-specified buffer. I started a libtheora API design for this at http://pastebin.mozilla.org/1203306 but never did the actual implementation. Google seemed interested in doing something similar for libvpx, but that hasn't happened yet, either.
(In reply to Oleg Romashin (:romaxa) from comment #41)
> But with skipToNextKeyframe disabled, we get more essential frame dropping
> and the whole video becomes watchable... so I'm wondering, can we just drop
> it? Because with skipToNextKeyframe enabled we break the whole video
> experience completely...

Right, I think this is the easiest avenue to explore. And as I said above, will still be useful even in the face of other performance optimizations (otherwise you still get failures, just on slightly larger videos).

> Another idea: when decoding slowness happens, do not set
> skipToNextKeyframe = true (which breaks audio/video sync and needs about
> 1-2 seconds to recover); instead, just stop sending updates to Layout, so
> we temporarily give the decoder more CPU, build up more decoded audio and
> video data, and then resume layout rendering...

Yes, this is also a good idea, but it would be even better to get the cost of doing updates with GL layers turned on low enough that this doesn't matter.
> of doing updates with GL layers turned on low enough that this doesn't
> matter.
That is not only the GL layers update, but also painting, LayerManager manipulations, etc.

> modifications to allow them to decode into a user-specified buffer. I
> started a libtheora API design for this at

Currently, if we go with the locked-texture approach, we can just take the planes and do the YUV conversion directly into the locked texture buffer...
Otherwise, we upload the YUV planes into 3 textures with glTexImage2D and use a YUV shader...

But if the decoder allows decoding into a user-specified buffer, then that buffer can be a locked YUV texture, which provides a memory buffer where you can write YUV data directly. That is practically texture streaming...
> that buffer can be a locked YUV texture, which provides a memory buffer where
That is actually already available on Maemo Harmattan (N9).
Attachment #559036 - Attachment is obsolete: true
The intention of the current frame skipping logic is that the audio should continue playing back seamlessly.  If that's not happening, that's a bug.

It sounds like the decode-time frame skipping needs to be less aggressive.  I'm not sure how much tuning it has seen on low powered devices.
I tried to use texture streaming for WebM some time ago but gave up on it, because vp8 does not support writing to packed UYVY formats, which the N9 can display directly just by enabling the format via flags. The planar images would have to be displayed through a YUV shader and 3 separate textures, I guess.
(In reply to Timothy B. Terriberry (:derf) from comment #42)
> http://pastebin.mozilla.org/1203306 but never did the actual implementation.

heeen pointed out on IRC that this link is dead. I guess I just had a copy saved by SessionStore in a window I hadn't closed for a few months. http://pastebin.mozilla.org/1326966 should work.
Attached file vp8 CPU usage
I've actually implemented direct rendering and freed up some CPU, and now I have 24 FPS on that video with sound enabled, plus ~2% CPU free.
But the VP8 codec is still too expensive, ~12% CPU, with vp8_decode_mb_tokens at the top of the profile.
I think we should get an ARM version of vp8_decode_mb_tokens without waiting for bug 645284. That could give us really fast YouTube HTML5 rendering.
I've implemented direct compositing from the video thread -> chrome process, also added back the old SW yuv2rgb565 conversion path, and found that
and found that
DecoderYUV->Copy YUV to ShmemYUV->upload data into locked Texture memory
is actually slower on N9 than
DecoderYUV->Convert YUV to ShmemRGB(neon565)->copy data into locked Texture memory

With HW YUV conversion I see libGLES_v2 using almost 4x more CPU than the simple paint-into-locked-texture path; I also see some weird generic kernel interrupt eating almost the same amount of CPU.

Attached top of oprofile for both cases
Of course, copying into a locked texture is again possible only on Maemo, but it could also be that our plane upload + YUV shader code is not very friendly to generic GLES drivers...
Also noticed a strange thing... on the YouTube page, while playing video, we destroy and re-create shadow Image layers almost every second...
Depends on: 686770
Depends on: 688363
Bug 688363 covers the frame skipping problem.  I'll try to look at that soon.
Depends on: 693131, 693095
Depends on: 693905
No longer depends on: 693095
Since pastebin keeps expiring things even if they're set to be kept forever, I should just attach the proposed API from comment 42.
Component: Audio/Video → Audio/Video: Playback
Mass closing due to inactivity.
Feel free to re-open if still needed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE