Bug 643454
Opened 13 years ago
Closed 6 years ago
Video is very choppy on Maemo/Android
Categories
(Core :: Audio/Video: Playback, defect)
RESOLVED
INACTIVE
People
(Reporter: romaxa, Unassigned)
Attachments
(7 files, 3 obsolete files)
I was playing with the latest upstream fennec with a patched pixman and found that video and sound are interrupted all the time. I checked oprofile and found that ~45% of the CPU is free, but video and sound are still choppy. IIRC it was working fine ~2 months ago.
Reporter | ||
Comment 1•13 years ago
It looks like the problem is not in YUV conversion or slow rendering; the problem is somewhere in the decoder... did we change anything recently?
Comment 2•13 years ago
Yes, we've made changes over the past two months, `hg log -l 50 content/media` will give you a list of changes to the decoder engine. Can you find a regression range?
Reporter | ||
Comment 3•13 years ago
Yes, I will try to do a quick bisect.
Reporter | ||
Comment 4•13 years ago
OK, as a starting point for the bisect: mozilla-central revision 59432:b9dcbc836bb3 plays fine even with the old pixman. Will continue bisecting.
Reporter | ||
Comment 5•13 years ago
BAD:
changeset: 63232:635bb4ffe6ad
user: Matthew Gregan <kinetik@flim.org>
date: Wed Mar 02 14:40:44 2011 +1300
summary: Bug 636894 - Revert bug 634787's change to AUDIO_DURATION_MS to work around a regression in MozAudioAvailable event delivery. r=roc a=roc
GOOD:
changeset: 62891:d30bc9781cfd
user: Matthew Gregan <kinetik@flim.org>
date: Mon Feb 21 16:38:29 2011 +1300
summary: Bug 546700 - Recover gracefully from servers that send Accept-Ranges but don't. r=roc a=roc
OK, I found that the regression is roughly in this range. Will try to test 635bb4ffe6ad with the bug 636894 changes reverted.
Reporter | ||
Comment 6•13 years ago
OK, I found http://hg.mozilla.org/mozilla-central/rev/23cf0cedfd4a: without this commit video plays smoothly; with this commit audio and video playback are choppy.
Reporter | ||
Comment 7•13 years ago
Hmm.. something is wrong here... when I use revision http://hg.mozilla.org/mozilla-central/rev/b2d9d4028d67 (63258), video plays smoothly. When I use 63259 with the patch http://hg.mozilla.org/mozilla-central/rev/23cf0cedfd4a reverted, it is still choppy.
Reporter | ||
Comment 8•13 years ago
Hmm.. I'm stuck... sometimes video plays smoothly, and sometimes it is choppy. The CPU is free all the time, but the audio stream keeps getting interrupted...
Reporter | ||
Comment 9•13 years ago
OK, something is wrong with sound, and it seems to be the pulseaudio interaction/write... I tested this with sintel_trailer_800x480 from http://people.xiph.org/~tterribe/tmp/ and the no-sound version plays smoothly and fast, with lots of CPU free, etc., but the version with sound is choppy (both sound and video)... When I sent kill -STOP to the pulseaudio process, the whole video and the fennec content process got stuck. Not sure how it should behave, but it looks like our write to pulseaudio is synchronous and blocks decoding, and something else is going on... I tested Flash playback in the same environment, and it is smooth/fast (30 FPS), and audio works fine.
Reporter | ||
Comment 10•13 years ago
I tested with the Flash plugin: when I STOP pulseaudio, Flash continues rendering video for some time and stops decoding only after 6-7 seconds.
Reporter | ||
Comment 11•13 years ago
I commented out the nsAudioStreamLocal::Write function, and video started playing at 21 FPS (6 FPS before)... Are there any tricks for the audio write functionality? Write chunk size, or non-blocking writes?
Comment 12•13 years ago
We do an extra copy of the audio data due to the Audio API. You could try applying the most recent (but obsolete) patch from bug 604682, which eliminates this copy. That patch needs to be reworked, but I'd be curious to see if it makes an impact.
Comment 13•13 years ago
Wait, scratch that, the Audio API gets called outside of nsAudioStreamLocal::Write(), so that won't help this particular problem.
Reporter | ||
Comment 14•13 years ago
Is our final write happening on a non-main thread? It looks like the sound writes de-sync video playback.
Comment 15•13 years ago
All of the audio writes happen on a dedicated audio thread, so blocking writes are expected and shouldn't hold up video playback in general. Would you mind trying a build with the line at http://mxr.mozilla.org/mozilla-central/source/media/libsydneyaudio/src/sydney_audio_alsa.c#166 changed from 500000 to 1000000? This behaviour sounds like bug 607200, which I thought we had worked around.
Reporter | ||
Comment 16•13 years ago
Tried that and it does not help... removing nsAudioStreamLocal::Write makes video smooth. I also tried changing the min_write value, and that does not have any effect either.
Reporter | ||
Comment 17•13 years ago
I tried the gstreamer backend from bug 422540, and that plays video smoothly with sound.
Reporter | ||
Comment 18•13 years ago
This is weird: we can get 25 FPS for HTML5 video with CPU to spare (I'm using HW-accelerated fennec), but audio is somehow blocking us for nothing... Has anyone experienced the same problem on Android with HW acceleration enabled? Could it be some platform/audio process scheduling problem?
Comment 19•13 years ago
ajuma and I both noticed it. We thought it was a regression with OGL Layers but noticed the same problem without OGL Layers.
Reporter | ||
Comment 20•13 years ago
I've also heard that on Maemo we should write with a specific buffer size (4096*2); if the size is not equal to that, whether smaller or bigger, it will cause perf problems. Is there some place or pref I can use to specify the preferred write buffer size?
Comment 21•13 years ago
I'm pretty sure this is going to turn out to be a(nother) bad interaction with PulseAudio. I'll post a debug logging patch a bit later today.
(In reply to Oleg Romashin (:romaxa) from comment #20)
> I've also heard that on maemo we should write with specific buffer size
> (4096*2), if it is not equals to that, less or bigger, than it will cause
> perf problems,
> Is there are some place or pref I can use in order to specify preffered
> writable buffer size?
Oh, sorry, I was missing a bit of information during the IRC discussion. You could try modifying the logic inside the else branch:
http://mxr.mozilla.org/mozilla-central/source/media/libsydneyaudio/src/sydney_audio_alsa.c#259
...so that avail is whatever your magic buffer size is. Note that avail's units are frames (not bytes), so if your 4096*2 magic size is in bytes you'll need to convert it to frames with snd_pcm_bytes_to_frames first.
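To make the frames-versus-bytes point concrete, here is a minimal sketch of the arithmetic. It does not call ALSA; bytes_to_frames is a hypothetical stand-in for what snd_pcm_bytes_to_frames computes from a configured PCM handle, and the channel count and sample width used in the note below are assumptions, not values taken from this bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an ALSA function): converts a byte count to
 * frames for interleaved PCM. One frame holds one sample per channel. */
size_t bytes_to_frames(size_t bytes, unsigned channels, unsigned bytes_per_sample)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return bytes / ((size_t)channels * bytes_per_sample);
}
```

Assuming 16-bit stereo audio, a 4096*2 byte buffer would be 2048 frames, so that is the number avail would need to be set to in that case.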
Reporter | ||
Comment 22•13 years ago
Checked it, and we now have Write calls with aCount = 1024, but I need 4096. How do I ask the decoder to decode bigger chunks?
Comment 23•13 years ago
Please try applying this patch and reproducing the bug, then attach a copy of the output to the bug.
There's some additional test code disabled via #if, where the original code is inside the #if 1 block and the test code is in the else block. Please also test those. The first is below; change it to #if 0 to enable the test code:
+// Disable this to test for bug 669556.
+#if 1
The second is in two parts, one in nsBuiltinDecoderStateMachine:
+// Disable to use 8k write batching path.
+#if 1
and one in sydney_audio_alsa.c:
+/* Disable to use 8k write batching path. */
+#if 1
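For readers without the patch at hand, the general idea of write batching can be sketched as follows. This is not the code from the attached patch: the batcher type, the sink callback, and counting_sink are hypothetical names for illustration, and the 8192-byte batch size is an assumption taken from the 4096*2 figure in comment 20.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_BYTES 8192 /* assumed batch size (4096*2 from comment 20) */

/* Hypothetical sink callback: receives exactly one full batch per call. */
typedef void (*sink_fn)(const unsigned char *data, size_t len, void *ctx);

typedef struct {
    unsigned char buf[BATCH_BYTES];
    size_t used;   /* bytes currently buffered */
    sink_fn sink;
    void *ctx;
} batcher;

void batcher_init(batcher *b, sink_fn sink, void *ctx)
{
    assert(b && sink);
    b->used = 0;
    b->sink = sink;
    b->ctx = ctx;
}

/* Accumulate small writes; hand data to the sink only in full batches.
 * Returns the number of batches flushed by this call. */
size_t batcher_write(batcher *b, const unsigned char *data, size_t len)
{
    size_t flushed = 0;
    while (len > 0) {
        size_t room = BATCH_BYTES - b->used;
        size_t n = len < room ? len : room;
        memcpy(b->buf + b->used, data, n);
        b->used += n;
        data += n;
        len -= n;
        if (b->used == BATCH_BYTES) {
            b->sink(b->buf, BATCH_BYTES, b->ctx);
            b->used = 0;
            flushed++;
        }
    }
    return flushed;
}

/* Example sink that only counts calls and bytes; it stands in for a
 * real device write. */
typedef struct { size_t calls; size_t bytes; } sink_stats;

void counting_sink(const unsigned char *data, size_t len, void *ctx)
{
    sink_stats *s = ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
}
```

A real implementation would also need to flush the partial tail at end of stream, which this sketch leaves out.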
Reporter | ||
Comment 24•13 years ago
Reporter | ||
Comment 25•13 years ago
Comment 26•13 years ago
Thanks. It looks like you didn't enable the second part of the 8k buffer code for the second run; otherwise it should be logging |write(8192)->1| rather than higher numbers after the ->. If the low frame rate happens from the start of playback, it'd be useful to have complete logs from the first 5 seconds or so. There's a bunch of debug info printed at the start of playback that would be useful to see, too. Given the lack of |write xrun| messages (and assuming they're not happening frequently in the parts of the log not included), that excludes bug 607200.
Reporter | ||
Comment 27•13 years ago
Attachment #559045 -
Attachment is obsolete: true
Reporter | ||
Comment 28•13 years ago
Attachment #559049 -
Attachment is obsolete: true
Reporter | ||
Comment 29•13 years ago
Reporter | ||
Comment 30•13 years ago
I also found that WebM video is choppy with the cube backend too: http://clips.vorwaerts-gmbh.de/big_buck_bunny.webm, and even worse than with SA. But the OGV version, http://clips.vorwaerts-gmbh.de/big_buck_bunny.ogv, works fine with both the SA and cube backends and produces 24 FPS, so the problem seems to be somewhere in web-codec/audio/video sync.
Reporter | ||
Comment 31•13 years ago
Plus, on the N9 with HW-accelerated OGV playback we have 25% CPU free at 25 FPS.
Reporter | ||
Comment 32•13 years ago
For the same WebM video with sa_write disabled, I get 20 FPS and 20% CPU free.
Reporter | ||
Comment 33•13 years ago
OK, I retested it on another device and found different results. I disabled skipToNextKeyframe = PR_TRUE; in order to avoid decoding interrupts and get full profile data.
I found that in both the OGV and WebM cases we are using almost all of the CPU, and frame dropping kicks in. It is more visible in the WebM case because that is more expensive. In practice, if we disable frame dropping, video is smoother (almost no problems) and we use the full CPU efficiently. When frame dropping is triggered, we seem to just break video/audio sync and stop decoding most frames, freeing CPU; instead of dropping some frames we drop almost all of them (60%).
One way to fix this problem is to free CPU by optimizing the rendering pipeline and give more room to the decoding mechanism (or get a more powerful device).
Another way is to make the frame dropping mechanism more effective and skip frames without busting the whole playback...
Comment 34•13 years ago
(In reply to Oleg Romashin (:romaxa) from comment #33)
> One way to fix this problem is to free CPU by optimizing rendering pipeline
> and give more space for decoding mechanism (or get more powerfull device)
That will fail to work as soon as someone makes a larger video.
> Another way is to make frame dropping mechanism more effective, and skip
> frames without busting whole playback...
There isn't really a way to just skip some frames in Theora, and for VP8 you could only do it if the file was encoded specifically to allow it, but I don't think libvpx has the API to support it (basically to skip frame n in VP8 you'd need to check that a) it is not a new golden or alt-ref frame (easy) and b) frame n+1 does not use the previous frame as a predictor... for almost every file in existence b) is unlikely to happen for non-keyframes, and requires a significant amount of decoding to check).
What we _should_ do is make it harder to go into keyframe skipping mode. Because it's a decision we can't undo until we get to the next keyframe, we shouldn't activate it when there are "almost no problems".
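One way to picture "harder to go into keyframe skipping mode" is a hysteresis check that only enables skipping after the decoder has been behind for several consecutive checks. The sketch below is purely illustrative: skip_policy, its fields, and the threshold are hypothetical and are not taken from nsBuiltinDecoderStateMachine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy state, for illustration only. */
typedef struct {
    unsigned behind_streak; /* consecutive checks where decode was behind */
    unsigned threshold;     /* streak length required before skipping */
    bool skipping;          /* true while skipping to the next keyframe */
} skip_policy;

void skip_policy_init(skip_policy *p, unsigned threshold)
{
    assert(p && threshold > 0);
    p->behind_streak = 0;
    p->threshold = threshold;
    p->skipping = false;
}

/* Call once per decode-loop check; returns whether skipping is active. */
bool skip_policy_update(skip_policy *p, bool decode_is_behind, bool at_keyframe)
{
    if (p->skipping) {
        /* Once skipping has started, only reaching a keyframe ends it. */
        if (at_keyframe) {
            p->skipping = false;
            p->behind_streak = 0;
        }
        return p->skipping;
    }
    if (decode_is_behind) {
        if (++p->behind_streak >= p->threshold)
            p->skipping = true;
    } else {
        /* A single healthy check resets the streak. */
        p->behind_streak = 0;
    }
    return p->skipping;
}
```

The point of the streak counter is that a brief dip does not trigger a decision that cannot be undone until the next keyframe; what threshold would be appropriate on low-powered devices is an open tuning question.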
Reporter | ||
Comment 35•13 years ago
Yes, probably we should tweak the frame dropping conditions... but it is still bad that triggering skipToNextKeyframe = PR_TRUE once breaks video and audio playback for 1 second.
Comment 36•13 years ago
There are many hacks which could be used to speed up video decoding at the cost of introducing visual artefacts. One more trick is to play video a bit slower than normal and correct audio pitch. But the most useful solution that I myself would like to see as a user would be a video transcoding support. So that the browser detects that the video can't be played back in realtime on the available hardware and suggests the user to wait a bit until it gets re-encoded to lower resolution. If the user has a charger and the battery life is not an issue, that may be a viable solution. Actually I happened to be in such a situation at least once (in a hotel room with just a phone, but no laptop) and regretted not being able to watch some video on the web.
Reporter | ||
Comment 37•13 years ago
Another way is to find a DSP decoding implementation for WebM and possibly for Theora somewhere... and just use them on mobile where possible... IIUC right now we only have x264 DSP-optimized codecs, which are accessible via gstreamer...
Comment 38•13 years ago
We have a Theora implementation for TI C64x+. http://code.entropywave.com/leonora/
Comment 39•13 years ago
(In reply to Oleg Romashin (:romaxa) from comment #37)
> Another way is to find somewhere DSP decoding implementation for WebM and
> possibly for Theora... and just use the on mobile where it is possible...
> IIUC right now we have only x264 dsp optimized codecs which are accessible
> via gstreamer...
Right, we commissioned a C64x port of Theora called Leonora (and in fact I worked on it some myself). There were a number of issues with making it production-ready:
a) it failed on small frame sizes due to some cache flushing bug (probably could be worked around by just decoding those in software),
b) it has all the DSP resource limit and robustness problems (it'll work fine for one video in one tab, but more than that you'll have problems, many of which result in device reboots, which is a suboptimal thing to allow web page content to produce),
c) actually getting the data to the screen in RGB required either significant CPU, or custom TI kernel modules not shipped with the device that often failed to work (e.g., the first attempt to play video always failed for me) and also sometimes panicked the kernel. But see http://blog.mjg.im/2010/04/16/theora-on-n900.html for more details on that part.
On the N900 it was only a _little_ slower than the pure-software version Robin Watts and I did later with ARM asm (for an A8 chip running at 600 MHz vs. the DSP at 430 MHz). WebM would be even worse. Getting better performance would probably require explicitly managing the cache, which requires some major re-architecting of the decoder (Leonora used a mostly-unmodified libtheora with some accelerator functions written with TI intrinsics), or figuring out how to use the programmable hardware for motion compensation and loop filtering, etc., that the H.264 decoders use (at least theoretically programmable... the only docs I was ever able to find were "stick this blob of hex values into this address to enable RV9, this other blob for MPEG4, this other blob for H.264, etc.").
In other words, producing something like this is not an easy undertaking.
Reporter | ||
Comment 40•13 years ago
OK, that sounds tricky, but on the other hand I think I know how to optimize the rendering pipeline in order to free CPU for decoding.. at least it is doable on Maemo:
1) Create an IPC channel from the video decoding thread to the Chrome main thread (could also help the Android pipeline, so we can avoid the sync with the main thread and the related plane copies)
2) Do texture swapping from the decoding thread to Chrome (no upload)
3) Try different ways of uploading the texture:
a) Decode YUV directly into a locked EGL texture
b) Upload planes to a normal texture and use a YUV shader
2) and 3) b) could be used on Android if we find a way to share textures between processes.
That should give us CPU headroom for decoding.
Reporter | ||
Comment 41•13 years ago
I checked the skipKeyFrame conditions in more detail. First we run short on GetDecodedAudioDuration; if we disable that check, then later we run short of data in the video queue. But with skipToNextKeyframe disabled, we get more sensible frame dropping and the whole video becomes watchable... so I'm wondering, can we just drop that? Because with skipToNextKeyframe enabled we just break the whole video experience completely...
Another idea: when decoding is slow, we do not set skipToNextKeyframe = true (which breaks audio/video sync and needs about 1-2 seconds to recover), but instead just stop sending updates to Layout, so we temporarily give more CPU to the decoder, get more decoded audio and video data, and then resume layout rendering..
Comment 42•13 years ago
(In reply to Oleg Romashin (:romaxa) from comment #40)
> 2) Make texture swapping from decoding thread to Chrome (no upload)
Right, this would help a lot, and was planned, but hasn't happened yet. See bug 656185 comment 15. libtheora and libvpx would also benefit from modifications to allow them to decode into a user-specified buffer. I started a libtheora API design for this at http://pastebin.mozilla.org/1203306 but never did the actual implementation. Google seemed interested in doing something similar for libvpx, but that hasn't happened yet, either.
Comment 43•13 years ago
(In reply to Oleg Romashin (:romaxa) from comment #41)
> But with disabled skipToNextKeyframe, we have more essential frame dropping
> and video whole video become watchable... so wondering can we just drop
> that? because with skipToNextKeyframe enabled we just breaking whole video
> experience completely...
Right, I think this is the easiest avenue to explore. And as I said above, will still be useful even in the face of other performance optimizations (otherwise you still get failures, just on slightly larger videos).
> Another assumption, is when decoding slowness happening then we do not set
> skipToNextKeyframe = true (which breaks audio/video sync and need about 1-2
> seconds to restore back), but instead of just stop sending updates to
> Layout, so we give temporary more CPU for decoder, get more decoded audio
> and video data, and then resume layout rendering..
Yes, this is also a good idea, but it would be even better to get the cost of doing updates with GL layers turned on low enough that this doesn't matter.
Reporter | ||
Comment 44•13 years ago
> of doing updates with GL layers turned on low enough that this doesn't
> matter.
That is not only GL layer updates, but also paint, LayerManager manipulations, etc.
> modifications to allow them to decode into a user-specified buffer. I
> started a libtheora API design for this at
Currently, if we go with the locked texture approach, we can just take the planes and do the YUV conversion directly into the locked texture buffer... otherwise we upload the YUV planes with glTexImage2D into 3 textures and use a YUV shader... but if the decoder allows decoding into a user-specified buffer, then that buffer can be a locked YUV texture, which provides a memory buffer where you can write YUV data directly. That is practically texture streaming...
Reporter | ||
Comment 45•13 years ago
> buffer can be locked yuv texture, which provide memory buffer where you can
That is actually already available on Maemo Harmattan N9.
Updated•13 years ago
Attachment #559036 -
Attachment is obsolete: true
Comment 46•13 years ago
The intention of the current frame skipping logic is that the audio should continue playing back seamlessly. If that's not happening, that's a bug. It sounds like the decode-time frame skipping needs to be less aggressive. I'm not sure how much tuning it has seen on low powered devices.
Comment 47•13 years ago
I tried to use texture streaming for webm some time ago but gave up on it because vp8 does not support writing to packed UYVY formats which the N9 can display directly just by enabling the format via flags. The planar images would have to be displayed through a yuv shader and 3 separate textures I guess.
Comment 48•13 years ago
(In reply to Timothy B. Terriberry (:derf) from comment #42)
> http://pastebin.mozilla.org/1203306 but never did the actual implementation.
heeen pointed out on IRC that this link is dead. I guess I just had a copy saved by SessionStore in a window I hadn't closed for a few months. http://pastebin.mozilla.org/1326966 should work.
Reporter | ||
Comment 49•13 years ago
I have actually implemented direct rendering and freed some CPU, and now I get 24 FPS on that video with sound enabled, + ~2 CPU free. But the vp8 codec is still too expensive, ~12% CPU, with vp8_decode_mb_tokens at the top of the profile. I think we should get some ARM version of vp8_decode_mb_tokens without bug 645284. That could give us really fast YouTube HTML5 rendering.
Reporter | ||
Comment 50•13 years ago
I've implemented direct compositing from the video thread -> chrome process, and also added the old SW yuv2rgb565 conversion path, and found that
DecoderYUV -> copy YUV to ShmemYUV -> upload data into locked texture memory
is actually slower on the N9 than
DecoderYUV -> convert YUV to ShmemRGB (neon565) -> copy data into locked texture memory
With HW YUV conversion I see libGLES_v2 using almost 4x more CPU than the simple paint-into-locked-texture path; I also see some weird kernel generic interrupt which is eating almost the same amount of CPU.
Attached the top of the oprofile for both cases.
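For context on what a software YUV to RGB565 step involves, here is a scalar sketch. It is not the NEON path referred to above: the function names are hypothetical, and the BT.601 limited-range integer coefficients are an assumption about the colour matrix, chosen because they are a common fixed-point approximation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a fixed-point value (multiplied by 256) down and clamp to 0..255.
 * Negative inputs are clamped before shifting to avoid shifting a
 * negative int. */
static int clamp_q8(int v)
{
    if (v < 0) return 0;
    v >>= 8;
    return v > 255 ? 255 : v;
}

/* Convert one YUV pixel to RGB565, assuming BT.601 limited range. */
uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v)
{
    int c = 298 * ((int)y - 16) + 128; /* luma term plus rounding */
    int d = (int)u - 128;
    int e = (int)v - 128;
    int r = clamp_q8(c + 409 * e);
    int g = clamp_q8(c - 100 * d - 208 * e);
    int b = clamp_q8(c + 516 * d);
    /* Pack as 5 bits red, 6 bits green, 5 bits blue. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one row of 4:2:0 data; each chroma sample covers two pixels. */
void yuv420_row_to_rgb565(const uint8_t *y, const uint8_t *u,
                          const uint8_t *v, uint16_t *dst, size_t width)
{
    assert(y && u && v && dst);
    for (size_t x = 0; x < width; x++)
        dst[x] = yuv_to_rgb565(y[x], u[x / 2], v[x / 2]);
}
```

Converting on the CPU like this writes 2 bytes per pixel into the destination, which may be part of why the RGB565 path compared well against uploading three planes, although the profile data above is the only evidence for that here.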
Reporter | ||
Comment 51•13 years ago
Of course, copying into a locked texture is again possible only on Maemo, but it could also be that our plane upload + YUV shader code is not very friendly to generic GLES drivers...
Reporter | ||
Comment 52•13 years ago
I also noticed a strange thing... on the YouTube page, while playing video, we destroy and create shadow Image layers almost every second...
Comment 53•13 years ago
Bug 688363 covers the frame skipping problem. I'll try to look at that soon.
Comment 54•12 years ago
Since pastebin keeps expiring things even if they're set to be kept forever, I should just attach the proposed API from comment 42.
Updated•9 years ago
Component: Audio/Video → Audio/Video: Playback
Comment 55•6 years ago
Mass closing due to inactivity. Feel free to re-open if still needed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE