Closed Bug 1163820 Opened 9 years ago Closed 9 years ago

Video decode takes 30% more CPU compared to Android

Categories

(Firefox OS Graveyard :: Performance, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:-)

RESOLVED WORKSFORME
tracking-b2g -

People

(Reporter: ntroast, Assigned: slee)

Details

(Keywords: perf, Whiteboard: [caf priority: p2][CR 830845])

Attachments

(2 files)

Gaia: 83a63e0e6fcc22c6a74b06ef77b88d5049719cad
Gecko: 4f049c08e74aab39ee81e7c6e7fcee311a1868b0

Attached is a comparison between Android and Firefox OS during video playback.
5x 20second kernel traces were averaged while playing Big Buck Bunny trailer in 720p.

http://mirror.cessen.com/blender.org/peach/trailer/trailer_720p.mov

I disassembled the mov and reassembled in a mp4 container with the following
avconv -i inputfile.mov -c copy outputfile.mp4
Summary: Video decode takes 44% more CPU compared to Android → Video decode takes 30% more CPU compared to Android
Sorry % Increase is probably not the right statistic here. This is better :D
(1 - (1 / (664.732 / 461.537))) * 100 = ~30%
Is this a regression in 2.2 or the first time doing this kind of power consumption measurement?
Flags: needinfo?(ntroast)
2.2 is the first to support L, so I can not determine whether this is a regression or not.
Flags: needinfo?(ntroast)
Steven, could you help investigate this?
Flags: needinfo?(slee)
Hi Nicholas,

Can you tell me how you get the data? Thanks.
Flags: needinfo?(ntroast)
Hi Steven,

Have a look at this tutorial about ftrace
http://bamboopuppy.com/ftrace-android-profiling/

This one looks good too
http://blog.fpmurphy.com/2014/05/kernel-tracing-using-ftrace.html

Setting the "sched:*" event should be good enough to see how long each process spent on the CPU
Flags: needinfo?(ntroast)
Hi Nicolas,

Thanks for providing the information. But it seems that we should change some kernel configs to enable the tracers? I cannot get any data from my nexus 5. Even I set the pid to set_ftrace_pid, set 1 to tracing_on. 

From comment 0, you mentioned that you get the result by testing 5 times and every time capturing 20 seconds of data. Then you average them, is it right? And the result shows that FxOS uses more 203.186 ms of CPU time than Android. So it means that when playing 20 seconds video clip, FxOS uses 203.186 ms more CPU time than Android.

I checked attachment 8604335 [details]. Here are some reasons about why FxOS uses more CPU time than Android.
1. FxOS has more IPC transmitions. 
2. FxOS has internal synchronization mechanism.

For Android, the encoded data is sent to codec then render it. On FxOS, we get the decoded data from HW codec, keep it in a state machine, and decide when we can display it. The follow is 
a. mediaserver gets the frame from video app
b. when the frame is decoded, mediaserver sends the data back to video app
c. MediaDecoderStateMachine finds that the frame should be rendered, it sends to Compositor(b2g process)

We have more IPCs than Android(b and c). That can explain why "Binder" and "ImageBridgeChild"(40.551+21.850-17.145=45.256ms) are higher in FxOS. For "Media Decode #1" and "Media S~hine #1"(74.514+26.194=100.708 ms), they are the state machine in FxOS for synchronization(c). These 2 items contribute about 3/4 of the time.

For 2 things I still don't have any idea,
1. "gle.aac.decoder" uses 151.967 - 124.305 = 27.662ms on FxOS. 
2. FxOS has "FastMixer" which costs 39.376 ms.

I will try to find out why FxOS uses FastMixer first.
Flags: needinfo?(slee)
(In reply to StevenLee[:slee] from comment #7)
> Hi Nicolas,
> 
> Thanks for providing the information. But it seems that we should change
> some kernel configs to enable the tracers? I cannot get any data from my
> nexus 5. Even I set the pid to set_ftrace_pid, set 1 to tracing_on.

This sets all sched events to be traced
echo sched:* > /sys/kernel/debug/tracing/set_event

This enables tracing
echo 1 > /sys/kernel/debug/tracing/tracing_on

This shows the trace output
cat /sys/kernel/debug/tracing/trace

Hopefully after those steps you will be able to get data.

(In reply to StevenLee[:slee] from comment #7)
> From comment 0, you mentioned that you get the result by testing 5 times and
> every time capturing 20 seconds of data. Then you average them, is it right?
> And the result shows that FxOS uses more 203.186 ms of CPU time than
> Android. So it means that when playing 20 seconds video clip, FxOS uses
> 203.186 ms more CPU time than Android.

Sorry, I forgot to mention I also divided the results by 20, so in 1 second FxOS uses 203.186ms more CPU time than Android.
Whiteboard: [CR 836789]
Whiteboard: [CR 836789] → [caf priority: p2][CR 836789]
Whiteboard: [caf priority: p2][CR 836789] → [caf priority: p2][CR 830845]
Hi Steven,

Thanks for pointing out fastmixer. I disabled fastmixer and saw a 93ms improvement.

I see in Android that "At the time AudioFlinger creates a normal mixer thread, it decides whether or not to also create a fast mixer thread."[1] Does FxOS also do this decision making?

[1] https://source.android.com/devices/audio/latency_design.html#mixerThreads
Flags: needinfo?(slee)
(In reply to Nicholas Troast [:ntroast] from comment #9)
> I see in Android that "At the time AudioFlinger creates a normal mixer
> thread, it decides whether or not to also create a fast mixer thread."[1]
> Does FxOS also do this decision making?
> [1] https://source.android.com/devices/audio/latency_design.html#mixerThreads
Hi Nicholas,

I found that FxOS does not explicitly request to use fast mixer thread. When we are creating an AudioTrack, the code flow just goes to the path of using the fast mixer thread. I will try to figure out whether we can or cannot prevent that. 

BTW, afterI flash the whole image, ftrace works. But I can enable only sched:* events or the kernel will hang then reboot. And the result only shows kernel functions. Should I need to add some flags when I compile the source code? Can you also upload an updated result? Thanks
Hi Steven,

I tried on flame and I am able to collect the ftrace logs. You should see output like the following:

      Homescreen-1108  [000] ...2   144.521995: sched_switch: prev_comm=Homescreen prev_pid=1108 prev_prio=121 prev_state=S ==> next_comm=Timer next_pid=1169 next_prio=121

This raw output then needs to be run through a tool like pytimechart or kernelshark to give you a more human readable representation.

I am not aware of any flags that need to be set. If you are still having trouble maybe I can ask around.

If you have a patch I can certainly test it and provide feedback.
Attached file ftrace.txt
(In reply to Nicholas Troast [:ntroast] from comment #11)
>       Homescreen-1108  [000] ...2   144.521995: sched_switch:
> prev_comm=Homescreen prev_pid=1108 prev_prio=121 prev_state=S ==>
> next_comm=Timer next_pid=1169 next_prio=121
Hi Nicholas,

Thanks for providing the information. I tried to install pytimechart on my Linux and pytimechart cannot parse my data correctly. I will try to profile Video app by our built-in profiler and see what functions consume much CPU.
It's not regression, remove 2.2?. Steven will continue the investigation.
Assignee: nobody → slee
blocking-b2g: 2.2? → ---
tracking-b2g: --- → +
Keywords: perf
Whiteboard: [caf priority: p2][CR 830845] → [caf priority: p2][CR 830845][perf-wanted]
Flags: needinfo?(slee)
> 2. FxOS has "FastMixer" which costs 39.376 ms.
> 
> I will try to find out why FxOS uses FastMixer first.

There are 2 reasons why FxOS uses FastMixer in this test:

1. FastMixer is used when AudioTrack AUDIO_OUTPUT_FLAG_FAST is set in components OpenSL ES (frameworks/wilhelm), ToneGenerator, and SoundPool [1]. FxOS's libcubeb uses OpenSL ES.
2. To use FastMixer, the input sample rate needs to be equal to output sample rate. In this case, the sample rate in test video is 48000 which matches the output sample rate.

If you use other video which the audio sample rate is not 48k, the normal mixer will be used instead of FastMixer. However, it just shifts the CPU loading to normal mixer, I don't think it has performance improvement unless there is bug in FastMixer.

Because Android's video player uses AudioTrack directly without AUDIO_OUTPUT_FLAG_FAST, so FastMixer is not in your chart (Android part).


[1] https://source.android.com/devices/audio/latency_design.html
(In reply to Alfredo Yang from comment #14)
> If you use other video which the audio sample rate is not 48k, the normal
> mixer will be used instead of FastMixer. However, it just shifts the CPU
> loading to normal mixer, I don't think it has performance improvement unless
> there is bug in FastMixer.

From what I understand FastMixer improves audio latency at the cost of performance. Is this correct?
Is audio latency a concern during video playback?

> Because Android's video player uses AudioTrack directly without
> AUDIO_OUTPUT_FLAG_FAST, so FastMixer is not in your chart (Android part).

Why does Android use AudioTrack directly? Should FxOS be doing the same?
Flags: needinfo?(ayang)
(In reply to Nicholas Troast [:ntroast] from comment #15)
> From what I understand FastMixer improves audio latency at the cost of
> performance. Is this correct?
> Is audio latency a concern during video playback?

Audio latency is not concerned in video playback. On FxOS, we ask for high latency [1]; however, in Android's OpenSL implementation, it chooses to use FastMixer. It could be a bug in Android's OpenSL layer.

> 
> > Because Android's video player uses AudioTrack directly without
> > AUDIO_OUTPUT_FLAG_FAST, so FastMixer is not in your chart (Android part).
> 
> Why does Android use AudioTrack directly? Should FxOS be doing the same?

It is out of my scope to answer it. IMHO, it's better to support an open standard like OpenSL instead of a specific platform api.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/media/AudioStream.cpp?from=AudioStream.cpp#388
[2] http://androidxref.com/4.4.2_r2/xref/frameworks/wilhelm/src/android/AudioPlayer_to_android.cpp#1430
Flags: needinfo?(ayang)
Dylan, per comment 16, there is possible issue in Android. it should be listed as known issue.
Flags: needinfo?(doliver)
Whiteboard: [caf priority: p2][CR 830845][perf-wanted] → [caf priority: p2][CR 830845]
Removing dependency on CAF 2.2, per agreement with partner.
No longer blocks: CAF-v2.2-metabug
Flags: needinfo?(doliver)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
No longer blocks: CAF-v2.2-metabug
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: