Closed Bug 864210 Opened 7 years ago Closed 7 years ago

Camera preview cause high CPU usage in Compositor thread on Unagi

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: chiajung, Unassigned)

Details

(Keywords: perf, Whiteboard: c= s=2013.05.31 ,)

Attachments

(4 files, 1 obsolete file)

Attached file perf output
The current camera preview causes very high CPU usage. Here is sample top output:

User 65%, System 11%, IOW 22%, IRQ 0%
User 202 + Nice 5 + Sys 37 + Idle 0 + IOW 72 + IRQ 0 + SIRQ 0 = 316

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 3086  3108  0  57% R 184892K  70108K  fg root     Compositor      /system/b2g/b2g
 3086  3086  0   3% S 184892K  70108K  fg root     b2g             /system/b2g/b2g
 3300  3300  0   2% R   1116K    476K  fg root     top             top
 3221  3221  0   2% R 109204K  27316K  fg app_3221 Camera          /system/b2g/plugin-container
 3086  3093  0   1% S 184892K  70108K  fg root     Gecko_IOThread  /system/b2g/b2g
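
As a cross-check on the sample above, the summary percentages can be recomputed from the jiffies line. This is a minimal sketch: the reading that the User percentage includes Nice and that values are truncated by integer division is inferred from the numbers in this report, not taken from any documentation of top, and the helper names are hypothetical.

```python
import re

def parse_jiffies(line):
    """Split 'User 202 + Nice 5 + ... = 316' into a field dict and the stated total."""
    left, total = line.split("=")
    fields = {name: int(value) for name, value in re.findall(r"(\w+)\s+(\d+)", left)}
    return fields, int(total)

def summarize(line):
    """Recompute the percentage summary (assumption: User% = User + Nice, truncated)."""
    fields, total = parse_jiffies(line)
    if sum(fields.values()) != total:
        raise ValueError("fields do not add up to the stated total")
    return {
        "user": (fields["User"] + fields["Nice"]) * 100 // total,
        "system": fields["Sys"] * 100 // total,
        "iow": fields["IOW"] * 100 // total,
        "idle": fields["Idle"] * 100 // total,
    }

# Jiffies line copied from the top sample above.
sample = "User 202 + Nice 5 + Sys 37 + Idle 0 + IOW 72 + IRQ 0 + SIRQ 0 = 316"
print(summarize(sample))
```

An idle share of 0 means the CPU was fully occupied during the sampling interval.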

The perf data is attached (generated with perf record -a -g).

Since perf cannot generate a stack for memcpy, and the code path should not call memcpy, I tried commenting out some code. As a result, I found that

fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL, image);

in GLContextProviderEGL.cpp causes the high CPU usage.
Blocks: 860441
Summary: Camera preview cause high CPU usage in Compositor thread → Camera preview cause high CPU usage in Compositor thread on Unagi
The data above was collected on mozilla-central r126237.

On r129442, the camera preview jitters and the CPU usage looks like this:

User 41%, System 10%, IOW 9%, IRQ 0%
User 125 + Nice 3 + Sys 31 + Idle 122 + IOW 28 + IRQ 0 + SIRQ 0 = 309

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 5402  5425  0  36% S 177056K  64784K  fg root     Compositor      /system/b2g/b2g
 5506  5599  0   2% S 102020K  28584K  fg app_5506 Camera          /system/b2g/plugin-container
 5630  5630  0   2% R   1108K    460K  fg root     top             top
 5402  5402  0   1% S 177056K  64784K  fg root     b2g             /system/b2g/b2g
 5506  5506  0   1% S 102020K  28584K  fg app_5506 Camera          /system/b2g/plugin-container
Attached file perf output (r129442) (obsolete)
Here is a new perf report, tested against r129442.

By the way, the camera preview sometimes seems to show an old frame, which makes the preview look jumpy.
For the jitter/jumpy part, I think the problem is IPC related.

I added some logging to GrallocTextureHostOGL:
04-22 17:07:47.313  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd104
04-22 17:07:47.504  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb84
04-22 17:07:47.594  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd104
04-22 17:07:47.784  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb84
04-22 17:07:47.864  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd784
04-22 17:07:48.074  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46d4ff84
04-22 17:07:48.184  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb04
04-22 17:07:48.394  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4560ba84
04-22 17:07:48.474  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb84

The buffer updates look strange: the interval between updates alternates between roughly 80-110 ms and 190-210 ms.
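
To make the irregular pacing easier to see, the intervals between updates can be computed from the log lines. This is a minimal sketch that assumes the logcat layout shown above (date, time, PID, TID, level, tag) and that all lines fall on the same day; the function names are hypothetical.

```python
# First five log lines copied from the GrallocTextureHostOGL output above.
LOG = """\
04-22 17:07:47.313  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd104
04-22 17:07:47.504  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb84
04-22 17:07:47.594  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd104
04-22 17:07:47.784  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46dddb84
04-22 17:07:47.864  5950  5973 I GrallocTextureHostOGL: Update new graphicBuffer: 0x46ddd784
"""

def clock_to_ms(clock):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    whole, millis = seconds.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + int(millis)

def update_intervals_ms(log_text, tag="GrallocTextureHostOGL"):
    """Return the gaps in ms between consecutive log lines carrying `tag`."""
    stamps = []
    for line in log_text.splitlines():
        parts = line.split()
        # parts[5] is the tag followed by a colon in this layout.
        if len(parts) > 5 and parts[5].rstrip(":") == tag:
            stamps.append(clock_to_ms(parts[1]))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

print(update_intervals_ms(LOG))
```

For comparison, a steady 30 fps preview would show gaps of about 33 ms.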
I also added some logging to CameraPreviewMediaStream. The camera updates several frames before the b2g process picks one up. I think this is the source of the jitter.

04-22 17:41:22.816  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757ab84
04-22 17:41:22.836  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4e84
04-22 17:41:22.886  6261  6307 I CameraPreviewMediaStream: New buffer: 0x44237784
04-22 17:41:22.916  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4404
04-22 17:41:22.946  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4484
04-22 17:41:22.986  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea5104
04-22 17:41:23.026  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea5304
04-22 17:41:23.036  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4f84
04-22 17:41:23.076  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea5504
04-22 17:41:23.076  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757a804
04-22 17:41:23.116  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4e84
04-22 17:41:23.126  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757a804
04-22 17:41:23.136  6261  6264 I CameraPreviewMediaStream: New buffer: 0x44237784
04-22 17:41:23.176  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea4404
04-22 17:41:23.226  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4484
04-22 17:41:23.236  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea5104
04-22 17:41:23.276  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea5304
04-22 17:41:23.326  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4f84
04-22 17:41:23.346  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea5504
04-22 17:41:23.376  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757b204
04-22 17:41:23.376  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea4e84
04-22 17:41:23.416  6261  6262 I CameraPreviewMediaStream: New buffer: 0x44237784
04-22 17:41:23.426  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757b204
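
The mismatch between the two sides can be quantified by counting how many camera buffers are logged between consecutive compositor-side updates. This is a minimal sketch over the first lines of the log above; it assumes the same logcat layout, and the function name is hypothetical.

```python
# First twelve lines copied from the combined log above.
LOG = """\
04-22 17:41:22.816  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757ab84
04-22 17:41:22.836  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4e84
04-22 17:41:22.886  6261  6307 I CameraPreviewMediaStream: New buffer: 0x44237784
04-22 17:41:22.916  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4404
04-22 17:41:22.946  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4484
04-22 17:41:22.986  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea5104
04-22 17:41:23.026  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea5304
04-22 17:41:23.036  6261  6264 I CameraPreviewMediaStream: New buffer: 0x43ea4f84
04-22 17:41:23.076  6261  6307 I CameraPreviewMediaStream: New buffer: 0x43ea5504
04-22 17:41:23.076  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757a804
04-22 17:41:23.116  6261  6262 I CameraPreviewMediaStream: New buffer: 0x43ea4e84
04-22 17:41:23.126  6124  6147 I GrallocTextureHostOGL: Update new graphicBuffer: 0x4757a804
"""

def frames_per_update(log_text,
                      producer="CameraPreviewMediaStream",
                      consumer="GrallocTextureHostOGL"):
    """Count producer lines seen between each pair of consecutive consumer lines."""
    counts = []
    pending = None  # stays None until the first consumer line is seen
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue
        tag = parts[5].rstrip(":")
        if tag == consumer:
            if pending is not None:
                counts.append(pending)
            pending = 0
        elif tag == producer and pending is not None:
            pending += 1
    return counts

print(frames_per_update(LOG))
```

A count well above 1 means several camera frames were produced for a single update on the compositor side.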

We should fix this problem first, then see why CPU consumption is high.
Attached patch test patch
This hacky patch makes the camera preview smooth, which makes it easy to see the high CPU consumption in the Compositor thread.

The camera preview jitters because the ImageBridge thread may block on IPC while the camera preview thread keeps generating new tasks. When the ImageBridge thread completes the previous blocking operation, it finds that most tasks in the queue are out of date. As a result, most frames are skipped, which causes the jittery preview.
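
The stale-queue effect can be illustrated with a toy model. This is only a sketch, not the Gecko implementation: the function name, the 33 ms frame interval, and the 250 ms IPC stall are hypothetical values chosen for illustration, and the consumer is assumed to keep only the newest queued frame after each stall.

```python
def simulate_stale_queue(frame_interval_ms, ipc_block_ms, duration_ms):
    """Toy model: frame i is queued at i * frame_interval_ms; the consumer
    forwards the newest queued frame, then blocks for ipc_block_ms."""
    shown = []       # indices of frames that reach the screen
    skipped = 0      # frames that were queued but already stale when seen
    next_index = 0   # oldest frame the consumer has not looked at yet
    now = 0
    while now < duration_ms:
        newest = now // frame_interval_ms  # newest frame produced by `now`
        if newest >= next_index:
            skipped += newest - next_index  # everything older is dropped
            shown.append(newest)
            next_index = newest + 1
            now += ipc_block_ms             # consumer stalls on IPC
        else:
            now = next_index * frame_interval_ms  # idle until the next frame
    return shown, skipped

shown, skipped = simulate_stale_queue(33, 250, 1000)
print("shown:", shown, "skipped:", skipped)
```

With these assumed numbers, only a handful of frames are displayed per second while most queued frames are dropped, which matches the qualitative behavior described above.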

This patch makes the camera preview thread block on IPC itself, which prevents the jitter.

After applying the patch, the top result becomes:

User 56%, System 14%, IOW 29%, IRQ 0%
User 175 + Nice 2 + Sys 46 + Idle 0 + IOW 92 + IRQ 0 + SIRQ 0 = 315

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 1476  1498  0  54% S 177956K  67960K  fg root     Compositor      /system/b2g/b2g
 1476  1476  0   2% S 177956K  67960K  fg root     b2g             /system/b2g/b2g
 1353  1353  0   2% S      0K      0K  fg root     kworker/0:0     
 1635  1635  0   2% R   1108K    464K  fg root     top             top
 1476  1491  0   0% S 177956K  67960K  fg root     Timer           /system/b2g/b2g
 1572  1577  0   0% S  87748K  28564K  fg app_1572 Chrome_ChildThr /system/b2g/plugin-container
  118  1603  0   0% S   2220K    508K  fg root     akmd8962_new    /system/bin/akmd8962_new
  873  1605  0   0% S  35920K   6028K  fg media    mediaserver     /system/bin/mediaserver
 1572  1576  0   0% S  87748K  28564K  fg app_1572 Binder Thread # /system/b2g/plugin-container
 1476  1513  0   0% S 177956K  67960K  fg root     GL updater      /system/b2g/b2g

The perf data is attached.
Attachment #740189 - Attachment is obsolete: true
QA Contact: milan
Here is perf data with a naive memcpy implementation in BionicGlue.

The caller of memcpy is ioctl_kgsl_sharedmem_write (in libgsl), which may explain the experiment result in the bug description.

@Diego,
Can you comment on why libgsl calls memcpy when binding an external texture?
Flags: needinfo?(dwilson)
Jeff, let's take a look at this as a priority today.
Assignee: nobody → jmuizelaar
I think the root of this issue may be bug 864017. Dup?
Flags: needinfo?(dwilson)
It could be different from bug 864017, which is about GRALLOC_PLANAR_YCBCR. Camera preview in b2g18 uses GONK_IO_SURFACE.
Interesting. Using the fix in bug 862952 to enable HWComposer, the FPS goes up to 30 fps and the CPU usage goes down to ~20%.

That means the problem in GPU composition is very likely in the way GONK_IO_SURFACE is bound in the new compositor.
As comment 10 says, this is different from bug 864017. 

Bug 864017 is going to implement the GRALLOC_PLANAR_YCBCR image format for software-decoded images. Camera preview uses GONK_IO_SURFACE, which regressed after LayerRefactoring and was fixed in bug 860441.

This problem can be seen before LayerRefactoring (the first perf data, tested on r126237). Since LayerRefactoring introduced another problem that causes camera preview jitter, this problem became hard to notice. So if you want to investigate this problem after LayerRefactoring, you can apply my patch.
All product phones enable HwComposer. Only Mozilla's ROM does not use it. I heard from mwu that he is going to enable HwComposer for new devices, but not on unagi.
I will try to find an Inari and enable HWComposer to test it later.

However, I think this may still be a bug, since camera preview frames can be rendered to canvas. If we want to render camera preview to canvas via a similar implementation to take advantage of hardware resources, we may still have to solve it.
I cannot enable HWComposer by just applying the patch in bug 862952. It seems there are more patches to apply before I can enable it :S
(In reply to Chiajung Hung [:chiajung] from comment #14)
> However, I think this may still a bug. Since Camera preview frame can be
> render to canvas. If we want to render camera preview to canvas via similar
> implementation to take advantage of hardware resource, we may still have to
> solve it.

It is a different problem. If Gecko uses the GPU for rendering, we use a lot more CPU time than with HwComposer. HwComposer could mitigate this. Canvas rendering does not use the GPU right now. Rendering to canvas uses only the CPU and needs more CPU time. Even when Gecko uses the GPU for rendering to canvas, CPU usage is greater than with HwComposer.
Bug 845200, Bug 827229 are related to GPU-rendered canvas.
FYI both GPU composition and HWC composition access the same surfaces backing the layers, including for canvas layer. The layers themselves render their content to said surface in the exact same way for both. So the issue here is most likely that the surface binding during GPU composition has a bug that causes it to mem copy.

I actually agree with both Sotaro and Chiajung. Yes, HWC will be used in commercial devices for camera preview. However, even in those commercial devices there are many use cases where we fall back to GPU rendering. So we still want to knock out this bug!
(In reply to Chiajung Hung [:chiajung] from comment #12)
> This problem can be seen before LayerRefactoring (the first perf data,
> tested on r126237). 

chiajung, I do not understand the above comment. Are you saying preview's performance problem is present also on b2g18?
(In reply to Diego Wilson [:diego] from comment #18)
> here is most likely that the surface binding during GPU composition has a
> bug that causes it to mem copy.

Diego, is it a bug in qcom's code? The compositor does not call memcpy.
Flags: needinfo?(dwilson)
Assignee: jmuizelaar → sotaro.ikeda.g
(In reply to Sotaro Ikeda [:sotaro] from comment #20)
> (In reply to Diego Wilson [:diego] from comment #18)
> > here is most likely that the surface binding during GPU composition has a
> > bug that causes it to mem copy.
> 
> diego, is it a bug of qcom's code? compositor does not call mem copy.

Most likely it's a problem in the way the camera frame surface is provided to GLES in the GPU composition. I think all other layers in B2G (eg ShadowThebesLayer) are backed by a surface and if that binding caused a mem copy we would see terrible performance in the homescreen too.

I just tried it out. This camera issue is reproducible in b2g18 when I disable HWC composition.
Flags: needinfo?(dwilson)
Inder,

Do you remember what fixed the camera preview performance? Maybe something in the HWC composition was patched that wasn't patched in the GPU composition.
Flags: needinfo?(ikumar)
Bug 832100 tracks enabling HwComposer in mozbuild.
Almost, bug 828876 is the full HWC enabling bug
Heh, that was sotaro's patch in bug 844248 :) That was not an HWC specific patch. Oh well...

My guess is the fix will be somewhere in ShadowImageLayer, which is the one in charge of binding the camera frame surface.
Flags: needinfo?(ikumar)
(In reply to Sotaro Ikeda [:sotaro] from comment #19)
> (In reply to Chiajung Hung [:chiajung] from comment #12)
> > This problem can be seen before LayerRefactoring (the first perf data,
> > tested on r126237). 
> 
> chiajung, I do not understand the above comment. Are you saying preview's
> performance problem is present also on b2g18?

I tested it on m-c only, but I think the code path for camera preview rendering is similar on b2g18 and on m-c before LayerRefactoring. And as comment 21 said, this is reproducible in b2g18.

In more detail: I found that video playback for 3gp/mp4 does not have a similar problem. I thought the YUV format might be related, so I changed
http://mxr.mozilla.org/mozilla-central/source/dom/camera/GonkCameraControl.cpp#58
to 0, but the result is the same.
If it is a color format problem, it is a qcom platform problem. Camera preview and video playback use the same code for rendering.
I agree this should be a qcom platform issue :)
> Camera preview and video playback uses same code for rendering

Exactly! Does video playback also have the same problem?
I cannot observe the same high CPU usage problem when playing video. But I only tested a small set of videos.

If you need the top/perf data for MP4/3GP video playback, I can provide it later.
I checked some video clips. I also cannot observe the problem, though it could depend on video size and rendering scaling.
Hmm... that does sound suspect. I agree that both video playback and camera should follow the same rendering path. I will have to compare the video and camera frame surfaces. Could be that GL is deciding to convert the camera frame to another format in a slow and painful way.
On a buri device with HW composer disabled, CPU usage during camera preview is not so high. The Compositor thread uses 10% of the CPU.
I just checked with chiajung, and we did not find the high CPU issue on the Leo device.
But inari and unagi do have it.

We also found that the EGL libraries are different. Could that be the root cause?

//Leo
-rw-r--r-- root     root           26 2013-03-06 08:00 egl.cfg
-rw-r--r-- root     root        30456 2013-03-06 08:00 eglsubAndroid.so
-rw-r--r-- root     root       134156 2013-03-06 08:00 libEGL_adreno200.so
-rw-r--r-- root     root        81520 2013-03-06 08:22 libGLES_android.so
-rw-r--r-- root     root       200980 2013-03-06 08:00 libGLESv1_CM_adreno200.so
-rw-r--r-- root     root       720128 2013-03-06 08:00 libGLESv2_adreno200.so
-rw-r--r-- root     root       379140 2013-03-06 08:00 libq3dtools_adreno200.so

//inari
-rw-r--r-- root     root           26 2013-04-18 14:20 egl.cfg
-rw-r--r-- root     root        22160 2013-04-18 14:20 eglsubAndroid.so
-rw-r--r-- root     root       130008 2013-04-18 14:20 libEGL_adreno200.so
-rw-r--r-- root     root        81520 2013-04-18 14:21 libGLES_android.so
-rw-r--r-- root     root       196852 2013-04-18 14:20 libGLESv1_CM_adreno200.so
-rw-r--r-- root     root       575252 2013-04-18 14:20 libGLESv2_adreno200.so
-rw-r--r-- root     root       211040 2013-04-18 14:20 libq3dtools_adreno200.so
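
The two listings can be compared mechanically. This is a minimal sketch that parses the `ls -l` output above (assuming seven whitespace-separated columns per line) and reports the size difference for each file that differs; the helper names are hypothetical. A size difference only shows that the binaries are not identical, not which change matters.

```python
# Listings copied from the Leo and inari output above.
LEO = """\
-rw-r--r-- root     root           26 2013-03-06 08:00 egl.cfg
-rw-r--r-- root     root        30456 2013-03-06 08:00 eglsubAndroid.so
-rw-r--r-- root     root       134156 2013-03-06 08:00 libEGL_adreno200.so
-rw-r--r-- root     root        81520 2013-03-06 08:22 libGLES_android.so
-rw-r--r-- root     root       200980 2013-03-06 08:00 libGLESv1_CM_adreno200.so
-rw-r--r-- root     root       720128 2013-03-06 08:00 libGLESv2_adreno200.so
-rw-r--r-- root     root       379140 2013-03-06 08:00 libq3dtools_adreno200.so
"""

INARI = """\
-rw-r--r-- root     root           26 2013-04-18 14:20 egl.cfg
-rw-r--r-- root     root        22160 2013-04-18 14:20 eglsubAndroid.so
-rw-r--r-- root     root       130008 2013-04-18 14:20 libEGL_adreno200.so
-rw-r--r-- root     root        81520 2013-04-18 14:21 libGLES_android.so
-rw-r--r-- root     root       196852 2013-04-18 14:20 libGLESv1_CM_adreno200.so
-rw-r--r-- root     root       575252 2013-04-18 14:20 libGLESv2_adreno200.so
-rw-r--r-- root     root       211040 2013-04-18 14:20 libq3dtools_adreno200.so
"""

def sizes(listing):
    """Map file name to size in bytes for each line of an `ls -l` style listing."""
    result = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 7:  # mode, owner, group, size, date, time, name
            result[parts[6]] = int(parts[3])
    return result

def size_diff(first, second):
    """Return {name: second_size - first_size} for files whose sizes differ."""
    a, b = sizes(first), sizes(second)
    return {name: b[name] - a[name]
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}

for name, delta in size_diff(LEO, INARI).items():
    print(f"{name}: {delta:+d} bytes on inari relative to Leo")
```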
buri and leo devices use ics_strawberry. unagi and inari use ics_chocolate. That might affect this problem.
(In reply to pchang from comment #34)
> Also we found the egl libraries were different. Was it the root cause?

As in comment #35, leo and inari use different code bases, so the EGL libraries could also be different. I also suspect this could be the root cause.
If the bug is an ics_chocolate-specific issue, it could be a tef bug, I think.
After the fix for bug 862324 landed I see the FPS for GPU rendered camera frames goes up to 30 FPS. Maybe this is fixed now?
Status: NEW → ASSIGNED
Whiteboard: c=performance
(In reply to Diego Wilson [:diego] from comment #38)
> After the fix for bug 862324 landed I see the FPS for GPU rendered camera
> frames goes up to 30 FPS. Maybe this is fixed now?

Chiajung, can you answer the question?
Flags: needinfo?(chung)
We tried the 5/3 codebase on Unagi, and the problem is still present.

Diego, which device did you test?
Flags: needinfo?(chung) → needinfo?(dwilson)
I tested on the leo device. It runs at around 30 fps; I pasted the CPU usage at the bottom. What is your CPU usage target?

User 49%, System 17%, IOW 23%, IRQ 0%
User 130 + Nice 24 + Sys 55 + Idle 30 + IOW 72 + IRQ 0 + SIRQ 0 = 311

  PID PR CPU% S  #THR     VSS     RSS PCY UID      Name
  153  0  43% S    50 212864K  56744K  fg root     /system/b2g/b2g
  172  0   6% S     6   5908K    468K  fg root     /system/bin/sensord
  170  0   6% S     4   8060K   1352K  fg system   /system/bin/mm-qcamera-daemon
  496  0   5% S    19  81620K  23760K  fg app_496  /system/b2g/plugin-container
Flags: needinfo?(dwilson)
Well, as comment 34 says, Leo devices have no such problem.
We should not bother trying to debug Unagi, the software on that device is a random collection of semi-related bits that nobody other than Mozilla really supports.  The more interesting questions are
(1) Does this manifest on the *vendor* Inari build, and if so
(2) Does the vendor consider this to be blocking.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #43)
> (1) Does this manifest on the *vendor* Inari build, and if so

Chiajung, can you confirm that? I do not have a *vendor*-built Inari ROM.

And I think the Inari device enables HW composer. It seems that there is no use case in which it always renders video frames using OpenGL.
Flags: needinfo?(chung)
(In reply to Sotaro Ikeda [:sotaro] from comment #44)
> And I think Inari device enables HW composer. It seems that there are no use
> case that it always renders video frame by using OpenGL.

This is about the v1.0.1 *vendor*-built Inari ROM.
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> (In reply to Sotaro Ikeda [:sotaro] from comment #44)
> > And I think Inari device enables HW composer. It seems that there are no use
> > case that it always renders video frame by using OpenGL.
> 
> It is about v1.0.1 *vendor* Inari built ROM.

FYI we are very close to enabling HW composer by default on Mozilla builds as well. See bug 828876
Whiteboard: c=performance → c=
Top result of the vendor-built ROM on Inari during camera preview. It seems that HW composer is used.

User 28%, System 21%, IOW 0%, IRQ 0%
User 43 + Nice 16 + Sys 45 + Idle 101 + IOW 2 + IRQ 0 + SIRQ 0 = 207

  PID PR CPU% S  #THR     VSS     RSS PCY UID      Name
  114  0  20% S    39 200752K  54028K  fg root     /system/b2g/b2g
  488  0   6% S    18  77996K  23648K  fg app_488  /system/b2g/plugin-container
  513  0   2% R     1   1064K    420K  fg shell    top
   17  0   0% S     1      0K      0K  fg root     kworker/0:1
  258  0   0% S     1    780K    360K  fg root     logcat
  132  0   0% S     1    868K    424K  fg root     /system/bin/getlogtofile
Unassigning myself. There is nothing more for me to do.
Assignee: sotaro.ikeda.g → nobody
Since hardware composer now defaults to on and fixes the problem, closing this for now.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(chung)
Resolution: --- → INVALID
Keywords: perf
Whiteboard: c= → c= s=2013.05.31 ,
No longer blocks: 860441