Open Bug 896391 Opened 7 years ago Updated 3 years ago

memcpy from camera preview's GraphicBuffer is slow

Categories

(Core :: WebRTC: Audio/Video, defect, P5)

ARM
Gonk (Firefox OS)
defect

People

(Reporter: slee, Unassigned)

References

Details

(Whiteboard: [WebRTC])

Attachments

(4 files)

memcpy from the GraphicBuffer to system memory takes a long time, about 10ms ~ 100ms (average 20ms ~ 30ms), to copy 640*480*1.5 bytes.
I found this is because the GraphicBuffer returned by the camera preview callback is uncached. With pchang's help, we forced the preview buffer from the camera to be "cached". The average memory copy time then drops to 1~12ms (average about 2ms).
On Android, there are 2 callbacks, one for preview and the other for recording. On FFOS, we only use the preview callback. I think we could use the record callback, [1], in peerconnection; the buffers from that callback are cached. But we cannot make gUM use the record callback by default, since the camera app gets its MediaStream from gUM.

Hi ROC,

Do you think we could add one more track to the MediaStream? The media element would use track 1 for display, and peerconnection would copy from track 2 to encode.
 
[1] http://dxr.mozilla.org/mozilla-central/source/dom/camera/GonkCameraSource.cpp#l85
(In reply to StevenLee from comment #0)
> Do you think if we can add one more track into the MediaStream? For media
> element needs to display, it uses track 1. And for peerconnection, it copies
> from track2 to encode.

We can't add an extra track, because that becomes visible to JS. What we could do is create a special image type that contains the data from both callbacks for each frame. Then different consumers of that image could request the most suitable format for their intended use.

We will want to do something similar to support, e.g., USB cameras that have built-in hardware encoders on other platforms.
(In reply to StevenLee from comment #0)
> memcpy from GraphicBuffer to system memory takes much time, about 10s ~
> 100ms(and average 20s ~ 30s ms)  to copy 640*480*1.5 bytes.
> I found it is resulted from the GraphicBuffer callbacked from camera preview
> is uncached. After pchang's help, we force the preview buffer from camera to
> be "cached".  The average memory copy time  reduces to 1~12ms(and average
> 2.s ms). 

Can you please provide the patch for this so it's clear what we're
talking about.


> On Android, there are 2 callbacks, one for preview and the other for record.
> On FFOS, we only use preview callback. I think we may use the record
> callback, [1], in peerconnection. The buffers callback from this callback is
> cached. But we cannot force gUM uses record callback by default since camera
> app gets MediaStream from gUM.


Why won't the camera app be happy to have the record callback used?
If it's using gUM, then aren't we currently incurring the long memcpy()
in any case?
(In reply to Eric Rescorla (:ekr) from comment #2)
> (In reply to StevenLee from comment #0)
> > memcpy from GraphicBuffer to system memory takes much time, about 10s ~
> > 100ms(and average 20s ~ 30s ms)  to copy 640*480*1.5 bytes.
> > I found it is resulted from the GraphicBuffer callbacked from camera preview
> > is uncached. After pchang's help, we force the preview buffer from camera to
> > be "cached".  The average memory copy time  reduces to 1~12ms(and average
> > 2.s ms). 
> 
> Can you please provide the patch for this so it's clear what we're
> talking about.
I don't have the patch on hand, but here is the chat log with Peter. Basically, he made all pmem cacheable for testing purposes.

peter@pchang-desktop:~/b2g_debug_leo/hardware/qcom/display$ git diff
diff --git a/libgralloc/alloc_controller.cpp b/libgralloc/alloc_controller.cpp
index 17a6b14..1338ee4 100644
--- a/libgralloc/alloc_controller.cpp
+++ b/libgralloc/alloc_controller.cpp
@@ -79,11 +79,14 @@ static bool canFallback(int usage, bool triedSystem)
 static bool useUncached(int usage)
 {
     // System heaps cannot be uncached
+    LOGE("found usage 0x%08x\n", usage);
     if(usage & (GRALLOC_USAGE_PRIVATE_SYSTEM_HEAP |
                 GRALLOC_USAGE_PRIVATE_IOMMU_HEAP))
         return false;
-    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED)
-        return true;
+    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED) {
+        LOGE("found uncached2 flag\n");
+        return false;
+    }
     return false;
 }
 
@@ -269,8 +272,10 @@ int PmemAshmemController::allocate(alloc_data& data, int usage,
         data.uncached = false;
 
     // Override if we explicitly need uncached buffers
-    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED)
-        data.uncached = true;
+    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED) {
+        LOGE("found uncached flag usage 0x%08x\n", usage);
+        data.uncached = false;
+    }

> 
> > On Android, there are 2 callbacks, one for preview and the other for record.
> > On FFOS, we only use preview callback. I think we may use the record
> > callback, [1], in peerconnection. The buffers callback from this callback is
> > cached. But we cannot force gUM uses record callback by default since camera
> > app gets MediaStream from gUM.
> 
> 
> Why won't the camera app be happy to have the record callback used?
> If it's using gUM, then aren't we currently incurring the long memcpy()
> in any case?
what does "cached" mean here?

Sometimes hardware buffers for IO devices are uncached to avoid polluting/wasting the cache and having to flush it when hardware DMA or mapping finishes.  However, if the buffer is read out in software (as opposed to being DMA'd elsewhere) having cache off may mean that each read (byte or word, whatever) may induce a memory read cycle instead of it reading a cacheline at a time.  As ekr says, the patch will help, but also an explanation of why it was uncached would help as well, and what assumptions led to that.
Hi ekr and jesup,

Here is the patch for profiling memcpy. ConvertToI420 only does a memcpy, with no color space conversion.
(In reply to Timothy B. Terriberry (:derf) from comment #1)
> We can't add an extra track, because that becomes visible to JS. What we
> could do is create a special image type that contains the data from both
> callbacks for each frame. Then different consumers of that image could
> request the most suitable format for their intended use.
Have we created the bug for this?
Thanks.
(In reply to Eric Rescorla (:ekr) from comment #2)
> Why won't the camera app be happy to have the record callback used?
> If it's using gUM, then aren't we currently incurring the long memcpy()
> in any case?
I think it's because they get better performance when displaying the camera stream. When the camera app needs to record, it registers the record callback and uses it to get data for encoding.
(In reply to Randell Jesup [:jesup] from comment #4)
> what does "cached" mean here?
> 
> Sometimes hardware buffers for IO devices are uncached to avoid
> polluting/wasting the cache and having to flush it when hardware DMA or
> mapping finishes.  However, if the buffer is read out in software (as
> opposed to being DMA'd elsewhere) having cache off may mean that each read
> (byte or word, whatever) may induce a memory read cycle instead of it
> reading a cacheline at a time.  As ekr says, the patch will help, but also
> an explanation of why it was uncached would help as well, and what
> assumptions led to that.
I've discussed this with other colleagues in the TPE office. We think it's along the lines of what you described.
(In reply to Randell Jesup [:jesup] from comment #4)
> what does "cached" mean here?
> 
> Sometimes hardware buffers for IO devices are uncached to avoid
> polluting/wasting the cache and having to flush it when hardware DMA or
> mapping finishes.  However, if the buffer is read out in software (as
> opposed to being DMA'd elsewhere) having cache off may mean that each read
> (byte or word, whatever) may induce a memory read cycle instead of it
> reading a cacheline at a time.  As ekr says, the patch will help, but also
> an explanation of why it was uncached would help as well, and what
> assumptions led to that.

I just checked the kernel implementation.
As Randell mentioned, caching affects performance; note in the code below that flush_pmem_file() returns early for uncached regions.

./drivers/misc/pmem.c

struct pmem_info {
 ...
        /* indicates maps of this region should be cached, if a mix of
         * cached and uncached is desired, set this and open the device with
         * O_SYNC to get an uncached region */
        unsigned cached;


void flush_pmem_file(struct file *file, unsigned long offset, unsigned long len)
...
        id = get_id(file);
        if (!pmem[id].cached)
                return;
Steven and I discussed this today and he points out that while on desktop we do a memcpy:

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ImageContainer.cpp#l454


However, on B2G, we just assign the Image* to the RefPtr member variable.
Hi Michael,

I encountered some problems when porting the camera into WebRTC: memcpy from the GraphicBuffer to system memory is slow, because the memory in the GraphicBuffer is non-cached. I found that there are 2 callbacks in the camera module, preview and encoding, so I tried to register the encoding callback (postDataTimestamp) and get the video frames from it. That raised new problems:
1. There is only one buffer in the encoding callback, and it affects the preview callback? If I don't return the buffer, there are no more encoding or preview callbacks.
2. The memcpy is slow too. :( Is it non-cached? The memcpy takes about 20ms. Is there any way we can get video frames from the camera module more efficiently?
Flags: needinfo?(mvines)
Hey Tapas, could you please take a look at this.
Flags: needinfo?(mvines)
Flags: needinfo?(tkundu)
@m1, 

I will look into this and update here soon.
Hi Steven, 

I guess that you are testing this (In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?

I guess that you are testing this for ICS master branch (not jb_mr port of gecko) ?

Please confirm me platform details so that I can try to find more information about this problem.
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #14)
> I guess that you are testing this for ICS master branch (not jb_mr port of
> gecko) ?
Hi Tapas,
Yes, we are using ICS.
Flags: needinfo?(slee)
(In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?

I tried the following steps:

1) I rebuilt my ICS master latest tip (synced on 21st Aug) with your patch (https://bugzilla.mozilla.org/attachment.cgi?id=786117). I saw some minor conflicts and resolved them myself.
2) I flashed the device and launched the camera app.
3) I tried to switch from preview mode into video mode.

The camera app crashed at step 3. Note that the camera app crashes at step 3 even if I don't apply your patch.

My guess is that I need to record some video using the 'camera app' on the ICS master branch to reproduce this performance problem.

Please help me by listing the steps that reproduce this performance problem on my device (with your patch). That will help me understand the actual cause of the problem.
Flags: needinfo?(tkundu)
Hi Tapas,

Did you build with mozilla-central (i.e., did you build the gecko in the B2G folder, or did you check out gecko from http://hg.mozilla.org/mozilla-central/ with Mercurial)? The latter is correct.

1. Please apply this patch; it skips the camera permission check.
2. Turn on the log in content/media/webrtc/MediaEngineWebRTCVideo.cpp::AddRecordingBuffer.
3. Build gecko, flash, and go to http://mozilla.github.io/webrtc-landing/gum_test.html
4. Choose video, and logcat will show the average memcpy time.

BTW, the camera app does not have this problem. I guess it's because the camera app uses the OMX encoder to handle the memory.
Flags: needinfo?(slee)

Hi StevenLee,

I was able to reproduce your issue today. I will analyse it and get back to you soon.
Hi Tapas,

Is there any update about this issue?
Thanks.
(In reply to Tapas Kumar Kundu from comment #18)
> 
> hi Stevenlee,
> 
> I was able to reproduce your issue today.I will analyse it and get back to
> you soon

Sorry for the delay; I was busy with another high-priority task. I have one doubt: why don't we use the OMX encoder like the camera app does? I am trying to find a better workaround here.
(In reply to Tapas Kumar Kundu from comment #20)
> I have one doubt why don't we use OMX encoder like camera app? Sorry for the
> delay. I was busy with other high priority task. I am trying to find a
> better workaround here.
Hi Tapas,

Because in WebRTC we are using the vp8 codec, and not all platforms have an OMX encoder for vp8. Furthermore, the memcpy from the OMX buffer to system memory also seems slow (copying a VGA-size YCbCr buffer takes about 20ms). After encoding, we need to copy the encoded data to system memory to send it to peers.
(In reply to StevenLee from comment #21)
> (In reply to Tapas Kumar Kundu from comment #20)
> > I have one doubt why don't we use OMX encoder like camera app? Sorry for the
> > delay. I was busy with other high priority task. I am trying to find a
> > better workaround here.
> Hi Tapas,
> 
> Because in WebRTC, we are using vp8 codec, and not all platforms have OMX
> encoder with vp8 codec. Furthermore, the memcpy from OMX buffer to system
> memory seems slow, too(copy VGA size YCbCr buffer takes about 20s ms). After
> encoding, we need to copy the encoded data to system memory for sending to
> others.

Good point, thanks for the information.

I understand that you need to memcpy the buffer to send it over the network. I have one doubt. I guess Firefox also runs on Android.

How does WebRTC work on Android?
Does it do the same kind of memcpy?
Is that memcpy faster than what FFOS is doing?
Are we doing something different from Android's WebRTC?

I will enable caching for the graphic buffer only if it is the *ONLY* solution; I want to find a better workaround for it.
Flags: needinfo?(tkundu)
WebRTC on Android is pretty unoptimized in general (any implementation, including ours). We are currently focusing optimization on FFOS and will carry over what we learn here to our Android implementation.
(In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.

I tried to disable the encoding callback by commenting out the following line in MediaEngineWebRTCVideo.cpp (inside the GonkCameraSourceListener::postDataTimestamp() function).

//mSource->AddRecordingBuffer(dataPtr->pointer());

But I am unable to see smooth video in WebRTC even after disabling the encoding/recording callback, while the existing camera app's video is always smooth (even with recording enabled).

If we are not using the recording/encoding callback, then the WebRTC camera preview should be as good as the existing camera app's. (You can trace it by putting a log in "GonkCameraHardware::OnNewFrame()".)

But this is not happening. Can you please tell me why?

 
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?
It seems to me that we need the camera recording frame buffer in system memory instead of pmem. I am in touch with another internal team who can help us with this; I will update here soon.
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #24)
>  But I am unable to see smooth video in webrtc even after disabling
> encoding/recording callback. But existing camera app video is smooth always
> (even if I enable recording) .
> if we are not using recording/encoding callback then webrtc camera preview
> video should be as good as existing camera app. (You can trace it by putting
> log in "GonkCameraHardware::OnNewFrame()" )
> But this is not happening.  Can you please tell me reasons for this ?
Hi Tapas,

Actually, the display mechanisms of WebRTC and the camera app are different. For the camera app, when a video frame arrives it is displayed directly. For WebRTC, we send the video frame to our media framework, which decides when the frame should be displayed and then returns it to GonkCamera. I am not sure if this causes the problem; I will try to figure it out.
Flags: needinfo?(slee)
In the camera preview case, gecko bypasses MediaStreamGraph, because MediaStreamGraph added too much latency to the camera preview. See Bug 844248.

https://github.com/sotaroikeda/firefox-diagrams/blob/master/dom/dom_camera_DOMCameraPreview_FirefoxOS_1_01.pdf?raw=true
This has turned into a really long bug, so let me try to summarize what I think
is going on:

(In reply to Tapas Kumar Kundu from comment #24)
> (In reply to StevenLee from comment #11)
> > Created attachment 786117 [details] [diff] [review]
> > EnableRecordingCallback.patch
> > 
> > Hi Michael,
> > 
> > I encountered some problems when porting camera into webrtc, memcpy from
> > GraphicBuffer to system memory is slow. This is because that the memory in
> > GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> > module, preview and encoding. So that I tried to register encoding
> > callback(postDataTimestamp) and get the video frame from that callback. And
> > I had new problems.
> > 1. There are only one buffer in encoding callback and it effects the preview
> > callback? If I don't return the buffer, there is no more encoding and
> > preview callback.
> 
> I tried to disable encoding callback by commenting following line in
> MediaEngineWebRTCVideo.cpp (inside
> GonkCameraSourceListener::postDataTimestamp() function).
> 
> //mSource->AddRecordingBuffer(dataPtr->pointer());
> 
>  But I am unable to see smooth video in webrtc even after disabling
> encoding/recording callback. But existing camera app video is smooth always
> (even if I enable recording) .

Can you clarify the test you are doing here? Is this just getUserMedia mapped
to a local video element?


> if we are not using recording/encoding callback then webrtc camera preview
> video should be as good as existing camera app. (You can trace it by putting
> log in "GonkCameraHardware::OnNewFrame()" )
> 
> But this is not happening.  Can you please tell me reasons for this ?
> 
>  
> > 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> > there anyway that we can get the video frames from camera module more
> > efficiently?
>  It seems to me that we need camera recording frame buffer in system memory
> instead of pmem. I am in touch with another internal team who can help us in
> this matter. I will update here soon.
(In reply to Eric Rescorla (:ekr) from comment #27)

> Can you clarify the test you are doing here? Is this just getUserMedia mapped
> to a local video element?

I followed all the steps from comment #17, trying to verify that the WebRTC camera preview is as good as the existing camera app's preview. I disabled the recording callback (comment #24) to stop the additional processing and release recording frames immediately.

From comment #11, it seems I should get a good camera preview in WebRTC if I disable the additional processing in the recording callback postDataTimestamp() to avoid the memcpy delay. But I still don't see a WebRTC camera preview as good as the existing camera app's. Is this because of some other issue in WebRTC?

I am not sure whether disabling the recording callback (comment #24) will force the camera preview in WebRTC to the local video element or not.
Flags: needinfo?(ekr)
(In reply to Tapas Kumar Kundu from comment #28)
> (In reply to Eric Rescorla (:ekr) from comment #27)
> 
> > Can you clarify the test you are doing here? Is this just getUserMedia mapped
> > to a local video element?
> 
I think Tapas is using http://mozilla.github.io/webrtc-landing/gum_test.html to test. It just calls getUserMedia and assigns the MediaStream to a local video element.

> I followed all steps from comment #17 and trying to see webrtc camera
> preview is as good as existing camera app preview. I disabled recording
> callback (comment #24) to stop additional processing and release recording
> frame immediately.
I think you can try disabling the recording callback by commenting out mNativeCameraControl->mCameraHw->SetListener and mNativeCameraControl->mCameraHw->StartRecording.

> From comment #11, it seems to me that I should be able to see good camera
> preview in webrtc if I disable additional processing in recording callback
> postdataTimestamp() to avoid memcpy delay. But I still don't see webrtc
> camera preview is as good as existing camera app preview. Is this because of
> some other issue in webrtc? 
The slow memcpy problem results from using the preview callback for both preview and encoding. When encoding, we need to copy the video frame out of the GraphicBuffer, and since it's non-cached memory the memcpy is slow. We then tried another path, the recording callback; we thought that should be the correct way (that's the patch in comment 11), but found that its memcpy is slow too. So there seems to be no efficient way to copy captured video frames to system memory.

Here are the reasons why the camera app does not have this problem:
* preview - as Sotaro said in comment 26, the camera app displays frames through a different path.
* recording - the camera app uses an OMX codec, which may not need to copy the video frames to system memory.
Hi,
(In reply to StevenLee from comment #29)


> > I followed all steps from comment #17 and trying to see webrtc camera
> > preview is as good as existing camera app preview. I disabled recording
> > callback (comment #24) to stop additional processing and release recording
> > frame immediately.
> I think you can try to disable recording callback by commenting out
> mNativeCameraControl->mCameraHw->SetListener and
> mNativeCameraControl->mCameraHw->StartRecording. 
> 

I still don't see stutter in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.

My suggestions:

1) Use the recording callback for MediaStreamGraph processing (or any other processing needed for the WebRTC implementation), since that processing can delay (or stutter) the camera preview. The preview should always be good; there should be no delay in the preview video.

2) I am enabling the cache for the camera buffer, so memcpy won't be an issue anymore. I will upload a patch for this soon.

This should solve the problems mentioned in comment 11.
Flags: needinfo?(tkundu)
Hi,

I made a typo in the above comment.

>> I still don't see stutter in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.

That line should read:

I still see *BIG STUTTER* in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.
Tapas: Often block-copying a large framebuffer is best implemented by avoiding the cache (since it often, especially on mobile, blows the entire cache multiple times). The only caveat is if the read implementation causes memcpy() to generate extra HW memory read cycles for uncached memory; in that case caching could still be better even if the cache gets blown. Also, there is often a platform-specific way to copy HW memory buffers around efficiently (DMA engine, 3D engine/blitter, etc.), but maybe no such thing is available here. I never saw an answer to the "why is it uncached" comment I made, though I can guess.
Flags: needinfo?(tkundu)
(In reply to Randell Jesup [:jesup] from comment #32)
> Tapas: Often block-copying a large framebuffer is best implemented by
> avoiding the cache (since it often (esp on mobile) blow the entire cache
> multiple times).  The only caveat would be if the read implementation caused
> memcpy() to generate extra HW memory read cycles for uncached memory; in
> which case it still could be better even if the cache gets blown.  Also,
> often there's a platform-specific way to copy HW memory buffers around
> efficiently (DMA engine, 3D engine/blitter, etc), but maybe there's no such
> thing available here.  I never saw an answer to the "why is it uncached"
> comment I made, though I can guess perhaps.

I understand your concerns. I think the camera driver is providing the buffer in cached memory, so the memcpy is not required.
Tapas,

Maybe I am misunderstanding you, but we actually do need
to copy the data from this buffer to enqueue it for the
encoder (and of course this also is going to involve reading
the entire buffer) so we can color convert, encode, etc.
(In reply to Eric Rescorla (:ekr) from comment #34)
> Tapas,
> 
> Maybe I am misunderstanding you, but we actually do need
> to copy the data from this buffer to enqueue it for the
> encoder (and of course this also is going to involve reading
> the entire buffer) so we can color convert, encode, etc.

This should be fine. The camera buffer will be cached with my patch, and you should not see any overhead for memcpy or memory reads. I will upload a fix for it.

At present, there is a delay in the processing of buffers from the preview callback (comment 26 and comment 29). This is what makes the preview video bad in WebRTC.

Please use the recording callback to do the WebRTC processing (comment 30); the preview should not be delayed by WebRTC processing.

Please let me know if you have any doubts.
Steven, can you try this and see if it helps?
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #30)
> 2) I am enabling cache for camera buffer. So memcpy won't be an issue
> anymore. I will upload a patch soon for this.
Can you give me the link to the patch so I can test it when you're done?

Ekr,
Sure, I will test it when Tapas's patch is done.
Flags: needinfo?(slee)
(In reply to StevenLee from comment #37)
> (In reply to Tapas Kumar Kundu from comment #30)
> > 2) I am enabling cache for camera buffer. So memcpy won't be an issue
> > anymore. I will upload a patch soon for this.
> Can you give me the link to the patch then I can test when you're done?
> 
> Ekr,
> Sure, I will test it when Tapas's patch is done.

I have uploaded the patch. Could you please try with the latest repo?
Attached file build error log
Hi Tapas,
Sorry for the late reply. I tried it and got a build error; it seems make cannot find camera.h and camera_defs_i.h. Where can I get these 2 files?
Thanks.
Flags: needinfo?(tkundu)
(In reply to StevenLee from comment #40)
> Created attachment 815758 [details]
> build error log
> 
> Hi Tapas, 
> Sorry for late reply. I tried and got building error. It seems resulted from
> that make cannot find camera.h and camera_defs_i.h. Where can I get these 2
> files?
> Thanks.

I already tested that it works fine in my build. Can you please try a clean build again?
Flags: needinfo?(tkundu)
Hi Tapas,

I tried a clean build and it failed, too.
I think it's because our build does not compile hardware/qcom/camera; our camera.msm7627a.so comes from the vendor, so we may not have all the source code needed for the camera module.
Flags: needinfo?(slee)
Can you please update me with the latest status on this?
Flags: needinfo?(slee)
Hi Tapas,

I am waiting for the vendor; I need them to build the library.
Meanwhile, I ran more detailed memory copy measurements on unagi and peak. The variables are:
1. copy by libyuv
   a. width is a multiple of 64 (640x480)
   b. width is not a multiple of 64 (636x480)
2. copy by memcpy
   a. width is a multiple of 64 (640x480)
   b. width is not a multiple of 64 (636x480)
3. whether the GraphicBuffer is cached or not

Here are the results (times in ms). They show that for non-cached memory, if we can use libyuv's optimized memory copy, the speed is acceptable.

* non-cache version
** peak
        640x480  636x480
libyuv  2.7-2.8   16-17
memcpy   16-17    16-17

** unagi
        640x480  636x480
libyuv   3-4      21-22
memcpy  21-22     21-22

* cache version
** peak
        640x480    636x480
libyuv  0.7-0.8    0.7-0.8
memcpy  1.3-1.5    0.7-0.8

** unagi
        640x480    636x480
libyuv  2.8-3      1.2-1.3
memcpy  1.6-1.8    1.2-1.3
Flags: needinfo?(slee)
slee: thanks for the tests!

So: aligned memory buffers work much better than unaligned ones with the cache off (no surprise) in libyuv. It's slightly surprising that memcpy doesn't hit an optimized path for non-cached memory, but then memcpy isn't really designed for non-cached use.

Cached is faster all around, though this doesn't capture the impact on other operations caused by totally blowing the cache away during the copy.  2ms difference (peak) is significant, though - if that's all a "pure win".  If we lose elsewhere by caching, then the uncached libyuv aligned copy may be best.

About the data in the buffer being copied (i.e., camera data): if we turn caching on, and *if* the data in that buffer arrives via DMA or equivalent, I assume the driver flushes the cache (or those cache lines)?

slee: did your test copy each frame once, or multiple times? A case with a 'hot' cache would probably not be a good test. Was this copying data from the camera when the camera said it was ready? If so, that makes it a more real-world test, which is good.

Also, are those numbers in ms?
Flags: needinfo?(slee)
Hi jesup,

I copy each frame once, just measuring the time spent in the function [1]. I force the format to I420, so there is no color space conversion; the calling path is the same as in the real world. All the numbers are in ms.

I will test the new library provided by the vendor today and update with new data when it's done.

[1]http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_capture/video_capture_impl.cc#286
Flags: needinfo?(slee)
We should do some new measurements on current hardware and OS versions.
backlog: --- → webRTC+
Rank: 45
Priority: -- → P4
Mass change P4->P5 to align with new Mozilla triage process.
Priority: P4 → P5