Open Bug 896391 Opened 7 years ago Updated 3 years ago

memcpy from camera preview's GraphicBuffer is slow

Categories

(Core :: WebRTC: Audio/Video, defect, P5)

ARM
Gonk (Firefox OS)
defect

People

(Reporter: slee, Unassigned)

References

Details

(Whiteboard: [WebRTC])

Attachments

(4 files)

memcpy from the GraphicBuffer to system memory takes a long time, about 10ms ~ 100ms (average 20ms ~ 30ms), to copy 640*480*1.5 bytes.
I found this is because the GraphicBuffer returned by the camera preview callback is uncached. With pchang's help, we forced the preview buffer from the camera to be "cached". The average memory copy time then drops to 1~12ms (average about 2ms).
On Android, there are 2 callbacks, one for preview and the other for recording. On FFOS, we only use the preview callback. I think we could use the record callback, [1], in peerconnection; the buffers from that callback are cached. But we cannot make gUM use the record callback by default, since the camera app gets its MediaStream from gUM.

Hi ROC,

Do you think we could add one more track to the MediaStream? The media element would use track 1 for display, and peerconnection would copy from track 2 to encode.
 
[1] http://dxr.mozilla.org/mozilla-central/source/dom/camera/GonkCameraSource.cpp#l85
(In reply to StevenLee from comment #0)
> Do you think if we can add one more track into the MediaStream? For media
> element needs to display, it uses track 1. And for peerconnection, it copies
> from track2 to encode.

We can't add an extra track, because that becomes visible to JS. What we could do is create a special image type that contains the data from both callbacks for each frame. Then different consumers of that image could request the most suitable format for their intended use.

We will want to do something similar to support, e.g., USB cameras that have built-in hardware encoders on other platforms.
(In reply to StevenLee from comment #0)
> memcpy from GraphicBuffer to system memory takes much time, about 10s ~
> 100ms(and average 20s ~ 30s ms)  to copy 640*480*1.5 bytes.
> I found it is resulted from the GraphicBuffer callbacked from camera preview
> is uncached. After pchang's help, we force the preview buffer from camera to
> be "cached".  The average memory copy time  reduces to 1~12ms(and average
> 2.s ms). 

Can you please provide the patch for this so it's clear what we're
talking about.


> On Android, there are 2 callbacks, one for preview and the other for record.
> On FFOS, we only use preview callback. I think we may use the record
> callback, [1], in peerconnection. The buffers callback from this callback is
> cached. But we cannot force gUM uses record callback by default since camera
> app gets MediaStream from gUM.


Why won't the camera app be happy to have the record callback used?
If it's using gUM, then aren't we currently incurring the long memcpy()
in any case?
(In reply to Eric Rescorla (:ekr) from comment #2)
> (In reply to StevenLee from comment #0)
> > memcpy from GraphicBuffer to system memory takes much time, about 10s ~
> > 100ms(and average 20s ~ 30s ms)  to copy 640*480*1.5 bytes.
> > I found it is resulted from the GraphicBuffer callbacked from camera preview
> > is uncached. After pchang's help, we force the preview buffer from camera to
> > be "cached".  The average memory copy time  reduces to 1~12ms(and average
> > 2.s ms). 
> 
> Can you please provide the patch for this so it's clear what we're
> talking about.
I don't have the patch on hand, but here is the chat log with Peter. Basically, he made all pmem cacheable for testing purposes.

peter@pchang-desktop:~/b2g_debug_leo/hardware/qcom/display$ git diff
diff --git a/libgralloc/alloc_controller.cpp b/libgralloc/alloc_controller.cpp
index 17a6b14..1338ee4 100644
--- a/libgralloc/alloc_controller.cpp
+++ b/libgralloc/alloc_controller.cpp
@@ -79,11 +79,14 @@ static bool canFallback(int usage, bool triedSystem)
 static bool useUncached(int usage)
 {
     // System heaps cannot be uncached
+    LOGE("found usage 0x%08x\n", usage);
     if(usage & (GRALLOC_USAGE_PRIVATE_SYSTEM_HEAP |
                 GRALLOC_USAGE_PRIVATE_IOMMU_HEAP))
         return false;
-    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED)
-        return true;
+    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED) {
+        LOGE("found uncached2 flag\n");
+        return false;
+    }
     return false;
 }
 
@@ -269,8 +272,10 @@ int PmemAshmemController::allocate(alloc_data& data, int usage,
         data.uncached = false;
 
     // Override if we explicitly need uncached buffers
-    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED)
-        data.uncached = true;
+    if (usage & GRALLOC_USAGE_PRIVATE_UNCACHED) {
+        LOGE("found uncached flag usage 0x%08x\n", usage);
+        data.uncached = false;
+    }

> 
> > On Android, there are 2 callbacks, one for preview and the other for record.
> > On FFOS, we only use preview callback. I think we may use the record
> > callback, [1], in peerconnection. The buffers callback from this callback is
> > cached. But we cannot force gUM uses record callback by default since camera
> > app gets MediaStream from gUM.
> 
> 
> Why won't the camera app be happy to have the record callback used?
> If it's using gUM, then aren't we currently incurring the long memcpy()
> in any case?
what does "cached" mean here?

Sometimes hardware buffers for IO devices are uncached to avoid polluting/wasting the cache and having to flush it when hardware DMA or mapping finishes.  However, if the buffer is read out in software (as opposed to being DMA'd elsewhere) having cache off may mean that each read (byte or word, whatever) may induce a memory read cycle instead of it reading a cacheline at a time.  As ekr says, the patch will help, but also an explanation of why it was uncached would help as well, and what assumptions led to that.
Hi ekr and jesup,

Here is the patch for profiling memcpy. ConvertToI420 only does a memcpy, with no color space conversion.
(In reply to Timothy B. Terriberry (:derf) from comment #1)
> We can't add an extra track, because that becomes visible to JS. What we
> could do is create a special image type that contains the data from both
> callbacks for each frame. Then different consumers of that image could
> request the most suitable format for their intended use.
Have we created the bug for this?
Thanks.
(In reply to Eric Rescorla (:ekr) from comment #2)
> Why won't the camera app be happy to have the record callback used?
> If it's using gUM, then aren't we currently incurring the long memcpy()
> in any case?
I think it's because they get better performance when displaying the camera stream. When the camera app needs to record, it registers the record callback and uses it to get data for encoding.
(In reply to Randell Jesup [:jesup] from comment #4)
> what does "cached" mean here?
> 
> Sometimes hardware buffers for IO devices are uncached to avoid
> polluting/wasting the cache and having to flush it when hardware DMA or
> mapping finishes.  However, if the buffer is read out in software (as
> opposed to being DMA'd elsewhere) having cache off may mean that each read
> (byte or word, whatever) may induce a memory read cycle instead of it
> reading a cacheline at a time.  As ekr says, the patch will help, but also
> an explanation of why it was uncached would help as well, and what
> assumptions led to that.
I've discussed this with other colleagues in the TPE office. We think it's along the lines of what you described.
(In reply to Randell Jesup [:jesup] from comment #4)
> what does "cached" mean here?
> 
> Sometimes hardware buffers for IO devices are uncached to avoid
> polluting/wasting the cache and having to flush it when hardware DMA or
> mapping finishes.  However, if the buffer is read out in software (as
> opposed to being DMA'd elsewhere) having cache off may mean that each read
> (byte or word, whatever) may induce a memory read cycle instead of it
> reading a cacheline at a time.  As ekr says, the patch will help, but also
> an explanation of why it was uncached would help as well, and what
> assumptions led to that.

I just checked the kernel implementation.
As Randell mentioned, caching affects performance; note in the code below that flush_pmem_file() returns early for uncached regions.

./drivers/misc/pmem.c

struct pmem_info {
 ...
        /* indicates maps of this region should be cached, if a mix of
         * cached and uncached is desired, set this and open the device with
         * O_SYNC to get an uncached region */
        unsigned cached;


void flush_pmem_file(struct file *file, unsigned long offset, unsigned long len)
...
        id = get_id(file);
        if (!pmem[id].cached)
                return;
Steven and I discussed this today and he points out that while on desktop we do a memcpy:

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ImageContainer.cpp#l454


However, on B2G, we just assign the Image* to the RefPtr member variable.
Hi Michael,

I encountered some problems when porting the camera into WebRTC: memcpy from the GraphicBuffer to system memory is slow, because the memory in the GraphicBuffer is non-cached. I found that there are 2 callbacks in the camera module, preview and encoding, so I tried to register the encoding callback (postDataTimestamp) and get the video frames from it. That raised new problems:
1. There is only one buffer in the encoding callback, and it affects the preview callback? If I don't return the buffer, there are no more encoding or preview callbacks.
2. The memcpy is slow too. :( Is it non-cached? The memcpy takes about 20ms. Is there any way we can get video frames from the camera module more efficiently?
Flags: needinfo?(mvines)
Hey Tapas, could you please take a look at this.
Flags: needinfo?(mvines)
Flags: needinfo?(tkundu)
@m1, 

I will look into this and update here soon.
Hi Steven, 

I guess that you are testing this (In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?

I guess that you are testing this for ICS master branch (not jb_mr port of gecko) ?

Please confirm me platform details so that I can try to find more information about this problem.
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #14)
> I guess that you are testing this for ICS master branch (not jb_mr port of
> gecko) ?
Hi Tapas,
Yes, we are using ICS.
Flags: needinfo?(slee)
(In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?

I tried the following steps:

1) I rebuilt my ICS master latest tip (synced on 21st Aug) with your patch (https://bugzilla.mozilla.org/attachment.cgi?id=786117). I saw some minor conflicts and resolved them myself.
2) I flashed the device and launched the camera app.
3) I tried to switch from preview mode into video mode.

The camera app crashed at step 3. Note that the camera app crashes at step 3 even if I don't apply your patch.

My guess is that I need to record some video using the 'camera app' on the ICS master branch to reproduce this performance problem.

Please help me by listing the steps that reproduce this performance problem on my device (with your patch). That will help me understand the actual cause of the problem.
Flags: needinfo?(tkundu)
Hi Tapas,

Did you build with mozilla-central (i.e., did you build the gecko in the B2G folder, or did you check out gecko from http://hg.mozilla.org/mozilla-central/ with Mercurial)? The latter is correct.

1. Please apply this patch; it skips the camera permission check.
2. Turn on the log in content/media/webrtc/MediaEngineWebRTCVideo.cpp::AddRecordingBuffer.
3. Build gecko, flash, and go to http://mozilla.github.io/webrtc-landing/gum_test.html
4. Choose video, and logcat will show the average memcpy time.

BTW, the camera app does not have this problem. I guess it's because the camera app uses the OMX encoder to handle the memory.
Flags: needinfo?(slee)

Hi StevenLee,

I was able to reproduce your issue today. I will analyse it and get back to you soon.
Hi Tapas,

Is there any update about this issue?
Thanks.
(In reply to Tapas Kumar Kundu from comment #18)
> 
> hi Stevenlee,
> 
> I was able to reproduce your issue today.I will analyse it and get back to
> you soon

Sorry for the delay; I was busy with another high-priority task. I have one doubt: why don't we use the OMX encoder like the camera app does? I am trying to find a better workaround here.
(In reply to Tapas Kumar Kundu from comment #20)
> I have one doubt why don't we use OMX encoder like camera app? Sorry for the
> delay. I was busy with other high priority task. I am trying to find a
> better workaround here.
Hi Tapas,

Because in WebRTC we are using the vp8 codec, and not all platforms have an OMX encoder for vp8. Furthermore, the memcpy from the OMX buffer to system memory also seems slow (copying a VGA-size YCbCr buffer takes about 20ms). After encoding, we need to copy the encoded data to system memory to send it to peers.
(In reply to StevenLee from comment #21)
> (In reply to Tapas Kumar Kundu from comment #20)
> > I have one doubt why don't we use OMX encoder like camera app? Sorry for the
> > delay. I was busy with other high priority task. I am trying to find a
> > better workaround here.
> Hi Tapas,
> 
> Because in WebRTC, we are using vp8 codec, and not all platforms have OMX
> encoder with vp8 codec. Furthermore, the memcpy from OMX buffer to system
> memory seems slow, too(copy VGA size YCbCr buffer takes about 20s ms). After
> encoding, we need to copy the encoded data to system memory for sending to
> others.

Good point, thanks for the information.

I understand that you need to memcpy the buffer to send it over the network. I have one doubt. I guess Firefox also runs on Android.

How does WebRTC work on Android?
Does it do the same kind of memcpy?
Is that memcpy faster than what FFOS is doing?
Are we doing something different from Android's WebRTC?

I will enable caching for the graphic buffer only if it is the *ONLY* solution; I want to find a better workaround for it.
Flags: needinfo?(tkundu)
WebRTC on Android is pretty unoptimized in general (any implementation, including ours). We are currently focusing optimization on FFOS and will carry over what we learn here to our Android implementation.
(In reply to StevenLee from comment #11)
> Created attachment 786117 [details] [diff] [review]
> EnableRecordingCallback.patch
> 
> Hi Michael,
> 
> I encountered some problems when porting camera into webrtc, memcpy from
> GraphicBuffer to system memory is slow. This is because that the memory in
> GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> module, preview and encoding. So that I tried to register encoding
> callback(postDataTimestamp) and get the video frame from that callback. And
> I had new problems.
> 1. There are only one buffer in encoding callback and it effects the preview
> callback? If I don't return the buffer, there is no more encoding and
> preview callback.

I tried to disable the encoding callback by commenting out the following line in MediaEngineWebRTCVideo.cpp (inside the GonkCameraSourceListener::postDataTimestamp() function).

//mSource->AddRecordingBuffer(dataPtr->pointer());

But I am unable to see smooth video in WebRTC even after disabling the encoding/recording callback, while the existing camera app's video is always smooth (even with recording enabled).

If we are not using the recording/encoding callback, then the WebRTC camera preview should be as good as the existing camera app's. (You can trace it by putting a log in "GonkCameraHardware::OnNewFrame()".)

But this is not happening. Can you please tell me why?

 
> 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> there anyway that we can get the video frames from camera module more
> efficiently?
It seems to me that we need the camera recording frame buffer in system memory instead of pmem. I am in touch with another internal team who can help us with this; I will update here soon.
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #24)
>  But I am unable to see smooth video in webrtc even after disabling
> encoding/recording callback. But existing camera app video is smooth always
> (even if I enable recording) .
> if we are not using recording/encoding callback then webrtc camera preview
> video should be as good as existing camera app. (You can trace it by putting
> log in "GonkCameraHardware::OnNewFrame()" )
> But this is not happening.  Can you please tell me reasons for this ?
Hi Tapas,

Actually, the display mechanisms of WebRTC and the camera app are different. For the camera app, when a video frame arrives it is displayed directly. For WebRTC, we send the video frame to our media framework, which decides when the frame should be displayed and then returns it to GonkCamera. I am not sure if this causes the problem; I will try to figure it out.
Flags: needinfo?(slee)
In the camera preview case, gecko bypasses MediaStreamGraph, because MediaStreamGraph added too much latency to the camera preview. See Bug 844248.

https://github.com/sotaroikeda/firefox-diagrams/blob/master/dom/dom_camera_DOMCameraPreview_FirefoxOS_1_01.pdf?raw=true
This has turned into a really long bug, so let me try to summarize what I think
is going on:

(In reply to Tapas Kumar Kundu from comment #24)
> (In reply to StevenLee from comment #11)
> > Created attachment 786117 [details] [diff] [review]
> > EnableRecordingCallback.patch
> > 
> > Hi Michael,
> > 
> > I encountered some problems when porting camera into webrtc, memcpy from
> > GraphicBuffer to system memory is slow. This is because that the memory in
> > GraphicBuffer is non-cached. I found that there are 2 callbacks in camera
> > module, preview and encoding. So that I tried to register encoding
> > callback(postDataTimestamp) and get the video frame from that callback. And
> > I had new problems.
> > 1. There are only one buffer in encoding callback and it effects the preview
> > callback? If I don't return the buffer, there is no more encoding and
> > preview callback.
> 
> I tried to disable encoding callback by commenting following line in
> MediaEngineWebRTCVideo.cpp (inside
> GonkCameraSourceListener::postDataTimestamp() function).
> 
> //mSource->AddRecordingBuffer(dataPtr->pointer());
> 
>  But I am unable to see smooth video in webrtc even after disabling
> encoding/recording callback. But existing camera app video is smooth always
> (even if I enable recording) .

Can you clarify the test you are doing here? Is this just getUserMedia mapped
to a local video element?


> if we are not using recording/encoding callback then webrtc camera preview
> video should be as good as existing camera app. (You can trace it by putting
> log in "GonkCameraHardware::OnNewFrame()" )
> 
> But this is not happening.  Can you please tell me reasons for this ?
> 
>  
> > 2. The memcpy is slow too. :( Is it non-cached? The memcpy is about 20ms. Is
> > there anyway that we can get the video frames from camera module more
> > efficiently?
>  It seems to me that we need camera recording frame buffer in system memory
> instead of pmem. I am in touch with another internal team who can help us in
> this matter. I will update here soon.
(In reply to Eric Rescorla (:ekr) from comment #27)

> Can you clarify the test you are doing here? Is this just getUserMedia mapped
> to a local video element?

I followed all the steps from comment #17, trying to verify that the WebRTC camera preview is as good as the existing camera app's preview. I disabled the recording callback (comment #24) to stop the additional processing and release recording frames immediately.

From comment #11, it seems I should get a good camera preview in WebRTC if I disable the additional processing in the recording callback postDataTimestamp() to avoid the memcpy delay. But I still don't see a WebRTC camera preview as good as the existing camera app's. Is this because of some other issue in WebRTC?

I am not sure whether disabling the recording callback (comment #24) will force the camera preview in WebRTC to the local video element or not.
Flags: needinfo?(ekr)
(In reply to Tapas Kumar Kundu from comment #28)
> (In reply to Eric Rescorla (:ekr) from comment #27)
> 
> > Can you clarify the test you are doing here? Is this just getUserMedia mapped
> > to a local video element?
> 
I think Tapas is using http://mozilla.github.io/webrtc-landing/gum_test.html to test. It just calls getUserMedia and assigns the MediaStream to a local video element.

> I followed all steps from comment #17 and trying to see webrtc camera
> preview is as good as existing camera app preview. I disabled recording
> callback (comment #24) to stop additional processing and release recording
> frame immediately.
I think you can try disabling the recording callback by commenting out mNativeCameraControl->mCameraHw->SetListener and mNativeCameraControl->mCameraHw->StartRecording.

> From comment #11, it seems to me that I should be able to see good camera
> preview in webrtc if I disable additional processing in recording callback
> postdataTimestamp() to avoid memcpy delay. But I still don't see webrtc
> camera preview is as good as existing camera app preview. Is this because of
> some other issue in webrtc? 
The slow memcpy problem results from using the preview callback for both preview and encoding. When encoding, we need to copy the video frame out of the GraphicBuffer, and since it's non-cached memory the memcpy is slow. We then tried another path, the recording callback; we thought that should be the correct way (that's the patch in comment 11), but found that its memcpy is slow too. So there seems to be no efficient way to copy captured video frames to system memory.

Here are the reasons why the camera app does not have this problem:
* preview - as Sotaro said in comment 26, the camera app displays frames through a different path.
* recording - the camera app uses an OMX codec, which may not need to copy the video frames to system memory.
Hi,
(In reply to StevenLee from comment #29)


> > I followed all steps from comment #17 and trying to see webrtc camera
> > preview is as good as existing camera app preview. I disabled recording
> > callback (comment #24) to stop additional processing and release recording
> > frame immediately.
> I think you can try to disable recording callback by commenting out
> mNativeCameraControl->mCameraHw->SetListener and
> mNativeCameraControl->mCameraHw->StartRecording. 
> 

I still don't see stutter in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.

My suggestions:

1) Use the recording callback for MediaStreamGraph processing (or any other processing needed for the WebRTC implementation), since that processing can delay (or stutter) the camera preview. The preview should always be good; there should be no delay in the preview video.

2) I am enabling the cache for the camera buffer, so memcpy won't be an issue anymore. I will upload a patch for this soon.

This should solve the problems mentioned in comment 11.
Flags: needinfo?(tkundu)
Hi,

I made a typo in the above comment.

>> I still don't see stutter in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.

That line should read:

I still see *BIG STUTTER* in the preview video on the gum test page. I also enabled caching for the camera buffer during this test.
Tapas: Often block-copying a large framebuffer is best implemented by avoiding the cache (since it often, especially on mobile, blows the entire cache multiple times). The only caveat is if the read implementation causes memcpy() to generate extra HW memory read cycles for uncached memory; in that case caching could still be better even if the cache gets blown. Also, there is often a platform-specific way to copy HW memory buffers around efficiently (DMA engine, 3D engine/blitter, etc.), but maybe no such thing is available here. I never saw an answer to the "why is it uncached" comment I made, though I can guess.
Flags: needinfo?(tkundu)
(In reply to Randell Jesup [:jesup] from comment #32)
> Tapas: Often block-copying a large framebuffer is best implemented by
> avoiding the cache (since it often (esp on mobile) blow the entire cache
> multiple times).  The only caveat would be if the read implementation caused
> memcpy() to generate extra HW memory read cycles for uncached memory; in
> which case it still could be better even if the cache gets blown.  Also,
> often there's a platform-specific way to copy HW memory buffers around
> efficiently (DMA engine, 3D engine/blitter, etc), but maybe there's no such
> thing available here.  I never saw an answer to the "why is it uncached"
> comment I made, though I can guess perhaps.

I understand your concerns. I think the camera driver is providing the buffer in cached memory, so the memcpy is not required.
Tapas,

Maybe I am misunderstanding you, but we actually do need
to copy the data from this buffer to enqueue it for the
encoder (and of course this also is going to involve reading
the entire buffer) so we can color convert, encode, etc.
(In reply to Eric Rescorla (:ekr) from comment #34)
> Tapas,
> 
> Maybe I am misunderstanding you, but we actually do need
> to copy the data from this buffer to enqueue it for the
> encoder (and of course this also is going to involve reading
> the entire buffer) so we can color convert, encode, etc.

This should be fine. The camera buffer will be cached with my patch, and you should not see any overhead for memcpy or memory reads. I will upload a fix for it.

At present, there is a delay in the processing of buffers from the preview callback (comment 26 and comment 29). This is what makes the preview video bad in WebRTC.

Please use the recording callback to do the WebRTC processing (comment 30); the preview should not be delayed by WebRTC processing.

Please let me know if you have any doubts.
Steven, can you try this and see if it helps?
Flags: needinfo?(slee)
(In reply to Tapas Kumar Kundu from comment #30)
> 2) I am enabling cache for camera buffer. So memcpy won't be an issue
> anymore. I will upload a patch soon for this.
Can you give me the link to the patch so I can test it when you're done?

Ekr,
Sure, I will test it when Tapas's patch is done.
Flags: needinfo?(slee)
(In reply to StevenLee from comment #37)
> (In reply to Tapas Kumar Kundu from comment #30)
> > 2) I am enabling cache for camera buffer. So memcpy won't be an issue
> > anymore. I will upload a patch soon for this.
> Can you give me the link to the patch then I can test when you're done?
> 
> Ekr,
> Sure, I will test it when Tapas's patch is done.

I have uploaded the patch. Could you please try with the latest repo?
Attached file build error log
Hi Tapas,
Sorry for the late reply. I tried it and got a build error; it seems make cannot find camera.h and camera_defs_i.h. Where can I get these 2 files?
Thanks.
Flags: needinfo?(tkundu)
(In reply to StevenLee from comment #40)
> Created attachment 815758 [details]
> build error log
> 
> Hi Tapas, 
> Sorry for late reply. I tried and got building error. It seems resulted from
> that make cannot find camera.h and camera_defs_i.h. Where can I get these 2
> files?
> Thanks.

I already tested that it works fine in my build. Can you please try a clean build again?
Flags: needinfo?(tkundu)
Hi Tapas,

I tried a clean build and it failed, too.
I think it's because our build does not compile hardware/qcom/camera; our camera.msm7627a.so comes from the vendor, so we may not have all the source code needed for the camera module.
Flags: needinfo?(slee)
Can you please update me with the latest status on this?
Flags: needinfo?(slee)
Hi Tapas,

I am waiting for the vendor; I need them to build the library.
Meanwhile, I ran more detailed memory copy measurements on unagi and peak. The variables are:
1. copy by libyuv
   a. width is a multiple of 64 (640x480)
   b. width is not a multiple of 64 (636x480)
2. copy by memcpy
   a. width is a multiple of 64 (640x480)
   b. width is not a multiple of 64 (636x480)
3. whether the GraphicBuffer is cached or not

Here are the results (times in ms). They show that for non-cached memory, if we can use libyuv's optimized memory copy, the speed is acceptable.

* non-cache version
** peak
        640x480  636x480
libyuv  2.7-2.8   16-17
memcpy   16-17    16-17

** unagi
        640x480  636x480
libyuv   3-4      21-22
memcpy  21-22     21-22

* cache version
** peak
        640x480    636x480
libyuv  0.7-0.8    0.7-0.8
memcpy  1.3-1.5    0.7-0.8

** unagi
        640x480    636x480
libyuv  2.8-3      1.2-1.3
memcpy  1.6-1.8    1.2-1.3
Flags: needinfo?(slee)
slee: thanks for the tests!

So: aligned memory buffers work much better than unaligned ones with the cache off (no surprise) in libyuv. It's slightly surprising that memcpy doesn't hit an optimized path for non-cached memory, but then memcpy isn't really designed for non-cached use.

Cached is faster all around, though this doesn't capture the impact on other operations caused by totally blowing the cache away during the copy.  2ms difference (peak) is significant, though - if that's all a "pure win".  If we lose elsewhere by caching, then the uncached libyuv aligned copy may be best.

About the data in the buffer being copied (i.e., camera data): if we turn caching on, and *if* the data in that buffer arrives via DMA or equivalent, I assume the driver flushes the cache (or those cache lines)?

slee: did your test copy each frame once, or multiple times? A case with a 'hot' cache would probably not be a good test. Was this copying data from the camera when the camera said it was ready? If so, that makes it a more real-world test, which is good.

Also, are those numbers in ms?
Flags: needinfo?(slee)
Hi jesup,

I copy each frame once, just measuring the time spent in the function [1]. I force the format to I420, so there is no color space conversion; the calling path is the same as in the real world. All the numbers are in ms.

I will test the new library provided by the vendor today and update with new data when it's done.

[1]http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_capture/video_capture_impl.cc#286
Flags: needinfo?(slee)
We should do some new measurements on current hardware and OS versions.
backlog: --- → webRTC+
Rank: 45
Priority: -- → P4
Mass change P4->P5 to align with new Mozilla triage process.
Priority: P4 → P5