Closed Bug 1001069 Opened 11 years ago Closed 3 years ago

Slow getImageData() and putImageData() performance as canvas size increases.

Categories

(Core :: Graphics: Canvas2D, defect)

defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bugs, Assigned: chiajung)

References

Details

(Keywords: perf, Whiteboard: [shumway:p1])

Attachments

(1 file, 3 obsolete files)

+++ This bug was initially created as a clone of Bug #952909 +++ Splitting out this bug so we separate the crasher (bug 952909) from the performance issue found in: http://jsperf.com/cwahlers-getimagedata/2 In the benchmark, the elapsed time for getting/putting a fixed 100x100 bitmap increases as the canvas size increases. We expect that the cost of reading/writing fixed source/destination regions would be constant regardless of the canvas size.
If not earlier, graphics can take a look at this the week of May 5h
A few comments, based on OS X (10.9.2) runs with willReadFrequently on/off, canvas set to cg or skia, canvas acceleration on or off: * In these runs, willReadFrequently value doesn't make a difference * For CG, acceleration on or off doesn't make a difference * For Skia, acceleration off gives a significant speedup (2-100x) and a smaller drop-off from 512 to 1k to 2k.
:snorp just reminded me - willChangeFrequently is only considered if gfx.canvas.willChangeFrequently.enable is set to true, so that first statement above is because of that. There is a difference in the "write" part of the read-write in the test, but much more of the drop in performance is coming from the read part.
Yeah, fair enough, we do a full canvas copy before we cut out the piece we want. Should be easy enough to patch.
On CG backend, the switch to copying the full canvas copy came with bug 925448. Before we go deep, how important is this? Outside of a test, do we have normal scenarios where we would be asking for pieces of canvas like this?
Blocks: 925448
:milan one quite popular use case for this is grabbing frames from a texture atlas
We make a full copy of the surface because once we've given out the pointer to the underlying data we can't take it back. We could fix this by going back to lazily copying the data when we try modify the DrawTarget, but instead of making a copy for the snapshot, make a copy for the DrawTarget and leave the snapshot as-is. That should give us correctness, but without requiring any copies for the common case.
Bas' quick suggestion is to extend GetDataSurface(), or introduce GetDataSurface-like Moz2D API that also specifies the region we want.
(In reply to Milan Sreckovic [:milan] from comment #8) > Bas' quick suggestion is to extend GetDataSurface(), or introduce > GetDataSurface-like Moz2D API that also specifies the region we want. For skia case, I think it would rather easy by call SkBitmap::extractSubset, which seems render the image into another SkBitmap. And if we use skiaGL, this extraction-copy is done by GPU. I will try and upload the patch later.
Assignee: nobody → chung
Attached patch WIP (obsolete) — Splinter Review
With this simple test patch the test result is go from: ctx512 139 ±8.27%fastest ctx1024 28.25 ±57.77%86% slower ctx2048 15.68 ±24.00%90% slower ctx4096 2.10 ±4.70%98% slower to ctx512 210 ±4.15%fastest ctx1024 177 ±5.07%17% slower ctx2048 130 ±59.50%fastest ctx4096 4.40 ±14.52%98% slower (about 2x faster) The result is still unreasonable for me. I will try further dig into the problem.
(In reply to Chiajung Hung [:chiajung] from comment #10) ... > > The result is still unreasonable for me. I will try further dig into the > problem. Sure, but this is not a high priority task, so plan your time accordingly.
Is this still critical for shumway performance, or did we change our Canvas usage to not depend on this being fast?
Flags: needinfo?(mbebenita)
This is still very important. We have to use putImageData whenever we update BitmapData, which is quite frequently in games like Canabalt.
Flags: needinfo?(mbebenita)
Ok, in that case: chiajung, did you make any progress on this? It'll become a bottleneck for Shumway performance at some point.
Flags: needinfo?(chung)
I have some more idea about this case to test. Do you have any good sample to test this API? The test case in comment 1 is in fact benchmark like...it does not really draw anything but calls get/put rapidly, which is quite easy to fool around. The basic idea is to copy-on-write while get is invoked, moreover we can avoid the expensive glReadPixels on B2G by manipulate the GrallocBuffer directly. I will do some test to check whether it is possible. For the put function, I am not sure whether we can do the alpha multiplication while write to each element. And this will need some test to check whether it is better or worse. If this blocks you, I think we should take this into our priority list. @peter How do you think?
Flags: needinfo?(chung) → needinfo?(pchang)
(In reply to Chiajung Hung [:chiajung] from comment #15) > I have some more idea about this case to test. > > Do you have any good sample to test this API? mbx, do we have any real-world test for this, or would the test right now be "run something in Shumway"? If the latter, I wonder if we could somehow extract the gfx part from that to make it easier to test.
Flags: needinfo?(mbebenita)
Here is one real-world test case: http://www.areweflashyet.com/adamatomic/canabalt2.html A quick profile shows that the game spends about 30-40% of the time unpremultipling and uploading bitmap data using putImageData.
Flags: needinfo?(mbebenita)
Dupe of bug 1064105?
Depends on: 880114
No longer depends on: 952909
After discuss with :tingyuan, we think copy-on-write on TypedArray may not very easy and maybe introduce some other overhead, so I will try to handle the alpha manipulation/division in GPU directly first and see whether it helps. Because I will use some basic constructions in bug 880114, I set the dependency.
Attached patch WIP V2 (obsolete) — Splinter Review
SkiaGL supports handle alpha in GPU, this patch use it to accelerate GetImageData/PutImageData API. However, the performance is still not comparable with Google Chrome on local Linux try of the benchmark in comment 1. By the way, to test the patch, we will have to switch to SkiaGL. To enhance other backend(software) of Canvas, we should take (x, y, w, h) into consideration while Snapshot of each DrawTarget or GetDataSourceSurface of each SourceSurface.
Attachment #8419839 - Attachment is obsolete: true
Attached patch WIP v3 (obsolete) — Splinter Review
Avoid 1 memcpy for cairo backend shows some improvement. The SkiaGL part is almost equivalent to Chrome implementation, but Get+Put benchmark shows Firefox still about 2 times slow than Chrome. Something outside these APIs should should be checked.
Attachment #8489283 - Attachment is obsolete: true
Blocks: 927828
getImageData() and putImageData() perf does not scale linearly. Gecko always copies out the whole canvas and then extracts the subrect. Shumway wants to use these APIs for a scratch canvas (like bug 952539).
So (In reply to Chris Peterson (:cpeterson) from comment #23) getImageData() and putImageData() perf does not scale linearly. Gecko always > copies out the whole canvas and then extracts the subrect. Shumway wants to > use these APIs for a scratch canvas (like bug 952539). So, your real need is to extend WebGL API to support something like mozTexImage2D(GLenum target, ImageElement or CanvasElement or VideoElement elt, int x, int y, int w, int h) Which is already available: http://dxr.mozilla.org/mozilla-central/source/dom/webidl/WebGLRenderingContext.webidl?from=WebGLRenderingContext.webidl&case=true#671 To optimize your case is slightly different from the getImageData()/putImageData() case. For your case, I think we could carry the texture infomation(a RefPtr to SourceSurface) in dom::ImageData in Gecko, and only does those read back if ImageData::GetData called. This should rather cheap for skiaGL backend case(If they uses shared context). For others, a texture upload is inevitable. I will try to implement the idea, and if I get any progress, I will upload my patch here.(it should be a distinct bug :) ) BTW, if there is any existing test case, please let me know, it can save me a lot of time to recreate one :p
(In reply to Chiajung Hung [:chiajung] from comment #24) > So (In reply to Chris Peterson (:cpeterson) from comment #23) > So, your real need is to extend WebGL API to support something like > mozTexImage2D(GLenum target, ImageElement or CanvasElement or VideoElement > elt, int x, int y, int w, int h) Yes, that's part of it. However, we also have a canvas2d-based compositing backend, and except to have that as a requirement going forward. So we'd need an equivalent for canvas2d, too. > > Which is already available: > http://dxr.mozilla.org/mozilla-central/source/dom/webidl/ > WebGLRenderingContext.webidl?from=WebGLRenderingContext.webidl&case=true#671 > > To optimize your case is slightly different from the > getImageData()/putImageData() case. For your case, I think we could carry > the texture infomation(a RefPtr to SourceSurface) in dom::ImageData in > Gecko, and only does those read back if ImageData::GetData called. This > should rather cheap for skiaGL backend case(If they uses shared context). That sounds perfect. > > BTW, if there is any existing test case, please let me know, it can save me > a lot of time to recreate one :p mbx, do we have such a test case?
Flags: needinfo?(mbebenita)
Some note about the idea of lazy pixel data resolution: ImageDataBinding always call ImageData::GetData as soon as CanvasRenderingContext2D::GetImageData returned, so we can not really delay the pixel data resolution. In fact, delayed data resolution is somewhat tedious, since we have to resolve the pixels while drawTarget dirty, too. If JS side do something like: var raw = ctx.getImageData() <-- In fact, resolved here from ImageData::WrapObject call ctx.drawImage(img) <-- must resolve here var rawArr = raw.data() <-- ideally resolve here for(var i = 0; i < rawArr.len; i++) //do something access rawArr[i]; As a result, I will try to carry 2 data in ImageData: (1) Resolved raw array(original) (2) texture with context. BTW, If the GL context of WebGLContext is not shared with the skiaGL context, almost nothing can be optimized this way. Maybe some generic alpha manipulation/Y-flip/crop/pixel format handling can be done in shader, but we should not get huge improvement this way. Another possibility is something like EGLImage, which is not supported on most platform except Android/B2G, I think :)
Flags: needinfo?(pchang)
Attached patch WIP V4Splinter Review
Rebased and cleaned up. @Bas I think we should extend the DrawTarget::Snapshot API to have crop and alpha operations as parameter to hide the backend detail of this patch in its correspond DrawTarget. (This part can be implemented as DrawTarget::LockBits in a hacky way, though) And we need a DrawTarget::WritePixels, too. If you think it is OK, I will extend it in next version.
Attachment #8491367 - Attachment is obsolete: true
Flags: needinfo?(bas)
Comment on attachment 8512405 [details] [diff] [review] WIP V4 Review of attachment 8512405 [details] [diff] [review]: ----------------------------------------------------------------- I'm not convinced that a solution that does not improve Windows performance is worth the complications in the code. The underlying reason for this bug is Shumway and Windows, if we're not covering that scenario, I'm not sure this is worth the effort. Am I reading the code wrong, or is this only for Skia-based canvas, which would exclude Windows?
(In reply to Milan Sreckovic [:milan] from comment #28) > Comment on attachment 8512405 [details] [diff] [review] > WIP V4 > > Review of attachment 8512405 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm not convinced that a solution that does not improve Windows performance > is worth the complications in the code. The underlying reason for this bug > is Shumway and Windows, if we're not covering that scenario, I'm not sure > this is worth the effort. Am I reading the code wrong, or is this only for > Skia-based canvas, which would exclude Windows? This is for Skia based canvas now, for other software backend we will need NEON/SSE implementation of alpha handling routines. (BTW, why we exclude Windows on SkiaGL canvas?) Though the bug is for Shumway on windows, there is some related bug on B2G. @Tim I heard that gaia on b2g uses getImageData API and filed some bug for its performance, can you link them here?
Flags: needinfo?(timdream)
I am not aware any of that (besides the use cases that make us use willReadFrequently) in B2G. (This bug was cloned from bug 952909 and that bug was about an off-work project of mine and unrelated to B2G) Maybe we should ask :djf and the media team?
Flags: needinfo?(timdream) → needinfo?(dflanagan)
BTW, if gaia uses full canvas read/write than new API is not needed.
(In reply to Chiajung Hung [:chiajung] from comment #29) > ... > (BTW, why we exclude Windows on SkiaGL canvas?) We do not use Skia/SkiaGL on (accelerated) Windows, nor have plans to use Skia/SkiaGL on (accelerated) Windows. We do have plans to move to Skia on all other platforms (I'm ignoring the Windows configurations where we do software rendering/compositing only, where Skia does come into the picture.) I don't mean to minimize the importance of doing the Skia solution, just to emphasize that the Windows solution is more important given that what's currently driving this is Shumway.
(In reply to Chiajung Hung [:chiajung] from comment #27) > Created attachment 8512405 [details] [diff] [review] > WIP V4 > > Rebased and cleaned up. > > @Bas > I think we should extend the DrawTarget::Snapshot API to have crop and alpha > operations as parameter to hide the backend detail of this patch in its > correspond DrawTarget. (This part can be implemented as DrawTarget::LockBits > in a hacky way, though) > > And we need a DrawTarget::WritePixels, too. > > If you think it is OK, I will extend it in next version. LockBits is the way to do any of these things. When lock bits fails it means we're using a hardware accelerated scenario, and it's pretty much game over since we're using readback. ReadPixels will be slow whatever way we play it. I am also of the opinion that beyond using LockBits where we can, there's little value in changing or extending the Moz2D APIs.
Flags: needinfo?(bas)
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #30) > I am not aware any of that (besides the use cases that make us use > willReadFrequently) in B2G. > > (This bug was cloned from bug 952909 and that bug was about an off-work > project of mine and unrelated to B2G) > > Maybe we should ask :djf and the media team? I don't think I have any pending bugs filed. The FirefoxOS Gallery app does use getImageData() to extract rectangular tiles from a large image for processing, so we are affected by this performance issue. If FirefoxOS the main constraint is memory, so anything you can do to make image processing less memory intensive would make me very happy. Almost always, I'm working with off-screen canvases for resizing or processing images and often do not want the canvas to use the GPU. If you're thinking about adding new APIs, I'd love it if there was a way to encode pixel data directly to a jpeg without having to copy it back to a Canvas element first.
Flags: needinfo?(dflanagan)
OK, if no one really cares this, than just pending until I have time to investigate SSE/NEON implementation for it. For direct encoding I think there is a mailing list discuss about it, but I don't know the conclusion.
There is an SSE implementation for premultiplication / unpremultiplication here: http://dxr.mozilla.org/mozilla-central/source/gfx/2d/FilterProcessing.h#55 which you could expose through gfx/2d/Tools.h. Bug 961759 will add a NEON implementation for it.
No longer blocks: shumway-m4
Flags: needinfo?(mbebenita)
Bug 1293970, Bug 1293968 and Bug 1296166 should improve the performance of getImageData() and putImageData() in certain case with PersistentBufferProviderShared.
Depends on: 1293970, 1293968, 1296166
I'm seeing a big difference between 51 and 52, with 52 being much better. Not seeing a lot of difference between 49, 50 and 51. 52 being better is probably due to bug 1309913, where we currently don't use SkiaGL at all, so I would expect getImageData() to be faster. Once bug 1309913 is fixed, I imagine we're going to see the results drop back down.

Hi Milan, does this issue still occur in our latest builds ? is it possible that it was fixed in Fx52 ?

Flags: needinfo?(milaninbugzilla)

The original testcase seems offline at the moment, but there's a snapshot here: https://web.archive.org/web/20140415030917/https://jsperf.com/cwahlers-getimagedata/2

It looks like the problem is that getImageData is called before anything was drawn to the canvas. At larger canvas sizes, this means that the backing buffer is re-created and then discarded for every call to getImageData. That explains why the canvas size changes the performance.

I also looked at the code and it looks like we correctly only unpremultiply / premultiply the sub-area of the canvas that is specified by the get/putImageData arguments.

I'll resolve this as WORKSFORME. The original Shumway use case (with a scratch buffer) is no longer present.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(milaninbugzilla)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: