1001069 - Slow getImageData() and putImageData() performance as canvas size increases.

Reporter

Description

•

12 years ago

+++ This bug was initially created as a clone of Bug #952909 +++ Splitting out this bug so we separate the crasher (bug 952909) from the performance issue found in: http://jsperf.com/cwahlers-getimagedata/2 In the benchmark, the elapsed time for getting/putting a fixed 100x100 bitmap increases as the canvas size increases. We expect that the cost of reading/writing fixed source/destination regions would be constant regardless of the canvas size.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 1

•

12 years ago

If not earlier, graphics can take a look at this the week of May 5h

Milan Sreckovic [:milan] (needinfo for best results)

Comment 2

•

12 years ago

A few comments, based on OS X (10.9.2) runs with willReadFrequently on/off, canvas set to cg or skia, canvas acceleration on or off: * In these runs, willReadFrequently value doesn't make a difference * For CG, acceleration on or off doesn't make a difference * For Skia, acceleration off gives a significant speedup (2-100x) and a smaller drop-off from 512 to 1k to 2k.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 3

•

12 years ago

:snorp just reminded me - willChangeFrequently is only considered if gfx.canvas.willChangeFrequently.enable is set to true, so that first statement above is because of that. There is a difference in the "write" part of the read-write in the test, but much more of the drop in performance is coming from the read part.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

•

12 years ago

Yeah, fair enough, we do a full canvas copy before we cut out the piece we want. Should be easy enough to patch.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 5

•

12 years ago

On CG backend, the switch to copying the full canvas copy came with bug 925448. Before we go deep, how important is this? Outside of a test, do we have normal scenarios where we would be asking for pieces of canvas like this?

Blocks: 925448

Claus Wahlers [:cwahlers]

Comment 6

•

12 years ago

:milan one quite popular use case for this is grabbing frames from a texture atlas

Matt Woodrow (:mattwoodrow)

Comment 7

•

12 years ago

We make a full copy of the surface because once we've given out the pointer to the underlying data we can't take it back. We could fix this by going back to lazily copying the data when we try modify the DrawTarget, but instead of making a copy for the snapshot, make a copy for the DrawTarget and leave the snapshot as-is. That should give us correctness, but without requiring any copies for the common case.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 8

•

12 years ago

Bas' quick suggestion is to extend GetDataSurface(), or introduce GetDataSurface-like Moz2D API that also specifies the region we want.

Chiajung Hung [:chiajung]

Assignee

Comment 9

•

12 years ago

(In reply to Milan Sreckovic [:milan] from comment #8) > Bas' quick suggestion is to extend GetDataSurface(), or introduce > GetDataSurface-like Moz2D API that also specifies the region we want. For skia case, I think it would rather easy by call SkBitmap::extractSubset, which seems render the image into another SkBitmap. And if we use skiaGL, this extraction-copy is done by GPU. I will try and upload the patch later.

Chiajung Hung [:chiajung]

Assignee

Updated

•

12 years ago

Assignee: nobody → chung

Chiajung Hung [:chiajung]

Assignee

Comment 10

•

12 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

With this simple test patch the test result is go from: ctx512 139 ±8.27%fastest ctx1024 28.25 ±57.77%86% slower ctx2048 15.68 ±24.00%90% slower ctx4096 2.10 ±4.70%98% slower to ctx512 210 ±4.15%fastest ctx1024 177 ±5.07%17% slower ctx2048 130 ±59.50%fastest ctx4096 4.40 ±14.52%98% slower (about 2x faster) The result is still unreasonable for me. I will try further dig into the problem.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 11

•

12 years ago

(In reply to Chiajung Hung [:chiajung] from comment #10) ... > > The result is still unreasonable for me. I will try further dig into the > problem. Sure, but this is not a high priority task, so plan your time accordingly.

Chris Peterson [:cpeterson]

Updated

•

12 years ago

Blocks: shumway-m4

Till Schneidereit [:till]

Comment 12

•

11 years ago

Is this still critical for shumway performance, or did we change our Canvas usage to not depend on this being fast?

Flags: needinfo?(mbebenita)

Michael Bebenita [:mbx]

Comment 13

•

11 years ago

This is still very important. We have to use putImageData whenever we update BitmapData, which is quite frequently in games like Canabalt.

Flags: needinfo?(mbebenita)

Till Schneidereit [:till]

Comment 14

•

11 years ago

Ok, in that case: chiajung, did you make any progress on this? It'll become a bottleneck for Shumway performance at some point.

Flags: needinfo?(chung)

Chiajung Hung [:chiajung]

Assignee

Comment 15

•

11 years ago

I have some more idea about this case to test. Do you have any good sample to test this API? The test case in comment 1 is in fact benchmark like...it does not really draw anything but calls get/put rapidly, which is quite easy to fool around. The basic idea is to copy-on-write while get is invoked, moreover we can avoid the expensive glReadPixels on B2G by manipulate the GrallocBuffer directly. I will do some test to check whether it is possible. For the put function, I am not sure whether we can do the alpha multiplication while write to each element. And this will need some test to check whether it is better or worse. If this blocks you, I think we should take this into our priority list. @peter How do you think?

Flags: needinfo?(chung) → needinfo?(pchang)

Till Schneidereit [:till]

Comment 16

•

11 years ago

(In reply to Chiajung Hung [:chiajung] from comment #15) > I have some more idea about this case to test. > > Do you have any good sample to test this API? mbx, do we have any real-world test for this, or would the test right now be "run something in Shumway"? If the latter, I wonder if we could somehow extract the gfx part from that to make it easier to test.

Flags: needinfo?(mbebenita)

Michael Bebenita [:mbx]

Comment 17

•

11 years ago

Here is one real-world test case: http://www.areweflashyet.com/adamatomic/canabalt2.html A quick profile shows that the game spends about 30-40% of the time unpremultipling and uploading bitmap data using putImageData.

Flags: needinfo?(mbebenita)

Jet Villegas (inactive)

Reporter

Comment 18

•

11 years ago

Dupe of bug 1064105?

Chiajung Hung [:chiajung]

Assignee

Updated

•

11 years ago

Depends on: 880114
No longer depends on: 952909

Comment hidden (obsolete)

Chiajung Hung [:chiajung]

Assignee

Comment 20

•

11 years ago

After discuss with :tingyuan, we think copy-on-write on TypedArray may not very easy and maybe introduce some other overhead, so I will try to handle the alpha manipulation/division in GPU directly first and see whether it helps. Because I will use some basic constructions in bug 880114, I set the dependency.

Chiajung Hung [:chiajung]

Assignee

Comment 21

•

11 years ago

Attached patch WIP V2 (obsolete) — Details — Splinter Review

SkiaGL supports handle alpha in GPU, this patch use it to accelerate GetImageData/PutImageData API. However, the performance is still not comparable with Google Chrome on local Linux try of the benchmark in comment 1. By the way, to test the patch, we will have to switch to SkiaGL. To enhance other backend(software) of Canvas, we should take (x, y, w, h) into consideration while Snapshot of each DrawTarget or GetDataSourceSurface of each SourceSurface.

Attachment #8419839 - Attachment is obsolete: true

Chiajung Hung [:chiajung]

Assignee

Comment 22

•

11 years ago

Attached patch WIP v3 (obsolete) — Details — Splinter Review

Avoid 1 memcpy for cairo backend shows some improvement. The SkiaGL part is almost equivalent to Chrome implementation, but Get+Put benchmark shows Firefox still about 2 times slow than Chrome. Something outside these APIs should should be checked.

Attachment #8489283 - Attachment is obsolete: true

Till Schneidereit [:till]

Updated

•

11 years ago

Blocks: 927828

Chris Peterson [:cpeterson]

Comment 23

•

11 years ago

getImageData() and putImageData() perf does not scale linearly. Gecko always copies out the whole canvas and then extracts the subrect. Shumway wants to use these APIs for a scratch canvas (like bug 952539).

Chiajung Hung [:chiajung]

Assignee

Comment 24

•

11 years ago

So (In reply to Chris Peterson (:cpeterson) from comment #23) getImageData() and putImageData() perf does not scale linearly. Gecko always > copies out the whole canvas and then extracts the subrect. Shumway wants to > use these APIs for a scratch canvas (like bug 952539). So, your real need is to extend WebGL API to support something like mozTexImage2D(GLenum target, ImageElement or CanvasElement or VideoElement elt, int x, int y, int w, int h) Which is already available: http://dxr.mozilla.org/mozilla-central/source/dom/webidl/WebGLRenderingContext.webidl?from=WebGLRenderingContext.webidl&case=true#671 To optimize your case is slightly different from the getImageData()/putImageData() case. For your case, I think we could carry the texture infomation(a RefPtr to SourceSurface) in dom::ImageData in Gecko, and only does those read back if ImageData::GetData called. This should rather cheap for skiaGL backend case(If they uses shared context). For others, a texture upload is inevitable. I will try to implement the idea, and if I get any progress, I will upload my patch here.(it should be a distinct bug :) ) BTW, if there is any existing test case, please let me know, it can save me a lot of time to recreate one :p

Till Schneidereit [:till]

Comment 25

•

11 years ago

(In reply to Chiajung Hung [:chiajung] from comment #24) > So (In reply to Chris Peterson (:cpeterson) from comment #23) > So, your real need is to extend WebGL API to support something like > mozTexImage2D(GLenum target, ImageElement or CanvasElement or VideoElement > elt, int x, int y, int w, int h) Yes, that's part of it. However, we also have a canvas2d-based compositing backend, and except to have that as a requirement going forward. So we'd need an equivalent for canvas2d, too. > > Which is already available: > http://dxr.mozilla.org/mozilla-central/source/dom/webidl/ > WebGLRenderingContext.webidl?from=WebGLRenderingContext.webidl&case=true#671 > > To optimize your case is slightly different from the > getImageData()/putImageData() case. For your case, I think we could carry > the texture infomation(a RefPtr to SourceSurface) in dom::ImageData in > Gecko, and only does those read back if ImageData::GetData called. This > should rather cheap for skiaGL backend case(If they uses shared context). That sounds perfect. > > BTW, if there is any existing test case, please let me know, it can save me > a lot of time to recreate one :p mbx, do we have such a test case?

Flags: needinfo?(mbebenita)

Chiajung Hung [:chiajung]

Assignee

Comment 26

•

11 years ago

Some note about the idea of lazy pixel data resolution: ImageDataBinding always call ImageData::GetData as soon as CanvasRenderingContext2D::GetImageData returned, so we can not really delay the pixel data resolution. In fact, delayed data resolution is somewhat tedious, since we have to resolve the pixels while drawTarget dirty, too. If JS side do something like: var raw = ctx.getImageData() <-- In fact, resolved here from ImageData::WrapObject call ctx.drawImage(img) <-- must resolve here var rawArr = raw.data() <-- ideally resolve here for(var i = 0; i < rawArr.len; i++) //do something access rawArr[i]; As a result, I will try to carry 2 data in ImageData: (1) Resolved raw array(original) (2) texture with context. BTW, If the GL context of WebGLContext is not shared with the skiaGL context, almost nothing can be optimized this way. Maybe some generic alpha manipulation/Y-flip/crop/pixel format handling can be done in shader, but we should not get huge improvement this way. Another possibility is something like EGLImage, which is not supported on most platform except Android/B2G, I think :)

Peter Chang[:pchang]

Updated

•

11 years ago

Flags: needinfo?(pchang)

Chiajung Hung [:chiajung]

Assignee

Comment 27

•

11 years ago

Attached patch WIP V4 — Details — Splinter Review

Rebased and cleaned up. @Bas I think we should extend the DrawTarget::Snapshot API to have crop and alpha operations as parameter to hide the backend detail of this patch in its correspond DrawTarget. (This part can be implemented as DrawTarget::LockBits in a hacky way, though) And we need a DrawTarget::WritePixels, too. If you think it is OK, I will extend it in next version.

Attachment #8491367 - Attachment is obsolete: true

Flags: needinfo?(bas)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 28

•

11 years ago

Comment on attachment 8512405 [details] [diff] [review] WIP V4 Review of attachment 8512405 [details] [diff] [review]: ----------------------------------------------------------------- I'm not convinced that a solution that does not improve Windows performance is worth the complications in the code. The underlying reason for this bug is Shumway and Windows, if we're not covering that scenario, I'm not sure this is worth the effort. Am I reading the code wrong, or is this only for Skia-based canvas, which would exclude Windows?

Chiajung Hung [:chiajung]

Assignee

Comment 29

•

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #28) > Comment on attachment 8512405 [details] [diff] [review] > WIP V4 > > Review of attachment 8512405 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm not convinced that a solution that does not improve Windows performance > is worth the complications in the code. The underlying reason for this bug > is Shumway and Windows, if we're not covering that scenario, I'm not sure > this is worth the effort. Am I reading the code wrong, or is this only for > Skia-based canvas, which would exclude Windows? This is for Skia based canvas now, for other software backend we will need NEON/SSE implementation of alpha handling routines. (BTW, why we exclude Windows on SkiaGL canvas?) Though the bug is for Shumway on windows, there is some related bug on B2G. @Tim I heard that gaia on b2g uses getImageData API and filed some bug for its performance, can you link them here?

Flags: needinfo?(timdream)

Tim Guan-tin Chien [:timdream] (please needinfo)

Comment 30

•

11 years ago

I am not aware any of that (besides the use cases that make us use willReadFrequently) in B2G. (This bug was cloned from bug 952909 and that bug was about an off-work project of mine and unrelated to B2G) Maybe we should ask :djf and the media team?

Flags: needinfo?(timdream) → needinfo?(dflanagan)

Chiajung Hung [:chiajung]

Assignee

Comment 31

•

11 years ago

BTW, if gaia uses full canvas read/write than new API is not needed.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 32

•

11 years ago

(In reply to Chiajung Hung [:chiajung] from comment #29) > ... > (BTW, why we exclude Windows on SkiaGL canvas?) We do not use Skia/SkiaGL on (accelerated) Windows, nor have plans to use Skia/SkiaGL on (accelerated) Windows. We do have plans to move to Skia on all other platforms (I'm ignoring the Windows configurations where we do software rendering/compositing only, where Skia does come into the picture.) I don't mean to minimize the importance of doing the Skia solution, just to emphasize that the Windows solution is more important given that what's currently driving this is Shumway.

Bas Schouten (:bas.schouten)

Comment 33

•

11 years ago

(In reply to Chiajung Hung [:chiajung] from comment #27) > Created attachment 8512405 [details] [diff] [review] > WIP V4 > > Rebased and cleaned up. > > @Bas > I think we should extend the DrawTarget::Snapshot API to have crop and alpha > operations as parameter to hide the backend detail of this patch in its > correspond DrawTarget. (This part can be implemented as DrawTarget::LockBits > in a hacky way, though) > > And we need a DrawTarget::WritePixels, too. > > If you think it is OK, I will extend it in next version. LockBits is the way to do any of these things. When lock bits fails it means we're using a hardware accelerated scenario, and it's pretty much game over since we're using readback. ReadPixels will be slow whatever way we play it. I am also of the opinion that beyond using LockBits where we can, there's little value in changing or extending the Moz2D APIs.

Flags: needinfo?(bas)

David Flanagan [:djf]

Comment 34

•

11 years ago

(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #30) > I am not aware any of that (besides the use cases that make us use > willReadFrequently) in B2G. > > (This bug was cloned from bug 952909 and that bug was about an off-work > project of mine and unrelated to B2G) > > Maybe we should ask :djf and the media team? I don't think I have any pending bugs filed. The FirefoxOS Gallery app does use getImageData() to extract rectangular tiles from a large image for processing, so we are affected by this performance issue. If FirefoxOS the main constraint is memory, so anything you can do to make image processing less memory intensive would make me very happy. Almost always, I'm working with off-screen canvases for resizing or processing images and often do not want the canvas to use the GPU. If you're thinking about adding new APIs, I'd love it if there was a way to encode pixel data directly to a jpeg without having to copy it back to a Canvas element first.

Flags: needinfo?(dflanagan)

Chiajung Hung [:chiajung]

Assignee

Comment 35

•

11 years ago

OK, if no one really cares this, than just pending until I have time to investigate SSE/NEON implementation for it. For direct encoding I think there is a mailing list discuss about it, but I don't know the conclusion.

Markus Stange [:mstange]

Comment 36

•

11 years ago

There is an SSE implementation for premultiplication / unpremultiplication here: http://dxr.mozilla.org/mozilla-central/source/gfx/2d/FilterProcessing.h#55 which you could expose through gfx/2d/Tools.h. Bug 961759 will add a NEON implementation for it.

Chris Peterson [:cpeterson]

Updated

•

11 years ago

Blocks: shumway-m5

Chris Peterson [:cpeterson]

Updated

•

11 years ago

No longer blocks: shumway-m4

Till Schneidereit [:till]

Updated

•

10 years ago

Flags: needinfo?(mbebenita)

Ethan Lin[:ethlin]

Comment 37

•

9 years ago

Bug 1293970, Bug 1293968 and Bug 1296166 should improve the performance of getImageData() and putImageData() in certain case with PersistentBufferProviderShared.

Depends on: 1293970, 1293968, 1296166

Milan Sreckovic [:milan] (needinfo for best results)

Comment 38

•

9 years ago

I'm seeing a big difference between 51 and 52, with 52 being much better. Not seeing a lot of difference between 49, 50 and 51. 52 being better is probably due to bug 1309913, where we currently don't use SkiaGL at all, so I would expect getImageData() to be faster. Once bug 1309913 is fixed, I imagine we're going to see the results drop back down.

Rares Doghi, Desktop QA

Comment 39

•

5 years ago

Hi Milan, does this issue still occur in our latest builds ? is it possible that it was fixed in Fx52 ?

Flags: needinfo?(milaninbugzilla)

Markus Stange [:mstange]

Comment 40

•

5 years ago

The original testcase seems offline at the moment, but there's a snapshot here: https://web.archive.org/web/20140415030917/https://jsperf.com/cwahlers-getimagedata/2

It looks like the problem is that getImageData is called before anything was drawn to the canvas. At larger canvas sizes, this means that the backing buffer is re-created and then discarded for every call to getImageData. That explains why the canvas size changes the performance.

I also looked at the code and it looks like we correctly only unpremultiply / premultiply the sub-area of the canvas that is specified by the get/putImageData arguments.

I'll resolve this as WORKSFORME. The original Shumway use case (with a scratch buffer) is no longer present.

Status: NEW → RESOLVED

Closed: 5 years ago

Flags: needinfo?(milaninbugzilla)

Resolution: --- → WORKSFORME

WIP 12 years ago Chiajung Hung [:chiajung] 3.72 KB, patch		Details \| Diff \| Splinter Review
WIP V2 11 years ago Chiajung Hung [:chiajung] 5.73 KB, patch		Details \| Diff \| Splinter Review
WIP v3 11 years ago Chiajung Hung [:chiajung] 7.66 KB, patch		Details \| Diff \| Splinter Review
WIP V4 11 years ago Chiajung Hung [:chiajung] 3.37 KB, patch		Details \| Diff \| Splinter Review