1083672 - drawImage where the source and destination canvas are the same is 100x slower than copying through a temporary canvas

Reporter

Description

•

10 years ago

Attached file copy.html — Details

Using drawImage to copy a region of a canvas to another region within the same canvas is extremely slow. We're talking 100X slower than copying the region to a temporary canvas and then copying it back. I have only tested on OSX.

Jet Villegas (inactive)

Comment 1

•

10 years ago

The A<->A copy is also slower in Chrome (though Chrome is faster than Firefox.) The A<->B<->A copy is considerably faster in both browsers. It looks like a typical memory/speed tradeoff.

Jet Villegas (inactive)

Comment 2

•

10 years ago

Markus: can you profile this one for us? It does seem like we're still several times slower than Chrome. Thanks!

Flags: needinfo?(mstange)

Chris Peterson [:cpeterson]

Updated

•

10 years ago

Blocks: shumway-1.0
No longer blocks: shumway-m3

Keywords: perf

Summary: drawImage where the source and destination are the same is painfully slow. → drawImage where the source and destination canvas are the same is 100x slower than copying through a temporary canvas

Chris Peterson [:cpeterson]

Updated

•

10 years ago

Blocks: 1083985

Markus Stange [:mstange]

Assignee

Comment 3

•

10 years ago

This happens because internally we copy out the whole canvas instead of just the subrect we need.

Assignee: nobody → mstange

Status: NEW → ASSIGNED

Flags: needinfo?(mstange)

Jet Villegas (inactive)

Comment 4

•

10 years ago

dupe of bug 1001069?

Markus Stange [:mstange]

Assignee

Comment 5

•

10 years ago

(In reply to Jet Villegas (:jet) from comment #4) > dupe of bug 1001069? Similar issue but needs to be fixed separately.

Markus Stange [:mstange]

Assignee

Comment 6

•

10 years ago

Attached patch v1 — Details — Splinter Review

Together with bug 1074842 this patch gets us most of the way there. The remaining overhead comes from allocating, zeroing-out and de-allocating the temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have that overhead because CanvasB stays alive across all drawImage the calls.

Attachment #8507949 - Flags: review?(bas)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 7

•

10 years ago

Comment on attachment 8507949 [details] [diff] [review] v1 Stealing the review from Bas to Jeff.

Attachment #8507949 - Flags: review?(bas) → review?(jmuizelaar)

Michael Bebenita [:mbx]

Reporter

Comment 8

•

10 years ago

(In reply to Markus Stange [:mstange] from comment #6) > Created attachment 8507949 [details] [diff] [review] > v1 > > Together with bug 1074842 this patch gets us most of the way there. The > remaining overhead comes from allocating, zeroing-out and de-allocating the > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have > that overhead because CanvasB stays alive across all drawImage the calls. Can we create and keep a temporary DrawTarget around for each canvas that needs this? We can resize it as needed, or free it if we're under memory pressure. Is the zeroing-out step necessary? We are reading back exactly from the same area we wrote to.

Markus Stange [:mstange]

Assignee

Comment 9

•

10 years ago

(In reply to Michael Bebenita [:mbx] from comment #8) > (In reply to Markus Stange [:mstange] from comment #6) > > Created attachment 8507949 [details] [diff] [review] > > v1 > > > > Together with bug 1074842 this patch gets us most of the way there. The > > remaining overhead comes from allocating, zeroing-out and de-allocating the > > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have > > that overhead because CanvasB stays alive across all drawImage the calls. > > Can we create and keep a temporary DrawTarget around for each canvas that > needs this? We can resize it as needed, or free it if we're under memory > pressure. Yes, we could do that. With my patches there's still a difference of around 23% (~25500 Op/s vs ~33000 Op/s) which should be enough to justify the added complexity. Though it would be better if we could demonstrate the perf difference on a real world application instead of a microbenchmark. Do you have one you can point me to? > Is the zeroing-out step necessary? We are reading back exactly from the same > area we wrote to. You're right, the zeroing-out step is not necessary. But at the moment Factory::CreateDrawTarget always creates an empty DrawTarget. We could add a fast path for creating uninitialized DrawTargets, but if we have to keep around the intermediate DrawTarget anyway we probably won't need it.

Michael Bebenita [:mbx]

Reporter

Comment 10

•

10 years ago

(In reply to Markus Stange [:mstange] from comment #9) > (In reply to Michael Bebenita [:mbx] from comment #8) > > (In reply to Markus Stange [:mstange] from comment #6) > > > Created attachment 8507949 [details] [diff] [review] > > > v1 > > > > > > Together with bug 1074842 this patch gets us most of the way there. The > > > remaining overhead comes from allocating, zeroing-out and de-allocating the > > > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have > > > that overhead because CanvasB stays alive across all drawImage the calls. > > > > Can we create and keep a temporary DrawTarget around for each canvas that > > needs this? We can resize it as needed, or free it if we're under memory > > pressure. > > Yes, we could do that. With my patches there's still a difference of around > 23% (~25500 Op/s vs ~33000 Op/s) which should be enough to justify the added > complexity. Though it would be better if we could demonstrate the perf > difference on a real world application instead of a microbenchmark. Do you > have one you can point me to? The use case here is JS developers needing to allocate "DrawTargets" to hold temporary results, usually for blending or masking operations. They can either create lots of Canvas elements, or reuse a pool of large Canvas elements and manage their own allocation of subregions within those. This then leads to the problem where you read and write to the same canvas. They can work around it by using a temporary canvas like I do in this example, but I doubt most will be aware of the problem. > > Is the zeroing-out step necessary? We are reading back exactly from the same > > area we wrote to. > > You're right, the zeroing-out step is not necessary. But at the moment > Factory::CreateDrawTarget always creates an empty DrawTarget. We could add a > fast path for creating uninitialized DrawTargets, but if we have to keep > around the intermediate DrawTarget anyway we probably won't need it.

Jeff Muizelaar [:jrmuizel]

Comment 11

•

10 years ago

Comment on attachment 8507949 [details] [diff] [review] v1 Review of attachment 8507949 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/canvas/CanvasRenderingContext2D.cpp @@ +3858,5 @@ > +// Returns a surface that contains only the part needed to draw aSourceRect. > +// On entry, aSourceRect is relative to aSurface, and on return aSourceRect is > +// relative to the returned surface. > +static TemporaryRef<SourceSurface> > +ExtractSubrect(SourceSurface* aSurface, mgfx::Rect* aSourceRect, DrawTarget* aTargetDT) I don't really like the fact that the rounding happens here and source rect is modified, but I have no suggestions for better.

Attachment #8507949 - Flags: review?(jmuizelaar) → review+

Markus Stange [:mstange]

Assignee

Comment 12

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/853612ddc686 I'll open a new bug for the other optimization we talked about.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 13

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/853612ddc686

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla36

Markus Stange [:mstange]

Assignee

Updated

•

10 years ago

Depends on: 1090323

Markus Stange [:mstange]

Assignee

Comment 14

•

10 years ago

(In reply to Michael Bebenita [:mbx] from comment #8) > (In reply to Markus Stange [:mstange] from comment #6) > > Created attachment 8507949 [details] [diff] [review] > > v1 > > > > Together with bug 1074842 this patch gets us most of the way there. The > > remaining overhead comes from allocating, zeroing-out and de-allocating the > > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have > > that overhead because CanvasB stays alive across all drawImage the calls. > > Can we create and keep a temporary DrawTarget around for each canvas that > needs this? We can resize it as needed, or free it if we're under memory > pressure. I put up a patch for this in bug 1090323.

Jerry Shih[:jerry] (UTC+8) (inactive)

Updated

•

8 years ago

copy.html 10 years ago Michael Bebenita [:mbx] 2.70 KB, text/html		Details
v1 10 years ago Markus Stange [:mstange] 3.17 KB, patch	jrmuizel : review+	Details \| Diff \| Splinter Review