drawImage where the source and destination canvas are the same is 100x slower than copying through a temporary canvas

RESOLVED FIXED in mozilla36

Status

()

defect
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: mbx, Assigned: mstange)

Tracking

({perf})

unspecified
mozilla36
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
Posted file copy.html
Using drawImage to copy a region of a canvas to another region within the same canvas is extremely slow. We're talking 100X slower than copying the region to a temporary canvas and then copying it back. I have only tested on OSX.
The A<->A copy is also slower in Chrome (though Chrome is faster than Firefox.)
The A<->B<->A copy is considerably faster in both browsers.
It looks like a typical memory/speed tradeoff.
Markus: can you profile this one for us? It does seem like we're still several times slower than Chrome. Thanks!
Flags: needinfo?(mstange)
Blocks: shumway-1.0
No longer blocks: shumway-m3
Keywords: perf
Summary: drawImage where the source and destination are the same is painfully slow. → drawImage where the source and destination canvas are the same is 100x slower than copying through a temporary canvas
(Assignee)

Comment 3

5 years ago
This happens because internally we copy out the whole canvas instead of just the subrect we need.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Flags: needinfo?(mstange)
(Assignee)

Comment 5

5 years ago
(In reply to Jet Villegas (:jet) from comment #4)
> dupe of bug 1001069?

Similar issue but needs to be fixed separately.
(Assignee)

Comment 6

5 years ago
Posted patch v1Splinter Review
Together with bug 1074842 this patch gets us most of the way there. The remaining overhead comes from allocating, zeroing-out and de-allocating the temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have that overhead because CanvasB stays alive across all drawImage the calls.
Attachment #8507949 - Flags: review?(bas)
Comment on attachment 8507949 [details] [diff] [review]
v1

Stealing the review from Bas to Jeff.
Attachment #8507949 - Flags: review?(bas) → review?(jmuizelaar)
(Reporter)

Comment 8

5 years ago
(In reply to Markus Stange [:mstange] from comment #6)
> Created attachment 8507949 [details] [diff] [review]
> v1
> 
> Together with bug 1074842 this patch gets us most of the way there. The
> remaining overhead comes from allocating, zeroing-out and de-allocating the
> temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have
> that overhead because CanvasB stays alive across all drawImage the calls.

Can we create and keep a temporary DrawTarget around for each canvas that needs this? We can resize it as needed, or free it if we're under memory pressure. 

Is the zeroing-out step necessary? We are reading back exactly from the same area we wrote to.
(Assignee)

Comment 9

5 years ago
(In reply to Michael Bebenita [:mbx] from comment #8)
> (In reply to Markus Stange [:mstange] from comment #6)
> > Created attachment 8507949 [details] [diff] [review]
> > v1
> > 
> > Together with bug 1074842 this patch gets us most of the way there. The
> > remaining overhead comes from allocating, zeroing-out and de-allocating the
> > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have
> > that overhead because CanvasB stays alive across all drawImage the calls.
> 
> Can we create and keep a temporary DrawTarget around for each canvas that
> needs this? We can resize it as needed, or free it if we're under memory
> pressure.

Yes, we could do that. With my patches there's still a difference of around 23% (~25500 Op/s vs ~33000 Op/s) which should be enough to justify the added complexity. Though it would be better if we could demonstrate the perf difference on a real world application instead of a microbenchmark. Do you have one you can point me to?

> Is the zeroing-out step necessary? We are reading back exactly from the same
> area we wrote to.

You're right, the zeroing-out step is not necessary. But at the moment Factory::CreateDrawTarget always creates an empty DrawTarget. We could add a fast path for creating uninitialized DrawTargets, but if we have to keep around the intermediate DrawTarget anyway we probably won't need it.
(Reporter)

Comment 10

5 years ago
(In reply to Markus Stange [:mstange] from comment #9)
> (In reply to Michael Bebenita [:mbx] from comment #8)
> > (In reply to Markus Stange [:mstange] from comment #6)
> > > Created attachment 8507949 [details] [diff] [review]
> > > v1
> > > 
> > > Together with bug 1074842 this patch gets us most of the way there. The
> > > remaining overhead comes from allocating, zeroing-out and de-allocating the
> > > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have
> > > that overhead because CanvasB stays alive across all drawImage the calls.
> > 
> > Can we create and keep a temporary DrawTarget around for each canvas that
> > needs this? We can resize it as needed, or free it if we're under memory
> > pressure.
> 
> Yes, we could do that. With my patches there's still a difference of around
> 23% (~25500 Op/s vs ~33000 Op/s) which should be enough to justify the added
> complexity. Though it would be better if we could demonstrate the perf
> difference on a real world application instead of a microbenchmark. Do you
> have one you can point me to?

The use case here is JS developers needing to allocate "DrawTargets" to hold temporary results, usually for blending or masking operations. They can either create lots of Canvas elements, or reuse a pool of large Canvas elements and manage their own allocation of subregions within those. This then leads to the problem where you read and write to the same canvas. They can work around it by using a temporary canvas like I do in this example, but I doubt most will be aware of the problem.


> > Is the zeroing-out step necessary? We are reading back exactly from the same
> > area we wrote to.
> 
> You're right, the zeroing-out step is not necessary. But at the moment
> Factory::CreateDrawTarget always creates an empty DrawTarget. We could add a
> fast path for creating uninitialized DrawTargets, but if we have to keep
> around the intermediate DrawTarget anyway we probably won't need it.
Comment on attachment 8507949 [details] [diff] [review]
v1

Review of attachment 8507949 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +3858,5 @@
> +// Returns a surface that contains only the part needed to draw aSourceRect.
> +// On entry, aSourceRect is relative to aSurface, and on return aSourceRect is
> +// relative to the returned surface.
> +static TemporaryRef<SourceSurface>
> +ExtractSubrect(SourceSurface* aSurface, mgfx::Rect* aSourceRect, DrawTarget* aTargetDT)

I don't really like the fact that the rounding happens here and source rect is modified, but I have no suggestions for better.
Attachment #8507949 - Flags: review?(jmuizelaar) → review+
(Assignee)

Comment 12

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/853612ddc686

I'll open a new bug for the other optimization we talked about.
https://hg.mozilla.org/mozilla-central/rev/853612ddc686
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
(Assignee)

Updated

5 years ago
Depends on: 1090323
(Assignee)

Comment 14

5 years ago
(In reply to Michael Bebenita [:mbx] from comment #8)
> (In reply to Markus Stange [:mstange] from comment #6)
> > Created attachment 8507949 [details] [diff] [review]
> > v1
> > 
> > Together with bug 1074842 this patch gets us most of the way there. The
> > remaining overhead comes from allocating, zeroing-out and de-allocating the
> > temporary DrawTarget. The manual CanvasA->CanvasB->CanvasA copy doesn't have
> > that overhead because CanvasB stays alive across all drawImage the calls.
> 
> Can we create and keep a temporary DrawTarget around for each canvas that
> needs this? We can resize it as needed, or free it if we're under memory
> pressure.

I put up a patch for this in bug 1090323.
You need to log in before you can comment on or make changes to this bug.