582968 - Slow canvas painting on this testcase

Reporter

Description

•

15 years ago

We're a lot slower than chrome on this testcase visually. About 50% of the time is spent painting the canvas, all of it under BasicCanvasLayer::Paint calling _moz_cairo_fill_preserve. At least on Mac trunk.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

15 years ago

The testcase is also attached in bug 582973.

Blocks: 582973

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

15 years ago

And in fact, most of the time is spent under CGContextDrawImage doing argb32_sample_argb32 (2/3 of the drawing time right there). I wonder whether hardware accel of some sort would help here...

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 3

•

15 years ago

Well, chrome doesn't have any hw accel, so I wonder if we're hitting some particular setup that causes us to do bad things. Can you profile chrome and see where it spends the time? (Even without chrome symbols we should still see the coregraphics time)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

15 years ago

For chrome, 60% of the time is spent under CGContextDrawImage, almost all of it in argb32_sample_RGBA32. They also spend about 35% of their time in what looks like jit-generated code (compared to 45% for us). If we assume they're making as many CGContextDrawImage calls as we are and that argb32_sample_RGBA32 is the same speed as argb32_sample_argb32 (is it?), then they're basically 20% faster than us just due to faster js execution. But I wonder whether they just paint less often than we do here or something.... The testcase runs the loop every 1/60 of a second, or tries to, so I'd think that's how often we paint. I would be surprised if Chrome painted much less often than that, I guess. Which leaves the question about those sample functions. They're definitely pegging one of my cores, just like we are.
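
The 20% estimate above can be reproduced with a little arithmetic. This is only a sketch: it assumes both browsers spend the same absolute time per frame under CGContextDrawImage, and it uses the rough profile shares quoted in this bug (about 50% for Gecko, about 60% for Chrome). The function name is made up for illustration.

```python
def frame_time_ratio(our_cg_share: float, their_cg_share: float) -> float:
    """Ratio of our frame time to theirs, assuming equal absolute CG time.

    If CoreGraphics costs T per frame in both browsers, a whole frame
    takes T / share, so the ratio of frame times is their_share / our_share.
    """
    our_frame = 1.0 / our_cg_share      # frame time in units of T
    their_frame = 1.0 / their_cg_share  # frame time in units of T
    return our_frame / their_frame


# Shares taken from the profiles described in the bug (approximate).
ratio = frame_time_ratio(our_cg_share=0.50, their_cg_share=0.60)
print(f"Gecko frame takes about {ratio:.2f}x as long as Chrome's")
```

Under those assumptions a Gecko frame takes about 1.2 times as long, which is where the "20% faster" figure comes from. If Chrome actually issues fewer CGContextDrawImage calls, the assumption breaks and the number means little.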

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 5

•

15 years ago

I would assume argb32_sample_RGBA32 is the same speed as argb32_sample_argb32, but I'm not sure -- there are all sorts of weirdly named functions there, and I can't remember if caps had specific meaning or not. Hmm, http://benoitgirard.wordpress.com/2010/03/09/optimizing-cgcontextdrawimage/ is interesting -- there's mention there of cairo not hitting the sample call, are we setting an odd colorspace on the image?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 6

•

15 years ago

Or not setting any? Sounds like the default is RGB or something? http://lists.apple.com/archives/perfoptimization-dev/2008/Feb/msg00028.html (which mentions the colorspace problem too) suggests that non-integral positions can trigger issues too. But that was talking about argb32_sample_ARGB32 (uppercase, not lowercase).
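
The "non-integral positions" concern can be phrased as a simple check. The sketch below is a rule of thumb for illustration only, not a description of what CoreGraphics does internally: it treats a draw as a plain copy only when the destination origin lies on whole pixels and the destination size matches the source size. The function name and the rectangle layout `(x, y, width, height)` are assumptions.

```python
from typing import Tuple


def likely_needs_resampling(src_size: Tuple[int, int],
                            dst_rect: Tuple[float, float, float, float]) -> bool:
    """Guess whether drawing an image into dst_rect is more than a plain blit.

    Heuristic only: a plain copy requires a whole-pixel destination origin
    and a destination size equal to the source size.
    """
    x, y, w, h = dst_rect
    pixel_aligned = float(x).is_integer() and float(y).is_integer()
    unscaled = (w, h) == src_size
    return not (pixel_aligned and unscaled)


print(likely_needs_resampling((100, 100), (10, 20, 100, 100)))    # aligned, unscaled
print(likely_needs_resampling((100, 100), (10.5, 20, 100, 100)))  # fractional origin
print(likely_needs_resampling((100, 100), (0, 0, 300, 300)))      # scaled
```

Only the first call describes a draw that could be a straight copy under this rule; the other two would be expected to go through a per-pixel sampling path.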

Boris Zbarsky [:bzbarsky]

Reporter

Comment 7

•

15 years ago

http://www.mailinglistarchive.com/html/quartz-dev@lists.apple.com/2008-07/msg00034.html mentions that if we're hitting the _mark stuff, we more or less lose already....

As far as colorspace goes, if cairo_quartz_create_cgimage is called with no colorSpaceOverride and with CAIRO_FORMAT_ARGB32 it uses:

    colorSpace = CGColorSpaceCreateDeviceRGB();

_cairo_surface_to_cgimage passes a null colorspace override. So does cairo_quartz_image_surface_flush. So does cairo_quartz_image_surface_create.

The question is... do we do our own color management for canvas? If not, then we do in fact want to use the rgb colorspace here, right? Or no?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 8

•

15 years ago

The docs say: CGColorSpaceCreateDeviceRGB In Mac OS X v10.4 and later, this color space is no longer device-dependent and is replaced by the generic counterpart—kCGColorSpaceGenericRGB—described in “Color Space Names”. If you use this function in Mac OS X v10.4 and later, colors are mapped to the generic color spaces. If you want to bypass color matching, use the color space of the destination context. I tried using CGColorSpaceCreateWithPlatformColorSpace and using the system profile with it, but I still end up in argb32_sample_argb32....

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

15 years ago

Doesn't chrome use Skia to draw into the canvas buffer and only use CG to draw the results to the screen?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 10

•

15 years ago

Sure, but the screen drawing is what I saw taking time for us. See the "all of it under BasicCanvasLayer::Paint" part above.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 11

•

15 years ago

Unless I misunderstood the question in comment 9?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

15 years ago

No, I misunderstood. We're using Quartz to scale the canvas buffer (195x251 -> 1173x753 on my system) in BasicCanvasLayer::Paint. I assume that Chrome is using Skia to scale the buffer to the process backing store and then throwing that onto the screen using CG without further scaling. If so, we'd expect to see more time in Quartz than Chrome has because we're doing expensive scaling there.

So one question is whether Skia's scaling is faster than Quartz's, and if so, whether that's because it's lower quality or because they have faster code. Or maybe they are able to use two cores because they have the content process scaling the canvas to backing store and the other process throwing it on the screen?

Regardless, the good news is that with the GL backend (and appropriate additional optimizations, like making sure our glTexSubImage2D and glFinish calls don't block the main thread), that scaling moves to the GPU, probably off the main thread, and should become a non-issue.
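
To put a rough number on that scaling work, here is a back-of-the-envelope sketch. It assumes one sample per destination pixel and the 60 paints per second mentioned in comment 4; the real cost per sample depends on the filter Quartz uses, which this does not model. The helper name is hypothetical.

```python
def scaling_workload(src, dst, paints_per_second):
    """Return (scale_x, scale_y, dst_pixels_per_paint, dst_pixels_per_second)."""
    src_w, src_h = src
    dst_w, dst_h = dst
    dst_pixels = dst_w * dst_h  # one sample per destination pixel (assumption)
    return (dst_w / src_w, dst_h / src_h,
            dst_pixels, dst_pixels * paints_per_second)


# Buffer and on-screen sizes quoted in the comment above.
sx, sy, per_paint, per_second = scaling_workload((195, 251), (1173, 753), 60)
print(f"scale: {sx:.2f}x horizontally, {sy:.2f}x vertically")
print(f"{per_paint} destination pixels per paint, {per_second} per second")
```

The source buffer holds 48,945 pixels, while each paint produces 883,269 destination pixels, so the scaled draw touches roughly 18 times as many pixels as the canvas buffer contains.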

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 13

•

15 years ago

Are we sure that's the case? I'd expect some skia functions to show up high in the profile. I didn't realize there was scaling going on here; that could trigger all sorts of things.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 14

•

15 years ago

> We're using Quartz to scale the canvas buffer

Ah, indeed. I'd looked for the scaling but missed it somehow. That explains the argb32_sample_argb32, I would assume.

I looked at the Chrome profile more carefully. They peg a core in their worker process, and use about 12% of another core in the main process. The worker, as I said above, is about 60% the Quartz/CG stuff. Another 35% looks like JS. No obvious room left for skia.

The main process is about half under CGContextDrawImage doing an sseCGSBlendXXXX8888 (more or less pure blit if I understand correctly). The other half is CGContextDrawLayerAtPoint which also ends up under argb32_image and then sseCGSBlendXXXX8888.

So they do get another 12% of a core this way. Combined with the 20% or so that we estimate they gain from being faster on the JS end, that's about a 30% difference. That may be enough to get the smoothness I saw...

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Categories

(Core :: Graphics: Canvas2D, defect)

People

(Reporter: bzbarsky, Unassigned)

Details

(Keywords: perf)

Security

(public)
