Open Bug 582968 Opened 15 years ago Updated 3 years ago

Slow canvas painting on this testcase

Categories

(Core :: Graphics: Canvas2D, defect)

x86
macOS
defect

People

(Reporter: bzbarsky, Unassigned)

Details

(Keywords: perf)

We're a lot slower than chrome on this testcase visually. About 50% of the time is spent painting the canvas, all of it under BasicCanvasLayer::Paint calling _moz_cairo_fill_preserve. At least on Mac trunk.
The testcase is also attached in bug 582973.
Blocks: 582973
And in fact, most of the time is spent under CGContextDrawImage doing argb32_sample_argb32 (2/3 of the drawing time right there). I wonder whether hardware accel of some sort would help here...
Well, chrome doesn't have any hw accel, so I wonder if we're hitting some particular setup that causes us to do bad things. Can you profile chrome and see where it spends the time? (Even without chrome symbols we should still see the coregraphics time)
For chrome, 60% of the time is spent under CGContextDrawImage, almost all of it in argb32_sample_RGBA32. They also spend about 35% of their time in what looks like jit-generated code (compared to 45% for us). If we assume they're making as many CGContextDrawImage calls as we are and that argb32_sample_RGBA32 is the same speed as argb32_sample_argb32 (is it?), then they're basically 20% faster than us just due to faster js execution. But I wonder whether they just paint less often than we do here or something.... The testcase runs the loop every 1/60 of a second, or tries to, so I'd think that's how often we paint. I would be surprised if Chrome painted much less often than that, I guess. Which leaves the question about those sample functions. They're definitely pegging one of my cores, just like we are.
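For reference, the "basically 20% faster" figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes both browsers do the same absolute amount of non-JS work per frame (the same assumption made in the comment), and the function name is made up for illustration.

```python
def frame_time_ratio(our_js_share, their_js_share):
    """Return our per-frame time divided by theirs.

    Assumption (not measured): both browsers spend the same absolute
    time per frame on everything that is not JS, so only JS time differs.
    """
    for share in (our_js_share, their_js_share):
        if not 0.0 <= share < 1.0:
            raise ValueError("shares must be in [0, 1)")
    # If non-JS time per frame is N, total frame time is N / (1 - js_share),
    # so N cancels out of the ratio.
    return (1.0 - their_js_share) / (1.0 - our_js_share)


if __name__ == "__main__":
    # Profile numbers quoted above: 45% JS for us, 35% for Chrome.
    print(round(frame_time_ratio(0.45, 0.35), 2))
```

With the quoted shares the ratio is 0.65 / 0.55, or roughly 1.18, which is where the "about 20%" estimate comes from. It says nothing about whether the two browsers actually paint equally often.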
I would assume argb32_sample_RGBA32 is the same speed as argb32_sample_argb32, but I'm not sure -- there are all sorts of weirdly named functions there, and I can't remember if caps had specific meaning or not. Hmm, http://benoitgirard.wordpress.com/2010/03/09/optimizing-cgcontextdrawimage/ is interesting -- there's mention there of cairo not hitting the sample call, are we setting an odd colorspace on the image?
Or not setting any? Sounds like the default is RGB or something? http://lists.apple.com/archives/perfoptimization-dev/2008/Feb/msg00028.html (which mentions the colorspace problem too) suggests that non-integral positions can trigger issues too. But that was talking about argb32_sample_ARGB32 (uppercase, not lowercase).
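To make the "non-integral positions" concern concrete, here is a minimal sketch of the kind of check one could run on a draw call before handing it to CG. The helper name and the conditions are assumptions based on the mailing list post above, not a description of what CoreGraphics actually tests internally.

```python
def likely_needs_resampling(src_w, src_h, dst_x, dst_y, dst_w, dst_h, eps=1e-6):
    """Guess whether drawing a src_w x src_h image into the given
    destination rect would need per-pixel sampling instead of a plain copy.

    Hypothetical heuristic: a draw is treated as a plain copy only when
    the destination has the same size as the source and sits on an
    integral pixel position.
    """
    def is_integral(value):
        return abs(value - round(value)) < eps

    scaled = abs(dst_w - src_w) > eps or abs(dst_h - src_h) > eps
    misaligned = not (is_integral(dst_x) and is_integral(dst_y))
    return scaled or misaligned


if __name__ == "__main__":
    print(likely_needs_resampling(100, 100, 0, 0, 100, 100))    # aligned, unscaled
    print(likely_needs_resampling(100, 100, 0.5, 0, 100, 100))  # half-pixel offset
```

Logging something like this next to each draw would at least show whether the slow path correlates with scaled or misaligned draws.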
http://www.mailinglistarchive.com/html/quartz-dev@lists.apple.com/2008-07/msg00034.html mentions that if we're hitting the _mark stuff, we more or less lose already.... As far as colorspace goes, if cairo_quartz_create_cgimage is called with no colorSpaceOverride and with CAIRO_FORMAT_ARGB32, it uses colorSpace = CGColorSpaceCreateDeviceRGB(). _cairo_surface_to_cgimage passes a null colorspace override. So do cairo_quartz_image_surface_flush and cairo_quartz_image_surface_create. The question is... do we do our own color management for canvas? If not, then we do in fact want to use the rgb colorspace here, right? Or no?
The docs say: CGColorSpaceCreateDeviceRGB In Mac OS X v10.4 and later, this color space is no longer device-dependent and is replaced by the generic counterpart—kCGColorSpaceGenericRGB—described in “Color Space Names”. If you use this function in Mac OS X v10.4 and later, colors are mapped to the generic color spaces. If you want to bypass color matching, use the color space of the destination context. I tried using CGColorSpaceCreateWithPlatformColorSpace and using the system profile with it, but I still end up in argb32_sample_argb32....
Doesn't chrome use Skia to draw into the canvas buffer and only use CG to draw the results to the screen?
Sure, but the screen drawing is what I saw taking time for us. See the "all of it under BasicCanvasLayer::Paint" part above.
Unless I misunderstood the question in comment 9?
No, I misunderstood. We're using Quartz to scale the canvas buffer (195x251 -> 1173x753 on my system) in BasicCanvasLayer::Paint. I assume that Chrome is using Skia to scale the buffer to the process backing store and then throwing that onto the screen using CG without further scaling. If so, we'd expect to see more time in Quartz than Chrome has because we're doing expensive scaling there. So one question is whether Skia's scaling is faster than Quartz's, and if so, whether that's because it's lower quality or because they have faster code. Or maybe they are able to use two cores because they have the content process scaling the canvas to backing store and the other process throwing it on the screen? Regardless, the good news is that with the GL backend (and appropriate additional optimizations, like making sure our glTexSubImage2D and glFinish calls don't block the main thread), that scaling moves to the GPU, probably off the main thread, and should become a non-issue.
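To put a rough number on the scaling work, the sketch below counts destination pixels for the 195x251 -> 1173x753 case, using the 1/60 s loop interval mentioned earlier in this bug as the frame rate. The function name is made up, and it assumes every frame rescales the whole buffer, which may not hold if painting is throttled or partially invalidated.

```python
def scaling_workload(src_size, dst_size, fps):
    """Summarize how much output a software scaler must produce.

    Assumption: the full destination rect is repainted on every frame.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    dst_pixels = dst_w * dst_h
    return {
        "scale_x": dst_w / src_w,
        "scale_y": dst_h / src_h,
        "dst_pixels_per_frame": dst_pixels,
        "dst_pixels_per_second": dst_pixels * fps,
    }


if __name__ == "__main__":
    for key, value in scaling_workload((195, 251), (1173, 753), 60).items():
        print(key, value)
```

That works out to 883,269 destination pixels per frame, or about 53 million per second at 60 fps, each of which needs at least one source sample. The scale is also non-uniform (about 6x horizontally, exactly 3x vertically).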
Are we sure that's the case? I'd expect some skia functions to show up high in the profile. I didn't realize there was scaling going on here; that could trigger all sorts of things.
> We're using Quartz to scale the canvas buffer
Ah, indeed. I'd looked for the scaling but missed it somehow. That explains the argb32_sample_argb32, I would assume. I looked at the Chrome profile more carefully. They peg a core in their worker process, and use about 12% of another core in the main process. The worker, as I said above, is about 60% the Quartz/CG stuff. Another 35% looks like JS. No obvious room left for skia. The main process is about half under CGContextDrawImage doing an sseCGSBlendXXXX8888 (more or less pure blit if I understand correctly). The other half is CGContextDrawLayerAtPoint which also ends up under argb32_image and then sseCGSBlendXXXX8888. So they do get another 12% of a core this way. Combined with the 20% or so that we estimate they got by being faster on the JS end, that's about a 30% difference. That may be enough to get the smoothness I saw...
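For anyone unfamiliar with why scaling shows up as a per-pixel "sample" function, here is a minimal bilinear upscaler in pure Python. It is a generic textbook sketch on a single-channel image, not a reconstruction of what argb32_sample_argb32 does inside CoreGraphics; the point is only that every destination pixel costs several source reads plus some arithmetic.

```python
def scale_bilinear(src, dst_w, dst_h):
    """Scale a 2D list of numbers (rows of pixels) to dst_w x dst_h
    using bilinear interpolation with edge clamping."""
    src_h = len(src)
    src_w = len(src[0])
    out = []
    for dy in range(dst_h):
        # Map the destination pixel center back into source coordinates.
        sy = (dy + 0.5) * src_h / dst_h - 0.5
        sy = min(max(sy, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for dx in range(dst_w):
            sx = (dx + 0.5) * src_w / dst_w - 0.5
            sx = min(max(sx, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Four source reads per destination pixel.
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    print(scale_bilinear([[0, 100]], 4, 1))
```

A same-size, pixel-aligned draw needs none of this and can be a straight copy or blend, which fits the blit-like sseCGSBlendXXXX8888 time seen in Chrome's main process.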
Severity: normal → S3