Closed Bug 455513 Opened 11 years ago Closed 11 years ago

Canvas - PNGs without transparency paint 10x slower on Windows, not Mac.

Categories

(Core :: Canvas: 2D, defect, P1, major)

x86
Windows XP
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: kroc, Assigned: vlad)

References

()

Details

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-GB; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-GB; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1

When tiling a small image over a large area, PNG files that have no transparency (24-bit, or 8-bit with no transparency) paint 10x slower than PNG files with transparency (1 or 8-bit). This affects Windows, not Mac. Linux not tested.


Reproducible: Always

Steps to Reproduce:
Repeat this test for Windows & Mac.
This Test requires Firebug for the timing results.

1. Visit http://camendesign.com/code/files/canvas-bug-test/ 
2. Click the button to start the test, takes about 5 seconds. A message will appear to confirm the test is complete
3. Open the Firebug console to see results.
Actual Results:  
In Windows I am getting:

32-Bit (8-Bit Alpha): 60ms
32-Bit (1-Bit Alpha): 44ms
24-Bit (No Transparency): 1033ms
8-Bit (No Transparency): 1061ms
8-Bit (1-Bit Alpha): 35ms

Expected Results:  
On Mac I'm getting

32-Bit (8-Bit Alpha): 67ms
32-Bit (1-Bit Alpha): 60ms
24-Bit (No Transparency): 61ms
8-Bit (No Transparency): 58ms
8-Bit (1-Bit Alpha): 63ms

If it takes over 1 second to paint a 1400x700 screen there's something very wrong. Especially when Firefox/Mac does it in 60ms.
Hrm, that's pretty broken -- what're the OS/hardware specs of the win32 box?  There's all sorts of bizarre variants of hardware that accelerate some GDI functions but not others that I'm tempted to just throw out the whole thing and do everything in software.  Yuck.
Component: General → Layout: Canvas
Product: Firefox → Core
QA Contact: general → layout.canvas
I'm on a MacBook Pro C2D 2.16 / 2GB. Windows is running in a VM with 512MB RAM; yes lop-sided but this is easily disproved considering that Safari and Google Chrome in Windows do not suffer with this speed issue *at all*. (render times are the same as on the Mac side) - even in Firefox/Win the transparent images are rendering as fast as the Mac. Only Non-transparent images are affected.

A friend has also tested on a real Windows laptop and got the same results. I can ask him for his specs when I next talk to him.
Nah, that's fine.. it just sounded like another similar GDI issue I was dealing with.  Sounds like it's more pathological than that though. I'll take a look in a few days, thanks!
I haven't tested that this patch fixes the problem, but it makes sense that it would.
I do not know how to apply this, I'm just a user/web-developer. I can compile some software, but Firefox is just too large and the build system too complicated for me. Can you spin a build off for me, or apply this to trunk so it will appear in a nightly?

Kind regards,
Kroc.
Assignee: nobody → jmuizelaar
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: wanted1.9.1? → wanted1.9.1+
OS: Mac OS X → Windows XP
Priority: -- → P1
Many thanks. Using this build on Windows, I get these results:

32-Bit (8-Bit Alpha): 61ms
32-Bit (1-Bit Alpha): 48ms
24-Bit (No Transparency): 1275ms
8-Bit (No Transparency): 1263ms
8-Bit (1-Bit Alpha): 35ms

Sorry if that's not very helpful.
It looks like we may actually be spending time in _cairo_win32_surface_get_subimage() called from _cairo_pattern_acquire_surfaces()
Yeah, so in the 24-bit case we have:

- source: 24-bit DDB (optimized image); therefore no image surface equivalent
- destination: 32-bit DIB; has image surface equivalent

When we do the blit, because we're doing 24bpp OVER 32bpp (or SOURCE, doesn't matter), we hit the unsupported accel case.  The dest has an image surface equiv, so we toss it straight over to the image backend's composite function.  That function calls acquire_surfaces, like jeff says.. but the 24bpp has no image surface equiv, so it has to go through the slow variant of win32_surface_acquire_source.

Fixing this is a pain, because blitting from a 24-bit DDB to a 24-bit DDB (the common case for rendering) is very fast.  But in this case it's slow -- not because the operation itself is intrinsically slow, but because it has to read the source image each time.  The same problem shows up when using an <img> inside a block that needs translucency though (e.g. that has a translucent color background).

One option might be to cache the image surface equivalent the first time it's acquired as a source, if it's not already present; blow it away if the surface is rendered to.  This will use up more memory at runtime in exchange for performance; because it'll just be inside the cairo surface, that memory will be freed when the surface is thrown away by the imglib layer.
Another option would be to de-optimize that surface to a DIB the first time a source is acquired... that wouldn't use any more memory.
Attached patch fix, v1 (obsolete) — Splinter Review
(Stealing this from Jeff)

The problem is indeed that get_subimage was being called -- we were being asked to composite an image that we've optimized to a DDB (RGB24) to a 32 bit DIB (what canvas creates).  GDI doesn't support this operation, because it doesn't really know about alpha (perhaps AlphaBlend and/or StretchBlt might be ok, but it's already getting convoluted).  So, we end up reading the image data out to acquire a source for fallback each time the fallback happens.

Instead, with this patch, if a DDB-optimized surface is ever used as a source for an operation that we can't do quickly, we convert it to a DIB so that we always have a data buffer we can share with the fallback code without requiring that we grab the data each time.  Whether we can deoptimize or not is controlled by a flag that we set on DDBs created in nsThebesImage.
Assignee: jmuizelaar → vladimir
Attachment #341166 - Flags: review?(jmuizelaar)
Comment on attachment 339145 [details] [diff] [review]
Add Src_x888x8888 fast path

However, additional fast paths are good as well, but:


>@@ -1556,6 +1596,9 @@ static const FastPathInfo c_fast_paths[] =
>     { PIXMAN_OP_SRC, PIXMAN_x8r8g8b8,  PIXMAN_null,     PIXMAN_x8r8g8b8, fbCompositeSrc_8888xx888, 0 },
>     { PIXMAN_OP_SRC, PIXMAN_a8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_8888xx888, 0 },
>     { PIXMAN_OP_SRC, PIXMAN_x8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_8888xx888, 0 },
>+    { PIXMAN_OP_SRC, PIXMAN_x8r8g8b8,  PIXMAN_null,     PIXMAN_x8r8g8b8, fbCompositeSrc_x888x8888, 0 },
>+    { PIXMAN_OP_SRC, PIXMAN_x8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_x888x8888, 0 },
>+

isn't the first line here identical to the first new one that's added, same with the third and second?
(In reply to comment #12)
> (From update of attachment 339145 [details] [diff] [review])
> However, additional fast paths are good as well, but:
> 
> 
> >@@ -1556,6 +1596,9 @@ static const FastPathInfo c_fast_paths[] =
> >     { PIXMAN_OP_SRC, PIXMAN_x8r8g8b8,  PIXMAN_null,     PIXMAN_x8r8g8b8, fbCompositeSrc_8888xx888, 0 },
> >     { PIXMAN_OP_SRC, PIXMAN_a8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_8888xx888, 0 },
> >     { PIXMAN_OP_SRC, PIXMAN_x8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_8888xx888, 0 },
> >+    { PIXMAN_OP_SRC, PIXMAN_x8r8g8b8,  PIXMAN_null,     PIXMAN_x8r8g8b8, fbCompositeSrc_x888x8888, 0 },
> >+    { PIXMAN_OP_SRC, PIXMAN_x8b8g8r8,  PIXMAN_null,     PIXMAN_x8b8g8r8, fbCompositeSrc_x888x8888, 0 },
> >+
> 
> isn't the first line here identical to the first new one that's added, same
> with the third and second?

Yeah, I caught this when I was testing. I have a fixed patch, but didn't bother doing much with it because the time was dominated by the DIB/DDB problem. I'll measure it on top of your patch and then push it upstream.
Comment on attachment 341166 [details] [diff] [review]
fix, v1

>diff -r 820b8844b5b6 gfx/cairo/cairo/src/cairo-win32-private.h
>--- a/gfx/cairo/cairo/src/cairo-win32-private.h	Wed Sep 10 15:02:06 2008 -0700
>+++ b/gfx/cairo/cairo/src/cairo-win32-private.h	Tue Sep 30 15:25:54 2008 -0700
>@@ -117,6 +117,9 @@
> 
>     /* Whether we can use GradientFill rectangles with this surface */
>     CAIRO_WIN32_SURFACE_CAN_RECT_GRADIENT = (1<<6),
>+
>+    /* if this DDB surface can be converted to a DIB if necessary */
>+    CAIRO_WIN32_SURFACE_CAN_CONVERT_TO_DIB = (1<<7),
> };
> 
> cairo_status_t
>diff -r 820b8844b5b6 gfx/cairo/cairo/src/cairo-win32-surface.c
>--- a/gfx/cairo/cairo/src/cairo-win32-surface.c	Wed Sep 10 15:02:06 2008 -0700
>+++ b/gfx/cairo/cairo/src/cairo-win32-surface.c	Tue Sep 30 15:25:54 2008 -0700
>@@ -556,6 +556,59 @@
>     return CAIRO_STATUS_SUCCESS;
> }
> 
>+static void
>+_cairo_win32_convert_ddb_to_dib (cairo_win32_surface_t *surface)
>+{
>+    cairo_win32_surface_t *new_surface;
>+    int width = surface->extents.width;
>+    int height = surface->extents.height;
>+
>+    BOOL ok;
>+    HBITMAP oldbitmap;
>+
>+    /* If we're not allowed to convert, don't. */
>+    if ((surface->flags & CAIRO_WIN32_SURFACE_CAN_CONVERT_TO_DIB) == 0)
>+	return;

I wonder if the CAN_CONVERT_TO_DIB check would be better outside of this function

>+
>+    new_surface = (cairo_win32_surface_t*)
>+	_cairo_win32_surface_create_for_dc (surface->dc,
>+					    surface->format,
>+					    width,
>+					    height);
>+
>+    if (new_surface->base.status)
>+	return;

Should we be ignoring this?

>+
>+    /* DDB can't be 32bpp, so BitBlt is safe */
>+    ok = BitBlt (new_surface->dc,
>+		 0, 0, width, height,
>+		 surface->dc,
>+		 0, 0, SRCCOPY);

can BitBlt fail like in _cairo_win32_surface_get_subimage?

>+
>+    if (!ok)
>+	goto out;
>+
>+    /* Now swizzle around new_surface and surface */

'swizzle' makes me thinks graphics card swizzling. You could say 'swap'. Maybe
it's just me.

>+    DeleteDC (new_surface->dc);
>+    new_surface->dc = NULL;

maybe a comment like: /* Replace oldbitmap with the new one */? I had to look up what SelectObject did before understanding what was going on...

>+    oldbitmap = SelectObject (surface->dc, new_surface->bitmap);
>+    DeleteObject (oldbitmap);
>+
>+    surface->image = new_surface->image;
>+    surface->is_dib = new_surface->is_dib;
>+    surface->bitmap = new_surface->bitmap;
>+
>+    new_surface->bitmap = NULL;
>+    new_surface->image = NULL;
>+
>+    /* Finally update flags */
>+    surface->flags = _cairo_win32_flags_for_dc (surface->dc);
>+
>+  out:
>+    cairo_surface_destroy ((cairo_surface_t*)new_surface);
>+}
>+
> static cairo_status_t
> _cairo_win32_surface_acquire_source_image (void                    *abstract_surface,
> 					   cairo_image_surface_t  **image_out,
>@@ -564,6 +617,14 @@
>     cairo_win32_surface_t *surface = abstract_surface;
>     cairo_win32_surface_t *local = NULL;
>     cairo_status_t status;
>+
>+    if (!surface->image && !surface->is_dib && surface->bitmap) {
>+	/* This is a DDB, and we're being asked to use it as a source for
>+	 * something that we couldn't support natively.  So turn it into
>+	 * a DIB, so that we have an equivalent image surface.
>+	 */

Instead check for it here...

>+	_cairo_win32_convert_ddb_to_dib (surface);
>+    }
> 
>     if (surface->image) {
> 	*image_out = (cairo_image_surface_t *)surface->image;
>@@ -2157,3 +2218,61 @@
>     free(rd);
>     fflush (stderr);
> }


I feel these functions should have better names. It isn't clear from the
names or the comments why one would ever not want to call set_can_convert_to_dib(). 
Will calling this function ever cause wrong results or does it only affect performance?
Also should set_can_convert be the default? Maybe a better name would be set_allow_migration()? That's maybe a more platform independent description. I'm not sure...

>+
>+/**
>+ * cairo_win32_surface_set_can_convert_to_dib
>+ * @surface: a #cairo_surface_t
>+ * @can_convert: a #cairo_bool_t indicating whether this surface can
>+ *               be coverted to a DIB if necessary
>+ *
>+ * A DDB surface with this flag set can be converted to a DIB if it's
>+ * used as a source in a way that GDI can't natively handle; for
>+ * example, drawing a RGB24 DDB onto an ARGB32 DIB.  Doing this
>+ * conversion results in a significant speed optimization, because we
>+ * can call on pixman to perform the operation natively, instead of
>+ * reading the data from the DC each time.
>+ *
>+ * Return value: %CAIRO_STATUS_SUCCESS if the flag was successfully
>+ * changed, or an error otherwise.
>+ * 
>+ */
>+cairo_status_t
>+cairo_win32_surface_set_can_convert_to_dib (cairo_surface_t *asurface, cairo_bool_t can_convert)
>+{
>+    cairo_win32_surface_t *surface = (cairo_win32_surface_t*) asurface;
>+    if (surface->base.type != CAIRO_SURFACE_TYPE_WIN32)
>+	return CAIRO_STATUS_SURFACE_TYPE_MISMATCH;
>+
>+    if (surface->bitmap) {
>+	if (can_convert)
>+	    surface->flags |= CAIRO_WIN32_SURFACE_CAN_CONVERT_TO_DIB;
>+	else
>+	    surface->flags &= ~CAIRO_WIN32_SURFACE_CAN_CONVERT_TO_DIB;
>+    }
>+
>+    return CAIRO_STATUS_SUCCESS;
>+}
>+
>+/**
>+ * cairo_win32_surface_get_can_convert_to_dib
>+ * @surface: a #cairo_surface_t
>+ * @can_convert: a #cairo_bool_t* that receives the return value
>+ *
>+ * Returns the value of the flag indicating whether the surface can be
>+ * converted to a DIB if necessary, as set by
>+ * cairo_win32_surface_set_can_convert_to_dib.
>+ *
>+ * Return value: %CAIRO_STATUS_SUCCESS if the flag was successfully
>+ * retreived, or an error otherwise.
>+ * 
>+ */
>+cairo_status_t
>+cairo_win32_surface_get_can_convert_to_dib (cairo_surface_t *asurface, cairo_bool_t *can_convert)
>+{
>+    cairo_win32_surface_t *surface = (cairo_win32_surface_t*) asurface;
>+    if (surface->base.type != CAIRO_SURFACE_TYPE_WIN32)
>+	return CAIRO_STATUS_SURFACE_TYPE_MISMATCH;
>+
>+    *can_convert = ((surface->flags & CAIRO_WIN32_SURFACE_CAN_CONVERT_TO_DIB) != 0);
>+    return CAIRO_STATUS_SUCCESS;
>+}
>diff -r 820b8844b5b6 gfx/src/thebes/nsThebesImage.cpp
>--- a/gfx/src/thebes/nsThebesImage.cpp	Wed Sep 10 15:02:06 2008 -0700
>+++ b/gfx/src/thebes/nsThebesImage.cpp	Tue Sep 30 15:25:54 2008 -0700
>@@ -129,6 +129,7 @@
>             // no error
>             mImageSurface = mWinSurface->GetImageSurface();
>         }

[snip]

>+    return (st == CAIRO_STATUS_SUCCESS ? can_convert : PR_FALSE);
>+}


The mozilla changes look fine other than including some unrelated cosmetic changes could be done separately.
Updated patch, cairo only bits.  I kept the naming because it's a very platform specific operation, so I don't want the name to be generic.  I honestly can't think of when you wouldn't want to do this, though you may have some cases when you really want something to remain a DDB for, say, printing or something.
Attachment #341166 - Attachment is obsolete: true
Attachment #344328 - Flags: review?(jmuizelaar)
Attachment #341166 - Flags: review?(jmuizelaar)
DIB/DDB stuff is platform specific but I can imagine a similar operation happening with Pixmap and XImage on X11.

Is it worth adding api for this if we aren't even sure what it would be used for? I'd prefer to have the migration happen by default and if we come up with a reason to avoid it we could add an api like cairo_win32_surface_pin_to_ddb().
Attachment #344328 - Flags: review?(jmuizelaar) → review+
Comment on attachment 344328 [details] [diff] [review]
updated patch (cairo only)

This looks good enough for our private copy of cairo. It might need more thinking about the api before upstreaming.
Checked in to our cairo; should be in b2.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.