Open Bug 459078 Opened 17 years ago Updated 3 years ago

Improve image scaling performance.

Categories

(Core :: Graphics, defect)

x86
macOS
defect

Tracking

()

People

(Reporter: jrmuizel, Assigned: jrmuizel)

Details

(Keywords: mobile)

Attachments

(4 files)

The image scaling that happens in the xserver is much slower than it needs to be.
This patch takes a much different approach than I originally planned. Instead of making fast code correct, this starts with correct code and will make it fast. Currently it isn't that optimized beyond the general code, however it still provides a healthy performance improvement. The patch is against upstream pixman, but I'll do a version against the nokia xserver next.
It looks about twice as fast as the existing code.
Here's a basically unrelated patch against the nokia xserver. It should make quite a difference on scaled image rendering. However, I've not compared any numbers. Note: it forces the filtering to NEAREST because fennec is asking for BILINEAR. We should figure what it is we want to do here. Also, it forces EXTEND_PAD to EXTEND_NONE, this isn't as problematic because the nokia xserver doesn't support EXTEND_PAD anyways.
So for scaling down, we are requesting bilinear, because we don't set filter to 0 in http://mxr.mozilla.org/mozilla-central/source/gfx/src/thebes/nsThebesImage.cpp#596 . I'll add a fix for that to my patch in 459150, to just always request nearest/no-pad with the mobile opt flag.
Hm, no, ctxHasNonTranslation should be TRUE there -- where does BILINEAR come from?
I'm not sure...
Comment on attachment 342334 [details] [diff] [review] Make scaling faster with nokia's xserver >+ /* we don't support RepeatPad so we might as well use the RepeatNone >+ * fast paths for RepeatPad too */ >+ if (params->src->repeat == RepeatPad) >+ params->src->repeat = RepeatNone; Hmm, is this really needed? We should never hit it. > >+ if (params->op == PictOpSrc && params->src->format == PICT_x8r8g8b8 && params->dest->format == PICT_x8r8g8b8) { >+ if (params->src->filter == PictFilterNearest && params->src->repeat == RepeatNone) { >+ fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8(params, >+ bits, stride, v.vector[0], v.vector[1], unit.vector[0], unit.vector[1]); >+ return; >+ } >+ } Any chance you can make this work with RepeatNormal as well? I'm seeing fbFetchTransformed as ~25% of my profile run, and some of those are with a repeating source. I'll apply this patch though and reprofile.
Or ignore that second part. With this patch, fbFetchTransformed goes from 25% -> 2.5%. The new profile inside Xomap is: samples % symbol name 3741 12.2555 fbSolid 1643 5.3825 fbCompositeSolidMask_nx8x8888 1577 5.1663 fbOver 1211 3.9672 fbRasterizeEdges 962 3.1515 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8 915 2.9975 fbIn 890 2.9156 fbComposite 771 2.5258 fbFetchTransformed 666 2.1818 miComputeCompositeRegion 584 1.9132 cwComposite 517 1.6937 fbCompositeSrcAdd_8000x8000_armv6 458 1.5004 miGlyphs 436 1.4283 fbComposite_x8r8g8b8_src_r5g6b5_internal_armv6 431 1.4120 fbCombineMaskU 429 1.4054 miValidatePicture 411 1.3464 ReadRequestFromClient 409 1.3399 fbmemcpy_arm 395 1.2940 damageComposite 392 1.2842 miModifyPixmapHeader 384 1.2580 cwValidatePicture 373 1.2219 fbCombineAddU The fbSolid mostly shows up with this stack; there's already some arm code in fbSolid (via memset_fbbits), but I don't know if this can be sped up any more. #0 fbSolid (dst=0x3b6288, dstStride=1, dstX=0, bpp=8, width=16, height=22, and=0, xor=0) at ../../fb/fbsolid.c:102 #1 0x00065cc0 in fbFill (pDrawable=0x3b6250, pGC=<value optimized out>, x=0, y=0, width=2, height=22) at ../../fb/fbfill.c:57 #2 0x00065fcc in fbPolyFillRect (pDrawable=0x3b6250, pGC=0x17d560, nrect=0, prect=0xbe82451c) at ../../fb/fbfillrect.c:79 #3 0x000f1050 in cwPolyFillRect (pDst=<value optimized out>, pGC=0x17d560, nrects=1, pRects=0xbe82451c) at ../../../miext/cw/cw_ops.c:343 #4 0x000ebf90 in damagePolyFillRect (pDrawable=0x3b6250, pGC=0x17d560, nRects=1, pRects=0xbe82451c) at ../../../miext/damage/damage.c:1285 #5 0x000d2980 in miCreateAlphaPicture (pScreen=0x1778f8, pDst=<value optimized out>, pPictFormat=0x1788e0, width=2, height=22) at ../../render/mitrap.c:78 #6 0x000d2b84 in miTrapezoids (op=3 '\003', pSrc=0x38d840, pDst=0x3bed28, maskFormat=0x1788e0, xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020) at ../../render/mitrap.c:164 #7 0x000f24ec in cwTrapezoids (op=3 '\003', pSrcPicture=<value optimized out>, pDstPicture=0x3bed28, maskFormat=0x1788e0, xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020) at ../../../miext/cw/cw_render.c:365 #8 0x000d3bb4 in CompositeTrapezoids (op=3 '\003', pSrc=0x38d840, pDst=0x3bed28, maskFormat=0x1788e0, xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020) at ../../render/picture.c:1854 #9 0x000dc8fc in ProcRenderTrapezoids (client=0x315db0) at ../../render/render.c:820 #10 0x000d6a58 in ProcRenderDispatch (client=<value optimized out>) at ../../render/render.c:2002 #11 0x000b8bc8 in XaceCatchExtProc (client=0x315db0) at ../../Xext/xace.c:299 #12 0x00037c64 in Dispatch () at ../../dix/dispatch.c:503 371 1.2154 miSpriteSourceValidate 364 1.1925 __divsi3 359 1.1761 CompositePicture 346 1.1335 __aeabi_idivmod 342 1.1204 SecurityLookupIDByType 333 1.0909 XaceHook 322 1.0549 FreePicture 306 1.0025 Dispatch 292 0.9566 cwChangePicture
(In reply to comment #7) > (From update of attachment 342334 [details] [diff] [review]) > > >+ /* we don't support RepeatPad so we might as well use the RepeatNone > >+ * fast paths for RepeatPad too */ > >+ if (params->src->repeat == RepeatPad) > >+ params->src->repeat = RepeatNone; > > Hmm, is this really needed? We should never hit it. I'm not sure I actually saw RepeatPad, but aren't we setting it in nsThebesImage?
A patch against the nokia xserver to add fbCompositeSolidMask_nx8x8888_armv6. I've only compile tested it, but I expect it to work.
Shouldn't be setting it; if we are, it'll be handled by the in-cairo pixman.. I'm pretty sure the xlib surface doesn't let PAD make it to Render currently.
fbCompositeSolidMask_nx8x8888_armv6 is showing up in profiles now, so it's definitely being called. Now the new profile looks like: samples % symbol name 5504 17.3229 fbRasterizeEdges 4232 13.3195 fbSolid 2891 9.0989 fbCompositeSolidMask_nx8x8888_armv6 1045 3.2890 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8 986 3.1033 fbComposite 714 2.2472 miComputeCompositeRegion 618 1.9450 fbmemcpy_arm 606 1.9073 miGlyphs 549 1.7279 cwComposite 517 1.6272 fbCompositeSrcAdd_8000x8000_armv6 489 1.5390 miModifyPixmapHeader 474 1.4918 cwValidatePicture 418 1.3156 damageComposite 406 1.2778 fbCombineMaskU 394 1.2400 fbComposite_x8r8g8b8_src_r5g6b5_internal_armv6 With Xomap at 41.17%. That's also with a different page than the original though, I'll check the original. The fbSolid calls now: #0 fbSolid (dst=0x38b560, dstStride=50, dstX=0, bpp=8, width=1584, height=9, and=0, xor=0) at ../../fb/fbsolid.c:102 #1 0x00065cc8 in fbFill (pDrawable=0x38b528, pGC=<value optimized out>, x=0, y=0, width=198, height=9) at ../../fb/fbfill.c:57 #2 0x00065fd4 in fbPolyFillRect (pDrawable=0x38b528, pGC=0x17d560, nrect=0, prect=0xbed11dc8) at ../../fb/fbfillrect.c:79 #3 0x000f1310 in cwPolyFillRect (pDst=<value optimized out>, pGC=0x17d560, nrects=1, pRects=0xbed11dc8) at ../../../miext/cw/cw_ops.c:343 #4 0x000ec250 in damagePolyFillRect (pDrawable=0x38b528, pGC=0x17d560, nRects=1, pRects=0xbed11dc8) at ../../../miext/damage/damage.c:1285 #5 0x000d06f0 in miGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0, xSrc=112, ySrc=623, nlist=19, list=0xbed1237c, glyphs=0xbed11f7c) at ../../render/miglyph.c:158 #6 0x000f24bc in cwGlyphs (op=3 '\003', pSrcPicture=<value optimized out>, pDstPicture=0x31b100, maskFormat=0x1788e0, xSrc=112, ySrc=623, nlists=19, lists=0xbed1237c, glyphs=0xbed11f7c) at ../../../miext/cw/cw_render.c:305 #7 0x000ee750 in damageGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0, xSrc=112, ySrc=623, nlist=19, list=0xbed1237c, glyphs=0xbed11f7c) at ../../../miext/damage/damage.c:654 #8 0x000d3d60 in CompositeGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0, xSrc=112, ySrc=623, nlist=19, lists=0xbed1237c, glyphs=0xbed11f7c) at ../../render/picture.c:1824 and much less frequently: #0 fbSolid (dst=0x402dc000, dstStride=400, dstX=0, bpp=16, width=12800, height=480, and=0, xor=0) at ../../fb/fbsolid.c:102 #1 0x0007477c in fbFillRegionSolid (pDrawable=<value optimized out>, pRegion=<value optimized out>, and=0, xor=0) at ../../fb/fbwindow.c:242 #2 0x00074864 in fbPaintWindow (pWin=0x376588, pRegion=0x38b820, what=0) at ../../fb/fbwindow.c:347 #3 0x000f0870 in cwPaintWindowBackground (pWin=0x376588, pRegion=0x38b820, what=985200) at ../../../miext/cw/cw.c:461 #4 0x000ebc58 in damagePaintWindow (pWindow=0x376588, prgn=0x38b820, what=0) at ../../../miext/damage/damage.c:1650 For the text case, I have to confess I don't know what it's doing. Clearing the mask that it'll use for rendering text? Now on to fbRasterizeEdges.. it's all in this sequence: #0 fbRasterizeEdges (buf=0x38c218, bpp=8, width=151, stride=38, l=0xbed12500, r=0xbed124d8, t=50243, b=72089) at ../../fb/fbedge.c:301 #1 0x00073f04 in fbRasterizeTrapezoid (pPicture=<value optimized out>, trap=0x40f5f7dc, x_off=-578, y_off=-93) at ../../fb/fbtrap.c:145 #2 0x000d2e78 in miTrapezoids (op=3 '\003', pSrc=0x319c48, pDst=0x31b100, maskFormat=<value optimized out>, xSrc=579, ySrc=93, ntrap=1, traps=0x40f5f7dc) at ../../render/mitrap.c:170 #3 0x000f27ac in cwTrapezoids (op=3 '\003', pSrcPicture=<value optimized out>, pDstPicture=0x31b100, maskFormat=0x1788e0, xSrc=579, ySrc=93, ntrap=8, traps=0x40f5f6c4) at ../../../miext/cw/cw_render.c:365 #4 0x000d3e74 in CompositeTrapezoids (op=3 '\003', pSrc=0x319c48, pDst=0x31b100, maskFormat=0x1788e0, xSrc=579, ySrc=93, ntrap=8, traps=0x40f5f6c4) at ../../render/picture.c:1854 What's interesting is that a bunch of the time when we get to miTrapezoids ntrap is 1, whereas going into cwTrapezoids it's 8.. I'm not sure if those numbers can be believed, since the trap pointers are different (and should be the same); I think that 8 is the correct number.
Here's cnn with the new patch: samples % symbol name 5466 13.3454 fbSolid 3121 7.6200 fbCompositeSolidMask_nx8x8888_armv6 2410 5.8841 fbRasterizeEdges 1250 3.0519 fbComposite 1197 2.9225 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8 1159 2.8297 miComputeCompositeRegion 780 1.9044 fbFetchTransformed 737 1.7994 miGlyphs 676 1.6505 fbCompositeSrcAdd_8000x8000_armv6 662 1.6163 SecurityLookupIDByType 645 1.5748 ReadRequestFromClient 635 1.5504 cwValidatePicture 620 1.5137 FreePicture 612 1.4942 cwComposite Not sure if it helped or not, actually...
>The patch is against upstream pixman, but I'll do a version against the nokia >xserver next. I think there are no any reason to do double work, because we already have some optimization against nokia-xomap-pixman made by Siarhei. I hope it would be possible to convert it and apply to latest pixman library version..
Just a question. Have you tried using 16bpp format for intermediate storage and internal images processing as well (patch from bug #386440)? In this case a lot of overhead and the need for functions with x8r8g8b8 source format will go away. Of course, if you have any problems/regressions with this patch, I would be interested to know about them.
romaxa, Siarhei: can you update your patches here to the latest version of pixman so we can get them pushed upstream?
Attachment #342544 - Attachment is patch: true
Attachment #342544 - Attachment mime type: application/octet-stream → text/plain
I'm not sure that we have time to do it right now, but as I remember from last conversation with jmuizelaar he was promising to help or probably merge it with upstream pixman...
I wasn't promising to merge it, I just wanted a public release so that parts of it could be upstreamed as needed. romaxa: any news on getting the correctness bugs in these patches fixed?
romaxa, any news here?
Component: General → GFX: Thebes
Product: Fennec → Core
QA Contact: general → thebes
Keywords: mobile
I've been investigating important fennec pageload slowdowns associated with chinese fonts and the problem turns out to be closely related to this bug. I've applied the patches and notices significant speed-ups, so if we could just refocus our interest on this bug (and on bug 386440), that'd be great !
I believe that the nearest neighbour scaling performance patches discussed here got into upstream pixman already. It was also tracked in bug 552142
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: