Open
Bug 459078
Opened 17 years ago
Updated 3 years ago
Improve image scaling performance.
Categories
(Core :: Graphics, defect)
Tracking
()
NEW
People
(Reporter: jrmuizel, Assigned: jrmuizel)
Details
(Keywords: mobile)
Attachments
(4 files)
4.35 KB,
patch
|
Details | Diff | Splinter Review | |
3.31 KB,
patch
|
Details | Diff | Splinter Review | |
4.02 KB,
patch
|
Details | Diff | Splinter Review | |
29.03 KB,
application/x-gzip
|
Details |
The image scaling that happens in the xserver is much slower than it needs to be.
Assignee | ||
Comment 1•17 years ago
|
||
This patch takes a much different approach than I originally planned. Instead of making fast code correct, this starts with correct code and will make it fast.
Currently it isn't that optimized beyond the general code, however it still provides a healthy performance improvement.
The patch is against upstream pixman, but I'll do a version against the nokia xserver next.
Assignee | ||
Comment 2•17 years ago
|
||
It looks about twice as fast as the existing code.
Assignee | ||
Comment 3•17 years ago
|
||
Here's a basically unrelated patch against the nokia xserver. It should make quite a difference on scaled image rendering. However, I've not compared any numbers.
Note: it forces the filtering to NEAREST because fennec is asking for BILINEAR. We should figure what it is we want to do here.
Also, it forces EXTEND_PAD to EXTEND_NONE, this isn't as problematic because the nokia xserver doesn't support EXTEND_PAD anyways.
So for scaling down, we are requesting bilinear, because we don't set filter to 0 in http://mxr.mozilla.org/mozilla-central/source/gfx/src/thebes/nsThebesImage.cpp#596 . I'll add a fix for that to my patch in 459150, to just always request nearest/no-pad with the mobile opt flag.
Hm, no, ctxHasNonTranslation should be TRUE there -- where does BILINEAR come from?
Assignee | ||
Comment 6•17 years ago
|
||
I'm not sure...
Comment on attachment 342334 [details] [diff] [review]
Make scaling faster with nokia's xserver
>+ /* we don't support RepeatPad so we might as well use the RepeatNone
>+ * fast paths for RepeatPad too */
>+ if (params->src->repeat == RepeatPad)
>+ params->src->repeat = RepeatNone;
Hmm, is this really needed? We should never hit it.
>
>+ if (params->op == PictOpSrc && params->src->format == PICT_x8r8g8b8 && params->dest->format == PICT_x8r8g8b8) {
>+ if (params->src->filter == PictFilterNearest && params->src->repeat == RepeatNone) {
>+ fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8(params,
>+ bits, stride, v.vector[0], v.vector[1], unit.vector[0], unit.vector[1]);
>+ return;
>+ }
>+ }
Any chance you can make this work with RepeatNormal as well? I'm seeing fbFetchTransformed as ~25% of my profile run, and some of those are with a repeating source. I'll apply this patch though and reprofile.
Or ignore that second part. With this patch, fbFetchTransformed goes from 25% -> 2.5%. The new profile inside Xomap is:
samples % symbol name
3741 12.2555 fbSolid
1643 5.3825 fbCompositeSolidMask_nx8x8888
1577 5.1663 fbOver
1211 3.9672 fbRasterizeEdges
962 3.1515 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8
915 2.9975 fbIn
890 2.9156 fbComposite
771 2.5258 fbFetchTransformed
666 2.1818 miComputeCompositeRegion
584 1.9132 cwComposite
517 1.6937 fbCompositeSrcAdd_8000x8000_armv6
458 1.5004 miGlyphs
436 1.4283 fbComposite_x8r8g8b8_src_r5g6b5_internal_armv6
431 1.4120 fbCombineMaskU
429 1.4054 miValidatePicture
411 1.3464 ReadRequestFromClient
409 1.3399 fbmemcpy_arm
395 1.2940 damageComposite
392 1.2842 miModifyPixmapHeader
384 1.2580 cwValidatePicture
373 1.2219 fbCombineAddU
The fbSolid mostly shows up with this stack; there's already some arm code in fbSolid (via memset_fbbits), but I don't know if this can be sped up any more.
#0 fbSolid (dst=0x3b6288, dstStride=1, dstX=0, bpp=8, width=16, height=22, and=0, xor=0)
at ../../fb/fbsolid.c:102
#1 0x00065cc0 in fbFill (pDrawable=0x3b6250, pGC=<value optimized out>, x=0, y=0, width=2,
height=22) at ../../fb/fbfill.c:57
#2 0x00065fcc in fbPolyFillRect (pDrawable=0x3b6250, pGC=0x17d560, nrect=0, prect=0xbe82451c)
at ../../fb/fbfillrect.c:79
#3 0x000f1050 in cwPolyFillRect (pDst=<value optimized out>, pGC=0x17d560, nrects=1,
pRects=0xbe82451c) at ../../../miext/cw/cw_ops.c:343
#4 0x000ebf90 in damagePolyFillRect (pDrawable=0x3b6250, pGC=0x17d560, nRects=1,
pRects=0xbe82451c) at ../../../miext/damage/damage.c:1285
#5 0x000d2980 in miCreateAlphaPicture (pScreen=0x1778f8, pDst=<value optimized out>,
pPictFormat=0x1788e0, width=2, height=22) at ../../render/mitrap.c:78
#6 0x000d2b84 in miTrapezoids (op=3 '\003', pSrc=0x38d840, pDst=0x3bed28, maskFormat=0x1788e0,
xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020) at ../../render/mitrap.c:164
#7 0x000f24ec in cwTrapezoids (op=3 '\003', pSrcPicture=<value optimized out>,
pDstPicture=0x3bed28, maskFormat=0x1788e0, xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020)
at ../../../miext/cw/cw_render.c:365
#8 0x000d3bb4 in CompositeTrapezoids (op=3 '\003', pSrc=0x38d840, pDst=0x3bed28,
maskFormat=0x1788e0, xSrc=195, ySrc=46, ntrap=1, traps=0x40dba020)
at ../../render/picture.c:1854
#9 0x000dc8fc in ProcRenderTrapezoids (client=0x315db0) at ../../render/render.c:820
#10 0x000d6a58 in ProcRenderDispatch (client=<value optimized out>) at ../../render/render.c:2002
#11 0x000b8bc8 in XaceCatchExtProc (client=0x315db0) at ../../Xext/xace.c:299
#12 0x00037c64 in Dispatch () at ../../dix/dispatch.c:503
371 1.2154 miSpriteSourceValidate
364 1.1925 __divsi3
359 1.1761 CompositePicture
346 1.1335 __aeabi_idivmod
342 1.1204 SecurityLookupIDByType
333 1.0909 XaceHook
322 1.0549 FreePicture
306 1.0025 Dispatch
292 0.9566 cwChangePicture
Assignee | ||
Comment 9•17 years ago
|
||
(In reply to comment #7)
> (From update of attachment 342334 [details] [diff] [review])
>
> >+ /* we don't support RepeatPad so we might as well use the RepeatNone
> >+ * fast paths for RepeatPad too */
> >+ if (params->src->repeat == RepeatPad)
> >+ params->src->repeat = RepeatNone;
>
> Hmm, is this really needed? We should never hit it.
I'm not sure I actually saw RepeatPad, but aren't we setting it in nsThebesImage?
Assignee | ||
Comment 10•17 years ago
|
||
A patch against the nokia xserver to add fbCompositeSolidMask_nx8x8888_armv6. I've only compile tested it, but I expect it to work.
Shouldn't be setting it; if we are, it'll be handled by the in-cairo pixman.. I'm pretty sure the xlib surface doesn't let PAD make it to Render currently.
fbCompositeSolidMask_nx8x8888_armv6 is showing up in profiles now, so it's definitely being called. Now the new profile looks like:
samples % symbol name
5504 17.3229 fbRasterizeEdges
4232 13.3195 fbSolid
2891 9.0989 fbCompositeSolidMask_nx8x8888_armv6
1045 3.2890 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8
986 3.1033 fbComposite
714 2.2472 miComputeCompositeRegion
618 1.9450 fbmemcpy_arm
606 1.9073 miGlyphs
549 1.7279 cwComposite
517 1.6272 fbCompositeSrcAdd_8000x8000_armv6
489 1.5390 miModifyPixmapHeader
474 1.4918 cwValidatePicture
418 1.3156 damageComposite
406 1.2778 fbCombineMaskU
394 1.2400 fbComposite_x8r8g8b8_src_r5g6b5_internal_armv6
With Xomap at 41.17%.
That's also with a different page than the original though, I'll check the original. The fbSolid calls now:
#0 fbSolid (dst=0x38b560, dstStride=50, dstX=0, bpp=8, width=1584, height=9, and=0, xor=0)
at ../../fb/fbsolid.c:102
#1 0x00065cc8 in fbFill (pDrawable=0x38b528, pGC=<value optimized out>, x=0, y=0, width=198,
height=9) at ../../fb/fbfill.c:57
#2 0x00065fd4 in fbPolyFillRect (pDrawable=0x38b528, pGC=0x17d560, nrect=0, prect=0xbed11dc8)
at ../../fb/fbfillrect.c:79
#3 0x000f1310 in cwPolyFillRect (pDst=<value optimized out>, pGC=0x17d560, nrects=1,
pRects=0xbed11dc8) at ../../../miext/cw/cw_ops.c:343
#4 0x000ec250 in damagePolyFillRect (pDrawable=0x38b528, pGC=0x17d560, nRects=1,
pRects=0xbed11dc8) at ../../../miext/damage/damage.c:1285
#5 0x000d06f0 in miGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0,
xSrc=112, ySrc=623, nlist=19, list=0xbed1237c, glyphs=0xbed11f7c) at ../../render/miglyph.c:158
#6 0x000f24bc in cwGlyphs (op=3 '\003', pSrcPicture=<value optimized out>, pDstPicture=0x31b100,
maskFormat=0x1788e0, xSrc=112, ySrc=623, nlists=19, lists=0xbed1237c, glyphs=0xbed11f7c)
at ../../../miext/cw/cw_render.c:305
#7 0x000ee750 in damageGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0,
xSrc=112, ySrc=623, nlist=19, list=0xbed1237c, glyphs=0xbed11f7c)
at ../../../miext/damage/damage.c:654
#8 0x000d3d60 in CompositeGlyphs (op=3 '\003', pSrc=0x2fa010, pDst=0x31b100, maskFormat=0x1788e0,
xSrc=112, ySrc=623, nlist=19, lists=0xbed1237c, glyphs=0xbed11f7c)
at ../../render/picture.c:1824
and much less frequently:
#0 fbSolid (dst=0x402dc000, dstStride=400, dstX=0, bpp=16, width=12800, height=480, and=0, xor=0)
at ../../fb/fbsolid.c:102
#1 0x0007477c in fbFillRegionSolid (pDrawable=<value optimized out>,
pRegion=<value optimized out>, and=0, xor=0) at ../../fb/fbwindow.c:242
#2 0x00074864 in fbPaintWindow (pWin=0x376588, pRegion=0x38b820, what=0)
at ../../fb/fbwindow.c:347
#3 0x000f0870 in cwPaintWindowBackground (pWin=0x376588, pRegion=0x38b820, what=985200)
at ../../../miext/cw/cw.c:461
#4 0x000ebc58 in damagePaintWindow (pWindow=0x376588, prgn=0x38b820, what=0)
at ../../../miext/damage/damage.c:1650
For the text case, I have to confess I don't know what it's doing. Clearing the mask that it'll use for rendering text?
Now on to fbRasterizeEdges.. it's all in this sequence:
#0 fbRasterizeEdges (buf=0x38c218, bpp=8, width=151, stride=38, l=0xbed12500, r=0xbed124d8,
t=50243, b=72089) at ../../fb/fbedge.c:301
#1 0x00073f04 in fbRasterizeTrapezoid (pPicture=<value optimized out>, trap=0x40f5f7dc,
x_off=-578, y_off=-93) at ../../fb/fbtrap.c:145
#2 0x000d2e78 in miTrapezoids (op=3 '\003', pSrc=0x319c48, pDst=0x31b100,
maskFormat=<value optimized out>, xSrc=579, ySrc=93, ntrap=1, traps=0x40f5f7dc)
at ../../render/mitrap.c:170
#3 0x000f27ac in cwTrapezoids (op=3 '\003', pSrcPicture=<value optimized out>,
pDstPicture=0x31b100, maskFormat=0x1788e0, xSrc=579, ySrc=93, ntrap=8, traps=0x40f5f6c4)
at ../../../miext/cw/cw_render.c:365
#4 0x000d3e74 in CompositeTrapezoids (op=3 '\003', pSrc=0x319c48, pDst=0x31b100,
maskFormat=0x1788e0, xSrc=579, ySrc=93, ntrap=8, traps=0x40f5f6c4)
at ../../render/picture.c:1854
What's interesting is that a bunch of the time when we get to miTrapezoids ntrap is 1, whereas going into cwTrapezoids it's 8.. I'm not sure if those numbers can be believed, since the trap pointers are different (and should be the same); I think that 8 is the correct number.
Here's cnn with the new patch:
samples % symbol name
5466 13.3454 fbSolid
3121 7.6200 fbCompositeSolidMask_nx8x8888_armv6
2410 5.8841 fbRasterizeEdges
1250 3.0519 fbComposite
1197 2.9225 fbCompositeTransformNearestNonrotatedAffineTrivialclip_x8r8g8b8_src_x8r8g8b8
1159 2.8297 miComputeCompositeRegion
780 1.9044 fbFetchTransformed
737 1.7994 miGlyphs
676 1.6505 fbCompositeSrcAdd_8000x8000_armv6
662 1.6163 SecurityLookupIDByType
645 1.5748 ReadRequestFromClient
635 1.5504 cwValidatePicture
620 1.5137 FreePicture
612 1.4942 cwComposite
Not sure if it helped or not, actually...
Comment 14•17 years ago
|
||
>The patch is against upstream pixman, but I'll do a version against the nokia
>xserver next.
I think there are no any reason to do double work, because we already have some optimization against nokia-xomap-pixman made by Siarhei.
I hope it would be possible to convert it and apply to latest pixman library version..
Comment 15•17 years ago
|
||
Just a question. Have you tried using 16bpp format for intermediate storage and internal images processing as well (patch from bug #386440)? In this case a lot of overhead and the need for functions with x8r8g8b8 source format will go away. Of course, if you have any problems/regressions with this patch, I would be interested to know about them.
Comment 16•17 years ago
|
||
romaxa, Siarhei: can you update your patches here to the latest version of pixman so we can get them pushed upstream?
Updated•17 years ago
|
Attachment #342544 -
Attachment is patch: true
Attachment #342544 -
Attachment mime type: application/octet-stream → text/plain
Comment 17•17 years ago
|
||
I'm not sure that we have time to do it right now, but as I remember from last conversation with jmuizelaar he was promising to help or probably merge it with upstream pixman...
Assignee | ||
Comment 18•17 years ago
|
||
I wasn't promising to merge it, I just wanted a public release so that parts of it could be upstreamed as needed.
romaxa: any news on getting the correctness bugs in these patches fixed?
Comment 19•17 years ago
|
||
romaxa, any news here?
Updated•17 years ago
|
Component: General → GFX: Thebes
Product: Fennec → Core
QA Contact: general → thebes
Comment 20•16 years ago
|
||
I've been investigating important fennec pageload slowdowns associated with chinese fonts and the problem turns out to be closely related to this bug. I've applied the patches and notices significant speed-ups, so if we could just refocus our interest on this bug (and on bug 386440), that'd be great !
Comment 21•15 years ago
|
||
I believe that the nearest neighbour scaling performance patches discussed here got into upstream pixman already. It was also tracked in bug 552142
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•