consider using SSE to speed up fetch_scanline_a8/store_scanline_a8




7 years ago
5 years ago


(Reporter: tnikkel, Unassigned)




(1 attachment)



7 years ago
While profiling for bug 579488 I found that fetch_scanline_a8 was the top function, taking over 10% of the time spent in Firefox. store_scanline_a8 took another 3.6%. 1.9% was spent in fetch_scanline_x8r8g8b8. The rest was under 1% or had sse2 in the name.

If the READ macro is the simple one in gfx/cairo/libpixman/src/pixman-accessor.h (is it? I'm not sure) then I would think these functions should be fairly easy to speed up with SSE.
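
To make that concrete, here is a rough sketch of what I have in mind, assuming the fetch just widens each 8-bit alpha value into the top byte of a 32-bit pixel and the store does the reverse. I haven't checked this against the real pixman code. The names fetch_a8_sse2 and store_a8_sse2 are made up for illustration, and the loops fall back to plain C when SSE2 isn't available.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: alpha byte -> top byte of a 32-bit pixel (assumed layout). */
static void fetch_a8_scalar(const uint8_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)src[i] << 24;
}

/* Hypothetical SSE2 fetch: widens 16 alpha bytes per iteration. */
static void fetch_a8_sse2(const uint8_t *src, uint32_t *dst, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        /* Interleaving with zero on the low side shifts each value left:
           8 -> 16 bits gives a << 8, 16 -> 32 bits gives a << 24. */
        __m128i lo16 = _mm_unpacklo_epi8(zero, a);
        __m128i hi16 = _mm_unpackhi_epi8(zero, a);
        _mm_storeu_si128((__m128i *)(dst + i),      _mm_unpacklo_epi16(zero, lo16));
        _mm_storeu_si128((__m128i *)(dst + i + 4),  _mm_unpackhi_epi16(zero, lo16));
        _mm_storeu_si128((__m128i *)(dst + i + 8),  _mm_unpacklo_epi16(zero, hi16));
        _mm_storeu_si128((__m128i *)(dst + i + 12), _mm_unpackhi_epi16(zero, hi16));
    }
#endif
    fetch_a8_scalar(src + i, dst + i, n - i);   /* leftover pixels */
}

/* Hypothetical SSE2 store: keeps only the top (alpha) byte of each pixel. */
static void store_a8_sse2(uint8_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 16 <= n; i += 16) {
        __m128i p0 = _mm_srli_epi32(_mm_loadu_si128((const __m128i *)(src + i)), 24);
        __m128i p1 = _mm_srli_epi32(_mm_loadu_si128((const __m128i *)(src + i + 4)), 24);
        __m128i p2 = _mm_srli_epi32(_mm_loadu_si128((const __m128i *)(src + i + 8)), 24);
        __m128i p3 = _mm_srli_epi32(_mm_loadu_si128((const __m128i *)(src + i + 12)), 24);
        /* Values are 0..255, so neither saturating pack changes them. */
        __m128i w0 = _mm_packs_epi32(p0, p1);
        __m128i w1 = _mm_packs_epi32(p2, p3);
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(w0, w1));
    }
#endif
    for (; i < n; i++)                           /* leftover pixels */
        dst[i] = (uint8_t)(src[i] >> 24);
}
```

Unaligned loads and stores are used throughout so the sketch makes no assumption about scanline alignment. A real version would probably want an aligned fast path.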

Comment 1

It would be interesting to know the full sequence of operations that takes place here, from fetch to store. I think if you hit the generic fetch/store paths you're already going down a multi-step path, and there's probably a single-step function that could be written.
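
As an illustration of the multi-step versus single-step difference, here is a minimal sketch. The actual operation being hit in this bug hasn't been identified yet, so multiplying an a8 destination by an a8 mask is picked purely as an example. The function names and the 256-pixel scratch buffers are hypothetical and not taken from pixman.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Multi-step path: widen both scanlines to 32-bit, combine, then narrow. */
static void in_a8_generic(uint8_t *dst, const uint8_t *mask, size_t n)
{
    uint32_t d32[256], m32[256];   /* hypothetical scratch scanlines */
    while (n > 0) {
        size_t chunk = n < 256 ? n : 256;
        for (size_t i = 0; i < chunk; i++)      /* fetch destination */
            d32[i] = (uint32_t)dst[i] << 24;
        for (size_t i = 0; i < chunk; i++)      /* fetch mask */
            m32[i] = (uint32_t)mask[i] << 24;
        for (size_t i = 0; i < chunk; i++)      /* combine */
            d32[i] = (uint32_t)mul_un8((uint8_t)(d32[i] >> 24),
                                       (uint8_t)(m32[i] >> 24)) << 24;
        for (size_t i = 0; i < chunk; i++)      /* store */
            dst[i] = (uint8_t)(d32[i] >> 24);
        dst += chunk;
        mask += chunk;
        n -= chunk;
    }
}

/* Single-step path: one pass over the bytes, no intermediate buffers. */
static void in_a8_fused(uint8_t *dst, const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = mul_un8(dst[i], mask[i]);
}
```

Both functions produce the same bytes. The fused one just skips the two widening passes and the narrowing pass, which is where fetch_scanline_a8 and store_scanline_a8 would otherwise show up in a profile.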

Comment 2

7 years ago
Created attachment 487533 [details]
simple testcase

Sorry for the delay.

Basically all you need is to set up a rounded rectangle clip and do some painting, even a simple color fill, aka drawing a div with border-radius and a background color.

I tried clipping to the regular rect, pushing a group, and then popping it, applying the rounded rect clip and then painting. That has a similar profile.

Comment 3

7 years ago
I observe a 3x slowdown in the testcase on Linux when going from no-border-radius to border-radius.

Comment 4

7 years ago
fetch_scanline_a8 is now SSE2 optimized:

Regarding the missing single-step function: I have a pixman branch (*) with additional code that can help identify such cases. The information is reported to syslog and can then be decoded into human-readable form using a script.
