While profiling for bug 579488, I found that fetch_scanline_a8 was the top function, taking over 10% of the time spent in Firefox. store_scanline_a8 took another 3.6%, and fetch_scanline_x8r8g8b8 took 1.9%. Everything else was under 1% or had sse2 in its name. If the READ macro is the simple one in gfx/cairo/libpixman/src/pixman-accessor.h (I'm not sure it is), then these functions should be fairly easy to speed up with SSE.
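For illustration, here is a minimal sketch of what an SSE2 version of the a8 fetch could look like. It assumes that fetching an a8 scanline just places each 8-bit alpha value in the top byte of a 32-bit a8r8g8b8 pixel with zero color channels. The names fetch_a8_scalar and fetch_a8_sse2 are hypothetical, not pixman's, and the SSE2 loop is only compiled when the compiler targets SSE2; otherwise the scalar loop handles the whole scanline.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference (assumed semantics): a8 -> a8r8g8b8 with zero color. */
static void fetch_a8_scalar(const uint8_t *src, uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = (uint32_t)src[i] << 24;
}

/* Hypothetical SSE2 variant: 16 pixels per iteration plus a scalar tail. */
static void fetch_a8_sse2(const uint8_t *src, uint32_t *dst, int width)
{
    int i = 0;
#ifdef __SSE2__
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= width; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        /* Interleave a zero byte below each alpha: 16-bit lanes hold a << 8. */
        __m128i lo16 = _mm_unpacklo_epi8(zero, a); /* pixels 0..7  */
        __m128i hi16 = _mm_unpackhi_epi8(zero, a); /* pixels 8..15 */
        /* Interleave a zero word below each lane: 32-bit lanes hold a << 24. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_unpacklo_epi16(zero, lo16));
        _mm_storeu_si128((__m128i *)(dst + i + 4), _mm_unpackhi_epi16(zero, lo16));
        _mm_storeu_si128((__m128i *)(dst + i + 8), _mm_unpacklo_epi16(zero, hi16));
        _mm_storeu_si128((__m128i *)(dst + i + 12), _mm_unpackhi_epi16(zero, hi16));
    }
#endif
    /* Scalar tail (or the whole scanline when SSE2 is not available). */
    for (; i < width; i++)
        dst[i] = (uint32_t)src[i] << 24;
}
```

The two rounds of unpacking against a zero register perform the shift implicitly: the byte interleave moves each alpha up by 8 bits and the word interleave moves it up by another 16, so no per-pixel shifts are needed.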
It would be interesting to know the full sequence of operations that takes place here, from fetch to store. I think that if you hit the generic fetch/store paths, you're already going down a multi-step path, and there's probably a single-step function that could be written for this case.
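To make the single-step idea concrete, here is a hypothetical C sketch. It assumes the operation is a solid, premultiplied ARGB source composited OVER an x8r8g8b8 destination through an a8 clip mask. That mapping is a guess, since the exact operation being hit hasn't been identified. The helper names (mul_un8, in_un8, over_un8, composite_generic, composite_fused) are made up and are not pixman's API. composite_generic mimics the multi-step path: expand mask and destination into temporary 32-bit scanlines, combine, then store. composite_fused does the same arithmetic in one pass and skips fully clipped pixels.

```c
#include <stdint.h>

#define CHUNK 128 /* temporary scanline length for the generic path */

/* round(a * b / 255) for 8-bit a, b. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Multiply every channel of a premultiplied ARGB pixel by an 8-bit mask. */
static uint32_t in_un8(uint32_t p, uint8_t m)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
        r |= (uint32_t)mul_un8((uint8_t)(p >> shift), m) << shift;
    return r;
}

/* Porter-Duff OVER for premultiplied ARGB: s + d * (1 - alpha(s)). */
static uint32_t over_un8(uint32_t s, uint32_t d)
{
    uint8_t ia = (uint8_t)(255 - (s >> 24));
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + mul_un8((uint8_t)(d >> shift), ia);
        if (c > 255)
            c = 255; /* guards against non-premultiplied input */
        r |= c << shift;
    }
    return r;
}

/* Multi-step: fetch mask, fetch dst, combine, store, via temporaries. */
static void composite_generic(uint32_t src, const uint8_t *mask,
                              uint32_t *dst, int width)
{
    uint32_t mbuf[CHUNK], dbuf[CHUNK];
    for (int x = 0; x < width; x += CHUNK) {
        int n = width - x < CHUNK ? width - x : CHUNK;
        for (int i = 0; i < n; i++) /* fetch a8 -> a8r8g8b8 */
            mbuf[i] = (uint32_t)mask[x + i] << 24;
        for (int i = 0; i < n; i++) /* fetch x8r8g8b8, treated as opaque */
            dbuf[i] = dst[x + i] | 0xff000000u;
        for (int i = 0; i < n; i++) /* combine: (src IN mask) OVER dst */
            dbuf[i] = over_un8(in_un8(src, (uint8_t)(mbuf[i] >> 24)), dbuf[i]);
        for (int i = 0; i < n; i++) /* store */
            dst[x + i] = dbuf[i];
    }
}

/* Single-step: same arithmetic, one pass, no temporaries. Skipped pixels
 * keep their original x byte, which is padding in x8r8g8b8 anyway. */
static void composite_fused(uint32_t src, const uint8_t *mask,
                            uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        uint8_t m = mask[i];
        if (m == 0)
            continue; /* outside the clip: dst unchanged */
        if (m == 0xff && (src >> 24) == 0xff)
            dst[i] = src; /* fully inside the clip, opaque source */
        else
            dst[i] = over_un8(in_un8(src, m), dst[i] | 0xff000000u);
    }
}
```

Since most pixels of a rounded-rect clip mask are typically either 0x00 or 0xff, the two early-outs in composite_fused should cover most of each scanline, leaving only the anti-aliased corner pixels for the full arithmetic.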
Created attachment 487533
simple testcase

Sorry for the delay. Basically, all you need is to set up a rounded-rectangle clip and do some painting, even a simple color fill, i.e. drawing a div with border-radius and a background color. I also tried clipping to the regular rect, pushing a group, popping it, applying the rounded-rect clip, and then painting. That has a similar profile.
I observe a 3x slowdown in the testcase on Linux when going from no-border-radius to border-radius.
fetch_scanline_a8 is now SSE2-optimized: http://cgit.freedesktop.org/pixman/commit/?id=08e855f15cba24aac83145b994069d0bb50be5a1

Regarding the missing single-step function: I have a pixman branch (*) with additional code that can help identify such cases. The information is reported to syslog and can then be decoded into a human-readable form with a script.

(*) http://cgit.freedesktop.org/~siamashka/pixman/log/?h=playground/slow-path-reporter
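The branch's actual implementation isn't reproduced here. As a rough, hypothetical illustration of the technique, a reporter could remember each (operator, source, mask, destination) format combination that falls through to the general path and log it once through POSIX syslog(3), leaving the raw numeric codes for an offline script to decode. combo_t, report_slow_path, and MAX_COMBOS are illustrative names, not code from the branch.

```c
#include <stdint.h>
#include <syslog.h>

/* Hypothetical key describing one composite operation. */
typedef struct {
    int op;
    uint32_t src_format, mask_format, dst_format;
} combo_t;

#define MAX_COMBOS 64
static combo_t seen[MAX_COMBOS];
static int n_seen;

/* Call from the general (slow) path. Returns 1 if the combination was newly
 * recorded and logged, 0 if it was already seen or the table is full.
 * Not thread-safe; a sketch only. */
static int report_slow_path(int op, uint32_t src, uint32_t mask, uint32_t dst)
{
    for (int i = 0; i < n_seen; i++)
        if (seen[i].op == op && seen[i].src_format == src &&
            seen[i].mask_format == mask && seen[i].dst_format == dst)
            return 0;
    if (n_seen == MAX_COMBOS)
        return 0;
    seen[n_seen++] = (combo_t){ op, src, mask, dst };
    /* Raw codes only; a script can map them to readable names later. */
    syslog(LOG_DEBUG, "slow path: op=%d src=%08x mask=%08x dst=%08x",
           op, (unsigned)src, (unsigned)mask, (unsigned)dst);
    return 1;
}
```

Logging each combination only once keeps the overhead negligible even when the slow path is hit for every scanline.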