Open Bug 1601020 Opened 2 years ago Updated 2 years ago

Implement SSE2/AVX/NEON variations of ICCv4 transforms

Categories

(Core :: GFX: Color Management, enhancement, P3)

Desktop
All
enhancement

ASSIGNED

People

(Reporter: aosmond, Assigned: aosmond)

Details

(Keywords: perf)

Our ICCv4 transform path is still plain C (see qcms_transform_data_tetra_clut_template). I've written SSE2-accelerated variants, but the faster one requires us to merge the R/G/B CLUT tables into a single float table[samples][4], with the R, G and B values stored next to each other at the same samples index, instead of three separate table[samples] arrays, which need three separate lookups/loads. It can likely be improved further with AVX. However, the importance of this work depends on bug 1555331: there is no point in making the transform faster if its output is still oversaturated.
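To illustrate the layout change described above, here is a rough sketch and not the actual patch. It assumes three planar float tables as input; the names merge_clut and clut_lookup are hypothetical and do not exist in qcms. The fourth lane is padding so that one grid point fills a 128-bit register, and there is a scalar fallback for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* also provides the SSE float load/store intrinsics */
#endif

/* Hypothetical helper: interleave three planar tables of `samples` floats
 * into one table laid out as float[samples][4] (R, G, B, padding).
 * Returns NULL on allocation failure; the caller frees the result. */
static float *merge_clut(const float *r, const float *g, const float *b,
                         size_t samples)
{
    assert(r != NULL && g != NULL && b != NULL);
    float *merged = malloc(samples * 4 * sizeof *merged);
    if (merged == NULL)
        return NULL;
    for (size_t i = 0; i < samples; i++) {
        merged[4 * i + 0] = r[i];
        merged[4 * i + 1] = g[i];
        merged[4 * i + 2] = b[i];
        merged[4 * i + 3] = 0.0f; /* unused lane, keeps entries 16 bytes wide */
    }
    return merged;
}

/* Hypothetical helper: fetch one grid point. With the merged layout this is
 * a single vector load instead of three scalar loads from three tables. */
static void clut_lookup(const float *merged, size_t index, float out[4])
{
#if defined(__SSE2__)
    /* Unaligned load/store, since malloc alignment is not assumed here. */
    __m128 rgbx = _mm_loadu_ps(merged + 4 * index);
    _mm_storeu_ps(out, rgbx);
#else
    for (int c = 0; c < 4; c++)
        out[c] = merged[4 * index + c];
#endif
}
```

In a real tetrahedral interpolation the loaded vector would stay in a register and be combined with the neighbouring grid points; it is stored back to memory here only to keep the sketch short. The cost of the layout is one extra float per grid point.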
