Open
Bug 1601020
Opened 5 years ago
Updated 1 year ago
Implement SSE2/AVX/NEON variations of ICCv4 transforms
Categories
(Core :: Graphics: Color Management, enhancement, P3)
ASSIGNED
People
(Reporter: aosmond, Assigned: aosmond)
References
Details
(Keywords: perf)
Our ICCv4 transform path is still plain C (see qcms_transform_data_tetra_clut_template). I've written SSE2-accelerated variants, but the faster one requires us to merge the R/G/B CLUT tables into a single float table[samples][4] (R, G and B stored next to each other at the same sample index) instead of three separate table[samples] arrays, which require three separate lookups/loads. It can likely be improved further with AVX. However, the importance of this work depends on bug 1555331: there is no point in making the transform faster if the output is still oversaturated.
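To illustrate the layout change described above, here is a minimal sketch in plain C. It is not the qcms code: `clut_entry`, `merge_clut_tables`, `lookup_split` and `lookup_merged` are hypothetical names, and the zero-filled fourth lane is an assumption made so that each sample occupies 16 bytes, the width of one SSE2 register.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical merged entry: R, G and B for one sample index sit side by
 * side, plus one padding float so that each entry is 16 bytes wide. */
typedef struct {
    float r, g, b, pad;
} clut_entry;

/* Interleave three per-channel tables into one merged table.
 * Returns NULL on allocation failure; the caller releases it with free(). */
clut_entry *merge_clut_tables(const float *r, const float *g,
                              const float *b, size_t samples)
{
    /* calloc zero-initializes, so the padding lane is 0.0f. */
    clut_entry *merged = calloc(samples, sizeof *merged);
    if (merged == NULL)
        return NULL;
    for (size_t i = 0; i < samples; i++) {
        merged[i].r = r[i];
        merged[i].g = g[i];
        merged[i].b = b[i];
    }
    return merged;
}

/* Split layout: one sample needs three loads from three separate arrays. */
void lookup_split(const float *r, const float *g, const float *b,
                  size_t i, float out[3])
{
    out[0] = r[i];
    out[1] = g[i];
    out[2] = b[i];
}

/* Merged layout: one sample is a single contiguous 16-byte entry, which a
 * vectorized version could fetch with one 128-bit load. */
void lookup_merged(const clut_entry *table, size_t i, float out[3])
{
    out[0] = table[i].r;
    out[1] = table[i].g;
    out[2] = table[i].b;
}
```

The trade-off in this sketch is one unused float per sample in exchange for a single contiguous load per lookup. Whether that is the exact layout used by the SSE2 variant mentioned above is not stated in this bug.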
Updated 2 years ago
Severity: normal → S3