bilinear scaling fast paths for SIMD

RESOLVED WONTFIX

Status

()

Core
Graphics
RESOLVED WONTFIX
7 years ago
4 years ago

People

(Reporter: blassey, Unassigned)

Tracking

({perf})

Trunk
x86_64
Linux
Points:
---

Firefox Tracking Flags

(fennec+)

Details

(Whiteboard: [ARMv6][mentor=snorp][ARM-opt])

bilinear scaling is slow on tegra 2 devices because of their lack of NEON support. We should be able to get approximately the same performance with SIMD as we do with NEON for these devices.

Alternatively we can look at Tegra 2 specific fast paths, but I think that would be a separate bug
(Reporter)

Updated

7 years ago
tracking-fennec: --- → ?
(In reply to Brad Lassey [:blassey] from comment #0)
> bilinear scaling is slow on tegra 2 devices because of their lack of NEON
> support. We should be able to get approximately the same performance with
> SIMD as we do with NEON for these devices.
> 
> Alternatively we can look at Tegra 2 specific fast paths, but I think that
> would be a separate bug

My patch on bug 692328 should help with this. Clang compiles it to reasonable SIMD code and I don't know if it's possible to do much better.
(Reporter)

Updated

7 years ago
tracking-fennec: ? → +
I don't think we need to track this now, since we've got the lower-quality interpolation for Tegra already. I'd be very happy to get better SIMD scaling, of course, but it's not super-critical any more.
tracking-fennec: + → -
Does SIMD help our armv6 story in general? only for tegra? not at all?
yes, SIMD would help ARMv6. renomming
tracking-fennec: - → ?
It's not clear that a hand coded version of this will beat what the compiler produces:

Here's the c code in question:

#include <stdint.h>


#define GREEN_MASK (((1 << 6) - 1) << 5)

static inline uint32_t
expand_rgb_565 (uint16_t c) {
        return ((c & GREEN_MASK) << 16) | (c & ~GREEN_MASK);
}

static inline uint16_t
compact_rgb_565 (uint32_t c) {
        return ((c >> 16) & GREEN_MASK) | (c & ~GREEN_MASK);
}

uint16_t
bilinear_interpolation_565(uint16_t tl, uint16_t tr,
                           uint16_t bl, uint16_t br,
                           int x, int y)
{
        int xy;
        uint32_t a00 = expand_rgb_565 (tl);
        uint32_t a01 = expand_rgb_565 (tr);
        uint32_t a10 = expand_rgb_565 (bl);
        uint32_t a11 = expand_rgb_565 (br);

        xy = (x * y) >> 3;
        return compact_rgb_565 ((a00 * (32 - 2*y - 2*x + xy) +
                                 a01 * (2*x - xy) +
                                 a10 * (2*y - xy) +
                                 a11 * xy) >> 5);
}

and the produced assembly:

        push    {r4, r5, r6, r7, r8}
        ldr     r5, [sp, #20]
        ldr     r6, [sp, #24]
        and     ip, r1, #2016
        and     r8, r2, #2016
        bic     r1, r1, #2016
        orr     r1, r1, ip, lsl #16
        bic     r2, r2, #2016
        mul     r4, r5, r6
        lsl     ip, r6, #1
        lsls    r7, r5, #1
        orr     r2, r2, r8, lsl #16
        rsb     r6, r6, #16
        subs    r5, r6, r5
        asrs    r4, r4, #3
        rsb     ip, r4, ip
        subs    r7, r7, r4
        mul     r2, r2, ip
        mla     r1, r1, r7, r2
        and     r2, r3, #2016
        bic     r3, r3, #2016
        orr     r3, r3, r2, lsl #16
        and     r2, r0, #2016
        bic     r0, r0, #2016
        orr     r0, r0, r2, lsl #16
        mla     r3, r4, r3, r1
        add     r4, r4, r5, lsl #1
        mla     r4, r0, r4, r3
        movw    r3, #:lower16:63519
        movt    r3, #:upper16:63519
        lsrs    r2, r4, #5
        lsrs    r0, r4, #21
        ands    r3, r3, r2
        and     r0, r0, #2016
        orrs    r0, r0, r3
        pop     {r4, r5, r6, r7, r8}
        bx      lr

I'd be happy to see someone do much better by hand.
jeff, joe any recommendation on where to go from here?
Kats should this track bug 792131
While it might be useful for ARMv6 performance in general it doesn't look like it will lead to memory reductions, AIUI, so I would say no to blocking 792131.
(Reporter)

Updated

6 years ago
tracking-fennec: ? → +
Keywords: perf
Whiteboard: [ARMv6]
(Reporter)

Updated

4 years ago
Whiteboard: [ARMv6] → [ARMv6][mentor=snorp]
(Reporter)

Updated

4 years ago
Whiteboard: [ARMv6][mentor=snorp] → [ARMv6][mentor=snorp][ARM-opt]
If Jeff says this isn't going to help, that's good enough for me.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.