Closed Bug 692328 Opened 14 years ago Closed 12 years ago

bilinear scaling fast paths for SIMD

Tracking

()

Status:

RESOLVED WONTFIX

Tracking Flags:

Tracking

Status

fennec

---

People

(Reporter: blassey, Unassigned)

Details

(Keywords: perf, Whiteboard: [ARMv6][mentor=snorp][ARM-opt])

Brad Lassey [:blassey] (use needinfo?)

Reporter

Description

•

14 years ago

bilinear scaling is slow on tegra 2 devices because of their lack of NEON support. We should be able to get approximately the same performance with SIMD as we do with NEON for these devices. Alternatively we can look at Tegra 2 specific fast paths, but I think that would be a separate bug

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

14 years ago

tracking-fennec: --- → ?

Jeff Muizelaar [:jrmuizel]

Comment 1

•

14 years ago

(In reply to Brad Lassey [:blassey] from comment #0) > bilinear scaling is slow on tegra 2 devices because of their lack of NEON > support. We should be able to get approximately the same performance with > SIMD as we do with NEON for these devices. > > Alternatively we can look at Tegra 2 specific fast paths, but I think that > would be a separate bug My patch on bug 692328 should help with this. Clang compiles it to reasonable SIMD code and I don't know if it's possible to do much better.

Jeff Muizelaar [:jrmuizel]

Comment 2

•

14 years ago

make that bug 689707

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

14 years ago

tracking-fennec: ? → +

Joe Drew (not getting mail)

Comment 3

•

14 years ago

I don't think we need to track this now, since we've got the lower-quality interpolation for Tegra already. I'd be very happy to get better SIMD scaling, of course, but it's not super-critical any more.

tracking-fennec: + → -

Mark Finkle (:mfinkle) (use needinfo?)

Comment 4

•

14 years ago

Does SIMD help our armv6 story in general? only for tegra? not at all?

Brad Lassey [:blassey] (use needinfo?)

Reporter

Comment 5

•

14 years ago

yes, SIMD would help ARMv6. renomming

tracking-fennec: - → ?

Jeff Muizelaar [:jrmuizel]

Comment 6

•

14 years ago

It's not clear that a hand coded version of this will beat what the compiler produces: Here's the c code in question: #include <stdint.h> #define GREEN_MASK (((1 << 6) - 1) << 5) static inline uint32_t expand_rgb_565 (uint16_t c) { return ((c & GREEN_MASK) << 16) | (c & ~GREEN_MASK); } static inline uint16_t compact_rgb_565 (uint32_t c) { return ((c >> 16) & GREEN_MASK) | (c & ~GREEN_MASK); } uint16_t bilinear_interpolation_565(uint16_t tl, uint16_t tr, uint16_t bl, uint16_t br, int x, int y) { int xy; uint32_t a00 = expand_rgb_565 (tl); uint32_t a01 = expand_rgb_565 (tr); uint32_t a10 = expand_rgb_565 (bl); uint32_t a11 = expand_rgb_565 (br); xy = (x * y) >> 3; return compact_rgb_565 ((a00 * (32 - 2*y - 2*x + xy) + a01 * (2*x - xy) + a10 * (2*y - xy) + a11 * xy) >> 5); } and the produced assembly: push {r4, r5, r6, r7, r8} ldr r5, [sp, #20] ldr r6, [sp, #24] and ip, r1, #2016 and r8, r2, #2016 bic r1, r1, #2016 orr r1, r1, ip, lsl #16 bic r2, r2, #2016 mul r4, r5, r6 lsl ip, r6, #1 lsls r7, r5, #1 orr r2, r2, r8, lsl #16 rsb r6, r6, #16 subs r5, r6, r5 asrs r4, r4, #3 rsb ip, r4, ip subs r7, r7, r4 mul r2, r2, ip mla r1, r1, r7, r2 and r2, r3, #2016 bic r3, r3, #2016 orr r3, r3, r2, lsl #16 and r2, r0, #2016 bic r0, r0, #2016 orr r0, r0, r2, lsl #16 mla r3, r4, r3, r1 add r4, r4, r5, lsl #1 mla r4, r0, r4, r3 movw r3, #:lower16:63519 movt r3, #:upper16:63519 lsrs r2, r4, #5 lsrs r0, r4, #21 ands r3, r3, r2 and r0, r0, #2016 orrs r0, r0, r3 pop {r4, r5, r6, r7, r8} bx lr I'd be happy to see someone do much better by hand.

Brad Lassey [:blassey] (use needinfo?)

Reporter

Comment 7

•

13 years ago

jeff, joe any recommendation on where to go from here?

Kevin Brosnan [Ex-Mozilla]

Comment 8

•

13 years ago

Kats should this track bug 792131

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 9

•

13 years ago

While it might be useful for ARMv6 performance in general it doesn't look like it will lead to memory reductions, AIUI, so I would say no to blocking 792131.

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

13 years ago

tracking-fennec: ? → +

Keywords: perf

Whiteboard: [ARMv6]

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

12 years ago

Whiteboard: [ARMv6] → [ARMv6][mentor=snorp]

Brad Lassey [:blassey] (use needinfo?)

Reporter

Updated

•

12 years ago

Whiteboard: [ARMv6][mentor=snorp] → [ARMv6][mentor=snorp][ARM-opt]

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 10

•

12 years ago

If Jeff says this isn't going to help, that's good enough for me.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.

Bugzilla

bilinear scaling fast paths for SIMD

Categories

(Core :: Graphics, defect)

Tracking

()

People

(Reporter: blassey, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [ARMv6][mentor=snorp][ARM-opt])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Updated

Updated

Comment 10