Implement bilinear scaling with NEON

Status: RESOLVED FIXED

Product: Core
Component: Graphics
Filed and resolved 7 years ago

Reporter: cjones
Assignee: Siarhei Siamashka

Tracking: unspecified, ARM, Linux

Firefox Tracking Flags: fennec2.0next+

Details

I haven't profiled the difference myself, but Siarhei says in bug 598736 comment 5:
(In reply to comment #4)
> Does GOOD filter already works anywhere fast enough?

Currently it's approximately 10-30 times slower than NEAREST scaling for
pixman-0.19.4 on ARM Cortex-A8:
[snip]
Even with full NEON optimizations added, I expect that BILINEAR is still going
to be about 2-4x slower than NEAREST. But indeed, NEON optimizations for
bilinear scaling would be very nice to have.


2-4x slower may or may not be fast enough for pinch-zoom, but I think we'd use that for content rendering regardless.

Siarhei, do you happen to have patches for this floating around?  If not, I can look at this this week.
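The cost gap between the two filters comes from the per-pixel work: NEAREST fetches one source texel, while BILINEAR fetches a 2x2 neighbourhood and blends it with fractional weights. The sketch below shows both for a single 8-bit channel; the helper names, the 16.16 fixed-point coordinates and the 8-bit weights are assumptions for illustration, not pixman's internal API.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not pixman's internal API.
 * The image is a single 8-bit channel, row-major, `stride` bytes per row.
 * Coordinates are 16.16 fixed point; the caller must keep x + 1 and y + 1
 * inside the image (edge handling is what repeat modes are for). */

/* NEAREST: one texel fetch per destination pixel. */
static uint8_t sample_nearest(const uint8_t *img, int stride,
                              int32_t fx, int32_t fy)
{
    int x = fx >> 16;
    int y = fy >> 16;
    return img[y * stride + x];
}

/* BILINEAR: four texel fetches plus weighted blending per pixel.
 * Weights are the top 8 bits of the fractional part (0..255). */
static uint8_t sample_bilinear(const uint8_t *img, int stride,
                               int32_t fx, int32_t fy)
{
    int x = fx >> 16, y = fy >> 16;
    uint32_t wx = (fx >> 8) & 0xff; /* weight of the right column */
    uint32_t wy = (fy >> 8) & 0xff; /* weight of the bottom row */
    const uint8_t *row0 = img + y * stride;
    const uint8_t *row1 = row0 + stride;
    uint32_t tl = row0[x], tr = row0[x + 1];
    uint32_t bl = row1[x], br = row1[x + 1];
    /* Horizontal lerp on each row, then a vertical lerp between rows.
     * The total weight is 256 * 256, so shift right by 16 at the end. */
    uint32_t top = tl * (256 - wx) + tr * wx;
    uint32_t bot = bl * (256 - wx) + br * wx;
    return (uint8_t)((top * (256 - wy) + bot * wy) >> 16);
}
```

Sampling a 2x2 image {0, 200, 100, 255} at its centre (0.5, 0.5) gives 0 with NEAREST (the top-left texel) and 138 with BILINEAR (the truncated average of all four texels).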
Blocks a blocker.
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0+
Although this will be somewhat of a RISC-y project, if we put our Cortexes together, we should be able to StrongARM it through.
(In reply to comment #2)
> Although this will be somewhat of a RISC-y project, if we put our Cortexes
> together, we should be able to StrongARM it through.

r-.
Comment 4 (Assignee), 7 years ago
(In reply to comment #0)
> Siarhei, do you happen to have patches for this floating around?

Not yet, but I will provide more details a little bit later.

> If not, I can look at this this week.

That's the spirit! Thanks.
Can someone take ownership of this bug?
Assignee: nobody → jmuizelaar
Assignee: jmuizelaar → siarhei.siamashka

Comment 6, 7 years ago
Would love to get this in if it is fast enough to use, but we're not blocking on it at this point.
tracking-fennec: 2.0+ → 2.0next+
Comment 7 (Assignee), 7 years ago
NEON optimizations for bilinear scaling are coming through upstream pixman, so eventually they should also reach Mozilla.
(In reply to comment #7)
> NEON optimizations for bilinear scaling are coming through upstream pixman, so
> eventually they should also reach Mozilla.

Cool! Do you know the commits offhand?
Comment 9 (Assignee), 7 years ago
(In reply to comment #8)
> Cool! Do you know the commits offhand?

There are no commits yet. I'm working on a proper patchset right now and expect to finish it in a few days (so that it's fast enough and passes all the tests). I think maximum performance for the SRC operator with PAD repeat is the bare minimum. There is also some interest from the WebKit side in fast bilinear scaling on the cairo mailing list: http://lists.cairographics.org/archives/cairo/2011-February/021645.html
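For context on why SRC is the natural starting point: with SRC the scaled source pixel simply replaces the destination, while OVER (the standard Porter-Duff operator on premultiplied pixels, dst = src + dst * (1 - src_alpha)) also has to read and blend the destination. The per-pixel sketch below uses hypothetical names (`op_src`, `op_over`, `div255`) and is not pixman's code; it only illustrates the extra work OVER adds.

```c
#include <stdint.h>

/* Rounded x / 255 for x in [0, 255 * 255], a common integer trick. */
static uint8_t div255(uint32_t x)
{
    uint32_t t = x + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* SRC: the scaled source pixel replaces the destination; dst is never read. */
static uint32_t op_src(uint32_t src, uint32_t dst)
{
    (void)dst;
    return src;
}

/* OVER on premultiplied a8r8g8b8: each channel becomes
 * src + dst * (255 - src_alpha) / 255, with alpha in the top byte. */
static uint32_t op_over(uint32_t src, uint32_t dst)
{
    uint32_t inv_a = 255 - (src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        out |= (s + div255(d * inv_a)) << shift;
    }
    return out;
}
```

An opaque source makes OVER behave like SRC, and a fully transparent source leaves the destination untouched; everything in between costs a destination read and a multiply per channel.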
Comment 10 (Assignee), 7 years ago
Sent pixman patches with bilinear scaling optimizations here: http://lists.freedesktop.org/archives/pixman/2011-February/001053.html

Even though NONE repeat is a major PITA to implement, it is partially supported after all. An additional patch for scaling r5g6b5 images with ARM NEON will be available in a few days. Some other variants of scaling operations may be optimized too.
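Bilinear filtering needs per-channel arithmetic, so a common approach for r5g6b5 (assumed here; the thread does not describe how pixman's NEON code handles it) is to expand each 16-bit texel to 8-bit channels before interpolating and pack the result back afterwards. Nearest scaling, by contrast, can copy the 16-bit value unchanged. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: expand an r5g6b5 pixel to 8-bit channels by
 * replicating the high bits into the low bits, so 0x1f maps to 0xff. */
static void unpack_r5g6b5(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (p >> 11) & 0x1f;
    uint8_t g6 = (p >> 5) & 0x3f;
    uint8_t b5 = p & 0x1f;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}

/* Pack 8-bit channels back into r5g6b5 by dropping the low bits. */
static uint16_t pack_r5g6b5(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Because the expansion only appends bits to the low end, packing an unpacked pixel returns the original 16-bit value exactly.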

Thanks a lot for deciding that SIMD optimizations for bilinear scaling could be useful in Mozilla; that allowed me to get time allocated to this task. These patches should actually have been ready by the beginning of last week, but I dropped out and could not do much productive work lately due to certain circumstances.
Comment 11 (Assignee), 7 years ago
With the following patchset ready, everything that was originally planned is now implemented: http://lists.freedesktop.org/archives/pixman/2011-March/001119.html

The current performance numbers on a 1GHz ARM Cortex-A8 look more like this:
Operation                   Throughput (MPix/s)
nearest scaling a8r8g8b8    163.12
nearest scaling r5g6b5      267.50
bilinear scaling a8r8g8b8    74.36
bilinear scaling r5g6b5      41.35

Nearest scaling was also optimized recently, so in the end bilinear scaling is roughly 2x slower than nearest for the 32bpp format and more than 6x slower than nearest for 16bpp. Some additional optimizations for bilinear scaling are still possible (in the ballpark of a few tens of percent). Compared to the old C code in pixman, NEON bilinear scaling is approximately 10x faster on ARM.
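The 2x and 6x figures follow directly from the throughput numbers above. The small sketch below redoes that arithmetic and also converts throughput into average cycles per destination pixel, assuming a single core running at the quoted 1 GHz (an interpretation added here, not a measurement from the thread).

```c
/* Throughput figures quoted above for a 1 GHz ARM Cortex-A8, in MPix/s. */
static const double NEAREST_ARGB = 163.12, NEAREST_565 = 267.50;
static const double BILINEAR_ARGB = 74.36, BILINEAR_565 = 41.35;

/* How many times slower bilinear is than nearest for the same format. */
static double slowdown(double nearest_mpix, double bilinear_mpix)
{
    return nearest_mpix / bilinear_mpix;
}

/* Average cycles per destination pixel: clock (MHz) / throughput (MPix/s). */
static double cycles_per_pixel(double clock_mhz, double mpix_per_s)
{
    return clock_mhz / mpix_per_s;
}
```

With these numbers, `slowdown(NEAREST_ARGB, BILINEAR_ARGB)` is about 2.19 and `slowdown(NEAREST_565, BILINEAR_565)` is about 6.47, while bilinear a8r8g8b8 averages roughly 13.4 cycles per pixel and bilinear r5g6b5 roughly 24.2.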

There is also an SSE2 bilinear scaling optimization (mostly a proof of concept), but it is not highly optimized and only provides a ~2x speedup over the C implementation. If anybody wants to invest some effort in SSE2/SSSE3 bilinear scaling optimizations for pixman, there is a lot of potential there.

Hopefully all these optimizations will be included in the pixman 0.21.8 release.

There is still more bilinear work to do in pixman, mostly getting NONE repeat (EXTEND_NONE in cairo terms) fully optimized. But I really hope that Firefox/Fennec can switch to EXTEND_PAD instead wherever possible (bug 600390 and bug 630114). More bilinear fast paths can also be added on a case-by-case basis (most likely those using the OVER operator). But as I said, this particular bug is basically done.
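The difference between the two repeat modes shows up at the image edge, where bilinear filtering reads texels just outside the source. With PAD those texels are clamped copies of the edge, so the result stays opaque; with NONE they are transparent, so border pixels blend toward transparency and a fast path presumably has to treat them specially. The helper below (hypothetical, not pixman API) shows the fetch rule for one row of a8r8g8b8 pixels.

```c
#include <stdint.h>

/* Hypothetical repeat modes mirroring cairo's EXTEND_PAD and EXTEND_NONE. */
enum repeat_mode { REPEAT_PAD, REPEAT_NONE };

/* Fetch texel `x` from a row of `width` a8r8g8b8 pixels.
 * PAD:  out-of-range coordinates are clamped to the nearest edge pixel.
 * NONE: out-of-range texels are fully transparent (0x00000000). */
static uint32_t fetch_texel(const uint32_t *row, int width, int x,
                            enum repeat_mode mode)
{
    if (x >= 0 && x < width)
        return row[x];
    if (mode == REPEAT_NONE)
        return 0x00000000u;
    return row[x < 0 ? 0 : width - 1];
}
```

Inside the image both modes return the same texel; they differ only for coordinates past the edges, which is exactly where a bilinear kernel's 2x2 footprint overhangs the source.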
Comment 12 (Assignee), 7 years ago
Fixed via bug 640250
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Depends on: 640250
Resolution: --- → FIXED