Closed Bug 650651 Opened 13 years ago Closed 6 years ago

Implement more efficient scaling YUV-to-RGB conversions for ARM (using NEON and SIMD)

Component: Core :: Audio/Video: Playback
Type: defect
Platform: ARM / All
Priority: Not set
Severity: normal
Status: RESOLVED INACTIVE
Reporter: derf
Assignee: Unassigned
Keywords: mobile, perf

This is a continuation of bug 634557, which added a single NEON-accelerated scaling converter covering 4:2:0 and 4:2:2 bilinear conversions (with nearest-neighbor chroma scaling) for a relatively small range of scale factors. We still need additional routines for larger scale factors (i.e., ones that use bilinear filtering for chroma as well, since nearest neighbor starts to look terrible) and for smaller scale factors (the VTBL approach used in bug 634557 breaks down when the scale factor is less than 0.5), as well as nearest-neighbor conversion for all formats and scale factors.
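To make the "bilinear luma, nearest-neighbor chroma" combination concrete, here is a minimal scalar C sketch of one output row for 4:2:0 input. It is not the NEON routine from bug 634557; the function name, the 16.16 fixed-point source position, the 8-bit interpolation weights, and the top-left (rather than pixel-center) sample alignment are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar sketch: produce one scaled output row from 4:2:0 input.
 * Luma: bilinear between source rows y0 and y1 (vertical weight fy, 0..256)
 * and between adjacent columns. Chroma: nearest neighbour, no filtering;
 * u and v are the chroma rows the caller already chose for this output row.
 */
static void scale_row_420(const uint8_t *y0, const uint8_t *y1, unsigned fy,
                          const uint8_t *u, const uint8_t *v, int src_w,
                          uint8_t *y_out, uint8_t *u_out, uint8_t *v_out,
                          int dst_w) {
    assert(src_w > 0 && dst_w > 0 && fy <= 256);
    /* Source step per output pixel in 16.16 fixed point. */
    uint32_t step = (uint32_t)(((uint64_t)src_w << 16) / (uint32_t)dst_w);
    uint32_t pos = 0;
    for (int x = 0; x < dst_w; x++, pos += step) {
        int xi = (int)(pos >> 16);                    /* integer column */
        int xn = xi + 1 < src_w ? xi + 1 : src_w - 1; /* clamp right edge */
        unsigned fx = (pos >> 8) & 0xFF;              /* 8-bit fraction */
        unsigned top = y0[xi] * (256 - fx) + y0[xn] * fx;
        unsigned bot = y1[xi] * (256 - fx) + y1[xn] * fx;
        /* Both weight pairs sum to 256, so the blend is scaled by 2^16. */
        y_out[x] = (uint8_t)((top * (256 - fy) + bot * fy + 32768) >> 16);
        /* Nearest-neighbour chroma: 4:2:0 chroma is half width. */
        u_out[x] = u[xi >> 1];
        v_out[x] = v[xi >> 1];
    }
}
```

Replacing the `u[xi >> 1]` and `v[xi >> 1]` lookups with the same bilinear blend used for luma is, roughly, what the larger-scale-factor routines would add.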

For upscaling to large scale factors, it may be better to do a staged conversion through a temporary row buffer (e.g., convert YUV to RGB24, then scale RGB24 to RGB565), as suggested by Siarhei. This saves considerable arithmetic, since the YUV-to-RGB conversion only has to happen at the original resolution. It would require a 2-row, 3-bytes-per-pixel buffer, which should fit in cache for reasonable resolutions (the common case, since we're upscaling). It may also make it easier to relax the padding and alignment requirements of the underlying scaler, as the input can be fixed up during the YUV-to-RGB pass. I'd recommend not packing the RGB24 pixels: with a separate buffer for each component, a) you don't have to pack them after the YUV-to-RGB conversion, b) you don't have to store 4 bytes per pixel or use unaligned accesses, and c) you don't have to unpack them again during scaling. However, that means we can't share code with the normal unscaled conversion.
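A minimal scalar sketch of the two stages described above, assuming integer BT.601 limited-range coefficients and 4:2:0 input; the function names are hypothetical, and this is plain C rather than NEON or ARMv6 SIMD. Stage 1 writes each component to its own row buffer at source resolution, and stage 2 bilinearly scales from two such rows straight to RGB565, so RGB24 is never packed or unpacked.

```c
#include <assert.h>
#include <stdint.h>

/* (v >> 8) clamped to 0..255, without right-shifting a negative int. */
static uint8_t clamp_shift8(int v) {
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/*
 * Stage 1 (hypothetical): convert one 4:2:0 source row to three separate,
 * unpacked R, G and B row buffers at the source resolution.
 * Integer BT.601 limited-range coefficients are assumed.
 */
static void yuv420_row_to_planar_rgb(const uint8_t *y, const uint8_t *u,
                                     const uint8_t *v, int w,
                                     uint8_t *r, uint8_t *g, uint8_t *b) {
    for (int x = 0; x < w; x++) {
        int c = 298 * (y[x] - 16);
        int d = u[x >> 1] - 128;
        int e = v[x >> 1] - 128;
        r[x] = clamp_shift8(c + 409 * e + 128);
        g[x] = clamp_shift8(c - 100 * d - 208 * e + 128);
        b[x] = clamp_shift8(c + 516 * d + 128);
    }
}

/* Bilinear blend of one component; fx and fy are weights in 0..256. */
static unsigned lerp2(const uint8_t *p0, const uint8_t *p1, int xi, int xn,
                      unsigned fx, unsigned fy) {
    unsigned top = p0[xi] * (256 - fx) + p0[xn] * fx;
    unsigned bot = p1[xi] * (256 - fx) + p1[xn] * fx;
    return (top * (256 - fy) + bot * fy + 32768) >> 16;
}

/*
 * Stage 2 (hypothetical): scale from the 2-row planar buffer (row 0 and
 * row 1 of each component) directly to one RGB565 output row.
 */
static void scale_planar_rgb_to_565(const uint8_t *r[2], const uint8_t *g[2],
                                    const uint8_t *b[2], int src_w,
                                    unsigned fy, uint16_t *dst, int dst_w) {
    assert(src_w > 0 && dst_w > 0 && fy <= 256);
    uint32_t step = (uint32_t)(((uint64_t)src_w << 16) / (uint32_t)dst_w);
    uint32_t pos = 0;
    for (int x = 0; x < dst_w; x++, pos += step) {
        int xi = (int)(pos >> 16);
        int xn = xi + 1 < src_w ? xi + 1 : src_w - 1;
        unsigned fx = (pos >> 8) & 0xFF;
        unsigned R = lerp2(r[0], r[1], xi, xn, fx, fy);
        unsigned G = lerp2(g[0], g[1], xi, xn, fx, fy);
        unsigned B = lerp2(b[0], b[1], xi, xn, fx, fy);
        /* Pack 8-bit components into 5:6:5. */
        dst[x] = (uint16_t)(((R >> 3) << 11) | ((G >> 2) << 5) | (B >> 3));
    }
}
```

In a full converter, stage 1 would run once per source row and the two-row buffers would be rotated as the output row's vertical position advances, so each source row is converted only once regardless of the scale factor.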
Depends on: 634557
Keywords: mobile, perf
changed the summary to include SIMD since a major part of our current install base is on devices without NEON support
Summary: Implement more efficient scaling YUV-to-RGB conversions for ARM (using NEON) → Implement more efficient scaling YUV-to-RGB conversions for ARM (using NEON and SIMD)
(In reply to Brad Lassey [:blassey] from comment #1)
> changed the summary to include SIMD since a major part of our current
> install base is on devices without NEON support

FWIW, we don't even have ARMv6 _non_-scaling YUV-to-RGB.
GL layers will obviate the need for this.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #3)
> GL layers will obviate the need for this.

How? By using YCbCr textures and letting the hardware do the conversion? When do we expect to have the GL stuff up and running?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #3)
> GL layers will obviate the need for this.

I don't think that's true, assuming there are going to be driver issues.
(In reply to Brad Lassey [:blassey] from comment #5)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #3)
> > GL layers will obviate the need for this.
> 
> I don't think that's true, assuming there are going to be driver issues.

True; this certainly wouldn't hurt, especially if native Fennec postpones the use of GL.
OK, I don't think it's worth spending a lot of time on NEON routines if
we're going to replace them with OpenGL in 6-9 months; it's just not a
sensible use of time when there are other things to optimize. I don't
know what the GL-layers schedule looks like, however.

However, I'm not entirely convinced that the need for NEON will
completely go away. In particular:

   * Even if we're using GL, it might not support YCbCr texture formats,
     and then we need to convert anyway.
   * It's unlikely that the GPU can use the decoded frame directly. In
     that case, we still need to copy the frame to GPU-accessible
     memory, and if we're doing that, we might as well convert at the
     same time, since the memory traffic is the expensive part when we
     have NEON doing the number crunching.

In both of those cases, however, scaling is not necessary, which does
limit the amount of work that we need to do.

So, what are your thoughts? I currently plan to write three scalers
each for YV12, YV16 and YV24 (except that YV12 already has one scaler,
written by :derf): a down-scaler for very small scaling factors, a
scaler for nearly-correct sizes (like the one we have now), and one for
large scaling factors. GPU scaling will probably be much faster than
NEON scaling, so we should use that if we can. That would leave only
YCbCr-to-RGB converters for YV12 (which we already have), YV16 and YV24.
I don't think WebM or Theora use any other packing formats, but I might
be wrong, as I'm certainly not a video expert.

Non-scaling converters are easy to write, so if we're using the GPU for
scaling, I can knock out some converters pretty quickly (if we don't
already have them somewhere in the code base).
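As a rough illustration of how small such a converter is, here is a minimal scalar sketch that handles all three formats by parameterizing the chroma subsampling (YV12 is 4:2:0, YV16 is 4:2:2, YV24 is 4:4:4). The function name, parameter order, integer BT.601 limited-range coefficients, and RGB565 output are assumptions; a real implementation would vectorize the inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (v >> 8) clamped to 0..255, without right-shifting a negative int. */
static uint8_t clamp_shift8(int v) {
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/*
 * Hypothetical non-scaling planar YCbCr -> RGB565 converter.
 * x_shift/y_shift select the chroma subsampling:
 *   YV12 (4:2:0): 1, 1    YV16 (4:2:2): 1, 0    YV24 (4:4:4): 0, 0
 * Strides are in elements (bytes for the planes, pixels for dst).
 * Integer BT.601 limited-range coefficients are assumed.
 */
static void planar_yuv_to_rgb565(const uint8_t *y_plane, const uint8_t *u_plane,
                                 const uint8_t *v_plane, int y_stride,
                                 int uv_stride, int x_shift, int y_shift,
                                 uint16_t *dst, int dst_stride, int w, int h) {
    assert(w > 0 && h > 0 && x_shift >= 0 && y_shift >= 0);
    for (int row = 0; row < h; row++) {
        const uint8_t *y = y_plane + (ptrdiff_t)row * y_stride;
        const uint8_t *u = u_plane + (ptrdiff_t)(row >> y_shift) * uv_stride;
        const uint8_t *v = v_plane + (ptrdiff_t)(row >> y_shift) * uv_stride;
        uint16_t *out = dst + (ptrdiff_t)row * dst_stride;
        for (int x = 0; x < w; x++) {
            int c = 298 * (y[x] - 16);
            int d = u[x >> x_shift] - 128;
            int e = v[x >> x_shift] - 128;
            unsigned r = clamp_shift8(c + 409 * e + 128);
            unsigned g = clamp_shift8(c - 100 * d - 208 * e + 128);
            unsigned b = clamp_shift8(c + 516 * d + 128);
            /* Pack 8-bit components into 5:6:5. */
            out[x] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
        }
    }
}
```

Because only the two shifts differ between formats, the same loop body could be specialized per format (or per shift pair) when vectorizing, rather than writing three unrelated routines.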
Component: Audio/Video → Audio/Video: Playback
Mass closing due to inactivity.
Feel free to re-open if still needed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE