Closed Bug 1266491 Opened 8 years ago Closed 8 years ago

Use an ssse3 scaler for video

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed
relnote-firefox --- 49+

People

(Reporter: jrmuizel, Assigned: jrmuizel)

References

(Blocks 1 open bug)

Details

(Keywords: feature, Whiteboard: [gfx-noted])

Attachments

(2 files, 7 obsolete files)

pixman has a decent ssse3 scaler. It also has the advantage of being separable. I'm going to try it out to see how much it helps.
Whiteboard: [gfx-noted]
Attached patch A hardly working patch (obsolete) — Splinter Review
I ran into a bunch of trouble when dealing with the edges. I ended up with some really crappy padding.
Some quick testing of suggest that this can cut the time it takes from 1.0 to 0.8. The distribution is 27% bilinear_cover and 14% fetch_horizontal.
Attachment #8744102 - Attachment is obsolete: true
This one has:
20.25% bilinear_cover
13.57% fetch_horizontal

Sotaro can you try this patch out, profile it and report how much time is being spent in the two ssse3 functions?
Flags: needinfo?(sotaro.ikeda.g)
attachment 8744102 [details] [diff] [review] was also necessary to apply attachment 8744518 [details] [diff] [review].
Flags: needinfo?(sotaro.ikeda.g)
I checked the performance with the following video with 720p on my laptop.
https://www.youtube.com/watch?v=e-ORhEE9VVg

Without the patch: full screen video was 15-16fps
With the patches: full screen video was 21-22fps

But with the patches, white line appear on to of display during full screen playback.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5)
> This one has:
> 20.25% bilinear_cover
> 13.57% fetch_horizontal
> 
> Sotaro can you try this patch out, profile it and report how much time is
> being spent in the two ssse3 functions?

It was like the following.
   bilinear_cover: 18.91%
   fetch_horizontal: 23.53%
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> But with the patches, white line appear on to of display during full screen
> playback.

Yeah, there are some known correctness issues with the patch.

What is the screen resolution on your laptop?
Flags: needinfo?(sotaro.ikeda.g)
My laptop's screen resolution is 2880*1620.
Flags: needinfo?(sotaro.ikeda.g)
Attached patch More correct and more fast (obsolete) — Splinter Review
This version fixes the white line problem and should be a little bit faster too.
Attachment #8744518 - Attachment is obsolete: true
This passes try and although quite ugly could probably start being reviewed.
Attachment #8746774 - Attachment is obsolete: true
We only use the SSSE3 scaler when this texture flag is set. This avoids using it for compositing things other than video.
Attachment #8748892 - Flags: feedback?(sotaro.ikeda.g)
Attachment #8748891 - Flags: feedback?(mstange)
Comment on attachment 8748892 [details] [diff] [review]
Add a RGB_FROM_YCBCR texture flag

Looks good. The rendering problem of Comment 7 was addressed with applying the patches.
Attachment #8748892 - Flags: feedback?(sotaro.ikeda.g) → feedback+
Attachment #8748892 - Flags: review?(nical.bugzilla)
Depends on: 1272358
Assignee: nobody → jmuizelaar
Comment on attachment 8748892 [details] [diff] [review]
Add a RGB_FROM_YCBCR texture flag

Review of attachment 8748892 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicCompositor.cpp
@@ +198,5 @@
>  already_AddRefed<DataTextureSource>
>  BasicCompositor::CreateDataTextureSource(TextureFlags aFlags)
>  {
> +  RefPtr<DataTextureSourceBasic> result = new DataTextureSourceBasic(nullptr);
> +  if (aFlags & TextureFlags::RGB_FROM_YCBCR)

nit: thou shalt brace your ifs!

::: gfx/layers/basic/TextureHostBasic.h
@@ +25,3 @@
>    virtual ~TextureSourceBasic() {}
>    virtual gfx::SourceSurface* GetSurface(gfx::DrawTarget* aTarget) = 0;
> +  bool FromYCBCR;

nit: I usually see members named mFoo or sometimes foo when they are public, but Foo is at odds with the rest of the code.
Attachment #8748892 - Flags: review?(nical.bugzilla) → review+
Attached patch SSSE3 scaler (obsolete) — Splinter Review
Attachment #8748891 - Attachment is obsolete: true
Attachment #8748891 - Flags: feedback?(mstange)
Attachment #8753978 - Flags: review?(mstange)
Attached patch SSSE3 scaler (obsolete) — Splinter Review
Attachment #8753978 - Attachment is obsolete: true
Attachment #8753978 - Flags: review?(mstange)
Attachment #8754003 - Flags: review?(mstange)
Attached patch SSSE3 scaler (obsolete) — Splinter Review
Attachment #8754003 - Attachment is obsolete: true
Attachment #8754003 - Flags: review?(mstange)
Attached patch SSSE3 scalerSplinter Review
Attachment #8754024 - Attachment is obsolete: true
Attachment #8756587 - Flags: review?(mstange)
Comment on attachment 8756587 [details] [diff] [review]
SSSE3 scaler

Review of attachment 8756587 [details] [diff] [review]:
-----------------------------------------------------------------

r+ assuming all the comments are addressed in one way or another.

::: gfx/2d/ssse3-scaler.c
@@ +345,5 @@
> +    // we use 15 instead of 16 because we need an extra bit to handle when the weights are 0 and 1
> +    dist_y <<= (15 - BILINEAR_INTERPOLATION_BITS);
> +
> +    vw = _mm_set_epi16 (
> +	dist_y, dist_y, dist_y, dist_y, dist_y, dist_y, dist_y, dist_y);

You could use _mm_set1_epi16(dist_y) here.

@@ +546,5 @@
> +    iter.x = 0;
> +    iter.y = 0;
> +    iter.width = width;
> +    iter.height = src_height;
> +    iter.buffer = dest + x + y * dest_stride;

Is x correct here or should it be x * 4?

::: gfx/2d/ssse3-scaler.h
@@ +11,5 @@
> +#endif
> +void ssse3_scale_data(uint32_t *src, int src_width, int src_height,
> +                int src_stride,
> +                uint32_t *dest, int dest_width, int dest_height,
> +                int dest_rowstride,

Let's rename the parameter to dest_stride here too.

::: gfx/layers/basic/BasicCompositor.cpp
@@ +279,5 @@
>      aMaskTransform.PostTranslate(-aOffset.x, -aOffset.y);
>    }
>  }
> +static
> +bool AttemptVideoScale(TextureSourceBasic* aSource, const SourceSurface* aSourceMask,

Gecko style wants the line break between "static bool" and the function name.

@@ +282,5 @@
> +static
> +bool AttemptVideoScale(TextureSourceBasic* aSource, const SourceSurface* aSourceMask,
> +                       gfx::Float aOpacity, CompositionOp aBlendMode,
> +                       const TexturedEffect* aTexturedEffect,
> +                       const Matrix &aNewTransform, const gfx::Rect& aRect,

const Matrix& aNewTransform

@@ +294,5 @@
> +  if (aNewTransform.HasNonAxisAlignedTransform() || aNewTransform.HasNegativeScaling())
> +      return false;
> +  if (aSourceMask || aOpacity != 1.0f)
> +      return false;
> +  if (aBlendMode != CompositionOp::OP_OVER || aBlendMode != CompositionOp::OP_SOURCE)

This || needs to be a &&.

@@ +298,5 @@
> +  if (aBlendMode != CompositionOp::OP_OVER || aBlendMode != CompositionOp::OP_SOURCE)
> +      return false;
> +
> +  IntRect dstRect;
> +  if (!Rect(aNewTransform * aRect.TopLeft(), aNewTransform * aRect.Size()).ToIntRect(&dstRect))

What does it mean to multiply a matrix with a size? Do you want aNewTransform.TransformBounds(aRect)?

@@ +312,5 @@
> +  int32_t dstStride;
> +  SurfaceFormat dstFormat;
> +  if (aDest->LockBits(&dstData, &dstSize, &dstStride, &dstFormat)) {
> +    if (aDest == aBuffer) {
> +      dstRect = dstRect.Intersect(aClipRect - aOffset);

Why do we not apply the clip if aDest != aBuffer? Does the regular drawing pass do something similar? We don't pass "buffer" to it.
If this is correct, it at least needs a comment.
Attachment #8756587 - Flags: review?(mstange) → review+
https://hg.mozilla.org/mozilla-central/rev/7c65a26181d9
https://hg.mozilla.org/mozilla-central/rev/9542a06550b6
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Depends on: 1276923
Shouldn't the call to ssse3_scale_data be wrapped in ifdefs to prevent non-Intel platforms from trying to resolve the symbol? I hit this problem today while building for Android. I think it was fixed when the if(true) was added and allowed the linker to no longer try and resolve the symbol.
(In reply to Randall Barker [:rbarker] from comment #23)
> Shouldn't the call to ssse3_scale_data be wrapped in ifdefs to prevent
> non-Intel platforms from trying to resolve the symbol? I hit this problem
> today while building for Android. I think it was fixed when the if(true) was
> added and allowed the linker to no longer try and resolve the symbol.

Probably yes.
I'm also getting a Fennec build error due to ssse3_scale_data when --disable-optimize is used.
Release Note Request (optional, but appreciated)
[Why is this notable]:
[Suggested wording]: Improve video performance on systems without hardware acceleration, that support SSSE3 instruction set.
[Links (documentation, blog post, etc)]:
relnote-firefox: --- → ?
49+ for release notes with wording "Improve video performance on systems without hardware acceleration that support SSSE3 instruction set"
(In reply to William Chen [:wchen] from comment #25)
> I'm also getting a Fennec build error due to ssse3_scale_data when
> --disable-optimize is used.

This is fixed in bug 1278723
Depends on: 1279972
Depends on: 1304999
Depends on: CVE-2018-12362
Regressions: 1629220
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: