[Azure] Extremely large images are rendered garbled or not at all after scrolling

VERIFIED FIXED in mozilla15

Status

()

Core
Graphics
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: Steve Scott (pxbugz), Assigned: bas)

Tracking

Trunk
mozilla15
x86_64
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

5 years ago
Created attachment 602917 [details]
H6EMd.jpg

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120304 Firefox/13.0a1
Build ID: 20120304031031

Steps to reproduce:

-Set gfx.content.azure.enabled to true
-Encountered an extremely large (nearly 10,000px tall) 'comic strip' image on Reddit and viewed.


Actual results:

The viewable part of the image, after expanding to full size, renders correctly. Scrolling the image renders it all garbled.


Expected results:

Image should be rendered correctly. Issue not present without gfx.content.azure.enabled true.

Latest nightly tested Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120304 Firefox/13.0a1

Mozregression tested back to http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4795500b7c1d&tochange=8ae16e346bd0 when gfx.content.azure.enabled was first introduced so I guess I just haven't encountered a huge enough image until now.
(Reporter)

Updated

5 years ago
Blocks: 715768
(Assignee)

Updated

5 years ago
Assignee: nobody → bas.schouten
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Duplicate of this bug: 739569
(Assignee)

Comment 2

5 years ago
Created attachment 613285 [details] [diff] [review]
Part 1: Add image scaling code to Azure

This patch adds image scaling code to Azure, including a number of tests for the new code.
Attachment #613285 - Flags: review?(jmuizelaar)
(In reply to Bas Schouten (:bas) from comment #2)
> Created attachment 613285 [details] [diff] [review]
> Part 1: Add image scaling code to Azure
> 
> This patch adds image scaling code to Azure, including a number of tests for
> the new code.

Should this include the SSE2 code?
(Assignee)

Comment 4

5 years ago
Created attachment 613420 [details] [diff] [review]
Part 1: Add image scaling code to Azure v2

With SSE2 code this time.
Attachment #613285 - Attachment is obsolete: true
Attachment #613420 - Flags: review?(jmuizelaar)
Attachment #613285 - Flags: review?(jmuizelaar)
(Assignee)

Comment 5

5 years ago
Created attachment 613446 [details] [diff] [review]
Part 2: Use ImageScales for large images in Azure

This patch makes us use the image scaler for very large images in Azure. It also moves some stl::vectors to use plain new[] to avoid array initialization costs (which become quite high on these large images), and fixes 2 bugs in partial upload code. There's a couple of things this doesn't cover:

- DrawSurface uses its own code, it should use this too, it's only used by canvas though, so that issue is already in releases Fx versions without issues. It should be addressed soon though.
- Perhaps we should cache the downsampled surface for future drawing, on animated cases that should make a big performance difference.

It also has an interesting side-effect, it makes extremely large images look much better when scaled down than just very large images.
Attachment #613446 - Flags: review?(jmuizelaar)
Comment on attachment 613446 [details] [diff] [review]
Part 2: Use ImageScales for large images in Azure

Review of attachment 613446 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/Tools.h
@@ +75,5 @@
> +  return false;
> +}
> +
> +static inline Float
> +Dist2D(Point aA, Point aB)

I think you can just call this Distance().

@@ +78,5 @@
> +static inline Float
> +Dist2D(Point aA, Point aB)
> +{
> +  return sqrt((aB.x - aA.x) * (aB.x - aA.x) +
> +              (aB.y - aA.y) * (aB.y - aA.y));

Using hypot is more correct here. Also, it seems like on OS X this will use the double version of sqrt instead of the single precision one.
Attachment #613446 - Flags: review?(jmuizelaar) → review+
Comment on attachment 613420 [details] [diff] [review]
Part 1: Add image scaling code to Azure v2

Review of attachment 613420 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't reviewed everything yet, but here are some comments to start on.

::: gfx/2d/ImageScalingSSE2.cpp
@@ +79,5 @@
> + *            == ~((~(a + b)) >> 1)
> + *            == (a + b) >> 1
> + *            == g(a, b)
> + */
> +__forceinline __m128i avg_sse2_8x2(__m128i *a, __m128i *b, __m128i *c, __m128i *d)

Why do you pass these by pointer instead of by value?

__forceinline is msvc only. There's probably something in mfbt to help with this.

@@ +84,5 @@
> +{
> +#define shuf1 _MM_SHUFFLE(2, 0, 2, 0)
> +#define shuf2 _MM_SHUFFLE(3, 1, 3, 1)
> +
> +  __m128i t = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(*a), _mm_castsi128_ps(*b), shuf1));

add a wrapper function that does these casts maybe call it shuffle_si128

@@ +99,5 @@
> +
> +  __m128i carry = _mm_or_si128(_mm_and_si128(*a, *b), _mm_or_si128(_mm_and_si128(*a, *c), _mm_and_si128(*b, *c)));
> +
> +  __m128i minusone = _mm_set1_epi32(0xffffffff);
> +#define not(arg) _mm_xor_si128(arg, minusone)

just use an inline function instead of a macro

@@ +114,5 @@
> +#define not(arg) _mm_xor_si128(arg, minusone)
> +
> +  return not(_mm_avg_epu8(not(a), not(b)));
> +#undef not
> +}

These functions should be static inline

@@ +133,5 @@
> +inline __m128i loadUnaligned128(const __m128i *aSource)
> +{
> +  // Yes! We use uninitialized memory here, we'll overwrite it though!
> +  __m128 res = _mm_loadl_pi(_mm_set1_ps(0), (const __m64*)aSource);
> +  return _mm_castps_si128(_mm_loadh_pi(res, ((const __m64*)(aSource)) + 1));

Why not just use _mm_loadu_si128?

@@ +148,5 @@
> +  // underflow.
> +  sum = (((sum ^ d) & mask) >> 1) + (sum & d);
> +
> +  return (((sum ^ carry) & mask) >> 1) + (sum & carry);
> +}

It would be nice not to duplicate these functions.

@@ +169,5 @@
> +ImageHalfScaler::HalfImage2DSSE2(uint8_t *aSource, int32_t aSourceStride,
> +                                 const IntSize &aSourceSize, uint8_t *aDest,
> +                                 uint32_t aDestStride)
> +{
> +  for (int y = 0; y < aSourceSize.height; y += 2) {

What if aSourceSize.height = 1?

@@ +176,5 @@
> +    // Run a loop depending on alignment.
> +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> +      for (; x < (aSourceSize.width - 7); x += 8) {
> +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);

Seems like the type of these should __m128i*. Please fix all of them

@@ +177,5 @@
> +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> +      for (; x < (aSourceSize.width - 7); x += 8) {
> +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);
> +        uint8_t *lowerRow = aSource + ((y + 1) * aSourceStride + x * 4);

It might be worth renaming 4 to pixelSize. It took me a while to figure out how the 8 and 4 went together.

@@ +198,5 @@
> +        __m128i d = loadUnaligned128((__m128i*)lowerRow + 1);
> +
> +        *storage++ = avg_sse2_8x2(&a, &b, &c, &d);
> +      }
> +    } else if (!(uintptr_t(aSource + (y * aSourceStride)) % 16)) {

This condition is the same as above. Presumably you meant y+1? Please add this to the tests.

@@ +247,5 @@
> +
> +void
> +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> +                                const IntSize &aSourceSize, uint8_t *aDest,
> +                                uint32_t aDestStride)

I think we can afford to use Vertical instead of V
Attachment #613420 - Flags: review?(jmuizelaar) → review-
(Assignee)

Comment 8

5 years ago
Here's my replies, only on stuff where I had questions. The rest seems strictly improvements.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I haven't reviewed everything yet, but here are some comments to start on.
> 
> ::: gfx/2d/ImageScalingSSE2.cpp
> @@ +79,5 @@
> > + *            == ~((~(a + b)) >> 1)
> > + *            == (a + b) >> 1
> > + *            == g(a, b)
> > + */
> > +__forceinline __m128i avg_sse2_8x2(__m128i *a, __m128i *b, __m128i *c, __m128i *d)
> 
> Why do you pass these by pointer instead of by value?

Since MSVC's calling convention doesn't allow passing more than 3 SSE arguments through registers, it would mean passing one on the stack. MSVC does not allow passing 16-byte aligned structures on the stack. As such passing my pointer is the only option in MSVC.

> __forceinline is msvc only. There's probably something in mfbt to help with
> this.

MOZ_ALWAYS_INLINE, yes, sorry, an oversight on my part.
  
> @@ +133,5 @@
> > +inline __m128i loadUnaligned128(const __m128i *aSource)
> > +{
> > +  // Yes! We use uninitialized memory here, we'll overwrite it though!
> > +  __m128 res = _mm_loadl_pi(_mm_set1_ps(0), (const __m64*)aSource);
> > +  return _mm_castps_si128(_mm_loadh_pi(res, ((const __m64*)(aSource)) + 1));
> 
> Why not just use _mm_loadu_si128?

It's significantly slower according to anyone I've talked to. Bjacob's math libraries to the same thing.

> @@ +148,5 @@
> > +  // underflow.
> > +  sum = (((sum ^ d) & mask) >> 1) + (sum & d);
> > +
> > +  return (((sum ^ carry) & mask) >> 1) + (sum & carry);
> > +}
> 
> It would be nice not to duplicate these functions.

It would, sadly I'd need to shove them into headers if we wanted that, GCC for some reason does not allow using SSE2 intrinsics in a file which is not compiled with sse2 arch, which for GCC also means it will use SSE2 in all other code it compiles.

> @@ +169,5 @@
> > +ImageHalfScaler::HalfImage2DSSE2(uint8_t *aSource, int32_t aSourceStride,
> > +                                 const IntSize &aSourceSize, uint8_t *aDest,
> > +                                 uint32_t aDestStride)
> > +{
> > +  for (int y = 0; y < aSourceSize.height; y += 2) {
> 
> What if aSourceSize.height = 1?

Hrm, my intention was to never size down to below 4x4 (doesn't seem to make too much sense at that point). Seems I forgot those checks, will add.

> @@ +177,5 @@
> > +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> > +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> > +      for (; x < (aSourceSize.width - 7); x += 8) {
> > +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);
> > +        uint8_t *lowerRow = aSource + ((y + 1) * aSourceStride + x * 4);
> 
> It might be worth renaming 4 to pixelSize. It took me a while to figure out
> how the 8 and 4 went together.

Will do.
> @@ +247,5 @@
> > +
> > +void
> > +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> > +                                const IntSize &aSourceSize, uint8_t *aDest,
> > +                                uint32_t aDestStride)
> 
> I think we can afford to use Vertical instead of V

I assume you'd like the same for Horizontal then?
(In reply to Bas Schouten (:bas) from comment #8)
> > 
> > 
> > Why do you pass these by pointer instead of by value?
> 
> Since MSVC's calling convention doesn't allow passing more than 3 SSE
> arguments through registers, it would mean passing one on the stack. MSVC
> does not allow passing 16-byte aligned structures on the stack. As such
> passing my pointer is the only option in MSVC.

Hmm... You should definitely add a comment about this reason. Hopefully, most compilers are able to inline it all and untangle the loads and stores.

> > 
> > Why not just use _mm_loadu_si128?
> 
> It's significantly slower according to anyone I've talked to. Bjacob's math
> libraries to the same thing.

It looks like this has been fixed in Nehalem and newer. You should add a comment about that and later when a large precentage of our users are on Nehalem+ we can switch it.

> > @@ +247,5 @@
> > > +
> > > +void
> > > +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> > > +                                const IntSize &aSourceSize, uint8_t *aDest,
> > > +                                uint32_t aDestStride)
> > 
> > I think we can afford to use Vertical instead of V
> 
> I assume you'd like the same for Horizontal then?

Yes.
(Assignee)

Comment 10

5 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> (In reply to Bas Schouten (:bas) from comment #8)
> > > 
> > > 
> > > Why do you pass these by pointer instead of by value?
> > 
> > Since MSVC's calling convention doesn't allow passing more than 3 SSE
> > arguments through registers, it would mean passing one on the stack. MSVC
> > does not allow passing 16-byte aligned structures on the stack. As such
> > passing my pointer is the only option in MSVC.
> 
> Hmm... You should definitely add a comment about this reason. Hopefully,
> most compilers are able to inline it all and untangle the loads and stores.

I checked the generated assembly for MSVC 2010 -O2 and it was fine. It did require __forceinline, inline wasn't sufficient to get this inlined and resulted in a bunch of wasted loads/stores and a function call.

> 
> > > 
> > > Why not just use _mm_loadu_si128?
> > 
> > It's significantly slower according to anyone I've talked to. Bjacob's math
> > libraries to the same thing.
> 
> It looks like this has been fixed in Nehalem and newer. You should add a
> comment about that and later when a large precentage of our users are on
> Nehalem+ we can switch it.

Yes, that's what Derf said as well. I'll add a comment.
(Assignee)

Comment 11

5 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +84,5 @@
> > +{
> > +#define shuf1 _MM_SHUFFLE(2, 0, 2, 0)
> > +#define shuf2 _MM_SHUFFLE(3, 1, 3, 1)
> > +
> > +  __m128i t = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(*a), _mm_castsi128_ps(*b), shuf1));
> 
> add a wrapper function that does these casts maybe call it shuffle_si128

I tried this but it's tricky, as it wants __Imm to be a compile time constant. If I make this a wrapper function the compile cannot guarantee that when compiling the function. I'll make it a #define for now.
(Assignee)

Comment 12

5 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +198,5 @@
> > +        __m128i d = loadUnaligned128((__m128i*)lowerRow + 1);
> > +
> > +        *storage++ = avg_sse2_8x2(&a, &b, &c, &d);
> > +      }
> > +    } else if (!(uintptr_t(aSource + (y * aSourceStride)) % 16)) {
> 
> This condition is the same as above. Presumably you meant y+1? Please add
> this to the tests.

I can't add this to the tests as when !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16) in the current code this will just use the last clause of all unaligned loads, which works just fine but just performs worse.
(Assignee)

Comment 13

5 years ago
Created attachment 617318 [details] [diff] [review]
Part 1: Add image scaling code to Azure v3
Attachment #613420 - Attachment is obsolete: true
Attachment #617318 - Flags: review?(jmuizelaar)
Comment on attachment 617318 [details] [diff] [review]
Part 1: Add image scaling code to Azure v3

Review of attachment 617318 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/ImageScaling.cpp
@@ +107,5 @@
> +  }
> +
> +  delete [] mDataStorage;
> +  // Allocate 15 bytes extra to make sure we can get 16 byte alignment.
> +  mDataStorage = new uint8_t[internalSurfSize.height * mStride + 15];

Perhaps a reference to bug 751696

@@ +122,5 @@
> +
> +  IntSize currentSampledSize = mOrigSize;
> +  uint32_t currentSampledStride = mOrigStride;
> +  uint8_t *currentSampledData = mOrigData;
> +  

It would be nice to add a bit of documentation about how you're dealing with odd sizes. advantages/alternatives etc.

@@ +136,5 @@
> +    if (Factory::HasSSE2()) {
> +      HalfImage2DSSE2(currentSampledData, currentSampledStride, currentSampledSize,
> +                      mData, mStride);
> +    } else
> +#endif

Please move this #ifdefery out of this main function. For example add a HalfImage2D_C and have HalfImage2D call the appropriate function. Also, it might be worth adding an underscore to HalfImage2DSSE2 as case isn't enough to separate the words here.
Attachment #617318 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/mozilla-central/rev/b937d3d96745
https://hg.mozilla.org/mozilla-central/rev/f448869ad726
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Depends on: 753790
Depends on: 753772

Comment 16

5 years ago
I get build error:
gfx/2d/Tools.h(81) : C3861: 'hypotf': identifier not found
on VC 2010. Ideas?
surkov, bug 753790

Comment 18

5 years ago
(In reply to Joe Drew (:JOEDREW!) from comment #17)
> surkov, bug 753790

thank you

Updated

5 years ago
Status: RESOLVED → VERIFIED
(Assignee)

Updated

5 years ago
Duplicate of this bug: 753853
Depends on: 756010
(Assignee)

Updated

5 years ago
No longer blocks: 715768
(Assignee)

Updated

5 years ago
Blocks: 715768
You need to log in before you can comment on or make changes to this bug.