Last Comment Bug 732985 - [Azure] Extremely large images are rendered garbled or not at all after scrolling
: [Azure] Extremely large images are rendered garbled or not at all after scrol...
Status: VERIFIED FIXED
:
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86_64 Windows 7
: -- normal (vote)
: mozilla15
Assigned To: Bas Schouten (:bas.schouten)
:
Mentors:
: 739569 753853 (view as bug list)
Depends on: 753772 753790 756010
Blocks: 715768
  Show dependency treegraph
 
Reported: 2012-03-05 08:17 PST by Steve Scott (pxbugz)
Modified: 2012-05-20 13:41 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
H6EMd.jpg (446.65 KB, image/jpeg)
2012-03-05 08:17 PST, Steve Scott (pxbugz)
no flags Details
Part 1: Add image scaling code to Azure (33.31 KB, patch)
2012-04-09 06:55 PDT, Bas Schouten (:bas.schouten)
no flags Details | Diff | Review
Part 1: Add image scaling code to Azure v2 (47.24 KB, patch)
2012-04-09 15:40 PDT, Bas Schouten (:bas.schouten)
jmuizelaar: review-
Details | Diff | Review
Part 2: Use ImageScales for large images in Azure (8.14 KB, patch)
2012-04-09 17:20 PDT, Bas Schouten (:bas.schouten)
jmuizelaar: review+
Details | Diff | Review
Part 1: Add image scaling code to Azure v3 (48.04 KB, patch)
2012-04-22 08:41 PDT, Bas Schouten (:bas.schouten)
jmuizelaar: review+
Details | Diff | Review

Description Steve Scott (pxbugz) 2012-03-05 08:17:04 PST
Created attachment 602917 [details]
H6EMd.jpg

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120304 Firefox/13.0a1
Build ID: 20120304031031

Steps to reproduce:

-Set gfx.content.azure.enabled to true
-Encountered an extremely large (nearly 10,000px tall) 'comic strip' image on Reddit and viewed.


Actual results:

The viewable part of the image, after expanding to full size, renders correctly. Scrolling the image renders it all garbled.


Expected results:

Image should be rendered correctly. Issue not present without gfx.content.azure.enabled true.

Latest nightly tested Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120304 Firefox/13.0a1

Mozregression tested back to http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4795500b7c1d&tochange=8ae16e346bd0 when gfx.content.azure.enabled was first introduced so I guess I just haven't encountered a huge enough image until now.
Comment 1 Jim Jeffery not reading bug-mail 1/2/11 2012-03-30 08:24:25 PDT
*** Bug 739569 has been marked as a duplicate of this bug. ***
Comment 2 Bas Schouten (:bas.schouten) 2012-04-09 06:55:16 PDT
Created attachment 613285 [details] [diff] [review]
Part 1: Add image scaling code to Azure

This patch adds image scaling code to Azure, including a number of tests for the new code.
Comment 3 Jeff Muizelaar [:jrmuizel] 2012-04-09 11:42:41 PDT
(In reply to Bas Schouten (:bas) from comment #2)
> Created attachment 613285 [details] [diff] [review]
> Part 1: Add image scaling code to Azure
> 
> This patch adds image scaling code to Azure, including a number of tests for
> the new code.

Should this include the SSE2 code?
Comment 4 Bas Schouten (:bas.schouten) 2012-04-09 15:40:39 PDT
Created attachment 613420 [details] [diff] [review]
Part 1: Add image scaling code to Azure v2

With SSE2 code this time.
Comment 5 Bas Schouten (:bas.schouten) 2012-04-09 17:20:50 PDT
Created attachment 613446 [details] [diff] [review]
Part 2: Use ImageScales for large images in Azure

This patch makes us use the image scaler for very large images in Azure. It also moves some stl::vectors to use plain new[] to avoid array initialization costs (which become quite high on these large images), and fixes 2 bugs in partial upload code. There's a couple of things this doesn't cover:

- DrawSurface uses its own code, it should use this too, it's only used by canvas though, so that issue is already in releases Fx versions without issues. It should be addressed soon though.
- Perhaps we should cache the downsampled surface for future drawing, on animated cases that should make a big performance difference.

It also has an interesting side-effect, it makes extremely large images look much better when scaled down than just very large images.
Comment 6 Jeff Muizelaar [:jrmuizel] 2012-04-12 22:27:10 PDT
Comment on attachment 613446 [details] [diff] [review]
Part 2: Use ImageScales for large images in Azure

Review of attachment 613446 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/Tools.h
@@ +75,5 @@
> +  return false;
> +}
> +
> +static inline Float
> +Dist2D(Point aA, Point aB)

I think you can just call this Distance().

@@ +78,5 @@
> +static inline Float
> +Dist2D(Point aA, Point aB)
> +{
> +  return sqrt((aB.x - aA.x) * (aB.x - aA.x) +
> +              (aB.y - aA.y) * (aB.y - aA.y));

Using hypot is more correct here. Also, it seems like on OS X this will use the double version of sqrt instead of the single precision one.
Comment 7 Jeff Muizelaar [:jrmuizel] 2012-04-20 20:42:58 PDT
Comment on attachment 613420 [details] [diff] [review]
Part 1: Add image scaling code to Azure v2

Review of attachment 613420 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't reviewed everything yet, but here are some comments to start on.

::: gfx/2d/ImageScalingSSE2.cpp
@@ +79,5 @@
> + *            == ~((~(a + b)) >> 1)
> + *            == (a + b) >> 1
> + *            == g(a, b)
> + */
> +__forceinline __m128i avg_sse2_8x2(__m128i *a, __m128i *b, __m128i *c, __m128i *d)

Why do you pass these by pointer instead of by value?

__forceinline is msvc only. There's probably something in mfbt to help with this.

@@ +84,5 @@
> +{
> +#define shuf1 _MM_SHUFFLE(2, 0, 2, 0)
> +#define shuf2 _MM_SHUFFLE(3, 1, 3, 1)
> +
> +  __m128i t = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(*a), _mm_castsi128_ps(*b), shuf1));

add a wrapper function that does these casts maybe call it shuffle_si128

@@ +99,5 @@
> +
> +  __m128i carry = _mm_or_si128(_mm_and_si128(*a, *b), _mm_or_si128(_mm_and_si128(*a, *c), _mm_and_si128(*b, *c)));
> +
> +  __m128i minusone = _mm_set1_epi32(0xffffffff);
> +#define not(arg) _mm_xor_si128(arg, minusone)

just use an inline function instead of a macro

@@ +114,5 @@
> +#define not(arg) _mm_xor_si128(arg, minusone)
> +
> +  return not(_mm_avg_epu8(not(a), not(b)));
> +#undef not
> +}

These functions should be static inline

@@ +133,5 @@
> +inline __m128i loadUnaligned128(const __m128i *aSource)
> +{
> +  // Yes! We use uninitialized memory here, we'll overwrite it though!
> +  __m128 res = _mm_loadl_pi(_mm_set1_ps(0), (const __m64*)aSource);
> +  return _mm_castps_si128(_mm_loadh_pi(res, ((const __m64*)(aSource)) + 1));

Why not just use _mm_loadu_si128?

@@ +148,5 @@
> +  // underflow.
> +  sum = (((sum ^ d) & mask) >> 1) + (sum & d);
> +
> +  return (((sum ^ carry) & mask) >> 1) + (sum & carry);
> +}

It would be nice not to duplicate these functions.

@@ +169,5 @@
> +ImageHalfScaler::HalfImage2DSSE2(uint8_t *aSource, int32_t aSourceStride,
> +                                 const IntSize &aSourceSize, uint8_t *aDest,
> +                                 uint32_t aDestStride)
> +{
> +  for (int y = 0; y < aSourceSize.height; y += 2) {

What if aSourceSize.height = 1?

@@ +176,5 @@
> +    // Run a loop depending on alignment.
> +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> +      for (; x < (aSourceSize.width - 7); x += 8) {
> +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);

Seems like the type of these should __m128i*. Please fix all of them

@@ +177,5 @@
> +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> +      for (; x < (aSourceSize.width - 7); x += 8) {
> +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);
> +        uint8_t *lowerRow = aSource + ((y + 1) * aSourceStride + x * 4);

It might be worth renaming 4 to pixelSize. It took me a while to figure out how the 8 and 4 went together.

@@ +198,5 @@
> +        __m128i d = loadUnaligned128((__m128i*)lowerRow + 1);
> +
> +        *storage++ = avg_sse2_8x2(&a, &b, &c, &d);
> +      }
> +    } else if (!(uintptr_t(aSource + (y * aSourceStride)) % 16)) {

This condition is the same as above. Presumably you meant y+1? Please add this to the tests.

@@ +247,5 @@
> +
> +void
> +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> +                                const IntSize &aSourceSize, uint8_t *aDest,
> +                                uint32_t aDestStride)

I think we can afford to use Vertical instead of V
Comment 8 Bas Schouten (:bas.schouten) 2012-04-21 07:54:51 PDT
Here's my replies, only on stuff where I had questions. The rest seems strictly improvements.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I haven't reviewed everything yet, but here are some comments to start on.
> 
> ::: gfx/2d/ImageScalingSSE2.cpp
> @@ +79,5 @@
> > + *            == ~((~(a + b)) >> 1)
> > + *            == (a + b) >> 1
> > + *            == g(a, b)
> > + */
> > +__forceinline __m128i avg_sse2_8x2(__m128i *a, __m128i *b, __m128i *c, __m128i *d)
> 
> Why do you pass these by pointer instead of by value?

Since MSVC's calling convention doesn't allow passing more than 3 SSE arguments through registers, it would mean passing one on the stack. MSVC does not allow passing 16-byte aligned structures on the stack. As such passing my pointer is the only option in MSVC.

> __forceinline is msvc only. There's probably something in mfbt to help with
> this.

MOZ_ALWAYS_INLINE, yes, sorry, an oversight on my part.
  
> @@ +133,5 @@
> > +inline __m128i loadUnaligned128(const __m128i *aSource)
> > +{
> > +  // Yes! We use uninitialized memory here, we'll overwrite it though!
> > +  __m128 res = _mm_loadl_pi(_mm_set1_ps(0), (const __m64*)aSource);
> > +  return _mm_castps_si128(_mm_loadh_pi(res, ((const __m64*)(aSource)) + 1));
> 
> Why not just use _mm_loadu_si128?

It's significantly slower according to anyone I've talked to. Bjacob's math libraries to the same thing.

> @@ +148,5 @@
> > +  // underflow.
> > +  sum = (((sum ^ d) & mask) >> 1) + (sum & d);
> > +
> > +  return (((sum ^ carry) & mask) >> 1) + (sum & carry);
> > +}
> 
> It would be nice not to duplicate these functions.

It would, sadly I'd need to shove them into headers if we wanted that, GCC for some reason does not allow using SSE2 intrinsics in a file which is not compiled with sse2 arch, which for GCC also means it will use SSE2 in all other code it compiles.

> @@ +169,5 @@
> > +ImageHalfScaler::HalfImage2DSSE2(uint8_t *aSource, int32_t aSourceStride,
> > +                                 const IntSize &aSourceSize, uint8_t *aDest,
> > +                                 uint32_t aDestStride)
> > +{
> > +  for (int y = 0; y < aSourceSize.height; y += 2) {
> 
> What if aSourceSize.height = 1?

Hrm, my intention was to never size down to below 4x4 (doesn't seem to make too much sense at that point). Seems I forgot those checks, will add.

> @@ +177,5 @@
> > +    if (!(uintptr_t(aSource + (y * aSourceStride)) % 16) &&
> > +        !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16)) {
> > +      for (; x < (aSourceSize.width - 7); x += 8) {
> > +        uint8_t *upperRow = aSource + (y * aSourceStride + x * 4);
> > +        uint8_t *lowerRow = aSource + ((y + 1) * aSourceStride + x * 4);
> 
> It might be worth renaming 4 to pixelSize. It took me a while to figure out
> how the 8 and 4 went together.

Will do.
> @@ +247,5 @@
> > +
> > +void
> > +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> > +                                const IntSize &aSourceSize, uint8_t *aDest,
> > +                                uint32_t aDestStride)
> 
> I think we can afford to use Vertical instead of V

I assume you'd like the same for Horizontal then?
Comment 9 Jeff Muizelaar [:jrmuizel] 2012-04-21 13:02:00 PDT
(In reply to Bas Schouten (:bas) from comment #8)
> > 
> > 
> > Why do you pass these by pointer instead of by value?
> 
> Since MSVC's calling convention doesn't allow passing more than 3 SSE
> arguments through registers, it would mean passing one on the stack. MSVC
> does not allow passing 16-byte aligned structures on the stack. As such
> passing my pointer is the only option in MSVC.

Hmm... You should definitely add a comment about this reason. Hopefully, most compilers are able to inline it all and untangle the loads and stores.

> > 
> > Why not just use _mm_loadu_si128?
> 
> It's significantly slower according to anyone I've talked to. Bjacob's math
> libraries to the same thing.

It looks like this has been fixed in Nehalem and newer. You should add a comment about that and later when a large precentage of our users are on Nehalem+ we can switch it.

> > @@ +247,5 @@
> > > +
> > > +void
> > > +ImageHalfScaler::HalfImageVSSE2(uint8_t *aSource, int32_t aSourceStride,
> > > +                                const IntSize &aSourceSize, uint8_t *aDest,
> > > +                                uint32_t aDestStride)
> > 
> > I think we can afford to use Vertical instead of V
> 
> I assume you'd like the same for Horizontal then?

Yes.
Comment 10 Bas Schouten (:bas.schouten) 2012-04-21 18:48:31 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> (In reply to Bas Schouten (:bas) from comment #8)
> > > 
> > > 
> > > Why do you pass these by pointer instead of by value?
> > 
> > Since MSVC's calling convention doesn't allow passing more than 3 SSE
> > arguments through registers, it would mean passing one on the stack. MSVC
> > does not allow passing 16-byte aligned structures on the stack. As such
> > passing my pointer is the only option in MSVC.
> 
> Hmm... You should definitely add a comment about this reason. Hopefully,
> most compilers are able to inline it all and untangle the loads and stores.

I checked the generated assembly for MSVC 2010 -O2 and it was fine. It did require __forceinline, inline wasn't sufficient to get this inlined and resulted in a bunch of wasted loads/stores and a function call.

> 
> > > 
> > > Why not just use _mm_loadu_si128?
> > 
> > It's significantly slower according to anyone I've talked to. Bjacob's math
> > libraries to the same thing.
> 
> It looks like this has been fixed in Nehalem and newer. You should add a
> comment about that and later when a large precentage of our users are on
> Nehalem+ we can switch it.

Yes, that's what Derf said as well. I'll add a comment.
Comment 11 Bas Schouten (:bas.schouten) 2012-04-22 08:27:07 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +84,5 @@
> > +{
> > +#define shuf1 _MM_SHUFFLE(2, 0, 2, 0)
> > +#define shuf2 _MM_SHUFFLE(3, 1, 3, 1)
> > +
> > +  __m128i t = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(*a), _mm_castsi128_ps(*b), shuf1));
> 
> add a wrapper function that does these casts maybe call it shuffle_si128

I tried this but it's tricky, as it wants __Imm to be a compile time constant. If I make this a wrapper function the compile cannot guarantee that when compiling the function. I'll make it a #define for now.
Comment 12 Bas Schouten (:bas.schouten) 2012-04-22 08:39:09 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 613420 [details] [diff] [review]
> Part 1: Add image scaling code to Azure v2
> 
> Review of attachment 613420 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +198,5 @@
> > +        __m128i d = loadUnaligned128((__m128i*)lowerRow + 1);
> > +
> > +        *storage++ = avg_sse2_8x2(&a, &b, &c, &d);
> > +      }
> > +    } else if (!(uintptr_t(aSource + (y * aSourceStride)) % 16)) {
> 
> This condition is the same as above. Presumably you meant y+1? Please add
> this to the tests.

I can't add this to the tests as when !(uintptr_t(aSource + ((y + 1) * aSourceStride)) % 16) in the current code this will just use the last clause of all unaligned loads, which works just fine but just performs worse.
Comment 13 Bas Schouten (:bas.schouten) 2012-04-22 08:41:46 PDT
Created attachment 617318 [details] [diff] [review]
Part 1: Add image scaling code to Azure v3
Comment 14 Jeff Muizelaar [:jrmuizel] 2012-05-03 13:44:46 PDT
Comment on attachment 617318 [details] [diff] [review]
Part 1: Add image scaling code to Azure v3

Review of attachment 617318 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/ImageScaling.cpp
@@ +107,5 @@
> +  }
> +
> +  delete [] mDataStorage;
> +  // Allocate 15 bytes extra to make sure we can get 16 byte alignment.
> +  mDataStorage = new uint8_t[internalSurfSize.height * mStride + 15];

Perhaps a reference to bug 751696

@@ +122,5 @@
> +
> +  IntSize currentSampledSize = mOrigSize;
> +  uint32_t currentSampledStride = mOrigStride;
> +  uint8_t *currentSampledData = mOrigData;
> +  

It would be nice to add a bit of documentation about how you're dealing with odd sizes. advantages/alternatives etc.

@@ +136,5 @@
> +    if (Factory::HasSSE2()) {
> +      HalfImage2DSSE2(currentSampledData, currentSampledStride, currentSampledSize,
> +                      mData, mStride);
> +    } else
> +#endif

Please move this #ifdefery out of this main function. For example add a HalfImage2D_C and have HalfImage2D call the appropriate function. Also, it might be worth adding an underscore to HalfImage2DSSE2 as case isn't enough to separate the words here.
Comment 16 alexander :surkov 2012-05-11 06:58:49 PDT
I get build error:
gfx/2d/Tools.h(81) : C3861: 'hypotf': identifier not found
on VC 2010. Ideas?
Comment 17 Joe Drew (not getting mail) 2012-05-11 07:24:25 PDT
surkov, bug 753790
Comment 18 alexander :surkov 2012-05-11 07:31:58 PDT
(In reply to Joe Drew (:JOEDREW!) from comment #17)
> surkov, bug 753790

thank you
Comment 19 Bas Schouten (:bas.schouten) 2012-05-11 15:12:23 PDT
*** Bug 753853 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.