Closed Bug 86694 Opened 23 years ago Closed 23 years ago

Image tiling is very slow

Categories

(Core :: Graphics: ImageLib, defect)

PowerPC
Mac System 9.x
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: attinasi, Assigned: sfraser_bugs)

References

Details

(Whiteboard: PDT+ [ETA 09.26])

Attachments

(2 files, 2 obsolete files)

pavlov has indicated that the current tiling code on the Mac and Windows
platforms is slow because it is done in an inefficient XP way. So, this bug is
here to urge the development of faster tiling code where we need it.
Here is a quote from bug 73252:

------- Additional Comments From Stuart Parmenter 2001-06-08 12:56 -------

the tiling code on windows and mac is slow as shit (its done in an XP manor that
is really slow).  it should be fairly easy to make the windows and mac code impl
the same tiling apis as the unix code, which should make this speedup quite a bit.


I'm marking bug 72352 dependent on this one. Also, CC'ing dcone since he knows a
bit about image tiling too.
Blocks: 73252
No longer blocks: 73252
Target Milestone: --- → mozilla0.9.3
Blocks: 73252
Kevin McClusky says this should be Don Cone's bug - I guess he did an
optimization for this on Windows already. Don is already cc'd, but maybe he
should be the owner?
i'm changing some of the tiling code around.. we should probably wait to see 
until i land my changes
Blocks: 78690
The tiling code has changed on windows.. its not XP, but there is an XP 
backstop.  Mac never got this optimization.  Also.. the XP code is rather fast 
if you look what it does. The current XP code is over 100 times faster and over 
1000 times more efficient than Kips original code.  There are cases that windows 
has fixed on some extreme cases.  Mac is next for optimization, but is a low 
priority.
I'd like to urge us to make the Mac's tiling performance improvement a higher
priority. It is a large part of what makes the Mac appear so slow, since lots of
sites use tiled images for page backgroudns and (worse) in tables. Thanks for
hearing my plea.
A normal tiled background.. I believe is very fast.  The slow cases in the XP 
case are those tiles that have transparency like table backgrounds and are very 
small.  Those kind of tiles go thru the original Kip and Peter Linus code that 
does a tile at a time.  I believe this is no more that 5% of the cases.

Windows was optimized using the pattern blitters.. and I also did some code to 
binary double the mask for transparency which skips the tile at a time code.

For the Mac.. this binary doubling of the Mask also would would speed up that 5% 
cases.

I believe the correct thing to do.. would be to cache these binary doubled 
images and if they had them.. the mask.  So once the tile was made bigger.. it 
would never have to be binary doubled again.. this would make our backgrounds 
blit 4 times.. instead of the binary double 4 blit code.. or worst case just a 
few thousand blits of tiny tiles.  I did suggest this once.. but Troy and 
Micheal said it would use to much memory.  I guess we could open the issue 
again.
-> 0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I am looking at image tiling code on both Windows and Mac and these things look
dubious to me.

1. When validWidth / validHeight are calculated the result will be smaller than
necessary in case if (mDecodedX1 > 0) AND (mDecodedX2 < mBHead->biHead). I mean
it should always be (mDecodedX2 - mDecodedX1) and not decreased anymore in case
(mDecodedX1 > 0)

2. When slow tiling is performed calling Draw () then aSx, aSy should be
(validX, validY) instead of (0, 0). For aDx, aDy it must be (x + validX, y +
validY) instead of (x, y). That could be done by adding (validX, validY) to the
 initial values of loop variables x, y.

3. First tvrect.SetRect () [right before slow tiling part] is not necessary
because there is exactly same line couple lines below.

4. Checks if (tileWidth < tvRect.width) looks very dubious because they expand
to if (aSXOffset > 0). Why in this case MAX_BUFFER_WIDTH is used. I guess
tvrect.width could be used in all cases. The same applies to the y axis, too.


These comments apply to Win32, Mac (partly) and other platforms that blindly
copied code from Win32 :)
There is problem in DrawTile () on Windows, Mac & OS/2 that image could be drawn
outside the rectangle which user passed to the function in case if slow tiling
is necessary and each copy of image is drawn with call to nsIImage::Draw ().
  To fix that caller (like layout\html\style\src\nsCSSRendering.cpp
PaintBackground () is forced to use nsIRenderingContext.PushState (),
SetClipRect () and PopState (). That is ugly! The better idea is to fix DrawTile
() to always stay in bounds which user specified.
  Fixed that on OS/2. Probably you are interested to look at the code. Look for
SlowTile () function which uses nsRect methods to do the clipping.


nsresult nsImageOS2::SlowTile (nsIRenderingContext& aContext, nsDrawingSurface
aSurface, 
                               PRInt32 aSXOffset, PRInt32 aSYOffset, const
nsRect &aTileRect)
{
  nsRect ImageRect (0, 0, PR_MIN (mInfo->cx, mDecodedRect.width), PR_MIN
(mInfo->cy, mDecodedRect.height));

  for (PRInt32 y = aTileRect.y - aSYOffset + mDecodedRect.y ; y <
aTileRect.YMost () ; y += mInfo->cy)
    for (PRInt32 x = aTileRect.x - aSXOffset + mDecodedRect.x ; x <
aTileRect.XMost () ; x += mInfo->cx)
    {
      nsRect CroppedImage;

      ImageRect.MoveTo (x, y);
      CroppedImage.IntersectRect (ImageRect, aTileRect);

      Draw (aContext, aSurface,
            CroppedImage.x - x, CroppedImage.y - y, CroppedImage.width,
CroppedImage.height,
            CroppedImage.x, CroppedImage.y, CroppedImage.width,
CroppedImage.height);
    }

  return NS_OK;
}


If you need help I can do necessary changes on Win32.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Blocks: 95652
I can vouch for small png transparency files used as backgrounds cause what
appears to be extreme slowness (windows98 & linux).

A 10px X 10px png file with transparency causes my p3 800 with geforce2 to
struggle.  You can watch mozilla slowly render it onscreen.  In the past I used
a 1px file but mozilla would just take toooooo long to render (almost would seem
to FREEZE).
We should degrade performance on the mac.  We are deploying 6.2 at Time and 25%
of their installed base is mac!  Marking nsenterprise+
Keywords: nsenterprise+
> We should degrade performance on the mac. 

I thought this was a MS-sponsored comment at first, then I realized that you
meant we should NOT degrad performance ;)

Mine.
Assignee: pavlov → sfraser
*** Bug 95272 has been marked as a duplicate of this bug. ***
Simon: you closed bug 95272 as dup but it was filed as a regression coming from 
the ImageLib landing (bug 78690).  The attached patch fixes GFX but maybe we 
still have something wrong in the new ImageLib.
Besides, bug 95272 was filed against the Mac but we need to check whether the 
regression exists on other platforms too.
Pierre: the imagelib2 landing turned off previous image tiling optimizations that 
we had in layout, pushing them into platform-specific code (so that they can take 
advantage of toolkit image tiling features). On Mac, we were left with a tiling 
algorithm that was just very slow. I think pavlov will confirm that we just need 
to fix the Mac gfx code to address this.
*** Bug 95272 has been marked as a duplicate of this bug. ***
This is a Mac bug. If image tiling is slow on Windows, that needs a different 
bug.
Attached patch Work-in-progress patch (obsolete) — Splinter Review
Attachment #48237 - Attachment is obsolete: true
Latest patch does smart tiling only when the tile involves more than 32 blits, 
since any less than this and the cost of allocating GWorlds outweighs the blit 
costs. There is a further optimization that can be done, which is to have the 
last round of tile doublings copy directly to the destination, rather than going 
into the temporary GWorld.

Note that this patch contains some instrumentation code that will be removed.
Status: NEW → ASSIGNED
Attached patch Final patchSplinter Review
Attachment #48371 - Attachment is obsolete: true
(Removed Windows commment from the summary; image tiling optimizations are 
implemneted on Windows.)

My final patch optimizes image tiling by using a doubling algorithm to minimize 
the number of CopyBits calls needed. Unlike the previous patch, it never 
allocates a GWorld bigger than the destination area. Also, it shortcuts to 
simpler tiling when only a few tiles need to be drawn (to avoid GWorld allocation 
overhead).

The patch is ready for r (Pavlov), and sr (scc)
Summary: Image tiling is very slow (on Mac, and reportedly on Windows) → Image tiling is very slow
Can you please remove the tabs that are mixed in with the spaces?  Other than
that, the code looks good.  Glad to see you cleaned up some of the mess :-) 
r=pavlov
Did this make 0.9.4?  If not and the risk is low we should check it into the
branch.  Let me know and I'll mark the bug appropriately.
This did not make it into the branch yet; it still needs super-review, but I 
believe it's fairly safe, and should go in.
Marking nsbranch+
Keywords: nsbranch+
Comment on attachment 48659 [details] [diff] [review]
Final patch

Cleaner and more effecient.  I like it.  The only thing I disagree with are the old-style casts for |BitMap*|.  In Mac only code, no reason not to say |static_cast<BitMap*>(expr)| as you've done with other types elsewhere.  If this is too verbose for your tastes, an appropriate typedef somewhere will allow you to say |BitMap_ptr(expr)| which has all the same good static checking of |static_cast|.  Is it reasonable to add an |NS_ASSERTION| to catch debug-attempts to use a mask of different dimensions than the image?  I don't care that you can't handle it, but it would be nice to notice if it ever happened in the controlled environment of a debug build.

You manage locking pixels with stack-based objects, I wonder why not manage the lifetime of GWorlds similarly?

The icing on the cake would be adding the measuring code to see if the final 4x tiling and big blit really are enough worse to warrant your suggested optimization.  Using a smaller initial |GWorld| would be more robust in low memory situations.

Fix the casts and you have my sr.  Consider adding the |NS_ASSERTION|, measuring code, and stack-based |GWorld| disposers (in that order---though a _very_ wide range---of importance) for the title of perfectionist :-) (That order because (1) catches a problem waiting to happen, (2) decides if something is a problem waiting to happen, and (3) is primarily a style issue in the absence of exceptions)
Hmmm.  Editing comments in the patch manager considered harmful.  See comment above.
Sorry for the line wrapping in my review, Simon.  See bug #97784 for the
explanation.
I wonder one thing about the algorithm:  knowing that it's cheaper to copy rows
than columns, would it be cheaper overall to first double the pattern
horizontally until it covered the destination width (byte- or perhaps long-
rounded), then double vertically till it hit some reasonable height, then blit
large bands to the real destination.  Is this an algorithm that was considered
and rejected?
I did consider this algorithm, though I didn't give sufficient consideration to 
the advantages of larger horizontal blits that it affords (for the number of 
blits ends up being the same). However, I suspect that the difference in CopyBits 
speed attributable to blit aspect is far outweighed by the cost of allocating the 
GWorld in the first place. In other words, I'd expect it to be slightly, but not 
significantly faster. If you like, I'll implement and instrument.
You don't need it to get my sr, but if it's easy to code up a timing test, this
would be good information to have.
Whoa. Early results indicate that scaling up horizontally first, then vertically, 
is 5 times faster (for the scaling part). More testing to come...
Ignore that last comment; I was timing buggy code. It turns out that horizontal 
then vertical scaling is actually a tad slower than the kind of scaling I do now. 
I'll leave things as they are.
About the (BitMap*) casts; they can't be static_cast<BitMap*> because you can't 
safely cast from a PixMap* to a BitMap*. Calling CopyBits always requires a cast. 
I could use reinterpret_cast everywhere, but that seems verbose overkill for zero 
gain in type safely.
Patch checked in on the TRUNK.
Attachment #48659 - Flags: superreview+
Attachment #48659 - Flags: review+
Whiteboard: has r/sr
come to pdt tomorrow at noon to discuss.  Thanks!
Whiteboard: has r/sr
shouldn't this be changed to platform and os all, since windows is also 
affected?
No. If windows is affected, file a separate bug. I did look, and Windows does 
have some smart-tiling code, though I don't know how well it works.
check it into the branch today - PDT+.

Pls send email to Mac community to look for visual issues.
Whiteboard: PDT+
moving mac bugs off nsenterprise.
Whiteboard: PDT+ → PDT+ [ETA 09.26]
If I check this into the branch, we need the fix for 100700 as well.
Depends on: 100700
Landed on the 0.9.4 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified fix checked into lxr.mozilla.org
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: