Image tiling is very slow

VERIFIED FIXED in mozilla0.9.5



17 years ago
17 years ago


(Reporter: Marc Attinasi, Assigned: Simon Fraser)


Mac System 9.x
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)


(Whiteboard: PDT+ [ETA 09.26])


(2 attachments, 2 obsolete attachments)



17 years ago
pavlov has indicated that the current tiling code on the Mac and Windows
platforms is slow because it is done in an inefficient XP way. So, this bug is
here to urge the development of faster tiling code where we need it.

Comment 1

17 years ago
Here is a quote from bug 73252:

------- Additional Comments From Stuart Parmenter 2001-06-08 12:56 -------

the tiling code on windows and mac is slow as shit (its done in an XP manor that
is really slow).  it should be fairly easy to make the windows and mac code impl
the same tiling apis as the unix code, which should make this speedup quite a bit.

I'm marking bug 72352 dependent on this one. Also, CC'ing dcone since he knows a
bit about image tiling too.
Blocks: 73252


17 years ago
No longer blocks: 73252


17 years ago
Target Milestone: --- → mozilla0.9.3


17 years ago
Blocks: 73252

Comment 2

17 years ago
Kevin McClusky says this should be Don Cone's bug - I guess he did an
optimization for this on Windows already. Don is already cc'd, but maybe he
should be the owner?

Comment 3

17 years ago
i'm changing some of the tiling code around.. we should probably wait to see 
until i land my changes


17 years ago
Blocks: 78690

Comment 4

17 years ago
The tiling code has changed on windows.. its not XP, but there is an XP 
backstop.  Mac never got this optimization.  Also.. the XP code is rather fast 
if you look what it does. The current XP code is over 100 times faster and over 
1000 times more efficient than Kips original code.  There are cases that windows 
has fixed on some extreme cases.  Mac is next for optimization, but is a low 

Comment 5

17 years ago
I'd like to urge us to make the Mac's tiling performance improvement a higher
priority. It is a large part of what makes the Mac appear so slow, since lots of
sites use tiled images for page backgroudns and (worse) in tables. Thanks for
hearing my plea.

Comment 6

17 years ago
A normal tiled background.. I believe is very fast.  The slow cases in the XP 
case are those tiles that have transparency like table backgrounds and are very 
small.  Those kind of tiles go thru the original Kip and Peter Linus code that 
does a tile at a time.  I believe this is no more that 5% of the cases.

Windows was optimized using the pattern blitters.. and I also did some code to 
binary double the mask for transparency which skips the tile at a time code.

For the Mac.. this binary doubling of the Mask also would would speed up that 5% 

I believe the correct thing to do.. would be to cache these binary doubled 
images and if they had them.. the mask.  So once the tile was made bigger.. it 
would never have to be binary doubled again.. this would make our backgrounds 
blit 4 times.. instead of the binary double 4 blit code.. or worst case just a 
few thousand blits of tiny tiles.  I did suggest this once.. but Troy and 
Micheal said it would use to much memory.  I guess we could open the issue 

Comment 7

17 years ago
-> 0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4

Comment 8

17 years ago
I am looking at image tiling code on both Windows and Mac and these things look
dubious to me.

1. When validWidth / validHeight are calculated the result will be smaller than
necessary in case if (mDecodedX1 > 0) AND (mDecodedX2 < mBHead->biHead). I mean
it should always be (mDecodedX2 - mDecodedX1) and not decreased anymore in case
(mDecodedX1 > 0)

2. When slow tiling is performed calling Draw () then aSx, aSy should be
(validX, validY) instead of (0, 0). For aDx, aDy it must be (x + validX, y +
validY) instead of (x, y). That could be done by adding (validX, validY) to the
 initial values of loop variables x, y.

3. First tvrect.SetRect () [right before slow tiling part] is not necessary
because there is exactly same line couple lines below.

4. Checks if (tileWidth < tvRect.width) looks very dubious because they expand
to if (aSXOffset > 0). Why in this case MAX_BUFFER_WIDTH is used. I guess
tvrect.width could be used in all cases. The same applies to the y axis, too.

These comments apply to Win32, Mac (partly) and other platforms that blindly
copied code from Win32 :)

Comment 9

17 years ago
There is problem in DrawTile () on Windows, Mac & OS/2 that image could be drawn
outside the rectangle which user passed to the function in case if slow tiling
is necessary and each copy of image is drawn with call to nsIImage::Draw ().
  To fix that caller (like layout\html\style\src\nsCSSRendering.cpp
PaintBackground () is forced to use nsIRenderingContext.PushState (),
SetClipRect () and PopState (). That is ugly! The better idea is to fix DrawTile
() to always stay in bounds which user specified.
  Fixed that on OS/2. Probably you are interested to look at the code. Look for
SlowTile () function which uses nsRect methods to do the clipping.

nsresult nsImageOS2::SlowTile (nsIRenderingContext& aContext, nsDrawingSurface
                               PRInt32 aSXOffset, PRInt32 aSYOffset, const
nsRect &aTileRect)
  nsRect ImageRect (0, 0, PR_MIN (mInfo->cx, mDecodedRect.width), PR_MIN
(mInfo->cy, mDecodedRect.height));

  for (PRInt32 y = aTileRect.y - aSYOffset + mDecodedRect.y ; y <
aTileRect.YMost () ; y += mInfo->cy)
    for (PRInt32 x = aTileRect.x - aSXOffset + mDecodedRect.x ; x <
aTileRect.XMost () ; x += mInfo->cx)
      nsRect CroppedImage;

      ImageRect.MoveTo (x, y);
      CroppedImage.IntersectRect (ImageRect, aTileRect);

      Draw (aContext, aSurface,
            CroppedImage.x - x, CroppedImage.y - y, CroppedImage.width,
            CroppedImage.x, CroppedImage.y, CroppedImage.width,

  return NS_OK;

If you need help I can do necessary changes on Win32.

Comment 10

17 years ago
Created attachment 46583 [details]
Sample code from OS/2 (nsImageOS2.cpp)


17 years ago
Target Milestone: mozilla0.9.4 → mozilla0.9.5


17 years ago
Blocks: 95652

Comment 11

17 years ago
I can vouch for small png transparency files used as backgrounds cause what
appears to be extreme slowness (windows98 & linux).

A 10px X 10px png file with transparency causes my p3 800 with geforce2 to
struggle.  You can watch mozilla slowly render it onscreen.  In the past I used
a 1px file but mozilla would just take toooooo long to render (almost would seem

Comment 12

17 years ago
We should degrade performance on the mac.  We are deploying 6.2 at Time and 25%
of their installed base is mac!  Marking nsenterprise+
Keywords: nsenterprise+

Comment 13

17 years ago
> We should degrade performance on the mac. 

I thought this was a MS-sponsored comment at first, then I realized that you
meant we should NOT degrad performance ;)


Comment 14

17 years ago
Assignee: pavlov → sfraser

Comment 15

17 years ago
*** Bug 95272 has been marked as a duplicate of this bug. ***

Comment 16

17 years ago
Created attachment 48237 [details] [diff] [review]
First cut at fix -- do smart tiling

Comment 17

17 years ago
Simon: you closed bug 95272 as dup but it was filed as a regression coming from 
the ImageLib landing (bug 78690).  The attached patch fixes GFX but maybe we 
still have something wrong in the new ImageLib.

Comment 18

17 years ago
Besides, bug 95272 was filed against the Mac but we need to check whether the 
regression exists on other platforms too.

Comment 19

17 years ago
Pierre: the imagelib2 landing turned off previous image tiling optimizations that 
we had in layout, pushing them into platform-specific code (so that they can take 
advantage of toolkit image tiling features). On Mac, we were left with a tiling 
algorithm that was just very slow. I think pavlov will confirm that we just need 
to fix the Mac gfx code to address this.

Comment 20

17 years ago
*** Bug 95272 has been marked as a duplicate of this bug. ***

Comment 21

17 years ago
This is a Mac bug. If image tiling is slow on Windows, that needs a different 

Comment 22

17 years ago
Created attachment 48371 [details] [diff] [review]
Work-in-progress patch


17 years ago
Attachment #48237 - Attachment is obsolete: true

Comment 23

17 years ago
Latest patch does smart tiling only when the tile involves more than 32 blits, 
since any less than this and the cost of allocating GWorlds outweighs the blit 
costs. There is a further optimization that can be done, which is to have the 
last round of tile doublings copy directly to the destination, rather than going 
into the temporary GWorld.

Note that this patch contains some instrumentation code that will be removed.

Comment 24

17 years ago
Created attachment 48659 [details] [diff] [review]
Final patch


17 years ago
Attachment #48371 - Attachment is obsolete: true

Comment 25

17 years ago
(Removed Windows commment from the summary; image tiling optimizations are 
implemneted on Windows.)

My final patch optimizes image tiling by using a doubling algorithm to minimize 
the number of CopyBits calls needed. Unlike the previous patch, it never 
allocates a GWorld bigger than the destination area. Also, it shortcuts to 
simpler tiling when only a few tiles need to be drawn (to avoid GWorld allocation 

The patch is ready for r (Pavlov), and sr (scc)
Summary: Image tiling is very slow (on Mac, and reportedly on Windows) → Image tiling is very slow

Comment 26

17 years ago
Can you please remove the tabs that are mixed in with the spaces?  Other than
that, the code looks good.  Glad to see you cleaned up some of the mess :-) 

Comment 27

17 years ago
Did this make 0.9.4?  If not and the risk is low we should check it into the
branch.  Let me know and I'll mark the bug appropriately.

Comment 28

17 years ago
This did not make it into the branch yet; it still needs super-review, but I 
believe it's fairly safe, and should go in.

Comment 29

17 years ago
Marking nsbranch+
Keywords: nsbranch+

Comment 30

17 years ago
Comment on attachment 48659 [details] [diff] [review]
Final patch

Cleaner and more effecient.  I like it.  The only thing I disagree with are the old-style casts for |BitMap*|.  In Mac only code, no reason not to say |static_cast<BitMap*>(expr)| as you've done with other types elsewhere.  If this is too verbose for your tastes, an appropriate typedef somewhere will allow you to say |BitMap_ptr(expr)| which has all the same good static checking of |static_cast|.  Is it reasonable to add an |NS_ASSERTION| to catch debug-attempts to use a mask of different dimensions than the image?  I don't care that you can't handle it, but it would be nice to notice if it ever happened in the controlled environment of a debug build.

You manage locking pixels with stack-based objects, I wonder why not manage the lifetime of GWorlds similarly?

The icing on the cake would be adding the measuring code to see if the final 4x tiling and big blit really are enough worse to warrant your suggested optimization.  Using a smaller initial |GWorld| would be more robust in low memory situations.

Fix the casts and you have my sr.  Consider adding the |NS_ASSERTION|, measuring code, and stack-based |GWorld| disposers (in that order---though a _very_ wide range---of importance) for the title of perfectionist :-) (That order because (1) catches a problem waiting to happen, (2) decides if something is a problem waiting to happen, and (3) is primarily a style issue in the absence of exceptions)

Comment 31

17 years ago
Hmmm.  Editing comments in the patch manager considered harmful.  See comment above.

Comment 32

17 years ago
Sorry for the line wrapping in my review, Simon.  See bug #97784 for the

Comment 33

17 years ago
I wonder one thing about the algorithm:  knowing that it's cheaper to copy rows
than columns, would it be cheaper overall to first double the pattern
horizontally until it covered the destination width (byte- or perhaps long-
rounded), then double vertically till it hit some reasonable height, then blit
large bands to the real destination.  Is this an algorithm that was considered
and rejected?

Comment 34

17 years ago
I did consider this algorithm, though I didn't give sufficient consideration to 
the advantages of larger horizontal blits that it affords (for the number of 
blits ends up being the same). However, I suspect that the difference in CopyBits 
speed attributable to blit aspect is far outweighed by the cost of allocating the 
GWorld in the first place. In other words, I'd expect it to be slightly, but not 
significantly faster. If you like, I'll implement and instrument.

Comment 35

17 years ago
You don't need it to get my sr, but if it's easy to code up a timing test, this
would be good information to have.

Comment 36

17 years ago
Whoa. Early results indicate that scaling up horizontally first, then vertically, 
is 5 times faster (for the scaling part). More testing to come...

Comment 37

17 years ago
Ignore that last comment; I was timing buggy code. It turns out that horizontal 
then vertical scaling is actually a tad slower than the kind of scaling I do now. 
I'll leave things as they are.

Comment 38

17 years ago
About the (BitMap*) casts; they can't be static_cast<BitMap*> because you can't 
safely cast from a PixMap* to a BitMap*. Calling CopyBits always requires a cast. 
I could use reinterpret_cast everywhere, but that seems verbose overkill for zero 
gain in type safely.

Comment 39

17 years ago
Patch checked in on the TRUNK.


17 years ago
Attachment #48659 - Flags: superreview+
Attachment #48659 - Flags: review+


17 years ago
Whiteboard: has r/sr

Comment 40

17 years ago
come to pdt tomorrow at noon to discuss.  Thanks!
Whiteboard: has r/sr

Comment 41

17 years ago
shouldn't this be changed to platform and os all, since windows is also 

Comment 42

17 years ago
No. If windows is affected, file a separate bug. I did look, and Windows does 
have some smart-tiling code, though I don't know how well it works.

Comment 43

17 years ago
check it into the branch today - PDT+.

Pls send email to Mac community to look for visual issues.
Whiteboard: PDT+

Comment 44

17 years ago
moving mac bugs off nsenterprise.
Keywords: nsenterprise+ → nsenterprise-


17 years ago
Whiteboard: PDT+ → PDT+ [ETA 09.26]

Comment 45

17 years ago
If I check this into the branch, we need the fix for 100700 as well.
Depends on: 100700

Comment 46

17 years ago
Landed on the 0.9.4 branch.
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 47

17 years ago
Verified fix checked into
You need to log in before you can comment on or make changes to this bug.