86694 - Image tiling is very slow

Reporter

Description

•

23 years ago

pavlov has indicated that the current tiling code on the Mac and Windows
platforms is slow because it is done in an inefficient XP way. So, this bug is
here to urge the development of faster tiling code where we need it.

Marc Attinasi

Reporter

Comment 1

•

23 years ago

Here is a quote from bug 73252:

------- Additional Comments From Stuart Parmenter 2001-06-08 12:56 -------

the tiling code on windows and mac is slow as shit (its done in an XP manor that
is really slow).  it should be fairly easy to make the windows and mac code impl
the same tiling apis as the unix code, which should make this speedup quite a bit.


I'm marking bug 72352 dependent on this one. Also, CC'ing dcone since he knows a
bit about image tiling too.

Blocks: 73252

Marc Attinasi

Reporter

Updated

•

23 years ago

No longer blocks: 73252

Stuart Parmenter

Updated

•

23 years ago

Target Milestone: --- → mozilla0.9.3

Marc Attinasi

Reporter

Updated

•

23 years ago

Blocks: 73252

Marc Attinasi

Reporter

Comment 2

•

23 years ago

Kevin McClusky says this should be Don Cone's bug - I guess he did an
optimization for this on Windows already. Don is already cc'd, but maybe he
should be the owner?

Stuart Parmenter

Comment 3

•

23 years ago

i'm changing some of the tiling code around.. we should probably wait to see 
until i land my changes

Stuart Parmenter

Updated

•

23 years ago

Blocks: 78690

dcone (gone)

Comment 4

•

23 years ago

The tiling code has changed on windows.. its not XP, but there is an XP 
backstop.  Mac never got this optimization.  Also.. the XP code is rather fast 
if you look what it does. The current XP code is over 100 times faster and over 
1000 times more efficient than Kips original code.  There are cases that windows 
has fixed on some extreme cases.  Mac is next for optimization, but is a low 
priority.

Marc Attinasi

Reporter

Comment 5

•

23 years ago

I'd like to urge us to make the Mac's tiling performance improvement a higher
priority. It is a large part of what makes the Mac appear so slow, since lots of
sites use tiled images for page backgroudns and (worse) in tables. Thanks for
hearing my plea.

dcone (gone)

Comment 6

•

23 years ago

A normal tiled background.. I believe is very fast.  The slow cases in the XP 
case are those tiles that have transparency like table backgrounds and are very 
small.  Those kind of tiles go thru the original Kip and Peter Linus code that 
does a tile at a time.  I believe this is no more that 5% of the cases.

Windows was optimized using the pattern blitters.. and I also did some code to 
binary double the mask for transparency which skips the tile at a time code.

For the Mac.. this binary doubling of the Mask also would would speed up that 5% 
cases.

I believe the correct thing to do.. would be to cache these binary doubled 
images and if they had them.. the mask.  So once the tile was made bigger.. it 
would never have to be binary doubled again.. this would make our backgrounds 
blit 4 times.. instead of the binary double 4 blit code.. or worst case just a 
few thousand blits of tiny tiles.  I did suggest this once.. but Troy and 
Micheal said it would use to much memory.  I guess we could open the issue 
again.

Jason Kersey

Comment 7

•

23 years ago

-> 0.9.4

Target Milestone: mozilla0.9.3 → mozilla0.9.4

Dainis Jonitis

Comment 8

•

23 years ago

I am looking at image tiling code on both Windows and Mac and these things look
dubious to me.

1. When validWidth / validHeight are calculated the result will be smaller than
necessary in case if (mDecodedX1 > 0) AND (mDecodedX2 < mBHead->biHead). I mean
it should always be (mDecodedX2 - mDecodedX1) and not decreased anymore in case
(mDecodedX1 > 0)

2. When slow tiling is performed calling Draw () then aSx, aSy should be
(validX, validY) instead of (0, 0). For aDx, aDy it must be (x + validX, y +
validY) instead of (x, y). That could be done by adding (validX, validY) to the
 initial values of loop variables x, y.

3. First tvrect.SetRect () [right before slow tiling part] is not necessary
because there is exactly same line couple lines below.

4. Checks if (tileWidth < tvRect.width) looks very dubious because they expand
to if (aSXOffset > 0). Why in this case MAX_BUFFER_WIDTH is used. I guess
tvrect.width could be used in all cases. The same applies to the y axis, too.


These comments apply to Win32, Mac (partly) and other platforms that blindly
copied code from Win32 :)

Dainis Jonitis

Comment 9

•

23 years ago

There is problem in DrawTile () on Windows, Mac & OS/2 that image could be drawn
outside the rectangle which user passed to the function in case if slow tiling
is necessary and each copy of image is drawn with call to nsIImage::Draw ().
  To fix that caller (like layout\html\style\src\nsCSSRendering.cpp
PaintBackground () is forced to use nsIRenderingContext.PushState (),
SetClipRect () and PopState (). That is ugly! The better idea is to fix DrawTile
() to always stay in bounds which user specified.
  Fixed that on OS/2. Probably you are interested to look at the code. Look for
SlowTile () function which uses nsRect methods to do the clipping.


nsresult nsImageOS2::SlowTile (nsIRenderingContext& aContext, nsDrawingSurface
aSurface, 
                               PRInt32 aSXOffset, PRInt32 aSYOffset, const
nsRect &aTileRect)
{
  nsRect ImageRect (0, 0, PR_MIN (mInfo->cx, mDecodedRect.width), PR_MIN
(mInfo->cy, mDecodedRect.height));

  for (PRInt32 y = aTileRect.y - aSYOffset + mDecodedRect.y ; y <
aTileRect.YMost () ; y += mInfo->cy)
    for (PRInt32 x = aTileRect.x - aSXOffset + mDecodedRect.x ; x <
aTileRect.XMost () ; x += mInfo->cx)
    {
      nsRect CroppedImage;

      ImageRect.MoveTo (x, y);
      CroppedImage.IntersectRect (ImageRect, aTileRect);

      Draw (aContext, aSurface,
            CroppedImage.x - x, CroppedImage.y - y, CroppedImage.width,
CroppedImage.height,
            CroppedImage.x, CroppedImage.y, CroppedImage.width,
CroppedImage.height);
    }

  return NS_OK;
}


If you need help I can do necessary changes on Win32.

Dainis Jonitis

Comment 10

•

23 years ago

Attached file Sample code from OS/2 (nsImageOS2.cpp) — Details

Stuart Parmenter

Updated

•

23 years ago

Target Milestone: mozilla0.9.4 → mozilla0.9.5

Cathleen

Updated

•

23 years ago

Blocks: 95652

James Lariviere

Comment 11

•

23 years ago

I can vouch for small png transparency files used as backgrounds cause what
appears to be extreme slowness (windows98 & linux).

A 10px X 10px png file with transparency causes my p3 800 with geforce2 to
struggle.  You can watch mozilla slowly render it onscreen.  In the past I used
a 1px file but mozilla would just take toooooo long to render (almost would seem
to FREEZE).

grega

Comment 12

•

23 years ago

We should degrade performance on the mac.  We are deploying 6.2 at Time and 25%
of their installed base is mac!  Marking nsenterprise+

Keywords: nsenterprise+

Marc Attinasi

Reporter

Comment 13

•

23 years ago

> We should degrade performance on the mac. 

I thought this was a MS-sponsored comment at first, then I realized that you
meant we should NOT degrad performance ;)

Simon Fraser [no longer active]

Assignee

Comment 14

•

23 years ago

Mine.

Assignee: pavlov → sfraser

Simon Fraser [no longer active]

Assignee

Comment 15

•

23 years ago

*** Bug 95272 has been marked as a duplicate of this bug. ***

Simon Fraser [no longer active]

Assignee

Comment 16

•

23 years ago

Attached patch First cut at fix -- do smart tiling (obsolete) — Details — Splinter Review

Pierre Saslawsky

Comment 17

•

23 years ago

Simon: you closed bug 95272 as dup but it was filed as a regression coming from 
the ImageLib landing (bug 78690).  The attached patch fixes GFX but maybe we 
still have something wrong in the new ImageLib.

Pierre Saslawsky

Comment 18

•

23 years ago

Besides, bug 95272 was filed against the Mac but we need to check whether the 
regression exists on other platforms too.

Simon Fraser [no longer active]

Assignee

Comment 19

•

23 years ago

Pierre: the imagelib2 landing turned off previous image tiling optimizations that 
we had in layout, pushing them into platform-specific code (so that they can take 
advantage of toolkit image tiling features). On Mac, we were left with a tiling 
algorithm that was just very slow. I think pavlov will confirm that we just need 
to fix the Mac gfx code to address this.

Stuart Parmenter

Comment 20

•

23 years ago

*** Bug 95272 has been marked as a duplicate of this bug. ***

Simon Fraser [no longer active]

Assignee

Comment 21

•

23 years ago

This is a Mac bug. If image tiling is slow on Windows, that needs a different 
bug.

Simon Fraser [no longer active]

Assignee

Comment 22

•

23 years ago

Attached patch Work-in-progress patch (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #48237 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 23

•

23 years ago

Latest patch does smart tiling only when the tile involves more than 32 blits, 
since any less than this and the cost of allocating GWorlds outweighs the blit 
costs. There is a further optimization that can be done, which is to have the 
last round of tile doublings copy directly to the destination, rather than going 
into the temporary GWorld.

Note that this patch contains some instrumentation code that will be removed.

Status: NEW → ASSIGNED

Simon Fraser [no longer active]

Assignee

Comment 24

•

23 years ago

Attached patch Final patch — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #48371 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 25

•

23 years ago

(Removed Windows commment from the summary; image tiling optimizations are 
implemneted on Windows.)

My final patch optimizes image tiling by using a doubling algorithm to minimize 
the number of CopyBits calls needed. Unlike the previous patch, it never 
allocates a GWorld bigger than the destination area. Also, it shortcuts to 
simpler tiling when only a few tiles need to be drawn (to avoid GWorld allocation 
overhead).

The patch is ready for r (Pavlov), and sr (scc)

Summary: Image tiling is very slow (on Mac, and reportedly on Windows) → Image tiling is very slow

Stuart Parmenter

Comment 26

•

23 years ago

Can you please remove the tabs that are mixed in with the spaces?  Other than
that, the code looks good.  Glad to see you cleaned up some of the mess :-) 
r=pavlov

grega

Comment 27

•

23 years ago

Did this make 0.9.4?  If not and the risk is low we should check it into the
branch.  Let me know and I'll mark the bug appropriately.

Simon Fraser [no longer active]

Assignee

Comment 28

•

23 years ago

This did not make it into the branch yet; it still needs super-review, but I 
believe it's fairly safe, and should go in.

Suresh Duddi (gone)

Comment 29

•

23 years ago

Marking nsbranch+

Keywords: nsbranch+

Scott Collins

Comment 30

•

23 years ago

Comment on attachment 48659 [details] [diff] [review]
Final patch

Cleaner and more effecient.  I like it.  The only thing I disagree with are the old-style casts for |BitMap*|.  In Mac only code, no reason not to say |static_cast<BitMap*>(expr)| as you've done with other types elsewhere.  If this is too verbose for your tastes, an appropriate typedef somewhere will allow you to say |BitMap_ptr(expr)| which has all the same good static checking of |static_cast|.  Is it reasonable to add an |NS_ASSERTION| to catch debug-attempts to use a mask of different dimensions than the image?  I don't care that you can't handle it, but it would be nice to notice if it ever happened in the controlled environment of a debug build.

You manage locking pixels with stack-based objects, I wonder why not manage the lifetime of GWorlds similarly?

The icing on the cake would be adding the measuring code to see if the final 4x tiling and big blit really are enough worse to warrant your suggested optimization.  Using a smaller initial |GWorld| would be more robust in low memory situations.

Fix the casts and you have my sr.  Consider adding the |NS_ASSERTION|, measuring code, and stack-based |GWorld| disposers (in that order---though a _very_ wide range---of importance) for the title of perfectionist :-) (That order because (1) catches a problem waiting to happen, (2) decides if something is a problem waiting to happen, and (3) is primarily a style issue in the absence of exceptions)

Scott Collins

Comment 31

•

23 years ago

Hmmm.  Editing comments in the patch manager considered harmful.  See comment above.

Scott Collins

Comment 32

•

23 years ago

Sorry for the line wrapping in my review, Simon.  See bug #97784 for the
explanation.

Scott Collins

Comment 33

•

23 years ago

I wonder one thing about the algorithm:  knowing that it's cheaper to copy rows
than columns, would it be cheaper overall to first double the pattern
horizontally until it covered the destination width (byte- or perhaps long-
rounded), then double vertically till it hit some reasonable height, then blit
large bands to the real destination.  Is this an algorithm that was considered
and rejected?

Simon Fraser [no longer active]

Assignee

Comment 34

•

23 years ago

I did consider this algorithm, though I didn't give sufficient consideration to 
the advantages of larger horizontal blits that it affords (for the number of 
blits ends up being the same). However, I suspect that the difference in CopyBits 
speed attributable to blit aspect is far outweighed by the cost of allocating the 
GWorld in the first place. In other words, I'd expect it to be slightly, but not 
significantly faster. If you like, I'll implement and instrument.

Scott Collins

Comment 35

•

23 years ago

You don't need it to get my sr, but if it's easy to code up a timing test, this
would be good information to have.

Simon Fraser [no longer active]

Assignee

Comment 36

•

23 years ago

Whoa. Early results indicate that scaling up horizontally first, then vertically, 
is 5 times faster (for the scaling part). More testing to come...

Simon Fraser [no longer active]

Assignee

Comment 37

•

23 years ago

Ignore that last comment; I was timing buggy code. It turns out that horizontal 
then vertical scaling is actually a tad slower than the kind of scaling I do now. 
I'll leave things as they are.

Simon Fraser [no longer active]

Assignee

Comment 38

•

23 years ago

About the (BitMap*) casts; they can't be static_cast<BitMap*> because you can't 
safely cast from a PixMap* to a BitMap*. Calling CopyBits always requires a cast. 
I could use reinterpret_cast everywhere, but that seems verbose overkill for zero 
gain in type safely.

Simon Fraser [no longer active]

Assignee

Comment 39

•

23 years ago

Patch checked in on the TRUNK.

Hong Kwon

Updated

•

23 years ago

Attachment #48659 - Flags: superreview+

Attachment #48659 - Flags: review+

Hong Kwon

Updated

•

23 years ago

Whiteboard: has r/sr

grega

Comment 40

•

23 years ago

come to pdt tomorrow at noon to discuss.  Thanks!

Whiteboard: has r/sr

Markus Hübner

Comment 41

•

23 years ago

shouldn't this be changed to platform and os all, since windows is also 
affected?

Simon Fraser [no longer active]

Assignee

Comment 42

•

23 years ago

No. If windows is affected, file a separate bug. I did look, and Windows does 
have some smart-tiling code, though I don't know how well it works.

Jaime Rodriguez, Jr.

Comment 43

•

23 years ago

check it into the branch today - PDT+.

Pls send email to Mac community to look for visual issues.

Whiteboard: PDT+

grega

Comment 44

•

23 years ago

moving mac bugs off nsenterprise.

Keywords: nsenterprise+ → nsenterprise-

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Whiteboard: PDT+ → PDT+ [ETA 09.26]

Simon Fraser [no longer active]

Assignee

Comment 45

•

23 years ago

If I check this into the branch, we need the fix for 100700 as well.

Depends on: 100700

Simon Fraser [no longer active]

Assignee

Comment 46

•

23 years ago

Landed on the 0.9.4 branch.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Terri Preston

Comment 47

•

23 years ago

Verified fix checked into lxr.mozilla.org

Status: RESOLVED → VERIFIED

Sample code from OS/2 (nsImageOS2.cpp) 23 years ago Dainis Jonitis 29.31 KB, text/plain		Details
First cut at fix -- do smart tiling 23 years ago Simon Fraser [no longer active] 10.29 KB, patch		Details \| Diff \| Splinter Review
Work-in-progress patch 23 years ago Simon Fraser [no longer active] 12.86 KB, patch		Details \| Diff \| Splinter Review
Final patch 23 years ago Simon Fraser [no longer active] 18.60 KB, patch	hong_bugmail : review+ hong_bugmail : superreview+	Details \| Diff \| Splinter Review