Closed
Bug 86694
Opened 23 years ago
Closed 23 years ago
Image tiling is very slow
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
VERIFIED
FIXED
mozilla0.9.5
People
(Reporter: attinasi, Assigned: sfraser_bugs)
References
Details
(Whiteboard: PDT+ [ETA 09.26])
Attachments
(2 files, 2 obsolete files)
29.31 KB,
text/plain
|
Details | |
18.60 KB,
patch
|
hong_bugmail
:
review+
hong_bugmail
:
superreview+
|
Details | Diff | Splinter Review |
pavlov has indicated that the current tiling code on the Mac and Windows platforms is slow because it is done in an inefficient XP way. So, this bug is here to urge the development of faster tiling code where we need it.
Reporter | ||
Comment 1•23 years ago
|
||
Here is a quote from bug 73252: ------- Additional Comments From Stuart Parmenter 2001-06-08 12:56 ------- the tiling code on windows and mac is slow as shit (its done in an XP manor that is really slow). it should be fairly easy to make the windows and mac code impl the same tiling apis as the unix code, which should make this speedup quite a bit. I'm marking bug 72352 dependent on this one. Also, CC'ing dcone since he knows a bit about image tiling too.
Blocks: 73252
Updated•23 years ago
|
Target Milestone: --- → mozilla0.9.3
Reporter | ||
Comment 2•23 years ago
|
||
Kevin McClusky says this should be Don Cone's bug - I guess he did an optimization for this on Windows already. Don is already cc'd, but maybe he should be the owner?
Comment 3•23 years ago
|
||
i'm changing some of the tiling code around.. we should probably wait to see until i land my changes
Comment 4•23 years ago
|
||
The tiling code has changed on windows.. its not XP, but there is an XP backstop. Mac never got this optimization. Also.. the XP code is rather fast if you look what it does. The current XP code is over 100 times faster and over 1000 times more efficient than Kips original code. There are cases that windows has fixed on some extreme cases. Mac is next for optimization, but is a low priority.
Reporter | ||
Comment 5•23 years ago
|
||
I'd like to urge us to make the Mac's tiling performance improvement a higher priority. It is a large part of what makes the Mac appear so slow, since lots of sites use tiled images for page backgroudns and (worse) in tables. Thanks for hearing my plea.
Comment 6•23 years ago
|
||
A normal tiled background.. I believe is very fast. The slow cases in the XP case are those tiles that have transparency like table backgrounds and are very small. Those kind of tiles go thru the original Kip and Peter Linus code that does a tile at a time. I believe this is no more that 5% of the cases. Windows was optimized using the pattern blitters.. and I also did some code to binary double the mask for transparency which skips the tile at a time code. For the Mac.. this binary doubling of the Mask also would would speed up that 5% cases. I believe the correct thing to do.. would be to cache these binary doubled images and if they had them.. the mask. So once the tile was made bigger.. it would never have to be binary doubled again.. this would make our backgrounds blit 4 times.. instead of the binary double 4 blit code.. or worst case just a few thousand blits of tiny tiles. I did suggest this once.. but Troy and Micheal said it would use to much memory. I guess we could open the issue again.
Comment 8•23 years ago
|
||
I am looking at image tiling code on both Windows and Mac and these things look dubious to me. 1. When validWidth / validHeight are calculated the result will be smaller than necessary in case if (mDecodedX1 > 0) AND (mDecodedX2 < mBHead->biHead). I mean it should always be (mDecodedX2 - mDecodedX1) and not decreased anymore in case (mDecodedX1 > 0) 2. When slow tiling is performed calling Draw () then aSx, aSy should be (validX, validY) instead of (0, 0). For aDx, aDy it must be (x + validX, y + validY) instead of (x, y). That could be done by adding (validX, validY) to the initial values of loop variables x, y. 3. First tvrect.SetRect () [right before slow tiling part] is not necessary because there is exactly same line couple lines below. 4. Checks if (tileWidth < tvRect.width) looks very dubious because they expand to if (aSXOffset > 0). Why in this case MAX_BUFFER_WIDTH is used. I guess tvrect.width could be used in all cases. The same applies to the y axis, too. These comments apply to Win32, Mac (partly) and other platforms that blindly copied code from Win32 :)
Comment 9•23 years ago
|
||
There is problem in DrawTile () on Windows, Mac & OS/2 that image could be drawn outside the rectangle which user passed to the function in case if slow tiling is necessary and each copy of image is drawn with call to nsIImage::Draw (). To fix that caller (like layout\html\style\src\nsCSSRendering.cpp PaintBackground () is forced to use nsIRenderingContext.PushState (), SetClipRect () and PopState (). That is ugly! The better idea is to fix DrawTile () to always stay in bounds which user specified. Fixed that on OS/2. Probably you are interested to look at the code. Look for SlowTile () function which uses nsRect methods to do the clipping. nsresult nsImageOS2::SlowTile (nsIRenderingContext& aContext, nsDrawingSurface aSurface, PRInt32 aSXOffset, PRInt32 aSYOffset, const nsRect &aTileRect) { nsRect ImageRect (0, 0, PR_MIN (mInfo->cx, mDecodedRect.width), PR_MIN (mInfo->cy, mDecodedRect.height)); for (PRInt32 y = aTileRect.y - aSYOffset + mDecodedRect.y ; y < aTileRect.YMost () ; y += mInfo->cy) for (PRInt32 x = aTileRect.x - aSXOffset + mDecodedRect.x ; x < aTileRect.XMost () ; x += mInfo->cx) { nsRect CroppedImage; ImageRect.MoveTo (x, y); CroppedImage.IntersectRect (ImageRect, aTileRect); Draw (aContext, aSurface, CroppedImage.x - x, CroppedImage.y - y, CroppedImage.width, CroppedImage.height, CroppedImage.x, CroppedImage.y, CroppedImage.width, CroppedImage.height); } return NS_OK; } If you need help I can do necessary changes on Win32.
Comment 10•23 years ago
|
||
Updated•23 years ago
|
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Comment 11•23 years ago
|
||
I can vouch for small png transparency files used as backgrounds cause what appears to be extreme slowness (windows98 & linux). A 10px X 10px png file with transparency causes my p3 800 with geforce2 to struggle. You can watch mozilla slowly render it onscreen. In the past I used a 1px file but mozilla would just take toooooo long to render (almost would seem to FREEZE).
Comment 12•23 years ago
|
||
We should degrade performance on the mac. We are deploying 6.2 at Time and 25% of their installed base is mac! Marking nsenterprise+
Keywords: nsenterprise+
Reporter | ||
Comment 13•23 years ago
|
||
> We should degrade performance on the mac.
I thought this was a MS-sponsored comment at first, then I realized that you
meant we should NOT degrad performance ;)
Assignee | ||
Comment 15•23 years ago
|
||
*** Bug 95272 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 16•23 years ago
|
||
Comment 17•23 years ago
|
||
Simon: you closed bug 95272 as dup but it was filed as a regression coming from the ImageLib landing (bug 78690). The attached patch fixes GFX but maybe we still have something wrong in the new ImageLib.
Comment 18•23 years ago
|
||
Besides, bug 95272 was filed against the Mac but we need to check whether the regression exists on other platforms too.
Assignee | ||
Comment 19•23 years ago
|
||
Pierre: the imagelib2 landing turned off previous image tiling optimizations that we had in layout, pushing them into platform-specific code (so that they can take advantage of toolkit image tiling features). On Mac, we were left with a tiling algorithm that was just very slow. I think pavlov will confirm that we just need to fix the Mac gfx code to address this.
Comment 20•23 years ago
|
||
*** Bug 95272 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 21•23 years ago
|
||
This is a Mac bug. If image tiling is slow on Windows, that needs a different bug.
Assignee | ||
Comment 22•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #48237 -
Attachment is obsolete: true
Assignee | ||
Comment 23•23 years ago
|
||
Latest patch does smart tiling only when the tile involves more than 32 blits, since any less than this and the cost of allocating GWorlds outweighs the blit costs. There is a further optimization that can be done, which is to have the last round of tile doublings copy directly to the destination, rather than going into the temporary GWorld. Note that this patch contains some instrumentation code that will be removed.
Status: NEW → ASSIGNED
Assignee | ||
Comment 24•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #48371 -
Attachment is obsolete: true
Assignee | ||
Comment 25•23 years ago
|
||
(Removed Windows commment from the summary; image tiling optimizations are implemneted on Windows.) My final patch optimizes image tiling by using a doubling algorithm to minimize the number of CopyBits calls needed. Unlike the previous patch, it never allocates a GWorld bigger than the destination area. Also, it shortcuts to simpler tiling when only a few tiles need to be drawn (to avoid GWorld allocation overhead). The patch is ready for r (Pavlov), and sr (scc)
Summary: Image tiling is very slow (on Mac, and reportedly on Windows) → Image tiling is very slow
Comment 26•23 years ago
|
||
Can you please remove the tabs that are mixed in with the spaces? Other than that, the code looks good. Glad to see you cleaned up some of the mess :-) r=pavlov
Comment 27•23 years ago
|
||
Did this make 0.9.4? If not and the risk is low we should check it into the branch. Let me know and I'll mark the bug appropriately.
Assignee | ||
Comment 28•23 years ago
|
||
This did not make it into the branch yet; it still needs super-review, but I believe it's fairly safe, and should go in.
Comment 30•23 years ago
|
||
Comment on attachment 48659 [details] [diff] [review] Final patch Cleaner and more effecient. I like it. The only thing I disagree with are the old-style casts for |BitMap*|. In Mac only code, no reason not to say |static_cast<BitMap*>(expr)| as you've done with other types elsewhere. If this is too verbose for your tastes, an appropriate typedef somewhere will allow you to say |BitMap_ptr(expr)| which has all the same good static checking of |static_cast|. Is it reasonable to add an |NS_ASSERTION| to catch debug-attempts to use a mask of different dimensions than the image? I don't care that you can't handle it, but it would be nice to notice if it ever happened in the controlled environment of a debug build. You manage locking pixels with stack-based objects, I wonder why not manage the lifetime of GWorlds similarly? The icing on the cake would be adding the measuring code to see if the final 4x tiling and big blit really are enough worse to warrant your suggested optimization. Using a smaller initial |GWorld| would be more robust in low memory situations. Fix the casts and you have my sr. Consider adding the |NS_ASSERTION|, measuring code, and stack-based |GWorld| disposers (in that order---though a _very_ wide range---of importance) for the title of perfectionist :-) (That order because (1) catches a problem waiting to happen, (2) decides if something is a problem waiting to happen, and (3) is primarily a style issue in the absence of exceptions)
Comment 31•23 years ago
|
||
Hmmm. Editing comments in the patch manager considered harmful. See comment above.
Comment 32•23 years ago
|
||
Sorry for the line wrapping in my review, Simon. See bug #97784 for the explanation.
Comment 33•23 years ago
|
||
I wonder one thing about the algorithm: knowing that it's cheaper to copy rows than columns, would it be cheaper overall to first double the pattern horizontally until it covered the destination width (byte- or perhaps long- rounded), then double vertically till it hit some reasonable height, then blit large bands to the real destination. Is this an algorithm that was considered and rejected?
Assignee | ||
Comment 34•23 years ago
|
||
I did consider this algorithm, though I didn't give sufficient consideration to the advantages of larger horizontal blits that it affords (for the number of blits ends up being the same). However, I suspect that the difference in CopyBits speed attributable to blit aspect is far outweighed by the cost of allocating the GWorld in the first place. In other words, I'd expect it to be slightly, but not significantly faster. If you like, I'll implement and instrument.
Comment 35•23 years ago
|
||
You don't need it to get my sr, but if it's easy to code up a timing test, this would be good information to have.
Assignee | ||
Comment 36•23 years ago
|
||
Whoa. Early results indicate that scaling up horizontally first, then vertically, is 5 times faster (for the scaling part). More testing to come...
Assignee | ||
Comment 37•23 years ago
|
||
Ignore that last comment; I was timing buggy code. It turns out that horizontal then vertical scaling is actually a tad slower than the kind of scaling I do now. I'll leave things as they are.
Assignee | ||
Comment 38•23 years ago
|
||
About the (BitMap*) casts; they can't be static_cast<BitMap*> because you can't safely cast from a PixMap* to a BitMap*. Calling CopyBits always requires a cast. I could use reinterpret_cast everywhere, but that seems verbose overkill for zero gain in type safely.
Assignee | ||
Comment 39•23 years ago
|
||
Patch checked in on the TRUNK.
Attachment #48659 -
Flags: superreview+
Attachment #48659 -
Flags: review+
Comment 41•23 years ago
|
||
shouldn't this be changed to platform and os all, since windows is also affected?
Assignee | ||
Comment 42•23 years ago
|
||
No. If windows is affected, file a separate bug. I did look, and Windows does have some smart-tiling code, though I don't know how well it works.
Comment 43•23 years ago
|
||
check it into the branch today - PDT+. Pls send email to Mac community to look for visual issues.
Whiteboard: PDT+
Assignee | ||
Updated•23 years ago
|
Whiteboard: PDT+ → PDT+ [ETA 09.26]
Assignee | ||
Comment 45•23 years ago
|
||
If I check this into the branch, we need the fix for 100700 as well.
Depends on: 100700
Assignee | ||
Comment 46•23 years ago
|
||
Landed on the 0.9.4 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•