Closed Bug 122121 Opened 23 years ago Closed 23 years ago

BeOS gfx implementation extremely slow

Categories

(SeaMonkey :: General, defect)

x86
BeOS
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: mozilla, Assigned: mozilla)

Details

(Keywords: perf)

Attachments

(2 files)

Major problems include the following:

- creating and destroying BBitmaps every time you blit
- copying platform independent bits into BBitmaps every time you blit
- using B_OP_ALPHA even when no alpha channel is present
- excessive locking when drawing (currently twice around every primitive)
- excessive calls to Sync() resulting in a huge slowdown (particularly when tiling)
- unneeded drawing of BBitmap data in DrawToImage()
- and so forth

I'm filing this bug and assigning it to myself as I have a patch currently in
testing that addresses all of the above. Will attach to this bug in the next
week or so.
Accepting bug, adding perf keyword, setting milestone to 0.9.9 (but hopefully
sooner).
Keywords: perf
Target Milestone: --- → mozilla0.9.9
Attached patch Patch for 122121Splinter Review
This patch fixes all problems I reported in the original bug description, plus
the following:

- Added DrawNoLock() method used by DrawTile() calls, to prevent locking and
synchronizing with the server on each iteration
- Fixed several warnings
- Massive code cleanup and commenting
- Rewrote CreateImage() completely since only the 24 -> 32 bit case matters.
Removed non-working cruft related to B_CMAP8 BBitmaps, and optimized code to
work better on systems without write-combining.
- Removed unused member variables
- Reordered member variables for better packing and initialized them in order
- Removed code in DrawToImage() which was updating the BBitmap unnecessarily
- Implemented memory saving feature in Optimize() for when its starts getting
used again
Great work, Daniel!  Unless you have further enhancements, I give this a
thumbs up for checkin! r=

Marking as assigned to Daniel.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
It's all ready to go. Please check in when the tree reopens.
This patch has been checked in.  Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
v (but we aren't still not the fastest..)
Status: RESOLVED → VERIFIED
ok, i'm tried some additional tricks.

1) common C/C++ rules - ++i is faster than i++; footype foo[N] is faster than
footype *foo = new

2)for speed we may also avoid nested function calls like DrawString(PRUnichar)
{DrawString (char*)}, rendering text directly from unicode version, which is
simpler and faster. Same for GetWidth etc functions. Also very frequent calls
for single-character drawing/measuring can be handled bit faster.

3) UpdateView() function. It seems no reason to call SetFont when drawing
polygons, for example. May be moved in font-related functions.

4)Locking1. There are excessive nested LookLooper calls in widget/src/beos code
in path calling async drawing. May be removed.

5)Locking2. Most unsure for me. We may call in UpdateView mSurface->Lock()
instead BView::LockLooper(). Same for UnlockLooper
and surface::Lock may be rewritten a bit, e.g.

surface::Lock(*******) {

if(mOffScreen)
  mBitmap->Lock()
else
  view->LockLooper()
}
also moving locking into surface:: definitely helps to avoid nested locking,
inspite in my debug code i nver noticed LockCount higher than 1.

I tried to ask in BeDevTalk what in reality means call of BView::LockLooper()
for view attached to BBitmap but nobody answered for sure, though.

P.S. We shouldn't be afraid about adding extra code or growing stack size in
this case - even little speed addition counts more in current situation.
Other explicit rule to recall - minimize code inside Lock() - but here is the
question about e.g. adjusted text drawing which happens word by word - what is
better - Lock view once until we finish whole frame (long-time lock, but single)
or do Locks inside loop, allowing (probably)time-windows for other activity
(short-time but multiple Locks) ?
I would say it is much better to lock once, especially as that drawing is to an
offscreen view(???) so the looper doesn't need to be unlocked to handle mouse
messages and things. Also I read somewhere if the GetWidth stuff returns
aFontID, that can speed up rendering. That could be something else to add to the
gfx implementation.

i'm bit confused with your concentration on locking issue. I think comment found
in code about locking overestimates situation.
1)Usually mozilla is compiled with double buffering. It means that 90% of
drawing primitives are called (at least for page layout rendering) on views,
attached to BBitmap, not BWindow.
2) BBitmap itself NEEDS  to be locked, when view, attached to it as child
performs drawing, but BBitmap lock don't lock application window.
3) There is little uncertainity, what is object to lock for BView::LockLooper(),
when view is added to BBitmap. I asked about it in BeDevTalk, nobody knows for
sure, but i have strong feeling, that it calls at the end BBitmap::Lock(). If
not, we can do it explicitly.
4) I did it explicitly in my code, and i never observed attempts to lock already
locked BBitmap. And i think, that it is logical. As far as i understand,
drawingSurface/BBitmap in current BeOS gfx implementation deals with single
view, sometimes created just for this purpose - to perform draw on offscreen bitmap.

So it needs more investigation, if this is so big issue, or not, but sure, there
are excessive LockLooper() calls in nsWindow.cpp
<quote>
2) BBitmap itself NEEDS  to be locked, when view, attached to it as child
performs drawing, but BBitmap lock don't lock application window.
</quote>

Exactly. So there is no real need for it to ever be unlocked during rendering
(no window messages to process). If each primitive drawn to the BBitmap via a
child BView calls Lock/Unlock I think this adds serious overhead. This is why
I'm so concerned with locking stuff. BeOS should be one of the fastest moz
platforms (should be much faster than linux with all the X server round
triping), but it is nowhere as fast as the major platforms. The only major
difference - BeOS requires stuff to be locked. That's why I believe LockLooper
is probably the major slow down in the Be gfx.

<quote>
3) There is little uncertainity, what is object to lock for BView::LockLooper(),
when view is added to BBitmap. I asked about it in BeDevTalk, nobody knows for
sure, but i have strong feeling, that it calls at the end BBitmap::Lock(). If
not, we can do it explicitly.
</quote>

From the BeBook:
------
BBitmap::Lock(), BBitmap::Unlock()
These functions lock and unlock the off-screen window where BViews associated
with the BBitmap draw. Locking works for this window and its views just as it
does for ordinary on-screen windows.
------
So as far as the appserver is concerned a BView with a BBitmap parent still
draws into a window, and even though it's offscreen it still needs to be locked.
This probably adds about the same overhead as locking a normal Be Window. If
this is called around every word on a page (maybe a few thousand times) this is
a lot of overhead.

I'm not concerned about nested lock calls, just the number of times
lock()/unlock() happens.
Actually BeOS will be inherently slower than Windows for two reasons: all the drawing is done by the graphics server in another process (the app_server), and we don't have nearly as much hardware acceleration as they do. Remember, all offscreen drawing in BeOS is done completely in software. Under Windows, you can load an offscreen onto the graphics card, and do most of your drawing in hardware (although I don't know if the Mozilla Windows port takes advantage of this or not). However, I agree that all the locking is imposing overhead (you wouldn't believe how much work locking a looper does) and that we could improve.
ok, Daniel, LockLooper is may be really complex "intelligent" method, doing
additional work for atomizing and safety, as far as i understand, but what about
direct usage of BBitmap:Lock?

and second thought - what do you think about possibility to use BDirectWindow ?
There is notice in BeBook that it don't work on some cards, but i really doubt
that bezill ais usable at all on machines using such cards.
Btw, most resource wasting (in locking-sense) thing for double-buffered page
layout rendering is picture animation.
For each frame update of each image it performs lock, good if only one.
Even if we can reduce overhead by copying only changed part of image.
Locking a bitmap with no view attached is cheap, I'm not sure about locking it when using a view (and therefore another server thread). Using a direct window is an interesting idea, but the major problem is that you lose CopyBits() in the process. Without a hardware accelerated screen-to-screen blit (which is what CopyBits() does) scrolling will be unusable. I tried this in the current code base to work around the iframe scrolling problems, and it was deadly slow on a 1.1 GHz Athlon. Therefore BDirectWindow is out.
Attached image Screenshot
Ok, Simon. In theory i have code you desired.
1) Most LockLooper and UnlockLooper instances in nsRenderingContext were
replaced by nsDrawingSurface::Lock and Unlock calls.
2)in nsDrawingSurface i'm doing BBitmap::Lock and Unlock only once for surface
- after mBitmap = new and before delete mBitmap - so same Lock stays since
creation untile destroy.
3)mDrawingSurface Lock and Unlock Looks like:
{
if(mLocked)
  return NS_OK;
if(mBitmap)
  {
    fprintf(stdout,"LS %p\n",mBitmap);
  }
  else
  {
	mView->LockLooper(); 
	fprintf(stdout,"LL%p\n",mView);
  } 
  fflush(stdout);
  mLocked = PR_TRUE;
}
same for unlock - it never unlocks BBitmap in reality.

And now look at screenshot in attachment - most locks (LS) are for bitmap
(while drawing)- and it is same bitmap for page.
LLs (LockLooper for view when NO surface bitmap created) start dominate when
Mozilla is idle.
And not that i noticed SO big speed improvement, but maybe it is still worth to
be proposed as patch.
now some results - witch cheap profiling enabled in nsDrawingSurface
(#define CHEAP_PERFORMANCE_MEASUREMENT)
"Time taken to lock" didn't change noticeably - probably locking (Lock call till
return) is not so time-consumptive.
But "Time taken to unlock" reduced in times, up to 5 times, and for offscreen
case (mBitmap exists) is in my Duron800 2ms(?) in average, instead 10 in average
before code changes.
Btw, it seems that even previous code i had here
if(mBitmap)
  {
      mBitmap->Unlock(); //now empty
  }
  else
  {
      mView->UnlockLooper(); 
  }
was big win over calling Lock/UnlockLooper in drawing primitives- it seems that
BView::UnlockLooper() is again in  times slower than mBitmap:Lock/Unlock, even
BBitmap has bView attached (this method is used in latest StripZilla with updates)
Another slow thing - text typing - for each letter mozilla calls FillRect 6
times, like this:
fillr 0 0 258 12
fillr -12 -477 823 673
fillr -10 -363 821 537
fillr 0 0 258 12
fillr -10 -627 5016 4206
fillr -2 -1 658 125
and most of them - in onscreen mode, it means - with real locking even if we
apply technique proposed above.

And, as fact, mozilla pushes out whole text (and proceeds full frame reflow)
even if we add single letter to end of text. But this is out of our competence :(

Only thing which we can investigate - weird values for width and height for
fillrect - like fillr -10 -627 width-> 5016 height->4206
And wwidget with worst design in Mozilla seems URL bar (probably other drop-down
 edit boxes too). With autocomplete on
adding single letter needs 18 (!!!) FillRect calls, and 2 DrawString calls.
Sure, with output of whole string.

Sad that we cannot use BeOS Text classes, may be will be better.
Isn't it time to reopen this bug? not only does Mozilla still suffer from 
performance issues under BeOS, as it is, users can't find this bug via the 
default Bugzilla search (which only targets open bugs).

Prog.
Actually, no, this should not be reopenned.  Rather, this whole discussion
should really be taken either to the forums at bezilla.org, or
news://news.mozilla.org/netscape.public.mozilla.beos
I think bugzilla is a good place for discussion, so in the future there is a
record of the process we went through to speed up Bezilla. Maybe a new bug "BeOS
gfx implementation could be faster" as an enhancement or something.

While I remember this, I'll post it here:

I found quite a good bit in a Be Newsletter article - google for "BeOS
newsletter 82" (without the quotes). A quote from this:
-----
If you're writing, say, a text engine or a web browser (read: something
string-width intensive), simply caching a single string's width might not be
what you want. You probably want to be able to find out any string's width
without incurring the messaging overhead each time. Sounds like you want your
own StringWidth() function.

I happened to have such a "width buffer" object, and did some profiling with it.
BStopWatch revealed the potential performance boost of a local string-width
mechanism to be significant. On average, it took a version of NetPositive that
did no caching roughly 530 microseconds to calculate the various strings (one at
a time) in its default page. With caching enabled, the number went down to 54
microseconds.

I'll save the implementation details of my width buffer class for another
article, but here are some things to remember if you decide to write your own. 
....
--------

Pity that he never published the implementation details (not that I could find,
anyway). If you look at BeOS Newsletter 90 there is another good article on
getting fast interfaces on Be.

On another note, I got paid this week, so I'll be getting some decent hardware
as soon as I get a bit of free time. Then the patches will start flying (maybe) ;)
Discovered yet another reason why it looks sometimes sluggish.
Background.
Mozilla don't use (and probably cannot due its buffering specifics) Be API's
SetViewBitmap for background drawing from image.
Instead, it uses its own method DrawTile, which uses usual DrawBitmap
(DrawBitmapAsync for every tile, though)

But problem is that, it seems, overriden BView::Draw/Invalidate causes also call
for DrawTile, EVEN IF background ISN'T VISIBLE (obscured by other elements).
Good ilustration is www.beosjournal.com - if you move little square (some
window) over text area, you see that update is still sluggish - noticeably worse
than on blank page. Or on page without background image, e.g. www.mozilla.org

And on such sites with "specific" background as www.newsru.com or even
www.opera.com this is total nightmare.
Main problem with DrawTile is weird idea of some web designers to create
background consisting of zillions 1*1 pix images. Even with DrawBitmapAsync and
intelligent locking it is big overhead.
It seems nsImageBeOS::DrawTile may be improved a little bit, though.
Currently it performs DrawBitmap for every tile, but once first tile is rendered
onto BBitmap via attached BView, we can use CopyBits on this view, which may be
more effective.
Right, SetViewBitmap only makes sense with a view which is drawn directly to
screen. You know, the scary thing was DrawTile used to use the synchronous
version of DrawBitmap for each tile (Mathias and I changed that in our rewrite).

Redrawing the background when it isn't visible sounds like a huge problem - do
you think you can fix it?
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: