Closed Bug 9489 Opened 25 years ago Closed 25 years ago

DOGFOOD: tree scrolling through msg thread is slow

Categories

(SeaMonkey :: MailNews: Message Display, defect, P2)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: pmock, Assigned: hyatt)

References

Details

(Whiteboard: [PDT+][Perf][PR1] 12/10)

Build Date & Platform Found:
 Mac Seamonkey commercial build 1999070810 installed on PPC 9600/300 OS 8.5.1

Overview Description:
 Scrolling through message in the Thread pane is slower than M7.  Doing a simple
operations such as scrolling through <20 messages in the thread pane become
incredibily hard and painful to do.  Running on a 300 mhz machine with 144MB of
ram, the scroll function should not have to pause when scrolling.

Note: this behavior does not occur in the message pane using the vertical or
horizontal scroll bar

Steps to reproduce:
1) Start Apprunner
2) Launch Messenger using the Task or Component bar
3) Double click on the server icon to expand the folder list
4) Select a folder (such as the Inbox) that has about 20 messages
5) In the Thread pane, use the vertical scroll bar to scroll through the
messages.  You should see the slow down

Actual Results:
 Click once on the up or down arrow of the vertical scroll bar.  The arrow is
depressed than a pause, then it slowing scroll up one.  If you hold down on the
up/down arrow off the scroll, it scroll in a jerky fashion instead of smoothly

Expected results:
 You should be able to scroll though the list of message smoothly and quickly.

Additional Information:
 (Personal note: my 300 mhz PPC feels like a 100 mhz)
Whiteboard: [Perf]
cc: putterman.
Assignee: phil → hyatt
reassigning to hyatt.
Status: NEW → ASSIGNED
Target Milestone: M10
Blocks: 9161
While I'm waiting for the message to scroll, you can not switch application.  You

have to wait until it finish scrolling.
fyi
Comparing the performance on my windows machine Gateway P200 with 64MB running
Win98 vs. my PPC 9600/300 with 144MB (vm off) in recents M8 optimized builds, I
can say my windows machine run apprunner faster than my PPC. It's night and day
scrolling. It's very painful to test apprunner on the Mac.

Do you think we could bump the priority up and maybe set the target milstone to
M9?  I believe Mac users will get frustrated.
Summary: [Perf] Mac 1999070810 scrolling through msg thread is worst than M7 → [Perf] Mac M8 - scrolling through msg thread is worst than M7
The problem exist in final release of M8 commercial build 1999-07-14-16-m8
Summary: [Perf] Mac M8 - scrolling through msg thread is worst than M7 → [PP]Mac M8 - scrolling through msg thread is worst than M7
Blocks: 11091
Whiteboard: [Perf] → [Perf] [PR1]
Sounds like this needs to be fixed by PR1, so I added a note to the Status
Whiteboard
Target Milestone: M10 → M11
Moving this one off to M11.  Right now, I'm just trying to make it work.
Performance will have to come later.
Blocks: 12716
Blocks: 12671
Target Milestone: M11 → M14
moving to m14
No longer blocks: 11091
Blocks: 15008
OS: Mac System 8.5 → All
Hardware: Macintosh → All
Summary: [PP]Mac M8 - scrolling through msg thread is worst than M7 → tree scrolling through msg thread is slow
updating summary field. this is not a PP bug but XP. platform/os all.
mass moving m14 bugs to m15
Summary: tree scrolling through msg thread is slow → DOGFOOD: tree scrolling through msg thread is slow
PDT team: please consider this bug for dogfood, and please try it yourself on
Linux on a reasonably large inbox (mine is about 450 messages, many people have
much larger folders than that) before dismissing it.

On Unix, it often takes me several tries to scroll down to the bottom of my
inbox (which has to be done every time you open a folder) because the scrolling
is so slow and there's no feedback on whether I'm there yet.
Adding bienvenu to cc
FYI: I have a hair less than 4000 messages in my inbox, and have no trouble
reading them.
I just ran my 4000 message inbox on both Linux and Windows, on the same 300Mhz
laptop with a ton (256M) of memory.  This was pretty usable in Windows, and
slightly less usable under Linux.  I tried both sorted by date format, as well
as thread-sorted format.
Note that to scroll, I grab the thumb at the right side of the message pane, and
drag it to the bottom of the box.  I typically have about 15 message titles
listed in the message pane when I do the dragging. Total scrolling time is about
4 seconds, including about 5 refreshes.
I think the problem has to do with wasting time drawing the entire set of lines,
only to immediately erase them all for another bit of scrolling.  This
(missing) coallescing of redraws seems pretty critical.  On Linux it is probably
a tiny bit worse because the redraws are so painful (and again, I think there
is a lot of missing coallescing).
To demonstrate what is (I think) at the heart of the issue, try clicking 7 times
beneath the thumb of the scroll bar.  You can then wait and watch 7 individual
refreshes as the message lists page down (on linux, due to another bug, both
single click and double click events probably cause a scroll... and hence 7
rapid clicks in succession yeild 10 refreshes ... but that is another bug).

I'll come by and look at akkana's machine to better understand the
usability issue.  It is certainly not pleasant to use at the current speed...
but for me, it is possible.
there are a few basic problems here. the painting is the least of the problems.
For one, we re-walk the content model with each reflow in order to determine the
 total lines in the tree... I have a feeling this is fairly expensive... the
problem is that caching this number can be fairly tricky.

Also note that performance has increased DRAMATICALLY since this bug was
originally filed... I realize it still needs pleanty of work, but it's SO much
better than it was in july, I'd argue it's 'good enough' for dogfood.

I won't argue that the scrolling issue Akkana sees is definately there, but I
think there are other higher priority bugs that we could work on in the mean
time.
I remember profiling this for bookmarks, and painting was the clear dominator.
Of course for mailnews with large mailboxes, the fact that I have to recount on
a reflow probably means that the numbers are more biased towards reflow in the
mailnews case.

I think fixing painting will give it a big boost though.  Unfortunately that's
blocked on getting ScrollRect implemented (Eric? Pav?).
Whiteboard: [Perf] [PR1] → [PDT+][Perf] [PR1]
Putting on PDT+ radar.
Pav is on vacation, but adding him and eric to the CC
I really don't see how this could be fixed for M12... but maybe eric/pavlov are
further along than I thought.
My profiling shows for page scrolling 5 times, ~10% is repainting, ~10% is
reflow, 8% in FrameManager::ComputeStyleChangeFor (most of this is in resolving
the style context) ~8% constructing table row frames, and lots of other little
things (beyond this, it's hard for me to be sure that I'm looking at unique
things). Cutting down painting should make a significant improvement. Waterson
thinks the style stuff is fairly well-optimized.
Priority: P3 → P2
Whiteboard: [PDT+][Perf] [PR1] → [PDT+][Perf][PR1] Dep on scrollRect work (beard et.al.)
Target Milestone: M15 → M13
Noting dependency in whiteboard (will replace with real dependency when I find
the bug), and bumping priority to p2, M13
Depends on: 13131
Whiteboard: [PDT+][Perf][PR1] Dep on scrollRect work (beard et.al.) → [PDT+][Perf][PR1] Open dep bug has 11/26 fix date.
Added dependency, updated whiteboard.
Made some progress profiling today.  Some very bad stuff was checked into
global.css by hangas that slowed down titledbuttons immensely.  There was also a
bad pattern in the button frame renderer that made it slow.

Also, the tree anonymous content creation function was doing a lot of string
thrashing.  All of these problems have been fixed (waterson checked them in), so
this will help out a little.

Unfortunately this is looking to be a pretty nasty problem.  Paging up and down
in bookmarks is about 78% frame construction and only 23% painting.  I'm worried
that the solution to this problem could well involve drastic changes to the tree
widget (e.g., brutal sharing of frames, reuse of frames from a pool).
Checked in nsAutoVoidArray and made CSSRuleProcessor::RulesMatching() use it.
This picks up a bit of performance in style resolution (~5%.
*** Bug 14559 has been marked as a duplicate of this bug. ***
Status Report for the Powers That Be.

We got about a 10% boost on scrolling by improving titled button's paint
routine.  That fix has been checked in.

In our latest profiling, we now have two more culprits that we can blame for our
problems: the allocation of content node attributes (XULAttribute) and the
new/delete of frames.

Waterson and I are going to split up and attack each of these two tasks
separately.  The first, XULAttributes, involves making an atom table and
addrefing/releasing atoms as needed... in this way the table only contains
attribute values that are actually in use, and it allows attribute values to be
shared by multiple attribute instances (e.g., in the tree widget there are lots
of maintenance atoms, like CLASS that are always the same on the anonymous
content).

The second task is one I'm going to take on, and I'm adding troy to the cc list
to get his input/feedback.  In order to prevent new/delete of frames, I want to
create an nsFramePool object that the tree widget can place frames into (instead
of having them be deleted), so that when you scroll it can just put frames into
the pool (rather than destroying them), and then pull them out of the pool
(instead of mallocing new ones).

This seems pretty simple, and I can do it with no bloat on nsFrame (although it
might be convenient to put the frame pool pointer in the base class).  I propose
adding the following methods to nsIFrame:

NS_IMETHOD SupportsPooling(PRBool& anAnswer);
  Indicates whether or not the frame has been modified to support pooling.  This
defaults to PR_FALSE, and as I modify frames to use pooling, I'll override in
the derived frame classes and start returning PR_TRUE.

NS_IMETHOD SetPool(nsIFramePool* aPool);
  Gives a frame a frame pool to use.

NS_IMETHOD GetPool(nsIFramePool** aPool);
  Get the pool for the frame.  The code that deletes frames will need to be
changed to call GetPool.  If it gets back a valid frame pool, it will place it
into the pool instead of deleting it.

NS_IMETHOD Reset();
  When a frame is placed into the frame pool, this method gets called
automatically by the frame pool code.  This method does everything the
destructor for the frame would do.  Poolable frames should be modified to call
Reset inside their destructors.

I'm going to start working on this... troy, if you foresee any trouble with this
approach or want to suggest an alternative, let me know ASAP.  I've been told
this bug is highest priority, which is why I'm jumping right into this.
Status Report for the Powers That Be.

We got about a 10% boost on scrolling by improving titled button's paint
routine.  That fix has been checked in.

In our latest profiling, we now have two more culprits that we can blame for our
problems: the allocation of content node attributes (XULAttribute) and the
new/delete of frames.

Waterson and I are going to split up and attack each of these two tasks
separately.  The first, XULAttributes, involves making an atom table and
addrefing/releasing atoms as needed... in this way the table only contains
attribute values that are actually in use, and it allows attribute values to be
shared by multiple attribute instances (e.g., in the tree widget there are lots
of maintenance atoms, like CLASS that are always the same on the anonymous
content).

The second task is one I'm going to take on, and I'm adding troy to the cc list
to get his input/feedback.  In order to prevent new/delete of frames, I want to
create an nsFramePool object that the tree widget can place frames into (instead
of having them be deleted), so that when you scroll it can just put frames into
the pool (rather than destroying them), and then pull them out of the pool
(instead of mallocing new ones).

This seems pretty simple, and I can do it with no bloat on nsFrame (although it
might be convenient to put the frame pool pointer in the base class).  I propose
adding the following methods to nsIFrame:

NS_IMETHOD SupportsPooling(PRBool& anAnswer);
  Indicates whether or not the frame has been modified to support pooling.  This
defaults to PR_FALSE, and as I modify frames to use pooling, I'll override in
the derived frame classes and start returning PR_TRUE.

NS_IMETHOD SetPool(nsIFramePool* aPool);
  Gives a frame a frame pool to use.

NS_IMETHOD GetPool(nsIFramePool** aPool);
  Get the pool for the frame.  The code that deletes frames will need to be
changed to call GetPool.  If it gets back a valid frame pool, it will place it
into the pool instead of deleting it.

NS_IMETHOD Reset();
  When a frame is placed into the frame pool, this method gets called
automatically by the frame pool code.  This method does everything the
destructor for the frame would do.  Poolable frames should be modified to call
Reset inside their destructors.

I'm going to start working on this... troy, if you foresee any trouble with this
approach or want to suggest an alternative, let me know ASAP.  I've been told
this bug is highest priority, which is why I'm jumping right into this.
I like the concept okay, but I wonder if the issue is the time it takes to
create the frames itself (i.e., allocating the memory) or is it style resolution
and creating of the style contexts?

It would be nice to have some additional breakdown on where the time is spent.

If it is the time to allocate the memory, then you can use memory arenas or a
similar technique to reduce the allocation ovrehead. If style is the big
culprit, then that problem can be attacked.

I'm not thrilled with adding all those member functions to nsIFrame, though. The
goal is to make nsIFrame less complicated rather than more complicated, and now
we're adding four new member functions that are only needed for the tree view.

We absolutely can not add any new data members to nsFrame. We just went through
four weeks of work to reduce the size of the frame model, and I would hate to
see us regress

Why do we need SuppportsPooling()? Ideally the frames that are pooled would not
even know that it is happening. Are you thinking they would clean up some of
their state (it seems like that is what Reset() is for)? If they did, then when
would they recreate their state? I don't see a function that gets called to
re-initialize them when inserting them back in the frame hierarchy. I hope we
wouldn't call Init() again, because that is one-time initialization.

We create frames in frame construction and so can't we do pooling there?

As far as GetPool() and SetPool() go, can't these be done as frame properties
instead? I don't think we want API functions for this.
Yes, we profiled it, and new/delete together are beating out style resolution.

I talked about this with evaughan, and I think we can avoid altering nsIFrame
(or at the very least minimize changes to it) with some cleverness.

How about this?

On a per-derived class basis (e.g. nsTitledButtonFrame) we selectively implement
pooling.  So for example, on a titledbutton, we could give the class a static
list of pooled titled button frames that it would maintain.

The NS_NewTitledButtonFrame could take an additional parameter (e.g., usePool),
and it would then go to the list of recycled frames first before trying a new.

I'm not sure exactly where frames are deleted, but it would then be a matter of
getting the recycled list for the derived class frame, and if one exists it
would be handed back.  This API could be a different interface that the frame
impl (nsTitledButtonFrame) could implement.. .e.g., getFramePool.

I'm not a C++ whiz, but couldn't the derived frame do the "adding to the pool"
op inside a custom delete operator and then not really delete the object?  Could
it do the same thing with a custom new operator?

In this example, I think you'd avoid altering nsIFrame.

Also, nsIFramePool doesn't need to exist as its own interface. It could just be
a void array or list.
After talking to Troy...

Even simpler (rather than keeping the objects around) is to let them be deleted
and just use an arena so that allocation of new objects would be fast.

I think I'm going to try this first, since I think it would give us most of the
win that we need.
Whiteboard: [PDT+][Perf][PR1] Open dep bug has 11/26 fix date. → [PDT+][Perf][PR1] 12/10
Added tentative fix date.
Need to split this bug into two distinct reports: one for line by line scrolls
(used in scrolling with cursor, msg delete, goto Next message operations) and
one for pgUp/pgDown.
Depends on: 20724
The frame recycler and arena are checked in.  I'd be interested in doing another
profile now, waterson, to see how much it helped.

I should say that on Windows at least I think the scrolling is dogfood ready.
I'd like to know who considers the scrolling unacceptable for dogfood and on
what platforms.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
should we mark this fixed and open new bugs for any outstanding
and remaining issues.   marking it fixed helps it to get on the
testing radar.
FYI
Build 1999120808M12:Linux,Mac
Build 1999120712M12: NT4

Overall the scrolling is much faster on my systems with the current builds
versus builds in July. Peter, how is it working on your Mac?
On my Mac, G3/400, scrolling in the thread pane has improved.  It usable for me.
It's not as noticable or painful as when you initially reported this bug.  When
I compare this to 4.7, seamonkey is still slower.

On my Pentium 166 running win98 and pentium 200 running RedHat 6.1, the scolling
performance is usable. It appears on all platforms that the scroll speed is
still slower than scolling in the message pane.
Blocks: 21564
Status: RESOLVED → VERIFIED
Verified Fixed. Scrolling is much faster now.
A seperate bug has been opened to track the smoothness while scrolling in the
thread pane (bug# 21592)
Try scrolling through a newsgroup thread pane with a thousand messages. It may
be faster than it was, but it's still an endurance test. (using 12-15 commercial
build on linux 6.0 on my fastest machine -- 400mhz?)
No longer blocks: 21564
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.