Closed Bug 74656 Opened 24 years ago Closed 20 years ago

Optimizations possible in AppendLines

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: hyatt, Assigned: shaver)

References

Details

(Keywords: perf)

Attachments

(17 files)

use dhash for frame properties 24 years ago Mike Shaver (:shaver emeritus) 15.95 KB, patch		Details \| Diff \| Splinter Review
Cache last-line and last-child for block frame 24 years ago Mike Shaver (:shaver emeritus) 4.34 KB, patch		Details \| Diff \| Splinter Review
Add one-entry cache for frame properties. 24 years ago Mike Shaver (:shaver emeritus) 7.24 KB, patch		Details \| Diff \| Splinter Review
Better (fixed?) block-frame caching patch 24 years ago Mike Shaver (:shaver emeritus) 4.80 KB, patch		Details \| Diff \| Splinter Review
BlockFrame: Only use frame property when number of lines exceeds $BLOCK_FRAME_LINE_CACHE_THRESHOLD 24 years ago Mike Shaver (:shaver emeritus) 9.26 KB, patch		Details \| Diff \| Splinter Review
Even _waterson_ will like this one! 24 years ago Mike Shaver (:shaver emeritus) 9.69 KB, patch		Details \| Diff \| Splinter Review
once more, with feeling: handle the invalidated-cache case correctly 24 years ago Mike Shaver (:shaver emeritus) 8.54 KB, patch		Details \| Diff \| Splinter Review
full patch, sorry for the mixup 24 years ago Mike Shaver (:shaver emeritus) 9.50 KB, patch		Details \| Diff \| Splinter Review
all the pieces, none of hyatt's, I'm such a loser 24 years ago Mike Shaver (:shaver emeritus) 9.86 KB, patch		Details \| Diff \| Splinter Review
msgcompose log before patch 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.54 KB, text/plain		Details
msgcompose log after patch 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.92 KB, text/plain		Details
Output with threshold env variable at 1 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.55 KB, text/plain		Details
Output with threshold env variable at 64 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 3.05 KB, text/plain		Details
Output with threshold env variable at 128 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.55 KB, text/plain		Details
Output with threshold env variable at 16 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.55 KB, text/plain		Details
Matrix of all the values, stripped out the mindless assertions. 24 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 2.79 KB, text/plain		Details
gprof output from nsBlockFrame::AddFrames 23 years ago Tomi Leppikangas 1.49 KB, text/plain		Details

David Hyatt

Reporter

Description

•

24 years ago

Frame lists need to keep a pointer to their lastChild.  Because they don't do
this, content insertion and appending is preposterously slow (especially when a
large # of insertions/appends are done).

Chris Waterson

Comment 1

•

24 years ago

I foggily recall that we hashed through this before and decided it would be a 
Bad Thing to add more pointers to nsContainerFrame. Would it be possible to 
detect that the child list has grown past a certain limit (say, 16 children) and 
add a frame property?

David Hyatt

Reporter

Updated

•

24 years ago

Status: NEW → ASSIGNED

Summary: 20% of reply time spent in O(n^2) LastChild retrieval in frame code → 10% of reply time spent in O(n^2) line walking algorithm in nsLineBox

Target Milestone: --- → mozilla0.9.1

David Hyatt

Reporter

Comment 2

•

24 years ago

nsLineBox needs to be improved.  The implementation is a study in inefficiency.

David Hyatt

Reporter

Comment 3

•

24 years ago

I can hack around the frame problem, but nsLineBox has to be patched up.  Its
implementation is ridiculous.

Hixie (not reading bugmail)

Updated

•

24 years ago

Keywords: perf

QA Contact: amar → waterson

Marc Attinasi

Comment 4

•

24 years ago

nsLineBox has to be rewritten. In fact, how about nsBlockFrame too while we are
at it? Considering where these live in the food-chain, I think we would do well
to get them as fast and clean as possible.

Chris, I like your idea of creating a lastChild pointer when the child list gets
long (I was thinking much longer than 16, but it could be tuned). Isn't
accessing frame properties pretty slow though? I know accessing the View for a
frame is via a frame property and it shows up occassionally on profiles or
really large documents, which was initially surprising to me. Also, would the
lastChild property have to be per-child-list, or would it just be for the
default child list?

Chris Waterson

Comment 5

•

24 years ago

(Tangent Alert.) Accessing frame properties should be O(1), but it isn't, thanks
to troy's baloney-headed nsDST. I think (but haven't proven) that although
intended to be O(log n), it degenerates because of the way it uses the pointer
bits. (Using pointer bits to generate a tree is silly. If you do it
left-to-right, you're screwed, because the high bits are likely to all be the
same: locality. If you go right-to-left, you're screwed, because the lowest bits
are all going to be very similar: alignment.) Anyway, I've been looking for an
excuse to replace it with pldhash. Point me to a profile!

David Hyatt

Reporter

Comment 6

•

24 years ago

I can show you plenty of profiles where GetPrimaryFrameFor shows up as a
significant amount.  It's obviously *not* O(1) because of the dumbass desire to
not use a hash table.

Edward Kandrot

Comment 7

•

24 years ago

You correct, we did hash through this before, a few times.  I pointed this out
in August and in January.  Both times, all involved thought that the extra bytes
were not worth the O(n^2) (I have the email logs, in case anyone wants the info
from that heated debate).  I disagreed then, and am I glad to see that people
are finally seeing the light (even if I am not credited with the find anymore).

Go for it, fix this problem (there are a few O(N^2) pieces of code, all of which
will fall at some point, I hope).

Chris Waterson

Comment 8

•

24 years ago

hyatt: recall that GetPrimaryFrameFor() often needs to search for frames. Not
all frames are hashed (er, dst-ed) from the get-go. It would be interesting to
see how much of GetPrimiaryFrameFor() is spent in nsDST::Search() (or whatever).

Mike Shaver (:shaver emeritus)

Assignee

Comment 9

•

24 years ago

I dunno about redoing nsLineBox, but I am right now working on making frame
properties much cheaper (in time and space), and then making nsLineBox use a
frame property to track its lastChild/lastLine.

Hyatt: feel free to assign this to me if you want.

rbs

Comment 10

•

24 years ago

[Just want to re-iterate that if rewriting ns[Block/Inline]Frame is in the air 
anytime, a fundamental change that I suggest to incorporate is bug 64763 so as
to get rid of the cruff within the PerFrameData stuff.]

Joe Francis

Updated

•

24 years ago

Blocks: 22486

Mike Shaver (:shaver emeritus)

Assignee

Comment 11

•

24 years ago

Waterson and I are no longer certain that this will be a space win, but I have a
patch anyway.  I'm going to Do The Math on this stuff shortly to see what our
real-world frame-property memory usage is with the current DST stuff and compare
to a pldhash implementation.

Waterson: as far as the effects of alignment on the bit patterns, it looks like
the DST code tries to skip the lower few bits of the pointer.

Mike Shaver (:shaver emeritus)

Assignee

Comment 12

•

24 years ago

Attached patch use dhash for frame properties — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 13

•

24 years ago

<hyatt> take it off my list!

OK, Dave.

The patch that I'm about to attach WORKSFORME, though I haven't run the layout
block-frame regressions, and I haven't run any profiles yet.  Waterson, honey,
can you profile for me?

Assignee: hyatt → shaver

Status: ASSIGNED → NEW

Mike Shaver (:shaver emeritus)

Assignee

Comment 14

•

24 years ago

Attached patch Cache last-line and last-child for block frame — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 15

•

24 years ago

Oh yeah: I looked at the frame properties stuff, and we usually don't get more
than a few dozen properties.  We _do_ often ask for the same (frame,prop) pair
repeatedly in succession, sometimes more than 900 times (!), so I have a little
patch that adds a one-entry quick cache to avoid consulting the DST for those
cases.  I'm going to clean it up, and make PropertyListFor move the found list
to the head, and then I'll post it here.  It won't change the world, but every
bit counts, right?

Status: NEW → ASSIGNED

Target Milestone: mozilla0.9.1 → mozilla0.9

Mike Shaver (:shaver emeritus)

Assignee

Comment 16

•

24 years ago

Marc: actually ViewProperty was one of the hot spots here, so my little patch
could help those cases a fair bit.  Do you have test cases we should use for
before/after profiling?

Mike Shaver (:shaver emeritus)

Assignee

Comment 17

•

24 years ago

Attached patch Add one-entry cache for frame properties. — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 18

•

24 years ago

The latest patch is for the 1-entry cache I mentioned, which is only tangentally
related to this bug.  If people want, I can break it out into another.

Marc: if you're running profiles on those big pages again, do you want to see if
this helps take GetFrameProperty out of the picture?

I was tempted to make the FrameHashTable stuff use dhash, but I think it's
better to wait until brendan's done with bug 72722 and just convert to
nsHashtable there.

Comments and review solicited for my latest two patches.  (I haven't Done The
Math on memory usage for dhash vs. dst for frame properties yet, so I'm not
ready to put the first one in.)

Mike Shaver (:shaver emeritus)

Assignee

Comment 19

•

24 years ago

Attached patch Better (fixed?) block-frame caching patch — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 20

•

24 years ago

The first block-frame patch didn't invalidate the cache when we removed frames,
which caused quite a bit of trouble.  This one should work much better (I was
editing documents with it for a while before I destroyed my tree).

Chris Waterson

Comment 21

•

24 years ago

Looks pretty good.

- You add two words to each block frame. It'd be better if we could
  conditionally cache this when we knew it'd be a big win (i.e., block
  frame has many lines). (That's my biggest complaint.)

- Not sure why you needed to add GetLastChild(). Couldn't you have just
  used the extant LastChild() method? (Also means less damage to
  nsBlockFrame.h)

- In this part:

  +    if (appending) {
  +      prevSibLine = mCachedLastLine;

  it's not immediately obvious that mCachedLastLine is guaranteed to be
  valid because of your previous call to GetLastChild() to compute
  the boolean ``appending''. You should put a bright red comment here.

- Rename GetLastLine() to ComputeLastLineFrom() or something. Heck you only
  call it from one place. Why does it need to be a function at all?

  Instead of...

  +    // make sure we're (still) tracking the last line
  +    nsLineBox *line;
  +    if (mCachedLastLine)
  +      line = mCachedLastLine;
  +    else
  +      line = mLines;
  +    line = GetLastLine(line);

  how about:

  nsLineBox line = mCachedLastLine ? mCachedLastLine : mLines;
  while (line->mNext)
    line = line->mNext;

- Local custom...

  . names pointers as |nsIFoo* foo|, not |nsIFoo *foo|
  . uses |nsnull| for null pointers, as opposed to bare zero
  . is not sparse on vertical whitespace (e.g., ``We check here'')

Mike Shaver (:shaver emeritus)

Assignee

Comment 22

•

24 years ago

GetLastChild and LastLineFrom are both remnants of earlier revs, with much more
debugging and validation cruft in them.  They can (and should) go.

I'll comment and morph to dominant code style (bad me!).

After I do that (and post another patch), I'll look at using a frame property if
someone can tell me how big ``big enough''is, and whether we just count lines or
lines+children-in-last-line, etc.

Chris Waterson

Comment 23

•

24 years ago

Well, the problem hasn't been noticed except in editor and in a DOM case that 
edward cooked up, that inserted/appended one node at a time. So, that makes me 
believe that mainline layout is not typically suffering here, and is why I'm 
being stingy. :-)

Could you mangle your existing patch you could play with different values of 
`n', and empirically determine ``when you start to notice the evil behavior in 
editor''? If I had to pick a number out of my ass, I'd say something like 32 or 
64 should be a reasonable threshold at which to start caching the last child.

Chris Waterson

Comment 24

•

24 years ago

...and I think we should just count lines. You can count the lines in 
nsBlockFrame::ReflowDirtyLines(), and when the number of lines exceeds `n', set 
a new frame state bit (maybe NS_BLOCK_HAS_CACHED_LAST_CHILD) that tells your 
caching code to keep the properties up-to-date.

Mike Shaver (:shaver emeritus)

Assignee

Comment 25

•

24 years ago

OK, that's a good tip.  I'll whack up some environment setting that gets used to
set `n' and you can try different values with your choice of test sets!

Mike Shaver (:shaver emeritus)

Assignee

Comment 26

•

24 years ago

So, as I work on making all of this stuff tunable, I start to wonder if counting
lines is the right approach at all: aren't we more concerned with the access
model (lots of appends causing lots of LastChild calls, and therefore lots of
list-walking) than the number of lines?  If we just do a few big AppendFrames
calls, we can put together a large document without too much pain, right?

(I also wonder if perhaps we shouldn't just be inverting the list and storing
the _tail_, since appending would seem to be a much more frequent operation than
prepending.  But I haven't thought much about that yet.)

I'll post a patch in an hour or so that only caches the last line -- caching the
last child would cost us an additional two words per caching-enabled block
frame, and waterson says that lines are usually pretty short anyway -- and we
can discuss better heuristics.

Mike Shaver (:shaver emeritus)

Assignee

Comment 27

•

24 years ago

Attached patch BlockFrame: Only use frame property when number of lines exceeds $BLOCK_FRAME_LINE_CACHE_THRESHOLD — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 28

•

24 years ago

Best patch yet!  Set BLOCK_FRAME_LINE_CACHE_THRESHOLD in your environment to
tune the point at which we switch to using a frame property (for last line, not
last child -- I now see that I should rename the frame-state flag).

Seth, Edward, do you want to play with settings here and see what effect they
have on your specific test cases?

rbs

Comment 29

•

24 years ago

collision: 
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsHTMLParts.h#44

+#define NS_BLOCK_HAS_CACHED_LAST_CHILD      0x00100000
 #define NS_BLOCK_SHRINK_WRAP                0x00100000

Edward Kandrot

Comment 30

•

24 years ago

Seems to me to be a rather large patch for what I thought to be a simple change.
 I guess I do not understand why we just do not add the 4 bytes for a tail
pointer and be done with it.  The amount of code needed for this patch and the
time overhead for the extra calculations seem to more than outweigh adding 4
bytes.  Please correct me if I am wrong, for I just want to know why?  Thanks.

Chris Waterson

Comment 31

•

24 years ago

Well, the patch at 04/10/01 07:21 isn't *that* much larger than the patch from 
04/07/01 20:47 (which just keeps a `last' pointer, all the time). And, it's 
probably a bit larger simply because shaver added some stuff to help pick a 
sweet spot.

In defense of not doing it all the time (since I lobbied for it), a large web 
page could have tens or even hundreds of thousands of block frames. So, keeping 
block frames as small as possible is important. We don't run in to this 
performance problem in the ``normal'' course of layout. So why pay the penalty 
in space unless there's a win in time?

Chris Waterson

Comment 32

•

24 years ago

I like it, I like it. Nits below.

+static int CachedLineThreshold = 0;
+

gCachedLineThreshold?

+#ifdef DEBUG
+      fprintf(stderr, "BLOCK_FRAME_LINE_CACHE_THRESHOLD is %d\n",
+              CachedLineThreshold);
+#endif

All kipps other debug printf()'s just go to stdout. (Or is this code just in
there until we figure out what the ``sweet spot'' is?)

+inline void
+nsBlockFrame::SetCachedLastLine(nsIPresShell *aPresShell,
+                                nsIFrameManager *aFrameManager,
+                                nsLineBox *aLine)

Why inline?

+    // If we fail to set it, clear the cached flag, because something
+    // is screwy in the cache and we'd rather be slow than crash.
+    mState &= ~NS_BLOCK_HAS_CACHED_LAST_CHILD;

Is this really necessary or even sufficient? Seems like LastLine() can recover
gracefully without this.

+  if (!mLines)
+    return nsnull;
+  
+  // If we don't have a cache, we walk lines
+  if (!(mState & NS_BLOCK_HAS_CACHED_LAST_CHILD))
+    return nsLineBox::LastLine(mLines);
+
+  nsCOMPtr<nsIFrameManager> frameManager;
+  aPresShell->GetFrameManager(getter_AddRefs(frameManager));
+  if (!frameManager)
+    return nsLineBox::LastLine(mLines);
+  
+  nsLineBox* cachedLine;
+  if (NS_FAILED(frameManager->GetFrameProperty(this,
+                                               nsLayoutAtoms::cachedLastLine,
+                                               0, (void **)&cachedLine))) {
+    return nsLineBox::LastLine(mLines);

Flow of control a bit muddy here. Why not just do nested if()? (It'll only end
up being three deep, and we've got a c-basic-offset of `2'...)

+  // the call to LastChild will ensure that the line cache is up to date
+  PRBool appending = (mState & NS_BLOCK_HAS_CACHED_LAST_CHILD &&
+                      aPrevSibling == LastChild(presShell));
+

Does bitwise-and have precedence over logical-and? I always forget. Parens?

+ * Frame state flag indicating that this frame is using a frame property to
+ * cache its last line and child.  See LastChild below and in nsBlockFrame.cpp
+ * for details.
+ */
+#define NS_BLOCK_HAS_CACHED_LAST_CHILD      0x00100000
+
+/*

what rbs said; this should really be in nsHTMLParts.h (it's not obvious to me
*why*, but that's where the other flags are).

+  void InvalidateLastLineCache(nsIPresShell* aPresShell) {
+    if (!(mState & NS_BLOCK_HAS_CACHED_LAST_CHILD))
+      return;
+    nsCOMPtr<nsIFrameManager> frameManager;
+    aPresShell->GetFrameManager(getter_AddRefs(frameManager));
+    if (frameManager)
+      frameManager->SetFrameProperty(this, nsLayoutAtoms::cachedLastLine,
+                                     nsnull, nsnull);
+  }

Does nested if make flow-of-control simpler?

Chris Waterson

Comment 33

•

24 years ago

Also, when you check in, should we just guess at a sweet spot for
[g]CachedLineThreshold? Say, 64 to start?

David Hyatt

Reporter

Comment 34

•

24 years ago

This seems overly complex to me.  Wouldn't it be simpler to not worry about the 
threshold stuff?

Chris Waterson

Comment 35

•

24 years ago

For the reasons I cite above, I will accept the *very marginal* increase in code
complexity.

Mike Shaver (:shaver emeritus)

Assignee

Comment 36

•

24 years ago

gCachedLineThreshold it is.

Of course it's only there until we find the sweet spot!  Who would leave debug
spew like that in for any length of time? =)

Dunno why inline.  Not inline now!

LastLine _can't_ recover from the case where it has a line cached that is no
longer part of the block frame at all, and if I don't know what the state is in
the DST, I don't want to trust it.  Maybe that can't really happen, but I'm
treading very lightly here, and I had a spare belt that matched my suspenders.

I'll wield the mighty Nested If, yes.

And the mighty parens, just for you!

When (if?) I check in, I'm leaning towards leaving gCachedLineThreshold unset,
but only because I don't understand where the number ``64'' comes from.  Here's
a question I want to ask of you: what %age space-waste is acceptable for this? 
Then I can set gCachedLineThreshold such that the frame property is < n% of the
BlockFrame + its LineBox children.  3%? 5%? 10%?  (Don't cheat and work
backwards from 64!)

New patch coming right up.

Mike Shaver (:shaver emeritus)

Assignee

Comment 37

•

24 years ago

Attached patch Even _waterson_ will like this one! — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 38

•

24 years ago

Anyone interested in this patch?  Anyone dislike it?

Mozilla 0.9 is coming soon, and I want to know if I should just walk away from
this or spend precious end-of-release cycles making it appropriate for checkin.

rbs

Comment 39

•

24 years ago

Looked at the patch and it is good to go. Will give r=rbs so that waterson
can give sr for you to checkin.

Some nits:
=====
+static int gCachedLineThreshold = 0;

Since the whole point of the patch is to improve the perceived perf, maybe a 
non-zero value should be used to start with. It is not going to benefit the 
end-user if 0 is kept (the env variable is undiscoverable). In the meantime,
the hunt for the optimal value could continue.

==========
+  // XXX Use cached last-line data to speed up the line and other checks
+  // XXX below, instead of just invalidating
+  InvalidateLastLineCache(presShell);

Since 'XXX' more often than not implies: a todo, an uncertainty or doubt, a
problem with an edge-case, I will remove the 'XXX' and just say 'This uses 
the...'

Mike Shaver (:shaver emeritus)

Assignee

Comment 40

•

24 years ago

If you can tell me a good number (or, better, answer the acceptable-bloat-%age
question I posed to waterson), I'll gleefully set gCachedLineThreshold to it. 
In the meantime, I was hoping to tell .performance about it and get feedback
from people like varada and stephend about the effects of various settings,
before we make everyone pay the bloat cost.

The XXX does, in fact, indicate a todo: we should use the cached line data to
speed up those other checks, rather than simply invalidating the cache there (as
we do on the line below the comment).

Want to r= it anyway, or wait until we settle the gCachedLineTheshold debate?

rbs

Comment 41

•

24 years ago

r=rbs. I don't see much that can be reliably added to this in this time-frame.

re:threshold, what about trying 64 to start-with as waterson suggested. Any
value >> 1 is okay to me as a start.

As an additional comment as to why it is better in the long run to add these
few lines of code so as not to add the 4 bytes all the time. The block frame is
pervasive and is also used elsewhere as a wrapper frame (e.g., in table) to 
provide functionalities not supported by other frames. Quite often, a block
frame can contain lineboxes with a block frame as sole children. That
contributes to the thousands of block frames that could be around in a page.

Mike Shaver (:shaver emeritus)

Assignee

Comment 42

•

24 years ago

This 64 number seems really popular.  Are there data out there about the
line-count for nsBlockFrames in general?  Do people just like powers of two? 
I'd rather not just guess a number and check it in, because, as we saw with the
reflow interval stuff, it can stay that way for an unpleasantly long time.

rbs

Comment 43

•

24 years ago

The main mistake that was made with the interval stuff is that no non-futured
bug was filed in bugzilla. In order not to repeat the same mistake, a bug should
be entered this time around. As I was writting these comments, I thought of
filing that before completing my comments, but this code is yet to be
checked-in. Also since a shake-up will do most good to layout, this code may
perhaps not appear at some later stage in its present form.

rbs

Comment 44

•

24 years ago

The one-entry cache in the frame manager that is saving from the +900 successive 
requests of the same property is good stuff. The uncertainty about the threshold
is blurring that that one is nice.

Chris Waterson

Comment 45

•

24 years ago

I just debugged an issue where frame-state flags needed to be carried forward 
into a continuing frame (bug 74184). Let me convince myself that this won't 
cause problems if we need to create a continuing frame for the block. (I think 
this'd only happen while printing, which might make it doubly fun to debug!)

As far as picking a ``good value'', I've tried to think of some way to 
statically analyze the problem (to impress brendan, of course), but I can't. So, 
I think that the way to pick a good value is to pick the largest value that 
doesn't empirically create a ``noticable delay'' in the test case that's 
described above on a low-end machine. I've got a crappy machine at work (133MHz 
Win98) that I'll apply the patch on and try to cough up the juice.

Tomi Leppikangas

Comment 46

•

24 years ago

I tryed that patch on linux and i get crash in large page:
http://www.student.oulu.fi/stats/Jan.wwwstats.html

Stacktrace is like this:

#0  0x41ac11e3 in nsBlockFrame::LastLine(nsIPresShell*) (this=0x91bc744,
aPresShell=0x87f1200)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5351
#1  0x41c60db3 in nsBlockFrame::LastChild(nsIPresShell*) (this=0x91bc744,
aPresShell=0x87f1200)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5373
#2  0x41ac130e in nsBlockFrame::AppendFrames(nsIPresContext*, nsIPresShell&,
nsIAtom*, nsIFrame*) (this=0x91bc744, 
    aPresContext=0x86f0bd0, aPresShell=@0x87f1200, aListName=0x0,
aFrameList=0x947d338)
    at ../../../../../layout/html/base/src/nsBlockFrame.cpp:5400
#3  0x41adae6e in FrameManager::AppendFrames(nsIPresContext*, nsIPresShell&,
nsIFrame*, nsIAtom*, nsIFrame*) (
    this=0x86a6398, aPresContext=0x86f0bd0, aPresShell=@0x87f1200,
aParentFrame=0x91bc744, aListName=0x0, 
    aFrameList=0x947d338) at
../../../../../layout/html/base/src/nsFrameManager.cpp:777
#4  0x41b91af7 in nsCSSFrameConstructor::AppendFrames(nsIPresContext*,
nsIPresShell*, nsIFrameManager*, nsIContent*, nsIFrame*, nsIFrame*)
(this=0xbfffe440, aPresContext=0x86f0bd0, aPresShell=0x87f1200,
aFrameManager=0x86a6398, 
    aContainer=0x872eab8, aParentFrame=0x91bc744, aFrameList=0x947d338)
    at ../../../../../layout/html/style/src/nsCSSFrameConstructor.cpp:7596
#5  0x41b93d86 in nsCSSFrameConstructor::ContentAppended(nsIPresContext*,
nsIContent*, int) (this=0x8753310, 
    aPresContext=0x86f0bd0, aContainer=0x872eab8, aNewIndexInContainer=57)
    at ../../../../../layout/html/style/src/nsCSSFrameConstructor.cpp:8142


code at crash is:
5349            // ...then we can use the cache!
5350            nsLineBox* line = cachedLine;
5351            while (line->mNext)            <-- crash here
5352              line = line->mNext;

rbs

Comment 47

•

24 years ago

Weird. In theory, you shouldn't get there at all because gCachedLineThreshold=0. 
Did you set a non-zero value in the env variable? If so, which value?

Tomi Leppikangas

Comment 48

•

24 years ago

I set BLOCK_FRAME_LINE_CACHE_THRESHOLD=64 when it crashes, didnt
try other values.

rbs

Comment 49

•

24 years ago

retracting my r pending further investigations.

Mike Shaver (:shaver emeritus)

Assignee

Comment 50

•

24 years ago

Dammit.  This isn't for 0.9, obviously.

I'll investigate that crash when I'm finished being sick.

Target Milestone: mozilla0.9 → mozilla0.9.1

Mike Shaver (:shaver emeritus)

Assignee

Comment 51

•

24 years ago

Attached patch once more, with feeling: handle the invalidated-cache case correctly — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 52

•

24 years ago

So, here's what was happening: when we remove frames, we invalidate the cache,
by storing a null value in the property.  I guess I expected that a subsequent
retrieval would fail, but that's not the case (I think it was with my
dhash-based frame property code, which overloaded null).

Much better now.  I don't crash on Tomi's page (though it _is_ still
heart-breakingly slow :-((((((((( ).

Mike Shaver (:shaver emeritus)

Assignee

Comment 53

•

24 years ago

(On my happier days, I dream that someone who really cares about this, like
perhaps the mail-compose performance guys, will test it and tell me if I've been
wasting my time and everyone else's.)

Mike Shaver (:shaver emeritus)

Assignee

Comment 54

•

24 years ago

Attached patch full patch, sorry for the mixup — Details — Splinter Review

Mike Shaver (:shaver emeritus)

Assignee

Comment 55

•

24 years ago

Even on the "full patch", I am mortified by my own incompetence.  I managed to
get a bit of hyatt's painting-delay patch in there, and also to forget the
nsLayoutAtomList.h change.

I'm going to go soak my head now, after I post another patch.  Special apologies
to stephend and blake, who I have thwarted repeatedly in their attempts to help
me profile.

Mike Shaver (:shaver emeritus)

Assignee

Comment 56

•

24 years ago

Attached patch all the pieces, none of hyatt's, I'm such a loser — Details — Splinter Review

scottputterman

Comment 57

•

24 years ago

let's get some of those mail compose guys you dream about cc'd on this bug.  It
looks like stephend is helping out with testing perf, but maybe JF or varada can
also comment on whether or not this speeds up reply.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 58

•

24 years ago

I'm going to Quantify this as soon as Rational comes through with my keys.

rbs

Comment 59

•

24 years ago

Wondering if there is any connection with this bug an bug 29584: Exponential 
time to open text files in the Editor.

Chris Waterson

Updated

•

24 years ago

Blocks: 56854

Chris Waterson

Comment 60

•

24 years ago

Is profiling really necessary here? Can't someone just apply the darn patch and 
see if it makes any difference on the test case described in bug 22486?

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 61

•

24 years ago

Attached file msgcompose log *before* patch — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 62

•

24 years ago

Attached file msgcompose log *after* patch — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 63

•

24 years ago

The two output logs I attached showed the results of being unpatched and 
patched, but I haven't yet enabled the pref.  I'll test 
BLOCK_FRAME_LINE_CACHE_THRESHOLD values of 0, 1, 16, 64 and 128, and post logs 
for each run.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 64

•

24 years ago

Attached file Output with threshold env variable at 1 — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 65

•

24 years ago

Attached file Output with threshold env variable at 64 — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 66

•

24 years ago

Attached file Output with threshold env variable at 128 — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 67

•

24 years ago

Attached file Output with threshold env variable at 16 — Details

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 68

•

24 years ago

My machine used to generate this data is a Pentium III 750 with 128 megs of RAM, 
running Windows 2000 SP1 with 1 GIG of virtual memory.  The build was a 
debug trunk build pulled at around noon.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 69

•

24 years ago

Attached file Matrix of all the values, stripped out the mindless assertions. — Details

Chris Waterson

Comment 70

•

24 years ago

Cool, thanks for doing this Stephen. To tabularize this inline:

            
    n       Total  Insertion
    0 (off)  8734        681
    1        8164        678
   16        7939        678
   64        8253        683
  128        8679        689

I presume that the times are given in msec?

How many lines did the message that you replied to have? I was under the 
impression that insertion time contributed much more significantly to the 
overall time to open the window than is indicated by your data. (It looks like 
the delta that this makes to insertion time is in the noise, frankly. But maybe 
you tested with a small message?)

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 71

•

24 years ago

Oh, stone me senseless.  I knew this was for message compose, but didn't realize 
this was intended for replying to messages.  I thought this was just for 
initializing the editor widget.  I'm also assuming that this is milliseconds, as 
http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html doesn't nail 
that down.  What I tested with was actually just a blank document, with a 2.43 
meg attachment.  Chris/Shaver, what size of messages should I be *replying* to 
to test this properly? Let me know, I'll run the tests again.

John Morrison

Comment 72

•

24 years ago

> The build was a debug trunk build pulled at around noon.

Wouldn't it be better to test this way in an optimized build?

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 73

•

24 years ago

John is absolutely right, same procedure as with Quantify, etc, (setting
moz_debug to be null).  So, with the patches in my tree, the right variable
unset, and ready to rock and roll on some more testing, can I receive some
scenarios to help me get this tested properly in anticipation of Mike landing
his patch and helping out mail/news? Thanks.

rbs

Comment 74

•

24 years ago

If I understand that code correctly, the code-section being exercised is when a
_long_ content needs to be broken in several lines. Mail&News provide a good
example for this because the message text has to be broken in several lines in
the content window. But in addition to testing Mail&News, another really good
example is "viewsource". This is because all of the viewsource content is inside
a <pre>...</pre> block. Hence some interesting data can also be obtained by
viewing source of
http://lxr.mozilla.org/seamonkey/source/layout/html/style/src/nsCSSFrameConstructor.cpp. 

But this might be an extreme case and other cases could be found. As far as
testing is concerned, viewsource might be easier to deal with.

Chris Waterson

Comment 75

•

24 years ago

To test this, I created a 6,000 line plain-text message and mailed it to myself. 
Then, I tried to reply-to this message. I ended up with about 31s to reply to 
the message without the patch, and 31s to reply to the message with the patch, 
and with BLOCK_FRAME_LINE_CACHE_THRESHOLD=16. I also tried plain clipboard 
insertion, and that didn't seem to show any difference either.

Maybe something in editor has changed? Anyway, I'll re-profile this and report 
back soon.

Chris Waterson

Comment 76

•

24 years ago

Ok, so I don't have a ton of time to do a detailed analysis of the profile, but 
it looks like nsBlockFrame::AppendFrames() is called _once_, and accounts for a 
tiny fraction of the time. nsBlockFrame::InsertFrames() is called 11K times, and 
accounts for 14% of the time, which this patch won't help (I don't think). (I'm 
profiling clipboard insertion, too...)

There seems to be quite a bit of low-hanging fruit here; I'll attach a text 
version of the profile to the original bug (bug 22486).

Mike Shaver (:shaver emeritus)

Assignee

Comment 77

•

24 years ago

This patch has rotted, and I'm sick of chasing the conflicts in my tree, so I
hereby declare it dormant.  If someone thinks it's worthwhile (and I'm pretty
sure it is for large pages, but hey), they can update it and live the adventure.

Untargetting and updating summary to keep mail-perf folks from suffering under
false hopes.  (Someone may want to remove the dependency, but I'm not going to
ass-u-me.)

Thanks for your time, everyone.

OS: Windows 2000 → All

Priority: -- → P5

Hardware: PC → All

Summary: 10% of reply time spent in O(n^2) line walking algorithm in nsLineBox → Optimizations possible in AppendLines

Target Milestone: mozilla0.9.1 → ---

Chris Waterson

Comment 78

•

24 years ago

mjudge and I were looking at some selection stuff yesterday (bug 62971), and 
stumbled across this thing that kipp allegedly hacked up days before his 
departure. It's a ``line iterator''. Here's how it works: it walks the line list 
once to count the lines, then allocates an array big enough to hold pointers to 
all of them, then walks the line list again to put the pointers into the array. 
Now the lines can be accessed like an array.

Which made me wonder...why aren't the lines just stored in an array? That would 
solve this problem (getting to the last line quickly). And it would also solve 
some of our hit-test headaches (e.g., in selection), since each line knows its 
bounds, lines don't overlap, and each line's y-coordinate increases 
monotonically. (I think, somebody call my bluff if this isn't the case.)

I think that there's very little code (if any) that inserts lines ``in the 
middle'' of the line array -- and that which does ends up throwing away all the 
subsequent lines anyway IIRC. Anyway, something else to think about.

Tomi Leppikangas

Comment 79

•

23 years ago

I did some profiling with this, with long html page time spend
in nsBlockFrame::AddFrames() drop from 1.9sec to 0.02 (5.7%->0.1% total time)

I'll attach interesting parts from gprof output.

Tomi Leppikangas

Comment 80

•

23 years ago

Attached file gprof output from nsBlockFrame::AddFrames — Details

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 81

•

23 years ago

The work on bug 86947 should also fix the nsBlockFrame::AddFrames problem, since
it makes the line list doubly-linked (for other reasons) and searches the list
from the end in the AddFrames case.

Chris Waterson

Comment 82

•

23 years ago

I think dbaron will have fixed this in bug 86947.

Depends on: 86947

Chris Waterson

Updated

•

23 years ago

No longer depends on: 86947

basic

Comment 83

•

23 years ago

so is this fixed?

basic

Comment 84

•

23 years ago

remove self

Markus Hübner

Comment 85

•

23 years ago

What's the current status of this one?

Chris Waterson

Comment 86

•

22 years ago

Well, dbaron fixed the block frame; however, there are other places where it'd
be nice to have a last-child member. For example, see bug 145425.

Boris Zbarsky [:bzbarsky]

Updated

•

21 years ago

Depends on: 233463

Boris Zbarsky [:bzbarsky]

Updated

•

21 years ago

Blocks: 145425

Mike Shaver (:shaver emeritus)

Assignee

Updated

•

20 years ago

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → WORKSFORME

BMO Automation

Updated

•

6 years ago

Product: Core → Core Graveyard

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Layout: HTML Frames → Layout: Images

Product: Core Graveyard → Core

You need to log in before you can comment on or make changes to this bug.