Closed Bug 51938 Opened 24 years ago Closed 20 years ago

Getting the frame for a window coordinate is inefficient (GetFrameForPoint)

Categories

(Core :: Layout: Block and Inline, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: sfraser_bugs, Assigned: roc)

References

()

Details

(Keywords: perf, Whiteboard: [dogfood+])

Attachments

(3 files, 4 obsolete files)

Select and mouseover code require that we get the nsFrame that 'owns' a point on 
the screen an awful lot, and the process of getting the correct frame is very 
inefficient. Quantify data for selecting text in large documents (e.g. LXR blame 
documents) show that nsFrame::GetFrameForPoint() is being called hundreds of 
thousands of times for each move or click event that comes in, and this takes 
considerable time. Even processing mouse move events on the sample page (see URL) 
eats up 100% of the CPU time on a Windoze box.

This is a serious performance problem for selection.
See bug 49772 which describes the selection problems to which this bug 
contribute.
Keywords: dogfood, perf
Blocks: 49772
marking [dogfood+] per karnaze.
Status: NEW → ASSIGNED
Whiteboard: [dogfood+]
Adding kin@netscape.com to cc list.
One way to solve this would be to keep line boxes sorted by their y-coordinate
once we passed a threshold (say, 100 line boxes in a block frame). In this case,
we'd use the sorted data structure to find reasonable points to start and stop
searching for a frame that contains the point. Does this seem reasonable?
If you do something like that you have to be careful of line boxes that contain
blocks that contain floats or absolutely positioned elements (? - they may not
be a problem b/c of views) or blocks with negative margins.
Completely off the wall idea. As we render into the back buffer, or to the 
screen, whatever, we keep a map for each pixel and what frame rendered it. A 
classic memory vs speed trade off.

Probably a post 6.0 thing, but I thought I'd throw it out there.
For *each pixel*? Isn't that rather a lot of memory?
You don't have to keep an entry for each and every pixel, just regions/rects.
And even if you did, 32bits per address is the same as the 32bits per pixel, so
it is just like having a second buffer. Is that a lot of memory? Yes. In
compairison to our footprint? No comment.

The idea is that you can make a much smaller search space because you already
know what frame a pixel corresponds to since you did the work to put it there in
the first place.
It looks like nsContrainerFrame::GetFrameForPointUsing() does most the heavy
lifting, and it does a simple depth-first search.  If instead of this, it tried
to be smart about which child to examine, it might speed up considerably.  We
know the bounding rect of each child, and I think we have a flag on each child
that says whether it's children all live within that rect or whether they can
extend outside of it (which I think it rare.)  So, we should be able to make a
good guess about which child is likely to contain the point, rather than doing
the blind tree walk.

It might be reasonable to keep an auxillary data structure to facilitate this,
but offhand I can't think of anything that adds considerable value to the
existing frame tree. The space manager exists to map regions to data, so if we
wanted to try saari's suggestion, the space manager could be the foundation for
mapping regions to frames.
We already use the flag that says whether frames have outside children.  
See, for example, nsContainerFrame::GetFrameForPoint, which begins:

244   PRBool inThisFrame = mRect.Contains(aPoint);
245 
246   if (! ((mState & NS_FRAME_OUTSIDE_CHILDREN) || inThisFrame ) 
) {
247     return NS_ERROR_FAILURE;
248   }
Yo, Chris: if you're busy, I'm happy to take this one from your plate. I've done 
this code before in prior layout systems.
go for it.
Assignee: waterson → rickg
Status: ASSIGNED → NEW
Ooh-wee! Let's see what I can do...
Status: NEW → ASSIGNED
Buster and I discussed the frame system (again), and concluded that there may be 
cases where a child is floated, extending beyond it's parent/grandparent. There 
are bits (which may not be getting propagated) which indicate this case. In a 
nutshell -- it's very possible that layout isn't yet smart enough for a solution 
based on this state to succeed.

Therefore, I'm proceeding with a sneaky hack (I can sense the snide remarks 
forming already). I'm going to build a tiny little MRU frame cache, upon the 
premise that there is locality of scope to selection and targeting events. 
The hit rate will be a function of 1) cache size; 2) locality of event.
I'll post more after I've tried it.
The bit that says a frame has outside children *does* work.  If it didn't, there
would be a lot of cases where events didn't get through, because we use that bit
now.  (What we don't know is the extent of the overflow.)
Trying cache now...
The simple cache approach looks promising so far (on visual tests); I'm trying 
to run quanity now to prove the performance improvement. 
*** Bug 53676 has been marked as a duplicate of this bug. ***
Add me to cc: 

This impacts me in a very noticable way when a <textarea> multiline textfield 
has LOTS of lines in it.

Adding a printf("blah"); to nsFrame::GetFrameForPoint() showed it just going 
through time and again and again and again and again...
Quantify shows a 2x speedup in the most trivial case. It's my hope that we'll do 
even better on more complex pages (just guessing though). 

I've handed the (rather simple) diffs off to attinasi to review and stress test 
(he's more familiar with frame code then I am).
Assignee: rickg → attinasi
Status: ASSIGNED → NEW
Accepting: I'll be testing Rick's fixes for starters.
Status: NEW → ASSIGNED
I have the code changes that Rick put together, however I may not get to this
very soon due to higer prioroty RTM+ bugs in my list, so if somebody else has
time and wants to shepard this through, please let me know and I'll happliy send
the package over to you to finish testing and shepard throught the approval /
checkin process.
'Cause he forced me, back to Rick.
Assignee: attinasi → rickg
Status: ASSIGNED → NEW
My TEXTAREA example seems to be MUCH improved since my last post (tried with 
current tree sources).

I have not tried the printf() debug again, but I would almost dare to say my 
problem (as observed) has been fixed.

Have any changes landed from this bug ?  Status anyone ?

Has anyone else tried their test set lately ?



check url from bug 52005, "poor performance scrolling large plain text files":
ftp://ftp.funet.fi/pub/doc/dictionaries/Finnish/words.finnish

on this page, mouse movements generate an _extraordinary_ amount of
GetFrameForPoint() calls. i put a printf in that function to see how often it
gets called, and unlike on other pages i tried, on this large text file, xterm
can't even keep up. you really want to check that out, it's quite drastic.... :I
This bug was marked to be fixed in a previous milestone but it didn't get fixed 
properly. Nominated for beta1.
Keywords: nsbeta1
perf issues bubbling to the top.
the mru cache never got checked in,and unfortunately, rickg and marc both seem
to have lost the code.  :(
so I'm starting over with that idea
Assignee: rickg → buster
Severity: normal → major
OS: Mac System 8.5 → All
Priority: P3 → P2
Target Milestone: --- → mozilla0.9
In the future, please to be attaching the proposed patches to the bug report.
that's ok, my new implementation is undoubtedly better than whatever hack those
two clowns cobbled together  :)

fix to be attached soon.  testing and measuring now...
Status: NEW → ASSIGNED
I have the cache for GetFrameForPoint().  Attaching diff.  attinasi, please
review.  waterson, please sr.  still a work in progress, but it is currently
good enough to remove GetFrameForPoint() as an issue in perf measurements.
Attached patch diffs for new cache (obsolete) — Splinter Review
Could you attach unified diffs?
What dbaron said! ;-)

From what I could read...

- In the MRU cache's Search() method, don't you want to move the `found frame'
  to the front of the cache?

- How did you arrive at `20' for the magic number? Seems a bit high to
  me, mostly because that's a lot of deque entries to grovel through on
  a miss...
the diffs I've attached greatly reduce the cost for a hit.  With a cache of size
10 (approximately 200 bytes), I consistently get a hit rate of 50-60% for simple
mouse and selection actions over text and standard content blocks.  But, this
gives us no improvement whatsoever in the case where the point is over the "dead
space" of a container.  That is, when the point is not over a leaf frame, the
cache is useless.
Summary: Getting the frame for a window coordinate is inefficient → Getting the frame for a window coordinate is inefficient (GetFrameForPoint)
So does this cache make the assumption that if a point is within a cached leaf
frame, then that leaf frame is the correct result?  That's a bad assumption.
about to attach -u diffs.

concerning Chris' questions:
- In the MRU cache's Search() method, don't you want to move the `found frame'
  to the front of the cache?

The cache as a FIFO queue, and the "found frame" is being put at the "back", so
that it will live the longest.  Is that what you meant?  You can verify that by
running MRUFrameCache::SelfTest() and looking at the console output.  For example:

added first MRU_FRAME_CACHE_SIZE items
item 0 = 00000001 (0, 0, 10, 10)
item 1 = 00000002 (0, 10, 10, 10)
item 2 = 00000003 (0, 20, 10, 10)
item 3 = 00000004 (0, 30, 10, 10)
item 4 = 00000005 (0, 40, 10, 10)
item 5 = 00000006 (0, 50, 10, 10)
item 6 = 00000007 (0, 60, 10, 10)
item 7 = 00000008 (0, 70, 10, 10)
item 8 = 00000009 (0, 80, 10, 10)
item 9 = 0000000A (0, 90, 10, 10)
added n+1th item
item 0 = 00000002 (0, 10, 10, 10)
item 1 = 00000003 (0, 20, 10, 10)
item 2 = 00000004 (0, 30, 10, 10)
item 3 = 00000005 (0, 40, 10, 10)
item 4 = 00000006 (0, 50, 10, 10)
item 5 = 00000007 (0, 60, 10, 10)
item 6 = 00000008 (0, 70, 10, 10)
item 7 = 00000009 (0, 80, 10, 10)
item 8 = 0000000A (0, 90, 10, 10)
item 9 = 0000000B (0, 100, 10, 10)
added n+2th item
item 0 = 00000003 (0, 20, 10, 10)
item 1 = 00000004 (0, 30, 10, 10)
item 2 = 00000005 (0, 40, 10, 10)
item 3 = 00000006 (0, 50, 10, 10)
item 4 = 00000007 (0, 60, 10, 10)
item 5 = 00000008 (0, 70, 10, 10)
item 6 = 00000009 (0, 80, 10, 10)
item 7 = 0000000A (0, 90, 10, 10)
item 8 = 0000000B (0, 100, 10, 10)
item 9 = 0000000C (0, 110, 10, 10)


- How did you arrive at `20' for the magic number? Seems a bit high to
  me, mostly because that's a lot of deque entries to grovel through on
  a miss...

Yeah, the comment by the number says:
#define MRU_FRAME_CACHE_SIZE 10  // fairly arbitrary, needs to be tuned

Notice I have it set at 10 now.  It's easy to fool around with the markup and
the user action (this is, after all, an optimization primarily targeted at
speeding up interactive mouse events) to get cache hit rates all over the board.
 By far, the biggest payoff is at cachesize=1, since mouse move events come in
fast enough for us to see a hit rate of nearly 50% based on just remembering the
latest hit.  That's true when we consider only mouse events over leaf elements,
see my previous comments for the problems of "dead space" points.
David:  I haven't yet found a case where it isn't true.  But I'm very early on
in my testing.  Please elaborate, what case are you thinking about?  Positioned
elements? 
buster: typically an MRU algorithm will promote items that it finds in the cache 
so that they tend to stay in the cache. I don't think that Search() does this, 
does it?
no, not yet.  maybe never, it might be that FIFO is good enough.  still testing.

good catch, though.  I should have documented that, especially given the name I
chose for the class!
Just for reference, bug 26785 used a similar testcase.
this cache works great for simple documents.  as david points out, it fails
miserably for documents with overlapping elements, as may be caused by negative
margins or positioned elements.

as a first step, I will look into what it would take to determine whether the
document contains any of these conditions, basically a bool that says whether to
use the cache nor not.  The most expediant way to know this might be a hook in
the style system, so if the style system ever resolves a rule that might cause
an overlap, we just don't use the cache (regardless of whether that rule is ever
actually ever used by an instantiated element.)  This would give us the
performance boost we're looking for on the vast majority of pages in the short
term, and we could investigate alternatives post-6.5.
Another approach to speeding up GetFrameForPoint might be to add a state bit on
nsLineBox indicating whether there were overflowing children and then overriding
GetFrameForPointUsing on nsBlockFrame (for the aList==null case, and use the
inherited one for the others).

This would avoid descending into 99%+ of the lines in cases with a single block
with a huge child list (like the one above), which are probably the only cases
where we have significant GetFrameForPoint performance problems.
taking
Target Milestone: mozilla0.9 → mozilla0.9.1
really taking.
Assignee: buster → waterson
Status: ASSIGNED → NEW
Keywords: patch
Status: NEW → ASSIGNED
I wonder if there's something weird going on in the test case for this bug, like
every frame having 6 twips overflow...
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla1.0
Blocks: 104166
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Attachment #21692 - Attachment is obsolete: true
Target Milestone: mozilla1.0.1 → Future
*** Bug 235517 has been marked as a duplicate of this bug. ***
roc had some ideas about this in bug 235517:

  In particular, can't we make nsBlockFrame::GetFrameForPoint to iterate through
  the lines, scanning the frames within a line only if the point is inside the
  line's combinedArea? I bet that would speed things up a lot. Actually I'm sure
  this would work because that's how nsBlockFrame::Paint works.

  If we were really cool we would keep a bit in each nsBlockFrame, and set it to
  true if we detect that the sequence of 'line->combinedArea.y's is
  nondecreasing and the sequence of 'line->combinedArea.YMost's is
  nondecreasing. We would also cache somewhere a pointer to the most recently
  accessed line. Then we can quickly find the complete set of line(s) for a
  given point by starting at the cached pointer and searching forwards or
  backwards. I remember at one time thinking this could speed up painting on
  large pages too.

[editorial note -- the painting issue is bug 52005]
Blocks: 52005
Component: Layout → Layout: Block and Inline
Priority: P2 → --
Target Milestone: Future → ---
Attached patch Implement by-line event dispatch (obsolete) — Splinter Review
This implements the first part of my suggestion. It's quite straightforward and
seems to work correctly.

Any suggestions for how to measure the performance effect of this?
BTW, a note about the cursor idea I proposed in bug 52005 and here:

It's probably not hard to do. We make the line cursor a frame property. We keep
a state bit that tells us quickly whether we have such a property. When there is
no property, Paint or GetFrameForPointUsing iterate through all lines, just like
they do now. During this iteration they also check whether the line boxes
combinedArea.ys and combinedArea.YMosts are nondecreasing. If so, and there are
many lines (say more than 50), then they set the line cursor property to some
value, say the first line. right?) When the property is available (and thus the
nondecreasing invariants hold), Paint and GetFrameForPointUsing will locate the
lines they're interested in based on coordinates by searching from the cursor
forward or backward and storing the last line they touch back into the cursor
property. Reflow just always clears the property. (We can't create or destroy
lines, or change their combinedareas, except during reflow and frame teardown,
right?)

This would make for fairly simple code and I think negligible space and time
overhead.
I just tried this patch on the testcase in bug 235517 (seeing how many profiler
hits I got in a minute of moving the mouse, so it's kinda crude).  I see at best
a 10% improvement and the cpu still gets pegged when I move the mouse; with your
patch 80% of the profiler hits are in (not under)
nsBlockFrame::GetFrameForPointUsing.  So the real problem is still the O(N) loop
over all lines....
Attached patch line cursor cache (obsolete) — Splinter Review
This patch implements the previous stuff plus adds the line cursor cache that I
suggested. Seems to work OK once I explicitly skipped lines with empty
combinedAreas when determining whether the invariant holds. My tests seem to
show that it's effective on the Postgres manual testcase; instead of scanning
>18K lines in the main DIV on every mouse move, we usually scan 3.

bz, would you mind checking out the profile performance of this one? If it
looks good I'll finish the patch by using/setting the cursor in
nsBlockFrame::PaintChildren.
Assignee: waterson → roc
Attachment #142397 - Attachment is obsolete: true
With that patch on the testcase in bug 235517 (same test as in comment 53 -- 1
minute wall-clock time) I see a factor of 6 reduction in the number of profiler
hits (since this is a non-realtime profile, that's actually meaningful).    The
top of the flat profile now looks like:

Total hit count: 322
Count %Total  Function Name
 10   3.1     nsXULElement::HandleDOMEvent
  8   2.5     nsQueryInterface::operator()
  7   2.2     __libc_poll
  6   1.9     SearchTable
  5   1.6     _end
  5   1.6     __pthread_alt_unlock
  5   1.6     nsRuleNode::GetStyleData

and there are only 4 total hits under nsBlockFrame::GetFrameForPoint.

So certainly looks much better.  ;)  I haven't looked at the patch itself; just
applied it blindly and profiled.

Once we have a finalized patch, we should file some followup bugs on what's left
after this one is fixed -- of those 322 hits 154 are handling mousing on and off
anchors (and 133 of those are updating the status bar!).
Excellent. I'll finish the patch shortly.

Will we really need to optimize this further? If you're waving the mouse around
I'd expect to spend some time processing events and updating the status bar.
Sure.  There are just some issues that I wanted to look at (like the fact that
setting the status flushes pending notifications).  Probably not a big deal
either way... ;)
Attached patch v2 (obsolete) — Splinter Review
Here's the version which optimizes painting as well.
Comment on attachment 143083 [details] [diff] [review]
v2

let me know if you can handle this review
Attachment #143083 - Flags: superreview?(bzbarsky)
Attachment #143083 - Flags: review?(bzbarsky)
Note: if we ever find important test cases where this doesn't work because the
line combinedArea.ys/ymosts decrease at some non-empty line (e.g., because some
line has some relative-positioned stuff), then we can still save the day. One
way would be to actually grow the line combined areas as necessary to force that
invariant to hold.
*** Bug 52005 has been marked as a duplicate of this bug. ***
roc, I'm not sure I understand how lines, blockframes, out-of-flows all interact
well enough to review this.... I could probably sr, but dbaron should probably
do the r=.
Comment on attachment 143083 [details] [diff] [review]
v2

no problem. reassigning to dbaron.
Attachment #143083 - Flags: superreview?(dbaron)
Attachment #143083 - Flags: superreview?(bzbarsky)
Attachment #143083 - Flags: review?(dbaron)
Attachment #143083 - Flags: review?(bzbarsky)
Comment on attachment 143083 [details] [diff] [review]
v2

OK, I did a little self-improvement... (and asked roc what was up with the
check for an empty combined area ;) )

>Index: layout/html/base/src/nsBlockFrame.cpp
> nsBlockFrame::PaintChildren(nsIPresContext*      aPresContext,

>+    for (line_iterator line = mLines.begin(cursor);
>+           line != line_end;

Weird indent there.

>+        if (lineArea.Intersects(aDirtyRect)) {

This if/else statement is duplicated in the "no cursor" case.  Perhaps it
should move into a helper function?  That can be a nice inline function and all
if we want, but it's bad to have identical code in two different spots...

>+    for (line_iterator line = begin_lines();
>+           line != line_end;

Weird indent.

>+        if (lineArea.y < lastY
>+            || lineArea.YMost() < lastYMost) {
>+          nonDecreasingYs = PR_FALSE;
>+        }
>+        lastY = lineArea.y;
>+        lastYMost = lineArea.YMost();

I assume there is no really good reason to wrap all that in a check that
nonDecreasingYs is still true since in most cases it _will_ be true and all
that is pretty cheap, right?

>+nsLineBox* nsBlockFrame::GetFirstLineContaining(nscoord y) {

This seems to assume that "y" is within the YMost of one of our lines (hence
that it's in our rect).  We should probably assert that up front.

>+nsBlockFrame::GetFrameForPointUsing(nsIPresContext* aPresContext,

>+  nsIFrame *hit;
>+  nsPoint tmp;

Move declarations to first use?  Eg move the decl of tmp down to where the
MoveTo is right now?

>+        if (lineArea.Contains(tmp)) {

Again, this block is identical to the one in the "no cursor" case...

>+    PRBool nonDecreasingYs = PR_TRUE;

I really wonder whether we can pull the logic for all this out into a function
that will get that inner bit (the part starting with if
(lineArea.Contains(tmp)) as a function pointer param or something.... I guess
that's hard because here we actually need to return a value.  <sigh>.

>Index: layout/html/base/src/nsBlockFrame.h

>+  // Get the lines containing y-coord 'y', or nsnull if you must search
>+  // all lines. If nonnull is returned then we guarantee that the lines'
>+  // combinedArea.ys and combinedArea.yMosts are non-decreasing.
>+  nsLineBox* GetFirstLineContaining(nscoord y);

Comment should probably say "Get the first line containing..."

r+sr=bzbarsky with the changes to factor out shared code and other nits
addressed.
Attachment #143083 - Flags: superreview?(dbaron)
Attachment #143083 - Flags: superreview+
Attachment #143083 - Flags: review?(dbaron)
Attachment #143083 - Flags: review+
One other thing.  The line

>+const int MIN_LINES_NEEDING_CURSOR = 20;

In that patch is inside a DEBUG ifdef.  I don't think you want that.  I _do_
think you want to make that a static const int, though.
> I assume there is no really good reason to wrap all that in a check that
> nonDecreasingYs is still true since in most cases it _will_ be true and all
> that is pretty cheap, right?

That's my guess.

> This seems to assume that "y" is within the YMost of one of our lines (hence
> that it's in our rect).  We should probably assert that up front.

That's not necessarily true (e.g., a DIV with CSS width and height but a small
number of small lines), but it doesn't matter. If y is below the YMost of the
last line, we'll leave the cursor at the last line and return that. The loops
that use the cursor will still work fine. I'll comment GetFirstLineContaining to
indicate that sometimes it may return the last line even if that line doesn't
contain the y.

> I really wonder whether we can pull the logic for all this out into a function
> that will get that inner bit (the part starting with if
> (lineArea.Contains(tmp)) as a function pointer param or something.... I guess
> that's hard because here we actually need to return a value.  <sigh>.

Yeah, that's a bit of a hassle. Also lots of parameters need to be threaded
down into the enclosed code. My inline functions that I created to extract the
common inner code are PaintLine with 8 parameters and GetFrameFromLine with 6
parameters.

I'll attach the patch for checkin.
roc, see comment 66....
Oops, thanks for that catch. Fix checked in :-)
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: