Closed
Bug 51938
Opened 24 years ago
Closed 21 years ago
Getting the frame for a window coordinate is inefficient (GetFrameForPoint)
Categories
(Core :: Layout: Block and Inline, defect)
Core
Layout: Block and Inline
Tracking
()
RESOLVED
FIXED
People
(Reporter: sfraser_bugs, Assigned: roc)
References
()
Details
(Keywords: perf, Whiteboard: [dogfood+])
Attachments
(3 files, 4 obsolete files)
386 bytes,
text/html
|
Details | |
16.72 KB,
patch
|
Details | Diff | Splinter Review | |
19.49 KB,
patch
|
Details | Diff | Splinter Review |
Select and mouseover code require that we get the nsFrame that 'owns' a point on
the screen an awful lot, and the process of getting the correct frame is very
inefficient. Quantify data for selecting text in large documents (e.g. LXR blame
documents) show that nsFrame::GetFrameForPoint() is being called hundreds of
thousands of times for each move or click event that comes in, and this takes
considerable time. Even processing mouse move events on the sample page (see URL)
eats up 100% of the CPU time on a Windoze box.
This is a serious performance problem for selection.
Reporter | ||
Comment 1•24 years ago
|
||
See bug 49772 which describes the selection problems to which this bug
contribute.
Comment 2•24 years ago
|
||
marking [dogfood+] per karnaze.
Status: NEW → ASSIGNED
Whiteboard: [dogfood+]
Adding kin@netscape.com to cc list.
Comment 4•24 years ago
|
||
One way to solve this would be to keep line boxes sorted by their y-coordinate
once we passed a threshold (say, 100 line boxes in a block frame). In this case,
we'd use the sorted data structure to find reasonable points to start and stop
searching for a frame that contains the point. Does this seem reasonable?
If you do something like that you have to be careful of line boxes that contain
blocks that contain floats or absolutely positioned elements (? - they may not
be a problem b/c of views) or blocks with negative margins.
Comment 6•24 years ago
|
||
Completely off the wall idea. As we render into the back buffer, or to the
screen, whatever, we keep a map for each pixel and what frame rendered it. A
classic memory vs speed trade off.
Probably a post 6.0 thing, but I thought I'd throw it out there.
Reporter | ||
Comment 7•24 years ago
|
||
For *each pixel*? Isn't that rather a lot of memory?
Comment 8•24 years ago
|
||
You don't have to keep an entry for each and every pixel, just regions/rects.
And even if you did, 32bits per address is the same as the 32bits per pixel, so
it is just like having a second buffer. Is that a lot of memory? Yes. In
compairison to our footprint? No comment.
The idea is that you can make a much smaller search space because you already
know what frame a pixel corresponds to since you did the work to put it there in
the first place.
It looks like nsContrainerFrame::GetFrameForPointUsing() does most the heavy
lifting, and it does a simple depth-first search. If instead of this, it tried
to be smart about which child to examine, it might speed up considerably. We
know the bounding rect of each child, and I think we have a flag on each child
that says whether it's children all live within that rect or whether they can
extend outside of it (which I think it rare.) So, we should be able to make a
good guess about which child is likely to contain the point, rather than doing
the blind tree walk.
It might be reasonable to keep an auxillary data structure to facilitate this,
but offhand I can't think of anything that adds considerable value to the
existing frame tree. The space manager exists to map regions to data, so if we
wanted to try saari's suggestion, the space manager could be the foundation for
mapping regions to frames.
We already use the flag that says whether frames have outside children.
See, for example, nsContainerFrame::GetFrameForPoint, which begins:
244 PRBool inThisFrame = mRect.Contains(aPoint);
245
246 if (! ((mState & NS_FRAME_OUTSIDE_CHILDREN) || inThisFrame )
) {
247 return NS_ERROR_FAILURE;
248 }
Comment 11•24 years ago
|
||
Yo, Chris: if you're busy, I'm happy to take this one from your plate. I've done
this code before in prior layout systems.
Comment 14•24 years ago
|
||
Buster and I discussed the frame system (again), and concluded that there may be
cases where a child is floated, extending beyond it's parent/grandparent. There
are bits (which may not be getting propagated) which indicate this case. In a
nutshell -- it's very possible that layout isn't yet smart enough for a solution
based on this state to succeed.
Therefore, I'm proceeding with a sneaky hack (I can sense the snide remarks
forming already). I'm going to build a tiny little MRU frame cache, upon the
premise that there is locality of scope to selection and targeting events.
The hit rate will be a function of 1) cache size; 2) locality of event.
I'll post more after I've tried it.
The bit that says a frame has outside children *does* work. If it didn't, there
would be a lot of cases where events didn't get through, because we use that bit
now. (What we don't know is the extent of the overflow.)
Comment 16•24 years ago
|
||
Trying cache now...
Comment 17•24 years ago
|
||
The simple cache approach looks promising so far (on visual tests); I'm trying
to run quanity now to prove the performance improvement.
Comment 18•24 years ago
|
||
*** Bug 53676 has been marked as a duplicate of this bug. ***
Comment 19•24 years ago
|
||
Add me to cc:
This impacts me in a very noticable way when a <textarea> multiline textfield
has LOTS of lines in it.
Adding a printf("blah"); to nsFrame::GetFrameForPoint() showed it just going
through time and again and again and again and again...
Comment 20•24 years ago
|
||
Quantify shows a 2x speedup in the most trivial case. It's my hope that we'll do
even better on more complex pages (just guessing though).
I've handed the (rather simple) diffs off to attinasi to review and stress test
(he's more familiar with frame code then I am).
Assignee: rickg → attinasi
Status: ASSIGNED → NEW
Comment 21•24 years ago
|
||
Accepting: I'll be testing Rick's fixes for starters.
Status: NEW → ASSIGNED
Comment 22•24 years ago
|
||
I have the code changes that Rick put together, however I may not get to this
very soon due to higer prioroty RTM+ bugs in my list, so if somebody else has
time and wants to shepard this through, please let me know and I'll happliy send
the package over to you to finish testing and shepard throught the approval /
checkin process.
Comment 23•24 years ago
|
||
'Cause he forced me, back to Rick.
Assignee: attinasi → rickg
Status: ASSIGNED → NEW
Comment 24•24 years ago
|
||
My TEXTAREA example seems to be MUCH improved since my last post (tried with
current tree sources).
I have not tried the printf() debug again, but I would almost dare to say my
problem (as observed) has been fixed.
Have any changes landed from this bug ? Status anyone ?
Has anyone else tried their test set lately ?
Comment 25•24 years ago
|
||
check url from bug 52005, "poor performance scrolling large plain text files":
ftp://ftp.funet.fi/pub/doc/dictionaries/Finnish/words.finnish
on this page, mouse movements generate an _extraordinary_ amount of
GetFrameForPoint() calls. i put a printf in that function to see how often it
gets called, and unlike on other pages i tried, on this large text file, xterm
can't even keep up. you really want to check that out, it's quite drastic.... :I
Comment 26•24 years ago
|
||
This bug was marked to be fixed in a previous milestone but it didn't get fixed
properly. Nominated for beta1.
Keywords: nsbeta1
Comment 27•24 years ago
|
||
perf issues bubbling to the top.
the mru cache never got checked in,and unfortunately, rickg and marc both seem
to have lost the code. :(
so I'm starting over with that idea
Assignee: rickg → buster
Severity: normal → major
OS: Mac System 8.5 → All
Priority: P3 → P2
Target Milestone: --- → mozilla0.9
Comment 28•24 years ago
|
||
In the future, please to be attaching the proposed patches to the bug report.
Comment 29•24 years ago
|
||
that's ok, my new implementation is undoubtedly better than whatever hack those
two clowns cobbled together :)
fix to be attached soon. testing and measuring now...
Status: NEW → ASSIGNED
Comment 30•24 years ago
|
||
I have the cache for GetFrameForPoint(). Attaching diff. attinasi, please
review. waterson, please sr. still a work in progress, but it is currently
good enough to remove GetFrameForPoint() as an issue in perf measurements.
Comment 31•24 years ago
|
||
Could you attach unified diffs?
Comment 33•24 years ago
|
||
What dbaron said! ;-)
From what I could read...
- In the MRU cache's Search() method, don't you want to move the `found frame'
to the front of the cache?
- How did you arrive at `20' for the magic number? Seems a bit high to
me, mostly because that's a lot of deque entries to grovel through on
a miss...
Comment 34•24 years ago
|
||
the diffs I've attached greatly reduce the cost for a hit. With a cache of size
10 (approximately 200 bytes), I consistently get a hit rate of 50-60% for simple
mouse and selection actions over text and standard content blocks. But, this
gives us no improvement whatsoever in the case where the point is over the "dead
space" of a container. That is, when the point is not over a leaf frame, the
cache is useless.
Summary: Getting the frame for a window coordinate is inefficient → Getting the frame for a window coordinate is inefficient (GetFrameForPoint)
So does this cache make the assumption that if a point is within a cached leaf
frame, then that leaf frame is the correct result? That's a bad assumption.
Comment 36•24 years ago
|
||
about to attach -u diffs.
concerning Chris' questions:
- In the MRU cache's Search() method, don't you want to move the `found frame'
to the front of the cache?
The cache as a FIFO queue, and the "found frame" is being put at the "back", so
that it will live the longest. Is that what you meant? You can verify that by
running MRUFrameCache::SelfTest() and looking at the console output. For example:
added first MRU_FRAME_CACHE_SIZE items
item 0 = 00000001 (0, 0, 10, 10)
item 1 = 00000002 (0, 10, 10, 10)
item 2 = 00000003 (0, 20, 10, 10)
item 3 = 00000004 (0, 30, 10, 10)
item 4 = 00000005 (0, 40, 10, 10)
item 5 = 00000006 (0, 50, 10, 10)
item 6 = 00000007 (0, 60, 10, 10)
item 7 = 00000008 (0, 70, 10, 10)
item 8 = 00000009 (0, 80, 10, 10)
item 9 = 0000000A (0, 90, 10, 10)
added n+1th item
item 0 = 00000002 (0, 10, 10, 10)
item 1 = 00000003 (0, 20, 10, 10)
item 2 = 00000004 (0, 30, 10, 10)
item 3 = 00000005 (0, 40, 10, 10)
item 4 = 00000006 (0, 50, 10, 10)
item 5 = 00000007 (0, 60, 10, 10)
item 6 = 00000008 (0, 70, 10, 10)
item 7 = 00000009 (0, 80, 10, 10)
item 8 = 0000000A (0, 90, 10, 10)
item 9 = 0000000B (0, 100, 10, 10)
added n+2th item
item 0 = 00000003 (0, 20, 10, 10)
item 1 = 00000004 (0, 30, 10, 10)
item 2 = 00000005 (0, 40, 10, 10)
item 3 = 00000006 (0, 50, 10, 10)
item 4 = 00000007 (0, 60, 10, 10)
item 5 = 00000008 (0, 70, 10, 10)
item 6 = 00000009 (0, 80, 10, 10)
item 7 = 0000000A (0, 90, 10, 10)
item 8 = 0000000B (0, 100, 10, 10)
item 9 = 0000000C (0, 110, 10, 10)
- How did you arrive at `20' for the magic number? Seems a bit high to
me, mostly because that's a lot of deque entries to grovel through on
a miss...
Yeah, the comment by the number says:
#define MRU_FRAME_CACHE_SIZE 10 // fairly arbitrary, needs to be tuned
Notice I have it set at 10 now. It's easy to fool around with the markup and
the user action (this is, after all, an optimization primarily targeted at
speeding up interactive mouse events) to get cache hit rates all over the board.
By far, the biggest payoff is at cachesize=1, since mouse move events come in
fast enough for us to see a hit rate of nearly 50% based on just remembering the
latest hit. That's true when we consider only mouse events over leaf elements,
see my previous comments for the problems of "dead space" points.
Comment 37•24 years ago
|
||
David: I haven't yet found a case where it isn't true. But I'm very early on
in my testing. Please elaborate, what case are you thinking about? Positioned
elements?
Comment 38•24 years ago
|
||
buster: typically an MRU algorithm will promote items that it finds in the cache
so that they tend to stay in the cache. I don't think that Search() does this,
does it?
Comment 39•24 years ago
|
||
no, not yet. maybe never, it might be that FIFO is good enough. still testing.
good catch, though. I should have documented that, especially given the name I
chose for the class!
Comment 41•24 years ago
|
||
Just for reference, bug 26785 used a similar testcase.
Comment 43•24 years ago
|
||
this cache works great for simple documents. as david points out, it fails
miserably for documents with overlapping elements, as may be caused by negative
margins or positioned elements.
as a first step, I will look into what it would take to determine whether the
document contains any of these conditions, basically a bool that says whether to
use the cache nor not. The most expediant way to know this might be a hook in
the style system, so if the style system ever resolves a rule that might cause
an overlap, we just don't use the cache (regardless of whether that rule is ever
actually ever used by an instantiated element.) This would give us the
performance boost we're looking for on the vast majority of pages in the short
term, and we could investigate alternatives post-6.5.
Another approach to speeding up GetFrameForPoint might be to add a state bit on
nsLineBox indicating whether there were overflowing children and then overriding
GetFrameForPointUsing on nsBlockFrame (for the aList==null case, and use the
inherited one for the others).
This would avoid descending into 99%+ of the lines in cases with a single block
with a huge child list (like the one above), which are probably the only cases
where we have significant GetFrameForPoint performance problems.
Comment 46•24 years ago
|
||
really taking.
Updated•24 years ago
|
Status: NEW → ASSIGNED
I wonder if there's something weird going on in the test case for this bug, like
every frame having 6 twips overflow...
Updated•24 years ago
|
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Updated•24 years ago
|
Target Milestone: mozilla0.9.2 → mozilla1.0
Comment 48•23 years ago
|
||
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1
(you can query for this string to delete spam or retrieve the list of bugs I've
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Updated•23 years ago
|
Attachment #21692 -
Attachment is obsolete: true
Updated•23 years ago
|
Target Milestone: mozilla1.0.1 → Future
![]() |
||
Comment 49•21 years ago
|
||
*** Bug 235517 has been marked as a duplicate of this bug. ***
![]() |
||
Comment 50•21 years ago
|
||
roc had some ideas about this in bug 235517:
In particular, can't we make nsBlockFrame::GetFrameForPoint to iterate through
the lines, scanning the frames within a line only if the point is inside the
line's combinedArea? I bet that would speed things up a lot. Actually I'm sure
this would work because that's how nsBlockFrame::Paint works.
If we were really cool we would keep a bit in each nsBlockFrame, and set it to
true if we detect that the sequence of 'line->combinedArea.y's is
nondecreasing and the sequence of 'line->combinedArea.YMost's is
nondecreasing. We would also cache somewhere a pointer to the most recently
accessed line. Then we can quickly find the complete set of line(s) for a
given point by starting at the cached pointer and searching forwards or
backwards. I remember at one time thinking this could speed up painting on
large pages too.
[editorial note -- the painting issue is bug 52005]
Blocks: 52005
Component: Layout → Layout: Block and Inline
Priority: P2 → --
Target Milestone: Future → ---
Assignee | ||
Comment 51•21 years ago
|
||
This implements the first part of my suggestion. It's quite straightforward and
seems to work correctly.
Any suggestions for how to measure the performance effect of this?
Assignee | ||
Comment 52•21 years ago
|
||
BTW, a note about the cursor idea I proposed in bug 52005 and here:
It's probably not hard to do. We make the line cursor a frame property. We keep
a state bit that tells us quickly whether we have such a property. When there is
no property, Paint or GetFrameForPointUsing iterate through all lines, just like
they do now. During this iteration they also check whether the line boxes
combinedArea.ys and combinedArea.YMosts are nondecreasing. If so, and there are
many lines (say more than 50), then they set the line cursor property to some
value, say the first line. right?) When the property is available (and thus the
nondecreasing invariants hold), Paint and GetFrameForPointUsing will locate the
lines they're interested in based on coordinates by searching from the cursor
forward or backward and storing the last line they touch back into the cursor
property. Reflow just always clears the property. (We can't create or destroy
lines, or change their combinedareas, except during reflow and frame teardown,
right?)
This would make for fairly simple code and I think negligible space and time
overhead.
![]() |
||
Comment 53•21 years ago
|
||
I just tried this patch on the testcase in bug 235517 (seeing how many profiler
hits I got in a minute of moving the mouse, so it's kinda crude). I see at best
a 10% improvement and the cpu still gets pegged when I move the mouse; with your
patch 80% of the profiler hits are in (not under)
nsBlockFrame::GetFrameForPointUsing. So the real problem is still the O(N) loop
over all lines....
Assignee | ||
Comment 54•21 years ago
|
||
Okay. thanks
Assignee | ||
Comment 55•21 years ago
|
||
This patch implements the previous stuff plus adds the line cursor cache that I
suggested. Seems to work OK once I explicitly skipped lines with empty
combinedAreas when determining whether the invariant holds. My tests seem to
show that it's effective on the Postgres manual testcase; instead of scanning
>18K lines in the main DIV on every mouse move, we usually scan 3.
bz, would you mind checking out the profile performance of this one? If it
looks good I'll finish the patch by using/setting the cursor in
nsBlockFrame::PaintChildren.
Assignee | ||
Updated•21 years ago
|
Assignee: waterson → roc
Attachment #142397 -
Attachment is obsolete: true
![]() |
||
Comment 56•21 years ago
|
||
With that patch on the testcase in bug 235517 (same test as in comment 53 -- 1
minute wall-clock time) I see a factor of 6 reduction in the number of profiler
hits (since this is a non-realtime profile, that's actually meaningful). The
top of the flat profile now looks like:
Total hit count: 322
Count %Total Function Name
10 3.1 nsXULElement::HandleDOMEvent
8 2.5 nsQueryInterface::operator()
7 2.2 __libc_poll
6 1.9 SearchTable
5 1.6 _end
5 1.6 __pthread_alt_unlock
5 1.6 nsRuleNode::GetStyleData
and there are only 4 total hits under nsBlockFrame::GetFrameForPoint.
So certainly looks much better. ;) I haven't looked at the patch itself; just
applied it blindly and profiled.
Once we have a finalized patch, we should file some followup bugs on what's left
after this one is fixed -- of those 322 hits 154 are handling mousing on and off
anchors (and 133 of those are updating the status bar!).
Assignee | ||
Comment 57•21 years ago
|
||
Excellent. I'll finish the patch shortly.
Will we really need to optimize this further? If you're waving the mouse around
I'd expect to spend some time processing events and updating the status bar.
![]() |
||
Comment 58•21 years ago
|
||
Sure. There are just some issues that I wanted to look at (like the fact that
setting the status flushes pending notifications). Probably not a big deal
either way... ;)
Assignee | ||
Comment 59•21 years ago
|
||
Here's the version which optimizes painting as well.
Assignee | ||
Updated•21 years ago
|
Attachment #142976 -
Attachment is obsolete: true
Assignee | ||
Comment 60•21 years ago
|
||
Comment on attachment 143083 [details] [diff] [review]
v2
let me know if you can handle this review
Attachment #143083 -
Flags: superreview?(bzbarsky)
Attachment #143083 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 61•21 years ago
|
||
Note: if we ever find important test cases where this doesn't work because the
line combinedArea.ys/ymosts decrease at some non-empty line (e.g., because some
line has some relative-positioned stuff), then we can still save the day. One
way would be to actually grow the line combined areas as necessary to force that
invariant to hold.
Assignee | ||
Comment 62•21 years ago
|
||
*** Bug 52005 has been marked as a duplicate of this bug. ***
![]() |
||
Comment 63•21 years ago
|
||
roc, I'm not sure I understand how lines, blockframes, out-of-flows all interact
well enough to review this.... I could probably sr, but dbaron should probably
do the r=.
Assignee | ||
Comment 64•21 years ago
|
||
Comment on attachment 143083 [details] [diff] [review]
v2
no problem. reassigning to dbaron.
Attachment #143083 -
Flags: superreview?(dbaron)
Attachment #143083 -
Flags: superreview?(bzbarsky)
Attachment #143083 -
Flags: review?(dbaron)
Attachment #143083 -
Flags: review?(bzbarsky)
![]() |
||
Comment 65•21 years ago
|
||
Comment on attachment 143083 [details] [diff] [review]
v2
OK, I did a little self-improvement... (and asked roc what was up with the
check for an empty combined area ;) )
>Index: layout/html/base/src/nsBlockFrame.cpp
> nsBlockFrame::PaintChildren(nsIPresContext* aPresContext,
>+ for (line_iterator line = mLines.begin(cursor);
>+ line != line_end;
Weird indent there.
>+ if (lineArea.Intersects(aDirtyRect)) {
This if/else statement is duplicated in the "no cursor" case. Perhaps it
should move into a helper function? That can be a nice inline function and all
if we want, but it's bad to have identical code in two different spots...
>+ for (line_iterator line = begin_lines();
>+ line != line_end;
Weird indent.
>+ if (lineArea.y < lastY
>+ || lineArea.YMost() < lastYMost) {
>+ nonDecreasingYs = PR_FALSE;
>+ }
>+ lastY = lineArea.y;
>+ lastYMost = lineArea.YMost();
I assume there is no really good reason to wrap all that in a check that
nonDecreasingYs is still true since in most cases it _will_ be true and all
that is pretty cheap, right?
>+nsLineBox* nsBlockFrame::GetFirstLineContaining(nscoord y) {
This seems to assume that "y" is within the YMost of one of our lines (hence
that it's in our rect). We should probably assert that up front.
>+nsBlockFrame::GetFrameForPointUsing(nsIPresContext* aPresContext,
>+ nsIFrame *hit;
>+ nsPoint tmp;
Move declarations to first use? Eg move the decl of tmp down to where the
MoveTo is right now?
>+ if (lineArea.Contains(tmp)) {
Again, this block is identical to the one in the "no cursor" case...
>+ PRBool nonDecreasingYs = PR_TRUE;
I really wonder whether we can pull the logic for all this out into a function
that will get that inner bit (the part starting with if
(lineArea.Contains(tmp)) as a function pointer param or something.... I guess
that's hard because here we actually need to return a value. <sigh>.
>Index: layout/html/base/src/nsBlockFrame.h
>+ // Get the lines containing y-coord 'y', or nsnull if you must search
>+ // all lines. If nonnull is returned then we guarantee that the lines'
>+ // combinedArea.ys and combinedArea.yMosts are non-decreasing.
>+ nsLineBox* GetFirstLineContaining(nscoord y);
Comment should probably say "Get the first line containing..."
r+sr=bzbarsky with the changes to factor out shared code and other nits
addressed.
Attachment #143083 -
Flags: superreview?(dbaron)
Attachment #143083 -
Flags: superreview+
Attachment #143083 -
Flags: review?(dbaron)
Attachment #143083 -
Flags: review+
![]() |
||
Comment 66•21 years ago
|
||
One other thing. The line
>+const int MIN_LINES_NEEDING_CURSOR = 20;
In that patch is inside a DEBUG ifdef. I don't think you want that. I _do_
think you want to make that a static const int, though.
Assignee | ||
Comment 67•21 years ago
|
||
> I assume there is no really good reason to wrap all that in a check that
> nonDecreasingYs is still true since in most cases it _will_ be true and all
> that is pretty cheap, right?
That's my guess.
> This seems to assume that "y" is within the YMost of one of our lines (hence
> that it's in our rect). We should probably assert that up front.
That's not necessarily true (e.g., a DIV with CSS width and height but a small
number of small lines), but it doesn't matter. If y is below the YMost of the
last line, we'll leave the cursor at the last line and return that. The loops
that use the cursor will still work fine. I'll comment GetFirstLineContaining to
indicate that sometimes it may return the last line even if that line doesn't
contain the y.
> I really wonder whether we can pull the logic for all this out into a function
> that will get that inner bit (the part starting with if
> (lineArea.Contains(tmp)) as a function pointer param or something.... I guess
> that's hard because here we actually need to return a value. <sigh>.
Yeah, that's a bit of a hassle. Also lots of parameters need to be threaded
down into the enclosed code. My inline functions that I created to extract the
common inner code are PaintLine with 8 parameters and GetFrameFromLine with 6
parameters.
I'll attach the patch for checkin.
Assignee | ||
Comment 68•21 years ago
|
||
Attachment #143083 -
Attachment is obsolete: true
![]() |
||
Comment 69•21 years ago
|
||
roc, see comment 66....
Assignee | ||
Comment 70•21 years ago
|
||
Oops, thanks for that catch. Fix checked in :-)
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•